Beyond the 'Done': Unpacking AI Workflow Metrics for Deeper Insights
We just shipped a critical update to our workflow detail pages, adding per-step execution metrics for energy, time saved, and token savings. This session was all about bringing deeper visibility into the true cost and efficiency of AI automation.
Building powerful AI-driven workflows is exhilarating. But the magic often hides the mechanics. How much energy did that LLM call consume? How much human time did it actually save? What's the real token economy at play? These aren't just academic questions; they're crucial for understanding efficiency, optimizing costs, and even quantifying environmental impact.
This past session, our mission was clear: pull back the curtain on our AI workflows. We aimed to integrate granular, per-step execution metrics directly into our workflow detail pages. It was about more than just showing "completed" – it was about showing how it completed, and what it cost.
The Core Mission: A Metric-Rich Workflow View
Our main goal was to introduce three key metrics for each step in a workflow:
- Energy Consumption: Quantifying the real-world energy footprint of LLM inferences.
- Time Saved: Estimating the human effort bypassed by automation.
- Token Savings: Highlighting the efficiency gains, especially from digest steps.
Alongside this, we sneaked in a couple of UX polishes: defaulting to dark mode and refining our badge aesthetics.
1. Building the Brains: workflow-metrics.ts
The heart of this feature is a new utility file, `src/lib/workflow-metrics.ts`. This pure utility orchestrates the calculations for all our new metrics.
```typescript
// src/lib/workflow-metrics.ts (simplified)
interface StepData {
  duration?: number;    // ms
  cost?: number;        // USD
  tokensIn?: number;
  tokensOut?: number;
  tokensSaved?: number; // from digest steps
  model?: string;
}

// Hardcoded energy rates for various LLM families (Wh/MTok)
const LLM_ENERGY_RATES: Record<string, number> = {
  'claude-3-opus': 540,
  'claude-3-sonnet': 110,
  'claude-3-haiku': 110,
  'gpt-4o': 540,
  'gpt-4o-mini': 110,
  'gemini-flash': 110,
  'gemini-pro': 540,
  'kimi': 540,
  'ollama': 110,  // Placeholder, depends heavily on local hardware
  'default': 110, // Epoch AI estimate for an average LLM
};

const HUMAN_WPH = 300;           // Words per hour a human can process
const WORDS_PER_TOKEN = 0.75;    // Average words per token
const DIGEST_COMPRESSION = 0.30; // Estimated compression ratio for digest steps

export function computeStepMetrics(step: StepData) {
  let energyWh = 0;
  let timeSavedMinutes = 0;
  let digestTokenSavings = 0;

  // ... (logic to safely extract tokensIn/Out/Saved, handling nulls/undefined)

  if (step.tokensIn || step.tokensOut) {
    const totalTokens = (step.tokensIn || 0) + (step.tokensOut || 0);
    const modelPrefix = step.model?.split('/')[0] || 'default';
    const energyRate = LLM_ENERGY_RATES[modelPrefix] ?? LLM_ENERGY_RATES['default'];
    energyWh = (totalTokens / 1_000_000) * energyRate;
  }

  if (step.tokensSaved) {
    // Estimate downstream savings assuming all output is consumed
    digestTokenSavings = step.tokensSaved * DIGEST_COMPRESSION;
  }

  if (step.tokensOut) {
    const wordsGenerated = step.tokensOut * WORDS_PER_TOKEN;
    timeSavedMinutes = (wordsGenerated / HUMAN_WPH) * 60;
  }

  return { energyWh, timeSavedMinutes, digestTokenSavings };
}

// ... formatEnergy(), formatTimeSaved(), isTokenUsage() guards
```
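As a quick sanity check on those formulas, here's a self-contained walk-through of the per-step math. The token counts are made up for illustration; the constants mirror the ones above.

```typescript
// Self-contained walk-through of the computeStepMetrics math.
// The token counts below are illustrative, not real workflow data.
const ENERGY_RATE = 540;      // Wh per million tokens (the gpt-4o / opus tier)
const WORDS_PER_TOKEN = 0.75; // Average words per token
const HUMAN_WPH = 300;        // Words per hour a human can produce

const tokensIn = 800_000;
const tokensOut = 200_000;

// Energy: 1M total tokens at 540 Wh/MTok → 540 Wh
const energyWh = ((tokensIn + tokensOut) / 1_000_000) * ENERGY_RATE;

// Time saved: 200k tokens ≈ 150k words; at 300 words/hour that's 500 hours
const timeSavedMinutes = ((tokensOut * WORDS_PER_TOKEN) / HUMAN_WPH) * 60;

console.log(energyWh, timeSavedMinutes); // 540, 30000
```

Numbers like "500 hours saved" from a single step are a useful reminder that these metrics are only as honest as the constants behind them.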
Key Takeaways from `workflow-metrics.ts`:
- Model-Specific Energy Rates: We're not just guessing; we're using model family prefixes to pull specific energy rates (Wh/MTok), falling back to Epoch AI's average if a specific model isn't listed. This provides a more accurate, albeit still estimated, energy footprint.
- Human-Centric Savings: The `HUMAN_WPH` (words per hour) constant allows us to convert generated tokens into a tangible "time saved" metric. It's a powerful way to frame automation value.
- Digest Compression Savings: For steps that summarize or "digest" information, we can estimate downstream token savings. This is an approximation, as we'll discuss in the "Lessons Learned" section.
- Data Robustness: A small but crucial detail was adding an `isTokenUsage()` guard. Our Prisma `Json?` fields can sometimes return malformed data, leading to `NaN` if not handled. This guard prevents those pesky UI glitches.
- Formatting Utilities: `formatEnergy()` and `formatTimeSaved()` ensure these new metrics are presented in a user-friendly way (e.g., `1200 mWh` vs `1.2 Wh`, `90 min` vs `1.5 hrs`).
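The formatters' internals aren't shown above, so here's a minimal sketch of what they might look like. The unit-switch thresholds (1 Wh, 60 min) are assumptions for illustration, not necessarily the repo's actual values.

```typescript
// Hedged sketch of the formatting helpers; thresholds are assumed.
export function formatEnergy(wh: number): string {
  // Below 1 Wh, milliwatt-hours read better than a long decimal.
  return wh < 1 ? `${Math.round(wh * 1000)} mWh` : `${wh.toFixed(1)} Wh`;
}

export function formatTimeSaved(minutes: number): string {
  // Past an hour, switch units so 90 minutes reads as "1.5 hrs".
  return minutes < 60 ? `${Math.round(minutes)} min` : `${(minutes / 60).toFixed(1)} hrs`;
}
```

For example, `formatEnergy(0.5)` would yield `"500 mWh"` and `formatTimeSaved(90)` would yield `"1.5 hrs"` under these assumptions.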
2. Bringing Metrics to Life: The Workflow Detail Page
With the calculation logic in place, the next step was integrating these metrics into `src/app/(dashboard)/dashboard/workflows/[id]/page.tsx`.
We introduced a new "metrics bar" that renders directly below each completed or failed step header. This bar now proudly displays:
- Tokens (Input/Output)
- Duration
- Cost
- A snazzy `Zap` icon alongside the newly computed Energy (Wh/mWh)
- An estimated ~Time Saved (min/hrs)
For digest steps, we also show the estimated downstream compression savings, complete with a multiplier estimate to give a sense of its impact.
Lesson Learned: The Duplicate Metadata Dance
Initially, I ran into a classic UI problem: duplicate information. We had an existing metadata line (tokens/cost/duration) inside the expanded step body, and my new metrics bar was outside it. For completed steps, both were visible, leading to an awkward, redundant display.
The Fix: I used an Immediately Invoked Function Expression (IIFE) to conditionally render the old metadata line. If the new metrics bar is present (meaning the step is completed or failed), the old line only shows the retry count (if any). Otherwise, for pending or running steps, the full old metadata line is rendered.
```tsx
// src/app/(dashboard)/dashboard/workflows/[id]/page.tsx (simplified excerpt)
// ... inside the step rendering logic ...

{/* New metrics bar. Note the parentheses: without them, `&&` binds tighter
    than `||`, so a COMPLETED step would evaluate to a bare `true` and
    React would render nothing at all. */}
{(step.status === 'COMPLETED' || step.status === 'FAILED') && (
  <div className="flex flex-wrap items-center gap-x-4 gap-y-2 text-sm text-gray-500 dark:text-gray-400">
    {/* New metrics: tokens, duration, cost, energy, time saved, digest savings */}
    <MetricItem label="Tokens" value={`${step.tokensIn}/${step.tokensOut}`} />
    {/* ... other metrics ... */}
    <MetricItem label="Energy" icon={ZapIcon} value={formatEnergy(metrics.energyWh)} />
    <MetricItem label="Time Saved" icon={ClockIcon} value={formatTimeSaved(metrics.timeSavedMinutes)} />
    {metrics.digestTokenSavings > 0 && (
      <MetricItem
        label="Downstream Savings"
        value={`${metrics.digestTokenSavings} tokens`}
        tooltip="Estimated savings if output is used in downstream steps"
      />
    )}
  </div>
)}

{/* Old metadata line — now conditional */}
{(() => {
  if (step.status === 'COMPLETED' || step.status === 'FAILED') {
    // The metrics bar is shown; only display the retry count, if any
    return step.retryCount > 0 && (
      <p className="text-sm text-gray-500 dark:text-gray-400 mt-2">
        Retries: {step.retryCount}
      </p>
    );
  }
  // For pending/running steps, show the full legacy metadata line
  return (
    <p className="text-sm text-gray-500 dark:text-gray-400 mt-2">
      Tokens: {step.tokensIn}/{step.tokensOut} | Cost: ${step.cost?.toFixed(3)} | Duration: {step.duration}ms
      {step.retryCount > 0 && ` | Retries: ${step.retryCount}`}
    </p>
  );
})()}
```
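An alternative to the inline IIFE is to factor the branching into a tiny pure function, which also makes the decision trivially unit-testable. The names below (`StepLike`, `oldMetaContent`) are illustrative, not from the codebase:

```typescript
// Illustrative refactor of the IIFE's branching into a pure, testable helper.
// 'StepLike' and 'oldMetaContent' are hypothetical names, not the repo's API.
type StepLike = { status: string; retryCount: number };

export function oldMetaContent(step: StepLike): 'none' | 'retries-only' | 'full' {
  const metricsBarShown = step.status === 'COMPLETED' || step.status === 'FAILED';
  if (metricsBarShown) {
    // The metrics bar already covers tokens/cost/duration; only retries remain.
    return step.retryCount > 0 ? 'retries-only' : 'none';
  }
  // Pending/running steps keep the full legacy metadata line.
  return 'full';
}
```

The JSX then collapses to a simple switch on the returned tag, and the visibility rules live in one place.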
This was a good reminder that robust UI design often involves carefully managing state and visibility, especially when refactoring or adding new components to existing layouts.
3. Quick Wins: Dark Mode Default & Badge Polish
While the metrics were the star, we also squeezed in some valuable UX improvements:
- Dark Mode by Default: A simple change in `src/app/layout.tsx` (line 41) now sets our theme fallback from `'system'` to `'dark'`. It's a small detail, but many developers prefer dark themes, and it sets a consistent tone from the get-go.
- Badge Border Cleanup: In `src/components/ui/badge.tsx`, we removed the `border border-nyx-*/20` classes from our colored badge variants (`accent`, `success`, `warning`, `danger`). This gives them a cleaner, more modern look, letting the color speak for itself without an unnecessary outline.
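For context, if the app wires its theme through a provider like next-themes (an assumption on my part; the post only mentions a one-line fallback change), the dark-mode default looks roughly like this:

```tsx
// Hypothetical sketch assuming next-themes; the actual layout.tsx may differ.
// The change amounts to: defaultTheme="system" → defaultTheme="dark".
<ThemeProvider attribute="class" defaultTheme="dark" enableSystem>
  {children}
</ThemeProvider>
```

With `enableSystem` still on, users who explicitly pick "system" keep that behavior; only the untouched default shifts to dark.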
What's Next? Continuous Refinement
This session wraps up a significant feature, but the journey continues:
- Unit Tests: Critical for `workflow-metrics.ts`. We need solid tests for `computeStepMetrics`, `formatEnergy`, and `formatTimeSaved` to ensure accuracy and prevent regressions.
- Mobile QA: The metrics bar uses `flex-wrap`, which should adapt well, but a thorough visual QA on narrow viewports is essential.
- Aggregate Metrics: While per-step is great, an aggregate summary at the workflow level (total energy, total time saved, total cost) would provide even higher-level insights.
- Tooltip for Downstream Savings: The `tokensSavedDownstream` value is an approximation. Adding a tooltip to explain this nuance and its assumptions (i.e., "assumes all downstream steps reference this step's output via `{{steps.Label.content}}`") will enhance transparency.
This session was a fantastic step forward in providing meaningful, actionable insights into our AI workflows. By shining a light on energy consumption, time savings, and token efficiency, we're not just building automation; we're building smarter automation.
Appendix: Session Summary

```json
{
  "thingsDone": [
    "Changed default theme fallback to 'dark' in src/app/layout.tsx",
    "Removed border from colored badge variants in src/components/ui/badge.tsx",
    "Created src/lib/workflow-metrics.ts with computeStepMetrics, energy rates, isTokenUsage guard, and formatters",
    "Added metrics bar to src/app/(dashboard)/dashboard/workflows/[id]/page.tsx displaying tokens, duration, cost, energy, time saved, and digest compression savings",
    "De-duplicated old metadata line on workflow detail page, now showing only retry count for completed steps when metrics bar is present"
  ],
  "pains": [
    "Initial duplicate metadata display on workflow detail page for expanded completed steps"
  ],
  "successes": [
    "Successfully implemented all 4 phases of the feature (metrics calculation, UI integration, dark mode, badge cleanup)",
    "Typecheck passes and code reviewed, ready for commit",
    "Developed robust utility functions for metrics calculation and formatting",
    "Implemented a clever IIFE workaround for conditional UI rendering to avoid duplication"
  ],
  "techStack": [
    "TypeScript",
    "Next.js",
    "React",
    "Tailwind CSS",
    "Lucide Icons",
    "Prisma (for data context)"
  ]
}
```