Building Better UX: Adding Real-Time Metrics to Workflow Execution
How we transformed a basic workflow dashboard into an insightful metrics powerhouse, complete with energy consumption tracking, time savings calculations, and a few hard-learned lessons about UI consistency.
Ever looked at a workflow execution and wondered: "Was this actually worth running?" or "How much energy did my AI model consume?" We recently tackled this exact problem by adding comprehensive per-step metrics to our workflow dashboard. Here's the story of how we went from basic execution logs to a metrics-rich experience that actually helps users understand the value of their automated workflows.
The Mission: Making Workflows Transparent
Our goal was straightforward but ambitious: show users the real impact of each workflow step. We wanted to display:
- Energy consumption (in Wh/mWh) based on token usage and model type
- Time saved compared to manual execution
- Token compression savings when using digest features
- All while making dark mode the default and cleaning up some visual inconsistencies
The Technical Journey
Phase 1: Setting the Foundation
We started by creating a dedicated utility module workflow-metrics.ts to handle all the heavy lifting:
```typescript
// Energy rates by model family (Wh per million tokens)
const ENERGY_RATES = {
  'claude-3-sonnet': 110,
  'claude-3-haiku': 45,
  'gpt-4o': 540,
  'gpt-4o-mini': 110,
  'gemini-1.5-flash': 85,
  // ... with fallbacks for unknown models
};

export function computeStepMetrics(step: WorkflowStep) {
  const tokenUsage = extractTokenUsage(step);
  const energy = calculateEnergyConsumption(tokenUsage, step.model);
  const timeSaved = estimateTimeSaved(tokenUsage);
  return { energy, timeSaved, tokensSaved: calculateTokenSavings(step) };
}
```
The beauty here is in the model-specific energy rates. Different AI models have vastly different power consumption profiles: per our rate table, GPT-4o (540 Wh per million tokens) uses roughly 12x more energy per token than Claude 3 Haiku (45 Wh per million tokens). Users deserve to know this!
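For completeness, here is one plausible way the energy lookup could work. The post doesn't show calculateEnergyConsumption, so this is an assumed sketch: the rate table is repeated for self-containment, and DEFAULT_RATE plus the longest-prefix matching strategy (so a dated model ID like "gpt-4o-mini-2024-07-18" still resolves) are assumptions, not the actual implementation.

```typescript
type TokenUsage = { inputTokens: number; outputTokens: number };

// Wh per million tokens, per the rate table above
const ENERGY_RATES: Record<string, number> = {
  'claude-3-sonnet': 110,
  'claude-3-haiku': 45,
  'gpt-4o': 540,
  'gpt-4o-mini': 110,
  'gemini-1.5-flash': 85,
};

const DEFAULT_RATE = 150; // hypothetical fallback for unknown models

export function calculateEnergyConsumption(usage: TokenUsage, model: string): number {
  // Pick the longest matching family prefix so 'gpt-4o-mini-…' doesn't
  // accidentally resolve to the 'gpt-4o' rate.
  const key = Object.keys(ENERGY_RATES)
    .filter((k) => model.startsWith(k))
    .sort((a, b) => b.length - a.length)[0];
  const rateWhPerMillion = key ? ENERGY_RATES[key] : DEFAULT_RATE;
  const totalTokens = usage.inputTokens + usage.outputTokens;
  return (totalTokens / 1_000_000) * rateWhPerMillion; // result in Wh
}
```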
Phase 2: Smart Token Savings Calculation
One of the trickier challenges was calculating "downstream token savings" when users employ digest compression:
```typescript
// When a step compresses 10,000 tokens down to 3,000 tokens,
// and that compressed output gets referenced by 3 downstream steps,
// the total savings = 7,000 tokens × 3 references = 21,000 tokens saved
const compressionSavings = originalTokens * (1 - DIGEST_COMPRESSION);
const downstreamMultiplier = estimateDownstreamReferences(step);
const totalSavings = compressionSavings * downstreamMultiplier;
```
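Packaged as a standalone function, the arithmetic above looks like this. Note the assumptions: DIGEST_COMPRESSION = 0.3 (the digest keeps ~30% of the original tokens, which reproduces the 10,000 → 3,000 example) is a fixed ratio here, whereas the real module may derive it from the actual digest output.

```typescript
const DIGEST_COMPRESSION = 0.3; // assumed: digest retains ~30% of original tokens

export function computeDownstreamSavings(
  originalTokens: number,
  downstreamReferences: number
): number {
  // Tokens avoided each time a downstream step reads the digest
  // instead of the full original output.
  const savedPerReference = originalTokens * (1 - DIGEST_COMPRESSION);
  return savedPerReference * downstreamReferences;
}

// 10,000 tokens compressed to ~3,000, referenced by 3 steps:
computeDownstreamSavings(10_000, 3); // 21000 tokens saved
```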
Phase 3: UI Integration with Smart Conditionals
The metrics bar needed to integrate seamlessly with existing step displays. Here's where we hit our first major challenge.
```tsx
// New metrics bar for completed steps
{(step.status === 'completed' || step.status === 'failed') && (
  <div className="flex flex-wrap items-center gap-4 text-sm text-muted-foreground mt-2">
    <span>{formatTokens(tokens)} tokens</span>
    <span>{formatDuration(duration)}</span>
    <span>{formatCost(cost)}</span>
    <div className="flex items-center gap-1">
      <Zap className="h-3 w-3" />
      <span>{formatEnergy(energy)}</span>
    </div>
    <span>~{formatTimeSaved(timeSaved)} saved</span>
  </div>
)}
```
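The formatters used in the metrics bar aren't shown in the post; here is one plausible implementation. The Wh/mWh threshold (switch to milliwatt-hours below 0.1 Wh) and the rounding choices are assumptions.

```typescript
// Small energy values read better in mWh (assumed 0.1 Wh threshold).
export function formatEnergy(wh: number): string {
  if (wh < 0.1) return `${Math.round(wh * 1000)} mWh`;
  return `${wh.toFixed(1)} Wh`;
}

// Abbreviate large token counts: 15000 -> "15.0k".
export function formatTokens(tokens: number): string {
  if (tokens >= 1000) return `${(tokens / 1000).toFixed(1)}k`;
  return String(tokens);
}

// Time saved, in minutes; roll over to hours past 60.
export function formatTimeSaved(minutes: number): string {
  if (minutes >= 60) return `${(minutes / 60).toFixed(1)} h`;
  return `${Math.round(minutes)} min`;
}
```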
Lessons Learned: The Duplicate Display Dilemma
Here's where things got interesting. We initially had two places showing similar information:
- The new metrics bar (outside the expandable step content)
- The old metadata line (inside the expanded step body)
For completed steps, users would see duplicate token counts, costs, and durations. Not great UX!
The Solution: We implemented a conditional rendering strategy using an IIFE (Immediately Invoked Function Expression):
```tsx
{/* Only show old metadata for non-completed steps, or just retry info for completed ones */}
{(() => {
  const showingMetricsBar = step.status === 'completed' || step.status === 'failed';
  if (showingMetricsBar) {
    // Only show retry count if it exists
    return step.retryCount > 0 ? `Retry ${step.retryCount}` : null;
  }
  // Show full metadata for pending/running steps
  return `${tokens} tokens • ${duration} • ${cost}`;
})()}
```
This keeps the UI clean while preserving information density where it matters.
The Small Wins: Dark Mode and Visual Polish
Sometimes the best improvements are the subtle ones:
```typescript
// Before: system preference with light fallback
const theme = getTheme() || 'system';

// After: dark by default (because developers love dark mode)
const theme = getTheme() || 'dark';
```
We also cleaned up badge borders by removing unnecessary `border border-nyx-*/20` classes from the colored variants; the colored backgrounds already provide sufficient visual distinction.
Real-World Impact
The metrics now tell a story. Users can see:
- "This GPT-4o step consumed 2.1 Wh of energy" (awareness of environmental impact)
- "Saved ~45 minutes compared to manual work" (ROI justification)
- "Digest compression saved 15,000 tokens downstream" (optimization insights)
What's Next
This foundation opens up exciting possibilities:
- Aggregate workflow metrics (total energy, time, cost across all steps)
- Mobile optimization for the metrics bar layout
- Unit tests for the metrics calculation functions
- Tooltips explaining the downstream savings approximation
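The first item on that list, aggregate workflow metrics, could be sketched as a simple reduction over per-step results. StepMetrics mirrors the shape returned by computeStepMetrics earlier in the post, but the exact field names and units here are assumptions.

```typescript
interface StepMetrics {
  energy: number;      // Wh
  timeSaved: number;   // minutes
  tokensSaved: number;
}

// Sum per-step metrics into workflow-level totals.
export function aggregateWorkflowMetrics(steps: StepMetrics[]): StepMetrics {
  return steps.reduce(
    (total, step) => ({
      energy: total.energy + step.energy,
      timeSaved: total.timeSaved + step.timeSaved,
      tokensSaved: total.tokensSaved + step.tokensSaved,
    }),
    { energy: 0, timeSaved: 0, tokensSaved: 0 }
  );
}
```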
Key Takeaways
- Model-specific energy rates matter - different AI models have vastly different power consumption
- Downstream token savings can be significant with digest compression
- UI consistency requires careful planning - avoid duplicate information displays
- Small UX improvements (like default dark mode) compound into better user experience
- Metrics should tell a story - raw numbers are less valuable than contextual insights
The next time you're building workflow tools, consider: what story are your metrics telling? Users don't just want to know what happened—they want to understand the value of what happened.
Want to see more posts about building developer tools and workflow automation? Follow along for more technical deep-dives and UX insights.