Building a Live 'Stats for Nerds' Panel: Real-time Pipeline Metrics That Actually Matter
How we built a live dashboard showing token usage, costs, and performance metrics for our AI-powered code analysis pipelines—and why visibility into AI operations is crucial for modern development tools.
Ever wondered what's happening under the hood when your AI-powered development tools are churning away? As developers, we love our metrics—especially the nerdy ones that show exactly how our systems are performing. That's why we built a live "Stats for Nerds" panel that gives real-time visibility into our AutoFix and Refactor pipelines.
The Problem: AI Operations in the Dark
Modern development tools increasingly rely on Large Language Models (LLMs) to analyze code, detect issues, and generate fixes. But here's the thing—these operations are often black boxes. Developers using these tools have no idea:
- How many tokens are being consumed
- What the operations actually cost
- Which models are being used
- How long each phase takes
- Whether the system is being efficient
This lack of visibility makes it impossible to optimize performance, understand costs, or even debug when things go wrong.
The Solution: Real-time Metrics That Matter
We decided to build a collapsible "Stats for Nerds" panel that shows live metrics during pipeline execution. When collapsed, it gives you the essentials at a glance:
12.4k tok · $0.0312 · 5 calls
When expanded, you get the full picture:
- Token usage across all LLM calls
- Cost estimates for the entire operation
- Model information (which LLM provider and version)
- Energy consumption estimates
- Time saved calculations
- Per-phase breakdown with live progress indicators
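The collapsed one-liner is just a formatted string over the three global totals. A minimal sketch of how it might be produced (the helper names here are illustrative, not our exact production code):

```typescript
// Hypothetical helpers for the collapsed one-line summary.
interface NerdSummaryInput {
  totalTokens: number;
  totalCost: number;
  totalCalls: number;
}

// Abbreviate large counts for the compact view: 12400 -> "12.4k"
function formatNumber(n: number): string {
  if (n >= 1_000_000) return `${(n / 1_000_000).toFixed(1)}M`;
  if (n >= 1_000) return `${(n / 1_000).toFixed(1)}k`;
  return String(n);
}

function formatNerdSummary({ totalTokens, totalCost, totalCalls }: NerdSummaryInput): string {
  return `${formatNumber(totalTokens)} tok · $${totalCost.toFixed(4)} · ${totalCalls} calls`;
}

// formatNerdSummary({ totalTokens: 12400, totalCost: 0.0312, totalCalls: 5 })
// -> "12.4k tok · $0.0312 · 5 calls"
```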
The Technical Architecture
Data Structure Design
First, we created a shared interface to standardize our metrics:
```typescript
// src/types/nerd-stats.ts
export interface NerdStatsData {
  totalTokens: number;
  totalCost: number;
  totalCalls: number;
  model?: string;
  provider?: string;
  phases: {
    [phaseName: string]: {
      tokens: number;
      cost: number;
      calls: number;
      startTime: number;
      endTime?: number;
      isActive: boolean;
    };
  };
}
```
This structure captures both global totals and per-phase breakdowns, which is crucial for understanding where time and resources are being spent.
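To make the shape concrete, here is an invented mid-run snapshot with one finished phase and one active phase, plus a small duration helper (the interface is repeated only to keep the snippet self-contained):

```typescript
// Interface repeated from above so the snippet stands alone.
interface NerdStatsData {
  totalTokens: number;
  totalCost: number;
  totalCalls: number;
  model?: string;
  provider?: string;
  phases: {
    [phaseName: string]: {
      tokens: number; cost: number; calls: number;
      startTime: number; endTime?: number; isActive: boolean;
    };
  };
}

// Illustrative snapshot mid-run (all values invented for the example).
const snapshot: NerdStatsData = {
  totalTokens: 12400,
  totalCost: 0.0312,
  totalCalls: 5,
  phases: {
    // "analyze" finished; "generateFix" is still running (no endTime yet).
    analyze: { tokens: 8200, cost: 0.021, calls: 3, startTime: 1_000, endTime: 5_500, isActive: false },
    generateFix: { tokens: 4200, cost: 0.0102, calls: 2, startTime: 5_500, isActive: true },
  },
};

// Finished phases use endTime; active ones fall back to the current clock.
function phaseDurationMs(p: NerdStatsData["phases"][string], now = Date.now()): number {
  return (p.endTime ?? now) - p.startTime;
}
```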
Pipeline Integration
The real magic happens in the pipeline orchestrators. We added metric collection at every step without disrupting the existing flow:
```typescript
// Simplified example from the AutoFix pipeline
class AutoFixPipeline {
  private nerdStats: NerdStatsData = {
    totalTokens: 0,
    totalCost: 0,
    totalCalls: 0,
    phases: {}
  };

  private markPhaseStart(phaseName: string) {
    this.nerdStats.phases[phaseName] = {
      tokens: 0, cost: 0, calls: 0,
      startTime: Date.now(),
      isActive: true
    };
  }

  private markPhaseEnd(phaseName: string) {
    const phase = this.nerdStats.phases[phaseName];
    phase.endTime = Date.now();
    phase.isActive = false;
  }

  private accumulateNerd(phaseName: string, result: LLMCompletionResult) {
    const phase = this.nerdStats.phases[phaseName];
    phase.tokens += result.tokenUsage || 0;
    phase.cost += result.costEstimate || 0;
    phase.calls += 1;

    // Update totals
    this.nerdStats.totalTokens += result.tokenUsage || 0;
    this.nerdStats.totalCost += result.costEstimate || 0;
    this.nerdStats.totalCalls += 1;
  }
}
```
Real-time Updates via Server-Sent Events
The beauty of this system is that it works with our existing Server-Sent Events (SSE) infrastructure. Every event we emit now includes the current nerdStats, so the UI updates in real-time as the pipeline progresses.
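As a rough sketch, each pipeline event only needs to be serialized into the standard SSE wire format with the current `nerdStats` snapshot attached. The event and field names below are assumptions, not our exact wire format:

```typescript
// Sketch: serialize a pipeline event as an SSE frame carrying nerdStats.
// Field names are illustrative.
interface PipelineEvent {
  type: string;        // e.g. "phase-progress" (hypothetical event name)
  phase?: string;
  nerdStats: unknown;  // current NerdStatsData snapshot
}

function toSseFrame(event: PipelineEvent): string {
  // SSE frames are "event:" / "data:" lines terminated by a blank line.
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}
```

On the client, a standard `EventSource` listener can parse `event.data` with `JSON.parse` and feed the embedded `nerdStats` straight into component state.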
Smart Persistence
Here's a neat trick: instead of adding new database columns, we store the final nerdStats inside the existing JSON stats column. This means:
- No database migrations required
- Backward compatibility with existing runs
- Easy to extend with new metrics later
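The persistence step reduces to a JSON merge: read whatever is already in the stats column, attach the final `nerdStats` under its own key, and write the result back. A minimal sketch, assuming the column stores a JSON string (column and key names are assumptions):

```typescript
// Sketch: fold the final nerdStats into the existing JSON `stats` column
// instead of adding new database columns.
function mergeNerdStatsIntoStats(existingStatsJson: string | null, nerdStats: object): string {
  // Older rows may have a null/empty stats column; treat that as an empty object.
  const stats = existingStatsJson ? JSON.parse(existingStatsJson) : {};
  return JSON.stringify({ ...stats, nerdStats });
}
```

Because the spread preserves every existing key, older fields in the stats column survive untouched, which is what makes the approach backward compatible.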
The UI Component
The NerdStats component is designed for both quick glances and deep dives:
```typescript
import { useState } from "react";

export function NerdStats({ data, className }: NerdStatsProps) {
  const [isExpanded, setIsExpanded] = useState(false);

  if (!data) return null;

  return (
    <Card className={className}>
      <CardHeader
        className="cursor-pointer"
        onClick={() => setIsExpanded(!isExpanded)}
      >
        {/* Collapsed view: essential metrics */}
        <div className="text-sm text-muted-foreground">
          {formatNumber(data.totalTokens)} tok ·{" "}
          ${data.totalCost.toFixed(4)} ·{" "}
          {data.totalCalls} calls
        </div>
      </CardHeader>
      {isExpanded && (
        <CardContent>
          {/* Expanded view: detailed grid + per-phase table */}
          <MetricsGrid data={data} />
          <PhaseBreakdown phases={data.phases} />
        </CardContent>
      )}
    </Card>
  );
}
```
The per-phase table even shows pulsing indicators for active phases, giving users a clear sense of progress.
Lessons Learned
1. Leverage Existing Infrastructure
The biggest win was realizing we didn't need to build new real-time infrastructure. Our existing SSE system handled metric updates perfectly once we added nerdStats to our event payloads.
2. Design for Extensibility
By using a flexible JSON structure for metrics storage, we can easily add new metrics (memory usage, API latency, etc.) without database changes.
3. Progressive Disclosure Works
The collapsed/expanded pattern means power users get the detail they want without overwhelming casual users. The compact view shows just enough to be useful.
4. Backward Compatibility Matters
Older pipeline runs don't have nerdStats, so we gracefully handle missing data rather than showing broken UI.
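In practice that graceful handling is a small normalization step before rendering. A sketch of the idea, with assumed field names (the real reader lives elsewhere in our codebase):

```typescript
// Sketch: older runs persisted no nerdStats, so normalize before rendering.
// Returning null lets the NerdStats component render nothing instead of breaking.
interface StoredStats {
  nerdStats?: { totalTokens: number; totalCost: number; totalCalls: number };
}

function readNerdStats(statsJson: string | null) {
  if (!statsJson) return null;
  try {
    const parsed: StoredStats = JSON.parse(statsJson);
    return parsed.nerdStats ?? null;
  } catch {
    return null; // malformed legacy payloads also degrade gracefully
  }
}
```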
The Impact
Since launching this feature, we've seen several benefits:
- Debugging is faster: When a pipeline seems slow, we can immediately see which phase is the bottleneck
- Cost awareness: Developers can see the real cost of their operations and make informed decisions
- Performance optimization: We've identified several opportunities to reduce token usage based on the metrics
- User confidence: Seeing real-time progress reduces anxiety during long-running operations
What's Next
This pattern worked so well that we're planning to extend it to our Code Analysis pipeline. The same architecture should work with minimal changes—capture metrics from provider.complete() calls and accumulate them in the orchestrator.
We're also considering adding more advanced metrics like:
- API response times
- Memory usage per phase
- Cache hit rates
- Quality scores for generated fixes
Try It Yourself
If you're building AI-powered developer tools, consider adding similar visibility. The key principles are:
- Capture metrics at the source (where you call the LLM APIs)
- Accumulate progressively throughout your pipeline
- Use existing real-time infrastructure when possible
- Design for both quick glances and deep dives
- Store metrics for historical analysis
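The first principle, capturing at the source, can be as simple as wrapping your LLM client so every call accumulates usage before the result reaches the rest of the pipeline. A minimal sketch, with an assumed provider interface:

```typescript
// Sketch of "capture metrics at the source": decorate the LLM client so
// every completion call updates running totals. Interfaces are illustrative.
interface CompletionResult {
  text: string;
  tokenUsage?: number;
  costEstimate?: number;
}

interface Provider {
  complete(prompt: string): Promise<CompletionResult>;
}

function withMetrics(
  provider: Provider,
  totals: { tokens: number; cost: number; calls: number }
): Provider {
  return {
    async complete(prompt: string) {
      const result = await provider.complete(prompt);
      // Accumulate before returning; missing fields count as zero.
      totals.tokens += result.tokenUsage ?? 0;
      totals.cost += result.costEstimate ?? 0;
      totals.calls += 1;
      return result;
    },
  };
}
```

Because the wrapper exposes the same `Provider` interface, the rest of the pipeline needs no changes; metrics collection stays invisible to callers.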
Your users will thank you for the transparency, and you'll gain invaluable insights into how your AI systems actually perform in production.
Want to see this in action? Check out our AutoFix pipeline where this system is running in production.