nyxcore-systems
6 min read

Peeking Behind the Curtain: Building Live 'Nerd Stats' for Our LLM Pipelines

We just shipped a new 'Stats for Nerds' panel, offering real-time token usage, cost, and per-phase timing for our AI-powered AutoFix and Refactor pipelines. Dive into how we built it, from backend data capture to live UI updates.

LLM · Observability · Frontend · Backend · TypeScript · Software Architecture · Developer Tools

As developers building tools powered by large language models, we often find ourselves staring into a bit of a black box. Our AI agents are doing incredible work, but understanding how they're performing, what they're costing, and where they're spending their time can be opaque. That's why we embarked on a mission: to pull back the curtain and expose the inner workings of our AutoFix and Refactor pipelines with a new, live "Stats for Nerds" panel.

This past session, we pushed a significant update (fa91f2b) that brings real-time token usage, estimated cost, model details, energy consumption, and granular per-phase timing directly to our users. It's about demystifying the magic and empowering developers with the data they need to understand and optimize their AI-driven workflows.

The "Why": Beyond the Black Box

Our AutoFix and Refactor pipelines are complex, multi-stage processes that interact with LLMs multiple times. Before this update, we could see the final outcome, but the journey was largely hidden. We needed answers to questions like:

  • Cost Efficiency: How much is a particular fix or refactor really costing in terms of tokens and dollars?
  • Performance Bottlenecks: Which phase of the pipeline is taking the longest? Is it the initial issue detection, or the fix generation itself?
  • Model Performance: Which LLM is performing well for specific tasks? Are we using the right model for the job?
  • Resource Consumption: Can we estimate the energy footprint of these operations?

These insights aren't just for curiosity; they're critical for debugging, optimizing, and making informed architectural decisions as we scale.

Designing for Transparency: The Nerd Stats Approach

Our solution involved a full-stack effort, from defining new data structures to integrating with our existing streaming UI.

1. The Data Model: NerdStatsData

Everything started with a clear data structure. We introduced src/types/nerd-stats.ts with the NerdStatsData interface. This central interface captures global totals (tokens, cost, calls, energy, time saved, model) and a detailed breakdown for each phase of the pipeline, including its start and end times.

```typescript
// src/types/nerd-stats.ts (simplified)
export interface PhaseStats {
  phaseName: string;
  startTime: number; // epoch-ms timestamp
  endTime?: number;
  durationMs?: number;
  tokenUsage?: number;
  costEstimate?: number;
  model?: string;
  provider?: string;
  // ... other phase-specific metrics
}

export interface NerdStatsData {
  totalTokens: number;
  totalCost: number;
  totalCalls: number;
  totalEnergyJoules?: number;
  totalTimeSavedMs?: number;
  model?: string; // primary model used
  provider?: string;
  phases: PhaseStats[];
  // ... other global metrics
}
```

This NerdStatsData object then became an optional field (nerdStats?: NerdStatsData) on our AutoFixEvent and RefactorEvent types, ensuring it could flow through our event-driven system.
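To make that concrete, here is an illustrative sketch of how an optional `nerdStats` field rides along on an event union. The event shapes below are assumptions for illustration, not our production types, and `NerdStatsData` is re-declared in trimmed form so the snippet is self-contained:

```typescript
// Trimmed-down re-declaration so this snippet stands alone.
interface NerdStatsData {
  totalTokens: number;
  totalCost: number;
  totalCalls: number;
  phases: { phaseName: string; startTime: number }[];
}

// Hypothetical event union; the real AutoFixEvent has more variants.
type AutoFixEvent =
  | { type: "phase_start"; phase: string; nerdStats?: NerdStatsData }
  | { type: "fix_generated"; fix: string; nerdStats?: NerdStatsData };

// Existing consumers that ignore nerdStats keep working unchanged;
// new consumers read it when present.
const event: AutoFixEvent = {
  type: "phase_start",
  phase: "IssueDetection",
  nerdStats: { totalTokens: 0, totalCost: 0, totalCalls: 0, phases: [] },
};
```

Because the field is optional, no existing event producer or consumer had to change when it was introduced.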

2. Capturing Data at the Source: LLM Service Calls

The core challenge was gathering the raw metrics. Our LLM service calls (e.g., provider.complete()) already return an LLMCompletionResult, which contains tokenUsage, costEstimate, model, and provider. The key was to capture this information at the precise moment an LLM interaction occurred within a specific pipeline phase.

We modified four critical service files:

  • src/server/services/auto-fix/issue-detector.ts (on batch_complete)
  • src/server/services/auto-fix/fix-generator.ts (on fix_generated)
  • src/server/services/refactor/opportunity-detector.ts (on batch_complete)
  • src/server/services/refactor/improvement-generator.ts (on improvement_generated)

Each of these now extracts the LLM metrics and passes them back up to the pipeline orchestrator.
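The extraction step in each service is essentially a projection of the completion result down to the metric fields. A minimal sketch, assuming a helper named extractLLMMetrics() (the LLMCompletionResult fields come from the post; the helper name and the placeholder model/provider values are illustrative):

```typescript
// Fields named in the post; the full production interface has more.
interface LLMCompletionResult {
  text: string;
  tokenUsage: number;
  costEstimate: number;
  model: string;
  provider: string;
}

interface LLMMetrics {
  tokenUsage: number;
  costEstimate: number;
  model: string;
  provider: string;
}

// Project the completion result down to just the metrics the
// pipeline orchestrator accumulates.
function extractLLMMetrics(result: LLMCompletionResult): LLMMetrics {
  const { tokenUsage, costEstimate, model, provider } = result;
  return { tokenUsage, costEstimate, model, provider };
}

// Placeholder values for illustration only.
const metrics = extractLLMMetrics({
  text: "…generated fix…",
  tokenUsage: 1234,
  costEstimate: 0.0031,
  model: "example-model",
  provider: "example-provider",
});
```

Each service emits these metrics alongside its existing event payload (e.g., on batch_complete or fix_generated), so the orchestrator can fold them into the running totals.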

3. Orchestrating the Metrics: Our Generator Pipelines

Our AutoFix and Refactor pipelines are implemented as generator functions (pipeline.ts files). This pattern, where phases yield events, proved incredibly powerful for integrating our NerdStatsData accumulator.

We introduced helper functions like accumulateNerd(), markPhaseStart(), and markPhaseComplete() within the orchestrators. Crucially, every yield statement was wrapped with a withNerd() helper that attaches a structuredClone(nerdStats) to the yielded event. This ensures that each event flowing to the client (via Server-Sent Events, or SSE) carries a snapshot of the NerdStatsData at that specific point in the pipeline. This is what enables our live updates!

```typescript
// src/server/services/auto-fix/pipeline.ts (conceptual snippet)
async function* autoFixPipeline(params: AutoFixParams): AsyncGenerator<AutoFixEvent> {
  const nerdStats: NerdStatsData = initializeNerdStats();

  // markPhaseStart/markPhaseComplete record phase timing and return the event;
  // withNerd attaches a structuredClone(nerdStats) snapshot to each yielded event.
  yield withNerd(markPhaseStart(nerdStats, "IssueDetection"), nerdStats);
  const issueDetectionResult = await detectIssues(params.code);
  accumulateNerd(nerdStats, "IssueDetection", issueDetectionResult.llmMetrics);
  yield withNerd(markPhaseComplete(nerdStats, "IssueDetection", issueDetectionResult), nerdStats);

  yield withNerd(markPhaseStart(nerdStats, "FixGeneration"), nerdStats);
  const fixGenerationResult = await generateFix(issueDetectionResult.issues);
  accumulateNerd(nerdStats, "FixGeneration", fixGenerationResult.llmMetrics);
  yield withNerd(markPhaseComplete(nerdStats, "FixGeneration", fixGenerationResult), nerdStats);

  // ... more phases
}
```

4. Persisting the Data: A Happy Accident

One of the pleasant surprises was how smoothly the final NerdStatsData could be persisted. Our existing stats column in Prisma was already typed as Json?. This meant we could simply store the complete nerdStats object inside it upon pipeline completion, without requiring any schema migrations! A testament to flexible initial design.

```typescript
// Prisma update (conceptual)
await prisma.autoFixRun.update({
  where: { id: runId },
  data: {
    status: 'COMPLETED',
    stats: {
      ...existingStats,
      nerdStats: finalNerdStats as unknown as Prisma.InputJsonValue,
    },
  },
});
```

5. Bringing it to Life: The UI Component

The frontend needed a way to display this rich data. We created src/components/shared/nerd-stats.tsx, a collapsible card component designed for clarity and real-time updates.

  • Collapsed View: A concise summary: 12.4k tok · $0.0312 · 5 calls
  • Expanded View: A detailed 6-cell grid showing tokens, cost, calls, energy (computed using our computeEnergy() helper), estimated time saved, and the primary model used. Below this, a per-phase table breaks down metrics for each stage, complete with a pulsing indicator for the currently active phase.
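The post exports a computeEnergy() helper for the energy cell but does not show its formula, so the sketch below is purely illustrative: a flat joules-per-token estimate with an arbitrary constant, standing in for whatever model the real helper uses:

```typescript
// Assumption: an illustrative constant, NOT a measured figure and NOT the
// real computeEnergy() formula.
const JOULES_PER_TOKEN = 0.5;

// Hypothetical stand-in for the exported computeEnergy() helper.
function computeEnergy(totalTokens: number): number {
  return totalTokens * JOULES_PER_TOKEN;
}

console.log(computeEnergy(12_400)); // 6200
```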

This component is wired into both the AutoFix and Refactor detail pages. It intelligently uses liveNerdStats from our SSE events for active runs, gracefully falling back to run.stats.nerdStats (from the database) for completed runs.
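The live-vs-persisted selection and the collapsed summary string can be sketched like this. The field names and the example summary come from the post; the helper names and the Run shape here are assumptions:

```typescript
// Trimmed-down types so the sketch is self-contained.
interface NerdStatsData {
  totalTokens: number;
  totalCost: number;
  totalCalls: number;
}

interface Run {
  status: "RUNNING" | "COMPLETED";
  stats?: { nerdStats?: NerdStatsData };
}

// Prefer the live SSE snapshot for active runs; fall back to the persisted
// copy for completed runs. Old runs with no stored stats return undefined,
// so the panel can simply render nothing.
function selectNerdStats(run: Run, liveNerdStats?: NerdStatsData): NerdStatsData | undefined {
  if (run.status === "RUNNING" && liveNerdStats) return liveNerdStats;
  return run.stats?.nerdStats;
}

// Collapsed-view summary, e.g. "12.4k tok · $0.0312 · 5 calls".
function formatSummary(s: NerdStatsData): string {
  const tok = s.totalTokens >= 1000 ? `${(s.totalTokens / 1000).toFixed(1)}k` : `${s.totalTokens}`;
  return `${tok} tok · $${s.totalCost.toFixed(4)} · ${s.totalCalls} calls`;
}
```

Returning undefined for runs without stats is what keeps old database rows from breaking the UI: the component renders nothing rather than a half-empty grid.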

Key Wins & Lessons Learned

This feature was surprisingly smooth to implement, a testament to some good foundational architecture:

  • SSE Flexibility: Our existing SSE routes already spread { ...event, timestamp }, meaning the new nerdStats field flowed through automatically without any route changes. This was a huge win for rapid development.
  • Schema Flexibility: The Json? column in Prisma for stats was a lifesaver. It allowed us to embed a complex, evolving data structure without the overhead of database migrations. This highlights the value of using flexible types for semi-structured data where appropriate.
  • Generator Pattern Power: Using generators for our pipelines made it incredibly straightforward to inject accumulation logic and yield snapshots of our state at various points.

The only "hiccup" was an unrelated, pre-existing test failure in kimi.test.ts (model name mismatch), which we'll address separately.

What's Next?

With the core functionality in place, our immediate next steps involve thorough manual QA:

  1. Verify live updates for AutoFix runs.
  2. Verify live updates for Refactor runs.
  3. Confirm completed runs correctly display nerdStats from the database.
  4. Ensure old runs (without nerdStats in their DB entry) gracefully show nothing, preventing UI errors.

Looking ahead, we're already considering extending this "Nerd Stats" panel to our Code Analysis pipeline, following the same pattern of capturing metrics from provider.complete() calls and integrating them into its event stream.

This "Stats for Nerds" panel isn't just a new feature; it's a commitment to transparency and observability in our AI-powered development tools. We believe that by shedding light on the underlying processes, we can build more robust, efficient, and understandable systems for everyone.

```json
{
  "thingsDone": [
    "Defined shared `NerdStatsData` interface",
    "Added `nerdStats` field to `AutoFixEvent` and `RefactorEvent`",
    "Exported `computeEnergy()` for UI use",
    "Modified 4 service files to capture LLM metrics (`tokenUsage`, `costEstimate`, `model`, `provider`)",
    "Implemented `NerdStatsData` accumulation and phase timing in pipeline orchestrators",
    "Wrapped `yield` events with `structuredClone(nerdStats)` for live updates",
    "Persisted `nerdStats` in existing `stats` JSON column (no migration)",
    "Created `src/components/shared/nerd-stats.tsx` for collapsible, live UI display",
    "Wired `nerd-stats.tsx` into AutoFix and Refactor detail pages, using SSE for live data and DB for completed runs"
  ],
  "pains": [
    "No major issues encountered, implementation was clean."
  ],
  "successes": [
    "SSE routes automatically passed `nerdStats` without changes",
    "Prisma `Json?` column allowed `nerdStats` persistence without schema migration",
    "Effective use of generator pattern for pipeline state management and event streaming"
  ],
  "techStack": [
    "TypeScript",
    "Next.js",
    "Prisma",
    "LLM (Large Language Models)",
    "Server-Sent Events (SSE)",
    "Frontend Components (React)"
  ]
}
```