Unveiling the AI's Inner Workings: Introducing 'Stats for Nerds'
We've just rolled out a new 'Stats for Nerds' panel, providing live, detailed insights into our AI-driven AutoFix and Refactor pipelines, covering token usage, cost, energy, and per-phase timing.
As developers, we're naturally curious creatures. We love to peek under the hood, understand the mechanics, and optimize the black boxes we interact with daily. When those black boxes are powerful AI models driving critical development workflows like AutoFix and Refactor, that curiosity turns into a need for transparency and insight.
That's why I'm thrilled to share a recent win: the deployment of our new "Stats for Nerds" panel! This feature provides real-time, granular metrics for our AI-powered pipelines, giving you an unprecedented look into how our systems work, what they're consuming, and how efficiently they're performing.
The Goal: Illuminating the AI Black Box
Our primary objective was clear: empower developers with live, actionable data during AutoFix and Refactor pipeline runs. We wanted to answer questions like:
- How many tokens did the LLM consume for this task?
- What was the estimated cost of that operation?
- Which model was used?
- How much energy did it take?
- And critically, how much time did each phase of the pipeline take?
This isn't just about satisfying curiosity; it's about providing the data needed for informed decisions, cost optimization, and performance analysis.
Building the Transparency Layer
Bringing the "Stats for Nerds" panel to life spanned the whole stack, from data capture on the backend to a dynamic, interactive UI on the frontend.
1. Defining the Data Contract
First, we needed a consistent way to represent this new wealth of information. We introduced a NerdStatsData interface in src/types/nerd-stats.ts. This interface encapsulates global totals (tokens, cost, calls) and a detailed per-phase breakdown, including timing information. This shared type ensures consistency as data flows from the backend to the frontend.
We then extended our existing AutoFixEvent and RefactorEvent types to include an optional nerdStats field. This allowed us to enrich our event stream without breaking existing consumers.
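To make the data contract concrete, here is a minimal sketch of what these types might look like. Only the broad shape (global totals for tokens, cost, and calls, plus a per-phase breakdown with timing, and an optional `nerdStats` field on events) comes from the description above; the exact field names are assumptions, not the actual interface.

```typescript
// Hypothetical sketch of src/types/nerd-stats.ts. Field names beyond the
// totals and per-phase timing described in the post are assumptions.
export interface NerdPhaseStats {
  tokens: number;
  cost: number;
  calls: number;
  startedAt?: number;   // epoch ms, set when the phase begins
  completedAt?: number; // epoch ms, set when the phase finishes
}

export interface NerdStatsData {
  totalTokens: number;
  totalCost: number;
  totalCalls: number;
  model?: string;
  provider?: string;
  phases: Record<string, NerdPhaseStats>;
}

// Events gain an optional field, so existing consumers keep working:
export interface AutoFixEvent {
  type: string;
  nerdStats?: NerdStatsData; // optional => non-breaking for old consumers
}
```

Because `nerdStats` is optional, older event consumers that never read it continue to type-check and run unchanged.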
2. Sourcing the Metrics
The heart of our metrics collection lies where the LLM interactions happen. We modified four key service files—issue-detector.ts, fix-generator.ts (for AutoFix), opportunity-detector.ts, and improvement-generator.ts (for Refactor)—to capture tokenUsage, costEstimate, model, and provider directly from the LLMCompletionResult on relevant events like batch_complete and fix_generated. This ensures we're gathering the most accurate data right at the source.
We also made a small but crucial change by exporting computeEnergy() from src/lib/workflow-metrics.ts, making this utility function available for our frontend UI to calculate estimated energy consumption based on token usage.
3. Orchestrating the Data Flow
Our pipeline orchestrators (auto-fix/pipeline.ts and refactor/pipeline.ts) became central to accumulating and managing the NerdStatsData. We introduced helper functions like accumulateNerd(), markPhaseStart(), and markPhaseComplete() to precisely track metrics and timings across different stages of the pipeline.
A neat trick here involved wrapping every yield operation with a withNerd() helper. This function attaches a structuredClone() of the current nerdStats accumulator to the yielded event. This ensures that each event emitted during a live pipeline run carries a snapshot of the metrics up to that point, enabling our live UI updates.
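The accumulator-plus-snapshot pattern can be sketched as follows; the helper signatures are assumptions based on the names above, but the `structuredClone()` snapshot is the key move. Without it, every previously yielded event would share a reference to the mutable accumulator and appear to change retroactively.

```typescript
type NerdStats = { totalTokens: number; totalCost: number; totalCalls: number };

// Mutable accumulator, scoped to one pipeline run.
const nerdStats: NerdStats = { totalTokens: 0, totalCost: 0, totalCalls: 0 };

function accumulateNerd(tokens: number, cost: number): void {
  nerdStats.totalTokens += tokens;
  nerdStats.totalCost += cost;
  nerdStats.totalCalls += 1;
}

// Attach a deep snapshot of the accumulator to each yielded event, so later
// mutations of nerdStats cannot alter an already-emitted event.
function withNerd<E extends object>(event: E): E & { nerdStats: NerdStats } {
  return { ...event, nerdStats: structuredClone(nerdStats) };
}
```

In the orchestrator, every `yield someEvent` becomes `yield withNerd(someEvent)`, so the live UI always sees metrics current as of that event.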
Finally, upon pipeline completion, the accumulated nerdStats are persisted within the existing stats JSON column in our Prisma database. This was a significant win, as it meant no database schema migrations were required, simplifying deployment!
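The persistence step hinges on a small type-bridging cast, sketched below. The `InputJsonValue` alias is a self-contained stand-in for Prisma's type, and the commented-out update call is illustrative; only the `as unknown as` cast pattern comes from the actual implementation.

```typescript
// Minimal stand-in for Prisma's InputJsonValue, so this sketch runs alone.
type InputJsonValue =
  | string
  | number
  | boolean
  | null
  | InputJsonValue[]
  | { [key: string]: InputJsonValue };

// Re-declared here (abbreviated) so the sketch stands alone.
interface NerdStatsData {
  totalTokens: number;
  totalCost: number;
  totalCalls: number;
}

// The double cast bridges our rich interface and Prisma's JSON input type,
// letting nerdStats ride in the existing Json? column with no migration.
function toStatsJson(nerdStats: NerdStatsData): InputJsonValue {
  return nerdStats as unknown as InputJsonValue;
}

// In the real orchestrator this would be used roughly like (hypothetical
// model and field names):
// await prisma.run.update({ where: { id }, data: { stats: toStatsJson(nerdStats) } });
```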
4. Crafting the User Experience
The frontend component, src/components/shared/nerd-stats.tsx, is where all this data comes to life. We designed it as a collapsible card, offering two views:
- Collapsed View: A concise summary showing 12.4k tok · $0.0312 · 5 calls, providing a quick glance at the run's resource consumption.
- Expanded View: A detailed six-cell grid displaying tokens, cost, calls, energy, time saved, and the model used. Below this, a per-phase table breaks down timings and metrics for each stage of the pipeline, complete with a pulsing indicator that highlights the currently active phase during live runs.
This component cleverly uses computeEnergy() and formatting helpers like formatEnergy() and formatTimeSaved() (also from workflow-metrics) to present the data in an easily digestible format.
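A hypothetical sketch of the formatting that produces the collapsed summary shown above; the real helpers live in workflow-metrics and may round or separate differently.

```typescript
// Abbreviate large token counts, e.g. 12400 -> "12.4k".
function formatTokens(n: number): string {
  return n >= 1000 ? `${(n / 1000).toFixed(1)}k` : String(n);
}

// Compose the one-line collapsed summary: "12.4k tok · $0.0312 · 5 calls".
function collapsedSummary(tokens: number, cost: number, calls: number): string {
  return `${formatTokens(tokens)} tok · $${cost.toFixed(4)} · ${calls} calls`;
}
```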
Finally, we wired this component into both the AutoFix and Refactor detail pages. The UI intelligently subscribes to Server-Sent Events (SSE) for liveNerdStats during active runs, providing real-time updates. For completed runs, it gracefully falls back to displaying the nerdStats retrieved from the run.stats field in the database.
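The live-versus-persisted selection logic can be sketched as a small pure function. The names (`liveNerdStats`, `run.stats`) follow the post; the `Run` shape and status values are assumptions.

```typescript
interface NerdStats { totalTokens: number }
interface Run {
  status: "running" | "completed";
  stats?: { nerdStats?: NerdStats };
}

// Prefer live SSE data while a run is active; otherwise fall back to the
// stats persisted in the database at completion time.
function selectNerdStats(run: Run, liveNerdStats?: NerdStats): NerdStats | undefined {
  if (run.status === "running" && liveNerdStats) return liveNerdStats;
  // Older runs may predate the feature and carry no nerdStats at all;
  // returning undefined lets the UI hide the panel gracefully.
  return run.stats?.nerdStats;
}
```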
Smooth Sailing and Smart Decisions
One of the most satisfying aspects of this feature's development was the relative lack of major roadblocks. This wasn't just luck; it was a testament to some thoughtful architectural decisions made previously:
- Existing SSE Infrastructure: Our Server-Sent Events routes were already designed to spread event data and add a timestamp. This meant our new nerdStats seamlessly passed through without requiring any modifications to the SSE routing layer itself.
- Flexible JSON Column: The stats column in our Prisma schema was already defined as Json?. This flexibility allowed us to persist the new nerdStats object directly within it, avoiding the need for a database migration. A simple as unknown as Prisma.InputJsonValue cast handled the TypeScript-Prisma type compatibility.
- Clean Implementation: The modular nature of our services and orchestrators allowed for a clean, contained implementation of the metrics capture and aggregation logic.
We did note a pre-existing, unrelated test failure in kimi.test.ts where the expected model kimi-k2-0711 differed from the provider's kimi-k2-0711-preview. This was acknowledged and isolated, confirming it wasn't a regression from our new work.
What's Next?
With the "Stats for Nerds" panel now live for AutoFix and Refactor, our immediate next steps involve thorough manual QA to ensure everything works as expected across various scenarios (live updates, completed runs, and graceful handling of older runs without nerdStats).
Looking ahead, the pattern we've established for capturing and displaying these metrics is highly reusable. We're already considering extending this valuable transparency to our Code Analysis pipeline, bringing the same level of insight to even more of our AI-driven developer tools.
We believe this new feature will not only satisfy your inner nerd but also provide invaluable data for understanding, optimizing, and building even better AI-powered development experiences. Dive in, explore the numbers, and let us know what you discover!