Peeking Behind the AI Curtain: Bringing Live LLM Stats to Our Dev Pipelines
We're adding real-time token usage, cost estimates, and even environmental impact metrics directly into our AutoFix and Refactor pipeline detail pages, offering unprecedented transparency into our AI operations.
As developers building AI-powered tools, we often interact with large language models (LLMs) as powerful black boxes. They take an input, process it, and deliver an output. But what's happening inside that box? How many tokens are being consumed? What's the estimated cost of a specific operation? And for the truly curious among us, what's the energy footprint or even the time saved by a particular model run?
Today, I'm excited to share our journey into unveiling these "stats for nerds" directly within our AutoFix and Refactor pipeline run detail pages. Our goal is to provide live, streaming insights into LLM usage as our AI pipelines actively process code.
The Quest for Transparency: Why Live Stats?
Imagine you're running an AutoFix pipeline on a large codebase. It's churning through issues, generating fixes, and interacting with LLMs behind the scenes. Currently, you see progress updates, but the underlying resource consumption remains a mystery until the very end, if at all. We want to change that.
Our vision is to offer real-time feedback:
- Token Usage: See prompt, completion, and total tokens accumulate live.
- Model Info: Know exactly which model is being used for each phase.
- Cost Estimate: Get a running estimate of the dollar cost.
- Environmental Impact: Track estimated energy consumption (Wh).
- Time Saved: Potentially even calculate the human time saved by the AI.
This level of transparency isn't just for curiosity's sake. It's crucial for understanding performance, optimizing costs, debugging unexpected behavior, and ultimately, building more efficient and responsible AI-powered tools.
Diving Deep: Where the Data Lives (and Where it Doesn't)
Our first step was an archaeological dig into our existing codebase. We needed to understand what data was already being captured and where the gaps were.
Good news first! Our core LLM provider layer, specifically the `LLMCompletionResult` type in `src/server/services/llm/types.ts`, already captures a wealth of information post-completion:
```ts
interface LLMCompletionResult {
  tokenUsage: { prompt: number; completion: number; total: number };
  costEstimate: number;
  model: string;
  provider: string;
  // ... other fields
}
```
This was a huge win, confirming that the raw data we needed was indeed being generated.
The challenge, however, lay in how this data flowed (or didn't flow) through our pipelines:
- AutoFix Pipelines: Both `issue-detector.ts` and `fix-generator.ts` correctly call `provider.complete()` and receive this token data. However, neither of them was sending this information onward via Server-Sent Events (SSE), so the frontend had no way to display it live.
- Refactor Pipelines: `improvement-generator.ts` does save `tokenUsage`, `costEstimate`, and `model` to the `RefactorItem` database record. But, again, this data wasn't being pushed out through SSE events during active streaming. Worse, `opportunity-detector.ts` wasn't capturing token data at all.
- Code Analysis: `pattern-detector.ts` does accumulate `totalTokens` and `totalCost`, but only sends these in a `stats` event right at `phase_complete`. We want live updates! (A sketch of the kind of per-call stats event we have in mind follows this list.)
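To make that concrete, here's a rough sketch of what yielding per-call stats could look like inside one of the AutoFix phases. The event name (`llm_stats`), the phase labels, and the `LLMProvider` interface with its `complete()` signature are assumptions for illustration; only the `LLMCompletionResult` fields come from our actual types.

```ts
// Sketch only: event name, phase label, and provider interface are hypothetical.
interface LLMCompletionResult {
  tokenUsage: { prompt: number; completion: number; total: number };
  costEstimate: number;
  model: string;
  provider: string;
}

interface LLMProvider {
  // assumed signature; the real provider.complete() may take richer options
  complete(prompt: string): Promise<LLMCompletionResult>;
}

interface LlmStatsEvent {
  type: "llm_stats";      // hypothetical event name
  phase: string;          // e.g. "issue-detection", "fix-generation"
  tokenUsage: LLMCompletionResult["tokenUsage"];
  costEstimate: number;
  model: string;
  provider: string;
}

// A phase generator yields the stats as soon as the completion returns,
// so the run detail page can render them while the pipeline keeps working.
async function* detectIssues(prompt: string, provider: LLMProvider) {
  const result = await provider.complete(prompt);

  const stats: LlmStatsEvent = {
    type: "llm_stats",
    phase: "issue-detection",
    tokenUsage: result.tokenUsage,
    costEstimate: result.costEstimate,
    model: result.model,
    provider: result.provider,
  };
  yield stats;

  // ...then yield the actual issue-detection output as usual...
}
```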
A pleasant discovery was `src/lib/workflow-metrics.ts`. This utility already contains logic for calculating energy consumption (Wh) and estimated time saved per model family. This means we can integrate these "bonus" stats with minimal effort once we have the core token data.
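We haven't reproduced `workflow-metrics.ts` here, and its real function names and constants may differ, but the kind of calculation it enables looks roughly like this; the per-model energy factors and the human-speed assumption below are illustrative numbers, not ours:

```ts
// Illustrative only: the real implementation lives in src/lib/workflow-metrics.ts
// and its function names, model families, and constants may differ.
const WH_PER_1K_TOKENS: Record<string, number> = {
  // hypothetical per-model-family energy factors (watt-hours per 1,000 tokens)
  "gpt-4o": 0.3,
  "claude-sonnet": 0.25,
  default: 0.3,
};

export function estimateEnergyWh(totalTokens: number, modelFamily: string): number {
  const factor = WH_PER_1K_TOKENS[modelFamily] ?? WH_PER_1K_TOKENS.default;
  return (totalTokens / 1000) * factor;
}

export function estimateTimeSavedMinutes(completionTokens: number): number {
  // assume a human produces roughly 50 tokens' worth of reviewed code per minute
  const HUMAN_TOKENS_PER_MINUTE = 50;
  return completionTokens / HUMAN_TOKENS_PER_MINUTE;
}
```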
Lessons Learned from the Trenches
No development session is complete without a few bumps in the road. Here are a couple of insights from our recent exploration:
- Prisma's `db execute` vs. Data Retrieval
  - Problem: We tried using `npx prisma db execute --stdin` with a `SELECT` query to quickly inspect some database records.
  - Outcome: It returned no output.
  - Lesson: `prisma db execute` is primarily for executing raw SQL statements that modify the database (like `INSERT`, `UPDATE`, `DELETE`, `CREATE TABLE`) or for schema-level commands. It's not designed to return query results directly to `stdout`.
  - Workaround: For quick data inspection, the most reliable method is a simple TypeScript script with Prisma Client, executed via `npx tsx -e "..."`. This gives you full programmatic control and proper output. (See the inspection sketch after this list.)
- Navigating Next.js `--turbopack`
  - Problem: We attempted to run our Next.js dev server with `--turbopack` (a faster, Rust-based successor to Webpack) for quicker startup.
  - Outcome: `error: unknown option '--turbopack'`.
  - Lesson: While Turbopack is incredibly promising, the `--turbopack` flag simply isn't recognized by our Next.js 14.2.35 setup: Next.js 14 exposes Turbopack via `--turbo`, and the `--turbopack` flag only arrived with Next.js 15. Leaning on experimental tooling for core development can lead to surprises like this.
  - Workaround: Sticking to the standard `npm run dev` (which uses `next dev`) or our custom `./scripts/dev-start.sh` ensures stability and compatibility with our current setup.
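For reference, the Prisma inspection workaround from the first lesson can be as small as the snippet below. The `refactorItem` accessor simply follows Prisma's usual camelCase convention for the `RefactorItem` model; adjust the model and query to your schema, and depending on your module setup you may need `import` instead of `require` inside the eval string:

```bash
npx tsx -e '
const { PrismaClient } = require("@prisma/client");
const prisma = new PrismaClient();

// Grab a handful of RefactorItem rows and print them as JSON.
prisma.refactorItem.findMany({ take: 5 })
  .then((rows) => console.log(JSON.stringify(rows, null, 2)))
  .finally(() => prisma.$disconnect());
'
```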
The Road Ahead: Our Implementation Plan
With a clear understanding of the data landscape and potential pitfalls, we've mapped out our next steps to bring these live stats to life:
- Extend SSE Event Contracts: We'll update our `AutoFixEvent` and `RefactorEvent` types to include new fields like `tokenUsage`, `model`, `provider`, `costEstimate`, and `timing` (for phase-specific durations).
- Capture at the Source (AutoFix): Modify `issue-detector.ts` and `fix-generator.ts` to actively capture the `LLMCompletionResult` data and yield it as part of their respective SSE events.
- Capture at the Source (Refactor): Update `opportunity-detector.ts` to start capturing token data, and ensure `improvement-generator.ts` also yields its already-captured data via SSE.
- Pipeline Aggregation: Our `pipeline.ts` (for both AutoFix and Refactor) will be extended to accumulate running totals for tokens, cost, and timings across all phases, pushing these aggregated stats in intermediate SSE events (see the sketch after this list).
- Persist Final Stats: Extend the `AutoFixRun.stats` and `RefactorRun.stats` JSON fields in the database to store the final `{ totalTokens, totalCost, provider, model, phaseTimings }` upon completion.
- Build a Shared UI Component: Create a reusable `<LiveTokenStats>` React component that can display real-time updates for tokens, cost, model, energy (using `workflow-metrics.ts`), and time saved.
- Wire It Up: Integrate the new component into the AutoFix and Refactor detail pages, connecting it to the SSE stream.
- End-to-End Testing: Rigorously test with real AutoFix and Refactor runs to ensure accuracy and a smooth user experience.
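To ground the event-contract and aggregation steps, here's one possible shape for the per-call stats and the running totals `pipeline.ts` would maintain. None of this is the final contract; the field and type names are placeholders we'll refine as we extend `AutoFixEvent` and `RefactorEvent`:

```ts
// Placeholder types: the real AutoFixEvent/RefactorEvent extensions may differ.
interface LlmUsageStats {
  tokenUsage: { prompt: number; completion: number; total: number };
  costEstimate: number;
  model: string;
  provider: string;
  timing?: { phase: string; durationMs: number };
}

interface RunningTotals {
  totalTokens: number;
  totalCost: number;
  phaseTimings: Record<string, number>;
}

// Called whenever a phase reports usage; the updated totals are pushed in the
// next intermediate SSE event and, on completion, persisted into
// AutoFixRun.stats / RefactorRun.stats.
function accumulate(totals: RunningTotals, stats: LlmUsageStats): RunningTotals {
  return {
    totalTokens: totals.totalTokens + stats.tokenUsage.total,
    totalCost: totals.totalCost + stats.costEstimate,
    phaseTimings: stats.timing
      ? { ...totals.phaseTimings, [stats.timing.phase]: stats.timing.durationMs }
      : totals.phaseTimings,
  };
}
```

The `<LiveTokenStats>` component would then subscribe to the same SSE stream and render these totals alongside the energy and time-saved estimates from `workflow-metrics.ts`.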
This journey promises to make our AI pipelines far more transparent and insightful. We're excited to give developers a clearer window into the powerful LLM operations happening within our tools, empowering them with "stats for nerds" that truly matter. Stay tuned for updates as we build this out!