Peeking Behind the AI Curtain: Bringing Live LLM Stats to Our Dev Pipelines
We're adding real-time token usage, cost estimates, and even environmental impact metrics directly into our AutoFix and Refactor pipeline detail pages, offering unprecedented transparency into our AI operations.
As developers building AI-powered tools, we often interact with large language models (LLMs) as powerful black boxes. They take an input, process it, and deliver an output. But what's happening inside that box? How many tokens are being consumed? What's the estimated cost of a specific operation? And for the truly curious among us, what's the energy footprint or even the time saved by a particular model run?
Today, I'm excited to share our journey into unveiling these "stats for nerds" directly within our AutoFix and Refactor pipeline run detail pages. Our goal is to provide live, streaming insights into LLM usage as our AI pipelines actively process code.
The Quest for Transparency: Why Live Stats?
Imagine you're running an AutoFix pipeline on a large codebase. It's churning through issues, generating fixes, and interacting with LLMs behind the scenes. Currently, you see progress updates, but the underlying resource consumption remains a mystery until the very end, if at all. We want to change that.
Our vision is to offer real-time feedback:
- Token Usage: See prompt, completion, and total tokens accumulate live.
- Model Info: Know exactly which model is being used for each phase.
- Cost Estimate: Get a running estimate of the dollar cost.
- Environmental Impact: Track estimated energy consumption (Wh).
- Time Saved: Potentially even calculate the human time saved by the AI.
This level of transparency isn't just for curiosity's sake. It's crucial for understanding performance, optimizing costs, debugging unexpected behavior, and ultimately, building more efficient and responsible AI-powered tools.
Diving Deep: Where the Data Lives (and Where it Doesn't)
Our first step was an archaeological dig into our existing codebase. We needed to understand what data was already being captured and where the gaps were.
Good news first! Our core LLM provider layer, specifically the `LLMCompletionResult` type in `src/server/services/llm/types.ts`, already captures a wealth of information post-completion:
```ts
interface LLMCompletionResult {
  tokenUsage: { prompt: number; completion: number; total: number };
  costEstimate: number;
  model: string;
  provider: string;
  // ... other fields
}
```
This was a huge win, confirming that the raw data we needed was indeed being generated.
The challenge, however, lay in how this data flowed (or didn't flow) through our pipelines:
- AutoFix Pipelines: Both `issue-detector.ts` and `fix-generator.ts` correctly call `provider.complete()` and receive this token data. However, neither of them was sending this information onward via Server-Sent Events (SSE), so the frontend had no way to display it live.
- Refactor Pipelines: `improvement-generator.ts` does save `tokenUsage`, `costEstimate`, and `model` to the `RefactorItem` database record. But, again, this data wasn't being pushed out through SSE events during active streaming. Worse, `opportunity-detector.ts` wasn't capturing token data at all.
- Code Analysis: `pattern-detector.ts` does accumulate `totalTokens` and `totalCost`, but only sends these in a `stats` event right at `phase_complete`. We want live updates! (A sketch of the kind of per-call stats event we have in mind follows this list.)
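To make that concrete, here's a rough sketch of what yielding per-call stats could look like inside one of the AutoFix phases. The event name (`llm_stats`), the phase labels, and the `LLMProvider` interface with its `complete()` signature are assumptions for illustration; only the `LLMCompletionResult` fields come from our actual types.

```ts
// Sketch only: event name, phase label, and provider interface are hypothetical.
interface LLMCompletionResult {
  tokenUsage: { prompt: number; completion: number; total: number };
  costEstimate: number;
  model: string;
  provider: string;
}

interface LLMProvider {
  // assumed signature; the real provider.complete() may take richer options
  complete(prompt: string): Promise<LLMCompletionResult>;
}

interface LlmStatsEvent {
  type: "llm_stats";      // hypothetical event name
  phase: string;          // e.g. "issue-detection", "fix-generation"
  tokenUsage: LLMCompletionResult["tokenUsage"];
  costEstimate: number;
  model: string;
  provider: string;
}

// A phase generator yields the stats as soon as the completion returns,
// so the run detail page can render them while the pipeline keeps working.
async function* detectIssues(prompt: string, provider: LLMProvider) {
  const result = await provider.complete(prompt);

  const stats: LlmStatsEvent = {
    type: "llm_stats",
    phase: "issue-detection",
    tokenUsage: result.tokenUsage,
    costEstimate: result.costEstimate,
    model: result.model,
    provider: result.provider,
  };
  yield stats;

  // ...then yield the actual issue-detection output as usual...
}
```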
A pleasant discovery was `src/lib/workflow-metrics.ts`. This utility already contains logic for calculating energy consumption (Wh) and estimated time saved per model family. This means we can integrate these "bonus" stats with minimal effort once we have the core token data.
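We haven't reproduced `workflow-metrics.ts` here, and its real function names and constants may differ, but the kind of calculation it enables looks roughly like this; the per-model energy factors and the human-speed assumption below are illustrative numbers, not ours:

```ts
// Illustrative only: the real implementation lives in src/lib/workflow-metrics.ts
// and its function names, model families, and constants may differ.
const WH_PER_1K_TOKENS: Record<string, number> = {
  // hypothetical per-model-family energy factors (watt-hours per 1,000 tokens)
  "gpt-4o": 0.3,
  "claude-sonnet": 0.25,
  default: 0.3,
};

export function estimateEnergyWh(totalTokens: number, modelFamily: string): number {
  const factor = WH_PER_1K_TOKENS[modelFamily] ?? WH_PER_1K_TOKENS.default;
  return (totalTokens / 1000) * factor;
}

export function estimateTimeSavedMinutes(completionTokens: number): number {
  // assume a human produces roughly 50 tokens' worth of reviewed code per minute
  const HUMAN_TOKENS_PER_MINUTE = 50;
  return completionTokens / HUMAN_TOKENS_PER_MINUTE;
}
```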
Lessons Learned from the Trenches
No development session is complete without a few bumps in the road. Here are a couple of insights from our recent exploration:
- Prisma's `db execute` vs. Data Retrieval
  - Problem: We tried using `npx prisma db execute --stdin` with a `SELECT` query to quickly inspect some database records.
  - Outcome: It returned no output.
  - Lesson: `prisma db execute` is primarily for executing raw SQL statements that modify the database (like `INSERT`, `UPDATE`, `DELETE`, `CREATE TABLE`) or for schema-level commands. It's not designed to return query results directly to `stdout`.
  - Workaround: For quick data inspection, the most reliable method is a simple TypeScript script with Prisma Client, executed via `npx tsx -e "..."`. This gives you full programmatic control and proper output. (See the inspection sketch after this list.)
- Navigating Next.js `--turbopack`
  - Problem: We attempted to run our Next.js dev server with `--turbopack` (a faster, Rust-based successor to Webpack) for quicker startup.
  - Outcome: `error: unknown option '--turbopack'`.
  - Lesson: While Turbopack is incredibly promising, the `--turbopack` flag simply isn't recognized by our Next.js 14.2.35 setup: Next.js 14 exposes Turbopack via `--turbo`, and the `--turbopack` flag only arrived with Next.js 15. Leaning on experimental tooling for core development can lead to surprises like this.
  - Workaround: Sticking to the standard `npm run dev` (which uses `next dev`) or our custom `./scripts/dev-start.sh` ensures stability and compatibility with our current setup.
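For reference, the Prisma inspection workaround from the first lesson can be as small as the snippet below. The `refactorItem` accessor simply follows Prisma's usual camelCase convention for the `RefactorItem` model; adjust the model and query to your schema, and depending on your module setup you may need `import` instead of `require` inside the eval string:

```bash
npx tsx -e '
const { PrismaClient } = require("@prisma/client");
const prisma = new PrismaClient();

// Grab a handful of RefactorItem rows and print them as JSON.
prisma.refactorItem.findMany({ take: 5 })
  .then((rows) => console.log(JSON.stringify(rows, null, 2)))
  .finally(() => prisma.$disconnect());
'
```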
The Road Ahead: Our Implementation Plan
With a clear understanding of the data landscape and potential pitfalls, we've mapped out our next steps to bring these live stats to life:
- Extend SSE Event Contracts: We'll update our `AutoFixEvent` and `RefactorEvent` types to include new fields like `tokenUsage`, `model`, `provider`, `costEstimate`, and `timing` (for phase-specific durations).
- Capture at the Source (AutoFix): Modify `issue-detector.ts` and `fix-generator.ts` to actively capture the `LLMCompletionResult` data and yield it as part of their respective SSE events.
- Capture at the Source (Refactor): Update `opportunity-detector.ts` to start capturing token data, and ensure `improvement-generator.ts` also yields its already-captured data via SSE.
- Pipeline Aggregation: Our `pipeline.ts` (for both AutoFix and Refactor) will be extended to accumulate running totals for tokens, cost, and timings across all phases, pushing these aggregated stats in intermediate SSE events (see the sketch after this list).
- Persist Final Stats: Extend the `AutoFixRun.stats` and `RefactorRun.stats` JSON fields in the database to store the final `{ totalTokens, totalCost, provider, model, phaseTimings }` upon completion.
- Build a Shared UI Component: Create a reusable `<LiveTokenStats>` React component that can display real-time updates for tokens, cost, model, energy (using `workflow-metrics.ts`), and time saved.
- Wire It Up: Integrate the new component into the AutoFix and Refactor detail pages, connecting it to the SSE stream.
- End-to-End Testing: Rigorously test with real AutoFix and Refactor runs to ensure accuracy and a smooth user experience.
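To ground the event-contract and aggregation steps, here's one possible shape for the per-call stats and the running totals `pipeline.ts` would maintain. None of this is the final contract; the field and type names are placeholders we'll refine as we extend `AutoFixEvent` and `RefactorEvent`:

```ts
// Placeholder types: the real AutoFixEvent/RefactorEvent extensions may differ.
interface LlmUsageStats {
  tokenUsage: { prompt: number; completion: number; total: number };
  costEstimate: number;
  model: string;
  provider: string;
  timing?: { phase: string; durationMs: number };
}

interface RunningTotals {
  totalTokens: number;
  totalCost: number;
  phaseTimings: Record<string, number>;
}

// Called whenever a phase reports usage; the updated totals are pushed in the
// next intermediate SSE event and, on completion, persisted into
// AutoFixRun.stats / RefactorRun.stats.
function accumulate(totals: RunningTotals, stats: LlmUsageStats): RunningTotals {
  return {
    totalTokens: totals.totalTokens + stats.tokenUsage.total,
    totalCost: totals.totalCost + stats.costEstimate,
    phaseTimings: stats.timing
      ? { ...totals.phaseTimings, [stats.timing.phase]: stats.timing.durationMs }
      : totals.phaseTimings,
  };
}
```

The `<LiveTokenStats>` component would then subscribe to the same SSE stream and render these totals alongside the energy and time-saved estimates from `workflow-metrics.ts`.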
This journey promises to make our AI pipelines far more transparent and insightful. We're excited to give developers a clearer window into the powerful LLM operations happening within our tools, empowering them with "stats for nerds" that truly matter. Stay tuned for updates as we build this out!