Unveiling the AI's Inner Workings: Bringing Live LLM Observability to Our Pipelines
Ever wished you could see exactly what your AI is 'thinking' and *costing* in real-time? This post details our journey to add live token usage, model info, and 'stats for nerds' to our AutoFix and Refactor pipelines.
As developers building AI-powered tools, we often find ourselves wrestling with black boxes. Our applications leverage powerful Large Language Models (LLMs) to perform complex tasks like automated code fixes or intelligent refactoring, but the actual "work" happening inside those models remains opaque. How many tokens were used? Which specific model handled the request? What was the estimated cost? And crucially, how long did it really take?
These aren't just academic questions. For debugging, performance optimization, cost management, and ultimately, a better developer experience, real-time visibility into these metrics is invaluable. That's precisely the challenge we tackled in our last development session: bringing live, streaming LLM usage statistics to our AutoFix and Refactor pipeline run detail pages.
The Quest for Transparency: Our Goal
Our primary objective was clear: during an active AutoFix or Refactor pipeline run, as Server-Sent Events (SSE) stream updates to the UI, we wanted to display:
- Live Token Usage: Prompt, completion, and total tokens.
- Model Information: Which specific LLM model was invoked.
- Cost Estimates: A running tally of the monetary cost.
- "Stats for Nerds": Beyond the basics, we aimed for insights like energy consumption and estimated time saved, leveraging our existing
workflow-metricslibrary.
This wasn't about adding a new feature per se, but about enhancing the observability of existing, critical features.
Diving In: Where the Data Lives (or Doesn't Yet)
The first step was to understand our current state. Where is this data already being captured, and where are the gaps? This exploration revealed some crucial insights:
The Good News: We're Already Capturing Key LLM Data!
A huge win right off the bat: our `LLMCompletionResult` type (defined in `src/server/services/llm/types.ts`) already meticulously captures:
```typescript
interface LLMCompletionResult {
  tokenUsage: {
    prompt: number;
    completion: number;
    total: number;
  };
  costEstimate: number;
  model: string;
  provider: string;
  // ... other fields
}
```
This means the core data – token counts, cost estimates, model, and provider – is readily available at the point of LLM interaction. This saved us from having to instrument every LLM call from scratch.
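To ground that, here's a minimal sketch of how a pipeline step sees this data at the call site today. The import alias, provider interface, and function are illustrative, not verbatim excerpts from our code:

```typescript
import type { LLMCompletionResult } from "@/server/services/llm/types"; // import path assumed

// Illustrative provider shape; the real interface lives in our LLM service layer.
interface LLMProvider {
  complete(prompt: string): Promise<LLMCompletionResult>;
}

async function runDetectionStep(provider: LLMProvider, prompt: string) {
  const result = await provider.complete(prompt);

  // Everything we want to surface is already on the result...
  const { tokenUsage, costEstimate, model, provider: providerName } = result;

  // ...but today it stops here instead of being attached to an SSE event.
  return { tokenUsage, costEstimate, model, providerName };
}
```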
The Catch: It's Not Making It to the Client (Yet)
While the data exists, it's not being propagated to the client-side via our SSE streams. Here's a breakdown of the current state across our pipelines:
- AutoFix Pipeline (`issue-detector.ts`, `fix-generator.ts`): These components correctly call `provider.complete()` and receive the `LLMCompletionResult`. However, the `tokenUsage` and related data are not included in the SSE events that update the UI.
- Refactor Pipeline (`improvement-generator.ts`): This step does save `tokenUsage`, `costEstimate`, and `model` to the `RefactorItem` database record. But, again, this data isn't actively pushed via SSE events during the run.
- Refactor Pipeline (`opportunity-detector.ts`): This was a key discovery – this particular step doesn't capture token data at all. A definite gap we need to address.
- Code-Analysis Pipeline (`pattern-detector.ts`): This pipeline accumulates `totalTokens` and `totalCost`, but only sends these statistics as part of a `phase_complete` event, not as continuous updates.
- Reusable Metrics (`src/lib/workflow-metrics.ts`): We have existing utilities for calculating energy consumption (Wh) and estimated time saved per model family. This is perfect for our "stats for nerds" section!
This exploration confirmed our hypothesis: the data is there, but the plumbing to stream it live to the UI is missing.
Navigating the Trenches: Lessons from the Dev Server
Even in a planning session, you hit unexpected snags. Here are a couple of "pain points" that turned into immediate lessons learned:
- **Prisma `db execute --stdin` doesn't return SELECT results:**
  - **The Attempt:** I tried using `npx prisma db execute --stdin` to quickly run a `SELECT` query and inspect some data in the database.
  - **The Fail:** No output. It turns out this command is designed for DDL (Data Definition Language) or DML (Data Manipulation Language) statements, not for returning results from `SELECT` queries.
  - **The Workaround/Lesson:** For interactive database querying with Prisma, the correct approach is to use `npx tsx -e` (or `node -r tsx`) to execute a TypeScript file that uses the Prisma Client. This allows you to write and run arbitrary Prisma queries and log their results. A simple `console.log(await prisma.autoFixRun.findMany())` can save a lot of head-scratching (see the sketch after this list).
- **Next.js `--turbopack` flag:**
  - **The Attempt:** Out of curiosity, I tried starting the dev server with `npm run dev -- --turbopack`.
  - **The Fail:** `error: unknown option '--turbopack'`.
  - **The Workaround/Lesson:** While Turbopack is a promising next-gen bundler for Next.js, it's still evolving and not always fully integrated with all versions or development setups. For now, sticking to the standard `npm run dev` or our `./scripts/dev-start.sh` ensures a stable development environment. Sometimes, the tried and true is the best path.
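For reference, here's a minimal sketch of that Prisma workaround as a throwaway script (the file name is illustrative; the `autoFixRun` model matches our schema):

```typescript
// inspect-runs.ts (hypothetical throwaway script); run with: npx tsx inspect-runs.ts
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

async function main() {
  // Arbitrary read queries work here, unlike `prisma db execute --stdin`.
  const runs = await prisma.autoFixRun.findMany({ take: 5 });
  console.log(JSON.stringify(runs, null, 2));
}

main().finally(() => prisma.$disconnect());
```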
These small detours are a natural part of the development process and underscore the importance of understanding your tools.
The Blueprint: Crafting the Real-time Experience
With a clear understanding of the data landscape and some practical lessons under our belt, we moved to design the implementation. This is where the "no code changes yet" status becomes exciting – it's all about solidifying the plan before writing a single line of feature code.
Our plan involves a series of coordinated changes across the backend and frontend:
- **Extend SSE Event Types:** We'll enhance our `AutoFixEvent` and `RefactorEvent` types to include new fields: `tokenUsage`, `model`, `provider`, `costEstimate`, and `timing` (for phase-specific durations). This provides a standardized contract for the data streaming to the client (see the sketch after this list).
- **Instrument Pipeline Steps:**
  - Modify `issue-detector.ts` and `fix-generator.ts` (AutoFix) to capture the `LLMCompletionResult` data and include it when yielding new events.
  - Modify `opportunity-detector.ts` and `improvement-generator.ts` (Refactor) to both capture token data (where missing) and yield it in their respective events.
- **Accumulate and Stream in `pipeline.ts`:** The core `pipeline.ts` logic for both AutoFix and Refactor will be updated to:
  - Maintain running totals for `totalTokens`, `totalCost`, and `phaseTimings`.
  - Include these accumulated stats in the SSE events that are pushed to the client.
- **Persist Final Stats to DB:** Once a run completes, the final aggregated `{ totalTokens, totalCost, provider, model, phaseTimings }` will be saved to the `AutoFixRun.stats` and `RefactorRun.stats` JSON fields in the database for historical analysis.
- **Build a Shared UI Component:** A new, reusable `<LiveTokenStats>` React component will be created. This component will be responsible for:
  - Listening to the SSE stream.
  - Displaying real-time token usage, cost, model, and provider.
  - Integrating our `workflow-metrics.ts` to show estimated energy consumption and time saved.
  - Presenting phase-specific timings.
- **Wire It Up:** Finally, we'll integrate the `<LiveTokenStats>` component into the AutoFix and Refactor detail pages, connecting it to the SSE endpoints.
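To make the first and third steps concrete, here's a minimal sketch of what the extended event payload and running-totals logic might look like. The event union and any field names beyond those listed above are assumptions, not the final contract:

```typescript
// Sketch only: the exact event shapes are still to be finalized.
interface LLMUsageStats {
  tokenUsage: { prompt: number; completion: number; total: number };
  model: string;
  provider: string;
  costEstimate: number;            // running USD estimate
  timing?: Record<string, number>; // phase name -> duration (ms)
}

// One possible shape for the enriched events (simplified from the real union).
type AutoFixEvent =
  | { type: "phase_start"; phase: string }
  | { type: "llm_usage"; usage: LLMUsageStats }
  | { type: "phase_complete"; phase: string; usage?: LLMUsageStats };

// In pipeline.ts, an accumulator would fold each per-call result into
// running totals before every SSE push; something like:
function accumulate(totals: LLMUsageStats, result: LLMUsageStats): LLMUsageStats {
  return {
    ...result, // latest model/provider/timing win
    tokenUsage: {
      prompt: totals.tokenUsage.prompt + result.tokenUsage.prompt,
      completion: totals.tokenUsage.completion + result.tokenUsage.completion,
      total: totals.tokenUsage.total + result.tokenUsage.total,
    },
    costEstimate: totals.costEstimate + result.costEstimate,
  };
}
```

On run completion, the same accumulated object is what would be persisted to the `AutoFixRun.stats` and `RefactorRun.stats` JSON fields.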
What's Next?
The planning is complete, the blueprint is drawn. The next session will be all about execution. Here's our immediate action plan:
- Finalize the exact schema for extending `AutoFixEvent` and `RefactorEvent`.
- Implement the capture and yielding of token data in `issue-detector.ts` and `fix-generator.ts`.
- Do the same for `opportunity-detector.ts` and `improvement-generator.ts`.
- Update the `pipeline.ts` logic to accumulate and include these running totals in SSE events.
- Extend the `AutoFixRun.stats` and `RefactorRun.stats` DB schemas to persist final metrics.
- Develop the shared `<LiveTokenStats>` frontend component (sketched below).
- Integrate the component into the existing detail pages.
- Conduct thorough end-to-end testing with real pipeline runs.
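As a starting point for that component, here's a minimal sketch, assuming a hypothetical SSE endpoint URL prop and the `LLMUsageStats` shape sketched earlier:

```tsx
// Minimal sketch of <LiveTokenStats>; the streamUrl prop and event shape
// are assumptions, not the final API.
"use client";
import { useEffect, useState } from "react";

interface LLMUsageStats {
  tokenUsage: { prompt: number; completion: number; total: number };
  model: string;
  provider: string;
  costEstimate: number;
}

export function LiveTokenStats({ streamUrl }: { streamUrl: string }) {
  const [stats, setStats] = useState<LLMUsageStats | null>(null);

  useEffect(() => {
    const source = new EventSource(streamUrl);
    source.onmessage = (e) => {
      const event = JSON.parse(e.data);
      // Pipeline events carry cumulative totals, so the latest one wins.
      if (event.usage) setStats(event.usage);
    };
    return () => source.close();
  }, [streamUrl]);

  if (!stats) return <p>Waiting for LLM activity…</p>;

  return (
    <dl>
      <dt>Model</dt>
      <dd>{stats.provider} / {stats.model}</dd>
      <dt>Tokens</dt>
      <dd>
        {stats.tokenUsage.total} total ({stats.tokenUsage.prompt} prompt,{" "}
        {stats.tokenUsage.completion} completion)
      </dd>
      <dt>Est. cost</dt>
      <dd>${stats.costEstimate.toFixed(4)}</dd>
    </dl>
  );
}
```

Energy and time-saved figures from `workflow-metrics.ts` would slot into the same list once wired up.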
This feature is more than just "stats"—it's about empowering developers with immediate feedback, helping them understand the performance characteristics and cost implications of their AI-driven workflows. It's a significant step towards a more transparent and debuggable LLM-powered development environment. Stay tuned for the implementation details!