nyxcore-systems

Building Real-Time Token Usage Stats: A Late-Night Dev Session Deep Dive

Follow along as I explore adding live token usage and 'stats for nerds' to our AutoFix pipeline. Sometimes the best insights come from those 2 AM development sessions.

llm, real-time, sse, developer-tools, token-usage, nextjs

It's 2 AM, the coffee's gone cold, and I'm deep in the weeds of our AutoFix pipeline. You know those development sessions where you start with a simple feature request and end up mapping the entire architecture of your system? This is one of those nights.

The Mission: Stats for Nerds

The goal seemed straightforward: add live token usage, model information, and real-time cost tracking to our AutoFix and Refactor pipeline pages. You know, those "stats for nerds" that developers secretly love to obsess over while their code is being processed by LLMs.

But as any seasoned developer knows, "simple" features have a way of revealing the hidden complexity lurking beneath your abstractions.

The Archaeological Dig

First, I needed to understand what we were already capturing. Time for some digital archaeology through our codebase.

The Good News: Our LLM provider layer was already doing the heavy lifting! Buried in src/server/services/llm/types.ts, I found that every LLMCompletionResult already includes:

```typescript
{
  tokenUsage: { prompt: number; completion: number; total: number };
  costEstimate: number;
  model: string;
  provider: string;
}
```

The Plot Twist: We were capturing all this rich data but throwing it away at the SSE (Server-Sent Events) layer. Our AutoFix issue-detector.ts and fix-generator.ts were calling provider.complete(), getting back all this juicy metadata, and then... just not sending it to the frontend.

It's like having a treasure chest and only taking out the coins while leaving the gems behind.
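What makes this sting is how little wire-format work forwarding the metadata actually is: an SSE frame is just a `data:` line of JSON followed by a blank line. A minimal sketch, using an illustrative event shape rather than our actual types:

```typescript
// Sketch: serializing a pipeline event as an SSE frame, metadata included.
// The event shape here is illustrative, not the real AutoFix wire format.
interface PipelineEvent {
  type: string;
  content?: string;
  tokenUsage?: { prompt: number; completion: number; total: number };
  costEstimate?: number;
  model?: string;
}

function toSSEFrame(event: PipelineEvent): string {
  // One JSON payload per `data:` line, terminated by a blank line.
  return `data: ${JSON.stringify(event)}\n\n`;
}

const frame = toSSEFrame({
  type: "fix-progress",
  content: "Patched null check",
  tokenUsage: { prompt: 812, completion: 164, total: 976 },
  costEstimate: 0.0031,
  model: "gpt-4o",
});
console.log(frame);
```

The point: the gems were already in the chest. Forwarding them is a serialization detail, not a redesign.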

The Current State of Affairs

Here's what I discovered about our different pipelines:

  • AutoFix: Captures token data internally but doesn't stream it via SSE
  • Refactor: Partially saves token usage to the database but inconsistently includes it in real-time events
  • Code Analysis: Actually accumulates totalTokens and totalCost but only sends stats when a phase completes

Each pipeline had evolved its own approach to handling LLM metadata. Classic case of organic growth without a unified strategy.

Lessons Learned (The Pain Log)

Command Line Gotchas

The Prisma Trap: I spent way too long trying to use npx prisma db execute --stdin for SELECT queries, wondering why I wasn't getting any output. Turns out this command doesn't return SELECT results—it's designed for mutations.

Lesson learned: Use npx tsx -e with PrismaClient for quick database queries during development.
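For the record, the pattern that worked looks roughly like this. The `user` model name is just a placeholder; substitute something from your own schema:

```shell
# Quick ad-hoc query during development: run an inline script with tsx.
# "user" is a placeholder model name; substitute one from your schema.
npx tsx -e "
const { PrismaClient } = require('@prisma/client');
const prisma = new PrismaClient();
prisma.user.findMany({ take: 5 })
  .then(console.log)
  .finally(() => prisma.\$disconnect());
"
```

Note the escaped `\$disconnect`: inside double quotes, the shell would otherwise try to expand `$disconnect` as a variable.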

The Turbopack Mirage: Tried to speed up development with the --turbopack flag, only to be greeted with error: unknown option '--turbopack'. Our Next.js 14.2.35 setup doesn't support it yet.

Lesson learned: Always check compatibility before assuming new features are available in your current setup.

The Architecture Emerges

After mapping out the data flow, here's what I found:

  1. Token data flows from LLM providers → Pipeline processors → SSE events → Frontend
  2. We already have cost calculation logic in src/server/services/llm/types.ts
  3. Energy consumption tracking exists in src/lib/workflow-metrics.ts
  4. The missing piece: Consistent streaming of this data to the frontend

The Implementation Strategy

Rather than band-aid solutions for each pipeline, I'm planning a unified approach:

Phase 1: Extend the Event Types

Update AutoFixEvent and RefactorEvent to include:

```typescript
{
  tokenUsage: TokenUsage;
  model: string;
  provider: string;
  costEstimate: number;
  timing: PhaseTimingData;
}
```

Phase 2: Pipeline Consistency

Ensure all pipeline processors (issue-detector.ts, fix-generator.ts, opportunity-detector.ts, etc.) capture and forward LLM metadata.
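In sketch form, each processor would do something like this after every provider call. The `emit` callback and event shape are hypothetical stand-ins for our SSE layer, not the real API:

```typescript
// Sketch: a processor forwarding LLM metadata along with its progress event.
// `emit` and the event shape are hypothetical stand-ins for the SSE layer.
interface LLMCompletionResult {
  text: string;
  tokenUsage: { prompt: number; completion: number; total: number };
  costEstimate: number;
  model: string;
  provider: string;
}

type Emit = (event: Record<string, unknown>) => void;

function forwardWithMetadata(result: LLMCompletionResult, emit: Emit): void {
  emit({
    type: "phase-progress",
    // Forward the metadata instead of dropping it at the SSE boundary.
    tokenUsage: result.tokenUsage,
    costEstimate: result.costEstimate,
    model: result.model,
    provider: result.provider,
  });
}
```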

Phase 3: Real-Time Aggregation

Build running totals in the pipeline orchestrators and stream cumulative stats.
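The running-totals piece is simple enough to sketch up front. The names here are mine, not the orchestrator's:

```typescript
// Sketch: cumulative token/cost totals maintained by a pipeline orchestrator.
// Names are provisional, not the actual orchestrator API.
interface TokenUsage { prompt: number; completion: number; total: number }

class UsageAggregator {
  totalTokens = 0;
  totalCost = 0;

  // Fold one completion's usage into the running totals and return a
  // snapshot suitable for streaming as a cumulative-stats event.
  record(usage: TokenUsage, costEstimate: number) {
    this.totalTokens += usage.total;
    this.totalCost += costEstimate;
    return { totalTokens: this.totalTokens, totalCost: this.totalCost };
  }
}
```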

Phase 4: The Frontend Magic

Create a shared <LiveTokenStats> component that shows:

  • Real-time token consumption
  • Cost tracking
  • Model information
  • Energy usage estimates
  • Time-saved calculations
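The component itself is a job for a caffeinated morning, but its props and display formatting can be sketched now. Every name below is provisional:

```typescript
// Sketch: provisional props and pure formatters for <LiveTokenStats />.
// All names are placeholders; the real component doesn't exist yet.
interface LiveTokenStatsProps {
  totalTokens: number;
  totalCost: number;   // USD
  model: string;
  energyWh?: number;   // optional energy-consumption estimate
}

// Format a cost like 0.00312 as "$0.0031" for display.
function formatCost(usd: number): string {
  return `$${usd.toFixed(4)}`;
}

// Format token counts with thousands separators, e.g. 12345 -> "12,345".
function formatTokens(n: number): string {
  return n.toLocaleString("en-US");
}
```

Keeping the formatters pure makes them trivially unit-testable, independent of whatever rendering framework the component ends up in.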

Why This Matters

You might wonder: why spend a 2 AM session on token usage stats? Here's why I think it's important:

  1. Transparency: Users should see what resources their requests consume
  2. Cost Awareness: Real-time cost tracking helps users make informed decisions
  3. Performance Insights: Token usage patterns reveal optimization opportunities
  4. Trust Building: Showing the "guts" of the operation builds confidence in the system

Next Steps

The exploration phase is complete. Now comes the fun part—implementation. The plan is to:

  1. Start with the event type extensions
  2. Work through each pipeline systematically
  3. Build the shared stats component
  4. Test with real AutoFix runs
  5. Iterate based on what we learn

The 2 AM Insight

Sometimes the best development happens in these quiet, focused sessions when you can really dig deep into a system. Tonight reminded me that good architecture isn't just about the code you write—it's about understanding the code you already have and finding the elegant path forward.

The token usage data was there all along, waiting to be surfaced. Sometimes the best features are hiding in plain sight, buried in the layers of abstraction we've built over time.

Now, time for some sleep. The actual coding can wait until I've had proper coffee.


Want to follow along with this implementation? I'll be documenting the journey as we build out these real-time stats. Because sometimes the most interesting stories are the ones that happen between the commits.