nyxcore-systems

Shrinking AI Prompts: A Deep Dive into Our Workflow Engine's Digest Compression Debugging

We tackled a critical bug in our workflow engine where AI step digests weren't generating, leading to bloated prompts. This post details our debugging journey, the fixes implemented, and the significant prompt size reductions achieved, making our LLM interactions more efficient and cost-effective.

AI · LLM · Prompt Engineering · Debugging · TypeScript · Workflow Engine · Performance · Frontend

In the world of AI-driven applications, managing the size and context of prompts sent to Large Language Models (LLMs) is paramount. Larger prompts mean higher token costs, slower response times, and an increased risk of hitting context window limits. That's why we invested in a "step digest compression" system for our workflow engine – a clever way to summarize previous workflow steps, drastically reducing the amount of redundant information sent in subsequent prompts.
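Conceptually, the compression works like this (a minimal sketch — `CompletedStep` and this `buildChainContext` shape are illustrative assumptions, not the real engine API):

```typescript
// Illustrative sketch of digest compression, not the real engine API.
interface CompletedStep {
  name: string;
  full: string;    // full step output
  digest?: string; // short LLM-generated summary, when one exists
}

// Earlier steps contribute their digest; the most recent step keeps its
// full output, since the next step usually needs that detail.
function buildChainContext(steps: CompletedStep[]): string {
  return steps
    .map((step, i) => {
      const isLatest = i === steps.length - 1;
      // Fall back to full output when no digest was generated -- the
      // failure mode this post is about.
      const body = isLatest ? step.full : step.digest ?? step.full;
      return `## ${step.name}\n${body}`;
    })
    .join("\n\n");
}
```

Note the fallback: a missing digest doesn't break anything, it just silently ships the full output. That graceful degradation is exactly why the bug stayed hidden for so long.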

The problem? It wasn't working. Our workflow engine was still sending bloated prompts, and the crucial digest generation wasn't happening as intended. This post chronicles our recent debugging session, where we unearthed the silent killers, implemented robust fixes, and finally verified our digest compression system end-to-end.

The Case of the Missing Digests: Unearthing the Problem

Our goal was clear: fix the digest generation and prove that our compression system was effectively shrinking prompt sizes. We knew digests should be there, but they weren't. This pointed to a failure in the generation process itself.

Silent Errors and Hidden Paths

Our investigation began in src/server/services/step-digest.ts. It quickly became apparent that errors during digest generation were being swallowed whole. Adding a simple console.error to the catch block around our Haiku (our internal LLM service) calls immediately revealed the underlying issues. This reinforced a golden rule of debugging: always log your errors, especially in critical paths.
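The shape of the fix looks roughly like this (a sketch with hypothetical names — `generateStepDigest` and `callHaiku` stand in for the real functions in step-digest.ts):

```typescript
// Stand-in for the real Haiku call in step-digest.ts.
async function callHaiku(prompt: string): Promise<string> {
  if (!prompt) throw new Error("empty prompt");
  return `digest: ${prompt.slice(0, 40)}`;
}

// Hypothetical sketch of the fixed generation path.
async function generateStepDigest(
  stepId: string,
  content: string
): Promise<string | null> {
  try {
    return await callHaiku(content);
  } catch (err) {
    // Before the fix this catch was effectively empty -- failures
    // vanished silently. Now every failure leaves a trace.
    console.error(`[step-digest] generation failed for step ${stepId}:`, err);
    return null; // the digest stays optional; the step itself still completes
  }
}
```

The key design choice: a failed digest returns `null` rather than throwing, so digest generation can never block a workflow — but it now always leaves evidence behind.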

With visibility restored, we identified several key areas where digest generation was being bypassed or silently failing:

  1. Workflow Resumption Backfill: When a workflow resumed, any steps that had completed before our digest system was fully operational (or had failed to generate a digest) weren't getting a second chance. We implemented a digest backfill loop in src/server/services/workflow-engine.ts (around line 585, after buildChainContext()) to ensure that on resume, any completed step missing a digest would have one generated before the next step began. This was crucial for robustness and handling legacy data.

  2. Alternatives Selection Bypass: A significant oversight was found in the alternatives selection path (workflow-engine.ts, ~line 652-668). Steps configured with generateCount > 1 (meaning the LLM proposes multiple options) were skipping digest generation entirely, so some of our most complex, multi-option steps went completely uncompressed. We integrated digest generation directly into this path.

  3. Normal Completion Path Resilience: Even in the standard step completion flow, we added console.error logging to the digest catch block (around line 863) to ensure maximum visibility for any future issues.
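The resume-time backfill (fix 1) can be sketched like so — the `Step` shape and `generateDigest` are hypothetical stand-ins for the real engine types, not the actual code at line 585:

```typescript
// Hypothetical stand-ins for the real workflow-engine types.
interface Step {
  id: string;
  status: "completed" | "pending";
  output: string;
  digest?: string;
}

// Stand-in for the Haiku-backed generator in step-digest.ts.
async function generateDigest(step: Step): Promise<string> {
  return step.output.slice(0, 50);
}

// On resume: give every completed-but-undigested step a second chance
// before the next step runs (mirrors the loop added after buildChainContext()).
async function backfillDigests(steps: Step[]): Promise<void> {
  for (const step of steps) {
    if (step.status === "completed" && !step.digest) {
      try {
        step.digest = await generateDigest(step);
      } catch (err) {
        // Never let a digest failure block the resume; log and move on.
        console.error(`[digest] backfill failed for step ${step.id}:`, err);
      }
    }
  }
}
```

This is also what makes legacy data safe: any workflow that completed steps before the digest system existed gets compressed the first time it resumes.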

The Proof is in the Prompt: Measuring the Impact

After implementing these fixes, it was time for the ultimate test: a full end-to-end workflow run with meticulous prompt size measurements. But first, we applied a crucial fix retroactively. We ran a backfill script to generate digests for two critical steps that had completed before our fixes were in place:

  • Analyze Target Repo: Saw a significant reduction from 8KB to 4KB.
  • Design Features: Reduced from 9KB to 4KB.

This immediate impact was a promising sign. Then, we kicked off a full workflow (4e369e42) and watched the prompt sizes shrink:

  • Design Features: The prompt size was 15.4KB. Crucially, this prompt benefited from a digest-compressed Analyze Target Repo step, which went from its original 10.7KB down to 3.6KB. This single compression saved over 7KB!
  • Review: A lean 3.7KB prompt.
  • Extend & Improve: Another 15.4KB prompt.
  • Implementation Prompts: This step really highlighted the power of compression. Its prompt was 32.2KB, but it intelligently used the .full content for the Extend & Improve step (which was 22.9KB) plus digests for all earlier steps. Without digests, this prompt would have been significantly larger, potentially hitting context limits or incurring massive costs.

The results were clear: the digest compression system was now fully operational and delivering substantial prompt size reductions. We cleaned up our temporary logging and test scripts, restored our test workflow to its original state, and closed out the session as a clear success.

Lessons from the Trenches: Overcoming Development Hurdles

No debugging session is without its challenges. Here are a couple of key lessons we learned along the way:

Prisma's db execute vs. Raw psql

We initially tried to query our workflow steps using npx prisma db execute. However, raw SQL hits the actual table names — which, under Prisma's defaults, are quoted PascalCase model names — and our model-to-table mapping was inconsistent in places. Cue frustrating "table not found" errors.

Lesson Learned: While ORMs like Prisma are fantastic for development speed, sometimes you need to get down to the metal. We quickly pivoted to using psql directly, which gave us the raw power we needed.

```bash
PGPASSWORD=nyxcore_dev psql -h localhost -U nyxcore -d nyxcore
```

A critical gotcha we discovered: Prisma column names in the database are camelCase, not snake_case as is common in many SQL conventions. When querying raw SQL, you must quote these camelCase column names (e.g., "workflowId", "selectedIndex") to avoid syntax errors.
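The reason quoting matters: Postgres folds unquoted identifiers to lowercase, so an unquoted workflowId becomes workflowid and the query fails with "column does not exist". A tiny helper makes the quoting explicit (the helper and the "WorkflowStep" table name here are illustrative, not part of our codebase):

```typescript
// Quote a Prisma-style camelCase identifier for raw Postgres SQL.
// Postgres folds unquoted identifiers to lowercase, so quoting is mandatory.
function quoteIdent(name: string): string {
  // Double any embedded quotes, then wrap the identifier.
  return `"${name.replace(/"/g, '""')}"`;
}

const columns = ["workflowId", "selectedIndex", "status"];
const sql = `SELECT ${columns.map(quoteIdent).join(", ")} FROM "WorkflowStep" WHERE ${quoteIdent("workflowId")} = $1;`;
console.log(sql);
// SELECT "workflowId", "selectedIndex", "status" FROM "WorkflowStep" WHERE "workflowId" = $1;
```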

Triggering Workflows from the CLI

To efficiently test our fixes, we needed a way to trigger workflows directly from our development environment without going through the UI. Our first attempt to trigger the SSE workflow endpoint from the CLI failed because authenticateRequest() relies on auth() from NextAuth, which reads browser session cookies – impossible from a simple CLI script.

Lesson Learned: When testing backend logic that's normally behind a UI, sometimes the best approach is to bypass the UI layer entirely. We created a temporary scripts/run-workflow.ts that directly imported and called runWorkflow(), completely bypassing the SSE/authentication stack. Running it with npx tsx provided a rapid iteration loop for testing. We even added auto-selection logic for alternatives steps to streamline our full workflow runs.
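In outline, the script looks something like this — the import path is commented out and replaced by a stub with the assumed `runWorkflow` signature, so treat this as a sketch, not the real scripts/run-workflow.ts:

```typescript
// Hypothetical sketch of scripts/run-workflow.ts: call the engine directly,
// skipping the SSE endpoint and NextAuth cookie check entirely.
// The real version imports runWorkflow from workflow-engine.ts:
// import { runWorkflow } from "../src/server/services/workflow-engine";

// Stub with the assumed signature, so this sketch is self-contained.
async function runWorkflow(
  workflowId: string,
  opts: { autoSelectAlternatives: boolean }
): Promise<string> {
  return `workflow ${workflowId} started (autoSelect=${opts.autoSelectAlternatives})`;
}

function parseWorkflowId(argv: string[]): string {
  const id = argv[0];
  if (!id) throw new Error("usage: npx tsx scripts/run-workflow.ts <workflowId>");
  return id;
}

// Entry point, e.g. `npx tsx scripts/run-workflow.ts 4e369e42`
// (left as a comment so importing this sketch has no side effects):
// runWorkflow(parseWorkflowId(process.argv.slice(2)), { autoSelectAlternatives: true })
//   .then(console.log);
```

The `autoSelectAlternatives` flag reflects the auto-selection logic mentioned above: without it, any generateCount > 1 step would stall the run waiting for a human to pick an option in the UI.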

What's Next? Continuing the Optimization Journey

With digest compression firmly in place, our immediate next steps involve further optimizations and testing:

  1. Test {{project.wisdom}}: Verify our new project.wisdom feature by linking a project with consolidation data to a workflow.
  2. Token Cost Comparison: Conduct a thorough comparison of total token costs before and after digest compression across multiple workflow runs to quantify the exact savings.
  3. Optional Backfill: Consider making the digest backfill loop optional (via an environment variable or workflow setting) to avoid unnecessary Haiku calls on every resume for workflows that are already fully compressed.
  4. Type Error Fix: Address a pre-existing type error in discussions/[id]/page.tsx:139 related to a Badge variant.
  5. RLS Policies: If cross-tenant access becomes a concern, add the projectId column to our Row-Level Security (RLS) policies.

This session was a crucial step in enhancing the efficiency and cost-effectiveness of our AI-powered workflow engine. By diligently debugging, fixing core issues, and learning from our challenges, we've significantly improved how our system interacts with LLMs, paving the way for even more complex and intelligent applications.