nyxcore-systems

Context Crunch: How We Slashed AI Prompt Sizes by Up To 91% with Workflow Digest Compression

We successfully implemented and verified an end-to-end digest compression system for our AI workflow prompts, leading to dramatic reductions in prompt sizes, improved context management, and significant cost savings. Learn how we did it!

AI · LLM · Prompt Engineering · Workflow Automation · Backend · TypeScript · Optimization · Cost Savings

In the world of Large Language Models (LLMs), context is king, but size is a silent killer. As our automated AI workflows become more sophisticated, the prompts we feed them grow, consuming more tokens, hitting context window limits, and ultimately, driving up operational costs. We knew we needed a smarter way to manage the conversation, to give our AI agents the essence of prior steps without burdening them with every single detail.

That's why our latest development sprint focused on a critical mission: implementing and verifying an end-to-end digest compression system for our workflow steps. The goal was clear: shrink those sprawling prompts and confirm the reduction on a live workflow run. I'm thrilled to report: mission accomplished. The digest system is fully operational, and the results are even better than we hoped.

The Challenge: Expanding Context, Exploding Costs

Imagine an AI agent building complex software. Each step—analyzing a repo, designing features, writing code—generates a wealth of information. For subsequent steps to build effectively, they need to understand what came before. Traditionally, this meant passing a large chunk of the previous step's output as context. This quickly becomes unsustainable:

  • Token Limits: LLMs have finite context windows. Long prompts mean less room for new instructions or generation.
  • Cost: Every token costs money. Longer prompts, especially in multi-step workflows, lead to rapidly escalating API bills.
  • Noise: Too much raw detail can sometimes obscure the truly important information, making it harder for the AI to focus.

Our solution? Digest compression. Instead of passing the full output of a workflow step, we generate a concise "digest"—a summary that captures the critical information needed for the next step, without all the verbose detail.
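The core idea can be sketched in a few lines of TypeScript. This is an illustrative sketch only; `StepRecord` and `buildContext` are hypothetical names, not our actual engine API. Each completed step carries an optional digest, and the context builder prefers the digest over the raw output, falling back to a truncated slice when no digest exists yet:

```typescript
// Hypothetical shape of a completed workflow step.
interface StepRecord {
  name: string;
  output: string;  // full, verbose step output
  digest?: string; // concise LLM-generated summary, if available
}

// Build the context passed to the next step: prefer each step's digest,
// fall back to a truncated slice of the raw output when none exists.
function buildContext(steps: StepRecord[], fallbackChars = 2000): string {
  return steps
    .map((step) => {
      const body = step.digest ?? step.output.slice(0, fallbackChars);
      return `## ${step.name}\n${body}`;
    })
    .join("\n\n");
}

const steps: StepRecord[] = [
  {
    name: "Analyze Target Repo",
    output: "x".repeat(7475),
    digest: "Repo uses Next.js; key modules: ...",
  },
  // No digest yet, so the builder falls back to truncation.
  { name: "Design Features", output: "y".repeat(9564) },
];

console.log(buildContext(steps).length); // a fraction of the raw ~17k chars
```

The real system generates the digest with a small LLM call at step completion time, so the expensive summarization happens once per step rather than once per downstream prompt.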

The Breakthrough: Live Verification and Jaw-Dropping Reductions

Our focus for this session was verifying this digest system on a live, production-like workflow. We used a real "nyxCore - Kimi K2 v2" workflow (specifically, an Extension Builder pipeline, ID f196e1b6-962d-45b5-b586-646688cd2243) to ensure the system behaved as expected under realistic conditions.

The results speak for themselves:

| Workflow Step | Original Prompt Size (chars) | Digested Prompt Size (chars) | Reduction (%) |
| --- | ---: | ---: | ---: |
| Analyze Target Repo | 7,475 | 3,730 | 50% |
| Design Features | 9,564 | 4,150 | 57% |
| Extend & Improve | 26,966 | 3,606 | 87% |
| Implementation Prompts | 41,055 | 3,626 | 91% |

That's right – we saw a 91% reduction in prompt size for our "Implementation Prompts" step! This isn't just a minor tweak; it's a fundamental shift in how our AI workflows consume and manage context. It means more efficient LLM usage, significantly lower costs, and agents that can operate within tighter context windows without losing critical information.

Under the Hood: Building Robustness

This success wasn't just about the compression algorithm itself. It involved ensuring the system was robust and integrated seamlessly into our existing workflow engine. Several key fixes and additions from previous sessions were confirmed to be working perfectly:

  • Error Logging: Enhanced error logging in src/server/services/step-digest.ts ensures we can quickly diagnose any issues during digest generation.
  • Backfill Loop: A crucial backfill loop in src/server/services/workflow-engine.ts (around line 585) handles scenarios where a workflow might resume, and some completed steps might be missing their digests. This ensures data consistency.
  • Alternatives Selection Path: Digest generation was also integrated into the alternatives selection path within workflow-engine.ts (around line 673), covering all possible execution flows.
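In rough strokes, the backfill logic looks like the sketch below. All names here are hypothetical stand-ins (the real code lives in `workflow-engine.ts` and `step-digest.ts`): on resume, any completed step missing a digest gets one generated, and a failure is logged rather than blocking the resume.

```typescript
// Hypothetical shape of a step loaded on workflow resume.
interface CompletedStep {
  id: string;
  output: string;
  digest?: string;
}

// Stand-in for the real LLM summarizer; the actual service calls a
// small model to produce the digest.
async function generateDigest(output: string): Promise<string> {
  return output.slice(0, 200);
}

// Ensure every completed step has a digest so later steps never
// silently fall back to full outputs. Returns the number backfilled.
async function backfillDigests(steps: CompletedStep[]): Promise<number> {
  let filled = 0;
  for (const step of steps) {
    if (!step.digest) {
      try {
        step.digest = await generateDigest(step.output);
        filled++;
      } catch (err) {
        // Log and continue: one failed digest should not block the resume.
        console.error(`digest backfill failed for step ${step.id}`, err);
      }
    }
  }
  return filled;
}
```

Persisting the digest on the step record (rather than regenerating per prompt) is what makes the compression essentially free after the first run.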

To facilitate focused testing, we also created (and subsequently deleted) temporary scripts like scripts/run-workflow.ts for direct workflow execution (bypassing authentication and SSE) and scripts/backfill-digests.ts for one-off digest backfills. These temporary tools proved invaluable for iterating quickly and verifying edge cases without disrupting the main application flow.
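For context, a throwaway runner like `scripts/run-workflow.ts` doesn't need to be fancy. The sketch below is hypothetical (the real script was deleted after verification, and `runWorkflow` is an assumed engine entry point): it validates a workflow ID from the command line and calls the engine directly, skipping the HTTP, auth, and SSE layers entirely.

```typescript
// Hypothetical sketch of a throwaway direct-execution script.
// Usage: npx tsx scripts/run-workflow.ts <workflow-uuid>

function parseWorkflowId(argv: string[]): string {
  const id = argv[2];
  // Loose UUID shape check: 36 chars of hex digits and dashes.
  if (!id || !/^[0-9a-f-]{36}$/.test(id)) {
    throw new Error("usage: run-workflow.ts <workflow-uuid>");
  }
  return id;
}

async function main() {
  const workflowId = parseWorkflowId(process.argv);
  // Call the engine directly, bypassing auth and SSE:
  // await runWorkflow(workflowId); // hypothetical engine entry point
  console.log(`would run workflow ${workflowId}`);
}

// Only run when an argument was actually supplied.
if (process.argv.length > 2) {
  main().catch((err) => {
    console.error(err);
    process.exit(1);
  });
}
```

The value of scripts like this is speed of iteration: one command reruns the exact workflow under test, with no browser session or event stream in the loop.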

Lessons Learned: Practical Tips from the Trenches

While this session was largely smooth sailing thanks to prior foundational work, we did hit a minor hiccup that's worth sharing as a practical tip:

  • npx prisma db execute vs. Raw psql: For executing raw SQL queries, especially when dealing with specific database features or quoting conventions, npx prisma db execute can sometimes be finicky. We found it much more reliable to directly use psql with the correct credentials (e.g., PGPASSWORD=nyxcore_dev psql -h localhost -U nyxcore -d nyxcore). Remember that PostgreSQL column names in camelCase often need to be explicitly quoted in raw SQL queries! For example, workflows.projectId needs to be "workflows"."projectId".
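When generating raw SQL from code, the quoting rule is easy to capture in a tiny helper. This is a sketch (not Prisma's API, and `quoteIdent` is a name we're inventing here): it double-quotes each dotted part of an identifier, escaping embedded quotes per the SQL standard, so camelCase column names survive PostgreSQL's default lowercase folding.

```typescript
// Quote a (possibly dotted) Postgres identifier so camelCase survives.
// e.g. workflows.projectId -> "workflows"."projectId"
function quoteIdent(ident: string): string {
  return ident
    .split(".")
    .map((part) => `"${part.replace(/"/g, '""')}"`)
    .join(".");
}

console.log(quoteIdent("workflows.projectId")); // "workflows"."projectId"
```

Unquoted identifiers are folded to lowercase by PostgreSQL, which is exactly why `workflows.projectId` silently becomes `workflows.projectid` and fails to match the Prisma-created column.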

What's Next? Pushing the Boundaries

With digest compression now a core part of our workflow engine, we're already looking ahead to further optimizations and enhancements:

  1. Project Wisdom Integration: Testing {{project.wisdom}} by linking a project to a workflow with consolidated data will unlock even deeper context management capabilities.
  2. Token Cost Comparison: The ultimate metric! We'll be rigorously comparing total token costs across multiple workflow runs, both before and after digest compression, to quantify the financial impact.
  3. Optional Backfill: Considering making the backfill loop optional (via an environment variable or workflow setting) to avoid unnecessary LLM calls (e.g., to Haiku) when digests are guaranteed to exist.
  4. Minor Bug Fixes: Tying up loose ends, like fixing a pre-existing type error in discussions/[id]/page.tsx:139 related to a Badge variant.
  5. Security Hardening: Adding Row-Level Security (RLS) policies for projectId columns to ensure robust cross-tenant access control.
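Item 3 above is a small change in practice. One plausible shape, sketched here with an assumed flag name (`DIGEST_BACKFILL` is hypothetical, not a setting that exists yet), is an env-gated check that defaults to on:

```typescript
// Hypothetical env gate for the resume-time digest backfill.
// Defaults to enabled; set DIGEST_BACKFILL=false to skip the LLM calls.
function shouldBackfill(env: Record<string, string | undefined>): boolean {
  return (env.DIGEST_BACKFILL ?? "true").toLowerCase() !== "false";
}

console.log(shouldBackfill({}));                           // true
console.log(shouldBackfill({ DIGEST_BACKFILL: "false" })); // false
```

Defaulting to enabled keeps the safety net in place; the flag exists purely to skip redundant model calls on runs where digests are guaranteed to exist.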

This milestone is a significant step forward in building more efficient, cost-effective, and intelligent AI-powered workflows. By strategically managing context, we're not just saving money; we're enabling our AI agents to perform better, focus on what truly matters, and ultimately, deliver more impactful results.