Building a Smart Compression System: How We Reduced AI Prompt Sizes by 87%
A deep dive into implementing an automated digest system that compressed AI prompts by up to 87% while maintaining context integrity in production workflows.
As AI-powered workflows become more complex, we face a growing challenge: prompt bloat. Each step in a multi-stage AI pipeline generates increasingly verbose outputs, leading to exponentially growing context windows and skyrocketing token costs. Today, I'm excited to share how we solved this problem with an automated digest compression system that reduced our prompt sizes by up to 87% without losing critical context.
The Problem: When AI Gets Chatty
Picture this: you're running a 5-step AI workflow to analyze a codebase and generate implementation suggestions. By step 4, your prompts have ballooned from a manageable 7,000 characters to over 41,000 characters. Each subsequent step becomes more expensive and slower, while the AI struggles to focus on what actually matters buried in all that text.
This was exactly the situation we faced with our Extension Builder pipeline. We needed a way to preserve the essential insights from each step while dramatically reducing the noise.
The Solution: Intelligent Step Digests
Our approach was to implement an automated digest system that runs after each workflow step completes. Here's how it works:
1. Automatic Digest Generation
When a workflow step finishes, our system automatically triggers a compression process:
```typescript
// After step completion, generate digest
const digest = await generateStepDigest({
  stepOutput: completedStep.output,
  stepType: completedStep.type,
  maxLength: 4000 // Target compression size (characters)
});

await updateStepDigest(stepId, digest);
```
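Under the hood, `generateStepDigest` delegates the actual summarization to an AI service. Here is a minimal sketch of its shape, with the summarizer injected so it can run without one; the interface names and the overshoot guard are illustrative, not our production code:

```typescript
// Illustrative sketch only: the real implementation calls an AI service.
interface DigestRequest {
  stepOutput: string;
  stepType: string;
  maxLength: number;
}

type Summarizer = (text: string, maxLength: number) => Promise<string>;

async function generateStepDigest(
  req: DigestRequest,
  summarize: Summarizer
): Promise<string> {
  // Short outputs don't need compression at all.
  if (req.stepOutput.length <= req.maxLength) return req.stepOutput;

  const digest = await summarize(req.stepOutput, req.maxLength);

  // Guard against a summarizer that overshoots the target size.
  return digest.length <= req.maxLength
    ? digest
    : digest.slice(0, req.maxLength - 3) + '...';
}
```

Injecting the summarizer also makes the compression path easy to unit-test with a stub.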
2. Smart Context Replacement
Instead of passing the full verbose output to the next step, we substitute it with the compressed digest, maintaining the workflow's logical flow while dramatically reducing token usage.
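As a sketch of what that substitution might look like in practice (the `buildNextPrompt` helper and the step shape here are hypothetical, not the pipeline's actual API):

```typescript
// Hypothetical sketch: assemble the next step's context from digests.
interface CompletedStep {
  name: string;
  output: string;
  digest?: string;
}

function buildNextPrompt(instructions: string, priorSteps: CompletedStep[]): string {
  const context = priorSteps
    // Prefer the compressed digest; fall back to the raw output.
    .map(s => `## ${s.name}\n${s.digest ?? s.output}`)
    .join('\n\n');
  return `${context}\n\n${instructions}`;
}
```

Falling back to the raw output means a missing digest degrades cost, not correctness.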
3. Backfill for Existing Workflows
We built a smart backfill system that can resume interrupted workflows and generate missing digests on-the-fly:
```typescript
// Check for missing digests when resuming workflows
const stepsNeedingDigests = completedSteps.filter(step =>
  !step.digest && step.status === 'COMPLETED'
);

for (const step of stepsNeedingDigests) {
  await generateAndStoreDigest(step);
}
```
Real-World Results
We tested the system on a live workflow processing the "nyxCore - Kimi K2 v2" Extension Builder pipeline. The results were impressive:
| Step | Original Size | Compressed Size | Reduction |
|---|---|---|---|
| Analyze Target Repo | 7,475 chars | 3,730 chars | 50% |
| Design Features | 9,564 chars | 4,150 chars | 57% |
| Extend & Improve | 26,966 chars | 3,606 chars | 87% |
| Implementation Prompts | 41,055 chars | 3,626 chars | 91% |
The most dramatic improvements came in the later steps, where verbose outputs were distilled down to their essential insights—exactly where we needed the most help.
Technical Implementation Insights
Database Schema Evolution
We extended our existing schema to support the digest system:
```sql
-- Added digest column to workflow steps
ALTER TABLE workflow_steps ADD COLUMN digest TEXT;

-- Enhanced project linking for better context
ALTER TABLE workflows ADD COLUMN "projectId" TEXT;
ALTER TABLE repositories ADD COLUMN "projectId" TEXT;
```
Error Handling and Resilience
One key lesson was building robust error handling around the digest generation process. AI services can be unpredictable, so we implemented comprehensive logging and fallback mechanisms:
```typescript
try {
  const digest = await aiService.generateDigest(content);
  return digest;
} catch (error) {
  console.error(`Digest generation failed for step ${stepId}:`, error);
  // Fall back to a truncated version rather than failing completely
  return content.substring(0, 4000) + '...';
}
```
Lessons Learned
Database Tooling Quirks
Working with different database tools taught us some practical lessons. While ORMs are great for most operations, sometimes you need to go direct:
```bash
# This didn't work as expected
npx prisma db execute --file query.sql

# This was more reliable for complex queries
PGPASSWORD=password psql -h localhost -U user -d database
```
Remember to quote camelCase column names like `"projectId"` when writing raw SQL: Postgres folds unquoted identifiers to lowercase.
Performance vs. Accuracy Trade-offs
We discovered that making the backfill process optional (controlled by environment variables) gives teams the flexibility to trade context accuracy against API costs, for example skipping backfill in development while enabling it in production.
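A minimal sketch of such a toggle, assuming a `DIGEST_BACKFILL_ENABLED` variable (the variable name and default-off behavior are illustrative):

```typescript
// Hypothetical sketch: gate digest backfill behind an environment variable.
function isBackfillEnabled(
  env: Record<string, string | undefined> = process.env
): boolean {
  // Default off to avoid unnecessary AI API spend in development.
  return env.DIGEST_BACKFILL_ENABLED === 'true';
}
```

Passing the environment in as a parameter keeps the toggle testable without mutating `process.env`.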
Looking Forward
With the core compression system working beautifully, we're excited about the next phase:
- Project Wisdom Integration: Linking workflows to project-specific knowledge bases for even smarter context management
- Cost Analysis: Comprehensive before/after token cost comparisons across multiple workflow types
- Adaptive Compression: Making compression ratios dynamic based on step importance and downstream requirements
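As a rough illustration of where adaptive compression could start, the digest budget could be weighted per step type; the weights, step names, and base size below are purely hypothetical:

```typescript
// Hypothetical sketch of adaptive compression budgets per step type.
const BASE_DIGEST_LENGTH = 4000;

const stepWeights: Record<string, number> = {
  'analyze-target-repo': 1.0,
  'design-features': 1.0,
  'extend-and-improve': 0.6,     // verbose steps get compressed harder
  'implementation-prompts': 0.6,
};

function digestBudget(stepType: string): number {
  const weight = stepWeights[stepType] ?? 1.0; // unknown steps keep the full budget
  return Math.round(BASE_DIGEST_LENGTH * weight);
}
```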
The Bottom Line
Building an intelligent prompt compression system isn't just about saving money on AI API calls (though that's nice too). It's about creating more focused, efficient AI workflows that can scale without drowning in their own verbosity.
The 87–91% reduction in our largest prompts means faster processing, lower costs, and most importantly, AI that can focus on what really matters. Sometimes the best way to say more is to say less.
Want to implement something similar in your AI workflows? The key is starting simple with a basic digest system and iterating based on your specific use cases. Feel free to reach out if you'd like to discuss implementation strategies!