nyxcore-systems
8 min read

Beyond the Context Window: Shrinking a 374KB Prompt in Our AI Workflow Engine

Our AI-powered 'Deep Build Pipeline' was generating a monstrous 374KB prompt, hitting context limits and draining resources. This post dives into how we tackled this beast with intelligent step digests and project-centric knowledge.

LLM · Prompt Engineering · Context Management · Workflow Engine · Prisma · TypeScript · AI Development · Claude Haiku · tRPC

The 374KB Elephant in the Room

In the world of AI-driven development, context is king. But too much context, especially when it's verbose or redundant, quickly becomes a monarch demanding an exorbitant tax in cost and latency while pushing you toward those dreaded context window limits. For our "Deep Build Pipeline," a multi-stage AI workflow engine designed to iterate on complex software features, this wasn't just a theoretical concern – it was a very real, very painful 374KB problem.

Our pipeline, which orchestrates everything from initial idea generation and research to feature implementation, review, and improvement, was accumulating an ever-growing prompt. Each completed step's full output was being fed into the next, leading to a massive cascade of information that was pushing our LLM (specifically, Claude Haiku) to its absolute limits. Not only was this inefficient, but it also risked diluting the model's focus, burying critical instructions under a mountain of previous conversation.

The goal was clear: compress the context without losing valuable information, and introduce a higher-level, project-centric understanding.

Phase 1: Digesting the Details with Step Summaries

Our first attack vector was to introduce intelligent "step digests." Instead of passing the entire verbose output of a completed workflow step, why not summarize it into a concise, actionable digest?

Schema Evolution: The digest Field

The journey began with a simple but crucial database schema modification. We added a digest field to our WorkflowStep model:

prisma
// prisma/schema.prisma
model WorkflowStep {
  id         String   @id @default(uuid())
  workflow   Workflow @relation(fields: [workflowId], references: [id], onDelete: Cascade)
  workflowId String
  label      String
  // ... other fields
  output     String?  @db.Text
  digest     String?  @db.Text // ✨ New: The summarized output
  createdAt  DateTime @default(now())
  updatedAt  DateTime @updatedAt
}

This digest field would store the LLM-generated summary, ready to be retrieved and injected into subsequent prompts.

The generateStepDigest() Service

Next, we built a dedicated service, src/server/services/step-digest.ts, responsible for creating these summaries. We leveraged claude-haiku-4-5-20251001 for its balance of capability and cost-effectiveness.

typescript
// src/server/services/step-digest.ts
import { resolveProvider } from './llm-provider';

export async function generateStepDigest(
  tenantId: string,
  stepOutput: string,
  maxTokens = 1024
): Promise<string | null> {
  if (!stepOutput || stepOutput.length < 2000) { // Skip very short outputs
    return null;
  }

  const provider = resolveProvider("anthropic", tenantId);
  try {
    const response = await provider.chat.completions.create({
      model: "claude-haiku-4-5-20251001",
      max_tokens: maxTokens,
      messages: [
        { role: "system", content: "You are a helpful assistant tasked with summarizing technical workflow step outputs concisely. Focus on key decisions, outcomes, and actionable information. Omit conversational filler." },
        { role: "user", content: `Please provide a concise digest of the following workflow step output:\n\n${stepOutput}` },
      ],
    });
    return response.choices[0]?.message?.content || null;
  } catch (error) {
    console.error("Failed to generate step digest, falling back to truncation:", error);
    // Fallback: simple truncation if LLM fails
    return stepOutput.substring(0, maxTokens * 4); // ~4 chars per token
  }
}

This service ensures that only sufficiently large outputs are summarized, and it includes a robust fallback to simple truncation in case the LLM call fails, preventing workflow stalls.

Integrating into the Workflow Engine

The WorkflowEngine was the heart of the integration. We updated its ChainContext to hold a map of stepDigests and introduced a new projectWisdom variable (more on this later).

Key changes in src/server/services/workflow-engine.ts:

  • Context Population: When a workflow resumes, buildChainContext() now populates stepDigests from the digest fields of previously completed steps.
  • Post-Completion Digest Generation: After a step successfully completes, generateStepDigest() is called; the result is stored in the database and mirrored into the ChainContext (see the sketch after this list).
  • Prompt Templating: Our resolvePrompt() function gained the ability to resolve {{steps.Label.digest}}. This means our prompt templates could now explicitly request the summarized version of a previous step's output, dramatically cutting down verbosity. A fallback to the full (truncated) content is also available if no digest exists.
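
Here's roughly what the post-completion hook (the second bullet above) looks like. This is a simplified sketch: the prisma import path, the ChainContext shape, and the onStepComplete helper name are illustrative, not the engine's exact API.

typescript
// src/server/services/workflow-engine.ts (simplified sketch)
import { prisma } from '../db';
import { generateStepDigest } from './step-digest';

// Minimal shape assumed for this sketch; the real ChainContext carries more
interface ChainContext {
  tenantId: string;
  stepDigests: Map<string, string>; // step label -> digest
}

async function onStepComplete(
  ctx: ChainContext,
  stepId: string,
  label: string,
  output: string
): Promise<void> {
  // Summarize the raw output (returns null when the output is too short)
  const digest = await generateStepDigest(ctx.tenantId, output);

  // Persist the full output alongside its digest
  await prisma.workflowStep.update({
    where: { id: stepId },
    data: { output, digest },
  });

  // Keep the in-memory context in sync so later steps in this run
  // can resolve {{steps.Label.digest}} without re-reading the database
  if (digest) ctx.stepDigests.set(label, digest);
}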

On the templating side, for example, a prompt might now look like this:

jinja2
You are tasked with extending and improving a feature based on previous work.

Previous Features Added:
{{steps.deepFeatures.digest}}

Research Conducted:
{{steps.deepResearch.digest}}

Your Goal: ...

This simple change allowed us to target the most verbose steps in our "Deep Build Pipeline" (deepFeatures, deepReview1, deepExtend, deepWisdom, deepImprove) and switch their context injections to use .digest, promising a significant reduction in prompt size.
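
Under the hood, the digest-aware resolution boils down to a lookup with a fallback. A minimal sketch, assuming ChainContext exposes stepDigests and stepOutputs maps keyed by step label (the real resolvePrompt() handles more variables and escaping):

typescript
// Illustrative sketch of {{steps.Label.digest}} resolution with fallback
interface ChainContext {
  stepDigests: Map<string, string>;
  stepOutputs: Map<string, string>;
}

const STEP_VAR = /\{\{steps\.(\w+)\.(digest|output)\}\}/g;

function resolveStepVars(template: string, ctx: ChainContext): string {
  return template.replace(STEP_VAR, (_match, label: string, field: string) => {
    if (field === 'digest') {
      // Prefer the stored digest; fall back to truncated full output
      const digest = ctx.stepDigests.get(label);
      if (digest) return digest;
      return (ctx.stepOutputs.get(label) ?? '').substring(0, 4096); // ~1k tokens
    }
    return ctx.stepOutputs.get(label) ?? '';
  });
}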

Phase 2: Project-Centric Intelligence

While step digests addressed the immediate problem of sequential context bloat, we identified another opportunity for smarter context management: project-level knowledge. Workflows often belong to a larger project, and that project might have its own "wisdom" – consolidated patterns, code analysis, or architectural guidelines that should inform all related AI tasks.

Extending the Schema for Project Relations

We expanded our prisma/schema.prisma to link Workflow and Repository models directly to a Project:

prisma
// prisma/schema.prisma
model Project {
  id           String       @id @default(uuid()) @db.Uuid // Native uuid, matching the projectId columns below
  name         String
  // ... other project fields
  workflows    Workflow[]   // Reverse relation
  repositories Repository[] // Reverse relation
}

model Workflow {
  id        String    @id @default(uuid())
  projectId String?   @db.Uuid // ✨ New: Link to a Project
  project   Project?  @relation(fields: [projectId], references: [id])
  // ... other workflow fields
}

model Repository {
  id        String    @id @default(uuid())
  projectId String?   @db.Uuid // ✨ New: Link to a Project
  project   Project?  @relation(fields: [projectId], references: [id])
  // ... other repository fields
}

This allowed us to establish clear ownership and relationships between our core entities and a Project.

API and UI Integration

Our tRPC routers (src/server/trpc/routers/workflows.ts) were updated to support projectId in create, update, and duplicate mutations, ensuring that workflows could be correctly associated. We also added a byProject query to easily filter workflows.
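
In code, the router changes are small. An abridged sketch, assuming standard tRPC v10 conventions; the protectedProcedure helper, import paths, and input schemas are illustrative:

typescript
// src/server/trpc/routers/workflows.ts (abridged sketch)
import { z } from 'zod';
import { router, protectedProcedure } from '../trpc';
import { prisma } from '../../db';

export const workflowsRouter = router({
  // New: fetch all workflows belonging to a project
  byProject: protectedProcedure
    .input(z.object({ projectId: z.string().uuid() }))
    .query(({ input }) =>
      prisma.workflow.findMany({ where: { projectId: input.projectId } })
    ),

  // create (and likewise update/duplicate) now accepts an optional projectId
  create: protectedProcedure
    .input(z.object({ name: z.string(), projectId: z.string().uuid().optional() }))
    .mutation(({ input }) => prisma.workflow.create({ data: input })),
});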

On the UI front, the dashboard (src/app/(dashboard)/dashboard/workflows/[id]/page.tsx) now features a project selector dropdown. This allows users to link a workflow to an existing project, immediately unlocking the {{project.wisdom}} variable. A helpful hint explains its auto-injection, guiding users to leverage this new capability.
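
A stripped-down version of that selector might look like the following. The trpc client import and the projects.list / workflows.update procedure names are assumptions for illustration:

tsx
'use client';
// src/app/(dashboard)/dashboard/workflows/[id]/page.tsx (abridged sketch)
import { trpc } from '@/lib/trpc';

export function ProjectSelector(props: { workflowId: string; projectId: string | null }) {
  const { data: projects } = trpc.projects.list.useQuery();
  const update = trpc.workflows.update.useMutation();

  return (
    <select
      value={props.projectId ?? ''}
      onChange={(e) =>
        // An empty value clears the project association
        update.mutate({ id: props.workflowId, projectId: e.target.value || null })
      }
    >
      <option value="">No project</option>
      {projects?.map((p) => (
        <option key={p.id} value={p.id}>
          {p.name}
        </option>
      ))}
    </select>
  );
}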

loadProjectWisdom(): The Brains Behind the Context

Our workflow engine (src/server/services/workflow-engine.ts) gained a new function, loadProjectWisdom(), responsible for gathering all relevant project-level context. This could include:

  • Consolidation patterns: High-level architectural decisions or common solutions defined for the project.
  • Code analysis patterns: Insights derived from linked repositories, such as preferred coding styles, common anti-patterns, or specific library usage.

This aggregated "wisdom" is then made available via the {{project.wisdom}} template variable, providing a consistent, high-level context to any AI-driven task within that project.
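
A sketch of how that aggregation might work. Where the patterns actually live depends on the schema; the consolidationPatterns and analysisPatterns fields below are hypothetical placeholders:

typescript
// src/server/services/workflow-engine.ts (illustrative sketch)
import { prisma } from '../db';

async function loadProjectWisdom(projectId: string | null): Promise<string> {
  if (!projectId) return '';

  const project = await prisma.project.findUnique({
    where: { id: projectId },
    include: { repositories: true },
  });
  if (!project) return '';

  const sections: string[] = [];

  // Project-level consolidation patterns (hypothetical field)
  if (project.consolidationPatterns) {
    sections.push(`## Consolidation Patterns\n${project.consolidationPatterns}`);
  }

  // Per-repository code analysis patterns (hypothetical fields)
  for (const repo of project.repositories) {
    if (repo.analysisPatterns) {
      sections.push(`## Code Analysis: ${repo.name}\n${repo.analysisPatterns}`);
    }
  }

  // The joined text is what {{project.wisdom}} resolves to
  return sections.join('\n\n');
}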

Infrastructure & The Unsung Heroes

Behind the scenes, a quick npm run db:push && npm run db:generate ensured our database schema was synced and the Prisma client regenerated, making all these new fields and relations immediately available to our application layer.

Lessons Learned (and Smooth Sailing)

One of the most satisfying aspects of this development session was the relative lack of major issues. The pain log was surprisingly sparse, noting only a pre-existing, unrelated type error. This speaks volumes about:

  • Clear Planning: Breaking down the problem into digestible phases (step digests, project context) allowed for focused execution.
  • Modular Design: Our existing WorkflowEngine and resolvePrompt architecture proved flexible enough to integrate these significant changes with minimal friction.
  • Robust Tooling: Prisma's schema migration capabilities made database changes straightforward and reliable.

It's a testament to the power of a well-structured codebase when complex features can be added with confidence and without introducing new regressions.

The Road Ahead

With the core changes committed (c7716f3), the immediate next steps are all about validation and further enhancement:

  1. Push to origin: Get these changes out there.
  2. Verify Digest Generation: Run a deep pipeline workflow and inspect the digest field in Prisma Studio to ensure summaries are being correctly generated and stored.
  3. Measure Prompt Reduction: Crucially, we need to compare the resolved prompt size for deep pipeline step 9 (one of the most verbose) before and after these changes. Our target is a >60% reduction – a significant win for efficiency and cost.
  4. Test {{project.wisdom}}: Link a project with consolidation data to a workflow and verify that the project.wisdom variable is correctly injected and influences the AI's output.
  5. RLS Policies: Consider adding Row Level Security (RLS) policies for the new projectId columns if cross-tenant data access becomes a concern; for now, the nullable columns follow our existing tenancy patterns.
  6. Minor Fix: Tackle that pre-existing type error in discussions/[id]/page.tsx:139 – a small but necessary cleanup.

This session marks a significant step forward in making our AI-powered workflows smarter, more efficient, and ultimately, more capable of handling the complexities of modern software development. By taming the prompt cascade, we're ensuring our AI agents can focus on what truly matters: building great software.

json
{
  "thingsDone": [
    "Added `digest` field to `WorkflowStep` model",
    "Created `generateStepDigest()` service using Claude Haiku",
    "Integrated step digest generation and retrieval into `WorkflowEngine`",
    "Implemented `{{steps.Label.digest}}` prompt template resolution",
    "Updated deep pipeline steps to use digest context",
    "Added `projectId` relations to `Workflow` and `Repository` models",
    "Updated tRPC `workflows` router for project association",
    "Implemented `loadProjectWisdom()` for project-level context",
    "Added `{{project.wisdom}}` prompt template variable",
    "Integrated project selector into dashboard UI",
    "Ran Prisma schema sync and client regeneration"
  ],
  "pains": [
    "No major issues encountered related to this work; only a pre-existing, unrelated type error"
  ],
  "successes": [
    "Successful implementation of context compression strategy",
    "Seamless integration into existing workflow engine architecture",
    "Enhanced project-centric intelligence for AI workflows",
    "Anticipated significant reduction in LLM prompt size and cost",
    "Smooth schema evolution with Prisma"
  ],
  "techStack": [
    "TypeScript",
    "Prisma",
    "tRPC",
    "Next.js",
    "Claude Haiku (Anthropic LLM)",
    "PostgreSQL",
    "Git"
  ]
}