Beyond the Context Window: Scaling LLM Workflows with Fan-Out Execution
Tired of LLM outputs getting truncated or losing quality when tackling complex, multi-feature tasks? We dive into how we implemented 'fan-out execution' to give each sub-task its own dedicated LLM call, ensuring higher quality and reliability, and share a tricky Prisma gotcha along the way.
We've all been there. You're building an incredible AI-powered feature, dreaming of the LLM handling complex, multi-faceted tasks in one glorious swoop. You craft a meticulously detailed prompt, hit send, and... the output is good, but somehow, it feels incomplete. Key details are missing, sections are truncated, or the quality just isn't what you hoped for. The culprit? The dreaded LLM context window.
Our goal was ambitious: allow our AI workflow engine to generate comprehensive implementation plans for multiple MVP features. Initially, we crammed all feature prompts into a single 16k-token LLM call. The result was predictable: valuable context was lost, and the generated plans were often superficial. We needed a better way to scale our LLM interactions, ensuring each feature received the dedicated attention it deserved.
Enter Fan-Out Step Execution.
The Problem: The Context Window Chokehold
Imagine you have a document with ten distinct sections, and you want an LLM to generate a detailed summary or action plan for each one. If you feed the entire document and ask for a combined output, the LLM often prioritizes the beginning, summarizes too broadly, or simply runs out of tokens before it can do justice to every section. It's like asking a chef to cook a ten-course meal with only one pot and limited ingredients – something's going to suffer.
Our "Deep Build Pipeline" workflow suffered from this exact issue. When generating "Implementation Prompts" for multiple features, a single LLM call couldn't handle the depth required for each. We needed to break down the monolithic task into smaller, manageable chunks, giving each its own dedicated LLM call.
Our Solution: Orchestrating Parallel LLM Excellence
The core idea behind fan-out execution is simple: split a large task into smaller sub-tasks, process each sub-task with its own LLM call, and then intelligently combine the results. This approach dramatically increases output quality, reduces truncation, and makes the entire process more robust.
Here's how we brought fan-out to life across our stack:
1. The Section Splitter: Dividing and Conquering
First, we needed a reliable way to break down a large input into individual sections. We created src/server/services/section-splitter.ts with a splitSections() utility. This function uses a configurable regex pattern (e.g., ###\s+\d+\. to match numbered headings like "### 1. Some Heading") to identify and extract distinct sections.
// src/server/services/section-splitter.ts (simplified)
export function splitSections(content: string, pattern: RegExp): string[] {
  const sections: string[] = [];
  try {
    // matchAll requires the global flag; re-create the regex with it if missing.
    const globalPattern = pattern.global
      ? pattern
      : new RegExp(pattern.source, pattern.flags + "g");
    const matches = [...content.matchAll(globalPattern)];
    // Preserve any content that appears before the first match.
    if (matches.length > 0 && matches[0].index! > 0) {
      sections.push(content.substring(0, matches[0].index).trim());
    }
    for (let i = 0; i < matches.length; i++) {
      const start = matches[i].index!;
      const end = i + 1 < matches.length ? matches[i + 1].index! : content.length;
      sections.push(content.substring(start, end).trim());
      if (sections.length > 200) { // Safety cap against runaway splits
        console.warn("Too many sections, capping at 200.");
        break;
      }
    }
  } catch (error) {
    console.error("Error splitting sections:", error);
    return [content]; // Fall back to treating the whole input as one section
  }
  return sections.filter((s) => s.length > 0);
}
This utility is robust, including a regex safety net (try/catch) and a match cap to prevent runaway processing on malformed input.
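To make the slicing concrete, here's a minimal, self-contained sketch of the same logic (the sample document and pattern here are illustrative, not production code):

```typescript
// Hypothetical sample document with numbered headings.
const doc = [
  "Preamble before the first heading.",
  "### 1. Setup",
  "Install dependencies.",
  "### 2. Build",
  "Run the build.",
].join("\n");

// The heading pattern from the post; matchAll requires the global (g) flag.
const pattern = /###\s+\d+\./g;

const matches = [...doc.matchAll(pattern)];
const starts = matches.map((m) => m.index!);

// Slice the document at each heading, keeping the preamble as its own chunk.
const sections: string[] = [];
if (starts.length > 0 && starts[0] > 0) {
  sections.push(doc.substring(0, starts[0]).trim());
}
for (let i = 0; i < starts.length; i++) {
  const end = i + 1 < starts.length ? starts[i + 1] : doc.length;
  sections.push(doc.substring(starts[i], end).trim());
}

console.log(sections.length); // 3: preamble + two numbered sections
```

Each numbered heading starts a new chunk, and any preamble before the first heading survives as its own section rather than being silently dropped.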
2. The Workflow Engine: The Orchestrator's Upgrade
The src/server/services/workflow-engine.ts is the heart of our system, and it received a significant overhaul to handle fan-out logic:
- New Types & State: We introduced FanOutConfig and SubOutput types to manage the configuration for splitting and to store the results of each sub-LLM call.
- Eventing for Real-time Feedback: New fan_out_progress and fan_out_done event types on WorkflowEvent allow our frontend to display real-time progress.
- Dynamic Prompting: The ChainContext now includes fanOutSection and fanOutHeading fields. Our resolvePrompt() function can now use new template variables like {{fanOut.section}} and {{fanOut.heading}} to inject specific section content into each sub-LLM call. We also added {{steps.Label.sections}} and {{steps.Label.section[N].content}} to allow prompts to reference all sections, or a specific one, from a previous fan-out step, enabling sophisticated multi-stage processing.
- The Fan-Out Execution Branch: The runWorkflow() function gained a dedicated branch for fan-out steps. This orchestrates:
  - Iterating through each section.
  - Making an individual LLM call for each section.
  - Implementing retry logic for failed sub-calls.
  - Enabling resume functionality: if a workflow is interrupted mid-fan-out, it picks up exactly where it left off, even verifying heading consistency to prevent stale data issues.
  - Combining all individual LLM outputs into a single, comprehensive digest for the overall step.
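In rough strokes, the fan-out branch boils down to the sketch below. The names and the LLM client here are illustrative stand-ins, not our actual engine code:

```typescript
// Hypothetical shapes; the real types live in workflow-engine.ts.
interface SubOutput {
  heading: string;
  content: string;
}

// Stand-in for the real LLM client.
type LlmCall = (prompt: string) => Promise<string>;

// One LLM call per section, skipping already-completed sections on resume,
// retrying failed sub-calls, then combining everything into one digest.
async function runFanOut(
  sections: string[],
  callLlm: LlmCall,
  completed: SubOutput[] = [],
  maxRetries = 2,
): Promise<{ subOutputs: SubOutput[]; digest: string }> {
  const subOutputs: SubOutput[] = [...completed];

  for (let i = subOutputs.length; i < sections.length; i++) {
    const heading = sections[i].split("\n")[0];
    let lastError: unknown;
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        const content = await callLlm(sections[i]);
        subOutputs.push({ heading, content });
        lastError = undefined;
        break;
      } catch (error) {
        lastError = error; // Retry the failed sub-call.
      }
    }
    if (lastError !== undefined) throw lastError;
  }

  // Combine individual outputs into a single digest for the overall step.
  const digest = subOutputs
    .map((s) => `${s.heading}\n${s.content}`)
    .join("\n\n");
  return { subOutputs, digest };
}
```

Resume falls out naturally from starting the loop at completed.length; the real engine additionally checks that each stored heading still matches the current split before trusting it.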
3. Persistent State: Database Schema Evolution
To store the configuration and results of fan-out steps, we added two new nullable JSON columns to our WorkflowStep model in prisma/schema.prisma:
// prisma/schema.prisma
model WorkflowStep {
  // ... existing fields
  fanOutConfig Json?
  subOutputs   Json?
  // ... other fields
}
fanOutConfig stores the regex pattern and other fan-out specific settings, while subOutputs holds the array of results from each individual LLM call.
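For illustration, the shapes of those two columns look roughly like this (the field names are assumptions, simplified from the real types):

```typescript
// Illustrative shapes for the two Json? columns (field names are assumptions).
interface FanOutConfig {
  pattern: string;   // Regex source, stored as a string so it serializes to JSON
  maxTokens: number; // Token budget for each individual sub-call
}

interface SubOutput {
  heading: string;   // First heading line of the section, used for resume checks
  content: string;   // The sub-LLM call's output for this section
}

// Json columns round-trip through plain JSON, so the regex lives as a string
// and is re-hydrated with new RegExp(...) at execution time.
const config: FanOutConfig = { pattern: "###\\s+\\d+\\.", maxTokens: 8192 };
const stored = JSON.stringify(config);
const restored: FanOutConfig = JSON.parse(stored);
const regex = new RegExp(restored.pattern, "g");

console.log(regex.test("### 3. Deploy")); // true
```

Storing the pattern as a string (rather than a RegExp) is what lets it survive the JSON round-trip through the database.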
4. Configuring the Fan-Out: Tailoring Prompts
We updated src/lib/constants.ts to define which StepTemplates should utilize fan-out, along with their specific regex patterns and maxTokens for the individual LLM calls. For instance:
// src/lib/constants.ts (simplified)
export const deepPrompt: StepTemplate = {
  // ...
  fanOutConfig: {
    pattern: "###\\s+\\d+\\.", // Example: matches "### 1. Heading"
    maxTokens: 8192,
  },
  // ...
};
export const extensionPrompt: StepTemplate = { /* ... similar config ... */ };
export const secPrompts: StepTemplate = { /* ... similar config ... */ };
These configurations are propagated through our src/server/trpc/routers/workflows.ts router, ensuring fanOutConfig is handled correctly during create, duplicate, and steps.add mutations. The retry mutation also now intelligently resets subOutputs, digest, and checkpoint for fan-out steps, ensuring a clean restart.
5. A Glimpse into the Future: The Dashboard Experience
A powerful backend needs an equally powerful frontend. Our dashboard received significant updates in src/app/(dashboard)/dashboard/workflows/[id]/page.tsx:
- Real-time Progress: An SSE (Server-Sent Events) handler captures fan_out_progress and fan_out_done events, updating activeFanOutTab and fanOutProgress state. This powers a progress bar that shows the current section being processed (e.g., "Processing section 5 of 10").
- Tabbed Sub-Output Viewer: Once a fan-out step completes, users can explore the individual LLM outputs in a sleek, horizontally scrollable tab interface. Each tab represents one section's output, complete with its own download/copy options and token/cost metadata.
- Visual Cues: A "fan-out (N)" badge on step headers instantly tells users which steps are leveraging this powerful new capability.
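As a sketch of the event handling, a tiny reducer in the spirit of the dashboard's SSE handler might look like this (the payload shapes below are assumptions, not our exact WorkflowEvent types):

```typescript
// Assumed event payload shapes; the real ones live on WorkflowEvent.
type FanOutEvent =
  | { type: "fan_out_progress"; stepId: string; current: number; total: number }
  | { type: "fan_out_done"; stepId: string; sections: number };

interface FanOutUiState {
  activeFanOutTab: string | null;
  fanOutProgress: { current: number; total: number } | null;
}

// Progress events drive the progress bar; the done event clears it.
function reduceFanOut(state: FanOutUiState, event: FanOutEvent): FanOutUiState {
  switch (event.type) {
    case "fan_out_progress":
      return {
        activeFanOutTab: event.stepId,
        fanOutProgress: { current: event.current, total: event.total },
      };
    case "fan_out_done":
      return { activeFanOutTab: event.stepId, fanOutProgress: null };
  }
}

let state: FanOutUiState = { activeFanOutTab: null, fanOutProgress: null };
state = reduceFanOut(state, {
  type: "fan_out_progress",
  stepId: "s1",
  current: 5,
  total: 10,
});
console.log(state.fanOutProgress); // { current: 5, total: 10 }
```

Keeping the handler a pure reducer makes it trivial to unit-test independently of the SSE plumbing.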
This combination of backend orchestration and frontend visibility transforms a complex, multi-LLM interaction into an intuitive user experience.
Lessons Learned: The Nuances of Prisma's Json? Fields
No major feature goes live without a few bumps in the road. Our biggest "gotcha" during this implementation revolved around Prisma's Json? fields.
The Challenge:
When trying to set fanOutConfig to null directly in a Prisma create data object, we hit a type error:
Type 'null' is not assignable to type 'NullableJsonNullValueInput | InputJsonValue | undefined'
This is a common pitfall. Prisma's Json? (nullable JSON) fields don't accept raw JavaScript null in the same way regular nullable fields do. They expect one of:
- undefined (to omit the field entirely, letting the database default to NULL)
- Prisma.DbNull (to explicitly set the column to a database NULL)
- Prisma.JsonNull (to store a JSON null value in the column)
- A valid JSON value
The Workaround: Our solution was to use a conditional spread operator:
// Instead of:
// const data = {
//   fanOutConfig: step.fanOutConfig || null, // This causes the error
//   // ...
// };

// We used:
const data = {
  // ... other fields
  ...(step.fanOutConfig ? { fanOutConfig: step.fanOutConfig } : {}),
  // ...
};
This pattern ensures that fanOutConfig is only included in the data object if step.fanOutConfig is truthy (i.e., not null or undefined). If it's null or undefined, the field is omitted, and Prisma correctly lets the database default (NULL) apply.
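The pattern itself is plain JavaScript semantics, nothing Prisma-specific: a conditional spread simply omits the key. A quick standalone illustration:

```typescript
// Demonstrates the conditional spread: the key only exists when the value
// is truthy, which is exactly what Prisma's Json? input type wants.
function buildData(fanOutConfig: object | null | undefined) {
  return {
    label: "step",
    ...(fanOutConfig ? { fanOutConfig } : {}),
  };
}

console.log("fanOutConfig" in buildData(null));                // false — field omitted
console.log("fanOutConfig" in buildData({ maxTokens: 8192 })); // true  — field included
```

Note the difference from `fanOutConfig: undefined`: the spread leaves the key out entirely, while an explicit undefined still adds the key to the object (though Prisma treats undefined values as omitted too).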
This is a recurring Prisma gotcha we've even documented internally. Always remember to use Prisma.DbNull (or Prisma.JsonNull for a JSON null value) when you need an explicit null, or a conditional spread for optional Json? fields!
What's Next?
With the core fan-out functionality implemented and deployed, our immediate next steps include:
- End-to-End Testing: Rigorous testing of a full "Deep Build Pipeline" workflow to verify all 9 steps, especially ensuring the Implementation Prompts step correctly generates individual prompts per feature.
- Resume Verification: Confirming that killing a workflow mid-fan-out and restarting it correctly resumes from the last completed section.
- User Configurability: Considering adding fanOutConfig to the steps.update mutation so users can edit fan-out configurations on existing steps directly from the UI.
- Robust Splitter Tests: Adding comprehensive unit tests for splitSections() to cover edge cases like empty input, no matches, content before the first heading, and overlapping headings.
- Cost Estimation: Enhancing estimateWorkflowCost() to account for the fan-out multiplier, providing more accurate cost projections for steps that involve multiple LLM calls.
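As a back-of-the-envelope sketch of that multiplier (hypothetical names and numbers, not our actual estimateWorkflowCost()):

```typescript
// Hypothetical sketch of the fan-out multiplier: a fan-out step costs
// roughly (sections × per-call estimate) instead of one call's estimate.
interface StepEstimate {
  perCallUsd: number;      // Estimated cost of a single LLM call
  fanOutSections?: number; // Undefined for ordinary single-call steps
}

function estimateStepCost(step: StepEstimate): number {
  const multiplier = step.fanOutSections ?? 1;
  return step.perCallUsd * multiplier;
}

const steps: StepEstimate[] = [
  { perCallUsd: 0.02 },                     // ordinary step
  { perCallUsd: 0.05, fanOutSections: 10 }, // fan-out step: 10 sub-calls
];
const total = steps.reduce((sum, step) => sum + estimateStepCost(step), 0);

console.log(total); // ≈ 0.52
```

The catch is that the section count isn't known until the upstream step has run, so a pre-run estimate has to assume a typical count or a range.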
Conclusion
Implementing fan-out step execution has been a game-changer for our AI workflow engine. By strategically breaking down complex tasks into dedicated LLM calls, we've dramatically improved the quality, reliability, and comprehensiveness of our AI-generated outputs. This not only solves the immediate problem of context window limitations but also unlocks the potential for our platform to tackle even more sophisticated, multi-faceted AI-driven challenges in the future.
We're excited to see the higher-quality results this enables for our users and continue pushing the boundaries of what our AI workflows can achieve!