Building Fan-Out Execution: When One LLM Call Isn't Enough
How we solved the token limit problem by implementing fan-out execution, allowing complex workflows to split large tasks into focused, parallel LLM calls.
Have you ever hit that frustrating moment where your carefully crafted LLM workflow tries to cram too much into a single API call? You know the feeling—your 16k token limit gets maxed out, your prompt gets truncated, and suddenly your AI assistant is trying to implement 15 different features in one go, producing generic, unhelpful output.
That's exactly the problem we faced with our workflow automation system. Our "Implementation Prompts" step was attempting to generate detailed implementation guidance for every MVP feature in a single call, resulting in shallow, truncated responses that weren't actionable for developers.
The solution? Fan-out execution—a pattern that automatically splits large tasks into focused, individual LLM calls, then intelligently combines the results.
The Problem: Token Limits vs. Quality Output
Picture this scenario: you're building a workflow that takes a product specification and generates detailed implementation prompts for each feature. Your input might include:
- User authentication system
- Real-time chat functionality
- File upload and storage
- Payment processing integration
- Admin dashboard
- Email notifications
- Mobile responsive design
Trying to generate quality implementation guidance for all seven features in a single 16k token call is like asking someone to write detailed instructions for building a house, a car, and a rocket ship all in the same breath. You'll get surface-level advice that doesn't help anyone actually build anything.
The Fan-Out Solution
Fan-out execution works by:
- Splitting: Automatically dividing large content into logical sections
- Processing: Making individual LLM calls for each section with focused prompts
- Combining: Merging the results back into a cohesive output
- Resuming: Handling interruptions gracefully by continuing from the last completed section
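The core of the pattern fits in a single loop. Here's a minimal sketch, not our production code: splitSections and callLLM are stand-ins for the real services, and resume handling is left out for brevity.

```typescript
// Minimal fan-out loop: split the input, run one focused call per
// section, then combine the results into a single output.
async function runFanOut(
  content: string,
  split: (c: string) => string[],
  callLLM: (section: string) => Promise<string>
): Promise<string> {
  const sections = split(content);
  const results: string[] = [];
  for (const section of sections) {
    results.push(await callLLM(section)); // one focused call per section
  }
  return results.join('\n\n'); // combine into one cohesive output
}

// Demo with a trivial splitter and a fake model call
runFanOut('a|b|c', (c) => c.split('|'), async (s) => s.toUpperCase()).then(
  (combined) => console.log(combined) // prints "A", "B", "C" separated by blank lines
);
```

The point of the sketch is the shape: each `callLLM` invocation sees only one section, so the whole prompt budget goes to that section.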
Here's how we implemented it:
Database Schema Changes
First, we extended our workflow step model to support fan-out configuration:
// Added to the WorkflowStep model in schema.prisma
fanOutConfig Json? // Configuration for how to split content
subOutputs   Json? // Individual results from each section
The fanOutConfig defines the splitting pattern, while subOutputs stores the individual results from each fan-out call.
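In TypeScript terms, the shapes we store in those two Json columns look roughly like this. The splitPattern and maxTokens fields appear in the configs later in this post; the subOutput fields beyond content (heading, tokensUsed) are illustrative names for the data the heading checks and per-section cost display rely on.

```typescript
// Approximate shape of the fanOutConfig Json column
interface FanOutConfig {
  splitPattern: string; // regex used to find section boundaries
  maxTokens: number;    // per-section output budget
}

// Approximate shape of the subOutputs Json column: one entry per completed section
interface SubOutput {
  index: number;      // position of the section in split order
  heading: string;    // heading extracted from the section, for continuity checks
  content: string;    // the LLM's output for this section
  tokensUsed: number; // for per-section cost reporting
}

const example: FanOutConfig = {
  splitPattern: "###\\s+\\d+\\.",
  maxTokens: 8192,
};
```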
Smart Content Splitting
We built a robust section splitter that uses regex patterns to identify logical breakpoints:
// src/server/services/section-splitter.ts
export function splitSections(content: string, pattern: string): string[] {
  try {
    const regex = new RegExp(pattern, 'gim');
    const matches = Array.from(content.matchAll(regex)).slice(0, 200); // Safety cap
    if (matches.length === 0) return [content];

    // Split content at each match boundary
    const sections: string[] = [];
    let lastIndex = 0;
    for (const match of matches) {
      // Check against undefined, not truthiness: a match at index 0 is valid
      if (match.index !== undefined && match.index > lastIndex) {
        sections.push(content.slice(lastIndex, match.index).trim());
      }
      lastIndex = match.index ?? lastIndex;
    }
    // Don't forget the final section
    if (lastIndex < content.length) {
      sections.push(content.slice(lastIndex).trim());
    }
    return sections.filter((section) => section.length > 0);
  } catch (error) {
    console.warn('Regex splitting failed, falling back to single section:', error);
    return [content];
  }
}
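To see what the splitter actually produces, here's a standalone sanity check. The splitter is inlined in condensed form (without the try/catch) so the snippet runs on its own, and the sample content is made up.

```typescript
// Condensed copy of splitSections for a self-contained demo
function splitSections(content: string, pattern: string): string[] {
  const matches = Array.from(content.matchAll(new RegExp(pattern, 'gim'))).slice(0, 200);
  if (matches.length === 0) return [content];
  const sections: string[] = [];
  let lastIndex = 0;
  for (const match of matches) {
    if (match.index !== undefined && match.index > lastIndex) {
      sections.push(content.slice(lastIndex, match.index).trim());
    }
    lastIndex = match.index ?? lastIndex;
  }
  if (lastIndex < content.length) sections.push(content.slice(lastIndex).trim());
  return sections.filter((s) => s.length > 0);
}

const doc = [
  '### 1. User authentication',
  'JWT with refresh tokens.',
  '### 2. Real-time chat',
  'WebSocket transport.',
].join('\n');

const sections = splitSections(doc, '###\\s+\\d+\\.');
console.log(sections.length); // 2
console.log(sections[0].startsWith('### 1.')); // true
```

Note that each section keeps its own heading, which is what lets the downstream fan-out call see the feature name it's working on.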
Template Variables for Context
We introduced new template variables that give each fan-out call the context it needs:
- {{fanOut.section}} - The current section content
- {{fanOut.heading}} - The section heading for context
- {{steps.Label.sections}} - Total number of sections
- {{steps.Label.section[N].content}} - Access to a specific section's results
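The substitution itself is ordinary string templating. Here's a minimal sketch of how a per-section prompt could be rendered; the renderPrompt helper and the prompt text are illustrative, not our actual template engine.

```typescript
// Replace {{dotted.key}} placeholders in a prompt template with values
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(.*?)\}\}/g, (_, key: string) => vars[key.trim()] ?? '');
}

const template =
  'You are implementing one feature.\n' +
  'Feature: {{fanOut.heading}}\n' +
  'Details:\n{{fanOut.section}}';

const prompt = renderPrompt(template, {
  'fanOut.heading': '### 1. User authentication',
  'fanOut.section': '### 1. User authentication\nJWT with refresh tokens.',
});
console.log(prompt.includes('Feature: ### 1. User authentication')); // true
```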
Real-World Configuration Examples
Here are the fan-out configurations we implemented for different workflow steps:
// For implementation prompts - split on numbered features
deepPrompt: {
  fanOutConfig: {
    splitPattern: "###\\s+\\d+\\.", // Matches "### 1.", "### 2.", etc.
    maxTokens: 8192
  }
}

// For security remediation - split on priority levels
secPrompts: {
  fanOutConfig: {
    splitPattern: "##\\s+(Critical|High|Medium|Low)",
    maxTokens: 8192
  }
}
Progress Tracking and Resume Capability
One of the most important aspects of fan-out execution is handling interruptions gracefully. Our implementation includes:
- Real-time progress tracking via Server-Sent Events
- Automatic resume from the last completed section
- Heading consistency checks to ensure context continuity
- Retry logic for individual sections without restarting the entire process
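Resume falls out naturally from storing per-section results: on restart, we only process the indices that have no stored output yet. A simplified sketch, with illustrative shapes:

```typescript
interface SubOutput { index: number; content: string }

// Given the full section list and previously stored results,
// return the indices that still need an LLM call.
function pendingSections(sections: string[], subOutputs: SubOutput[]): number[] {
  const done = new Set(subOutputs.map((s) => s.index));
  return sections.map((_, i) => i).filter((i) => !done.has(i));
}

// Example: sections 0 and 1 finished before an interruption
const sections = ['auth', 'chat', 'uploads', 'payments'];
const stored: SubOutput[] = [
  { index: 0, content: '...' },
  { index: 1, content: '...' },
];
console.log(pendingSections(sections, stored)); // [ 2, 3 ]
```

Because retries operate on this pending list, a failure in section 5 of 7 never forces sections 1 through 4 to be regenerated.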
The UI shows a progress bar during fan-out execution:
Processing sections: 3/7 complete
[██████████░░░░░░░░░░░░░░] 43%
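Rendering a bar like that is a few lines of arithmetic. A sketch, where the 24-character width is an arbitrary choice:

```typescript
// Render a text progress bar for done/total sections
function renderProgress(done: number, total: number, width = 24): string {
  const pct = Math.round((done / total) * 100);
  const filled = Math.round((done / total) * width);
  return `[${'█'.repeat(filled)}${'░'.repeat(width - filled)}] ${pct}%`;
}

console.log('Processing sections: 3/7 complete');
console.log(renderProgress(3, 7)); // 43%, with 10 of 24 cells filled
```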
Tabbed Results Interface
Instead of one massive output blob, users now get a clean tabbed interface where they can:
- Browse individual section results
- Download or copy specific sections
- See token usage and cost per section
- Navigate with keyboard shortcuts (prev/next)
Lessons Learned: The Prisma Json Field Gotcha
One technical challenge worth sharing: Prisma's handling of nullable JSON fields can be tricky. We initially tried:
// ❌ This fails with a TypeScript error
const data = {
  ...stepData,
  fanOutConfig: null // Type error: null is not assignable to the Json field
}
The solution is using conditional spreading:
// ✅ This works correctly
const data = {
  ...stepData,
  ...(step.fanOutConfig ? { fanOutConfig: step.fanOutConfig } : {})
}
When you omit the field entirely, Prisma leaves the column at its default (null). When you do need to set null explicitly, plain null is ambiguous for a Json column, so Prisma makes you choose: Prisma.JsonNull stores a JSON null value, while Prisma.DbNull stores a database NULL.
Results: Quality at Scale
The impact was immediate and dramatic. Instead of getting generic advice like:
"Implement user authentication using a standard library..."
We now get focused, actionable guidance:
"For the user authentication system, implement JWT-based authentication with refresh tokens. Create these specific components: AuthProvider context (handles login state), ProtectedRoute wrapper (guards private pages), LoginForm with email validation, and TokenRefreshService (handles automatic renewal). Consider using bcrypt for password hashing and implement rate limiting on login attempts..."
Each section gets the full attention it deserves, with detailed implementation steps, code examples, and security considerations.
What's Next
Fan-out execution opens up new possibilities:
- Parallel processing for independent sections
- Cost optimization by using different models for different section types
- Quality metrics per section to identify which prompts work best
- User customization of splitting patterns for domain-specific workflows
The pattern isn't limited to our specific use case—any time you're processing large, structured content with LLMs, fan-out execution can help you break through token limits while maintaining output quality.
Sometimes the best solution isn't a bigger hammer—it's knowing when to use multiple focused tools instead of one blunt instrument.
Want to dive deeper into workflow automation patterns? Check out our other posts on LLM prompt engineering and building resilient AI workflows.