Building Fan-Out Execution: When One LLM Call Isn't Enough
How we solved the token limit problem by implementing fan-out execution, allowing complex workflows to split large tasks into focused, parallel LLM calls.
Have you ever hit that frustrating moment where your carefully crafted LLM workflow tries to cram too much into a single API call? You know the feeling—your 16k token limit gets maxed out, your prompt gets truncated, and suddenly your AI assistant is trying to implement 15 different features in one go, producing generic, unhelpful output.
That's exactly the problem we faced with our workflow automation system. Our "Implementation Prompts" step was attempting to generate detailed implementation guidance for every MVP feature in a single call, resulting in shallow, truncated responses that weren't actionable for developers.
The solution? Fan-out execution—a pattern that automatically splits large tasks into focused, individual LLM calls, then intelligently combines the results.
The Problem: Token Limits vs. Quality Output
Picture this scenario: you're building a workflow that takes a product specification and generates detailed implementation prompts for each feature. Your input might include:
- User authentication system
- Real-time chat functionality
- File upload and storage
- Payment processing integration
- Admin dashboard
- Email notifications
- Mobile responsive design
Trying to generate quality implementation guidance for all seven features in a single 16k token call is like asking someone to write detailed instructions for building a house, a car, and a rocket ship all in the same breath. You'll get surface-level advice that doesn't help anyone actually build anything.
The Fan-Out Solution
Fan-out execution works by:
- Splitting: Automatically dividing large content into logical sections
- Processing: Making individual LLM calls for each section with focused prompts
- Combining: Merging the results back into a cohesive output
- Resuming: Handling interruptions gracefully by continuing from the last completed section
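The core of the pattern fits in a single loop. Here's a minimal sketch, not our production code: splitSections and callLLM are stand-ins for the real services, and resume handling is left out for brevity.

```typescript
// Minimal fan-out loop: split the input, run one focused call per
// section, then combine the results into a single output.
async function runFanOut(
  content: string,
  split: (c: string) => string[],
  callLLM: (section: string) => Promise<string>
): Promise<string> {
  const sections = split(content);
  const results: string[] = [];
  for (const section of sections) {
    results.push(await callLLM(section)); // one focused call per section
  }
  return results.join('\n\n'); // combine into one cohesive output
}

// Demo with a trivial splitter and a fake model call
runFanOut('a|b|c', (c) => c.split('|'), async (s) => s.toUpperCase()).then(
  (combined) => console.log(combined) // prints "A", "B", "C" separated by blank lines
);
```

The point of the sketch is the shape: each `callLLM` invocation sees only one section, so the whole prompt budget goes to that section.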
Here's how we implemented it:
Database Schema Changes
First, we extended our workflow step model to support fan-out configuration:
// Added to the WorkflowStep model in schema.prisma
fanOutConfig Json? // Configuration for how to split content
subOutputs   Json? // Individual results from each section
The fanOutConfig defines the splitting pattern, while subOutputs stores the individual results from each fan-out call.
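In TypeScript terms, the shapes we store in those two Json columns look roughly like this. The splitPattern and maxTokens fields appear in the configs later in this post; the subOutput fields beyond content (heading, tokensUsed) are illustrative names for the data the heading checks and per-section cost display rely on.

```typescript
// Approximate shape of the fanOutConfig Json column
interface FanOutConfig {
  splitPattern: string; // regex used to find section boundaries
  maxTokens: number;    // per-section output budget
}

// Approximate shape of the subOutputs Json column: one entry per completed section
interface SubOutput {
  index: number;      // position of the section in split order
  heading: string;    // heading extracted from the section, for continuity checks
  content: string;    // the LLM's output for this section
  tokensUsed: number; // for per-section cost reporting
}

const example: FanOutConfig = {
  splitPattern: "###\\s+\\d+\\.",
  maxTokens: 8192,
};
```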
Smart Content Splitting
We built a robust section splitter that uses regex patterns to identify logical breakpoints:
// src/server/services/section-splitter.ts
export function splitSections(content: string, pattern: string): string[] {
  try {
    const regex = new RegExp(pattern, 'gim');
    const matches = Array.from(content.matchAll(regex)).slice(0, 200); // Safety cap
    if (matches.length === 0) return [content];

    // Split content at each match boundary
    const sections: string[] = [];
    let lastIndex = 0;
    for (const match of matches) {
      // Check against undefined, not truthiness: a match at index 0 is valid
      if (match.index !== undefined && match.index > lastIndex) {
        sections.push(content.slice(lastIndex, match.index).trim());
      }
      lastIndex = match.index ?? lastIndex;
    }
    // Don't forget the final section
    if (lastIndex < content.length) {
      sections.push(content.slice(lastIndex).trim());
    }
    return sections.filter((section) => section.length > 0);
  } catch (error) {
    console.warn('Regex splitting failed, falling back to single section:', error);
    return [content];
  }
}
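To see what the splitter actually produces, here's a standalone sanity check. The splitter is inlined in condensed form (without the try/catch) so the snippet runs on its own, and the sample content is made up.

```typescript
// Condensed copy of splitSections for a self-contained demo
function splitSections(content: string, pattern: string): string[] {
  const matches = Array.from(content.matchAll(new RegExp(pattern, 'gim'))).slice(0, 200);
  if (matches.length === 0) return [content];
  const sections: string[] = [];
  let lastIndex = 0;
  for (const match of matches) {
    if (match.index !== undefined && match.index > lastIndex) {
      sections.push(content.slice(lastIndex, match.index).trim());
    }
    lastIndex = match.index ?? lastIndex;
  }
  if (lastIndex < content.length) sections.push(content.slice(lastIndex).trim());
  return sections.filter((s) => s.length > 0);
}

const doc = [
  '### 1. User authentication',
  'JWT with refresh tokens.',
  '### 2. Real-time chat',
  'WebSocket transport.',
].join('\n');

const sections = splitSections(doc, '###\\s+\\d+\\.');
console.log(sections.length); // 2
console.log(sections[0].startsWith('### 1.')); // true
```

Note that each section keeps its own heading, which is what lets the downstream fan-out call see the feature name it's working on.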
Template Variables for Context
We introduced new template variables that give each fan-out call the context it needs:
- {{fanOut.section}} - The current section content
- {{fanOut.heading}} - The section heading for context
- {{steps.Label.sections}} - Total number of sections
- {{steps.Label.section[N].content}} - Access to a specific section's results
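The substitution itself is ordinary string templating. Here's a minimal sketch of how a per-section prompt could be rendered; the renderPrompt helper and the prompt text are illustrative, not our actual template engine.

```typescript
// Replace {{dotted.key}} placeholders in a prompt template with values
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(.*?)\}\}/g, (_, key: string) => vars[key.trim()] ?? '');
}

const template =
  'You are implementing one feature.\n' +
  'Feature: {{fanOut.heading}}\n' +
  'Details:\n{{fanOut.section}}';

const prompt = renderPrompt(template, {
  'fanOut.heading': '### 1. User authentication',
  'fanOut.section': '### 1. User authentication\nJWT with refresh tokens.',
});
console.log(prompt.includes('Feature: ### 1. User authentication')); // true
```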
Real-World Configuration Examples
Here are the fan-out configurations we implemented for different workflow steps:
// For implementation prompts - split on numbered features
deepPrompt: {
  fanOutConfig: {
    splitPattern: "###\\s+\\d+\\.", // Matches "### 1.", "### 2.", etc.
    maxTokens: 8192
  }
}

// For security remediation - split on priority levels
secPrompts: {
  fanOutConfig: {
    splitPattern: "##\\s+(Critical|High|Medium|Low)",
    maxTokens: 8192
  }
}
Progress Tracking and Resume Capability
One of the most important aspects of fan-out execution is handling interruptions gracefully. Our implementation includes:
- Real-time progress tracking via Server-Sent Events
- Automatic resume from the last completed section
- Heading consistency checks to ensure context continuity
- Retry logic for individual sections without restarting the entire process
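Resume falls out naturally from storing per-section results: on restart, we only process the indices that have no stored output yet. A simplified sketch, with illustrative shapes:

```typescript
interface SubOutput { index: number; content: string }

// Given the full section list and previously stored results,
// return the indices that still need an LLM call.
function pendingSections(sections: string[], subOutputs: SubOutput[]): number[] {
  const done = new Set(subOutputs.map((s) => s.index));
  return sections.map((_, i) => i).filter((i) => !done.has(i));
}

// Example: sections 0 and 1 finished before an interruption
const sections = ['auth', 'chat', 'uploads', 'payments'];
const stored: SubOutput[] = [
  { index: 0, content: '...' },
  { index: 1, content: '...' },
];
console.log(pendingSections(sections, stored)); // [ 2, 3 ]
```

Because retries operate on this pending list, a failure in section 5 of 7 never forces sections 1 through 4 to be regenerated.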
The UI shows a progress bar during fan-out execution:
Processing sections: 3/7 complete
[██████████░░░░░░░░░░░░░░] 43%
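Rendering a bar like that is a few lines of arithmetic. A sketch, where the 24-character width is an arbitrary choice:

```typescript
// Render a text progress bar for done/total sections
function renderProgress(done: number, total: number, width = 24): string {
  const pct = Math.round((done / total) * 100);
  const filled = Math.round((done / total) * width);
  return `[${'█'.repeat(filled)}${'░'.repeat(width - filled)}] ${pct}%`;
}

console.log('Processing sections: 3/7 complete');
console.log(renderProgress(3, 7)); // 43%, with 10 of 24 cells filled
```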
Tabbed Results Interface
Instead of one massive output blob, users now get a clean tabbed interface where they can:
- Browse individual section results
- Download or copy specific sections
- See token usage and cost per section
- Navigate with keyboard shortcuts (prev/next)
Lessons Learned: The Prisma Json Field Gotcha
One technical challenge worth sharing: Prisma's handling of nullable JSON fields can be tricky. We initially tried:
// ❌ This fails with a TypeScript error
const data = {
  ...stepData,
  fanOutConfig: null // Type error: null is not assignable to the Json field
}
The solution is using conditional spreading:
// ✅ This works correctly
const data = {
  ...stepData,
  ...(step.fanOutConfig ? { fanOutConfig: step.fanOutConfig } : {})
}
When you omit the field entirely, Prisma leaves the column at its default (null). When you do need to set null explicitly, plain null is ambiguous for a Json column, so Prisma makes you choose: Prisma.JsonNull stores a JSON null value, while Prisma.DbNull stores a database NULL.
Results: Quality at Scale
The impact was immediate and dramatic. Instead of getting generic advice like:
"Implement user authentication using a standard library..."
We now get focused, actionable guidance:
"For the user authentication system, implement JWT-based authentication with refresh tokens. Create these specific components: AuthProvider context (handles login state), ProtectedRoute wrapper (guards private pages), LoginForm with email validation, and TokenRefreshService (handles automatic renewal). Consider using bcrypt for password hashing and implement rate limiting on login attempts..."
Each section gets the full attention it deserves, with detailed implementation steps, code examples, and security considerations.
What's Next
Fan-out execution opens up new possibilities:
- Parallel processing for independent sections
- Cost optimization by using different models for different section types
- Quality metrics per section to identify which prompts work best
- User customization of splitting patterns for domain-specific workflows
The pattern isn't limited to our specific use case—any time you're processing large, structured content with LLMs, fan-out execution can help you break through token limits while maintaining output quality.
Sometimes the best solution isn't a bigger hammer—it's knowing when to use multiple focused tools instead of one blunt instrument.
Want to dive deeper into workflow automation patterns? Check out our other posts on LLM prompt engineering and building resilient AI workflows.