Beyond the Context Window: Scaling LLM Workflows with Fan-Out Execution
Tired of LLM outputs getting truncated or losing quality when tackling complex, multi-feature tasks? We dive into how we implemented 'fan-out execution' to give each sub-task its own dedicated LLM call, ensuring higher quality and reliability, and share a tricky Prisma gotcha along the way.
We've all been there. You're building an incredible AI-powered feature, dreaming of the LLM handling complex, multi-faceted tasks in one glorious swoop. You craft a meticulously detailed prompt, hit send, and... the output is good, but somehow, it feels incomplete. Key details are missing, sections are truncated, or the quality just isn't what you hoped for. The culprit? The dreaded LLM context window.
Our goal was ambitious: allow our AI workflow engine to generate comprehensive implementation plans for multiple MVP features. Initially, we crammed all feature prompts into a single 16k-token LLM call. The result was predictable: valuable context was lost, and the generated plans were often superficial. We needed a better way to scale our LLM interactions, ensuring each feature received the dedicated attention it deserved.
Enter Fan-Out Step Execution.
The Problem: The Context Window Chokehold
Imagine you have a document with ten distinct sections, and you want an LLM to generate a detailed summary or action plan for each one. If you feed the entire document and ask for a combined output, the LLM often prioritizes the beginning, summarizes too broadly, or simply runs out of tokens before it can do justice to every section. It's like asking a chef to cook a ten-course meal with only one pot and limited ingredients – something's going to suffer.
Our "Deep Build Pipeline" workflow suffered from this exact issue. When generating "Implementation Prompts" for multiple features, a single LLM call couldn't handle the depth required for each. We needed to break down the monolithic task into smaller, manageable chunks, giving each its own dedicated LLM call.
Our Solution: Orchestrating Parallel LLM Excellence
The core idea behind fan-out execution is simple: split a large task into smaller sub-tasks, process each sub-task with its own LLM call, and then intelligently combine the results. This approach dramatically increases output quality, reduces truncation, and makes the entire process more robust.
Here's how we brought fan-out to life across our stack:
1. The Section Splitter: Dividing and Conquering
First, we needed a reliable way to break down a large input into individual sections. We created src/server/services/section-splitter.ts with a splitSections() utility. This function uses a configurable regex pattern (e.g., ###\s+\d+\. to match numbered headings like "### 1. Some Heading") to identify and extract distinct sections.
// src/server/services/section-splitter.ts (simplified)
export function splitSections(content: string, pattern: RegExp): string[] {
  const sections: string[] = [];
  try {
    // matchAll requires the global flag; re-create the regex with it if missing.
    const globalPattern = pattern.global
      ? pattern
      : new RegExp(pattern.source, pattern.flags + "g");
    const matches = [...content.matchAll(globalPattern)];
    // Preserve any content that appears before the first match.
    if (matches.length > 0 && matches[0].index! > 0) {
      sections.push(content.substring(0, matches[0].index).trim());
    }
    for (let i = 0; i < matches.length; i++) {
      const start = matches[i].index!;
      const end = i + 1 < matches.length ? matches[i + 1].index! : content.length;
      sections.push(content.substring(start, end).trim());
      if (sections.length > 200) { // Safety cap against runaway splits
        console.warn("Too many sections, capping at 200.");
        break;
      }
    }
  } catch (error) {
    console.error("Error splitting sections:", error);
    return [content]; // Fall back to treating the whole input as one section
  }
  return sections.filter((s) => s.length > 0);
}
This utility is robust, including a regex safety net (try/catch) and a match cap to prevent runaway processing on malformed input.
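To make the slicing concrete, here's a minimal, self-contained sketch of the same logic (the sample document and pattern here are illustrative, not production code):

```typescript
// Hypothetical sample document with numbered headings.
const doc = [
  "Preamble before the first heading.",
  "### 1. Setup",
  "Install dependencies.",
  "### 2. Build",
  "Run the build.",
].join("\n");

// The heading pattern from the post; matchAll requires the global (g) flag.
const pattern = /###\s+\d+\./g;

const matches = [...doc.matchAll(pattern)];
const starts = matches.map((m) => m.index!);

// Slice the document at each heading, keeping the preamble as its own chunk.
const sections: string[] = [];
if (starts.length > 0 && starts[0] > 0) {
  sections.push(doc.substring(0, starts[0]).trim());
}
for (let i = 0; i < starts.length; i++) {
  const end = i + 1 < starts.length ? starts[i + 1] : doc.length;
  sections.push(doc.substring(starts[i], end).trim());
}

console.log(sections.length); // 3: preamble + two numbered sections
```

Each numbered heading starts a new chunk, and any preamble before the first heading survives as its own section rather than being silently dropped.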
2. The Workflow Engine: The Orchestrator's Upgrade
The src/server/services/workflow-engine.ts is the heart of our system, and it received a significant overhaul to handle fan-out logic:
- New Types & State: We introduced FanOutConfig and SubOutput types to manage the configuration for splitting and to store the results of each sub-LLM call.
- Eventing for Real-time Feedback: New fan_out_progress and fan_out_done event types on WorkflowEvent allow our frontend to display real-time progress.
- Dynamic Prompting: The ChainContext now includes fanOutSection and fanOutHeading fields. Our resolvePrompt() function can now use new template variables like {{fanOut.section}} and {{fanOut.heading}} to inject specific section content into each sub-LLM call. We also added {{steps.Label.sections}} and {{steps.Label.section[N].content}} to allow prompts to reference all sections, or a specific one, from a previous fan-out step, enabling sophisticated multi-stage processing.
- The Fan-Out Execution Branch: The runWorkflow() function gained a dedicated branch for fan-out steps. This orchestrates:
  - Iterating through each section.
  - Making an individual LLM call for each section.
  - Implementing retry logic for failed sub-calls.
  - Enabling resume functionality: if a workflow is interrupted mid-fan-out, it picks up exactly where it left off, even verifying heading consistency to prevent stale data issues.
  - Combining all individual LLM outputs into a single, comprehensive digest for the overall step.
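In rough strokes, the fan-out branch boils down to the sketch below. The names and the LLM client here are illustrative stand-ins, not our actual engine code:

```typescript
// Hypothetical shapes; the real types live in workflow-engine.ts.
interface SubOutput {
  heading: string;
  content: string;
}

// Stand-in for the real LLM client.
type LlmCall = (prompt: string) => Promise<string>;

// One LLM call per section, skipping already-completed sections on resume,
// retrying failed sub-calls, then combining everything into one digest.
async function runFanOut(
  sections: string[],
  callLlm: LlmCall,
  completed: SubOutput[] = [],
  maxRetries = 2,
): Promise<{ subOutputs: SubOutput[]; digest: string }> {
  const subOutputs: SubOutput[] = [...completed];

  for (let i = subOutputs.length; i < sections.length; i++) {
    const heading = sections[i].split("\n")[0];
    let lastError: unknown;
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        const content = await callLlm(sections[i]);
        subOutputs.push({ heading, content });
        lastError = undefined;
        break;
      } catch (error) {
        lastError = error; // Retry the failed sub-call.
      }
    }
    if (lastError !== undefined) throw lastError;
  }

  // Combine individual outputs into a single digest for the overall step.
  const digest = subOutputs
    .map((s) => `${s.heading}\n${s.content}`)
    .join("\n\n");
  return { subOutputs, digest };
}
```

Resume falls out naturally from starting the loop at completed.length; the real engine additionally checks that each stored heading still matches the current split before trusting it.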
3. Persistent State: Database Schema Evolution
To store the configuration and results of fan-out steps, we added two new nullable JSON columns to our WorkflowStep model in prisma/schema.prisma:
// prisma/schema.prisma
model WorkflowStep {
  // ... existing fields
  fanOutConfig Json?
  subOutputs   Json?
  // ... other fields
}
fanOutConfig stores the regex pattern and other fan-out specific settings, while subOutputs holds the array of results from each individual LLM call.
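For illustration, the shapes of those two columns look roughly like this (the field names are assumptions, simplified from the real types):

```typescript
// Illustrative shapes for the two Json? columns (field names are assumptions).
interface FanOutConfig {
  pattern: string;   // Regex source, stored as a string so it serializes to JSON
  maxTokens: number; // Token budget for each individual sub-call
}

interface SubOutput {
  heading: string;   // First heading line of the section, used for resume checks
  content: string;   // The sub-LLM call's output for this section
}

// Json columns round-trip through plain JSON, so the regex lives as a string
// and is re-hydrated with new RegExp(...) at execution time.
const config: FanOutConfig = { pattern: "###\\s+\\d+\\.", maxTokens: 8192 };
const stored = JSON.stringify(config);
const restored: FanOutConfig = JSON.parse(stored);
const regex = new RegExp(restored.pattern, "g");

console.log(regex.test("### 3. Deploy")); // true
```

Storing the pattern as a string (rather than a RegExp) is what lets it survive the JSON round-trip through the database.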
4. Configuring the Fan-Out: Tailoring Prompts
We updated src/lib/constants.ts to define which StepTemplates should utilize fan-out, along with their specific regex patterns and maxTokens for the individual LLM calls. For instance:
// src/lib/constants.ts (simplified)
export const deepPrompt: StepTemplate = {
  // ...
  fanOutConfig: {
    pattern: "###\\s+\\d+\\.", // Example: matches "### 1. Heading"
    maxTokens: 8192,
  },
  // ...
};
export const extensionPrompt: StepTemplate = { /* ... similar config ... */ };
export const secPrompts: StepTemplate = { /* ... similar config ... */ };
These configurations are propagated through our src/server/trpc/routers/workflows.ts router, ensuring fanOutConfig is handled correctly during create, duplicate, and steps.add mutations. The retry mutation also now intelligently resets subOutputs, digest, and checkpoint for fan-out steps, ensuring a clean restart.
5. A Glimpse into the Future: The Dashboard Experience
A powerful backend needs an equally powerful frontend. Our dashboard received significant updates in src/app/(dashboard)/dashboard/workflows/[id]/page.tsx:
- Real-time Progress: An SSE (Server-Sent Events) handler captures fan_out_progress and fan_out_done events, updating activeFanOutTab and fanOutProgress state. This powers a progress bar that shows the current section being processed (e.g., "Processing section 5 of 10").
- Tabbed Sub-Output Viewer: Once a fan-out step completes, users can explore the individual LLM outputs in a sleek, horizontally scrollable tab interface. Each tab represents one section's output, complete with its own download/copy options and token/cost metadata.
- Visual Cues: A "fan-out (N)" badge on step headers instantly tells users which steps are leveraging this powerful new capability.
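As a sketch of the event handling, a tiny reducer in the spirit of the dashboard's SSE handler might look like this (the payload shapes below are assumptions, not our exact WorkflowEvent types):

```typescript
// Assumed event payload shapes; the real ones live on WorkflowEvent.
type FanOutEvent =
  | { type: "fan_out_progress"; stepId: string; current: number; total: number }
  | { type: "fan_out_done"; stepId: string; sections: number };

interface FanOutUiState {
  activeFanOutTab: string | null;
  fanOutProgress: { current: number; total: number } | null;
}

// Progress events drive the progress bar; the done event clears it.
function reduceFanOut(state: FanOutUiState, event: FanOutEvent): FanOutUiState {
  switch (event.type) {
    case "fan_out_progress":
      return {
        activeFanOutTab: event.stepId,
        fanOutProgress: { current: event.current, total: event.total },
      };
    case "fan_out_done":
      return { activeFanOutTab: event.stepId, fanOutProgress: null };
  }
}

let state: FanOutUiState = { activeFanOutTab: null, fanOutProgress: null };
state = reduceFanOut(state, {
  type: "fan_out_progress",
  stepId: "s1",
  current: 5,
  total: 10,
});
console.log(state.fanOutProgress); // { current: 5, total: 10 }
```

Keeping the handler a pure reducer makes it trivial to unit-test independently of the SSE plumbing.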
This combination of backend orchestration and frontend visibility transforms a complex, multi-LLM interaction into an intuitive user experience.
Lessons Learned: The Nuances of Prisma's Json? Fields
No major feature goes live without a few bumps in the road. Our biggest "gotcha" during this implementation revolved around Prisma's Json? fields.
The Challenge:
When trying to set fanOutConfig to null directly in a Prisma create data object, we hit a type error:
Type 'null' is not assignable to type 'NullableJsonNullValueInput | InputJsonValue | undefined'
This is a common pitfall. Prisma's Json? (nullable JSON) fields don't accept raw JavaScript null in the same way regular nullable fields do. They expect one of:
- undefined (to omit the field entirely, letting the database default to NULL)
- Prisma.DbNull (to explicitly set the column to a database NULL)
- Prisma.JsonNull (to store a JSON null value in the column)
- A valid JSON value
The Workaround: Our solution was to use a conditional spread operator:
// Instead of:
// const data = {
//   fanOutConfig: step.fanOutConfig || null, // This causes the error
//   // ...
// };

// We used:
const data = {
  // ... other fields
  ...(step.fanOutConfig ? { fanOutConfig: step.fanOutConfig } : {}),
  // ...
};
This pattern ensures that fanOutConfig is only included in the data object if step.fanOutConfig is truthy (i.e., not null or undefined). If it's null or undefined, the field is omitted, and Prisma correctly lets the database default (NULL) apply.
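The pattern itself is plain JavaScript semantics, nothing Prisma-specific: a conditional spread simply omits the key. A quick standalone illustration:

```typescript
// Demonstrates the conditional spread: the key only exists when the value
// is truthy, which is exactly what Prisma's Json? input type wants.
function buildData(fanOutConfig: object | null | undefined) {
  return {
    label: "step",
    ...(fanOutConfig ? { fanOutConfig } : {}),
  };
}

console.log("fanOutConfig" in buildData(null));                // false — field omitted
console.log("fanOutConfig" in buildData({ maxTokens: 8192 })); // true  — field included
```

Note the difference from `fanOutConfig: undefined`: the spread leaves the key out entirely, while an explicit undefined still adds the key to the object (though Prisma treats undefined values as omitted too).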
This is a recurring Prisma gotcha we've even documented internally. Always remember to use Prisma.DbNull (or Prisma.JsonNull for a JSON null value) when you need an explicit null, or a conditional spread for optional Json? fields!
What's Next?
With the core fan-out functionality implemented and deployed, our immediate next steps include:
- End-to-End Testing: Rigorous testing of a full "Deep Build Pipeline" workflow to verify all 9 steps, especially ensuring the Implementation Prompts step correctly generates individual prompts per feature.
- Resume Verification: Confirming that killing a workflow mid-fan-out and restarting it correctly resumes from the last completed section.
- User Configurability: Considering adding fanOutConfig to the steps.update mutation so users can edit fan-out configurations on existing steps directly from the UI.
- Robust Splitter Tests: Adding comprehensive unit tests for splitSections() to cover edge cases like empty input, no matches, content before the first heading, and overlapping headings.
- Cost Estimation: Enhancing estimateWorkflowCost() to account for the fan-out multiplier, providing more accurate cost projections for steps that involve multiple LLM calls.
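As a back-of-the-envelope sketch of that multiplier (hypothetical names and numbers, not our actual estimateWorkflowCost()):

```typescript
// Hypothetical sketch of the fan-out multiplier: a fan-out step costs
// roughly (sections × per-call estimate) instead of one call's estimate.
interface StepEstimate {
  perCallUsd: number;      // Estimated cost of a single LLM call
  fanOutSections?: number; // Undefined for ordinary single-call steps
}

function estimateStepCost(step: StepEstimate): number {
  const multiplier = step.fanOutSections ?? 1;
  return step.perCallUsd * multiplier;
}

const steps: StepEstimate[] = [
  { perCallUsd: 0.02 },                     // ordinary step
  { perCallUsd: 0.05, fanOutSections: 10 }, // fan-out step: 10 sub-calls
];
const total = steps.reduce((sum, step) => sum + estimateStepCost(step), 0);

console.log(total); // ≈ 0.52
```

The catch is that the section count isn't known until the upstream step has run, so a pre-run estimate has to assume a typical count or a range.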
Conclusion
Implementing fan-out step execution has been a game-changer for our AI workflow engine. By strategically breaking down complex tasks into dedicated LLM calls, we've dramatically improved the quality, reliability, and comprehensiveness of our AI-generated outputs. This not only solves the immediate problem of context window limitations but also unlocks the potential for our platform to tackle even more sophisticated, multi-faceted AI-driven challenges in the future.
We're excited to see the higher-quality results this enables for our users and continue pushing the boundaries of what our AI workflows can achieve!