nyxcore-systems
7 min read

Breaking the Token Barrier: How We Fanned Out LLM Workflows for Smarter AI Agents

Facing LLM token limits and degraded output quality, we re-architected our workflow engine to 'fan out' complex steps, giving each sub-task its own dedicated LLM call for superior, reliable results.

LLM, AI, Workflow, Backend, Frontend, Prisma, TypeScript, Prompt Engineering

When you're building intelligent agents powered by Large Language Models, you quickly run into a familiar wall: token limits. It's a constant dance between providing enough context for quality output and staying within the LLM's window. For our internal workflow engine, this challenge became particularly acute when trying to implement complex features. We were cramming multiple distinct implementation prompts into a single, often truncated, 16k-token LLM call. The result? Subpar output, incomplete features, and a lot of frustration.

Our goal was clear: ditch the monolithic prompt and give each MVP feature its own dedicated LLM call. This led us down the path of implementing "fan-out" step execution, and I'm excited to share how we tackled it.

The Problem: One Prompt to Rule Them All (and Fail)

Imagine you have an AI agent tasked with generating an implementation plan for several distinct features. If you feed all features into one giant prompt, the LLM might:

  1. Truncate: Hit the token limit and simply cut off the later features.
  2. Dilute Focus: Struggle to give adequate attention to each feature, leading to generic or incomplete suggestions.
  3. Generate Inconsistent Output: Mix concerns or fail to follow specific instructions for individual sections.

This was precisely the bottleneck we faced. Our "Improve" and "Extend & Improve" steps, for example, needed to process multiple sections of a document, each requiring specific attention. A single LLM call simply couldn't cut it for the quality and reliability we needed.

The Solution: Fanning Out for Focused Intelligence

Our answer was to introduce a "fan-out" mechanism. Instead of one LLM call per workflow step, a single step could now trigger multiple LLM calls, each focused on a specific sub-section or feature. This required a significant overhaul across our stack, from the database schema to the frontend UI.

Laying the Foundation: Database & Schema

First, we needed to store the configuration for fanning out and the results of each sub-call. We extended our WorkflowStep Prisma model:

prisma
// prisma/schema.prisma
model WorkflowStep {
  // ... existing fields ...
  fanOutConfig Json?    // Configuration for how to fan out (e.g., regex pattern)
  subOutputs   Json?    // Array of outputs from each sub-LLM call
}

This simple addition allowed us to define how a step should be fanned out and store the individual results, which would later be combined into a final digest.
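
For context, the fanOutConfig value for a heading-based split is just a small JSON object. Here's a minimal illustration (the field name is an assumption based on the pattern examples later in this post):

typescript
// Illustrative fanOutConfig value stored on a WorkflowStep (shape assumed for this example)
const fanOutConfig = {
  pattern: "###\\s+\\d+\\.", // regex the section splitter uses to find sub-section headings
};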

The Brains: section-splitter.ts and workflow-engine.ts

The core logic resides in two places:

  1. src/server/services/section-splitter.ts: This utility is crucial. Given a large text blob and a regex pattern, it splits the text into an array of sections. We built in safety measures like a try/catch for regex failures and a 200-match cap to prevent runaway processes.

    typescript
    // src/server/services/section-splitter.ts (conceptual sketch)
    export interface Section { heading: string; content: string }
    export function splitSections(content: string, pattern: string): Section[] {
      let regex: RegExp;
      try { regex = new RegExp(pattern, 'gm'); } catch { return [{ heading: '', content }]; }
      const matches = [...content.matchAll(regex)].slice(0, 200); // 200-match cap guards against runaway splits
      return matches.map((match, i) => ({
        heading: match[0].trim(),
        content: content.slice(match.index! + match[0].length, matches[i + 1]?.index).trim(),
      }));
    }
    
  2. src/server/services/workflow-engine.ts: This is where the magic happens. We introduced new types (FanOutConfig, SubOutput), event types (fan_out_progress, fan_out_done) for real-time updates, and new fields on our ChainContext to manage the fan-out state (fanOutSection, fanOutHeading, stepSubOutputs).

    The runWorkflow() function now has a dedicated branch for fan-out execution (a simplified sketch follows this list). Instead of a single LLM call, it:

    • Splits the input based on fanOutConfig.
    • Iterates through each section.
    • Makes a dedicated LLM call for each section, injecting context like {{fanOut.section}} and {{fanOut.heading}} into the prompt.
    • Handles retries for individual sub-calls.
    • Supports resuming a workflow mid-fan-out, picking up from the last completed section (with a heading consistency check to prevent data drift).
    • Combines the individual subOutputs into a final digest for the step.
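
To make that concrete, here is a heavily simplified sketch of the fan-out branch. Variable and helper names (stepInput, context, renderPrompt, callLLMWithRetry, emit, saveCheckpoint) are stand-ins for our internal code, not exact signatures:

typescript
// Conceptual sketch of the fan-out branch inside runWorkflow() (names are assumptions)
const { pattern } = step.fanOutConfig as { pattern: string };
const sections = splitSections(stepInput, pattern);

// Resume support: skip sections that already have a stored sub-output
const subOutputs: SubOutput[] = (step.subOutputs as SubOutput[] | null) ?? [];

for (let i = subOutputs.length; i < sections.length; i++) {
  const section = sections[i];
  emit({ type: 'fan_out_progress', stepId: step.id, current: i + 1, total: sections.length });

  // Inject the per-section context that the prompt template expects
  const prompt = renderPrompt(step.prompt, {
    ...context,
    fanOutSection: section.content, // exposed to templates as {{fanOut.section}}
    fanOutHeading: section.heading, // exposed to templates as {{fanOut.heading}}
  });

  const output = await callLLMWithRetry(prompt, { maxTokens: 8192 });
  subOutputs.push({ heading: section.heading, output });
  await saveCheckpoint(step.id, subOutputs); // lets a killed workflow resume mid-fan-out
}

emit({ type: 'fan_out_done', stepId: step.id, total: sections.length });

// Combine the individual sub-outputs into the step's final digest
const digest = subOutputs.map((s) => `### ${s.heading}\n\n${s.output}`).join('\n\n');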

This means our prompt templates can now be incredibly precise:

// Example prompt snippet
You are tasked with improving the following section of code:
### {{fanOut.heading}}
{{fanOut.section}}

Focus specifically on this section. Do not address other parts of the document.

Prompt Engineering & Configuration

We updated our src/lib/constants.ts to define which steps leverage fan-out and how. For instance, our deepPrompt and extensionPrompt now fan out based on patterns like ###\s+\d+\. (matching markdown headings like ### 1. Feature A). We also bumped the maxTokens for these individual calls to 8192, giving the LLM more room to breathe within its focused task.
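
As a rough illustration, the relevant step definitions now carry a fan-out pattern and a bigger token budget. The array name and surrounding fields below are simplified for the example:

typescript
// src/lib/constants.ts (simplified; the array and field names here are illustrative)
export const DEEP_BUILD_STEPS = [
  // ... earlier steps ...
  {
    name: 'Implementation Prompts',
    prompt: deepPrompt,
    maxTokens: 8192, // more room for each focused sub-call
    fanOutConfig: { pattern: '###\\s+\\d+\\.' }, // split on headings like "### 1. Feature A"
  },
  // ... later steps ...
];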

API & UI: Bringing it to Life

What good is a powerful backend if the user can't see or control it?

  1. API (src/server/trpc/routers/workflows.ts): We extended our create, duplicate, and steps.add mutations to accept and propagate the fanOutConfig. Crucially, we also ensured subOutputs, digest, and checkpoint are reset when a fan-out step is retried. Input validation for the regex pattern was added using z.string().max(200).refine() (sketched after this list).

  2. Frontend (src/app/(dashboard)/dashboard/workflows/[id]/page.tsx): This is where the user experience really shines.

    • Real-time Progress: An SSE handler captures fan_out_progress and fan_out_done events, powering a dynamic progress bar showing "Processing section N of M."
    • Tabbed Sub-Output Viewer: Once a fan-out step completes, users can explore each subOutput individually in a horizontally scrollable tabbed interface. Each tab provides options to download or copy the specific section's output, along with token and cost metadata.
    • Visual Cue: A clear "fan-out (N)" badge on step headers immediately tells the user that this step generated multiple outputs.
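
For reference, the regex validation mentioned in the API item above can be expressed as a zod refinement along these lines (a minimal sketch; the schema name and surrounding object are assumptions):

typescript
import { z } from 'zod';

// Illustrative zod schema for a user-supplied fan-out regex pattern
const fanOutConfigInput = z
  .object({
    pattern: z
      .string()
      .max(200)
      .refine((p) => {
        try { new RegExp(p); return true; } catch { return false; }
      }, { message: 'Invalid regular expression' }),
  })
  .optional();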

This level of transparency and control is vital for debugging and understanding complex AI workflows.
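
On the client, the SSE wiring behind that progress bar is conceptually small. A minimal sketch, assuming an EventSource-compatible endpoint and React state setters (the URL, payload shape, and setter names are assumptions):

typescript
// Minimal client-side sketch for the fan-out events (endpoint, payload shape, and setters are assumptions)
const source = new EventSource(`/api/workflows/${workflowId}/stream`);

source.addEventListener('fan_out_progress', (event) => {
  const { stepId, current, total } = JSON.parse((event as MessageEvent).data);
  setFanOutProgress((prev) => ({ ...prev, [stepId]: { current, total } })); // "Processing section N of M"
});

source.addEventListener('fan_out_done', (event) => {
  const { stepId } = JSON.parse((event as MessageEvent).data);
  setFanOutProgress((prev) => ({ ...prev, [stepId]: undefined })); // swap the bar for the tabbed viewer
});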

Lessons Learned: The Prisma Json? Gotcha

No significant feature implementation is without its quirks. Our biggest "aha!" moment came with Prisma's Json? fields.

The Problem: We initially tried to pass fanOutConfig: null directly in the Prisma create data when a step didn't have a fan-out configuration.

The Failure: Type 'null' is not assignable to type 'NullableJsonNullValueInput | InputJsonValue | undefined'. Prisma's JSON fields, especially nullable ones, are strict: plain JavaScript null isn't accepted. To explicitly write a null into a nullable JSON column you have to pass Prisma.JsonNull (or Prisma.DbNull), and to leave it at the column default you simply omit the field.

The Workaround: We adopted a conditional spread pattern:

typescript
// Instead of: { fanOutConfig: step.fanOutConfig || null }
// Use:
...(step.fanOutConfig ? { fanOutConfig: step.fanOutConfig } : {})

This pattern ensures that the fanOutConfig field is only included in the Prisma create or update payload if step.fanOutConfig is truthy. If it's null or undefined, the field is simply omitted, allowing Prisma to apply the column's default (which is null for Json?). This is a recurring Prisma gotcha, now firmly documented in our internal CLAUDE.md!
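
In context, the pattern ends up looking roughly like this (the surrounding create fields are illustrative, not our exact schema):

typescript
// Illustrative Prisma create call using the conditional spread (other field names are assumptions)
await prisma.workflowStep.create({
  data: {
    workflowId,
    name: step.name,
    prompt: step.prompt,
    // Only include fanOutConfig when present; omitting the key lets Prisma fall back to the column default
    ...(step.fanOutConfig ? { fanOutConfig: step.fanOutConfig } : {}),
  },
});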

What's Next?

With the core fan-out functionality committed and type-checked, here's what's on our immediate roadmap:

  1. End-to-End Testing: Create a Deep Build Pipeline workflow, run it through all 9 steps, and verify that "Implementation Prompts" generates individual prompts per feature.
  2. Resume Verification: Test killing a workflow mid-fan-out, restarting it, and confirming it correctly continues from the last completed section.
  3. UI for Editing Fan-Out Config: Allow users to edit fanOutConfig on existing steps via the steps.update mutation.
  4. Unit Tests for splitSections(): Add comprehensive tests for edge cases like empty input, no matches, content before the first heading, and overlapping headings.
  5. Cost Estimation Multiplier: Integrate a fan-out multiplier into estimateWorkflowCost() so our cost estimates accurately reflect the N LLM calls per fan-out step.

Conclusion

Implementing fan-out execution was a significant architectural step, but one that has already paid dividends in the quality and reliability of our LLM-powered workflows. By breaking down complex tasks into focused, manageable sub-tasks for the LLM, we've bypassed token limits, improved output consistency, and provided a much more transparent and controllable experience for our users. This journey underscores the importance of adapting our systems to the unique constraints and capabilities of LLMs, moving beyond simple prompt calls to building truly intelligent, robust agents.

json
{
  "thingsDone": [
    "Implemented fan-out step execution",
    "Added fanOutConfig and subOutputs to WorkflowStep Prisma model",
    "Created section-splitter utility with regex safety",
    "Extended workflow-engine with fan-out logic, events, and template variables",
    "Updated prompt definitions to use fan-out",
    "Integrated fan-out config into API mutations",
    "Developed frontend UI for fan-out progress and tabbed sub-output viewing"
  ],
  "pains": ["Prisma Json? field type incompatibility with JS null"],
  "successes": [
    "Achieved dedicated LLM calls per feature/section",
    "Improved LLM output quality and reliability",
    "Implemented retry and resume for fan-out steps",
    "Enhanced user experience with real-time progress and detailed sub-output viewing",
    "Documented Prisma Json? workaround for future reference"
  ],
  "techStack": ["TypeScript", "Node.js", "Prisma", "Next.js", "React", "tRPC", "SSE", "LLMs"]
}