Scaling AI Code Generation: Fanning Out Prompts and Taming the Wisdom Filter
Dive into how we scaled our AI code generation engine to handle complex group workflows and eliminated 'wrong-stack' code pattern pollution, ensuring contextually relevant implementation prompts every time.
Building AI-powered development tools is a wild ride. Every session brings new challenges, often exposing subtle cracks in your assumptions. This past week, we hit a significant milestone: our workflow engine can now fan out complex group tasks into discrete, targeted implementation prompts and intelligently filter out irrelevant code patterns that could lead to "wrong-stack" code generation.
It wasn't a straightforward path. Let's dig into the challenges and solutions from our latest development sprint.
The Challenge: From Monolithic to Multi-Task AI Guidance
Our AI workflow engine is designed to take high-level development notes and break them down into actionable implementation prompts. Initially, it worked great for single, focused tasks. But what about when a note describes a group of related, yet distinct, items? For instance, "Implement CRUD operations for User, Product, and Order." A single, monolithic prompt wouldn't cut it. We needed a way to:
- Fan-out: Transform a single group workflow into multiple, individual implementation prompts. Each item (User, Product, Order) should get its own dedicated prompt.
- Contextualize: Ensure each fanned-out prompt receives highly relevant context, specific to its item, and only to its item.
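To make the goal concrete, here is a purely illustrative before-and-after of what we wanted the engine to produce (the strings are placeholders, not real output):

```typescript
// Purely illustrative: the desired transformation for a group note.
const groupNote = 'Implement CRUD operations for User, Product, and Order.';

// Before: one monolithic prompt trying to cover everything at once.
const monolithicOutput: string[] = [
  '<single giant implementation prompt for User + Product + Order>',
];

// After: one focused prompt per item, each carrying item-specific context only.
const fannedOutOutputs: string[] = [
  '<implementation prompt for User, with User-specific context>',
  '<implementation prompt for Product, with Product-specific context>',
  '<implementation prompt for Order, with Order-specific context>',
];
```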
Solution 1: Fanning Out the Genius
Our first step was to enable the "fan-out" mechanism. This involved a significant refactor within our workflow-engine.ts.
When our group-analysis step identifies a group workflow, the engine now iterates over each item-* step. For every item, it dynamically generates a distinct implementation prompt. These individual prompts are then exposed as subOutputs, ready for the developer to consume.
Here's a glimpse of the key changes:
- workflow-engine.ts:2527-2662: The core logic to detect group workflows and loop over items, generating per-item prompts.
- implementation-prompt-generator.ts: We added new helpers like detectTargetStack() and buildGroupItemPromptInput() to intelligently craft these individual prompts.
- workflow-engine.ts:3008-3012: A small but mighty helper, getOutputContent(), was introduced to simplify single-step output extraction, avoiding confusion with our existing extractStepContent, which is designed for arrays.
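To make the fan-out concrete, here is a minimal sketch of the per-item loop. The step and output shapes, and the helper signatures, are illustrative assumptions rather than the actual workflow-engine.ts internals:

```typescript
// Illustrative types -- the real engine's step and output shapes are richer.
interface WorkflowStep {
  id: string;       // e.g. "item-user", "item-product", "item-order"
  title: string;    // e.g. "Implement CRUD for User"
  content: string;  // the analysed notes for this item
}

interface SubOutput {
  label: string;    // tab label shown to the developer
  content: string;  // the generated implementation prompt
}

// Hypothetical stand-ins for detectTargetStack(), buildGroupItemPromptInput(),
// and the prompt generator itself.
declare function detectTargetStack(step: WorkflowStep): string;
declare function buildGroupItemPromptInput(step: WorkflowStep, stack: string): string;
declare function generateImplementationPrompt(input: string): Promise<string>;

// Fan-out: one group workflow becomes one prompt per item-* step.
async function fanOutGroupWorkflow(steps: WorkflowStep[]): Promise<SubOutput[]> {
  const itemSteps = steps.filter((s) => s.id.startsWith('item-'));

  const subOutputs: SubOutput[] = [];
  for (const step of itemSteps) {
    const stack = detectTargetStack(step);
    const promptInput = buildGroupItemPromptInput(step, stack);
    const prompt = await generateImplementationPrompt(promptInput);
    subOutputs.push({ label: step.title, content: prompt });
  }
  return subOutputs;
}
```

Each element of subOutputs then maps to one of the per-item tabs the developer sees.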
The immediate result was a much better developer experience. Instead of one giant block of text, we now present multiple, distinct tabs – one for each item – making it far easier to digest and act upon.
The Unexpected Villain: Codebase Contamination (The "Go" Problem)
As we scaled, a more insidious problem emerged: our AI was sometimes generating implementation prompts with code patterns entirely alien to the target codebase. Specifically, we observed Python-focused tasks receiving a healthy dose of Go code examples. This wasn't just a minor annoyance; it was a fundamental threat to the utility of our tool. Why was our AI "hallucinating" in the wrong language?
The root cause was a subtle yet critical flaw in how we were feeding "project wisdom" to the AI. Our code_patterns database, intended to provide relevant examples, contained over 232 Go entries, originally pulled from a separate CodeMCP repository. When loadProjectWisdom() and our note-enrichment.ts module loaded these patterns unfiltered, the sheer volume of Go examples drowned out any Python-specific references, polluting the context given to the LLM.
The AI, trying its best to be helpful, was simply reflecting the dominant patterns it was being fed.
Solution 2: The Wisdom Filter – Guarding Against Stack Mismatch
To combat this, we implemented a multi-layered defense strategy:
- Stack-Aware Pattern Labeling: We introduced a brand new stack-detector.ts module. It can detectRepoStack() from file paths and, crucially, formatCodePatternsWithStack(). Now, instead of raw code dumps, our project wisdom patterns are labeled, e.g., [Go], [Python], [TypeScript], along with an advisory note. This happens at the enrichment stage (note-enrichment.ts) and when loading project wisdom into the workflow engine (workflow-engine.ts:loadProjectWisdom()).
- Context Suppression on Mismatch: The most critical layer of defense resides in implementation-prompt-generator.ts:buildGroupItemPromptInput(). Before generating the final prompt for an item, we now compare the detected target stack for that item with the overall codebase stack. If there's a mismatch (e.g., a Python task in a Go codebase, or vice-versa), we take drastic action:
  - Strip out potentially misleading context like claudeMd and fileTree references.
  - Add a prominent WARNING to the prompt, alerting the developer to the potential stack discrepancy.
  - Keep only a truncated version of the projectWisdom that is explicitly relevant or generic.
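Taken together, the two layers look roughly like the sketch below. The types and helper bodies are assumptions for illustration; the actual stack-detector.ts and buildGroupItemPromptInput() implementations differ in detail:

```typescript
// Illustrative shapes only -- not the actual module interfaces.
interface CodePattern {
  filePath: string;  // e.g. "cmd/server/main.go" or "app/models/user.py"
  snippet: string;
}

interface PromptInput {
  task: string;
  claudeMd?: string;
  fileTree?: string;
  projectWisdom?: string;
  warning?: string;
}

// Layer 1: label every pattern with its detected stack instead of dumping raw code.
// (Hypothetical path-based heuristic standing in for detectRepoStack().)
function detectStackFromPath(filePath: string): string {
  if (filePath.endsWith('.go')) return 'Go';
  if (filePath.endsWith('.py')) return 'Python';
  if (filePath.endsWith('.ts') || filePath.endsWith('.tsx')) return 'TypeScript';
  return 'Unknown';
}

function formatCodePatternsWithStack(patterns: CodePattern[]): string {
  return patterns
    .map((p) => `[${detectStackFromPath(p.filePath)}] ${p.snippet}`)
    .join('\n\n');
}

// Layer 2: suppress misleading context when the item's stack doesn't match the repo's.
function buildGroupItemPromptInput(
  input: PromptInput,
  targetStack: string,
  codebaseStack: string,
): PromptInput {
  if (targetStack === codebaseStack) return input;

  return {
    task: input.task,
    // Drop claudeMd and fileTree: they describe the "wrong" codebase for this item.
    warning:
      `WARNING: this task targets ${targetStack}, but the codebase is ${codebaseStack}. ` +
      'Repository-specific context has been suppressed.',
    // Keep only a truncated slice of project wisdom as generic guidance.
    projectWisdom: input.projectWisdom?.slice(0, 2000),
  };
}
```

The deliberate bias here is toward less context on a mismatch: a shorter, clearly labeled prompt beats a rich one full of wrong-stack examples.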
The Proof is in the Prompts: Verification Results
We ran three versions of a complex workflow to verify our solutions:
| Metric | v1 (monolithic) | v2 (fan-out, no filter) | v3 (fan-out + filter) |
|---|---|---|---|
| Action Points | 10 | 10 | 16 |
| Impl Prompts | 1 block | 10 tabs | 15 tabs |
| Output size | 22K | 124K | 166K |
| Python blocks | 0 | 26 | 55 |
| Go blocks | 8 (WRONG) | 5 | 0 |
The results speak for themselves. Version 3, with both fan-out and the wisdom filter, generated significantly more action points and implementation prompts (15 tabs!), leading to a much richer and more granular output. Crucially, the "Go contamination" was completely eliminated, with Python blocks skyrocketing from 0 in v1 to 55 in v3.
This confirms our multi-pronged approach to filtering irrelevant wisdom was highly effective.
Lessons Learned & Key Takeaways
- Context is King, but Filtered Context is Emperor: Unfiltered knowledge bases are not just unhelpful; they can actively mislead your AI and degrade the quality of its output. Always apply stringent filtering and relevance checks.
- Multi-Layered Defense is Crucial: Solving complex AI output issues often requires interventions at multiple stages of the pipeline – from data ingestion and enrichment to prompt generation and final output. A single fix is rarely enough.
- Trace Root Causes to Your Data: When something goes wrong with AI output, don't just tweak prompts. Dig into your data sources and how they're being processed. Our "Go contamination" was a data problem at its core.
- Developer Experience Matters: Fanning out prompts into discrete tabs significantly improves usability and reduces cognitive load, making the AI's output far more actionable.
What's Next?
With these core features deployed and verified, we're now moving to a consistency check, comparing the generated prompts against the original requirements to ensure nothing was missed. We also have some minor cleanup and optimization tasks on the horizon, like removing unused tRPC procedures and resolving TypeScript diagnostics.
The journey to building truly intelligent and helpful AI development agents is ongoing, but with each session, we're getting closer to a system that not only understands but also effectively guides developers through complex tasks, in the right language, every time.
{"thingsDone":["Implemented fan-out for group workflows, generating per-item implementation prompts.","Developed a stack-aware wisdom filter to prevent wrong-stack code pattern pollution.","Introduced stack labeling for code patterns ([Go], [Python], etc.).","Implemented context suppression in implementation prompts when target stack mismatches codebase stack.","Increased action point limit for enrichment prompts.","Verified full pipeline functionality and improved output quality across 3 workflow versions."],"pains":["Difficulty extracting single-step output content from workflow engine.","Root cause of Go code contamination in Python prompts traced to unfiltered 'project wisdom' database."],"successes":["Achieved 0 Go blocks in Python-targeted implementation prompts.","Increased relevant Python blocks from 0 to 55.","Generated significantly more granular and actionable implementation prompts (15 tabs vs 1 block).","Successfully deployed and verified all new features."],"techStack":["TypeScript","AI/LLM (implied)","Workflow Engine","Code Generation","System Architecture"]}