Taming the LLM: From Monolithic Prompts to Surgical Precision in Code Generation
Our AI was generating code in the wrong language, confused by an indiscriminate flood of 'project wisdom.' We discovered the culprit and implemented a multi-stage fix, breaking monolithic prompts into surgical, context-aware instructions.
It was late, session four, and the digital clock on my IDE glared 2026-03-14. The kind of late where the code starts to feel less like a problem and more like a conversation. Tonight, the conversation was with our AI workflow engine, specifically about why it sometimes spoke the wrong language.
Our goal was clear: fix group workflow implementation prompts to generate per-item fan-out prompts with the correct target language, and add crucial wisdom filtering to prevent irrelevant code patterns from polluting every stage of the workflow. The first part? Done. The second? Still on the horizon.
The Problem: When Your AI Learns the Wrong Language
Imagine you're building a multi-stage AI-driven development workflow. At some point, an LLM needs to generate actual code. In our system, we had a `project.wisdom` component – essentially a collection of best practices and code patterns gleaned from analyzing existing repositories (like CodeMCP). This wisdom, a treasure trove of 16K characters of Go-specific patterns (Cobra CLI, slog, sync.Once, build tags), was being injected into every single workflow stage.
The result? Our LLM, trying its best to be helpful, would sometimes churn out Go code blocks even when the task at hand was clearly Python. It was like asking a chef to bake a cake, handing them a cookbook full of Italian recipes, and getting a lasagna. Delicious, but not what we asked for.
Initially, our group workflows generated a single, monolithic prompt for all action points. This meant a single, often enormous, context block for the LLM. It was a firehose, not a focused instruction.
The Breakthrough: Fan-Out Prompts and Surgical Context
Our first major step was to break down that monolithic prompt. We refactored `workflow-engine.ts` to generate one implementation prompt per action point. Each action point now gets its own `subOutput`, which the UI renders as an individual tab. This alone was a huge win for clarity and focus.
```typescript
// Simplified pseudo-code illustrating the fan-out concept
async function generateGroupWorkflowImplementationPrompts(
  groupWorkflow: GroupWorkflow,
): Promise<SubOutput[]> {
  const actionPoints = await extractActionPoints(groupWorkflow.input);
  const subOutputs: SubOutput[] = [];

  for (const actionPoint of actionPoints) {
    // Each action point now gets its own focused prompt input
    const promptInput = await buildGroupItemPromptInput(actionPoint, groupWorkflow.context);
    const implementationPrompt = await generateImplementationPrompt(promptInput);
    subOutputs.push({
      id: actionPoint.id,
      title: actionPoint.title,
      content: implementationPrompt,
      // ... other metadata for UI tabs
    });
  }

  return subOutputs;
}
```
Detecting the Target Stack
Next, we needed to ensure the LLM understood the target language for each specific action point. In `implementation-prompt-generator.ts`, we introduced `detectTargetStack()`. This function scans the action point's description for common file extensions (`.py` for Python, `.go` for Go, `.ts` for TypeScript). If it finds two or more references to a dominant language, that becomes our target stack.
This was crucial. Now, instead of guessing, the LLM had a clear directive.
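For the curious, here's roughly what that detection boils down to. This is a hedged sketch, not our actual implementation: the `.py`/`.go`/`.ts` map and the two-or-more-references threshold come from the behavior described above, while the `Stack` type, signature, and scoring details are illustrative assumptions.

```typescript
// Hypothetical sketch of detectTargetStack() -- the real signature in
// implementation-prompt-generator.ts may differ. It counts file-extension
// hits per language and only commits to a stack at two or more references.
type Stack = 'python' | 'go' | 'typescript';

const EXTENSION_MAP: Record<string, Stack> = {
  '.py': 'python',
  '.go': 'go',
  '.ts': 'typescript',
};

function detectTargetStack(description: string): Stack | null {
  const counts: Partial<Record<Stack, number>> = {};
  for (const [ext, stack] of Object.entries(EXTENSION_MAP)) {
    // Escape the leading dot and require a word boundary after the extension.
    const matches = description.match(new RegExp(`\\${ext}\\b`, 'g'));
    if (matches) counts[stack] = (counts[stack] ?? 0) + matches.length;
  }
  // Rank languages by how often their extensions appear.
  const ranked = (Object.entries(counts) as [Stack, number][])
    .sort((a, b) => b[1] - a[1]);
  // Two or more references to the dominant language make it the target stack.
  return ranked.length > 0 && ranked[0][1] >= 2 ? ranked[0][0] : null;
}
```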
Taming Irrelevant Context: Stack Mismatch Suppression
Even with individual prompts and target stack detection, the `project.wisdom` was still lurking. If an action point needed Python code, but the `project.wisdom` was full of Go patterns, the LLM could still get confused.
Our solution: stack mismatch suppression. In `buildGroupItemPromptInput()`, if the detected target stack didn't match the codebase's primary stack (e.g., a Python task in a TypeScript project), we became ruthless:
- We stripped `claudeMd` and `fileTree` from the context. These contain detailed codebase context that, while usually helpful, could be misleading if the task was in a different language.
- We added an explicit `WARNING` to the LLM, making it abundantly clear that it should not rely on the codebase context for this particular task.
- We kept only a truncated version of `projectWisdom` as a high-level reference, but without the specific, language-bound patterns (a sketch of this suppression logic follows the list).
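Here's what that suppression branch looks like in spirit. The field names `claudeMd`, `fileTree`, and `projectWisdom` are real; the `PromptContext` shape, the truncation length, and the exact warning wording are illustrative assumptions (and it reuses the `Stack` type from the detection sketch above).

```typescript
// Hypothetical sketch of the mismatch branch in buildGroupItemPromptInput().
interface PromptContext {
  claudeMd?: string;
  fileTree?: string;
  projectWisdom?: string;
  warning?: string;
  primaryStack: Stack;
}

function suppressMismatchedContext(
  context: PromptContext,
  targetStack: Stack,
): PromptContext {
  // No mismatch: pass the full context through untouched.
  if (targetStack === context.primaryStack) return context;

  return {
    ...context,
    // Strip detailed codebase context that is bound to the wrong language.
    claudeMd: undefined,
    fileTree: undefined,
    // Keep only a truncated, high-level slice of the project wisdom.
    projectWisdom: context.projectWisdom?.slice(0, 1000),
    // Explicit directive so the LLM ignores project-specific patterns.
    warning:
      `WARNING: This action point targets ${targetStack}, but the codebase ` +
      `is primarily ${context.primaryStack}. Do not rely on codebase-specific ` +
      `patterns; use general ${targetStack} best practices instead.`,
  };
}
```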
This was a game-changer. It forced the LLM to focus purely on the action point's requirements and its own general knowledge for the specified language, unburdened by potentially conflicting project-specific details.
The New System Prompt: GROUP_ITEM_IMPLEMENTATION_SYSTEM
We also refined our system prompt, `GROUP_ITEM_IMPLEMENTATION_SYSTEM`. It's now scoped precisely to single action points, aiming for 200-400 lines of output, explicitly respects the target stack, and notes any prerequisites. This provides a tight, focused instruction set for the LLM.
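The real prompt text lives in our codebase, but a sketch along these lines captures the constraints just listed; the wording here is illustrative, not the actual prompt:

```typescript
// Illustrative reconstruction of GROUP_ITEM_IMPLEMENTATION_SYSTEM. Only the
// constraints (single action point, 200-400 lines, target stack,
// prerequisites) are real; the phrasing is an assumption.
const GROUP_ITEM_IMPLEMENTATION_SYSTEM = `
You are generating an implementation prompt for exactly ONE action point.
- Scope: address only this action point; do not plan for its siblings.
- Length: aim for 200-400 lines of output.
- Language: write all code in the specified target stack. Do not switch
  languages even if the surrounding context suggests otherwise.
- Prerequisites: note any action points or setup steps this one depends on.
`.trim();
```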
Verification: A Taste of Success
The results were immediate and satisfying. Using workflow 20b14663 as our test case:
- 10 out of 10 action points now received individual implementation prompts (a massive improvement from 1/10 previously).
- We saw 26 Python code blocks generated (up from 0 before).
- Go blocks were correctly limited to specific action points (#6 Denial-of-Wallet, CLI and #10 Evaluation Suite, CLI) that genuinely required Go.
This confirmed our "fan-out" and "stack detection" mechanisms were working beautifully for the implementation stage.
The Lingering Ghost: Unfiltered Wisdom
However, the verification also revealed the next battlefront. Even in the correctly identified Go blocks, the code patterns seemed to lean heavily on the CodeMCP Go patterns we knew were in `project.wisdom`.
This highlighted a critical gap: while we'd fixed the prompts for implementation, `project.wisdom` was still injecting 16K characters of Go-specific patterns into ALL workflow stages: enrichment, per-item steps, synthesis, and then implementation prompts.
The root cause was clear: our `code_patterns` table contains 20+ Go-specific entries, loaded indiscriminately via `loadProjectWisdom()` in `workflow-engine.ts` and `note-enrichment.ts`.
- Impact on enrichment: Minimal, as the LLM's system prompt here focuses on extracting items, not coding.
- Impact on workflow steps: Moderate, as per-item plans might still incorporate Go-flavored architectural suggestions, even for Python tasks.
- Impact on implementation prompts: Severe (now largely mitigated by our recent changes, but still a factor if the explicit `WARNING` isn't triggered).
The lesson is stark: Context is king, but relevant context is emperor. An LLM needs a focused diet, not a firehose of potentially conflicting information, especially when it comes to specific code patterns.
What's Next: Taming the Wisdom Stream
Our immediate next steps are clear:
- Build a wisdom filter: Add stack-aware filtering to `loadProjectWisdom()` in `workflow-engine.ts`. We'll detect the target stack (from action point descriptions or workflow input) and filter out `code_patterns` that don't match (see the sketch after this list).
- Filter enrichment too: Apply the same filtering logic to `note-enrichment.ts` where `code_patterns` are loaded.
- Comprehensive comparison: After deployment, we'll create a new enrichment from an original note and conduct a full 0-100 comparison test: old (Go-contaminated) enrichment vs. new (filtered) enrichment, tracing its impact through action points, workflow steps, and final implementation prompts.
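To give a feel for step one, here's a minimal sketch of what that filter could look like (again reusing the `Stack` type from earlier). The `CodePattern` shape and the per-row language tag are assumptions about our schema; the core idea is simply dropping patterns that don't match the detected stack.

```typescript
// Hypothetical sketch of the planned stack-aware filter for
// loadProjectWisdom() in workflow-engine.ts.
interface CodePattern {
  language: string; // e.g. 'go', 'python', 'typescript'
  title: string;
  snippet: string;
}

function filterWisdomByStack(
  patterns: CodePattern[],
  targetStack: Stack | null,
): CodePattern[] {
  // Without a confident stack detection, pass everything through unchanged.
  if (!targetStack) return patterns;
  // Otherwise keep only patterns tagged with the target language, so Go
  // patterns stop leaking into Python tasks (and vice versa) at every stage.
  return patterns.filter((p) => p.language === targetStack);
}
```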
This journey is a reminder that building intelligent systems isn't just about throwing more data at an LLM. It's about careful curation, surgical precision in prompt engineering, and an iterative approach to context management. We're teaching our AI to speak the right language, one filtered wisdom nugget at a time.
```json
{
  "thingsDone": [
    "Fan-out implementation prompts for group workflows (one prompt per action point)",
    "Target stack detection (Python, Go, TypeScript) based on file extensions in action points",
    "Stack mismatch suppression: stripping irrelevant codebase context and adding explicit LLM warnings for language mismatches",
    "Refined GROUP_ITEM_IMPLEMENTATION_SYSTEM system prompt for focused, scoped output",
    "Deployment of fan-out and stack mismatch fixes to production"
  ],
  "pains": [
    "Monolithic prompts leading to incorrect language generation by LLM",
    "Indiscriminate injection of 16K chars of Go-specific CodeMCP patterns into ALL workflow stages via project.wisdom",
    "Go patterns polluting enrichment and per-item workflow steps, even after implementation prompt fixes"
  ],
  "successes": [
    "10/10 action points receiving individual implementation prompts",
    "Correct generation of Python code blocks (26 vs 0 before)",
    "Go code blocks correctly limited to specific Go-requiring action points",
    "Successful deployment and verification of fan-out and stack mismatch fixes"
  ],
  "techStack": [
    "TypeScript",
    "LLM (ClaudeMd)",
    "Workflow Engine",
    "Go (for specific project context/patterns)",
    "Python (for specific project context/patterns)"
  ]
}
```