Context is King, But Filtered Context Reigns: Taming LLMs for Multi-Language Code Generation
Building AI-powered developer tools means wrangling context. We tackled a critical challenge: ensuring our LLM agent generates code in the correct target language, even when its 'wisdom' tries to lead it astray. Learn how we implemented fan-out prompts, target stack detection, and crucial context filtering.
Building intelligent developer agents is a fascinating journey, often filled with subtle challenges. One of the most persistent hurdles we face is ensuring our large language models (LLMs) operate with the right context at the right time. Imagine an AI assistant meant to help you build a Python application, but it keeps suggesting Go patterns because it's been "over-educated" on Go projects. Frustrating, right?
That's precisely the challenge we recently tackled in our AI-powered development workflow. Our goal is to empower developers by automating complex tasks, from planning to code implementation. But a key piece of that puzzle was breaking down monolithic prompts and ensuring language fidelity.
The Problem: When "Wisdom" Becomes a Burden
Our platform uses a concept called project.wisdom – a curated knowledge base of code patterns, architectural styles, and best practices specific to a codebase or project. It's meant to be a helpful guide for the LLM at various stages of a workflow.
However, we discovered a critical flaw: this wisdom, while generally useful, was being injected unfiltered into all stages of our workflow. Specifically, our project.wisdom contained a significant amount of Go-specific code patterns (think Cobra CLI, slog, sync.Once) derived from analyzing a Go codebase (CodeMCP).
The impact was subtle at first, then became glaring:
- Enrichment: Minimal impact, as the LLM focused on extracting action points from the user's note, not from the wisdom patterns.
- Workflow Steps: Moderate. Per-item plans generated by the LLM might subtly incorporate Go-flavored architectural suggestions, even for a Python project.
- Implementation Prompts (CRITICAL): Severe. When it came time to generate actual code, the LLM was often defaulting to Go, even when the task clearly implied Python. This meant wasted tokens, incorrect outputs, and a broken developer experience. Our agent, despite being given a Python task, was "thinking" in Go.
The root cause was clear: project.wisdom was a firehose, not a filtered stream.
Our Solution: Precision Context and Language Scoping
To combat this, we rolled out a series of targeted improvements, focusing on two main areas: breaking down complex tasks and intelligently filtering context.
1. Fan-Out Implementation Prompts: From Monolith to Micro-Tasks
Previously, our group workflows generated a single, monolithic implementation prompt for an entire set of actions. This made it difficult for the LLM to focus and often led to generic or incorrect outputs.
Our first step was to refactor this. Now, group workflows generate one distinct implementation prompt per action point. These are presented to the user as individual tabs in the UI, making the output manageable and highly focused. This change alone dramatically improved the LLM's ability to tackle specific tasks.
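To make the fan-out concrete, here is a minimal sketch of what "one prompt per action point" can look like. The interfaces and function names are illustrative assumptions, not our actual implementation:

```typescript
// Hypothetical fan-out: build one focused implementation prompt per
// action point instead of a single monolithic prompt for the group.
interface ActionPoint {
  id: string;
  description: string;
  referencedFiles: string[];
}

interface ImplementationPrompt {
  actionId: string;
  title: string;
  body: string;
}

function buildImplementationPrompts(actions: ActionPoint[]): ImplementationPrompt[] {
  return actions.map((action) => ({
    actionId: action.id,
    title: `Implement: ${action.description}`,
    body: [
      "## Task",
      action.description,
      "## Referenced files",
      ...action.referencedFiles.map((f) => `- ${f}`),
    ].join("\n"),
  }));
}
```

Each resulting prompt maps directly to one UI tab, so the LLM (and the developer) only ever deals with a single, well-scoped task at a time.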
2. Intelligent Target Stack Detection
With individual action points, we could now apply more granular intelligence. We implemented a detectTargetStack() function that scans the action point's description and referenced files (e.g., .py for Python, .go for Go, .ts for TypeScript). If a dominant language emerges (e.g., two or more Python file references), we tag that action point with its target stack.
3. Stack Mismatch Suppression: The Context Stripper
This was a game-changer. When our system identifies that an action point's target stack does not match the primary codebase stack (e.g., a Go-based action point in a predominantly Python project), we take drastic measures:
- We strip irrelevant context like claudeMd (general markdown descriptions) and the full fileTree from the LLM's input.
- We add an explicit WARNING to the LLM, making it clear that it's operating outside the main codebase's language.
- We keep only a truncated version of projectWisdom as a high-level reference, without the problematic code patterns.
This ensures that even if a Go task appears in a Python project, the LLM isn't deluged with Python-specific context that would confuse it, and vice-versa.
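A condensed sketch of the suppression step is below. The field names (claudeMd, fileTree, projectWisdom) follow this post; the context shape, truncation length, and warning wording are assumptions:

```typescript
// Illustrative context stripper: on a stack mismatch, drop claudeMd and
// the full fileTree, inject an explicit warning, and keep only a
// truncated slice of the project wisdom.
interface LlmContext {
  claudeMd?: string;
  fileTree?: string;
  projectWisdom?: string;
  warning?: string;
}

function suppressMismatchedContext(
  ctx: LlmContext,
  targetStack: string,
  codebaseStack: string,
): LlmContext {
  if (targetStack === codebaseStack) return ctx; // no mismatch, pass through

  return {
    warning:
      `WARNING: this action point targets ${targetStack}, but the primary ` +
      `codebase is ${codebaseStack}. Ignore ${codebaseStack}-specific conventions.`,
    // High-level reference only; truncation length is an arbitrary example.
    projectWisdom: ctx.projectWisdom?.slice(0, 500),
  };
}
```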
4. Scoped System Prompts
We also refined our LLM system prompt for these granular tasks. The new GROUP_ITEM_IMPLEMENTATION_SYSTEM is specifically designed for single action points, guiding the LLM to produce focused outputs (200-400 lines), respect the target stack, and note any prerequisites.
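As a rough illustration of that scoping (this does not reproduce the actual GROUP_ITEM_IMPLEMENTATION_SYSTEM wording, which is internal), a single-action system prompt might be assembled like this:

```typescript
// Hypothetical builder for a single-action-point system prompt,
// parameterized by the detected target stack.
function buildGroupItemSystemPrompt(targetStack: string): string {
  return [
    "You are implementing ONE action point, not an entire plan.",
    `Write code in ${targetStack} only; do not switch languages.`,
    "Keep the output focused: roughly 200-400 lines.",
    "Note any prerequisites (new dependencies, migrations) before the code.",
  ].join("\n");
}
```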
The Results: A Leap Forward
After deploying these changes, the improvements were immediate and measurable:
- 10/10 action points in a test workflow now received individual, focused implementation prompts (compared to 1/10 before).
- We saw a significant increase in correct Python code blocks generated (26 vs. 0 previously).
- Crucially, Go code blocks were correctly limited to the specific action points that truly required Go (e.g., CLI tools written in Go), rather than appearing everywhere.
While the Go blocks in those specific action points still seemed to draw from our project.wisdom (injecting Go patterns), the suppression mechanism prevented them from polluting other, non-Go tasks. This validated our approach: we successfully localized the "Go thinking" to where it was actually needed.
The Next Frontier: Proactive Wisdom Filtering
While we've fixed the implementation prompt stage, our journey isn't over. We identified a critical gap: the project.wisdom firehose still flows unfiltered into earlier stages, specifically the loadProjectWisdom() function that enriches initial notes and the per-item workflow steps.
This means that while our final code generation is now much cleaner, the LLM might still receive Go-flavored architectural advice during its planning stages, even for a Python project.
Our immediate next steps are to implement a stack-aware wisdom filter directly at the source:
- Filter loadProjectWisdom(): Detect the overall target stack of the workflow and filter code_patterns from project.wisdom to include only those relevant to the identified stack.
- Filter note-enrichment.ts: Apply the same filtering logic to the code_patterns loaded during the initial note enrichment phase.
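The planned filter could look something like the sketch below. The shape of project.wisdom (code_patterns entries tagged with a language) is an assumption for illustration:

```typescript
// Hedged sketch of a stack-aware wisdom filter: keep only the
// code_patterns whose language matches the detected target stack.
interface CodePattern {
  language: string;
  name: string;
  snippet: string;
}

interface ProjectWisdom {
  overview: string;
  code_patterns: CodePattern[];
}

function filterWisdomForStack(wisdom: ProjectWisdom, stack: string): ProjectWisdom {
  return {
    ...wisdom,
    code_patterns: wisdom.code_patterns.filter(
      (p) => p.language.toLowerCase() === stack.toLowerCase(),
    ),
  };
}
```

Because the same function can be called from both loadProjectWisdom() and note-enrichment.ts, the filtering stays consistent across every stage that consumes the wisdom.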
Once these filters are in place, we'll run a full end-to-end comparison, generating a new enrichment from an original note and observing its impact on action points, workflow steps, and implementation prompts. The goal is a truly clean, language-specific context from start to finish.
Lessons Learned
This session reinforced a crucial principle in building LLM-powered tools: context is king, but filtered context reigns supreme. Providing an LLM with a vast amount of potentially relevant information without intelligent filtering can be more detrimental than providing too little. Precision in context management, especially across different programming languages, is paramount for building reliable and effective AI developer agents.
We're excited about these advancements and the even smarter, more context-aware AI agents they're helping us build!