Unleashing Our AI Workflow: Conquering Context, Compatibility, and Code Pollution
A deep dive into how we debugged and deployed critical fixes to our AI-powered development workflow, addressing cross-stack code pattern pollution, LLM reasoning model quirks, and crucial token limit constraints, culminating in a successful 18-action implementation plan generation.
The past 24 hours have been a whirlwind of focused development, culminating in a significant upgrade to our AI-powered code generation pipeline. We tackled some persistent architectural challenges, refined our interaction with large language models, and supercharged our workflow's capacity to deliver comprehensive implementation plans. The good news? All critical fixes are now live in production, and our latest 18-action workflow has successfully generated a massive 179,000-character implementation plan – a true testament to the power of these improvements.
Let's break down how we got here.
The Core Challenges We Faced
Our primary goals for this session were threefold:
- Eliminate Cross-Stack Code Pattern Pollution: Our system was sometimes suggesting code patterns from irrelevant tech stacks, leading to noise and reduced AI effectiveness.
- Optimize OpenAI Reasoning Model Compatibility: We discovered subtle differences in how OpenAI's specialized reasoning models handle parameters, impacting their performance.
- Break Through Workflow Step Token Limits: Complex projects require extensive context, and our previous token limits were a bottleneck for generating detailed plans.
With these targets in sight, here's how we engineered the solutions.
Engineering the Solutions
1. Intelligent Stack Detection and Code Pattern Filtering
The problem of "Go-pattern pollution" (where our Go-based CodeMCP would sometimes surface irrelevant Go code patterns in, say, a TypeScript project) was a nagging issue. Our AI workflow relies on relevant code patterns to inform its suggestions, and mismatched patterns dilute its intelligence.
Our solution involved introducing a new layer of intelligence:
- `detectTargetStackFromContent()`: This new utility in `src/server/services/stack-detector.ts` now intelligently scans the content of a project note or text for tech keywords (e.g., "TypeScript," "React," "Go," "Python"). It then determines the dominant tech stack.
- `filterReposByTargetStack()`: A generic helper was added to `stack-detector.ts` to drop repositories whose primary language doesn't match the detected target stack.
- Integrated Filtering: We updated `src/server/services/note-enrichment.ts` to detect the target stack from note content and filter mismatched repos before loading code patterns. Similarly, `src/server/services/workflow-engine.ts`'s `loadProjectWisdom()` now applies the same filtering, detecting the stack from the five most recent project notes.
Impact: This ensures our AI is fed highly relevant code patterns, drastically improving the quality and precision of its output. We even documented this fix in our internal reports (docs/reports/2026-03-15-image-to-implementation-pipeline-poc.md) as a critical enhancement.
Commit: f12153b
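The detection-and-filter idea can be sketched roughly as follows. This is a hypothetical simplification, not the actual implementation: the function names mirror the real utilities in `stack-detector.ts`, but the keyword table, `Repo` shape, and scoring logic are assumptions for illustration.

```typescript
// Hypothetical keyword table; the real detector's vocabulary is richer.
const STACK_KEYWORDS: Record<string, string[]> = {
  typescript: ["typescript", "react", "tsx", "node"],
  go: ["golang", "goroutine", " go "],
  python: ["python", "django", "pandas"],
};

// Pick the stack whose keywords appear most often in the note content.
function detectTargetStackFromContent(content: string): string | null {
  const text = ` ${content.toLowerCase()} `;
  let best: string | null = null;
  let bestHits = 0;
  for (const [stack, keywords] of Object.entries(STACK_KEYWORDS)) {
    const hits = keywords.filter((kw) => text.includes(kw)).length;
    if (hits > bestHits) {
      best = stack;
      bestHits = hits;
    }
  }
  return best;
}

interface Repo {
  name: string;
  primaryLanguage: string; // assumed field; real repo metadata may differ
}

// Drop repos whose primary language disagrees with the detected stack.
function filterReposByTargetStack(repos: Repo[], targetStack: string | null): Repo[] {
  if (!targetStack) return repos; // no signal: keep everything
  return repos.filter((r) => r.primaryLanguage.toLowerCase() === targetStack);
}
```

The key design point is the `null` fallback: when no stack is detected, nothing is filtered out, so ambiguous notes degrade gracefully instead of losing patterns.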
2. Taming LLM Reasoning Models: A Nuance in Parameters
Working with various LLM providers means understanding their unique quirks. We discovered that OpenAI's specialized "reasoning models" (like o1, o3, o4-mini) behave differently when it comes to token limits and generation parameters.
- `isReasoningModel()` Detection: We added a simple regex (`/^o\d/`) to `src/server/services/llm/adapters/openai.ts` to identify these models.
- Parameter Refinement: For reasoning models, we now explicitly use `max_completion_tokens` instead of the more general `max_tokens`. Crucially, we also skip the `temperature` and `top_p` parameters for these models, as they are often counterproductive for tasks requiring precise, logical reasoning rather than creative generation.
Impact: This fine-tuning ensures we're interacting with OpenAI's specialized models in the most effective way, optimizing their output for reasoning tasks and avoiding unnecessary overhead.
Commit: bc45174
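In code, the branching looks roughly like this. The regex matches the check described above; the request shape is a deliberately simplified assumption, not our adapter's real types.

```typescript
// Matches OpenAI reasoning models: o1, o3, o4-mini, etc.
function isReasoningModel(model: string): boolean {
  return /^o\d/.test(model);
}

// Simplified request shape for illustration only.
interface ChatParams {
  model: string;
  max_tokens?: number;
  max_completion_tokens?: number;
  temperature?: number;
  top_p?: number;
}

function buildParams(model: string, maxTokens: number): ChatParams {
  if (isReasoningModel(model)) {
    // Reasoning models: use max_completion_tokens and omit sampling params.
    return { model, max_completion_tokens: maxTokens };
  }
  // Standard chat models keep the usual knobs (example values).
  return { model, max_tokens: maxTokens, temperature: 0.7, top_p: 1 };
}
```

Note that the check is a prefix match, so a model like `gpt-4o` (which merely contains an "o") is correctly treated as a standard chat model.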
3. Supercharging Context: Bumping Workflow Token Limits
One of the most immediate and impactful changes was increasing our workflow step token limits. Our previous default of 4096 tokens was simply not enough to capture the depth and detail required for complex project implementation plans.
- Increased Default: We updated the step schema default in `src/server/trpc/routers/workflows.ts` from `4096` to a robust `16384`.
- Production Patch: To apply this to our ongoing workflow, we patched all 20 steps of workflow `20919e29` via a direct SQL command on production, resetting them to `pending` for a re-run.
Impact: This expanded context window directly translated into the ability to generate far more comprehensive and detailed implementation plans, moving from an average of 145K characters to nearly 180K characters in our latest run.
Commit: e421244
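The shape of the change is trivial but worth pinning down. The real default lives in the step schema in `src/server/trpc/routers/workflows.ts`; the resolver function below is a hypothetical stand-in to show the fallback behavior.

```typescript
// New default; was 4096 before this change.
const DEFAULT_STEP_MAX_TOKENS = 16384;

// Hypothetical helper: use the caller's explicit limit when given,
// otherwise fall back to the schema default.
function resolveStepTokenLimit(requested?: number): number {
  return requested !== undefined && requested > 0 ? requested : DEFAULT_STEP_MAX_TOKENS;
}
```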
The Grand Verification: A Successful 18-Action Workflow
The ultimate test of these fixes was to run a demanding, real-world workflow. We selected workflow 20919e29, a "feat: project-onboarding, improvement" task with 18 distinct actions.
Here's how it performed:
- Full Execution: The workflow, comprising 21 steps (Group Analysis, 18 implementation steps, Synthesis, and Implementation Prompt generation), completed flawlessly.
- Model of Choice: All steps were executed using
google/gemini-2.5-pro, which proved to be a reliable and capable model for this scale. - Efficiency: The entire process cost a mere $0.51 and completed in approximately 14 minutes.
- Massive Output: The final implementation prompt (step 20) generated an impressive 179,427 characters of detailed plans – a significant leap from the 145K characters generated in previous runs before the token limit increase.
- Complete Coverage: Crucially, all 18 action points identified in the initial analysis had a corresponding implementation plan heading, confirming 100% coverage.
- Intelligent Planning: The Group Analysis step produced a clean dependency graph with four distinct phases and clearly identified parallel tracks, while the Synthesis step consolidated these into a cohesive plan with a phased timeline and cross-cutting concerns.
This successful run validates all our recent efforts, proving that our AI workflow is now more robust, intelligent, and capable than ever.
Challenges and Lessons Learned
No development session is without its bumps. Here are a few critical lessons from this round:
- Navigating LLM Rate Limits: The OpenAI o3 Encounter: We initially attempted to use OpenAI's `o3` model for a Synthesis review step (step 20). It quickly failed with a `429 rate limit` error: "Request too large for o3: Limit 30000 TPM, Requested 33687."
  - Lesson: OpenAI's "reasoning" models like `o3` often have lower Tokens Per Minute (TPM) limits, making them unsuitable for steps with very large contexts. Always be mindful of model-specific rate limits and context window capabilities.
  - Workaround: We seamlessly switched to `google/gemini-2.5-pro` and retried the step, which handled the large context without issue.
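We applied that workaround by hand this time, but it generalizes to a simple fallback wrapper. This is a hedged sketch, not our actual adapter code: `callModel`, the model identifiers as defaults, and the `status` field on the error are all assumptions for illustration.

```typescript
// Stand-in for the real LLM client call.
type CallModel = (model: string, prompt: string) => Promise<string>;

// Try the primary model; on a 429 (rate limit / TPM exceeded),
// retry once on a fallback model with a larger effective capacity.
async function completeWithFallback(
  callModel: CallModel,
  prompt: string,
  primary = "openai/o3",
  fallback = "google/gemini-2.5-pro",
): Promise<{ model: string; output: string }> {
  try {
    return { model: primary, output: await callModel(primary, prompt) };
  } catch (err: any) {
    if (err?.status === 429) {
      // Rate-limited: fall back instead of failing the whole step.
      return { model: fallback, output: await callModel(fallback, prompt) };
    }
    throw err; // anything else is a real error
  }
}
```

Only 429s trigger the retry; other failures (auth errors, timeouts) still surface immediately, so the fallback can't mask genuine misconfiguration.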
- Database Migration Discipline: The `db push` Danger: This is a recurring, critical reminder: NEVER use `db push` or `db push --accept-data-loss` on production. It has a nasty habit of dropping pgvector `embedding` columns, leading to data loss.
  - Lesson: Always use controlled, safe migration scripts like `./scripts/db-migrate-safe.sh` for production database updates to preserve data integrity.
- Resource Management: Anthropic API Credits: Our Anthropic API credits ran out during the session, causing Haiku digest generation to fall back to truncation.
  - Lesson: Keep a close eye on API credit usage for all external services to ensure uninterrupted workflow execution. A top-up is needed!
What's Next? Full Speed Ahead!
With the core workflow issues addressed and verified, our immediate focus shifts to implementation:
- Start Implementing from the 18 Plans: The priority order from our Group Analysis is clear: Phase 1 (Centralize authentication, Persist onboarding state, File scanning pipeline, SSE streaming) and parallel tracks (Git filesystem explorer, Letters submenu, Demo apps).
- Top Up Anthropic Credits: Restore full functionality for digest generation and review steps.
- Fix Pre-existing `.images` TypeScript Errors: A quick `npx prisma generate` or a local `db push` might be needed to resolve some lingering LSP errors related to the `NoteImage` model.
- Extract Implementation Prompt: For easy reference, we'll pull the massive implementation plan directly from the database to `/tmp/workflow-impl-v2.md` on the production server.
This session marks a significant leap forward in our ability to leverage AI for complex software development. By meticulously addressing technical debt and optimizing our LLM interactions, we've built a more reliable, intelligent, and productive pipeline. The future of AI-assisted development is looking brighter than ever!