nyxcore-systems
6 min read

Unleashing Our AI Workflow: Conquering Context, Compatibility, and Code Pollution

A deep dive into how we debugged and deployed critical fixes to our AI-powered development workflow: addressing cross-stack code pattern pollution, OpenAI reasoning model quirks, and restrictive token limits, culminating in the successful generation of an 18-action implementation plan.

AI, LLM, Workflow Automation, Backend, TypeScript, OpenAI, Gemini, Code Generation, Developer Productivity

The past 24 hours have been a whirlwind of focused development, culminating in a significant upgrade to our AI-powered code generation pipeline. We tackled some persistent architectural challenges, refined our interaction with large language models, and supercharged our workflow's capacity to deliver comprehensive implementation plans. The good news? All critical fixes are now live in production, and our latest 18-action workflow has successfully generated a massive 179,000-character implementation plan – a true testament to the power of these improvements.

Let's break down how we got here.

The Core Challenges We Faced

Our primary goals for this session were threefold:

  1. Eliminate Cross-Stack Code Pattern Pollution: Our system was sometimes suggesting code patterns from irrelevant tech stacks, leading to noise and reduced AI effectiveness.
  2. Optimize OpenAI Reasoning Model Compatibility: We discovered subtle differences in how OpenAI's specialized reasoning models handle parameters, impacting their performance.
  3. Break Through Workflow Step Token Limits: Complex projects require extensive context, and our previous token limits were a bottleneck for generating detailed plans.

With these targets in sight, here's how we engineered the solutions.

Engineering the Solutions

1. Intelligent Stack Detection and Code Pattern Filtering

The problem of "Go-pattern pollution" (where our Go-based CodeMCP would sometimes surface irrelevant Go code patterns in, say, a TypeScript project) was a nagging issue. Our AI workflow relies on relevant code patterns to inform its suggestions, and mismatched patterns dilute its intelligence.

Our solution involved introducing a new layer of intelligence:

  • detectTargetStackFromContent(): This new utility in src/server/services/stack-detector.ts now intelligently scans the content of a project note or text for tech keywords (e.g., "TypeScript," "React," "Go," "Python"). It then determines the dominant tech stack.
  • filterReposByTargetStack(): A generic helper was added to stack-detector.ts to drop repositories whose primary language doesn't match the detected target stack.
  • Integrated Filtering: We updated src/server/services/note-enrichment.ts to detect the target stack from note content and filter mismatched repos before loading code patterns. Similarly, src/server/services/workflow-engine.ts's loadProjectWisdom() now applies the same filtering, detecting the stack from the five most recent project notes.
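
To make the mechanism concrete, here is a minimal sketch of what the detection and filtering helpers described above could look like. The keyword lists, the Repo shape, and the matching logic are illustrative assumptions, not the actual contents of stack-detector.ts:

```typescript
// Hypothetical sketch of the stack-detection helpers; keyword lists and the
// Repo shape are illustrative, not the real implementation.
type Stack = 'typescript' | 'go' | 'python' | 'unknown';

interface Repo {
  name: string;
  primaryLanguage: string; // e.g. "TypeScript", "Go"
}

const STACK_KEYWORDS: Record<Exclude<Stack, 'unknown'>, string[]> = {
  typescript: ['typescript', 'react', 'next.js', 'node'],
  go: ['golang', ' go ', 'goroutine'],
  python: ['python', 'django', 'fastapi'],
};

export function detectTargetStackFromContent(content: string): Stack {
  const text = content.toLowerCase();
  let best: Stack = 'unknown';
  let bestHits = 0;
  // Pick the stack whose keywords appear most often in the note text.
  for (const [stack, keywords] of Object.entries(STACK_KEYWORDS)) {
    const hits = keywords.filter((kw) => text.includes(kw)).length;
    if (hits > bestHits) {
      best = stack as Stack;
      bestHits = hits;
    }
  }
  return best;
}

export function filterReposByTargetStack(repos: Repo[], target: Stack): Repo[] {
  if (target === 'unknown') return repos; // no confident signal: keep everything
  return repos.filter((r) => r.primaryLanguage.toLowerCase() === target);
}
```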

Impact: This ensures our AI is fed highly relevant code patterns, drastically improving the quality and precision of its output. We even documented this fix in our internal reports (docs/reports/2026-03-15-image-to-implementation-pipeline-poc.md) as a critical enhancement.

Commit: f12153b

2. Taming LLM Reasoning Models: A Nuance in Parameters

Working with various LLM providers means understanding their unique quirks. We discovered that OpenAI's specialized "reasoning models" (like o1, o3, o4-mini) behave differently when it comes to token limits and generation parameters.

  • isReasoningModel() Detection: We added a simple regex (/^o\d/) to src/server/services/llm/adapters/openai.ts to identify these models.
  • Parameter Refinement: For reasoning models, we now explicitly use max_completion_tokens instead of the more general max_tokens. Crucially, we also omit the temperature and top_p sampling parameters for these models, which the o-series reasoning models do not support.
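
A minimal sketch of how the adapter branch might look. The request-building function and its parameter names are assumptions; only the /^o\d/ check, the max_completion_tokens switch, and the skipped sampling parameters reflect the actual fix:

```typescript
// Illustrative sketch of reasoning-model handling in the OpenAI adapter.
function isReasoningModel(model: string): boolean {
  return /^o\d/.test(model); // o1, o3, o4-mini, ...
}

interface LlmCallParams {
  model: string;
  maxTokens: number;
  temperature?: number;
  topP?: number;
}

function buildOpenAIRequestBody(params: LlmCallParams): Record<string, unknown> {
  const body: Record<string, unknown> = { model: params.model };

  if (isReasoningModel(params.model)) {
    // Reasoning models use max_completion_tokens and don't accept sampling params.
    body.max_completion_tokens = params.maxTokens;
  } else {
    body.max_tokens = params.maxTokens;
    if (params.temperature !== undefined) body.temperature = params.temperature;
    if (params.topP !== undefined) body.top_p = params.topP;
  }

  return body;
}
```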

Impact: This fine-tuning ensures we're interacting with OpenAI's specialized models in the most effective way, optimizing their output for reasoning tasks and avoiding unnecessary overhead.

Commit: bc45174

3. Supercharging Context: Bumping Workflow Token Limits

One of the most immediate and impactful changes was increasing our workflow step token limits. Our previous default of 4096 tokens was simply not enough to capture the depth and detail required for complex project implementation plans.

  • Increased Default: We updated the step schema default in src/server/trpc/routers/workflows.ts from 4096 to a far more generous 16384.
  • Production Patch: To apply this to the workflow already in flight, we patched all 20 steps of workflow 20919e29 via a direct SQL command on production, resetting them to pending for a re-run.
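
For illustration, the schema change might look roughly like the following, assuming a zod-based step input schema; every field here other than the token-limit default is hypothetical:

```typescript
import { z } from 'zod';

// Hypothetical sketch of the step schema in workflows.ts.
const workflowStepSchema = z.object({
  name: z.string(),
  prompt: z.string(),
  model: z.string().default('google/gemini-2.5-pro'),
  // Previously .default(4096); complex implementation plans were getting truncated.
  maxTokens: z.number().int().positive().default(16384),
});
```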

Impact: This expanded context window directly translated into the ability to generate far more comprehensive and detailed implementation plans, moving from an average of 145K characters to nearly 180K characters in our latest run.

Commit: e421244

The Grand Verification: A Successful 18-Action Workflow

The ultimate test of these fixes was to run a demanding, real-world workflow. We selected workflow 20919e29, a "feat: project-onboarding, improvement" task with 18 distinct actions.

Here's how it performed:

  • Full Execution: The workflow, comprising 21 steps (Group Analysis, 18 implementation steps, Synthesis, and Implementation Prompt generation), completed flawlessly.
  • Model of Choice: All steps were executed using google/gemini-2.5-pro, which proved to be a reliable and capable model for this scale.
  • Efficiency: The entire process cost a mere $0.51 and completed in approximately 14 minutes.
  • Massive Output: The final implementation prompt (step 20) generated an impressive 179,427 characters of detailed plans – a significant leap from the 145K characters generated in previous runs before the token limit increase.
  • Complete Coverage: Crucially, all 18 action points identified in the initial analysis had a corresponding implementation plan heading, confirming 100% coverage.
  • Intelligent Planning: The Group Analysis step produced a clean dependency graph with four distinct phases and clearly identified parallel tracks, while the Synthesis step consolidated these into a cohesive plan with a phased timeline and cross-cutting concerns.

This successful run validates all our recent efforts, proving that our AI workflow is now more robust, intelligent, and capable than ever.

Challenges and Lessons Learned

No development session is without its bumps. Here are a few critical lessons from this round:

  1. Navigating LLM Rate Limits: The OpenAI o3 Encounter: We initially attempted to use OpenAI's o3 model for a Synthesis review step (step 20). It quickly failed with a 429 rate limit error: "Request too large for o3: Limit 30000 TPM, Requested 33687."

    • Lesson: OpenAI's "reasoning" models like o3 often have lower Tokens Per Minute (TPM) limits, making them unsuitable for steps with very large contexts. Always be mindful of model-specific rate limits and context window capabilities.
    • Workaround: We switched to google/gemini-2.5-pro and retried the step, which handled the large context without issue (a sketch of this fallback pattern follows this list).
  2. Database Migration Discipline: The db push Danger: This is a recurring, critical reminder: NEVER use db push or db push --accept-data-loss on production. It has a nasty habit of dropping pgvector embedding columns, leading to data loss.

    • Lesson: Always use controlled, safe migration scripts like ./scripts/db-migrate-safe.sh for production database updates to preserve data integrity.
  3. Resource Management: Anthropic API Credits: Our Anthropic API credits ran out during the session, causing Haiku digest generation to fall back to truncation.

    • Lesson: Keep a close eye on API credit usage for all external services to ensure uninterrupted workflow execution. A top-up is needed!
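
For completeness, here is a minimal sketch of the kind of fallback described in lesson 1: catch a 429 from a TPM-limited model and retry the step on a model with a larger context budget. The error shape and the runStep signature are assumptions, not our actual workflow-engine code:

```typescript
// Illustrative rate-limit fallback: retry a failed step on a fallback model.
async function runStepWithFallback(
  runStep: (model: string) => Promise<string>,
  primaryModel: string,
  fallbackModel = 'google/gemini-2.5-pro',
): Promise<string> {
  try {
    return await runStep(primaryModel);
  } catch (err: unknown) {
    const status = (err as { status?: number }).status;
    if (status === 429) {
      // Rate-limited (e.g. "Request too large for o3"): retry on the fallback model.
      return await runStep(fallbackModel);
    }
    throw err;
  }
}
```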

What's Next? Full Speed Ahead!

With the core workflow issues addressed and verified, our immediate focus shifts to implementation:

  1. Start Implementing from the 18 Plans: The priority order from our Group Analysis is clear: Phase 1 (Centralize authentication, Persist onboarding state, File scanning pipeline, SSE streaming) and parallel tracks (Git filesystem explorer, Letters submenu, Demo apps).
  2. Top Up Anthropic Credits: Restore full functionality for digest generation and review steps.
  3. Fix Pre-existing .images TypeScript Errors: A quick npx prisma generate or a local db push might be needed to resolve some lingering LSP errors related to the NoteImage model.
  4. Extract Implementation Prompt: For easy reference, we'll pull the massive implementation plan directly from the database to /tmp/workflow-impl-v2.md on the production server.

This session marks a significant leap forward in our ability to leverage AI for complex software development. By meticulously addressing technical debt and optimizing our LLM interactions, we've built a more reliable, intelligent, and productive pipeline. The future of AI-assisted development is looking brighter than ever!