
From AI Review Glitches to Workflow Guardrails: A Deep Dive into Our Latest System Enhancements

A look behind the scenes at a recent development sprint, where we tackled AI review reliability, fortified our workflow engine with a new consistency check, and learned some crucial lessons along the way.

AI · Workflow Engineering · Debugging · Prisma · PostgreSQL · Backend · Frontend · Developer Experience

Ever had one of those development sessions where you're juggling multiple critical tasks, from squashing elusive AI bugs to architecting entirely new system safeguards? That was our Tuesday morning. We embarked on a mission to harden our AI review processes, elevate the quality of our automated workflows, and introduce a crucial new layer of intelligence to our core workflow engine.

By the time the dust settled, we'd pushed significant improvements to production, ready for our users to experience a more robust and intelligent system. Let's break down what happened.

Fortifying Our AI Review System

Our AI review functionality is a cornerstone of our development process, helping us catch issues early. However, we noticed some areas for improvement, particularly around reliability and clarity.

The Challenge: Users were occasionally encountering "mutated" AI reviews or unclear error messages when the AI couldn't provide suggestions. Under the hood, we found that sometimes our AI providers would fail, or diffs were simply too large for a single prompt.

Our Solution:

  1. Enhanced Provider Fallback: We implemented a more resilient fallback mechanism for our AI review providers. If Google's model struggles, we now gracefully fall back to Anthropic, and then to OpenAI, ensuring we always try our best to get a review (see the sketch after this list). This significantly boosts the reliability of the feature.
  2. Diff Size Cap: Large code diffs can overwhelm AI models, leading to truncated or failed reviews. We introduced a hard cap of 60,000 characters for diffs sent to the AI, preventing these issues and ensuring more focused, actionable feedback.
  3. Clearer Error Propagation: We refined our error handling in parseReviewSuggestions and updated the frontend. Now, instead of ambiguous failures, users will see clear messages like "No issues found" or specific error details if the AI encountered a problem. This transparency is crucial for a smooth developer experience.
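
To make the fallback chain and the cap concrete, here's a minimal sketch of how the two can fit together. The provider order matches the description above, but reviewWithProvider and everything around it is illustrative, not our actual implementation in reviews.ts:

```typescript
// Sketch of a provider fallback chain with a hard diff-size cap.
const MAX_DIFF_CHARS = 60_000; // diffs beyond this overwhelm the models

type Provider = "google" | "anthropic" | "openai";
const FALLBACK_ORDER: Provider[] = ["google", "anthropic", "openai"];

// Placeholder for the real per-provider call in reviews.ts.
async function reviewWithProvider(provider: Provider, _diff: string): Promise<string> {
  throw new Error(`provider ${provider} not wired up in this sketch`);
}

async function reviewDiff(diff: string): Promise<string> {
  if (diff.length > MAX_DIFF_CHARS) {
    // Fail fast with a clear message instead of letting the model truncate.
    throw new Error(
      `Diff is ${diff.length} characters; the AI review cap is ${MAX_DIFF_CHARS}.`
    );
  }

  let lastError: unknown;
  for (const provider of FALLBACK_ORDER) {
    try {
      return await reviewWithProvider(provider, diff);
    } catch (error) {
      lastError = error; // remember the failure, try the next provider
    }
  }
  // Surface the final error so the frontend can show something specific.
  throw new Error(`All AI review providers failed: ${String(lastError)}`);
}
```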

These changes, primarily in src/server/trpc/routers/reviews.ts and src/app/(dashboard)/dashboard/projects/[id]/reviews/[prNumber]/page.tsx, mean a more stable and informative AI review experience for everyone. We also confirmed our OpenAI integration now correctly handles max_completion_tokens and temperature for reasoning models, ensuring optimal performance.
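
On that last point: OpenAI's reasoning models expect max_completion_tokens instead of max_tokens and reject non-default temperature values, so the request parameters have to be built conditionally. A minimal sketch, assuming the official openai Node SDK (the model-name check is a simplification, not our real detection logic):

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function createReview(model: string, prompt: string, maxTokens: number) {
  // Simplified stand-in for however reasoning models are actually detected.
  const isReasoningModel = model.startsWith("o"); // e.g. "o1", "o3-mini"
  return client.chat.completions.create({
    model,
    messages: [{ role: "user", content: prompt }],
    // Reasoning models take max_completion_tokens and no custom temperature.
    ...(isReasoningModel
      ? { max_completion_tokens: maxTokens }
      : { max_tokens: maxTokens, temperature: 0.2 }),
  });
}
```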

Introducing the Workflow Consistency Check: A New Era of Reliability

One of the most exciting developments from this session was the introduction of an auto-generated Consistency Check step within our workflow engine. This feature is a game-changer for maintaining high-quality, reliable automated code generation.

The Problem It Solves: We've seen instances where automated workflows, especially those generating multiple code components, could introduce subtle issues:

  • File Path Conflicts: Two generated components might try to write to the same file.
  • Duplicate Implementations: Redundant code being generated.
  • Pattern Inconsistencies: New code not adhering to existing project standards.
  • Dependency Violations: Generated code requiring libraries or components that don't exist or aren't properly configured.

Our analysis of an older workflow (a781148f) revealed these exact issues – wrong codebase context injected, duplicates, and file conflicts. This underscored the critical need for proactive validation.

How the Consistency Check Works:

  1. Pre-Fan-Out Analysis: This new step runs before our workflow engine "fans out" to generate individual implementation prompts. This means it analyzes the plans for each component, not the final generated code. This allows us to catch potential issues much earlier.
  2. Intelligent Review: Using Claude Sonnet (with fallbacks to Gemini and GPT), the check meticulously reviews each planned implementation for the issues listed above.
  3. Prompt Injection: The results of this consistency check are then injected directly into each subsequent fan-out prompt via a consistencyCheck parameter. This provides critical context to the AI generating the actual code, allowing it to self-correct and avoid introducing known issues (a sketch follows below).
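
In code terms, the flow looks roughly like the sketch below. All of the names here are illustrative; the real logic lives in workflow-engine.ts and implementation-prompt-generator.ts, and the real check is an AI review rather than the simple file-conflict scan used as a placeholder:

```typescript
// Illustrative shape of the pre-fan-out consistency check.
interface ComponentPlan {
  name: string;
  targetFiles: string[];
  summary: string;
}

interface ConsistencyCheckResult {
  fileConflicts: string[];       // files claimed by more than one component
  duplicateComponents: string[]; // overlapping implementations
  notes: string[];               // pattern/dependency warnings for the AI
}

async function runConsistencyCheck(plans: ComponentPlan[]): Promise<ConsistencyCheckResult> {
  // Placeholder: the real step asks Claude Sonnet (falling back to Gemini,
  // then GPT) to review every plan. Here we only detect file-path conflicts.
  const claimed = new Set<string>();
  const fileConflicts: string[] = [];
  for (const plan of plans) {
    for (const file of plan.targetFiles) {
      if (claimed.has(file)) fileConflicts.push(file);
      claimed.add(file);
    }
  }
  return { fileConflicts, duplicateComponents: [], notes: [] };
}

function generateImplementationPrompt(
  plan: ComponentPlan,
  opts: { consistencyCheck: ConsistencyCheckResult }
): string {
  // The findings ride along inside each fan-out prompt.
  return `${plan.summary}\n\nConsistency check results:\n${JSON.stringify(opts.consistencyCheck, null, 2)}`;
}

// The check runs once over all plans, before fan-out; every generated
// prompt then carries the same consistencyCheck context.
async function fanOut(plans: ComponentPlan[]): Promise<string[]> {
  const consistencyCheck = await runConsistencyCheck(plans);
  return plans.map((plan) => generateImplementationPrompt(plan, { consistencyCheck }));
}
```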

This feature, built into src/server/services/workflow-engine.ts and src/server/services/implementation-prompt-generator.ts, acts as an intelligent guardrail, significantly reducing the likelihood of errors and improving the overall quality of generated code. We also increased maxTokens from 4096 to 8192 for all steps of our critical BRbase workflow (cce03fe4) to accommodate more complex generation tasks.
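
For the curious, a bulk config change like that maxTokens bump is a one-liner through Prisma. This is a sketch only: workflowStep, workflowId, and maxTokens are assumed model and field names mapped onto the workflow_steps table, not confirmed ones:

```typescript
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// Assumed model/field names; adjust to the actual Prisma schema.
await prisma.workflowStep.updateMany({
  where: { workflowId: "cce03fe4-a5a0-4933-ae3d-5de5199fb971" },
  data: { maxTokens: 8192 },
});
```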

Lessons Learned: Navigating Production Database Waters

No deep dive is complete without a few bumps in the road. These "pain points" quickly became invaluable "lessons learned," especially when dealing with production environments.

1. The Peril of prisma db push on Production

The Attempt: To apply some minor schema changes, we initially tried to use prisma db push directly on our production database.

The Failure: A user immediately interrupted, recalling a critical warning in our internal CLAUDE.md documentation: prisma db push drops pgvector columns when making certain schema changes that it can't safely migrate. This would have meant significant data loss for our embeddings.

The Lesson: NEVER use prisma db push on production; reserve it for initial schema setup and environments where data loss is acceptable. For production, always use safe migration scripts generated by Prisma Migrate or, in a pinch for very specific surgical changes, direct SQL. Our workaround was to log into the PostgreSQL container via docker exec nyxcore-postgres-1 psql and execute raw ALTER TABLE statements.

2. Prisma camelCase vs. DB snake_case in Raw SQL

The Attempt: When writing a raw SQL query, we tried to reference a column as "isPersonal".

The Failure: The query failed. A quick check of the database confirmed the column's actual name was is_personal. Prisma's client uses camelCase in our TypeScript code, but the underlying database schema uses snake_case.

The Lesson: When executing raw SQL, always use the actual database column names (which are typically snake_case in PostgreSQL). Don't let Prisma's ORM abstraction lead you astray when you're going directly to the metal.
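
When raw SQL is unavoidable, Prisma's $queryRaw works fine; you just have to write the database's names rather than the client's. A quick sketch (the projects table name is an assumption for illustration):

```typescript
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// Raw SQL speaks the database's dialect: the column is is_personal here,
// even though the Prisma client exposes it as `isPersonal`.
const personalProjects = await prisma.$queryRaw<
  { id: string; name: string }[]
>`SELECT id, name FROM projects WHERE is_personal = true`;
```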

3. Reserved Keywords and Column Naming

The Attempt: We tried to query a column named position on the workflow_steps table.

The Failure: Another SQL error. It turns out the column was actually named order (confirmed via the Prisma schema).

The Lesson: Be mindful of reserved keywords in SQL. ORDER is a reserved word, which means if you do have a column named order, you must always quote it in your SQL queries (e.g., "order"). It's generally good practice to avoid using reserved words as column names when possible to prevent such issues.
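
Putting both lessons together, a raw query against workflow_steps ends up looking like this sketch (the snake_case workflow_id column is assumed, following the same naming convention):

```typescript
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();
const workflowId = "cce03fe4-a5a0-4933-ae3d-5de5199fb971";

// `order` is a reserved word, so it must be double-quoted everywhere it
// appears in raw SQL, including the ORDER BY clause.
const steps = await prisma.$queryRaw<
  { id: string; order: number }[]
>`SELECT id, "order" FROM workflow_steps WHERE workflow_id = ${workflowId} ORDER BY "order" ASC`;
```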

Looking Ahead

All these changes are now live! A critical BRbase workflow (cce03fe4-a5a0-4933-ae3d-5de5199fb971) is about to be executed by a user, and we'll be closely monitoring it to ensure:

  • The correct BRbase patterns (e.g., ~/ imports, Clerk auth, MySQL, Jest, Chakra UI) are injected into the AI's implementation prompts.
  • The new consistency check proactively catches any potential issues.

Our next steps also involve resolving some existing merge conflicts in a key branch and preparing for future feature development.

This session was a testament to the continuous cycle of building, refining, and learning in software development. By addressing critical AI reliability, enhancing workflow quality with intelligent checks, and internalizing crucial database lessons, we're making our system more robust, intelligent, and a better experience for our users.