Taming the AI's Implementation Prompt: From Project Plans to Perfect Code (and Making it Visible!)
We tackled a dual challenge with our AI's auto-implementation prompt: fixing its tendency to generate project management plans instead of code, and then making sure its brilliant (now code-focused) output was actually visible in the UI. A journey through prompt engineering, data storage nuances, and database wizardry.
Every developer knows the thrill of building an AI-powered feature. It promises to automate, accelerate, and elevate. But sometimes, these powerful tools have a mind of their own, leading to unexpected detours and head-scratching moments. Recently, we faced such a scenario with our "Auto Implementation Prompt" feature. The goal was simple: provide a high-level business plan, and our AI would generate a detailed, code-level implementation plan – thinking through API routes, database schemas, UI changes, and more.
However, our AI had other ideas. And when it did produce something useful, it played hide-and-seek in the UI. This is the story of how we wrestled with a wayward AI and a subtle data bug to get our feature back on track.
The Dual Dilemma: Misguided AI and Invisible Output
We had two critical problems to solve:
- The PM Plan Predicament: Instead of spitting out elegant `Prisma` schemas or `tRPC` route definitions, our AI was generating… project management artifacts. Think Gantt charts, risk registers, and timelines. While valuable in their own right, these were not the code-level instructions we needed. Our AI was acting more like a project manager than a senior engineer.
- The Vanishing Act: Even when the AI did manage to produce something remotely useful, its output was frustratingly invisible in our user interface. It was like shouting into the void – the AI was responding, but our users couldn't see or interact with it.
This dual challenge meant our "Auto Implementation Prompt" was failing on both quality and usability. Time to dive deep.
Part 1: Reining in the AI – From PM to PR
The root of the "PM Plan Predicament" lay in our `IMPLEMENTATION_PROMPT_SYSTEM`. Large Language Models (LLMs) are incredibly flexible, and without precise instructions, they can interpret a request in many ways. Our prompt, it turned out, was a little too vague, allowing the AI to drift into project management territory.
Our fix involved a significant overhaul of the system prompt in `src/server/services/implementation-prompt-generator.ts`. We made several key changes, sketched in the example after this list:
- Hard Ban on PM Artifacts: We explicitly forbade any mention of Gantt charts, risk registers, timelines, or other project management deliverables.
- Explicit Requirement Extraction: We instructed the AI to rigorously extract technical requirements directly from the business plan.
- Richer, Code-Focused Output Sections: We added explicit sections for the AI to fill, such as:
  - API/Routes: Detailing new endpoints and their functionality.
  - UI Changes: Specifying necessary frontend modifications.
  - Database Schema: Suggesting new models, fields, and relations (e.g., `Prisma` syntax).
  - Commands to Run: Practical steps for setup or execution.
- Better Empty-Context Handling: Ensuring the AI still provided a structured response even with minimal input.
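To make those changes concrete, here's a hedged sketch of what the revised system prompt might look like. The actual wording lives in `src/server/services/implementation-prompt-generator.ts`; everything below is an illustrative assumption, not a copy:

```typescript
// Illustrative sketch only; the real IMPLEMENTATION_PROMPT_SYSTEM differs.
export const IMPLEMENTATION_PROMPT_SYSTEM = `
You are a senior software engineer. Turn the business plan you are given
into a detailed, code-level implementation plan.

Hard rules:
- NEVER produce project management artifacts: no Gantt charts, risk
  registers, timelines, staffing plans, or milestones.
- Extract concrete technical requirements directly from the business plan.

Structure your response with these sections:
## API/Routes       (new endpoints and their behavior)
## UI Changes       (frontend components to add or modify)
## Database Schema  (new models, fields, and relations, in Prisma syntax)
## Commands to Run  (practical setup and execution steps)

If the business plan is empty or minimal, still return every section and
state the assumptions you made.
`;
```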
This refined prompt, committed in `577e9e2`, was our first step towards guiding the AI back to its engineering roots.
Part 2: Unveiling the Output – The Case of the Invisible `content`
With the AI now (theoretically) generating better output, the next hurdle was making it visible. We discovered a subtle but critical bug in how we were storing the LLM's response for the implementation prompt step.
In `src/server/services/workflow-engine.ts`, around lines 2552-2562, we were storing the raw string content of the AI's response:

```typescript
// Old (buggy) way:
step.output = result.content; // 'result.content' is just a string
```
However, our UI component expected a specific JSON structure from the `LLMCompletionResult` object, specifically looking for `(step.output as any)?.content`. When `step.output` was a plain string, `(step.output as any)?.content` would, of course, return `undefined`, making the output disappear from the UI.
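A minimal sketch of that UI-side read illustrates the failure mode. The names here are assumptions for illustration, not the component's real code:

```typescript
// Assumed shape of the stored LLM result (mirroring the fields listed below).
interface LLMCompletionResult {
  content: string;
  model: string;
  provider: string;
  tokenUsage: { prompt: number; completion: number; total: number };
  costEstimate: number;
}

// Hypothetical render helper mirroring the `(step.output as any)?.content` access.
function renderStepOutput(step: { output: unknown }): string {
  const content = (step.output as LLMCompletionResult | undefined)?.content;
  // If `output` was stored as a raw string, `content` is undefined here,
  // so the component renders nothing at all.
  return content ?? "";
}
```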
The fix was straightforward: store the entire `LLMCompletionResult` object, which contains `content`, `model`, `provider`, `tokenUsage`, and `costEstimate`, mirroring how all other workflow steps already handled LLM outputs.
```typescript
// New (correct) way:
// 'result' is the full LLMCompletionResult object
step.output = result; // Stores the object: { content: "...", model: "...", ... }
```
This change, committed in `4dfbf90`, brought consistency to our data storage and ensured the UI could correctly parse and display the AI's output.
The Retroactive Fix: Patching Production Data
Fixing the code was only half the battle. We had existing workflows in production where the implementation prompt step had stored its output as a raw string. To make these historical outputs visible and correctly structured, we had to perform a database patch:
```sql
UPDATE workflow_steps
SET output = jsonb_build_object(
  -- '#>> {}' extracts the jsonb string as plain text; a bare ::text cast
  -- would keep the surrounding JSON double quotes in the content.
  'content', output #>> '{}',
  'model', 'unknown_model',        -- Placeholder, as original didn't store this
  'provider', 'unknown_provider',  -- Placeholder
  'tokenUsage', jsonb_build_object('prompt', 0, 'completion', 0, 'total', 0), -- Placeholder
  'costEstimate', 0.0              -- Placeholder
)
WHERE workflow_id = 'b71f6f2e'     -- Target specific workflow for immediate verification
  AND step_type = 'implementation_prompt'
  AND jsonb_typeof(output) = 'string'; -- Only update if it's still a raw string
```
This SQL command converted the raw string output into the expected `LLMCompletionResult` JSON object, allowing the UI to display it correctly. We focused on a specific, recently run workflow (`b71f6f2e`) for immediate verification.
Lessons Learned from the Trenches
Debugging always offers valuable insights. Here are a couple from this session:
1. The Importance of Consistent Data Shape
Our `LLMCompletionResult` object contained crucial metadata alongside the `content`. Storing only `result.content` for one specific step, while all others stored the full `result` object, was an inconsistency that led to a UI display bug. Always strive for consistency in data structures, especially when multiple parts of your system (backend storage, frontend display) rely on them.
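One lightweight defense against this kind of shape drift is a runtime type guard at the display boundary. This is a hypothetical sketch, not what our codebase actually does:

```typescript
// Hypothetical guard: accept only outputs that carry a string `content` field.
function hasContent(value: unknown): value is { content: string } {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { content?: unknown }).content === "string"
  );
}

// Tolerates both the new object shape and legacy raw-string rows.
function displayOutput(output: unknown): string {
  if (hasContent(output)) return output.content;
  if (typeof output === "string") return output; // legacy rows stored a bare string
  return "";
}
```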
2. SQL's Type Tantrums
When trying to inspect the `jsonb` output directly in PostgreSQL, my initial attempt to use `substring` failed:
```sql
-- Failed attempt:
SELECT substring(output from 1 for 3000) FROM workflow_steps WHERE id = ...;
-- ERROR: function pg_catalog.substring(jsonb, integer, integer) does not exist
```
PostgreSQL's `substring` function doesn't directly operate on `jsonb` types. The solution was to explicitly cast the `jsonb` column to `text` first:
```sql
-- Correct approach:
SELECT substring(output::text from 1 for 3000) FROM workflow_steps WHERE id = ...;
```
A good reminder that while `jsonb` is powerful, it still has specific rules for interaction with string functions. Always remember your type casting!
The Outcome: Success and a Clear Path Forward
After deploying both code fixes to production and patching the relevant database entries, we verified the results:
- The new `IMPLEMENTATION_PROMPT_SYSTEM` successfully guided the AI to produce detailed, code-level implementation plans, completely free of PM artifacts. We saw `Prisma` schemas, `tRPC` routes, `Stripe` integration details, and model definitions – exactly what we wanted!
- The `LLMCompletionResult` storage fix ensured that these brilliant outputs were now fully visible and structured correctly in the UI.
The Auto Implementation Prompt feature is now delivering on its promise, providing actionable technical guidance.
What's Next?
Our journey doesn't stop here. We'll be:
- Monitoring Output Quality: Continuously refining the `IMPLEMENTATION_PROMPT_SYSTEM` based on real-world outputs across diverse workflow types.
- Exploring New Features: Ideas like a "rent-a-persona" feature are on the horizon, potentially leveraging these refined prompts.
- Improving Observability: Adding `completedAt` tracking for better workflow step monitoring.
This session was a fantastic reminder that building AI-powered features is an iterative process – a blend of prompt engineering, robust backend development, and meticulous debugging. By addressing both the "brain" (AI's instructions) and the "body" (data storage and UI display), we've made our feature significantly more powerful and reliable.