Debugging Our AI's Identity Crisis: From PM to Dev, and Making it Visible
We built an AI to generate code-level implementation prompts, but it started writing project management plans. Then, when it finally got it right, the output was invisible. Here's how we tackled this dual challenge.
The promise of AI in development is immense: automate boilerplate, generate insightful code, and accelerate feature delivery. We're building a system that leverages large language models (LLMs) to generate detailed implementation prompts from high-level business plans. The idea is to have the AI output a structured plan outlining API routes, UI changes, database schemas – essentially, a developer-ready spec.
But as any developer working with LLMs will tell you, the journey from "idea" to "reliable production feature" is rarely a straight line. This week, we hit a couple of interesting bumps: our AI developed an identity crisis, preferring project management plans over actual code plans, and then, to add insult to injury, its brilliant (and now corrected) output became completely invisible in our UI.
Here's the story of how we debugged and fixed these issues, offering some insights into both prompt engineering and the nitty-gritty of data handling in an AI-driven workflow.
Part 1: The AI's Identity Crisis – When Code Becomes PM
Our Auto Implementation Prompt feature is a cornerstone of our workflow. It takes a business plan and, ideally, spits out a structured, actionable technical plan. Lately, however, we noticed a disturbing trend: instead of Prisma schemas, tRPC routes, and Stripe integration details, we were getting Gantt charts, risk registers, and timelines. Our code-generating AI had apparently decided it was a project manager.
The culprit? Our IMPLEMENTATION_PROMPT_SYSTEM in src/server/services/implementation-prompt-generator.ts. While it had good intentions, it lacked the necessary guardrails and explicit instructions to keep the LLM focused on code.
The Fix: Prompt Engineering with a Heavy Hand
To steer our AI back to its developer roots, we significantly strengthened the system prompt. This wasn't just a tweak; it was an overhaul:
- Hard Ban on PM Artifacts: We explicitly forbade any mention of "Gantt charts, risk registers, timelines, project plans, PM artifacts." Sometimes, telling an LLM what not to do is as important as telling it what to do.
- Explicit Requirement Extraction: We emphasized the need to extract explicit technical requirements directly from the business plan.
- Richer Output Sections: We refined the desired output structure to include specific, actionable sections like:
  - API/Routes
  - UI Changes
  - Database Schema (e.g., Prisma models)
  - Commands to Run (e.g., migrations, seed data)
- Better Empty-Context Handling: Ensuring the prompt behaves gracefully even with minimal input (a condensed sketch of the strengthened prompt follows this list).
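To make this concrete, here is a condensed, illustrative sketch of what the strengthened system prompt looks like. The actual constant in src/server/services/implementation-prompt-generator.ts is longer and worded differently; treat this as the shape of the fix, not the verbatim prompt:

```typescript
// src/server/services/implementation-prompt-generator.ts
// Condensed sketch, not verbatim -- the real prompt is longer and more nuanced.
export const IMPLEMENTATION_PROMPT_SYSTEM = `
You are a senior software engineer writing a code-level implementation plan.

HARD RULES:
- Do NOT produce Gantt charts, risk registers, timelines, project plans, or any PM artifacts.
- Extract explicit technical requirements directly from the business plan.
- If the business plan is sparse, state your assumptions and still produce a concrete plan.

OUTPUT SECTIONS (in order):
1. API/Routes (e.g., tRPC routers and procedures)
2. UI Changes
3. Database Schema (e.g., Prisma models)
4. Commands to Run (e.g., migrations, seed data)
`;
```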
This refined prompt (committed in 577e9e2) immediately started yielding the kind of code-level output we expected. We saw detailed Prisma schemas, tRPC route definitions, and concrete steps for integrating services like Stripe. The AI was back on track, generating zero PM artifacts. Success! Or so we thought...
Part 2: The Invisible Output – If a Tree Falls in the Forest...
Even with the AI generating perfect code plans, a new problem emerged: the output wasn't visible in our UI. Users would trigger the workflow, the AI would generate its brilliant plan, but the "Implementation Prompt" step in the UI would appear empty.
This was a classic data storage and retrieval mismatch. Our workflow engine stores the output of each step in the database. For most steps, we store the full LLMCompletionResult object, which includes content, model, provider, tokenUsage, costEstimate, etc. Our UI then expects to read (step.output as any)?.content to display the actual text.
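For reference, the object shape the UI depends on looks roughly like this. It's a sketch inferred from the fields named above; the real type likely carries more detail:

```typescript
// Rough shape of each step's stored output (inferred from the fields named in this post)
interface LLMCompletionResult {
  content: string;      // the generated text the UI renders
  model: string;        // e.g., "gpt-4"
  provider: string;     // e.g., "openai"
  tokenUsage: { prompt: number; completion: number; total: number };
  costEstimate: number; // estimated cost (unit assumed to be dollars)
}
```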
However, for the Implementation Prompt step, a subtle bug had crept in: we were storing output: result.content (a raw string) instead of output: result (the full LLMCompletionResult object).
```typescript
// In src/server/services/workflow-engine.ts

// The bug: storing only the content string
implementationPromptStep.output = result.content; // BAD! The UI expects an object with a content field

// The fix: storing the full result object, matching other steps
implementationPromptStep.output = result as unknown as Record<string, any>; // GOOD!
// ('Record<string, any>' keeps type compatibility with the existing step schema)
```
When the UI tried to access (step.output as any)?.content on a plain string, it naturally returned undefined, making the output invisible. All other step types correctly stored the full object, which made this particular bug a bit of an outlier and harder to spot initially. This fix was committed in 4dfbf90.
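In hindsight, a small defensive read in the UI would have surfaced the mismatch the moment it happened instead of silently rendering nothing. A minimal sketch, assuming the step's output is typed loosely (this is not our exact component code):

```typescript
// Defensive read: tolerate both the legacy raw string and the full result object (sketch)
function getStepText(step: { output: unknown }): string | undefined {
  if (typeof step.output === "string") return step.output;      // legacy string shape
  return (step.output as { content?: string } | null)?.content; // expected object shape
}
```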
Lessons Learned from the Pain Log:
This debugging session wasn't without its own mini-challenges:
- Data Structure Consistency is King: This bug hammered home the importance of consistent data structures. If your UI expects a certain shape ({ content: string, ... }), ensure your backend always provides that shape. Deviations, even seemingly minor ones, can lead to frustratingly invisible data.
- Database Type Gotchas: While inspecting the production database to understand the malformed output, I ran into a common PostgreSQL error:

```sql
-- Tried to inspect the raw jsonb output
SELECT substring(output FROM 1 FOR 3000)
FROM workflow_steps
WHERE id = 'b71f6f2e';
-- ERROR: function pg_catalog.substring(jsonb, integer, integer) does not exist
```

The fix was simple but highlights that even experienced developers can hit these:

```sql
-- Correct way to inspect jsonb content
SELECT substring(output::text FROM 1 FOR 3000)
FROM workflow_steps
WHERE id = 'b71f6f2e';
```

Always remember to explicitly cast jsonb to text if you're using string functions on its content.
Retroactive Patching and Verification
After deploying both code fixes to production, we still had existing workflows with the old, malformed output. To ensure immediate user satisfaction for the affected workflow (b71f6f2e), we patched the production database retroactively:
```sql
UPDATE workflow_steps
SET output = jsonb_build_object(
  'content', output::text,
  'model', 'gpt-4',
  'provider', 'openai',
  'tokenUsage', jsonb_build_object('prompt', 0, 'completion', 0, 'total', 0),
  'costEstimate', 0.00
)
WHERE id = 'b71f6f2e';
```
Note: The model, provider, tokenUsage, and costEstimate were placeholders since the original result.content string didn't contain them. For future workflows, the full object is stored correctly.
Finally, we verified everything:
- The system prompt fix produced proper code-level output: Prisma schemas, tRPC routes, Stripe integration, SubscriptionPlan/UserSubscription/UsageMetric models – zero PM artifacts.
- The output storage bug was resolved, and the content was now perfectly visible in the UI (a quick programmatic sanity check is sketched below).
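For future regressions, a programmatic check that reads the step exactly the way the UI does is cheap insurance. A minimal sketch, assuming a Prisma model named workflowStep (a hypothetical name, not confirmed against our schema):

```typescript
// Sanity check: read the patched step the same way the UI does
// (sketch; the workflowStep model name is assumed)
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

async function verifyStep(id: string): Promise<void> {
  const step = await prisma.workflowStep.findUnique({ where: { id } });
  const content = (step?.output as { content?: string } | null)?.content;
  console.log(content ? `OK: ${content.slice(0, 80)}...` : "Still invisible!");
}

verifyStep("b71f6f2e").finally(() => prisma.$disconnect());
```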
What's Next? Continuous Improvement
With these critical fixes in place, we're back to building and refining. Our immediate next steps include:
- UI Verification: Confirming the Implementation Prompt output is now visible for our patched workflow b71f6f2e (a simple user refresh should do the trick).
- End-to-End Test: Running another full workflow to ensure both fixes work seamlessly for new generations.
- Feature Exploration: Considering a "rent-a-persona" feature, potentially leveraging the generated prompts or handcrafted ones as a starting point.
- Prompt Refinement: Continuously refining IMPLEMENTATION_PROMPT_SYSTEM based on real-world output quality across diverse workflow types.
- Metadata Tracking: Adding completedAt tracking to the implementation prompt step for better auditing and analytics.
Conclusion
This session was a stark reminder that building AI-driven systems involves a dual challenge: not only mastering the art of prompt engineering to guide the LLM effectively but also ensuring the underlying software infrastructure correctly handles, stores, and presents that AI-generated data. Every bug is a lesson, and in this case, we learned valuable ones about explicit prompt instructions, data consistency, and database type handling. Onwards to smarter, more visible AI-powered development!
```json
{
  "thingsDone": [
    "Strengthened IMPLEMENTATION_PROMPT_SYSTEM to ban PM artifacts and demand code-level plans.",
    "Fixed output storage bug: storing full LLMCompletionResult object instead of raw content string.",
    "Patched existing production workflow's output in the database to correct JSON structure.",
    "Deployed both code fixes to production.",
    "Verified new workflows produce correct, visible code-level implementation prompts."
  ],
  "pains": [
    "AI generated project management plans instead of code plans.",
    "Correct AI output was not visible in the UI due to incorrect data storage.",
    "Encountered PostgreSQL 'substring(jsonb) does not exist' error during debugging."
  ],
  "successes": [
    "Achieved precise, code-focused output from the LLM.",
    "Restored visibility of AI-generated prompts in the UI.",
    "Successfully retroactively fixed production data.",
    "Gained deeper understanding of prompt engineering nuances and data handling best practices."
  ],
  "techStack": [
    "TypeScript",
    "Node.js",
    "PostgreSQL",
    "LLM (GPT-4 via OpenAI)",
    "tRPC",
    "Prisma"
  ]
}
```