Beyond Stubs: Our AI Assistant Now Writes Gold-Standard TypeScript Implementations
We tackled the challenge of getting an AI to generate complete, production-ready TypeScript code, moving past generic stubs to match and even exceed our hand-crafted gold standard for a 'rent-a-persona' feature.
The dream of AI-assisted development often conjures images of perfectly formed, production-ready code springing forth from a simple prompt. The reality, however, can sometimes be a little less magical: stubs, partial implementations, or even completely off-topic suggestions. Our recent development sprint was all about closing this gap, pushing our internal AI workflow to generate not just code, but gold-standard, complete TypeScript implementations that truly understood the bigger picture.
The Challenge: Bridging the Vision Gap
Our internal development workflow leverages powerful LLMs to auto-generate implementation prompts – essentially, detailed plans for our developers to build specific features. The goal was ambitious: make these AI-generated prompts indistinguishable from (or even better than) a meticulously hand-crafted one. Specifically, we aimed for complete TypeScript code, with codebase-grounded paths and no frustrating stubs.
However, we hit a roadblock. Previous workflow runs for our new "rent-a-persona" feature kept producing implementation prompts for a billing system. While billing is crucial, it definitely wasn't the core feature we were trying to build!
The root cause became clear: our implementation prompt generator, while excellent at analyzing individual workflow steps, was suffering from a kind of LLM myopia. It saw the trees (individual tasks like "handle authentication" or "manage data"), but it couldn't see the forest – the overarching feature goal of "build a rent-a-persona API." It would analyze supporting infrastructure (like security or audit logs) and infer that the primary task was something generic, rather than the specific, user-facing feature.
The Breakthrough: Injecting Intent
The solution, once identified, felt elegantly simple: we needed to explicitly inject the workflow's goal directly into the implementation prompt generator.
Here's how we did it:
- Model Upgrade: First, we ensured our system had access to the most capable models for this task. We added gemini-2.5-pro to our MODEL_CATALOG, recognizing its advanced reasoning capabilities would be crucial.
- Explicit Goal Injection: We introduced a new workflowGoal field into our PromptInputParams. This field takes the workflow.name and workflow.description and renders them prominently at the top of the user message to the LLM as a dedicated # FEATURE GOAL section.
- Workflow Integration: We hooked this up in our workflow-engine.ts, ensuring that the overall workflow's name and description were passed down to the implementation prompt builder.
- Testing & Verification: Naturally, all relevant unit tests were updated to cover the new workflowGoal parameter, ensuring robustness.
This change meant the LLM now received a clear, unambiguous directive: "Synthesize ALL steps into a single cohesive plan for this specific feature."
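As a minimal sketch of the idea (the PromptInputParams and workflowGoal names come from this post, but the surrounding types and the buildUserMessage helper are assumptions for illustration), the goal injection might look something like this:

```typescript
// Hypothetical shape of the prompt builder's input; only workflowGoal
// and PromptInputParams are named in the post itself.
interface WorkflowGoal {
  name: string;
  description: string;
}

interface PromptInputParams {
  stepDigests: string[];
  workflowGoal?: WorkflowGoal;
}

// Render the workflow's goal as a dedicated section at the very top of
// the user message, so the LLM sees the feature intent before it reads
// any individual step details.
function buildUserMessage(params: PromptInputParams): string {
  const sections: string[] = [];
  if (params.workflowGoal) {
    sections.push(
      `# FEATURE GOAL\n${params.workflowGoal.name}\n\n${params.workflowGoal.description}`
    );
  }
  sections.push(...params.stepDigests);
  return sections.join("\n\n");
}
```

The key design choice is ordering: putting the goal first means every subsequent step digest is interpreted in light of the feature, rather than the model inferring the feature from the steps.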
The Results: Gold Standard (and Beyond!)
The impact was immediate and dramatic. With workflow ddd599a1, the auto-generated implementation prompt clocked in at a staggering 610 lines of complete, high-quality TypeScript code. This wasn't just a plan; it was a near-complete blueprint covering:
- A full persona completion API
- Robust authentication middleware
- Streaming capabilities
- Prompt injection defense mechanisms
- tRPC router integration
- Dashboard UI considerations
This output not only matched our hand-crafted gold standard reference (which was 477 lines) but exceeded it in completeness and detail. We had moved beyond generic stubs to a truly codebase-grounded, production-ready implementation plan.
Our primary implementation prompt provider for this success was google (gemini-2.5-pro), leveraging its large context window (maxTokens: 16384 for the API call, MAX_TOTAL_CONTEXT: 60000 for input assembly) to process the extensive workflow details.
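To make the budgets concrete, here is a rough sketch of how such a catalog entry and input budget could be wired up. The numbers, the MODEL_CATALOG and MAX_TOTAL_CONTEXT names, and the google provider come from this post; the entry shape and the fitContext helper are illustrative assumptions (real token budgeting would count tokens, not characters):

```typescript
// Hypothetical catalog entry shape; only the names and numbers above
// are taken from the post.
const MODEL_CATALOG = {
  "gemini-2.5-pro": {
    provider: "google",
    maxTokens: 16384, // output budget for the API call
  },
} as const;

// Input assembly budget for the implementation prompt builder.
const MAX_TOTAL_CONTEXT = 60000;

// Keep whole chunks (goal section, step digests, code excerpts) in
// order until the budget is exhausted. Character-based for simplicity.
function fitContext(chunks: string[], budget: number = MAX_TOTAL_CONTEXT): string {
  let used = 0;
  const kept: string[] = [];
  for (const chunk of chunks) {
    if (used + chunk.length > budget) break;
    kept.push(chunk);
    used += chunk.length;
  }
  return kept.join("\n\n");
}
```

Dropping whole trailing chunks, rather than truncating mid-chunk, keeps each remaining section coherent for the model.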
Lessons Learned: Navigating the Bumps
Even with a breakthrough, development always comes with its share of practical challenges:
- Context is King: The most significant lesson was the critical importance of providing explicit, high-level context to LLMs. Without the workflowGoal, even the most advanced models can get lost in the weeds. This reinforced the power of thoughtful prompt engineering.
- Operational Hiccups: During deployment, we encountered a Docker container name conflict. A quick docker rm -f followed by docker compose up -d app resolved it, reminding us that even sophisticated AI systems rely on solid DevOps fundamentals.
- Security First: A critical reminder came when a user's Google API key was inadvertently exposed in chat. This highlighted the continuous need for vigilance and user education on securing sensitive credentials. (Action: the user was reminded to rotate their key immediately.)
- Resource Management: Our Anthropic credits were depleted, causing some ancillary services like step digest compression and consistency checks to fail. This underscored the dependency on a balanced multi-model strategy and consistent resource provisioning for all workflow components.
What's Next?
With this major milestone achieved, our immediate next steps include:
- Security Action: The user needs to rotate their exposed Google API key.
- Real-World Application: We're excited to use the auto-generated implementation prompt from workflow ddd599a1 to actually build out the "rent-a-persona" feature.
- Restore Services: Top up Anthropic credits to restore full functionality for our step digest and consistency checks.
- Model Expansion (Optional): Explore adding more cutting-edge models like gemini-3-pro-preview and gemini-3.1-pro-preview to our MODEL_CATALOG to further enhance capabilities.
This journey has been a testament to the power of targeted prompt engineering and the continuous refinement of our AI-assisted development workflows. We're excited to continue pushing the boundaries of what's possible, generating not just code, but truly intelligent and complete solutions.