From Digest Disasters to Prompt Precision: A Workflow Developer's Log
Join us as we dive into a recent development session, tackling elusive workflow bugs, wrestling with LLM prompt nuances, and bringing critical performance insights to life in a complex system.
It was a late afternoon session, the kind where the clock seems to speed up but the bugs only get more stubborn. My goal for this sprint was multifaceted: squash a critical workflow digest bug, refine the LLM prompts in Ipcha Mistabra (our internal adversarial analysis workflow) for better output, integrate NerdStats for much-needed visibility, and finally get our cost rates fully covered. By the time the dust settled, all major fixes were deployed, and a new, smarter Ipcha Mistabra run was underway.
Here's a breakdown of the challenges we faced and the solutions we implemented.
The Case of the Missing Content: Taming the Fan-Out Digest
One of our core workflows involves a "fan-out" pattern: a single step might trigger 12 parallel adversarial analyses. The output of these analyses then needs to be aggregated and passed downstream to a Synthesis step. This is where our first major headache began.
The Problem: We were using {{steps.Label.content}} to access the output of these fan-out steps. Intuitively, you'd expect this to give you the full combined content. Instead, our workflow engine's digest mechanism, designed to provide a concise summary for single-output steps, was kicking in. It would lossy-compress all 12 adversarial analyses into a mere haiku-like summary, effectively starving the downstream Synthesis step of critical data.
Pain Point: {{steps.Adversarial Analysis.content}} was returning a haiku, not the full 12-analysis output. The digest was too aggressive.
The Fix: The root cause lay in src/server/services/workflow-engine.ts, around lines 403-413, where the digest auto-preference was applied universally. My change explicitly skips this auto-preference for steps that have subOutputs (our indicator for fan-out steps). This ensures that when you ask for .content from a fan-out step, you get the complete, uncompressed, combined output. The .digest accessor still works if you explicitly want a summary.
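To make the behavior concrete, here's a minimal sketch of the guard. The shapes and names (StepOutput, subOutputs, the auto-preference flag) are illustrative stand-ins, not the engine's actual internals:

```ts
interface StepOutput {
  content: string;           // full, uncompressed output
  digest?: string;           // optional lossy summary
  subOutputs?: StepOutput[]; // present only on fan-out steps
}

function resolveContent(step: StepOutput, autoPreferDigest: boolean): string {
  // Fan-out steps carry subOutputs; never auto-substitute the digest here,
  // or downstream steps see a haiku instead of all 12 analyses.
  if (step.subOutputs && step.subOutputs.length > 0) {
    return step.subOutputs.map((s) => s.content).join("\n\n---\n\n");
  }
  // Single-output steps keep the old behavior: prefer the digest
  // when one exists and auto-preference is on.
  if (autoPreferDigest && step.digest) {
    return step.digest;
  }
  return step.content;
}
```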
Lesson Learned: When dealing with complex workflow patterns like fan-out, be extremely explicit about content access. Default summarization can be a silent killer of downstream data integrity.
Whispering to the Oracle: Mastering LLM Prompt Engineering
Our Ipcha Mistabra workflow relies heavily on LLMs for both arbitration (judging adversarial analyses) and synthesizing final results. Getting these prompts just right is less about coding and more about the art of clear communication with a non-human intelligence.
Arbitration: Judging the Subject, Not the Process
The Problem: Our initial arbitration prompt, "Judge the following adversarial analysis process," was leading the LLM astray. Instead of evaluating the subject of the analysis (e.g., an OFFPAD AS product), the LLM was critiquing the methodology of the adversarial analysis itself. To make matters worse, it was often returning raw JSON, mimicking an internal dual-provider judge format, which was not what we wanted for a human-readable judgment.
Pain Point: LLM evaluated the methodology, not the product. Output was often JSON.
The Fix: In src/server/trpc/routers/workflows.ts (line 660), I refined the prompt to: "Judge the SUBJECT of the adversarial analyses. Do NOT evaluate the analysis process itself. Provide a human-readable markdown response; do NOT output JSON or code blocks."
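For illustration, the do/don't constraint pattern looks something like this when assembled in code. The function and wiring are hypothetical; only the quoted instructions come from the actual fix:

```ts
// Hypothetical assembly of the arbitration prompt around the combined
// fan-out output; buildArbitrationPrompt is an illustrative name.
function buildArbitrationPrompt(combinedAnalyses: string): string {
  return [
    // What to do: evaluate the thing being analyzed...
    "Judge the SUBJECT of the adversarial analyses.",
    // ...and what NOT to do: critique the process or emit JSON.
    "Do NOT evaluate the analysis process itself.",
    "Provide a human-readable markdown response; do NOT output JSON or code blocks.",
    "",
    combinedAnalyses,
  ].join("\n");
}
```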
Lesson Learned: Precision in prompt engineering is paramount. Explicitly state both what the LLM should do and, crucially, what it should NOT do. Specifying desired output format (markdown) and forbidding undesired ones (JSON/code blocks) is key.
Results: From Structured JSON to Executive Summary
The Problem: Similarly, our Results prompt, designed to classify findings as "pain_point" or "strength," was causing our gemini-2.5-pro model to output structured JSON arrays. While technically correct according to the prompt, it wasn't the human-friendly executive summary we needed.
Pain Point: gemini-2.5-pro output a structured JSON array instead of readable markdown.
The Fix: At src/server/trpc/routers/workflows.ts (line 673), the prompt was rewritten: "Write a human-readable executive summary of the findings, including sections for Strengths, Critical Risks, Rejected Claims, and an Overall Assessment. Do NOT output JSON or code blocks."
Lesson Learned: Even with structured prompts, LLMs will sometimes default to what they perceive as "structured" (JSON). Always explicitly ask for human-readable formats and reiterate constraints.
Unwanted Steps: Disabling generatePrompt
The Problem: A subtle issue was that our createIpcha function was inheriting a schema default of generatePrompt: true. This was causing an unwanted Implementation Prompt step to be appended to our Ipcha workflows, cluttering the process.
The Fix: In src/server/trpc/routers/workflows.ts (line 615), createIpcha now explicitly sets generatePrompt: false.
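In sketch form, the override amounts to something like this; createWorkflow and the input shape are placeholders for whatever the tRPC router actually calls:

```ts
// Hypothetical shape of the createIpcha fix; `createWorkflow` and the
// input type are stand-ins, not the real router code.
declare function createWorkflow(input: {
  name: string;
  generatePrompt: boolean;
}): Promise<unknown>;

async function createIpcha(input: { name: string }) {
  return createWorkflow({
    ...input,
    // The schema defaults generatePrompt to true, which appended an
    // unwanted Implementation Prompt step; Ipcha runs never want it.
    generatePrompt: false,
  });
}
```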
Lesson Learned: Always review default schema values and hardcode overrides when a specific behavior is consistently required for a given workflow type.
Illuminating the Black Box: Introducing NerdStats
Understanding the performance and cost of our complex, LLM-driven workflows is crucial. Previously, this was a murky area.
The Fixes:
- Workflow Page NerdStats: I added a NerdStats component to src/app/(dashboard)/dashboard/workflows/[id]/page.tsx. This component provides a per-phase and per-provider breakdown, aggregating data from all steps and fan-out subOutputs (sketched below). Now, users can see where time and money are being spent within a workflow.
- Per-Provider Table in Summary Export: For our workflow bundle exports (src/server/services/workflow-bundle.ts), I added a "Per Provider" section to the summary.md. This gives a clear overview of LLM provider usage and associated costs in a portable format.
- Complete Cost Rates: A critical piece for accurate NerdStats was ensuring we had up-to-date cost rates. In src/server/services/llm/types.ts, I added gemini-2.5-pro (at $10 per 1M tokens) and all Ollama models (free, for our internal/local testing). Previously, Gemini Pro costs were showing as $0.000000, rendering any cost analysis useless.
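Here's an illustrative sketch of the per-provider rollup, assuming each step records its provider, token count, and cost; the field names are assumptions about the step records, not our real schema. The key detail is recursing into subOutputs so fan-out runs aren't dropped from the totals:

```ts
interface StepStats {
  provider: string;
  tokens: number;
  costUsd: number;
  subOutputs?: StepStats[]; // fan-out children live here
}

function perProviderTotals(
  steps: StepStats[],
): Map<string, { tokens: number; costUsd: number }> {
  const totals = new Map<string, { tokens: number; costUsd: number }>();
  const visit = (s: StepStats) => {
    const t = totals.get(s.provider) ?? { tokens: 0, costUsd: 0 };
    t.tokens += s.tokens;
    t.costUsd += s.costUsd;
    totals.set(s.provider, t);
    // Recurse into fan-out children, otherwise 12 parallel analyses
    // simply vanish from the breakdown. (A real walk would keep the
    // parent's own numbers at zero to avoid double counting.)
    s.subOutputs?.forEach(visit);
  };
  steps.forEach(visit);
  return totals;
}
```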
Lesson Learned: Visibility isn't just a nice-to-have; it's essential for debugging, optimization, and demonstrating value. Accurate cost attribution is foundational for any system leveraging external APIs.
The Current State and What's Next
All these fixes (across commits 22f855c, 05c0992, 9280162, 5a4e268, e3dd795) are now deployed to production. Workflow 2758e8c9-3416-446a-ac03-a1889f226e09 is our latest Ipcha test run, now benefiting from the improved prompts.
Our immediate next steps involve:
- Verifying that the Results step in 2758e8c9 indeed outputs clean markdown.
- Confirming Gemini cost calculations are now accurate in NerdStats.
- Considering a UI toggle for generatePrompt on the Ipcha page for future flexibility.
- A long-standing item: adding Stripe environment variables to production and restarting the container.
- Investigating whether we can retroactively calculate costs for old, completed workflows (a rough sketch of the idea follows below).
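On that last item, here's roughly what a retroactive backfill could look like, assuming each completed step persisted its model name and total token count; those field names are placeholders, and the rates mirror the ones added to types.ts:

```ts
// Rough backfill sketch under the assumption that token counts were stored.
const COST_PER_MILLION_TOKENS: Record<string, number> = {
  "gemini-2.5-pro": 10, // $10 per 1M tokens, per the rates added above
  // Ollama models run locally and would map to 0.
};

function backfillCostUsd(step: { model: string; totalTokens: number }): number | null {
  const rate = COST_PER_MILLION_TOKENS[step.model];
  if (rate === undefined) return null; // unknown model: don't guess
  return (step.totalTokens * rate) / 1_000_000;
}
```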
This session was a great reminder that building robust, intelligent systems is an iterative dance between identifying subtle bugs, wrestling with the nuances of LLM interaction, and continuously improving observability. Each "pain point" transformed into a valuable "lesson learned," making our system more reliable and transparent.
{
"thingsDone": [
"Fixed workflow fan-out digest bug (skipping digest for subOutputs)",
"Rewrote Arbitration LLM prompt for subject-focused evaluation and markdown output",
"Rewrote Results LLM prompt for executive summary format and markdown output",
"Disabled generatePrompt for Ipcha workflow creation to remove unwanted steps",
"Added NerdStats component to workflow pages for per-phase/provider breakdown",
"Added Per Provider section to workflow bundle export summary.md",
"Completed LLM cost rates for gemini-2.5-pro and Ollama models"
],
"pains": [
"Fan-out digest lossy-compressed 12 analyses into a brief summary, starving downstream steps",
"Arbitration prompt led LLM to evaluate methodology instead of subject, outputting JSON",
"Results prompt caused Gemini-2.5-pro to output structured JSON array instead of human-readable markdown"
],
"successes": [
"All critical fixes deployed to production",
"New Ipcha Mistabra workflow running with improved prompts and correct outputs",
"Enhanced visibility into workflow performance and costs with NerdStats",
"Accurate cost calculation for previously mispriced LLM models"
],
"techStack": [
"TypeScript",
"Next.js",
"tRPC",
"LLMs (Gemini, Ollama)",
"Workflow Engine (custom)",
"Markdown"
]
}