Human in the Loop: Unleashing Smart Review Steps in Our LLM Workflow Engine
We've just shipped a major update to our LLM workflow engine: Smart Review Steps. This feature empowers developers to integrate human oversight, analysis, and decision-making directly into AI-powered processes, ensuring quality and control.
In the rapidly evolving world of AI-driven development, the power of Large Language Models (LLMs) is undeniable. They can automate complex tasks, generate creative content, and accelerate development cycles. However, even the most sophisticated LLMs benefit from a crucial ingredient: human intelligence. That's why we've been laser-focused on implementing Smart Review Steps in our workflow engine – a system designed to seamlessly integrate human oversight into LLM-powered processes.
This past session was a sprint to bring this vision to life, culminating in a robust first phase of our Smart Review Steps. We've gone from concept to a fully functional LLM workflow that pauses for human review, backed by a comprehensive review UI.
The "Why" Behind Smart Review Steps
Imagine an LLM-driven workflow generating code, designing features, or analyzing security vulnerabilities. While incredibly efficient, there are critical junctures where a human expert needs to step in.
- Validation: Is the LLM's output accurate, ethical, or aligned with project goals?
- Guidance: Does the LLM need a course correction based on new information or a nuanced understanding?
- Accountability: Who takes responsibility for the final output?
Smart Review Steps address these needs head-on. They transform a passive checkpoint into an active, analytical pause where an LLM first performs a detailed review, then presents its findings to a human for approval, feedback, or re-evaluation.
What We Shipped: Phase 1 Complete!
Our recent development push focused on building the core infrastructure for these intelligent checkpoints. Here’s a breakdown of what we've accomplished:
Workflow Engine & Backend Enhancements
- Introducing `review_ready` events: We added a new `review_ready` event type to our `WorkflowEvent` union. This is the signal that tells our workflow engine: "Hold on, human input required!" When an LLM-powered review step completes its analysis, it now yields this event, effectively pausing the workflow.

  ```typescript
  // src/server/services/workflow-engine.ts:59
  type WorkflowEvent =
    | { type: 'step_completed'; ... }
    | { type: 'review_ready'; ... } // New event!
    // ... other events
  ```

- Persistent Review Notes: User-provided notes during a review are crucial for context. We've extended our `ChainContext` interface to include `stepReviewNotes: Map<string, string>`, ensuring these notes are extracted from completed steps and persist throughout the workflow.
- Dynamic Prompt Templating with Notes: One of the most powerful additions is the `{{steps.Label.notes}}` template variable. This allows downstream LLM steps to dynamically incorporate user-written review notes from previous steps directly into their prompts. This creates a truly adaptive workflow where human feedback directly influences subsequent AI actions.

  ```typescript
  // Example prompt fragment
  "Based on the user's review notes for the 'Design Features' step: {{steps.Design Features.notes}}, refine the implementation plan."
  ```

- Intelligent LLM Execution & Pausing: The heart of the review step now involves full LLM execution. When a review step is encountered, our `executeStep()` function now:
  - Runs a specialized LLM prompt (e.g., `deepReview1`, `secReview`).
  - Stores the LLM's output, digest, token usage, and cost.
  - Sets the step status to `"pending"`.
  - Yields the `review_ready` event, pausing the workflow.
  - Crucially, even if the LLM encounters an error during its analysis, the workflow still pauses, allowing a human to review the error and decide on the next action.
- Robust Workflow Control:
  - The `resume` mutation now accepts optional `reviewNotes` (up to 5000 characters), which are stored directly in the checkpoint.
  - We implemented a `retryFromReview` mutation. This allows users to "rewind" the workflow, optionally update the prompt of a target step, reset that step and all subsequent ones, and restart the process from the chosen point. This provides unparalleled flexibility and error correction capabilities.
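To make the pausing behavior concrete, here is a minimal, self-contained sketch of a generator-based engine that yields a `review_ready` event and stops. All names here (`Step`, `executeSteps`, `runLlm`) are illustrative stand-ins, not our engine's actual API:

```typescript
// Illustrative step and event shapes (assumptions, not the real types).
type Step = { label: string; prompt: string; review?: boolean };

type WorkflowEvent =
  | { type: 'step_completed'; label: string; output: string }
  | { type: 'review_ready'; label: string; analysis: string };

// Stand-in for a real LLM call.
async function runLlm(prompt: string): Promise<string> {
  return `analysis of: ${prompt}`;
}

// Runs steps in order, emitting events. A review step yields
// `review_ready` and returns, pausing the workflow; a resume call
// would restart from a persisted checkpoint.
async function* executeSteps(steps: Step[]): AsyncGenerator<WorkflowEvent> {
  for (const step of steps) {
    const output = await runLlm(step.prompt);
    if (step.review) {
      yield { type: 'review_ready', label: step.label, analysis: output };
      return; // pause here; nothing after this step runs
    }
    yield { type: 'step_completed', label: step.label, output };
  }
}
```

The caller simply drains the generator; when it ends with a `review_ready` event, the UI knows to surface the review panel instead of continuing.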
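The `{{steps.Label.notes}}` substitution itself can be sketched as a simple placeholder replacement over the stored notes map. The regex and `Map` shape below are assumptions for illustration, not our engine's real implementation:

```typescript
// Replace {{steps.<Label>.notes}} placeholders with stored review
// notes; unknown labels are left untouched so the gap stays visible.
function interpolateNotes(
  prompt: string,
  stepReviewNotes: Map<string, string>,
): string {
  return prompt.replace(
    /\{\{steps\.([^.}]+)\.notes\}\}/g,
    (whole, label: string) => stepReviewNotes.get(label) ?? whole,
  );
}
```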
User Interface & Experience
- Comprehensive Review Panel UI: We built a dedicated review panel within our dashboard. This UI (located at `src/app/(dashboard)/dashboard/workflows/[id]/page.tsx`) provides:
  - A `MarkdownRenderer` to beautifully display the LLM's detailed analysis.
  - A collapsible viewer for the original review prompt.
  - A notes textarea, complete with a hint for `{{steps.Review.notes}}` to encourage structured feedback.
  - A clear "Approve & Continue" action button to proceed.
  - A "Retry a previous step" dropdown, allowing users to select any prior step to retry, even offering the ability to edit its prompt before restarting.
- Historical Context: Completed review steps now clearly display both the prompt that was used and the stored review notes, providing a full audit trail.
- Specialized Review Prompts: We integrated `systemPrompt` into four key review templates (`deepReview1`, `deepReview2`, `extensionReview`, `secReview`) in `src/lib/constants.ts`. This ensures our LLMs receive precise instructions for their analytical tasks.
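Under the hood, the "retry a previous step" flow boils down to resetting the target step (optionally with an edited prompt) and everything after it, while leaving earlier steps untouched. A hypothetical sketch with illustrative types, not the real `retryFromReview` mutation:

```typescript
// Illustrative step state (assumed shape, not the real schema).
type StepState = {
  label: string;
  prompt: string;
  status: 'completed' | 'pending' | 'queued';
};

// Rewind the workflow: re-queue the target step and all later steps,
// optionally swapping in a new prompt for the target.
function retryFrom(
  steps: StepState[],
  targetLabel: string,
  newPrompt?: string,
): StepState[] {
  const idx = steps.findIndex((s) => s.label === targetLabel);
  if (idx === -1) throw new Error(`unknown step: ${targetLabel}`);
  return steps.map((s, i): StepState =>
    i < idx
      ? s // earlier steps keep their results
      : {
          ...s,
          prompt: i === idx && newPrompt ? newPrompt : s.prompt,
          status: 'queued',
        },
  );
}
```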
All these changes passed `npm run typecheck` with flying colors, maintaining our commitment to type safety and code quality.
Navigating the Rapids: Lessons Learned
No significant development push is without its challenges. Here are a few "pains" we encountered and the valuable lessons we extracted:
- TypeScript Execution in Node:
  - Challenge: Initially, I tried to run workflow engine logic directly via `node -e "require('./src/server/services/workflow-engine')"` for quick testing. Node.js doesn't natively understand TypeScript.
  - Lesson: For running TypeScript files directly, `npx tsx` is an absolute lifesaver. It transparently compiles and executes TypeScript, making script development much smoother. `npx tsx scripts/test-review-step.ts` became our go-to.
- Shell Escaping Woes:
  - Challenge: Attempting inline `npx tsx -e "..."` commands, especially when dealing with complex strings containing special characters like `!` in bash heredocs, led to frustrating shell escaping errors and esbuild syntax issues.
  - Lesson: For anything beyond the simplest inline script, writing the code to a `.ts` file and then executing it with `npx tsx` is far more robust and less prone to shell-specific escaping headaches. Simplicity often wins over clever one-liners.
- Dev Server Management:
  - Challenge: During rapid development, it's easy to end up with multiple dev servers occupying port 3000, or stale `.next` caches causing unexpected behavior.
  - Lesson: A robust development startup script is essential. Our `scripts/dev-start.sh` handles this beautifully: it kills old processes on port 3000 (and other common dev ports), clears the `.next` cache, and regenerates the Prisma client. This ensures a clean slate every time, saving valuable debugging time.
What's Next? The Road Ahead
With the core Smart Review Steps in place, our immediate focus shifts to enhancing the actionable insights derived from LLM analysis:
- Actionable Key Points: We'll be extracting structured lists of key points from the LLM's review analysis. Each point will come with suggested actions: "Recreate with hints," "Keep as is," "Edit manually," or "Discard."
- Multi-Actions & Granular Control: We plan to implement multi-actions like "Accept All," "Recreate All (from review criteria)," and "Discard All & Recreate from Design Features." This requires new engine logic to parse review output, new tRPC mutations for item-level actions, and sophisticated UI components.
- Workflow Overview Enhancements: To provide better visibility and control, we'll surface total costs, step counts, token usage, creation time, and iteration history directly in the workflow list view.
- Beyond the Horizon: Our long-term vision includes a "team-based approach with expert agents" for implementation and testing, pushing the boundaries of autonomous yet human-guided development.
The journey to blend AI's power with human precision is an exciting one. Smart Review Steps are a significant leap forward in creating intelligent, accountable, and truly collaborative LLM-powered workflows. Stay tuned for more updates as we continue to build the future of AI-assisted development!