Human in the Loop: Unleashing Smart Review Steps in Our LLM Workflow Engine
We've just shipped a major update to our LLM workflow engine: Smart Review Steps. This feature empowers developers to integrate human oversight, analysis, and decision-making directly into AI-powered processes, ensuring quality and control.
In the rapidly evolving world of AI-driven development, the power of Large Language Models (LLMs) is undeniable. They can automate complex tasks, generate creative content, and accelerate development cycles. However, even the most sophisticated LLMs benefit from a crucial ingredient: human intelligence. That's why we've been laser-focused on implementing Smart Review Steps in our workflow engine – a system designed to seamlessly integrate human oversight into LLM-powered processes.
This past session was a sprint to bring this vision to life, culminating in a robust first phase of our Smart Review Steps. We've gone from concept to a fully functional LLM workflow that pauses for human review, backed by a comprehensive review UI.
The "Why" Behind Smart Review Steps
Imagine an LLM-driven workflow generating code, designing features, or analyzing security vulnerabilities. While incredibly efficient, there are critical junctures where a human expert needs to step in.
- Validation: Is the LLM's output accurate, ethical, or aligned with project goals?
- Guidance: Does the LLM need a course correction based on new information or a nuanced understanding?
- Accountability: Who takes responsibility for the final output?
Smart Review Steps address these needs head-on. They transform a passive checkpoint into an active, analytical pause where an LLM first performs a detailed review, then presents its findings to a human for approval, feedback, or re-evaluation.
What We Shipped: Phase 1 Complete!
Our recent development push focused on building the core infrastructure for these intelligent checkpoints. Here’s a breakdown of what we've accomplished:
Workflow Engine & Backend Enhancements
- Introducing `review_ready` events: We added a new `review_ready` event type to our `WorkflowEvent` union. This is the signal that tells our workflow engine: "Hold on, human input required!" When an LLM-powered review step completes its analysis, it now yields this event, effectively pausing the workflow.

  ```typescript
  // src/server/services/workflow-engine.ts:59
  type WorkflowEvent =
    | { type: 'step_completed'; ... }
    | { type: 'review_ready'; ... } // New event!
    // ... other events
  ```

- Persistent Review Notes: User-provided notes during a review are crucial for context. We've extended our `ChainContext` interface to include `stepReviewNotes: Map<string, string>`, ensuring these notes are extracted from completed steps and persist throughout the workflow.
- Dynamic Prompt Templating with Notes: One of the most powerful additions is the `{{steps.Label.notes}}` template variable. This allows downstream LLM steps to dynamically incorporate user-written review notes from previous steps directly into their prompts. This creates a truly adaptive workflow where human feedback directly influences subsequent AI actions.

  ```typescript
  // Example prompt fragment
  "Based on the user's review notes for the 'Design Features' step: {{steps.Design Features.notes}}, refine the implementation plan."
  ```

- Intelligent LLM Execution & Pausing: The heart of the review step now involves full LLM execution. When a review step is encountered, our `executeStep()` function now:
  - Runs a specialized LLM prompt (e.g., `deepReview1`, `secReview`).
  - Stores the LLM's output, digest, token usage, and cost.
  - Sets the step status to `"pending"`.
  - Yields the `review_ready` event, pausing the workflow.
  - Crucially, even if the LLM encounters an error during its analysis, the workflow still pauses, allowing a human to review the error and decide on the next action.
- Robust Workflow Control:
  - The `resume` mutation now accepts optional `reviewNotes` (up to 5000 characters), which are stored directly in the checkpoint.
  - We implemented a `retryFromReview` mutation. This allows users to "rewind" the workflow, optionally update the prompt of a target step, reset that step and all subsequent ones, and restart the process from the chosen point. This provides unparalleled flexibility and error correction capabilities.
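To make the pausing behavior concrete, here is a minimal, self-contained sketch of a generator-based engine that yields a `review_ready` event and stops. All names here (`Step`, `executeSteps`, `runLlm`) are illustrative stand-ins, not our engine's actual API:

```typescript
// Illustrative step and event shapes (assumptions, not the real types).
type Step = { label: string; prompt: string; review?: boolean };

type WorkflowEvent =
  | { type: 'step_completed'; label: string; output: string }
  | { type: 'review_ready'; label: string; analysis: string };

// Stand-in for a real LLM call.
async function runLlm(prompt: string): Promise<string> {
  return `analysis of: ${prompt}`;
}

// Runs steps in order, emitting events. A review step yields
// `review_ready` and returns, pausing the workflow; a resume call
// would restart from a persisted checkpoint.
async function* executeSteps(steps: Step[]): AsyncGenerator<WorkflowEvent> {
  for (const step of steps) {
    const output = await runLlm(step.prompt);
    if (step.review) {
      yield { type: 'review_ready', label: step.label, analysis: output };
      return; // pause here; nothing after this step runs
    }
    yield { type: 'step_completed', label: step.label, output };
  }
}
```

The caller simply drains the generator; when it ends with a `review_ready` event, the UI knows to surface the review panel instead of continuing.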
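The `{{steps.Label.notes}}` substitution itself can be sketched as a simple placeholder replacement over the stored notes map. The regex and `Map` shape below are assumptions for illustration, not our engine's real implementation:

```typescript
// Replace {{steps.<Label>.notes}} placeholders with stored review
// notes; unknown labels are left untouched so the gap stays visible.
function interpolateNotes(
  prompt: string,
  stepReviewNotes: Map<string, string>,
): string {
  return prompt.replace(
    /\{\{steps\.([^.}]+)\.notes\}\}/g,
    (whole, label: string) => stepReviewNotes.get(label) ?? whole,
  );
}
```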
User Interface & Experience
- Comprehensive Review Panel UI: We built a dedicated review panel within our dashboard. This UI (located at `src/app/(dashboard)/dashboard/workflows/[id]/page.tsx`) provides:
  - A `MarkdownRenderer` to beautifully display the LLM's detailed analysis.
  - A collapsible viewer for the original review prompt.
  - A notes textarea, complete with a hint for `{{steps.Review.notes}}` to encourage structured feedback.
  - A clear "Approve & Continue" action button to proceed.
  - A "Retry a previous step" dropdown, allowing users to select any prior step to retry, even offering the ability to edit its prompt before restarting.
- Historical Context: Completed review steps now clearly display both the prompt that was used and the stored review notes, providing a full audit trail.
- Specialized Review Prompts: We integrated `systemPrompt` into four key review templates (`deepReview1`, `deepReview2`, `extensionReview`, `secReview`) in `src/lib/constants.ts`. This ensures our LLMs receive precise instructions for their analytical tasks.
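Under the hood, the "retry a previous step" flow boils down to resetting the target step (optionally with an edited prompt) and everything after it, while leaving earlier steps untouched. A hypothetical sketch with illustrative types, not the real `retryFromReview` mutation:

```typescript
// Illustrative step state (assumed shape, not the real schema).
type StepState = {
  label: string;
  prompt: string;
  status: 'completed' | 'pending' | 'queued';
};

// Rewind the workflow: re-queue the target step and all later steps,
// optionally swapping in a new prompt for the target.
function retryFrom(
  steps: StepState[],
  targetLabel: string,
  newPrompt?: string,
): StepState[] {
  const idx = steps.findIndex((s) => s.label === targetLabel);
  if (idx === -1) throw new Error(`unknown step: ${targetLabel}`);
  return steps.map((s, i): StepState =>
    i < idx
      ? s // earlier steps keep their results
      : {
          ...s,
          prompt: i === idx && newPrompt ? newPrompt : s.prompt,
          status: 'queued',
        },
  );
}
```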
All these changes passed `npm run typecheck` with flying colors, maintaining our commitment to type safety and code quality.
Navigating the Rapids: Lessons Learned
No significant development push is without its challenges. Here are a few "pains" we encountered and the valuable lessons we extracted:
- TypeScript Execution in Node:
  - Challenge: Initially, I tried to run workflow engine logic directly via `node -e "require('./src/server/services/workflow-engine')"` for quick testing. Node.js doesn't natively understand TypeScript.
  - Lesson: For running TypeScript files directly, `npx tsx` is an absolute lifesaver. It transparently compiles and executes TypeScript, making script development much smoother. `npx tsx scripts/test-review-step.ts` became our go-to.
- Shell Escaping Woes:
  - Challenge: Attempting inline `npx tsx -e "..."` commands, especially when dealing with complex strings containing special characters like `!` in bash heredocs, led to frustrating shell escaping errors and esbuild syntax issues.
  - Lesson: For anything beyond the simplest inline script, writing the code to a `.ts` file and then executing it with `npx tsx` is far more robust and less prone to shell-specific escaping headaches. Simplicity often wins over clever one-liners.
- Dev Server Management:
  - Challenge: During rapid development, it's easy to end up with multiple dev servers occupying port 3000, or stale `.next` caches causing unexpected behavior.
  - Lesson: A robust development startup script is essential. Our `scripts/dev-start.sh` handles this beautifully: it kills old processes on port 3000 (and other common dev ports), clears the `.next` cache, and regenerates the Prisma client. This ensures a clean slate every time, saving valuable debugging time.
What's Next? The Road Ahead
With the core Smart Review Steps in place, our immediate focus shifts to enhancing the actionable insights derived from LLM analysis:
- Actionable Key Points: We'll be extracting structured lists of key points from the LLM's review analysis. Each point will come with suggested actions: "Recreate with hints," "Keep as is," "Edit manually," or "Discard."
- Multi-Actions & Granular Control: We plan to implement multi-actions like "Accept All," "Recreate All (from review criteria)," and "Discard All & Recreate from Design Features." This requires new engine logic to parse review output, new tRPC mutations for item-level actions, and sophisticated UI components.
- Workflow Overview Enhancements: To provide better visibility and control, we'll surface total costs, step counts, token usage, creation time, and iteration history directly in the workflow list view.
- Beyond the Horizon: Our long-term vision includes a "team-based approach with expert agents" for implementation and testing, pushing the boundaries of autonomous yet human-guided development.
The journey to blend AI's power with human precision is an exciting one. Smart Review Steps are a significant leap forward in creating intelligent, accountable, and truly collaborative LLM-powered workflows. Stay tuned for more updates as we continue to build the future of AI-assisted development!