Beyond Text: Structuring LLM Review Feedback for Smarter Workflows
Turning unstructured LLM review notes into actionable, editable key points was our latest challenge. Dive into how we built a system for extracting, persisting, and acting on AI-generated feedback in our workflow engine.
The promise of large language models in automation is immense. They can summarize, generate, and even critically review complex outputs. But there's a common hurdle: how do you turn a beautifully articulated, free-form LLM response like "This looks good, but consider X and Y for improved clarity" into something your system can act upon? Something that users can edit, accept, or use to guide the next step in a workflow?
That was the core problem we tackled in our latest development sprint. Our goal: to transform unstructured LLM review feedback into discrete, actionable key points within our workflow engine, making our automated processes far more interactive and intelligent.
The Challenge: From Prose to Precision
Our existing workflow engine already incorporated LLM-powered review steps. An LLM would evaluate a generated output (e.g., a code snippet, a document draft) and provide feedback. While valuable, this feedback was often a block of text, requiring manual parsing by the user to identify actionable items. We wanted to:
- Automatically extract specific, actionable suggestions.
- Present these suggestions in an editable, structured format.
- Allow users to interact with them (keep, discard, edit).
- Use these points to inform subsequent workflow steps, closing the feedback loop.
Phase 1: Taming the LLM — Structured Extraction with Haiku
The first hurdle was getting the LLM to consistently output structured data. We turned to Claude Haiku (claude-haiku-4-5-20251001) for this, leveraging its speed and reliability at emitting well-formed JSON.
We created src/server/services/review-key-points.ts with an extractKeyPoints() function. This function takes the raw LLM review output and prompts Haiku to return a JSON array of key point objects. Each object includes fields like id, text, details, and severity.
```typescript
// src/server/services/review-key-points.ts (conceptual snippet)
import Anthropic from '@anthropic-ai/sdk';
// import { ReviewKeyPoint } from '~/types/review'; // Future shared type

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

export async function extractKeyPoints(reviewOutput: string): Promise<any[]> { // 'any' for brevity
  const prompt = `Given the following review output, extract up to 50 actionable key points. Each key point should be a JSON object with 'id' (UUID), 'text' (max 200 chars), 'details' (max 2000 chars), and 'severity' (enum: 'CRITICAL', 'HIGH', 'MEDIUM', 'LOW', 'INFO').

Review Output: "${reviewOutput}"

Return only a JSON array of ReviewKeyPoint objects.`;

  const response = await anthropic.messages.create({
    model: 'claude-haiku-4-5-20251001',
    max_tokens: 4000, // ample headroom for up to 50 items
    messages: [{ role: 'user', content: prompt }],
  });

  // The SDK returns a union of content blocks; only text blocks carry `.text`.
  const block = response.content[0];
  if (block.type !== 'text') throw new Error('Expected a text block in the model response');

  // ... add validation, UUID assignment, truncation, and a hard cap at 50 items here.
  // This ensures data integrity and prevents runaway storage/display issues.
  return JSON.parse(block.text) as any[]; // cast to any for now
}
```
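The validation hinted at in the comment above can be sketched in plain TypeScript. It is hand-rolled here for illustration; in practice a Zod schema would do the same job. Function and type names in this sketch are ours, not the codebase's:

```typescript
import { randomUUID } from 'crypto';

// Hypothetical types mirroring the extraction prompt's contract.
type Severity = 'CRITICAL' | 'HIGH' | 'MEDIUM' | 'LOW' | 'INFO';
const SEVERITIES: Severity[] = ['CRITICAL', 'HIGH', 'MEDIUM', 'LOW', 'INFO'];

export interface ReviewKeyPoint {
  id: string;
  text: string;
  details: string;
  severity: Severity;
}

export function validateKeyPoints(rawJson: string): ReviewKeyPoint[] {
  const parsed: unknown = JSON.parse(rawJson);
  if (!Array.isArray(parsed)) throw new Error('Expected a JSON array of key points');
  return parsed
    .filter((p): p is Record<string, unknown> => typeof p === 'object' && p !== null)
    .slice(0, 50) // hard cap to prevent runaway storage/display
    .map((p) => ({
      id: typeof p.id === 'string' ? p.id : randomUUID(), // assign a UUID if missing
      text: String(p.text ?? '').slice(0, 200),           // truncate rather than reject
      details: String(p.details ?? '').slice(0, 2000),
      severity: SEVERITIES.includes(p.severity as Severity)
        ? (p.severity as Severity)
        : 'INFO', // fall back rather than drop the point
    }));
}
```

Truncating and falling back (instead of rejecting) keeps a single malformed item from discarding an otherwise useful extraction.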
This function is called immediately after a review step's LLM output is generated in src/server/services/workflow-engine.ts (around line 777), with the results stored in checkpoint.keyPoints.
Phase 2: Persistence and User Interaction
Once extracted, these key points needed to be stored and then presented to the user for interaction.
- Storing Key Points: We leveraged an existing `Json` field, `checkpoint`, on our `WorkflowStep` model in Prisma. This allowed us to store the `ReviewKeyPoint[]` array directly.
  - Lesson Learned: Prisma `Json` Typing: A minor but common hurdle surfaced here. Directly assigning `Record<string, unknown>[]` to a Prisma `Json` field often throws a TS2322 error (`Type '{ keyPoints: Record<string, unknown>[]; }' is not assignable to type 'InputJsonValue'`). The standard workaround, which we've adopted across our codebase for flexibility with untyped JSON fields, is to cast it: `as unknown as Prisma.InputJsonValue`.
- User Actions: We added a new tRPC mutation, `updateKeyPoints`, in `src/server/trpc/routers/workflows.ts`. It allows the client to send updates for specific key points, marking them as `kept` or `discarded`, or applying inline edits. The server then merges these changes into the `checkpoint.keyPoints` array, ensuring persistence.
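The server-side merge inside that mutation can be sketched as a pure function. The shapes below are assumptions for illustration; the real mutation lives in `src/server/trpc/routers/workflows.ts`:

```typescript
// Hypothetical shapes, approximating the stored key points and the client payload.
interface KeyPoint {
  id: string;
  text: string;
  status: 'pending' | 'kept' | 'discarded';
}

interface KeyPointUpdate {
  id: string;
  status?: KeyPoint['status'];
  text?: string; // present when the user applied an inline edit
}

// Merge per-point updates into the stored array without dropping untouched points.
export function mergeKeyPointUpdates(
  existing: KeyPoint[],
  updates: KeyPointUpdate[],
): KeyPoint[] {
  const byId = new Map(updates.map((u) => [u.id, u]));
  return existing.map((kp) => {
    const u = byId.get(kp.id);
    if (!u) return kp; // untouched points pass through unchanged
    return {
      ...kp,
      status: u.status ?? kp.status,
      text: u.text ?? kp.text,
    };
  });
}
```

Keeping the merge pure makes it trivial to unit-test before the result is written back into the Prisma `Json` checkpoint field.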
Phase 3: Closing the Loop — Actionable Feedback
This is where the extracted key points truly shine: guiding subsequent workflow steps.
- Recreate with Hints: The `recreateFromKeyPoint` mutation is a powerful addition. If a user identifies a critical key point, they can choose to "Recreate with Hints." This mutation:
  - Resets the current and all subsequent workflow steps.
  - Takes the selected key points.
  - Injects them as a structured "hint block" directly into the next LLM prompt for the target step.
- Lesson Learned: Preventing Hint Accumulation: An important code review catch prevented a potential disaster here. Simply appending hints to a prompt would lead to infinite accumulation if a step was retried multiple times. Our solution: before appending new hints, we strip any previous hint blocks using a defined `HINT_SEPARATOR`. The logic looks something like `prompt.split(HINT_SEPARATOR)[0]` before appending new hints, which keeps the prompt clean and relevant.
- Preserving State: We also fixed existing mutations like `resume` and `retryFromReview`. Initially, these were overwriting the entire `checkpoint` with just new `reviewNotes`, inadvertently destroying the `keyPoints` we had just extracted.
  - Lesson Learned: State Merging vs. Overwriting: A classic state management pitfall. When updating a subset of fields on an existing object (like a workflow checkpoint), it's crucial to merge rather than overwrite. The fix was simple but crucial: always spread the existing checkpoint when updating specific fields: `{ ...existingCheckpoint, reviewNotes }`. This ensures all other properties, including our precious `keyPoints`, are preserved.
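The hint-stripping idea condenses to a few lines. The separator value and function name here are illustrative (the real constant lives alongside the workflow engine):

```typescript
// Hypothetical separator; the real HINT_SEPARATOR is defined in the workflow engine.
const HINT_SEPARATOR = '\n\n--- REVIEW HINTS ---\n';

// Strip any previous hint block before appending, so retries never accumulate hints.
export function withHints(prompt: string, hints: string[]): string {
  const base = prompt.split(HINT_SEPARATOR)[0];
  if (hints.length === 0) return base;
  const block = hints.map((h, i) => `${i + 1}. ${h}`).join('\n');
  return `${base}${HINT_SEPARATOR}${block}`;
}
```

Because the base prompt is recovered first, calling this on an already-hinted prompt replaces the old hint block instead of stacking a second one on top.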
Phase 4: The User Interface — Bringing it to Life
No feature is complete without a user-friendly interface. We built src/components/workflow/review-key-points-panel.tsx to present the extracted data.
This panel features:
- A severity summary bar (e.g., "3 Critical, 5 High") for a quick overview.
- A grouped list of key points, allowing users to quickly scan and filter.
- Per-item actions: Keep, edit (with inline editing), and discard.
- Bulk actions: "Accept All," "Recreate All (Review Criteria)" (injecting all kept points as hints), and "Discard All & Recreate from Source."
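The summary bar's counts reduce to a small grouping helper. This is a sketch with an assumed function name, not the panel's actual implementation:

```typescript
type Severity = 'CRITICAL' | 'HIGH' | 'MEDIUM' | 'LOW' | 'INFO';

// Count key points per severity and render e.g. "3 Critical, 5 High",
// always listing severities from most to least severe.
export function severitySummary(points: { severity: Severity }[]): string {
  const order: Severity[] = ['CRITICAL', 'HIGH', 'MEDIUM', 'LOW', 'INFO'];
  const counts = new Map<Severity, number>();
  for (const p of points) counts.set(p.severity, (counts.get(p.severity) ?? 0) + 1);
  return order
    .filter((s) => counts.has(s))
    .map((s) => `${counts.get(s)} ${s.charAt(0) + s.slice(1).toLowerCase()}`)
    .join(', ');
}
```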
This interactive panel is seamlessly integrated into src/app/(dashboard)/dashboard/workflows/[id]/page.tsx, appearing on pending review steps and as a read-only view on completed ones.
Beyond Key Points: Workflow Metadata Enhancements
While focused on key points, we also took the opportunity to enhance our main workflow list (src/app/(dashboard)/dashboard/workflows/page.tsx). Each workflow card now displays aggregated metadata like:
- Total cost and tokens consumed by the workflow.
- Step progress (e.g., "5/10 steps completed").
- Total duration.
- Creation date.
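The aggregates on each card boil down to a few reductions over the workflow's steps. A minimal sketch, assuming a hypothetical per-step shape:

```typescript
// Hypothetical per-step metadata; field names are assumptions for illustration.
interface StepMeta {
  costUsd: number;
  tokens: number;
  durationMs: number;
  completed: boolean;
}

// Roll step-level metadata up into the figures shown on a workflow card.
export function workflowCardMeta(steps: StepMeta[]) {
  return {
    costUsd: steps.reduce((sum, s) => sum + s.costUsd, 0),
    tokens: steps.reduce((sum, s) => sum + s.tokens, 0),
    durationMs: steps.reduce((sum, s) => sum + s.durationMs, 0),
    progress: `${steps.filter((s) => s.completed).length}/${steps.length} steps completed`,
  };
}
```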
These small additions provide a much richer overview at a glance, improving the overall user experience of the dashboard.
What's Next?
This sprint delivered a significant leap forward in making our LLM-powered workflows truly actionable. As always, development is an ongoing process. Our immediate next steps include:
- Thorough end-to-end testing of the new features.
- Consolidating the `ReviewKeyPoint` type into a shared `src/types/review.ts` to reduce duplication.
- Considering `rehype-sanitize` for our `MarkdownRenderer` as defense-in-depth against potential XSS from LLM outputs.
- Refactoring our monolithic workflow detail page (~1640 lines!) into smaller, more manageable components.
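For the planned type consolidation, here is a sketch of what a shared `src/types/review.ts` might export. Fields beyond those named in the extraction prompt (`status`, `editedText`) are assumptions about what the UI would need:

```typescript
// Sketch of the planned shared module; the exact shape is an assumption.
export type ReviewSeverity = 'CRITICAL' | 'HIGH' | 'MEDIUM' | 'LOW' | 'INFO';

export interface ReviewKeyPoint {
  id: string;
  text: string;        // max 200 chars
  details: string;     // max 2000 chars
  severity: ReviewSeverity;
  status?: 'kept' | 'discarded'; // hypothetical: tracks the user's decision
  editedText?: string;           // hypothetical: the user's inline edit, if any
}

// Runtime guard so both server and client can narrow untyped checkpoint JSON.
export function isReviewKeyPoint(v: unknown): v is ReviewKeyPoint {
  if (typeof v !== 'object' || v === null) return false;
  const o = v as Record<string, unknown>;
  return (
    typeof o.id === 'string' &&
    typeof o.text === 'string' &&
    typeof o.details === 'string' &&
    ['CRITICAL', 'HIGH', 'MEDIUM', 'LOW', 'INFO'].includes(o.severity as string)
  );
}
```

Pairing the interface with a guard matters here because the data round-trips through an untyped Prisma `Json` field, so compile-time types alone cannot be trusted on read.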
Building robust, intelligent systems with LLMs involves more than just prompt engineering. It requires careful consideration of data extraction, persistence, user interaction, and closing the loop to make that AI intelligence truly actionable. This session was a solid step in that direction.