Beyond Text: Structuring LLM Review Feedback for Smarter Workflows
Turning unstructured LLM review notes into actionable, editable key points was our latest challenge. Dive into how we built a system for extracting, persisting, and acting on AI-generated feedback in our workflow engine.
The promise of large language models in automation is immense. They can summarize, generate, and even critically review complex outputs. But there's a common hurdle: how do you turn a beautifully articulated, free-form LLM response like "This looks good, but consider X and Y for improved clarity" into something your system can act upon? Something that users can edit, accept, or use to guide the next step in a workflow?
That was the core problem we tackled in our latest development sprint. Our goal: to transform unstructured LLM review feedback into discrete, actionable key points within our workflow engine, making our automated processes far more interactive and intelligent.
The Challenge: From Prose to Precision
Our existing workflow engine already incorporated LLM-powered review steps. An LLM would evaluate a generated output (e.g., a code snippet, a document draft) and provide feedback. While valuable, this feedback was often a block of text, requiring manual parsing by the user to identify actionable items. We wanted to:
- Automatically extract specific, actionable suggestions.
- Present these suggestions in an editable, structured format.
- Allow users to interact with them (keep, discard, edit).
- Use these points to inform subsequent workflow steps, closing the feedback loop.
Phase 1: Taming the LLM — Structured Extraction with Haiku
The first hurdle was getting the LLM to consistently output structured data. We turned to Claude Haiku (claude-haiku-4-5-20251001) for this, leveraging its speed and reliability at emitting well-formed JSON.
We created src/server/services/review-key-points.ts with an extractKeyPoints() function. This function takes the raw LLM review output and prompts Haiku to return a JSON array of key point objects. Each object includes fields like id, text, details, and severity.
```typescript
// src/server/services/review-key-points.ts (conceptual snippet)
import Anthropic from '@anthropic-ai/sdk';
// import { ReviewKeyPoint } from '~/types/review'; // Future shared type

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

export async function extractKeyPoints(reviewOutput: string): Promise<any[]> { // 'any' for brevity
  const prompt = `Given the following review output, extract up to 50 actionable key points. Each key point should be a JSON object with 'id' (UUID), 'text' (max 200 chars), 'details' (max 2000 chars), and 'severity' (enum: 'CRITICAL', 'HIGH', 'MEDIUM', 'LOW', 'INFO').

Review Output: "${reviewOutput}"

Return only a JSON array of ReviewKeyPoint objects.`;

  const response = await anthropic.messages.create({
    model: 'claude-haiku-4-5-20251001',
    max_tokens: 4000, // ample headroom for up to 50 items
    messages: [{ role: 'user', content: prompt }],
  });

  // The SDK returns a union of content blocks; only text blocks carry `.text`.
  const block = response.content[0];
  if (block.type !== 'text') throw new Error('Expected a text block in the model response');

  // ... add validation, UUID assignment, truncation, and a hard cap at 50 items here.
  // This ensures data integrity and prevents runaway storage/display issues.
  return JSON.parse(block.text) as any[]; // cast to any for now
}
```
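The validation hinted at in the comment above can be sketched in plain TypeScript. It is hand-rolled here for illustration; in practice a Zod schema would do the same job. Function and type names in this sketch are ours, not the codebase's:

```typescript
import { randomUUID } from 'crypto';

// Hypothetical types mirroring the extraction prompt's contract.
type Severity = 'CRITICAL' | 'HIGH' | 'MEDIUM' | 'LOW' | 'INFO';
const SEVERITIES: Severity[] = ['CRITICAL', 'HIGH', 'MEDIUM', 'LOW', 'INFO'];

export interface ReviewKeyPoint {
  id: string;
  text: string;
  details: string;
  severity: Severity;
}

export function validateKeyPoints(rawJson: string): ReviewKeyPoint[] {
  const parsed: unknown = JSON.parse(rawJson);
  if (!Array.isArray(parsed)) throw new Error('Expected a JSON array of key points');
  return parsed
    .filter((p): p is Record<string, unknown> => typeof p === 'object' && p !== null)
    .slice(0, 50) // hard cap to prevent runaway storage/display
    .map((p) => ({
      id: typeof p.id === 'string' ? p.id : randomUUID(), // assign a UUID if missing
      text: String(p.text ?? '').slice(0, 200),           // truncate rather than reject
      details: String(p.details ?? '').slice(0, 2000),
      severity: SEVERITIES.includes(p.severity as Severity)
        ? (p.severity as Severity)
        : 'INFO', // fall back rather than drop the point
    }));
}
```

Truncating and falling back (instead of rejecting) keeps a single malformed item from discarding an otherwise useful extraction.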
This function is called immediately after a review step's LLM output is generated in src/server/services/workflow-engine.ts (around line 777), with the results stored in checkpoint.keyPoints.
Phase 2: Persistence and User Interaction
Once extracted, these key points needed to be stored and then presented to the user for interaction.
- Storing Key Points: We leveraged an existing `Json` field, `checkpoint`, on our `WorkflowStep` model in Prisma. This allowed us to store the `ReviewKeyPoint[]` array directly.
  - Lesson Learned: Prisma `Json` Typing: A minor but common hurdle surfaced here. Directly assigning `Record<string, unknown>[]` to a Prisma `Json` field often throws a TS2322 error (`Type '{ keyPoints: Record<string, unknown>[]; }' is not assignable to type 'InputJsonValue'`). The standard workaround, which we've adopted across our codebase for flexibility with untyped JSON fields, is to cast it: `as unknown as Prisma.InputJsonValue`.
- User Actions: We added a new tRPC mutation, `updateKeyPoints`, in `src/server/trpc/routers/workflows.ts`. It allows the client to send updates for specific key points, marking them as `kept` or `discarded`, or applying inline edits. The server then merges these changes into the `checkpoint.keyPoints` array, ensuring persistence.
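The server-side merge inside that mutation can be sketched as a pure function. The shapes below are assumptions for illustration; the real mutation lives in `src/server/trpc/routers/workflows.ts`:

```typescript
// Hypothetical shapes, approximating the stored key points and the client payload.
interface KeyPoint {
  id: string;
  text: string;
  status: 'pending' | 'kept' | 'discarded';
}

interface KeyPointUpdate {
  id: string;
  status?: KeyPoint['status'];
  text?: string; // present when the user applied an inline edit
}

// Merge per-point updates into the stored array without dropping untouched points.
export function mergeKeyPointUpdates(
  existing: KeyPoint[],
  updates: KeyPointUpdate[],
): KeyPoint[] {
  const byId = new Map(updates.map((u) => [u.id, u]));
  return existing.map((kp) => {
    const u = byId.get(kp.id);
    if (!u) return kp; // untouched points pass through unchanged
    return {
      ...kp,
      status: u.status ?? kp.status,
      text: u.text ?? kp.text,
    };
  });
}
```

Keeping the merge pure makes it trivial to unit-test before the result is written back into the Prisma `Json` checkpoint field.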
Phase 3: Closing the Loop — Actionable Feedback
This is where the extracted key points truly shine: guiding subsequent workflow steps.
- Recreate with Hints: The `recreateFromKeyPoint` mutation is a powerful addition. If a user identifies a critical key point, they can choose to "Recreate with Hints." This mutation:
  - Resets the current and all subsequent workflow steps.
  - Takes the selected key points.
  - Injects them as a structured "hint block" directly into the next LLM prompt for the target step.
- Lesson Learned: Preventing Hint Accumulation: An important code review catch prevented a potential disaster here. Simply appending hints to a prompt would lead to infinite accumulation if a step was retried multiple times. Our solution: before appending new hints, we strip any previous hint blocks using a defined `HINT_SEPARATOR`. The logic looks something like `prompt.split(HINT_SEPARATOR)[0]` before appending new hints, which keeps the prompt clean and relevant.
- Preserving State: We also fixed existing mutations like `resume` and `retryFromReview`. Initially, these were overwriting the entire `checkpoint` with just new `reviewNotes`, inadvertently destroying the `keyPoints` we had just extracted.
  - Lesson Learned: State Merging vs. Overwriting: A classic state management pitfall. When updating a subset of fields on an existing object (like a workflow checkpoint), it's crucial to merge rather than overwrite. The fix was simple but crucial: always spread the existing checkpoint when updating specific fields: `{ ...existingCheckpoint, reviewNotes }`. This ensures all other properties, including our precious `keyPoints`, are preserved.
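The hint-stripping idea condenses to a few lines. The separator value and function name here are illustrative (the real constant lives alongside the workflow engine):

```typescript
// Hypothetical separator; the real HINT_SEPARATOR is defined in the workflow engine.
const HINT_SEPARATOR = '\n\n--- REVIEW HINTS ---\n';

// Strip any previous hint block before appending, so retries never accumulate hints.
export function withHints(prompt: string, hints: string[]): string {
  const base = prompt.split(HINT_SEPARATOR)[0];
  if (hints.length === 0) return base;
  const block = hints.map((h, i) => `${i + 1}. ${h}`).join('\n');
  return `${base}${HINT_SEPARATOR}${block}`;
}
```

Because the base prompt is recovered first, calling this on an already-hinted prompt replaces the old hint block instead of stacking a second one on top.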
Phase 4: The User Interface — Bringing it to Life
No feature is complete without a user-friendly interface. We built src/components/workflow/review-key-points-panel.tsx to present the extracted data.
This panel features:
- A severity summary bar (e.g., "3 Critical, 5 High") for a quick overview.
- A grouped list of key points, allowing users to quickly scan and filter.
- Per-item actions: Keep, edit (with inline editing), and discard.
- Bulk actions: "Accept All," "Recreate All (Review Criteria)" (injecting all kept points as hints), and "Discard All & Recreate from Source."
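The summary bar's counts reduce to a small grouping helper. This is a sketch with an assumed function name, not the panel's actual implementation:

```typescript
type Severity = 'CRITICAL' | 'HIGH' | 'MEDIUM' | 'LOW' | 'INFO';

// Count key points per severity and render e.g. "3 Critical, 5 High",
// always listing severities from most to least severe.
export function severitySummary(points: { severity: Severity }[]): string {
  const order: Severity[] = ['CRITICAL', 'HIGH', 'MEDIUM', 'LOW', 'INFO'];
  const counts = new Map<Severity, number>();
  for (const p of points) counts.set(p.severity, (counts.get(p.severity) ?? 0) + 1);
  return order
    .filter((s) => counts.has(s))
    .map((s) => `${counts.get(s)} ${s.charAt(0) + s.slice(1).toLowerCase()}`)
    .join(', ');
}
```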
This interactive panel is seamlessly integrated into src/app/(dashboard)/dashboard/workflows/[id]/page.tsx, appearing on pending review steps and as a read-only view on completed ones.
Beyond Key Points: Workflow Metadata Enhancements
While focused on key points, we also took the opportunity to enhance our main workflow list (src/app/(dashboard)/dashboard/workflows/page.tsx). Each workflow card now displays aggregated metadata like:
- Total cost and tokens consumed by the workflow.
- Step progress (e.g., "5/10 steps completed").
- Total duration.
- Creation date.
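The aggregates on each card boil down to a few reductions over the workflow's steps. A minimal sketch, assuming a hypothetical per-step shape:

```typescript
// Hypothetical per-step metadata; field names are assumptions for illustration.
interface StepMeta {
  costUsd: number;
  tokens: number;
  durationMs: number;
  completed: boolean;
}

// Roll step-level metadata up into the figures shown on a workflow card.
export function workflowCardMeta(steps: StepMeta[]) {
  return {
    costUsd: steps.reduce((sum, s) => sum + s.costUsd, 0),
    tokens: steps.reduce((sum, s) => sum + s.tokens, 0),
    durationMs: steps.reduce((sum, s) => sum + s.durationMs, 0),
    progress: `${steps.filter((s) => s.completed).length}/${steps.length} steps completed`,
  };
}
```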
These small additions provide a much richer overview at a glance, improving the overall user experience of the dashboard.
What's Next?
This sprint delivered a significant leap forward in making our LLM-powered workflows truly actionable. As always, development is an ongoing process. Our immediate next steps include:
- Thorough end-to-end testing of the new features.
- Consolidating the `ReviewKeyPoint` type into a shared `src/types/review.ts` to reduce duplication.
- Considering `rehype-sanitize` for our `MarkdownRenderer` as defense-in-depth against potential XSS from LLM outputs.
- Refactoring our monolithic workflow detail page (~1640 lines!) into smaller, more manageable components.
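For the planned type consolidation, here is a sketch of what a shared `src/types/review.ts` might export. Fields beyond those named in the extraction prompt (`status`, `editedText`) are assumptions about what the UI would need:

```typescript
// Sketch of the planned shared module; the exact shape is an assumption.
export type ReviewSeverity = 'CRITICAL' | 'HIGH' | 'MEDIUM' | 'LOW' | 'INFO';

export interface ReviewKeyPoint {
  id: string;
  text: string;        // max 200 chars
  details: string;     // max 2000 chars
  severity: ReviewSeverity;
  status?: 'kept' | 'discarded'; // hypothetical: tracks the user's decision
  editedText?: string;           // hypothetical: the user's inline edit, if any
}

// Runtime guard so both server and client can narrow untyped checkpoint JSON.
export function isReviewKeyPoint(v: unknown): v is ReviewKeyPoint {
  if (typeof v !== 'object' || v === null) return false;
  const o = v as Record<string, unknown>;
  return (
    typeof o.id === 'string' &&
    typeof o.text === 'string' &&
    typeof o.details === 'string' &&
    ['CRITICAL', 'HIGH', 'MEDIUM', 'LOW', 'INFO'].includes(o.severity as string)
  );
}
```

Pairing the interface with a guard matters here because the data round-trips through an untyped Prisma `Json` field, so compile-time types alone cannot be trusted on read.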
Building robust, intelligent systems with LLMs involves more than just prompt engineering. It requires careful consideration of data extraction, persistence, user interaction, and closing the loop to make that AI intelligence truly actionable. This session was a solid step in that direction.