Closing the Loop: Teaching Our AI Pipelines to Learn from Experience

Just finished a significant development sprint, and the air here is buzzing with that satisfying hum of a system that's just gotten a whole lot smarter and more capable. The goal for this session was ambitious: not just to add features, but to fundamentally evolve how our AI code pipelines operate, culminating in a true closed-loop learning system. And I'm thrilled to report: mission accomplished.

It's always a mix of focused coding, debugging head-scratchers, and those "aha!" moments that make it all worthwhile. Let's dive into what we shipped and the lessons we learned along the way.

What We Built: Elevating Our AI Pipelines

This session wasn't about one big thing, but several critical enhancements that collectively push our auto-fix and refactor pipelines into a new league.

1. Granular LLM Control: Provider & Model Chooser

Our AI pipelines leverage large language models for everything from detecting issues to generating fixes. Until now, the choice was largely hardcoded. We've introduced a new UI element, a LLM_PROVIDERS button group paired with a model input, directly into our auto-fix/page.tsx and refactor/page.tsx dialogs.

This means users can now explicitly select their preferred LLM provider (e.g., OpenAI, Anthropic, Google) and the specific model they want to use for a given pipeline run. This is crucial for:

Cost Optimization: Different models have different price points.
Performance Tuning: Specific models excel at certain tasks.
Feature Access: Some models offer unique capabilities.

We also added a provider badge to detail pages and list cards, so you can quickly see which LLM powered a particular run. Small detail, big UX win.

2. Fixing the Elusive Phase Desync

Every developer knows the pain of a UI that lies to you. We had a persistent bug where our pipeline detail pages ([id]/page.tsx) would always initially show the "scan" phase, regardless of the actual pipeline status. This was a classic frontend state management issue.

The fix was straightforward but critical:

typescript

// auto-fix/[id]/page.tsx & refactor/[id]/page.tsx
// Before: useState initialized to "scan"
// const [currentPhase, setCurrentPhase] = useState<RefactorPhase>("scan");

// After: Syncing with run status
useEffect(() => {
  if (run?.status) {
    setCurrentPhase(statusToPhase[run.status]);
  }
}, [run?.status]); // Re-run effect when run.status changes

By adding a useEffect hook that maps the run.status from our backend to the appropriate currentPhase, we ensure the UI accurately reflects the pipeline's progress from the moment the page loads. No more confusing "scan" states when your pipeline is already generating fixes!

3. Automating the Final Mile: PR Creation for Refactor

The refactor pipeline is all about suggesting improvements. But the final step — getting those improvements into the codebase — still required manual intervention. Not anymore!

We've added a new Phase 4: PR Creation to our refactor/pipeline.ts.

A new autoCreatePR checkbox in the UI and a corresponding tRPC router field allows users to opt-in.
The RefactorItem schema now includes prUrl and prNumber to track the generated PRs.
Our progress bar now proudly displays the "pr" phase.

Currently, this is enabled for single-file patches, automatically generating a pull request in the target repository. Multi-file changes are skipped for now, but the foundation is laid. This significantly reduces friction and accelerates the adoption of suggested refactorings.

4. The Big Leap: Closed-Loop Learning System (558 insertions!)

This is the crown jewel of this session. We've built a robust, end-to-end system that allows our AI pipelines to learn from their own runs, feeding insights back into future prompt injections. Think of it as giving our AI a long-term memory.

The Problem: LLMs are powerful but inherently stateless. They don't "remember" past interactions or the specific mistakes/successes of previous runs, even on the same codebase. This can lead to redundant suggestions, repeated errors, or missed opportunities for improvement.

Our Solution: Workflow Insights & Hybrid Search:

Here's the architectural flow:

Insight Extraction (pipeline-insight-extractor.ts): After an auto-fix or refactor pipeline completes, we now extract structured WorkflowInsight records. These capture details like:
- For auto-fix: specific issues detected and the fixes applied.
- For refactor: opportunities identified and the improvements generated. Crucially, these insights are stored with vector embeddings, allowing for semantic similarity searches.
Historical Learnings (pipeline-learnings.ts): Before a new pipeline run begins, this new module performs a hybrid search (combining keyword and vector similarity) against our WorkflowInsight store. It looks for insights relevant to the current codebase and task.
Prompt Injection: The retrieved "Historical Learnings" are then formatted into a concise Markdown block and injected directly into the LLM prompts for:
- issue-detector.ts
- fix-generator.ts
- opportunity-detector.ts
- improvement-generator.ts

typescript

// Conceptual example of prompt injection
const historicalLearnings = await getHistoricalLearnings(repoId, currentTask);

const prompt = `
You are an expert ${task} AI.
Consider the following historical learnings from similar tasks:

---
${historicalLearnings}
---

Now, analyze the code below and ${taskDescription}:
${codeSnippet}
`;

// LLM call with enhanced prompt

The Impact:

Reduced Hallucinations: The LLM is grounded in real past successes and failures.
Improved Relevance: Suggestions become more tailored to the specific context.
Faster Iteration: The AI learns from previous runs on the same repository, avoiding redundant or already-fixed issues.
Self-Improving System: This is a foundational step towards truly autonomous and adaptive developer tooling.

This system involved modifying 11 files and adding over 550 lines of code, touching schema, tRPC, and multiple core pipeline modules. It's fully wired end-to-end, and we're already seeing "Loaded historical learnings" messages in our SSE streams during test runs.

Lessons from the Trenches: The "Pain Log" Transformed

Not everything was smooth sailing. Here are a few key challenges and how we navigated them, turning "pain" into "learnings."

1. The `workflowId` Optionality Crisis (Database Schema Design)

The Problem: Our WorkflowInsight table was designed with workflowId as a non-nullable foreign key, assuming every insight would belong to a parent Workflow. However, the new closed-loop system generates insights directly from the pipeline runs, which don't have a parent workflowId in the same way. This led to frustrating type errors and database insertion failures.

The Learning: Not all data fits neatly into pre-conceived relational structures. Sometimes, a field that feels like it should always be present isn't. We had to make WorkflowInsight.workflowId optional (String?) in our Prisma schema, update the Workflow? relation, and adjust client-side types (InsightSearchResult.workflowId and knowledge-hub.ts InsightRow.workflowId) to accept string | null.

Takeaway: Be pragmatic with schema design. While strong relationships are good, forcing them where a natural parent doesn't exist can create unnecessary friction. A nullable foreign key is perfectly acceptable when the data source (like our pipeline) doesn't always provide one, especially if search queries don't strictly filter by it anyway.

2. Taming Prisma's `Json?` Fields (TypeScript Type Wrangling)

The Problem: Our config field on pipeline runs is stored as Json? in Prisma. When retrieving this on the client side, TypeScript often sees it as Prisma.JsonValue | null, which isn't directly assignable to Record<string, string> (or whatever specific shape we expect). This required repetitive type assertions.

The Learning: Prisma's Json type is powerful but can be a bit of a dance with TypeScript. We frequently needed to cast it:

typescript

// On detail pages for a single config
const config = run.config as Record<string, string>;

// On list pages where config might be missing on some items
const config = item.config as unknown as { config?: Record<string, string> };

Takeaway: Embrace as when dealing with Json fields, or consider more robust runtime validation (like Zod) if the schema for the JSON is complex and prone to change. For simple key-value pairs, as is often the path of least resistance.

The Road Ahead

With these changes pushed to main and typechecks clean, we're ready for the next phase. Immediate next steps include:

Manual Testing: Verifying LLM provider selection, automatic PR creation, and confirming "Historical Learnings" appear in SSE streams.
Insight Verification: Ensuring extracted insights correctly populate our WorkflowInsight table and appear in our MemoryPicker.
Future Enhancements: Considering a dedicated "Learnings" tab on pipeline detail pages and implementing deduplication logic for insights across repeated runs on the same repository.

This session marked a significant stride forward, particularly in our quest to build truly intelligent and self-improving developer tools. Seeing the closed-loop learning system come alive is incredibly rewarding, and I'm excited to see how it enhances our AI's capabilities over time.

json

{
  "thingsDone": [
    "Added LLM provider/model chooser UI to auto-fix and refactor pipelines",
    "Fixed phase desync bug in pipeline detail pages using useEffect",
    "Added automated PR creation (Phase 4) to Refactor pipeline",
    "Implemented a closed-loop learning system for LLMs (insight extraction, vector embeddings, hybrid search, prompt injection)",
    "Modified issue/fix/opportunity/improvement generators to accept historical learnings",
    "Made WorkflowInsight.workflowId optional in Prisma schema"
  ],
  "pains": [
    "Initial design of WorkflowInsight.workflowId as non-nullable FK caused type errors for pipeline-sourced insights",
    "Phase desync bug due to useState initialization not reflecting actual run status",
    "Prisma Json? field access requiring frequent `as Record<string, string>` casts"
  ],
  "successes": [
    "All 4 major features committed and pushed to main",
    "Learning loop fully wired end-to-end and functional",
    "Typecheck clean after significant changes",
    "Database schema updated successfully for RefactorItem and WorkflowInsight"
  ],
  "techStack": [
    "TypeScript",
    "Next.js",
    "React",
    "Prisma",
    "PostgreSQL",
    "tRPC",
    "LLM (Large Language Models)",
    "Vector Embeddings",
    "Hybrid Search"
  ]
}

What We Built: Elevating Our AI Pipelines

1. Granular LLM Control: Provider & Model Chooser

2. Fixing the Elusive Phase Desync

3. Automating the Final Mile: PR Creation for Refactor

4. The Big Leap: Closed-Loop Learning System (558 insertions!)

Lessons from the Trenches: The "Pain Log" Transformed

1. The workflowId Optionality Crisis (Database Schema Design)

2. Taming Prisma's Json? Fields (TypeScript Type Wrangling)

The Road Ahead

1. The `workflowId` Optionality Crisis (Database Schema Design)

2. Taming Prisma's `Json?` Fields (TypeScript Type Wrangling)