Teaching Our AI to Learn: Building a Closed-Loop System for Smarter Code Pipelines
We just shipped a major update to our AI-powered code pipelines, introducing user-controlled LLM providers, automated PRs, and, critically, a closed-loop learning system that allows our tools to get smarter with every run.
Building intelligent developer tools is a journey of continuous refinement. Our AI-powered AutoFix and Refactor pipelines are designed to streamline development, but like any evolving system, they need to learn and adapt. This past week, we pushed a significant update that not only enhances user control and workflow automation but also introduces a powerful closed-loop learning mechanism, allowing our pipelines to become progressively smarter with each execution.
Let's dive into what we shipped.
Taking the Reins: LLM Provider & Model Selection
One of the most requested features was the ability to choose the underlying Large Language Model (LLM) provider and specific model for our pipelines. Different models excel at different tasks, offer varying performance, and come with diverse cost implications. Giving developers this control was paramount.
We've integrated a new UI element – a button group for LLM_PROVIDERS and a model input field – directly into the dialogs for both AutoFix and Refactor pipelines. This means you can now specify, for instance, whether you want to use OpenAI's GPT-4 Turbo or Anthropic's Claude 3 Opus for a given task. To keep things transparent, we've also added a provider badge to the detail pages and list cards, so you can easily see which model generated a particular fix or refactoring suggestion.
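To make the idea concrete, here is a minimal sketch of how a provider/model selection might be resolved before a run. The names (`LLMProvider`, `resolveLLMConfig`, the default model strings) are illustrative assumptions, not our actual implementation:

```typescript
// Hypothetical sketch: the options a provider button group might expose,
// plus a resolver that falls back to sensible defaults.
type LLMProvider = "openai" | "anthropic";

interface PipelineLLMConfig {
  provider?: LLMProvider; // chosen via the LLM_PROVIDERS button group
  model?: string;         // free-text model input field
}

// Assumed per-provider defaults; real defaults may differ.
const DEFAULT_MODELS: Record<LLMProvider, string> = {
  openai: "gpt-4-turbo",
  anthropic: "claude-3-opus",
};

function resolveLLMConfig(config: PipelineLLMConfig): Required<PipelineLLMConfig> {
  const provider = config.provider ?? "openai";
  // Fall back to the provider's default model when none is specified.
  const model = config.model ?? DEFAULT_MODELS[provider];
  return { provider, model };
}
```

Resolving once up front also gives the detail pages and list cards a single source of truth for the provider badge.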
This seemingly small change significantly enhances flexibility and future-proofs our pipelines as the LLM landscape continues to evolve.
Streamlining Workflows: Automated PR Creation for Refactors
Our Refactor pipeline is designed to identify and suggest improvements to code. While it's great at finding opportunities, the manual step of creating a Pull Request (PR) for each accepted refactor could be a bottleneck. No more!
We've introduced a new Phase 4: PR Creation into the Refactor pipeline. Now, with a simple checkbox in the UI, you can enable `autoCreatePR`. When a refactoring run completes and generates a single-file patch (single-file patches tend to be straightforward and low-risk), our system will automatically create a PR in your target repository. For multi-file changes, we've opted to skip auto-PR for now, allowing for manual review given their potentially broader impact.
This feature significantly reduces friction, allowing developers to integrate accepted refactors into their codebase with minimal overhead. We've updated our tRPC router, RefactorItem schema (adding prUrl and prNumber), and the progress bar to reflect this exciting new phase.
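The Phase 4 gate described above can be sketched as a small predicate. The names (`shouldAutoCreatePR`, `changedFiles`) are hypothetical, chosen for illustration:

```typescript
// Hypothetical sketch of the Phase 4 gate: auto-create a PR only when the
// run produced a single-file patch and the user opted in via autoCreatePR.
interface RefactorRunResult {
  autoCreatePR: boolean;   // the UI checkbox
  changedFiles: string[];  // paths touched by the accepted patch
}

function shouldAutoCreatePR(run: RefactorRunResult): boolean {
  // Multi-file changes are skipped for now and left to manual review.
  return run.autoCreatePR && run.changedFiles.length === 1;
}
```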
The Game Changer: Building a Closed-Loop Learning System
This is where things get really exciting. LLMs are powerful, but by default, they're stateless. Each interaction is a fresh start. Yet, our pipelines generate a treasure trove of data: identified issues, suggested fixes, refactoring opportunities, and proposed improvements. How could we leverage this wealth of information to make our pipelines smarter over time?
The answer: a closed-loop learning system.
We've implemented a comprehensive system that allows our pipelines to learn from their own historical runs and inject those learnings back into future LLM prompts. Here's how it works:
- Insight Extraction: After every `AutoFix` or `Refactor` pipeline run completes, a new `pipeline-insight-extractor.ts` module springs into action. It meticulously extracts all identified issues, generated fixes, refactoring opportunities, and improvements, transforming them into structured `WorkflowInsight` records. These insights are then stored, complete with vector embeddings for efficient retrieval.
- Historical Learnings: When a new pipeline run is initiated, a new `pipeline-learnings.ts` module performs a hybrid search against our stored `WorkflowInsight` records, looking for past insights relevant to the current context (e.g., the repository, file, or type of issue).
- Prompt Injection: These relevant "Historical Learnings" are then formatted into markdown and injected directly into the LLM prompts used by our core modules: `issue-detector.ts`, `fix-generator.ts`, `opportunity-detector.ts`, and `improvement-generator.ts`.
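The prompt-injection step above can be sketched roughly as follows. The shapes and function names (`WorkflowInsight.kind`, `formatLearnings`, `buildPrompt`) are simplified assumptions, not our exact types:

```typescript
// Hypothetical sketch: render the insights retrieved for the current
// context as a markdown section and prepend it to the module's prompt.
interface WorkflowInsight {
  kind: "issue" | "fix" | "opportunity" | "improvement";
  summary: string;
}

function formatLearnings(insights: WorkflowInsight[]): string {
  if (insights.length === 0) return "";
  const lines = insights.map((i) => `- [${i.kind}] ${i.summary}`);
  return `## Historical Learnings\n${lines.join("\n")}\n\n`;
}

function buildPrompt(basePrompt: string, insights: WorkflowInsight[]): string {
  // Inject learnings ahead of the task so the model sees prior context first.
  return formatLearnings(insights) + basePrompt;
}
```

When no relevant insights exist (e.g., a first run on a fresh repository), the prompt is passed through unchanged, so the learning layer degrades gracefully.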
This means that if our AutoFix pipeline previously encountered and successfully fixed a particular pattern of bug in a similar codebase, future runs will have that context. The LLM won't be starting from scratch; it will be informed by the collective experience of past successful (and perhaps even unsuccessful) interventions. This is a massive leap towards self-improving, context-aware AI tools.
This system involved significant changes across 11 files and over 550 lines of code, touching every critical component of our pipeline architecture. It’s a foundational piece for truly intelligent automation.
Navigating the Hurdles: Lessons Learned
No significant development effort is without its challenges. Here are a few key lessons we learned along the way:
- Database Schema Flexibility for Insights: Our `WorkflowInsight` table was initially designed with a non-nullable `workflowId` (a foreign key). However, the insights generated directly by the pipeline itself don't belong to a parent workflow in the same way user-initiated workflows do. This led to type errors. The clean solution was to make `workflowId` optional (`String?`) in our Prisma schema, updating the relation and associated TypeScript types. This highlighted the importance of anticipating diverse data sources when designing schemas, especially for learning systems that aggregate data from various origins. Our search queries, which use raw SQL, don't filter by `workflowId` for pipeline-sourced insights anyway, making this a pragmatic and effective change.
- UI State Desynchronization: A subtle bug caused our pipeline detail pages to always start at the "scan" phase on mount, even if the actual run was much further along. This was due to `useState<RefactorPhase>("scan")` initializing the client-side state incorrectly. The fix involved adding a `useEffect` hook to synchronize `currentPhase` from the server-side `run.status` via a `statusToPhase` map, ensuring the UI accurately reflects the pipeline's real-time progress.
- Prisma `Json?` Field Access: Working with `Json?` fields (like our `config` field on runs) in Prisma and TypeScript often requires explicit type assertions on the client side. We frequently used `as Record<string, string>` or `as unknown as { config?: Record<string, string> }` to correctly access properties, reminding us of TypeScript's strictness and the need for careful type handling when dealing with dynamically typed JSON data from the database.
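The schema change from the first lesson amounts to a one-character edit plus an optional relation. This fragment is a hypothetical reconstruction (field names other than `workflowId` are assumptions):

```prisma
// Hypothetical fragment: workflowId becomes optional so that
// pipeline-sourced insights need no parent workflow.
model WorkflowInsight {
  id         String    @id @default(cuid())
  workflowId String?   // was: String (non-nullable)
  workflow   Workflow? @relation(fields: [workflowId], references: [id])
  // ...embedding and payload fields elided
}
```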
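The phase-sync fix from the second lesson hinges on a pure mapping from server status to client phase. A minimal sketch, assuming hypothetical status and phase names (our real `run.status` values may differ):

```typescript
// Hypothetical sketch of the status→phase sync. A useEffect keyed on
// run.status would call setCurrentPhase(phaseForStatus(run.status)) so the
// detail page no longer resets to "scan" on mount.
type RefactorPhase = "scan" | "analyze" | "apply" | "pr";

const statusToPhase: Record<string, RefactorPhase> = {
  scanning: "scan",
  analyzing: "analyze",
  applying: "apply",
  creating_pr: "pr",
};

function phaseForStatus(status: string): RefactorPhase {
  // Fall back to "scan" for unknown or initial statuses.
  return statusToPhase[status] ?? "scan";
}
```

Keeping the mapping pure (outside the component) also makes this logic trivially unit-testable.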
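And for the third lesson, here is roughly what the `Json?` narrowing looks like in practice. The helper name and the `provider` key are illustrative; the `JsonValue` alias approximates what Prisma returns for a `Json?` column:

```typescript
// Rough stand-in for Prisma's JSON value type on the client.
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

interface RunRow {
  config: JsonValue | null; // what a Json? column roughly looks like client-side
}

// Hypothetical helper: assert the expected shape, then guard against
// null, arrays, and non-object values before property access.
function providerFromConfig(run: RunRow): string | undefined {
  const config = run.config as Record<string, string> | null;
  if (!config || typeof config !== "object" || Array.isArray(config)) {
    return undefined;
  }
  return config.provider;
}
```

Pairing the assertion with runtime guards keeps the convenience of the cast without trusting the database blindly.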
What's Next?
With these features now live on main, our immediate focus shifts to rigorous testing and validation:
- Manual Testing: We'll be running `AutoFix` with non-default providers to verify the provider badge, and `Refactor` with `autoCreatePR` enabled to confirm PR creation in target repositories.
- Learning Loop Validation: Critically, we'll execute second runs of `AutoFix`/`Refactor` to ensure the "Loaded historical learnings" message appears in the SSE stream and that new insights are correctly stored and retrievable in our MemoryPicker.
- Future Enhancements: We're already considering adding a dedicated "Learnings" tab to pipeline detail pages to visualize the extracted insights, and exploring deduplication logic to prevent redundant insights from repeated runs on the same codebase.
This sprint has been a monumental step forward, transforming our pipelines from reactive tools into proactive, self-improving agents. We're excited to see how this closed-loop learning system empowers developers and continues to push the boundaries of AI-assisted coding.