Teaching Our AI to Learn: Building a Closed-Loop System for Smarter Code Pipelines
We just shipped a major update to our AI-powered code pipelines, introducing user-controlled LLM providers, automated PRs, and, critically, a closed-loop learning system that allows our tools to get smarter with every run.
Building intelligent developer tools is a journey of continuous refinement. Our AI-powered AutoFix and Refactor pipelines are designed to streamline development, but like any evolving system, they need to learn and adapt. This past week, we pushed a significant update that not only enhances user control and workflow automation but also introduces a powerful closed-loop learning mechanism, allowing our pipelines to become progressively smarter with each execution.
Let's dive into what we shipped.
Taking the Reins: LLM Provider & Model Selection
One of the most requested features was the ability to choose the underlying Large Language Model (LLM) provider and specific model for our pipelines. Different models excel at different tasks, offer varying performance, and come with diverse cost implications. Giving developers this control was paramount.
We've integrated a new UI element – a button group for LLM_PROVIDERS and a model input field – directly into the dialogs for both AutoFix and Refactor pipelines. This means you can now specify, for instance, whether you want to use OpenAI's GPT-4 Turbo or Anthropic's Claude 3 Opus for a given task. To keep things transparent, we've also added a provider badge to the detail pages and list cards, so you can easily see which model generated a particular fix or refactoring suggestion.
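To make the idea concrete, here is a minimal sketch of how a provider/model selection might be resolved before a run. The names (`LLMProvider`, `resolveLLMConfig`, the default model strings) are illustrative assumptions, not our actual implementation:

```typescript
// Hypothetical sketch: the options a provider button group might expose,
// plus a resolver that falls back to sensible defaults.
type LLMProvider = "openai" | "anthropic";

interface PipelineLLMConfig {
  provider?: LLMProvider; // chosen via the LLM_PROVIDERS button group
  model?: string;         // free-text model input field
}

// Assumed per-provider defaults; real defaults may differ.
const DEFAULT_MODELS: Record<LLMProvider, string> = {
  openai: "gpt-4-turbo",
  anthropic: "claude-3-opus",
};

function resolveLLMConfig(config: PipelineLLMConfig): Required<PipelineLLMConfig> {
  const provider = config.provider ?? "openai";
  // Fall back to the provider's default model when none is specified.
  const model = config.model ?? DEFAULT_MODELS[provider];
  return { provider, model };
}
```

Resolving once up front also gives the detail pages and list cards a single source of truth for the provider badge.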
This seemingly small change significantly enhances flexibility and future-proofs our pipelines as the LLM landscape continues to evolve.
Streamlining Workflows: Automated PR Creation for Refactors
Our Refactor pipeline is designed to identify and suggest improvements to code. While it's great at finding opportunities, the manual step of creating a Pull Request (PR) for each accepted refactor could be a bottleneck. No more!
We've introduced a new Phase 4: PR Creation into the Refactor pipeline. Now, with a simple checkbox in the UI, you can enable `autoCreatePR`. When a refactoring run completes and generates a single-file patch (single-file patches tend to be straightforward and low-risk), our system will automatically create a PR in your target repository. For multi-file changes, we've opted to skip auto-PR for now, allowing for manual review given their potentially broader impact.
This feature significantly reduces friction, allowing developers to integrate accepted refactors into their codebase with minimal overhead. We've updated our tRPC router, RefactorItem schema (adding prUrl and prNumber), and the progress bar to reflect this exciting new phase.
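The Phase 4 gate described above can be sketched as a small predicate. The names (`shouldAutoCreatePR`, `changedFiles`) are hypothetical, chosen for illustration:

```typescript
// Hypothetical sketch of the Phase 4 gate: auto-create a PR only when the
// run produced a single-file patch and the user opted in via autoCreatePR.
interface RefactorRunResult {
  autoCreatePR: boolean;   // the UI checkbox
  changedFiles: string[];  // paths touched by the accepted patch
}

function shouldAutoCreatePR(run: RefactorRunResult): boolean {
  // Multi-file changes are skipped for now and left to manual review.
  return run.autoCreatePR && run.changedFiles.length === 1;
}
```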
The Game Changer: Building a Closed-Loop Learning System
This is where things get really exciting. LLMs are powerful, but by default, they're stateless. Each interaction is a fresh start. Yet, our pipelines generate a treasure trove of data: identified issues, suggested fixes, refactoring opportunities, and proposed improvements. How could we leverage this wealth of information to make our pipelines smarter over time?
The answer: a closed-loop learning system.
We've implemented a comprehensive system that allows our pipelines to learn from their own historical runs and inject those learnings back into future LLM prompts. Here's how it works:
- Insight Extraction: After every `AutoFix` or `Refactor` pipeline run completes, a new `pipeline-insight-extractor.ts` module springs into action. It meticulously extracts all identified issues, generated fixes, refactoring opportunities, and improvements, transforming them into structured `WorkflowInsight` records. These insights are then stored, complete with vector embeddings for efficient retrieval.
- Historical Learnings: When a new pipeline run is initiated, a new `pipeline-learnings.ts` module performs a hybrid search against our stored `WorkflowInsight` records, looking for past insights relevant to the current context (e.g., the repository, file, or type of issue).
- Prompt Injection: These relevant "Historical Learnings" are then formatted into markdown and injected directly into the LLM prompts used by our core modules: `issue-detector.ts`, `fix-generator.ts`, `opportunity-detector.ts`, and `improvement-generator.ts`.
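The prompt-injection step above can be sketched roughly as follows. The shapes and function names (`WorkflowInsight.kind`, `formatLearnings`, `buildPrompt`) are simplified assumptions, not our exact types:

```typescript
// Hypothetical sketch: render the insights retrieved for the current
// context as a markdown section and prepend it to the module's prompt.
interface WorkflowInsight {
  kind: "issue" | "fix" | "opportunity" | "improvement";
  summary: string;
}

function formatLearnings(insights: WorkflowInsight[]): string {
  if (insights.length === 0) return "";
  const lines = insights.map((i) => `- [${i.kind}] ${i.summary}`);
  return `## Historical Learnings\n${lines.join("\n")}\n\n`;
}

function buildPrompt(basePrompt: string, insights: WorkflowInsight[]): string {
  // Inject learnings ahead of the task so the model sees prior context first.
  return formatLearnings(insights) + basePrompt;
}
```

When no relevant insights exist (e.g., a first run on a fresh repository), the prompt is passed through unchanged, so the learning layer degrades gracefully.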
This means that if our AutoFix pipeline previously encountered and successfully fixed a particular pattern of bug in a similar codebase, future runs will have that context. The LLM won't be starting from scratch; it will be informed by the collective experience of past successful (and perhaps even unsuccessful) interventions. This is a massive leap towards self-improving, context-aware AI tools.
This system involved significant changes across 11 files and over 550 lines of code, touching every critical component of our pipeline architecture. It’s a foundational piece for truly intelligent automation.
Navigating the Hurdles: Lessons Learned
No significant development effort is without its challenges. Here are a few key lessons we learned along the way:
- Database Schema Flexibility for Insights: Our `WorkflowInsight` table was initially designed with a non-nullable `workflowId` (a foreign key). However, the insights generated directly by the pipeline itself don't belong to a parent workflow in the same way user-initiated workflows do. This led to type errors. The clean solution was to make `workflowId` optional (`String?`) in our Prisma schema, updating the relation and associated TypeScript types. This highlighted the importance of anticipating diverse data sources when designing schemas, especially for learning systems that aggregate data from various origins. Our search queries, which use raw SQL, don't filter by `workflowId` for pipeline-sourced insights anyway, making this a pragmatic and effective change.
- UI State Desynchronization: A subtle bug caused our pipeline detail pages to always start at the "scan" phase on mount, even if the actual run was much further along. This was due to `useState<RefactorPhase>("scan")` initializing the client-side state incorrectly. The fix involved adding a `useEffect` hook to synchronize `currentPhase` from the server-side `run.status` via a `statusToPhase` map, ensuring the UI accurately reflects the pipeline's real-time progress.
- Prisma `Json?` Field Access: Working with `Json?` fields (like our `config` field on runs) in Prisma and TypeScript often requires explicit type assertions on the client side. We frequently used `as Record<string, string>` or `as unknown as { config?: Record<string, string> }` to correctly access properties, reminding us of TypeScript's strictness and the need for careful type handling when dealing with dynamically typed JSON data from the database.
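The schema change from the first lesson amounts to a one-character edit plus an optional relation. This fragment is a hypothetical reconstruction (field names other than `workflowId` are assumptions):

```prisma
// Hypothetical fragment: workflowId becomes optional so that
// pipeline-sourced insights need no parent workflow.
model WorkflowInsight {
  id         String    @id @default(cuid())
  workflowId String?   // was: String (non-nullable)
  workflow   Workflow? @relation(fields: [workflowId], references: [id])
  // ...embedding and payload fields elided
}
```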
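The phase-sync fix from the second lesson hinges on a pure mapping from server status to client phase. A minimal sketch, assuming hypothetical status and phase names (our real `run.status` values may differ):

```typescript
// Hypothetical sketch of the status→phase sync. A useEffect keyed on
// run.status would call setCurrentPhase(phaseForStatus(run.status)) so the
// detail page no longer resets to "scan" on mount.
type RefactorPhase = "scan" | "analyze" | "apply" | "pr";

const statusToPhase: Record<string, RefactorPhase> = {
  scanning: "scan",
  analyzing: "analyze",
  applying: "apply",
  creating_pr: "pr",
};

function phaseForStatus(status: string): RefactorPhase {
  // Fall back to "scan" for unknown or initial statuses.
  return statusToPhase[status] ?? "scan";
}
```

Keeping the mapping pure (outside the component) also makes this logic trivially unit-testable.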
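And for the third lesson, here is roughly what the `Json?` narrowing looks like in practice. The helper name and the `provider` key are illustrative; the `JsonValue` alias approximates what Prisma returns for a `Json?` column:

```typescript
// Rough stand-in for Prisma's JSON value type on the client.
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

interface RunRow {
  config: JsonValue | null; // what a Json? column roughly looks like client-side
}

// Hypothetical helper: assert the expected shape, then guard against
// null, arrays, and non-object values before property access.
function providerFromConfig(run: RunRow): string | undefined {
  const config = run.config as Record<string, string> | null;
  if (!config || typeof config !== "object" || Array.isArray(config)) {
    return undefined;
  }
  return config.provider;
}
```

Pairing the assertion with runtime guards keeps the convenience of the cast without trusting the database blindly.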
What's Next?
With these features now live on main, our immediate focus shifts to rigorous testing and validation:
- Manual Testing: We'll be running `AutoFix` with non-default providers to verify the provider badge, and `Refactor` with `autoCreatePR` enabled to confirm PR creation in target repositories.
- Learning Loop Validation: Critically, we'll execute second runs of `AutoFix`/`Refactor` to ensure the "Loaded historical learnings" message appears in the SSE stream and that new insights are correctly stored and retrievable in our MemoryPicker.
- Future Enhancements: We're already considering adding a dedicated "Learnings" tab to pipeline detail pages to visualize the extracted insights, and exploring deduplication logic to prevent redundant insights from repeated runs on the same codebase.
This sprint has been a monumental step forward, transforming our pipelines from reactive tools into proactive, self-improving agents. We're excited to see how this closed-loop learning system empowers developers and continues to push the boundaries of AI-assisted coding.