Building a Refactor Pipeline: From Code Analysis to Automated Improvements
How we built an automated refactoring pipeline that scans repositories, detects improvement opportunities, and generates context-aware fixes using LLMs and real-time streaming.
Late-night coding sessions often produce the most interesting breakthroughs. Last night was one of those sessions where everything clicked into place: we implemented a complete refactoring pipeline that can automatically scan repositories, detect improvement opportunities, and generate tailored fixes.
The Vision: Smart Code Improvement at Scale
Every developer knows the feeling: you're deep in a codebase and spot duplicate code, dead functions, or those dreaded TODO comments from months ago. But who has time to systematically hunt down and fix these issues across an entire project?
That's exactly the problem we set out to solve with our new Refactor Pipeline—an automated system that:
- Scans repositories for improvement opportunities
- Detects six categories of refactoring needs
- Generates appropriate fixes based on complexity
- Streams progress in real time to keep developers engaged
The Architecture: Three-Phase Pipeline
Phase 1: Repository Scanning
The pipeline starts by systematically scanning repository files, building a comprehensive map of the codebase structure and identifying potential problem areas.
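As a rough sketch (the ignore list and helper name here are illustrative, not our exact implementation), the scanning phase amounts to a recursive walk that skips non-source directories and collects candidate files:

```typescript
import * as fs from 'fs';
import * as path from 'path';

// Directories that never contain refactorable source (illustrative list)
const IGNORED = new Set(['node_modules', '.git', 'dist']);

// Walk the repository tree and collect file paths for later analysis
function scanRepository(root: string): string[] {
  const files: string[] = [];
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    const full = path.join(root, entry.name);
    if (entry.isDirectory()) {
      if (!IGNORED.has(entry.name)) files.push(...scanRepository(full));
    } else {
      files.push(full);
    }
  }
  return files;
}
```

The real pipeline also records per-file metadata for the detection phase, but the traversal itself is this simple.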
Phase 2: Opportunity Detection
This is where the magic happens. We leverage LLMs to batch-analyze code and detect six distinct categories of refactoring opportunities:
```typescript
enum RefactorCategory {
  DUPLICATE_CODE = 'duplicate-code',
  DEAD_CODE = 'dead-code',
  TODO_REMNANT = 'todo-remnant',
  UNNECESSARY_CODE = 'unnecessary-code',
  SLOW_CODE = 'slow-code',
  SLOW_QUERY = 'slow-query'
}
```
The system automatically deduplicates findings and sorts them by impact and difficulty, ensuring developers see the most valuable improvements first.
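A minimal sketch of that dedup-and-rank step (the record fields are hypothetical, chosen to illustrate the idea rather than mirror our schema):

```typescript
type Difficulty = 'easy' | 'medium' | 'hard';

// Illustrative shape of a detected finding
interface Opportunity {
  file: string;
  line: number;
  category: string;
  impact: number; // higher = more valuable
  difficulty: Difficulty;
}

const difficultyRank: Record<Difficulty, number> = { easy: 0, medium: 1, hard: 2 };

// Deduplicate by (file, line, category), then sort by impact desc, difficulty asc
function rankOpportunities(found: Opportunity[]): Opportunity[] {
  const seen = new Map<string, Opportunity>();
  for (const o of found) {
    seen.set(`${o.file}:${o.line}:${o.category}`, o);
  }
  return [...seen.values()].sort(
    (a, b) =>
      b.impact - a.impact ||
      difficultyRank[a.difficulty] - difficultyRank[b.difficulty]
  );
}
```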
Phase 3: Improvement Generation
Here's where we get smart about output format. Not all refactoring opportunities are created equal:
- Easy fixes → Generate unified diff patches ready to apply
- Medium complexity → Provide step-by-step implementation guides
- Hard problems → Offer architectural recommendations and suggestions
```typescript
// Example of difficulty-based output generation
const generateImprovement = (difficulty: RefactorDifficulty) => {
  switch (difficulty) {
    case 'easy':
      return generatePatch(); // Ready-to-apply unified diff
    case 'medium':
      return generatePrompt(); // Step-by-step guide
    case 'hard':
      return generateSuggestion(); // Architectural advice
  }
};
```
Real-Time Streaming: Keeping Users Engaged
One of the biggest challenges with long-running processes is user engagement. Nobody wants to stare at a spinning loader for minutes. We implemented Server-Sent Events (SSE) to stream progress updates in real time:
```typescript
// SSE endpoint streaming refactor progress
export async function GET(request: Request) {
  const stream = new ReadableStream({
    start(controller) {
      // Stream progress updates as they happen
      pipeline.on('progress', (data) => {
        safeEnqueue(controller, data);
      });
    }
  });

  return new Response(stream, {
    headers: { 'Content-Type': 'text/event-stream' }
  });
}
```
Users can watch as the system moves through each phase, see opportunities being discovered, and even preview improvements as they're generated.
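On the client, the browser's EventSource API handles this framing automatically. Conceptually, each event is a `data: <json>` payload terminated by a blank line; a tiny parser (illustrative only, not part of our codebase) makes the framing explicit:

```typescript
// Parse a raw SSE chunk into the JSON payloads it carries.
// Each frame is "data: <json>" followed by a blank line.
function parseSSEChunk(chunk: string): any[] {
  return chunk
    .split('\n\n')
    .filter((frame) => frame.startsWith('data: '))
    .map((frame) => JSON.parse(frame.slice('data: '.length)));
}
```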
The User Experience: From Discovery to Implementation
The frontend brings it all together with an intuitive workflow:
- Start a scan from either the main refactor page or project detail view
- Watch progress with real-time phase updates and statistics
- Review opportunities with expandable cards showing context and fixes
- Filter results by category (dead code, duplicates, etc.) or difficulty
- Apply improvements directly or use them as implementation guides
Each opportunity is presented in a clean, expandable card format with syntax-highlighted code diffs and clear explanations of the proposed changes.
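The category and difficulty filters boil down to a simple predicate over the opportunity records (field names here are assumptions for illustration):

```typescript
// Optional filter criteria from the UI controls
interface OpportunityFilter {
  category?: string;
  difficulty?: string;
}

// Keep only opportunities matching every criterion that is set
function applyFilter<T extends { category: string; difficulty: string }>(
  items: T[],
  filter: OpportunityFilter
): T[] {
  return items.filter(
    (o) =>
      (!filter.category || o.category === filter.category) &&
      (!filter.difficulty || o.difficulty === filter.difficulty)
  );
}
```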
Lessons Learned: The Challenges We Overcame
The SSE Controller Crisis
Mid-development, we hit a nasty issue: "Invalid state: Controller is already closed". This happened when users navigated away from pages during long-running operations, causing the SSE stream to crash.
The fix? Defensive programming with wrapper functions:
```typescript
const safeEnqueue = (controller: ReadableStreamDefaultController, data: any) => {
  try {
    controller.enqueue(`data: ${JSON.stringify(data)}\n\n`);
  } catch (error) {
    // Controller already closed - fail gracefully
    console.warn('SSE controller closed:', (error as Error).message);
  }
};
```
This pattern is now our standard for all real-time streaming endpoints.
Integration Complexity
Building a feature that touches repository scanning, LLM processing, database operations, and real-time UI updates meant juggling many moving parts. The key was maintaining clear separation of concerns:
- Services handle the core logic
- Routers manage API contracts
- Components focus on presentation
- Pipelines orchestrate the flow
The Technical Stack
The implementation spans multiple layers:
- Backend: tRPC procedures with Prisma ORM for data persistence
- Processing: AsyncGenerator pattern for pipeline orchestration
- Streaming: SSE with safe error handling
- Frontend: React components with real-time updates
- Database: PostgreSQL with proper relational modeling
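To make the AsyncGenerator pattern concrete, here is a heavily simplified sketch; the real pipeline batches LLM calls and persists results through Prisma, both elided here:

```typescript
// Event shapes are illustrative; each phase yields progress that the
// SSE endpoint can forward to the client as soon as it is produced.
type PipelineEvent =
  | { phase: 'scan'; filesScanned: number }
  | { phase: 'detect'; opportunities: number }
  | { phase: 'generate'; improvements: number };

async function* runPipeline(files: string[]): AsyncGenerator<PipelineEvent> {
  yield { phase: 'scan', filesScanned: files.length };

  // Placeholder detection step; the real system batch-analyzes with an LLM
  const candidates = files.filter((f) => f.endsWith('.ts'));
  yield { phase: 'detect', opportunities: candidates.length };

  yield { phase: 'generate', improvements: candidates.length };
}
```

Because each phase is just a `yield`, the SSE route can `for await` over the generator and enqueue events without any extra plumbing.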
What's Next?
With the core pipeline complete, we're already thinking about enhancements:
- Multi-model support for different LLM providers
- Custom refactoring rules for team-specific patterns
- Integration with CI/CD for automated improvement suggestions
- Batch processing for organization-wide code health initiatives
Key Takeaways
Building this refactoring pipeline taught us several valuable lessons:
- User feedback is crucial for long-running processes—real-time streaming makes all the difference
- Defensive programming prevents small edge cases from becoming major user experience problems
- Difficulty-aware outputs provide more value than one-size-fits-all solutions
- Clear separation of concerns makes complex features maintainable
The result is a system that doesn't just find problems—it provides actionable, context-aware solutions that developers can actually use. Sometimes the best coding sessions happen at 1 AM when everything finally clicks into place.
Want to see more deep dives into our development process? Follow along as we continue building tools that make developers' lives easier.