Building a Refactor Pipeline: From Code Analysis to Automated Improvements
How we built an automated refactoring pipeline that scans repositories, detects improvement opportunities, and generates context-aware fixes using LLMs and real-time streaming.
Late-night coding sessions often produce the most interesting breakthroughs. Last night was one of those sessions where everything clicked into place: we implemented a complete refactoring pipeline that can automatically scan repositories, detect improvement opportunities, and generate tailored fixes.
The Vision: Smart Code Improvement at Scale
Every developer knows the feeling: you're deep in a codebase and spot duplicate code, dead functions, or those dreaded TODO comments from months ago. But who has time to systematically hunt down and fix these issues across an entire project?
That's exactly the problem we set out to solve with our new Refactor Pipeline—an automated system that:
- Scans repositories for improvement opportunities
- Detects six categories of refactoring needs
- Generates appropriate fixes based on complexity
- Streams progress in real time to keep developers engaged
The Architecture: Three-Phase Pipeline
Phase 1: Repository Scanning
The pipeline starts by systematically scanning repository files, building a comprehensive map of the codebase structure and identifying potential problem areas.
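As a rough sketch (the ignore list and helper name here are illustrative, not our exact implementation), the scanning phase amounts to a recursive walk that skips non-source directories and collects candidate files:

```typescript
import * as fs from 'fs';
import * as path from 'path';

// Directories that never contain refactorable source (illustrative list)
const IGNORED = new Set(['node_modules', '.git', 'dist']);

// Walk the repository tree and collect file paths for later analysis
function scanRepository(root: string): string[] {
  const files: string[] = [];
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    const full = path.join(root, entry.name);
    if (entry.isDirectory()) {
      if (!IGNORED.has(entry.name)) files.push(...scanRepository(full));
    } else {
      files.push(full);
    }
  }
  return files;
}
```

The real pipeline also records per-file metadata for the detection phase, but the traversal itself is this simple.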
Phase 2: Opportunity Detection
This is where the magic happens. We leverage LLMs to batch-analyze code and detect six distinct categories of refactoring opportunities:
```typescript
enum RefactorCategory {
  DUPLICATE_CODE = 'duplicate-code',
  DEAD_CODE = 'dead-code',
  TODO_REMNANT = 'todo-remnant',
  UNNECESSARY_CODE = 'unnecessary-code',
  SLOW_CODE = 'slow-code',
  SLOW_QUERY = 'slow-query'
}
```
The system automatically deduplicates findings and sorts them by impact and difficulty, ensuring developers see the most valuable improvements first.
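A minimal sketch of that dedup-and-rank step (the record fields are hypothetical, chosen to illustrate the idea rather than mirror our schema):

```typescript
type Difficulty = 'easy' | 'medium' | 'hard';

// Illustrative shape of a detected finding
interface Opportunity {
  file: string;
  line: number;
  category: string;
  impact: number; // higher = more valuable
  difficulty: Difficulty;
}

const difficultyRank: Record<Difficulty, number> = { easy: 0, medium: 1, hard: 2 };

// Deduplicate by (file, line, category), then sort by impact desc, difficulty asc
function rankOpportunities(found: Opportunity[]): Opportunity[] {
  const seen = new Map<string, Opportunity>();
  for (const o of found) {
    seen.set(`${o.file}:${o.line}:${o.category}`, o);
  }
  return [...seen.values()].sort(
    (a, b) =>
      b.impact - a.impact ||
      difficultyRank[a.difficulty] - difficultyRank[b.difficulty]
  );
}
```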
Phase 3: Improvement Generation
Here's where we get smart about output format. Not all refactoring opportunities are created equal:
- Easy fixes → Generate unified diff patches ready to apply
- Medium complexity → Provide step-by-step implementation guides
- Hard problems → Offer architectural recommendations and suggestions
```typescript
// Example of difficulty-based output generation
const generateImprovement = (difficulty: RefactorDifficulty) => {
  switch (difficulty) {
    case 'easy':
      return generatePatch(); // Ready-to-apply unified diff
    case 'medium':
      return generatePrompt(); // Step-by-step guide
    case 'hard':
      return generateSuggestion(); // Architectural advice
  }
};
```
Real-Time Streaming: Keeping Users Engaged
One of the biggest challenges with long-running processes is user engagement. Nobody wants to stare at a spinning loader for minutes. We implemented Server-Sent Events (SSE) to stream progress updates in real time:
```typescript
// SSE endpoint streaming refactor progress
export async function GET(request: Request) {
  const stream = new ReadableStream({
    start(controller) {
      // Stream progress updates as they happen
      pipeline.on('progress', (data) => {
        safeEnqueue(controller, data);
      });
    }
  });

  return new Response(stream, {
    headers: { 'Content-Type': 'text/event-stream' }
  });
}
```
Users can watch as the system moves through each phase, see opportunities being discovered, and even preview improvements as they're generated.
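On the client, the browser's EventSource API handles this framing automatically. Conceptually, each event is a `data: <json>` payload terminated by a blank line; a tiny parser (illustrative only, not part of our codebase) makes the framing explicit:

```typescript
// Parse a raw SSE chunk into the JSON payloads it carries.
// Each frame is "data: <json>" followed by a blank line.
function parseSSEChunk(chunk: string): any[] {
  return chunk
    .split('\n\n')
    .filter((frame) => frame.startsWith('data: '))
    .map((frame) => JSON.parse(frame.slice('data: '.length)));
}
```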
The User Experience: From Discovery to Implementation
The frontend brings it all together with an intuitive workflow:
- Start a scan from either the main refactor page or project detail view
- Watch progress with real-time phase updates and statistics
- Review opportunities with expandable cards showing context and fixes
- Filter results by category (dead code, duplicates, etc.) or difficulty
- Apply improvements directly or use them as implementation guides
Each opportunity is presented in a clean, expandable card format with syntax-highlighted code diffs and clear explanations of the proposed changes.
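The category and difficulty filters boil down to a simple predicate over the opportunity records (field names here are assumptions for illustration):

```typescript
// Optional filter criteria from the UI controls
interface OpportunityFilter {
  category?: string;
  difficulty?: string;
}

// Keep only opportunities matching every criterion that is set
function applyFilter<T extends { category: string; difficulty: string }>(
  items: T[],
  filter: OpportunityFilter
): T[] {
  return items.filter(
    (o) =>
      (!filter.category || o.category === filter.category) &&
      (!filter.difficulty || o.difficulty === filter.difficulty)
  );
}
```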
Lessons Learned: The Challenges We Overcame
The SSE Controller Crisis
Mid-development, we hit a nasty issue: "Invalid state: Controller is already closed". This happened when users navigated away from pages during long-running operations, causing the SSE stream to crash.
The fix? Defensive programming with wrapper functions:
```typescript
const safeEnqueue = (controller: ReadableStreamDefaultController, data: any) => {
  try {
    controller.enqueue(`data: ${JSON.stringify(data)}\n\n`);
  } catch (error) {
    // Controller already closed - fail gracefully
    console.warn('SSE controller closed:', (error as Error).message);
  }
};
```
This pattern is now our standard for all real-time streaming endpoints.
Integration Complexity
Building a feature that touches repository scanning, LLM processing, database operations, and real-time UI updates meant juggling many moving parts. The key was maintaining clear separation of concerns:
- Services handle the core logic
- Routers manage API contracts
- Components focus on presentation
- Pipelines orchestrate the flow
The Technical Stack
The implementation spans multiple layers:
- Backend: tRPC procedures with Prisma ORM for data persistence
- Processing: AsyncGenerator pattern for pipeline orchestration
- Streaming: SSE with safe error handling
- Frontend: React components with real-time updates
- Database: PostgreSQL with proper relational modeling
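To make the AsyncGenerator pattern concrete, here is a heavily simplified sketch; the real pipeline batches LLM calls and persists results through Prisma, both elided here:

```typescript
// Event shapes are illustrative; each phase yields progress that the
// SSE endpoint can forward to the client as soon as it is produced.
type PipelineEvent =
  | { phase: 'scan'; filesScanned: number }
  | { phase: 'detect'; opportunities: number }
  | { phase: 'generate'; improvements: number };

async function* runPipeline(files: string[]): AsyncGenerator<PipelineEvent> {
  yield { phase: 'scan', filesScanned: files.length };

  // Placeholder detection step; the real system batch-analyzes with an LLM
  const candidates = files.filter((f) => f.endsWith('.ts'));
  yield { phase: 'detect', opportunities: candidates.length };

  yield { phase: 'generate', improvements: candidates.length };
}
```

Because each phase is just a `yield`, the SSE route can `for await` over the generator and enqueue events without any extra plumbing.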
What's Next?
With the core pipeline complete, we're already thinking about enhancements:
- Multi-model support for different LLM providers
- Custom refactoring rules for team-specific patterns
- Integration with CI/CD for automated improvement suggestions
- Batch processing for organization-wide code health initiatives
Key Takeaways
Building this refactoring pipeline taught us several valuable lessons:
- User feedback is crucial for long-running processes—real-time streaming makes all the difference
- Defensive programming prevents small edge cases from becoming major user experience problems
- Difficulty-aware outputs provide more value than one-size-fits-all solutions
- Clear separation of concerns makes complex features maintainable
The result is a system that doesn't just find problems—it provides actionable, context-aware solutions that developers can actually use. Sometimes the best coding sessions happen at 1 AM when everything finally clicks into place.
Want to see more deep dives into our development process? Follow along as we continue building tools that make developers' lives easier.