The Case of the Phantom Pipeline: Debugging Server-Sent Events Gone Wild
How a seemingly innocent navigation action was secretly restarting entire data processing pipelines, and the elegant solution that fixed it.
Picture this: You're running a long data processing pipeline—maybe it's analyzing code for automated fixes or performing complex refactoring operations. You navigate away from the progress page to grab some coffee, and when you come back, the pipeline has mysteriously restarted from the beginning. Sound familiar?
This exact scenario had me scratching my head for hours until I uncovered a sneaky bug in our Server-Sent Events (SSE) implementation. Here's the story of how a simple navigation action was secretly spawning phantom pipelines.
The Mystery Unfolds
Our application has two main pipeline operations: AutoFix and Refactor. Both use SSE to stream real-time progress updates to the frontend. Users can monitor phases like "scanning," "analyzing," and "improving" as they happen.
The bug manifested in a frustrating way:
- Start a pipeline (it begins processing normally)
- Navigate away from the detail page
- Navigate back to check progress
- The pipeline restarts from Phase 1 😱
What made this particularly insidious was that it looked like normal behavior on the surface. The UI would reconnect and show progress updates—users might not even realize their pipeline had been reset.
Following the Digital Breadcrumbs
The investigation led me through our SSE architecture. Here's what was happening under the hood:
// The problematic SSE route (simplified — the ReadableStream setup and
// Response plumbing are omitted; safeEnqueue writes an event to the stream)
export async function GET(request: Request, { params }: { params: { id: string } }) {
  const runId = params.id;

  // 🚨 This was the problem - unconditional pipeline start:
  // every new SSE connection kicked off a fresh pipeline for the same run
  const generator = runAutoFix(runId);
  for await (const event of generator) {
    // Stream events to client
    await safeEnqueue(event);
  }
}
Every time the client connected to the SSE endpoint, it would call runAutoFix() or runRefactor() unconditionally. This meant:
- First visit: Pipeline starts normally ✅
- Navigate away: Client disconnects, but pipeline continues running in background
- Navigate back: New SSE connection triggers a second pipeline instance 💥
The original pipeline would complete in the background, updating the database. But the new pipeline would overwrite the status back to "scanning" and start fresh.
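The overwrite is easy to reproduce in miniature. Here's a hypothetical sketch (the in-memory Map and `startPipelineUnguarded` are illustrative stand-ins, not our real code) showing a reconnect clobbering a run that had already made progress:

```typescript
type Status = "pending" | "scanning" | "analyzing" | "improving" | "done";

// In-memory stand-in for the runs table (illustrative only)
const runs = new Map<string, Status>();

// Unguarded start: every SSE connection calls this, so a reconnect
// resets the run back to "scanning" regardless of prior progress
function startPipelineUnguarded(runId: string): void {
  runs.set(runId, "scanning");
}

runs.set("run-1", "pending");
startPipelineUnguarded("run-1"); // first connection: expected behavior
runs.set("run-1", "improving");  // background pipeline makes progress
startPipelineUnguarded("run-1"); // reconnect: phantom start
console.log(runs.get("run-1"));  // "scanning" — the progress was wiped out
```

Two independent writers, no coordination: the last one to touch the row wins, and the last one is always the freshly spawned phantom.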
The Elegant Solution
The fix turned out to be beautifully simple. Instead of always starting a new pipeline, we added a status guard:
// Fixed SSE route (same simplification as above)
export async function GET(request: Request, { params }: { params: { id: string } }) {
  const run = await getRun(params.id);

  // 🎯 The key insight: only pending runs should start pipelines
  if (run.status === "pending") {
    const generator = runAutoFix(params.id);
    for await (const event of generator) {
      await safeEnqueue(event);
    }
  } else {
    // For active/completed runs: send current status and close
    await safeEnqueue({ type: "status", data: run.status });
    return; // Close stream immediately
  }
}
But this created a new challenge: if active runs don't use SSE, how does the UI get updates? The solution was to add intelligent polling:
// Frontend: Poll for updates when SSE isn't available
const { data: run } = api.autoFix.get.useQuery(
  { id: runId },
  {
    // Only poll while the run is in an active status
    refetchInterval: (data) =>
      data && ACTIVE_STATUSES.includes(data.status) ? 3000 : false,
  }
);
Lessons Learned
This bug taught me several valuable lessons about real-time systems:
1. SSE Connections Are Ephemeral
A client disconnecting and reconnecting doesn't mean your server-side process should restart. Always check the current state before taking action on a new connection.
2. Guard Your Entry Points
Any endpoint that triggers expensive operations should validate whether that operation is actually needed. A simple status check saved us from countless phantom pipelines.
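One hardening step beyond the plain `if` check: make the guard an atomic claim, so two simultaneous connections can't both see "pending" and both start a pipeline. This is a sketch under assumed names (`tryClaimRun` and the in-memory store are hypothetical); in a real database the same idea is a conditional `UPDATE ... WHERE status = 'pending'` whose affected-row count tells you who won:

```typescript
type Status = "pending" | "running" | "done" | "failed";

// Illustrative in-memory store. In production this check-and-set must be
// atomic at the database level, e.g.:
//   UPDATE runs SET status = 'running' WHERE id = ? AND status = 'pending'
// and only the caller whose UPDATE affected 1 row starts the pipeline.
const store = new Map<string, Status>();

// Returns true only for the single caller that flips pending → running
function tryClaimRun(runId: string): boolean {
  if (store.get(runId) !== "pending") return false;
  store.set(runId, "running");
  return true;
}

store.set("run-1", "pending");
console.log(tryClaimRun("run-1")); // true  — this connection starts the pipeline
console.log(tryClaimRun("run-1")); // false — a concurrent reconnect is rejected
```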
3. Hybrid Approaches Work
We don't have to choose between SSE and polling. Use SSE for new operations and fall back to polling for reconnections. This gives us the best of both worlds: real-time updates with resilient reconnection behavior.
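The reconnect decision can be captured in one small helper. A sketch: `ACTIVE_STATUSES` mirrors the constant the polling hook uses, but `chooseTransport` itself is hypothetical, not code from our app:

```typescript
type Transport = "sse" | "poll" | "none";

// Statuses where a pipeline is in flight (assumed set, mirroring the
// ACTIVE_STATUSES constant used by the frontend polling hook)
const ACTIVE_STATUSES = ["scanning", "analyzing", "improving"];

// Pending runs open an SSE stream (which starts the pipeline);
// active runs poll; terminal runs need no live updates at all
function chooseTransport(status: string): Transport {
  if (status === "pending") return "sse";
  if (ACTIVE_STATUSES.includes(status)) return "poll";
  return "none";
}

console.log(chooseTransport("pending"));   // "sse"
console.log(chooseTransport("analyzing")); // "poll"
console.log(chooseTransport("done"));      // "none"
```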
4. Background Processes Are Invisible
The most dangerous bugs are the ones that "work" from the user's perspective. Our phantom pipelines were consuming server resources and potentially causing race conditions, but users might never notice.
Alternative Approaches Considered
I briefly considered adding resume logic to the pipeline generators themselves—essentially making them stateful and able to skip completed phases. However, this approach had several drawbacks:
- Complexity: Each pipeline phase would need resume logic
- Fragility: Database state during mid-phase operations can be ambiguous
- Performance: Checking completion status for every phase adds overhead
The SSE-level guard was much cleaner and more reliable.
Future Considerations
While this fix solves the immediate problem, it reveals some areas for future improvement:
- Manual Re-run Capability: We might want to add a "Re-run" button that explicitly resets status to "pending"
- Orphaned Process Detection: If the server crashes, active runs might get stuck forever. A cleanup job could help identify and handle these cases
- Better State Management: More granular status tracking could help with resume functionality in the future
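For the orphaned-process idea, a periodic sweep could flag active runs that have gone silent. This is only a sketch under assumed field names (`lastUpdatedAt` as a heartbeat timestamp is hypothetical, as is the 10-minute threshold):

```typescript
interface Run {
  id: string;
  status: string;
  lastUpdatedAt: number; // epoch ms of the last progress event (assumed field)
}

const STALE_AFTER_MS = 10 * 60 * 1000; // 10 minutes without a heartbeat

// Returns the ids of active runs that have stopped reporting progress;
// a cron job could flip these to "failed" so the UI stops polling forever
function findOrphanedRuns(runs: Run[], now: number): string[] {
  const active = new Set(["scanning", "analyzing", "improving"]);
  return runs
    .filter(r => active.has(r.status) && now - r.lastUpdatedAt > STALE_AFTER_MS)
    .map(r => r.id);
}

const now = Date.now();
const orphans = findOrphanedRuns(
  [
    { id: "a", status: "scanning", lastUpdatedAt: now - 15 * 60 * 1000 },
    { id: "b", status: "analyzing", lastUpdatedAt: now - 1000 },
    { id: "c", status: "done", lastUpdatedAt: now - 60 * 60 * 1000 },
  ],
  now
);
console.log(orphans); // only "a" is active *and* stale
```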
The Fix in Action
The final commit (bd96a5a) was surprisingly small for such an impactful bug fix:
- Modified SSE routes for both AutoFix and Refactor operations
- Added status guards to prevent unnecessary pipeline starts
- Implemented intelligent polling on the frontend
- Zero breaking changes or schema modifications
Sometimes the best solutions are the simplest ones. A single conditional check eliminated phantom pipelines and made our real-time system much more robust.
Have you encountered similar issues with SSE or real-time systems? I'd love to hear about your debugging adventures in the comments below.