nyxcore-systems

Solving the Pipeline Paradox: A Tale of SSE, Polling, and Unconditional Execution

Discover how we tackled a tricky bug in our AutoFix/Refactor pipelines, preventing unwanted restarts by strategically combining Server-Sent Events guards and client-side polling.

TypeScript, Next.js, SSE, tRPC, Pipelines, BugFix, WebDevelopment, UserExperience

Ever been in the middle of a complex, long-running operation in a web application—say, a code refactor or an automated fix—only for it to mysteriously restart from scratch when you merely navigated away and back to the detail page? It's frustrating, wastes resources, and utterly breaks the user experience.

This was precisely the challenge we faced recently with our AutoFix and Refactor pipelines. Our users were encountering a peculiar and infuriating bug: whenever they navigated away from a running pipeline's detail page and then returned, the entire process would reset, kicking off from Phase 1. Let's dive into how we diagnosed this "pipeline paradox" and implemented a robust solution.

The Problem: A Tale of Unintended Restarts

Our application features powerful, multi-phase AutoFix and Refactor pipelines designed to automate complex code transformations. These pipelines provide real-time updates to the user interface, driven by Server-Sent Events (SSE). The user kicks off a run, and the UI streams progress, phase by phase.

The bug manifested like this:

  1. User starts an AutoFix or Refactor run.
  2. The pipeline progresses through Phase 1, Phase 2, etc.
  3. User navigates to another part of the dashboard.
  4. User navigates back to the original run's detail page.
  5. Boom! The pipeline status resets to "scanning" (Phase 1), and a brand new run begins, overwriting the progress of the original one.

This wasn't just an annoyance; it was a critical flaw that made our long-running features unreliable.

Unpacking the Pain: The Root Cause

Our investigation led us deep into the interaction between our client-side pages, our SSE routes, and the pipeline execution logic. Here's what we uncovered:

1. Unconditional Execution on SSE Reconnect: The core of the problem lay in how our SSE routes (/api/v1/events/auto-fix/[id] and /api/v1/events/refactor/[id]) handled new connections. Whenever a client (the browser page) connected to one of these routes, the API handler unconditionally called runAutoFix(id) or runRefactor(id).

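```typescript
// Simplified, problematic SSE route logic
export async function GET(request: Request, { params }: { params: { id: string } }) {
  // ... setup
  // PROBLEM: every connection, including a reconnect, starts the pipeline again,
  // whatever state the run is already in. (runAutoFix is assumed to return the
  // async generator that the stream below iterates.)
  const events = runAutoFix(params.id);
  // ... stream events: for await (const event of events) { safeEnqueue(...) }
}
```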

2. Silent Background Completion: When the user navigated away from the detail page, the browser closed its SSE connection. The runAutoFix() or runRefactor() call that this connection had started, however, kept running on the server. Our SSE implementation drove the pipeline with a for await loop over a generator, and writes to the closed stream failed gracefully (safeEnqueue swallowed the error) instead of ending the loop. The loop therefore kept pulling events from the generator, and the pipeline kept processing and writing its progress to the database.

3. The Paradox: So, the original pipeline would complete in the background, updating the run's status in the database to "completed" or "failed." But then, when the user navigated back to the detail page, a new SSE connection was established. This new connection would trigger a fresh call to runAutoFix() (or runRefactor()), which, unaware of the completed background run, would reset the status field back to "pending" or "scanning" and start the entire process over.
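The background-completion mechanism from point 2 can be reproduced outside Next.js. The sketch below is a minimal, self-contained model, not the project's route code: `runPipeline` is a hypothetical stand-in for runAutoFix(), the `log` array stands in for database writes, and `safeEnqueue` is our assumption about what a helper with that name does (swallow the error that `enqueue()` throws once the client has cancelled the stream). Because the error is swallowed rather than ending the loop, the `for await` loop keeps pulling from the generator after the client is gone.

```typescript
// Hypothetical model of point 2, not the project's actual route code.
type PipelineEvent = { status: string };

// Stand-in for runAutoFix(): each phase is persisted (here: pushed to `log`) and then yielded.
async function* runPipeline(log: string[]): AsyncGenerator<PipelineEvent> {
  for (const status of ["scanning", "improving", "reviewing", "completed"]) {
    await new Promise((resolve) => setTimeout(resolve, 10)); // simulated work
    log.push(status); // the "database write" happens whether or not anyone is listening
    yield { status };
  }
}

// Assumed behaviour of safeEnqueue: swallow the TypeError that enqueue() throws
// once the client has cancelled the stream.
function safeEnqueue(controller: { enqueue(chunk: Uint8Array): void }, chunk: Uint8Array): void {
  try {
    controller.enqueue(chunk);
  } catch {
    // The client is gone; drop the chunk and carry on.
  }
}

function createSseStream(log: string[]): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async start(controller) {
      // Nothing in this loop checks whether the client is still connected,
      // so it drives the generator to completion either way.
      for await (const event of runPipeline(log)) {
        safeEnqueue(controller, encoder.encode(`event: status\ndata: ${JSON.stringify(event)}\n\n`));
      }
      try {
        controller.close();
      } catch {
        // Already closed by the client's cancel().
      }
    },
  });
}

// A user who sees the first event and then navigates away.
async function navigateAwayAfterFirstEvent(): Promise<string[]> {
  const log: string[] = [];
  const reader = createSseStream(log).getReader();
  await reader.read(); // first event arrives
  await reader.cancel(); // the browser closes the connection
  await new Promise((resolve) => setTimeout(resolve, 200)); // give the background loop time to finish
  return log;
}

navigateAwayAfterFirstEvent().then((log) => console.log(log));
// [ 'scanning', 'improving', 'reviewing', 'completed' ]
```

In this model the background completion is the harmless part: the run still finishes and records its result. The damage described in point 3 comes only from what the next connection does.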

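At its core, this was a non-idempotent side effect tied to the connection lifecycle: every connection, reconnects included, was treated as a command to start the pipeline. If the user came back before the original run had finished, the old and new executions even raced each other, both writing progress for the same run.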
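The paradox reduces to a connect handler with a side effect. The in-memory sketch below replays the timeline (first visit, background completion, return visit) against an unguarded and a guarded handler. All names here (`Run`, `startPipeline`, `onConnectUnguarded`, `onConnectGuarded`, `replay`) are hypothetical, and `startPipeline` only models the first thing the real pipeline does: overwrite the status.

```typescript
type RunStatus = "pending" | "scanning" | "improving" | "reviewing" | "completed" | "failed";

interface Run {
  status: RunStatus;
  startCount: number; // how many times the pipeline has been (re)started
}

// Hypothetical stand-in for the start of runAutoFix(): it overwrites the status.
function startPipeline(run: Run): void {
  run.status = "scanning";
  run.startCount += 1;
}

// Before the fix: every connection, including a reconnect, starts the pipeline.
function onConnectUnguarded(run: Run): RunStatus {
  startPipeline(run);
  return run.status;
}

// After the fix: only a pending run is started; anything else is just reported.
function onConnectGuarded(run: Run): RunStatus {
  if (run.status === "pending") {
    startPipeline(run);
  }
  return run.status;
}

// First visit starts the run, it finishes while the user is away, then the user returns.
function replay(onConnect: (run: Run) => RunStatus): Run {
  const run: Run = { status: "pending", startCount: 0 };
  onConnect(run); // first visit
  run.status = "completed"; // background completion
  onConnect(run); // user navigates back
  return run;
}

const before = replay(onConnectUnguarded);
const after = replay(onConnectGuarded);
console.log(before.status, before.startCount); // scanning 2
console.log(after.status, after.startCount); // completed 1
```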

The Solution: A Two-Pronged Approach

To resolve this, we implemented a dual strategy, addressing both the server-side SSE behavior and the client-side UI update mechanism.

1. Server-Side Smarts: The SSE Guard

The most critical change was a guard at the entry point of our SSE routes: runAutoFix() or runRefactor() is now called only if the run's status is pending. For any other status (an active one such as scanning, or a terminal one such as completed or failed), the SSE route sends a single status event reflecting the current state and then closes the stream immediately.

```typescript
// src/app/api/v1/events/auto-fix/[id]/route.ts
export async function GET(request: Request, { params }: { params: { id: string } }) {
  const { id } = params;
  const run = await getAutoFixRunById(id); // Fetch the current run status

  if (!run || run.status !== "pending") {
    // The run is already active, finished, or missing: report its current
    // status as a single event and close the stream.
    // This prevents restarting active or completed runs.
    return new Response(
      `event: status\ndata: ${JSON.stringify(run || { id, status: "not_found" })}\n\n`,
      { headers: { "Content-Type": "text/event-stream; charset=utf-8", "Cache-Control": "no-cache" } }
    );
  }

  // Only a pending run gets here: start the pipeline and stream its events.
  // (runAutoFix is assumed to return the async generator the stream iterates.)
  const events = runAutoFix(id);
  // ... rest of the SSE streaming logic for the generator
}
```

This ensures that reconnecting to an SSE stream for an already active or completed run doesn't inadvertently trigger a restart.
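One caveat remains with a check-then-act guard like the one above: if two connections for the same run arrive at almost the same moment (two open tabs, or a quick reconnect), both can read pending before either has moved the run forward, and both would start the pipeline. A common remedy, which the fix described here does not mention, is to make leaving pending a single atomic compare-and-set, for example an `UPDATE ... SET status = 'scanning' WHERE id = $1 AND status = 'pending'` whose affected-row count tells the caller whether it won. The in-memory sketch below models both versions; `RunTable`, `naiveGuard`, and `claimAndStart` are hypothetical names, and the `sleep` calls stand in for database round trips.

```typescript
type RunStatus = "pending" | "scanning" | "completed" | "failed";
// Hypothetical in-memory stand-in for the runs table.
type RunTable = Map<string, RunStatus>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Check-then-act: the read and the write are separated by an await,
// so two concurrent callers can both observe "pending".
async function naiveGuard(runs: RunTable, id: string, start: () => void): Promise<void> {
  const status = runs.get(id); // read
  await sleep(5); // simulated round trip before the write
  if (status === "pending") {
    runs.set(id, "scanning"); // write
    start();
  }
}

// Compare-and-set: check and write happen with no await in between, which
// models a single conditional UPDATE ... WHERE status = 'pending'.
async function claimAndStart(runs: RunTable, id: string, start: () => void): Promise<boolean> {
  await sleep(5); // simulated round trip; the compare-and-set itself is not interrupted
  if (runs.get(id) !== "pending") return false; // another connection already claimed the run
  runs.set(id, "scanning");
  start();
  return true;
}

// Two connections for the same pending run arriving at the same moment.
async function simulate(): Promise<{ naive: number; atomic: number }> {
  const runs: RunTable = new Map<string, RunStatus>([
    ["a", "pending"],
    ["b", "pending"],
  ]);

  let naive = 0;
  await Promise.all([naiveGuard(runs, "a", () => naive++), naiveGuard(runs, "a", () => naive++)]);

  let atomic = 0;
  await Promise.all([claimAndStart(runs, "b", () => atomic++), claimAndStart(runs, "b", () => atomic++)]);

  return { naive, atomic };
}

simulate().then((result) => console.log(result)); // { naive: 2, atomic: 1 }
```

With a real database, the route would start the pipeline only when the conditional update reports one affected row, and otherwise fall through to the single status-event branch.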

2. Client-Side Adaptability: tRPC Polling

With the server-side guard in place, active and completed runs no longer receive real-time updates over SSE. To keep the UI current for active runs, we added client-side polling through the refetchInterval option of the TanStack (React) Query hook that tRPC's useQuery wraps.

For runs in one of the ACTIVE_STATUSES (scanning, improving, reviewing), the UI now polls the server for updates every 3 seconds. Once the run reaches a terminal state (completed or failed), polling stops.

```typescript
// src/app/(dashboard)/dashboard/auto-fix/[id]/page.tsx
const ACTIVE_STATUSES = ["scanning", "improving", "reviewing"];

const { data: run, isLoading } = trpc.autoFix.getRun.useQuery(
  { id },
  {
    // TanStack Query v4 callback signature (data, query), as used by tRPC v10.
    // In v5 the callback receives the query instead: read query.state.data.
    refetchInterval: (data) => {
      // Only poll while the run is in an active, non-terminal state
      const isActive = !!data && ACTIVE_STATUSES.includes(data.status);
      return isActive ? 3000 : false; // Poll every 3 seconds
    },
    // ... other options
  }
);

// We still use SSE for a pending run: opening the stream starts the pipeline
// and delivers its real-time events until the run leaves pending or completes.
const { data: sseData } = useSseSubscription(`/api/v1/events/auto-fix/${id}`, {
  enabled: run?.status === "pending", // Only open the stream if the run is pending
  // ... handle incoming SSE events
});
```

This hybrid approach uses SSE where it is strongest, pushing real-time events for a run that is just being started, and relies on polling for runs that are already executing, where the page only needs to read the state the server keeps writing to the database.
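The page's polling and SSE rules can be pulled out into one pure function, which makes them testable without React. This is a hypothetical sketch: the page above inlines the logic in its query options, and `updateStrategy`, `refetchIntervalFor`, and `POLL_INTERVAL_MS` are illustrative names, while ACTIVE_STATUSES and the 3-second interval come from this post. One detail worth checking on the SSE side: if useSseSubscription is built on the browser's EventSource, note that EventSource reconnects automatically when the server ends the stream, so a client that receives a non-pending status event should call close() (or drop its `enabled` flag) rather than keep reconnecting to the guard.

```typescript
type RunStatus = "pending" | "scanning" | "improving" | "reviewing" | "completed" | "failed";

// Mirrors the active statuses named in this post.
const ACTIVE_STATUSES: ReadonlySet<RunStatus> = new Set<RunStatus>(["scanning", "improving", "reviewing"]);
const POLL_INTERVAL_MS = 3000;

type UpdateStrategy =
  | { kind: "sse" } // pending: open the stream, which starts the run
  | { kind: "poll"; intervalMs: number } // active: the run is already executing on the server
  | { kind: "none" }; // terminal or not yet loaded: render what the database says

function updateStrategy(status: RunStatus | undefined): UpdateStrategy {
  if (status === "pending") return { kind: "sse" };
  if (status !== undefined && ACTIVE_STATUSES.has(status)) {
    return { kind: "poll", intervalMs: POLL_INTERVAL_MS };
  }
  return { kind: "none" };
}

// What a refetchInterval callback would return for a given status.
function refetchIntervalFor(status: RunStatus | undefined): number | false {
  const strategy = updateStrategy(status);
  return strategy.kind === "poll" ? strategy.intervalMs : false;
}

console.log(refetchIntervalFor("improving")); // 3000
console.log(refetchIntervalFor("completed")); // false
```

Keeping the three outcomes in one discriminated union makes it hard for the SSE `enabled` flag and the polling interval to disagree about which channel a given status should use.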

Lessons Learned and Future Considerations

This fix, committed as bd96a5a, made our AutoFix and Refactor features noticeably more reliable. Our initial tests confirmed that navigating away and back no longer restarts a running pipeline.

Key Takeaways:

  • Connection Lifecycle: Always consider the full lifecycle of your connections (especially with real-time technologies like SSE) and how reconnections are handled.
  • Idempotency: Ensure that actions triggered by connection events are idempotent or guarded against unintended side effects.
  • State Management: Clearly define what state triggers which action. A pending status should initiate; an active status should continue; a completed status should just reflect.
  • Hybrid Approaches: Don't be afraid to combine different communication patterns (SSE, polling, WebSockets) to suit the needs of different stages of a process.
  • Guard Instead of Resume: We rejected a more complex alternative, adding resume logic inside the pipeline generators, because it would have been more fragile and would have made database state left mid-phase ambiguous.
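The State Management takeaway can be written down as an explicit transition table. This is a hypothetical sketch rather than project code: the statuses are the ones named in this post, but the phase order (scanning, then improving, then reviewing), the failure edges, and the `canTransition`/`transition` helpers are assumptions. The useful property is that the only edge into scanning starts from pending, and leaving a terminal state requires an explicit move back to pending, which is what the planned Re-run button would do.

```typescript
type RunStatus = "pending" | "scanning" | "improving" | "reviewing" | "completed" | "failed";

// Assumed lifecycle: pending initiates, active statuses continue,
// terminal statuses only reflect (except for an explicit re-run back to pending).
const TRANSITIONS: Record<RunStatus, readonly RunStatus[]> = {
  pending: ["scanning"],
  scanning: ["improving", "failed"],
  improving: ["reviewing", "failed"],
  reviewing: ["completed", "failed"],
  completed: ["pending"], // explicit re-run only
  failed: ["pending"], // explicit re-run only
};

function canTransition(from: RunStatus, to: RunStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// Refuses illegal moves instead of silently overwriting the status,
// which is exactly what the restart bug did (completed -> scanning).
function transition(current: RunStatus, next: RunStatus): RunStatus {
  if (!canTransition(current, next)) {
    throw new Error(`illegal transition ${current} -> ${next}`);
  }
  return next;
}

console.log(canTransition("completed", "scanning")); // false
console.log(transition("failed", "pending")); // pending
```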

While this bug is squashed, our journey continues. Immediate next steps include:

  • Adding a "Re-run" button for completed or failed runs, allowing users to explicitly reset the status to "pending" and kick off a new pipeline.
  • Considering a cleanup job or manual retry mechanism for runs that might get stuck in an active status if the server crashes mid-process.
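For the cleanup job in the second item, one possible shape is a periodic sweep that marks runs as failed when they claim an active status but have stopped recording progress. Everything here is assumed rather than taken from the project: the post does not say runs carry a last-progress timestamp, so `updatedAt` is hypothetical, as are `findStaleRuns`, `sweep`, and the 10-minute threshold. Failing a stuck run, instead of restarting it automatically, keeps the sweep consistent with the SSE guard: the client stops polling, and restarting stays an explicit user action.

```typescript
type RunStatus = "pending" | "scanning" | "improving" | "reviewing" | "completed" | "failed";

interface RunRow {
  id: string;
  status: RunStatus;
  updatedAt: number; // hypothetical: epoch ms of the last progress write
}

const ACTIVE_STATUSES: ReadonlySet<RunStatus> = new Set<RunStatus>(["scanning", "improving", "reviewing"]);
const STALE_AFTER_MS = 10 * 60 * 1000; // assumed threshold: 10 minutes without progress

// A run counts as stuck if it claims to be active but has not written progress
// recently, e.g. because the server crashed mid-phase.
function findStaleRuns(rows: readonly RunRow[], now: number, staleAfterMs = STALE_AFTER_MS): RunRow[] {
  return rows.filter((row) => ACTIVE_STATUSES.has(row.status) && now - row.updatedAt > staleAfterMs);
}

// Marks stuck runs as failed (mutating the rows) and returns their ids.
function sweep(rows: RunRow[], now: number): string[] {
  const stale = findStaleRuns(rows, now);
  for (const row of stale) {
    row.status = "failed";
    row.updatedAt = now;
  }
  return stale.map((row) => row.id);
}

const now = Date.now();
const rows: RunRow[] = [
  { id: "a", status: "improving", updatedAt: now - 30 * 60 * 1000 }, // stuck
  { id: "b", status: "scanning", updatedAt: now - 5_000 }, // healthy
  { id: "c", status: "completed", updatedAt: now - 60 * 60 * 1000 }, // terminal, ignored
];
console.log(sweep(rows, now)); // [ 'a' ]
```

The threshold has to be longer than the slowest legitimate phase (or the pipeline has to touch `updatedAt` during long phases); otherwise the sweep would fail healthy runs.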

This journey from a frustrating bug to a robust solution highlights the intricacies of building real-time, stateful applications. It's a constant reminder to question assumptions about system behavior and design for resilience at every layer.