nyxcore-systems

Beyond 'Failed': Fortifying Code Analysis and Illuminating Background Tasks

We tackled two critical areas this week: making our long-running code analysis more resilient to errors and giving users real-time visibility into all background processes.

TypeScript, Next.js, tRPC, Prisma, PostgreSQL, LLM, CodeAnalysis, UX, ErrorHandling

Every developer knows the frustration of a black box. A critical background process kicks off, and then... silence. Or worse, a generic "something failed" message that offers no clue about the root cause. This week, we set out to banish that ambiguity from our system, focusing on two key improvements: robust error handling for our LLM-powered code analysis and a brand-new, real-time "active processes" widget.

The goal was clear: empower users with transparent feedback and ensure our analysis engine could gracefully handle partial failures. I'm happy to report both are now live on main at commit 8594153, and our test runs show a significant leap in both reliability and user experience.

Fortifying the Code Analysis Engine: From Fatal to Forgiving

Our code analysis pipeline is a multi-step process, leveraging large language models (LLMs) to detect patterns and generate documentation. Previously, an error in any single batch of LLM processing or during documentation generation would halt the entire analysis run, often leaving a vague "Pattern detection failed" message in its wake. This was a critical flaw, as a minor hiccup shouldn't invalidate the entire effort.

The solution involved a fundamental shift in our error handling strategy:

  1. Granular Error Storage: In src/server/services/code-analysis/analysis-runner.ts, we moved from storing generic failure strings to capturing and persisting the actual error messages in the database. This immediately provides actionable insights for debugging.

  2. Non-Fatal Batch LLM Errors: In pattern-detector.ts, we redesigned the LLM batch processing. Instead of throwing an error that would terminate the entire analysis, individual batch failures now trigger a batch_error event. The analysis continues, accumulating these errors. This means if one small part of a large codebase causes an LLM to stumble, the rest of the analysis can still complete, providing partial results that are far more valuable than a complete failure. We also added batchErrors counters to the analysis stats.

  3. Non-Fatal Doc Generation Errors: Similarly, doc-generator.ts was updated. If a specific document generation fails (perhaps due to malformed input or an LLM timeout), it now emits a doc_error event, allowing other documents to be generated successfully. A docErrors counter keeps track of these.
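To make the non-fatal pattern concrete, here is a minimal sketch of a batch loop that reports failures as events and keeps going. Only the `batch_error` event name and the `batchErrors` counter come from the changes described above; the `detectPatterns` function, the `analyzeBatch` callback (a stand-in for the LLM call), the `batch_done` event, and the result shape are assumptions made for illustration, not the actual code in pattern-detector.ts.

```typescript
// Hypothetical event shapes; only `batch_error` is named in the post.
type AnalysisEvent =
  | { type: "batch_done"; batchIndex: number; patterns: string[] }
  | { type: "batch_error"; batchIndex: number; message: string };

// Hypothetical result shape; `batchErrors` mirrors the counter from the post.
interface DetectionResult {
  patterns: string[];
  stats: { batches: number; batchErrors: number };
}

// Runs every batch and reports failures as events instead of throwing,
// so one bad batch cannot terminate the whole run.
async function detectPatterns<T>(
  batches: T[],
  analyzeBatch: (batch: T) => Promise<string[]>, // stand-in for the LLM call
  emit: (event: AnalysisEvent) => void,
): Promise<DetectionResult> {
  const patterns: string[] = [];
  let batchErrors = 0;

  for (let batchIndex = 0; batchIndex < batches.length; batchIndex++) {
    try {
      const found = await analyzeBatch(batches[batchIndex]);
      patterns.push(...found);
      emit({ type: "batch_done", batchIndex, patterns: found });
    } catch (err) {
      batchErrors += 1;
      // Keep the real error message rather than a generic failure string.
      const message = err instanceof Error ? err.message : String(err);
      emit({ type: "batch_error", batchIndex, message });
    }
  }

  return { patterns, stats: { batches: batches.length, batchErrors } };
}
```

The same try/catch-and-count shape would apply to document generation, with `doc_error` and `docErrors` in place of the batch equivalents.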

This change is more than just a bug fix; it's an architectural improvement that embraces the inherent probabilistic nature of LLMs. We're now building a system that's resilient and provides maximum value even when individual components hit a snag.

Illuminating the Background: The Active Processes Widget

One of the biggest user experience gaps was the lack of visibility into long-running operations. When a user initiated a repo sync, a code analysis, or a data consolidation, they were left wondering if anything was happening. Our new "Active Processes" sidebar widget solves this.

Here's how we brought it to life:

  1. Unified Data Source: The core challenge was that active processes could originate from several different database tables (workflows, analysis runs, consolidations, syncing repos). To present a unified view, I created a new tRPC query, activeProcesses, in src/server/trpc/routers/dashboard.ts. This query reads all four tables in parallel and maps their disparate structures into a single ActiveProcess[] type. Fetching in parallel keeps the response time close to that of the slowest single query rather than the sum of all four, so the widget stays responsive.

    typescript
    // Simplified ActiveProcess type (conceptual)
    export type ActiveProcess = {
      id: string;
      type: 'workflow' | 'analysis' | 'consolidation' | 'repoSync';
      name: string;
      status: 'running' | 'paused' | 'failed' | 'completed';
      progress: number; // 0-100
      link: string; // Deep link to the process details
      // ... other relevant details
    };
    
  2. Dynamic Sidebar Widget: The src/components/layout/active-processes.tsx component is where the magic happens visually. It's a sleek sidebar widget that:

    • Auto-refreshes every 5 seconds (we'll fine-tune this later).
    • Displays color-coded icons based on process type.
    • Shows progress bars and status labels.
    • Provides deep links directly to the details page of each active process.
  3. Seamless Integration: The widget was integrated into src/components/layout/sidebar.tsx at the bottom of the navigation. A quick fix was adding overflow-y-auto to the navigation section to ensure scrollability when many processes are active. We also clamped workflow progress labels to prevent "Step 4/3" issues.
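The parallel fetch and the progress clamping can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual dashboard.ts code: the `SourceRow` shape, the injected fetcher functions (standing in for the four Prisma queries), the `stepLabel` helper, and the link format are all hypothetical. Only the ActiveProcess fields come from the type shown above.

```typescript
type ProcessType = "workflow" | "analysis" | "consolidation" | "repoSync";

interface ActiveProcess {
  id: string;
  type: ProcessType;
  name: string;
  status: "running" | "paused" | "failed" | "completed";
  progress: number; // 0-100
  link: string;
}

// Hypothetical common shape each table query is reduced to.
interface SourceRow { id: string; name: string; done: number; total: number }
type Fetcher = () => Promise<SourceRow[]>;

function clamp(value: number, min: number, max: number): number {
  return Math.min(Math.max(value, min), max);
}

// Converts done/total into a percentage that never leaves 0-100.
function toProgress(done: number, total: number): number {
  if (total <= 0) return 0;
  return clamp(Math.round((done / total) * 100), 0, 100);
}

// Clamps the step so a label like "Step 4/3" cannot be produced.
function stepLabel(current: number, total: number): string {
  const safeTotal = Math.max(total, 1);
  return `Step ${clamp(current, 1, safeTotal)}/${safeTotal}`;
}

// Starts all fetchers at once, then flattens the rows into one list.
async function activeProcesses(
  sources: Record<ProcessType, Fetcher>,
): Promise<ActiveProcess[]> {
  const types = Object.keys(sources) as ProcessType[];
  const rowsPerType = await Promise.all(types.map((t) => sources[t]()));

  const result: ActiveProcess[] = [];
  rowsPerType.forEach((rows, i) => {
    for (const row of rows) {
      result.push({
        id: row.id,
        type: types[i],
        name: row.name,
        status: "running",
        progress: toProgress(row.done, row.total),
        link: `/${types[i]}/${row.id}`, // hypothetical route format
      });
    }
  });
  return result;
}
```

One design note: `Promise.all` rejects as soon as any one fetcher fails, so a single broken table query would blank the whole widget. `Promise.allSettled` is an option if partial results are preferable there too.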

Now, users have an immediate, glanceable overview of what's happening behind the scenes, transforming a previously opaque system into an open book.

Navigating the Trenches: Lessons Learned

Not every step was smooth sailing. Here are a couple of critical "pain points" that turned into valuable lessons:

Lesson 1: Authenticating CLI Tools with Web Sessions

  • The Problem: I wanted to trigger an analysis run from the command line for testing. Our analysis endpoint uses Server-Sent Events (SSE) and is protected by NextAuth, requiring a browser session cookie. Direct curl attempts failed due to authentication.
  • The "Aha!" Moment: Instead of trying to emulate a browser session (which is brittle and complex for internal dev tools), the cleaner approach was to bypass the HTTP layer entirely.
  • Actionable Takeaway: For internal development and testing scripts, directly import and call the core server-side functions. This avoids authentication complexities and ensures you're testing the business logic directly.
  • The Solution: I created scripts/run-analysis.ts, which directly imports runAnalysis() from src/server/services/code-analysis/analysis-runner.ts and is executed with npx tsx. This provides a reliable, authentication-free way to trigger analysis runs during development.

Lesson 2: Prisma's @updatedAt and Raw SQL

  • The Problem: While trying to manually create an analysis run record via raw SQL INSERT into PostgreSQL, I hit a null value in column "updatedAt" error.
  • The "Aha!" Moment: Prisma's @updatedAt attribute is applied by Prisma Client at write time, not by a database default. When you use raw SQL, you bypass the client, so nothing sets the column. (A createdAt field behaves differently: there is no @createdAt attribute, and such fields are usually declared with @default(now()), which does produce a database-level default.)
  • Actionable Takeaway: Prefer the Prisma client for database operations, especially on models with client-managed fields such as @updatedAt. If raw SQL is absolutely unavoidable, set those fields explicitly (e.g., by passing now() for updatedAt in the INSERT), though this is generally discouraged for consistency.
  • The Solution: Stick to the Prisma client for all record creation and updates. It ensures data integrity and leverages the ORM's features correctly.

The Road Ahead

With these features shipped, our system is significantly more robust and user-friendly. However, the journey continues:

  • SSE Abort Handling: A crucial consideration from code review: if a client disconnects from an SSE stream, we should ideally abort any in-progress LLM operations to save tokens and resources. This is a high priority.
  • Polling Optimization: While 5s polling for active processes is fine for now, we'll consider increasing refetchInterval to 10-15s and adding staleTime to reduce client-side polling load.
  • Unit Tests: Orchestration logic in analysis-runner.ts (especially batch_error/doc_error handling) would benefit greatly from dedicated unit tests.
  • RLS Verification: The code_patterns table relies on a repository foreign key for Row Level Security (RLS) instead of an explicit tenantId. Verifying the RLS policy works correctly is important for data isolation.
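For the SSE abort item, one possible shape is to thread an AbortSignal through the batch loop and stop before starting the next LLM call. The sketch below assumes the client disconnect is surfaced as an AbortSignal (for instance, the standard `Request.signal`); the `runBatches` function and its `processBatch` callback are hypothetical names, not existing code in the project.

```typescript
// Processes batches in order, stopping before the next one once the
// signal is aborted. The signal is also passed to each batch so the
// underlying call can cancel itself mid-flight if it supports that.
async function runBatches<T>(
  batches: T[],
  processBatch: (batch: T, signal: AbortSignal) => Promise<void>,
  signal: AbortSignal,
): Promise<{ completed: number; aborted: boolean }> {
  let completed = 0;

  for (const batch of batches) {
    if (signal.aborted) {
      // Client is gone: skip the remaining batches to save tokens.
      return { completed, aborted: true };
    }
    await processBatch(batch, signal);
    completed += 1;
  }

  return { completed, aborted: false };
}
```

Checking the signal between batches only limits waste to the batch already in flight. Cancelling that batch as well depends on whether the LLM client in use accepts an abort signal.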

This session was a great example of tackling critical backend reliability issues and frontend UX improvements in tandem. The result is a more stable, more transparent, and ultimately more usable product.

json
{"thingsDone":[
  "Implemented robust error handling for LLM-powered code analysis, capturing specific errors and allowing partial completion.",
  "Developed and integrated a real-time 'Active Processes' sidebar widget showing progress and status of long-running tasks.",
  "Created an internal CLI script to trigger analysis runs, bypassing browser authentication.",
  "Fixed minor UI issues related to progress display and sidebar layout."
],"pains":[
  "Authenticating CLI tools with NextAuth-protected SSE endpoints.",
  "Understanding Prisma's `@updatedAt` directive behavior with raw SQL inserts.",
  "Handling non-relational `String[]` fields in dashboard queries."
],"successes":[
  "Significant improvement in code analysis resilience and debuggability.",
  "Enhanced user experience with real-time visibility into background operations.",
  "Creation of useful internal development tooling.",
  "Successful integration of complex data from multiple sources into a unified UI component."
],"techStack":[
  "TypeScript",
  "Next.js",
  "tRPC",
  "Prisma",
  "PostgreSQL",
  "LLMs (Large Language Models)",
  "React (for UI components)"
]}