Beyond 'Failed': Fortifying Code Analysis and Illuminating Background Tasks
We tackled two critical areas this week: making our long-running code analysis more resilient to errors and giving users real-time visibility into all background processes.
Every developer knows the frustration of a black box. A critical background process kicks off, and then... silence. Or worse, a generic "something failed" message that offers no clue about the root cause. This week, we set out to banish that ambiguity from our system, focusing on two key improvements: robust error handling for our LLM-powered code analysis and a brand-new, real-time "active processes" widget.
The goal was clear: empower users with transparent feedback and ensure our analysis engine could gracefully handle partial failures. I'm happy to report both are now live on main at commit 8594153, and our test runs show a significant leap in both reliability and user experience.
Fortifying the Code Analysis Engine: From Fatal to Forgiving
Our code analysis pipeline is a multi-step process, leveraging large language models (LLMs) to detect patterns and generate documentation. Previously, an error in any single batch of LLM processing or during documentation generation would halt the entire analysis run, often leaving a vague "Pattern detection failed" message in its wake. This was a critical flaw, as a minor hiccup shouldn't invalidate the entire effort.
The solution involved a fundamental shift in our error handling strategy:
- Granular Error Storage: In `src/server/services/code-analysis/analysis-runner.ts`, we moved from storing generic failure strings to capturing and persisting the actual error messages in the database. This immediately provides actionable insights for debugging.
- Non-Fatal Batch LLM Errors: In `pattern-detector.ts`, we redesigned the LLM batch processing. Instead of throwing an error that would terminate the entire analysis, individual batch failures now trigger a `batch_error` event. The analysis continues, accumulating these errors. This means if one small part of a large codebase causes an LLM to stumble, the rest of the analysis can still complete, providing partial results that are far more valuable than a complete failure. We also added `batchErrors` counters to the analysis stats.
- Non-Fatal Doc Generation Errors: Similarly, `doc-generator.ts` was updated. If a specific document generation fails (perhaps due to malformed input or an LLM timeout), it now emits a `doc_error` event, allowing other documents to be generated successfully. A `docErrors` counter keeps track of these.
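The non-fatal batch handling described above can be sketched roughly as follows. This is a minimal illustration, not the code in `pattern-detector.ts`: only the `batch_error` event name and the `batchErrors` counter come from this post, while `detectPatterns`, `DetectBatch`, the `batch_done` event, and the stats shape are hypothetical names chosen for the example.

```typescript
// Hypothetical event and stats shapes. Only `batch_error` and `batchErrors`
// are names from the post; everything else is illustrative.
type AnalysisEvent =
  | { type: 'batch_done'; batchIndex: number; patterns: string[] }
  | { type: 'batch_error'; batchIndex: number; message: string };

interface AnalysisStats {
  batchesProcessed: number;
  batchErrors: number;
}

// Stand-in for the real LLM call, which the post does not show.
type DetectBatch = (files: string[]) => Promise<string[]>;

async function detectPatterns(
  batches: string[][],
  detectBatch: DetectBatch,
  emit: (event: AnalysisEvent) => void,
): Promise<{ patterns: string[]; stats: AnalysisStats }> {
  const patterns: string[] = [];
  const stats: AnalysisStats = { batchesProcessed: 0, batchErrors: 0 };

  for (let i = 0; i < batches.length; i++) {
    try {
      const found = await detectBatch(batches[i]);
      patterns.push(...found);
      stats.batchesProcessed++;
      emit({ type: 'batch_done', batchIndex: i, patterns: found });
    } catch (err) {
      // Keep the real error message and continue with the next batch
      // instead of rethrowing and aborting the whole run.
      const message = err instanceof Error ? err.message : String(err);
      stats.batchErrors++;
      emit({ type: 'batch_error', batchIndex: i, message });
    }
  }

  return { patterns, stats };
}
```

The same try/catch-and-count shape would apply to document generation, with `doc_error` and `docErrors` in place of the batch names.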
This change is more than just a bug fix; it's an architectural improvement that embraces the inherent probabilistic nature of LLMs. We're now building a system that's resilient and provides maximum value even when individual components hit a snag.
Illuminating the Background: The Active Processes Widget
One of the biggest user experience gaps was the lack of visibility into long-running operations. When a user initiated a repo sync, a code analysis, or a data consolidation, they were left wondering if anything was happening. Our new "Active Processes" sidebar widget solves this.
Here's how we brought it to life:
- Unified Data Source: The core challenge was that active processes could originate from several different database tables (workflows, analysis runs, consolidations, syncing repos). To present a unified view, I created a new tRPC query: `src/server/trpc/routers/dashboard.ts` → `activeProcesses`. This query polls four tables in parallel, mapping their disparate structures into a single `ActiveProcess[]` type. This parallel fetching is crucial for performance, ensuring the widget remains responsive.

```typescript
// Simplified ActiveProcess type (conceptual)
export type ActiveProcess = {
  id: string;
  type: 'workflow' | 'analysis' | 'consolidation' | 'repoSync';
  name: string;
  status: 'running' | 'paused' | 'failed' | 'completed';
  progress: number; // 0-100
  link: string; // Deep link to the process details
  // ... other relevant details
};
```

- Dynamic Sidebar Widget: The `src/components/layout/active-processes.tsx` component is where the magic happens visually. It's a sleek sidebar widget that:
  - Auto-refreshes every 5 seconds (we'll fine-tune this later).
  - Displays color-coded icons based on process type.
  - Shows progress bars and status labels.
  - Provides deep links directly to the details page of each active process.
- Seamless Integration: The widget was integrated into `src/components/layout/sidebar.tsx` at the bottom of the navigation. A quick fix was adding `overflow-y-auto` to the navigation section to ensure scrollability when many processes are active. We also clamped workflow progress labels to prevent "Step 4/3" issues.
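To make the parallel fetch and the "Step 4/3" clamp concrete, here is a small sketch. It repeats the simplified `ActiveProcess` type so it stands alone. The `WorkflowRow` shape, the `/workflows/...` link format, and the `workflowToProcess` and `collectActiveProcesses` helpers are assumptions for illustration, not the actual contents of `dashboard.ts`.

```typescript
type ActiveProcess = {
  id: string;
  type: 'workflow' | 'analysis' | 'consolidation' | 'repoSync';
  name: string;
  status: 'running' | 'paused' | 'failed' | 'completed';
  progress: number; // 0-100
  link: string;
};

// Hypothetical row shape for a workflow record.
interface WorkflowRow {
  id: string;
  title: string;
  currentStep: number;
  totalSteps: number;
}

// Map a workflow row to the unified type, clamping the step so an
// overshooting counter can never render as "Step 4/3".
function workflowToProcess(row: WorkflowRow): ActiveProcess {
  const total = Math.max(row.totalSteps, 1);
  const step = Math.min(Math.max(row.currentStep, 0), total);
  return {
    id: row.id,
    type: 'workflow',
    name: `${row.title} (Step ${step}/${total})`,
    status: 'running',
    progress: Math.round((step / total) * 100),
    link: `/workflows/${row.id}`, // assumed route
  };
}

// Each source loads one table and returns rows already mapped to ActiveProcess.
type ProcessSource = () => Promise<ActiveProcess[]>;

// Start all loads at once and merge the results into a single list.
async function collectActiveProcesses(sources: ProcessSource[]): Promise<ActiveProcess[]> {
  const results = await Promise.all(sources.map((load) => load()));
  return results.reduce<ActiveProcess[]>((all, rows) => all.concat(rows), []);
}
```

With `Promise.all`, the total latency is roughly that of the slowest query rather than the sum of all four, which is the property the widget depends on.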
Now, users have an immediate, glanceable overview of what's happening behind the scenes, transforming a previously opaque system into an open book.
Navigating the Trenches: Lessons Learned
Not every step was smooth sailing. Here are a couple of "pain points" that turned into valuable lessons:
Lesson 1: Authenticating CLI Tools with Web Sessions
- The Problem: I wanted to trigger an analysis run from the command line for testing. Our analysis endpoint uses Server-Sent Events (SSE) and is protected by NextAuth, requiring a browser session cookie. Direct `curl` attempts failed due to authentication.
- The "Aha!" Moment: Instead of trying to emulate a browser session (which is brittle and complex for internal dev tools), the cleaner approach was to bypass the HTTP layer entirely.
- Actionable Takeaway: For internal development and testing scripts, directly import and call the core server-side functions. This avoids authentication complexities and ensures you're testing the business logic directly.
- The Solution: I created `scripts/run-analysis.ts`, which directly imports `runAnalysis()` from `src/server/services/code-analysis/analysis-runner.ts` and executes it using `npx tsx`. This provides a reliable, authentication-free way to trigger analysis runs for development.
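A script along these lines might look like the sketch below. The real signature of `runAnalysis()` is not shown in this post, so the `RunAnalysis` type, the `repositoryId` argument, and the `runId` result are all assumptions. The runner is passed in as a parameter here so the sketch stays self-contained.

```typescript
// In the real script the runner would be imported, e.g.:
//   import { runAnalysis } from '../src/server/services/code-analysis/analysis-runner';
// The signature below is hypothetical.
type RunAnalysis = (opts: { repositoryId: string }) => Promise<{ runId: string }>;

// Returns a process exit code: 0 on success, 1 on bad usage or failure.
async function main(argv: string[], runAnalysis: RunAnalysis): Promise<number> {
  const repositoryId = argv[0];
  if (!repositoryId) {
    console.error('Usage: npx tsx scripts/run-analysis.ts <repositoryId>');
    return 1;
  }
  try {
    const { runId } = await runAnalysis({ repositoryId });
    console.log(`Analysis run ${runId} finished`);
    return 0;
  } catch (err) {
    console.error('Analysis failed:', err instanceof Error ? err.message : err);
    return 1;
  }
}

// Entry point in the real script (left as a comment so the sketch has no side effects):
//   main(process.argv.slice(2), runAnalysis).then((code) => process.exit(code));
```

Because nothing goes through HTTP, there is no session cookie to forge, and a failure surfaces as an ordinary exception with a stack trace.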
Lesson 2: Prisma's @updatedAt and Raw SQL
- The Problem: While trying to manually create an analysis run record via a raw SQL `INSERT` into PostgreSQL, I hit a `null value in column "updatedAt"` error.
- The "Aha!" Moment: Prisma's `@updatedAt` attribute is applied by Prisma Client at write time, not by a database default or trigger. When you use raw SQL, you're bypassing the ORM, and thus Prisma doesn't get a chance to set the field automatically.
- Actionable Takeaway: Always use the Prisma client for database operations, especially when dealing with fields managed by ORM attributes. If raw SQL is absolutely unavoidable, remember to explicitly handle ORM-managed fields (e.g., by using `now()` for `updatedAt` in SQL, though this is generally discouraged for consistency).
- The Solution: Stick to the Prisma client for all record creation and updates. It ensures data integrity and leverages the ORM's features correctly.
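For the rare case where raw SQL really is unavoidable, the sketch below shows one way to make sure `"updatedAt"` is always set. `buildInsertWithUpdatedAt` is a hypothetical helper, and the `AnalysisRun` table and its columns in the usage note are assumed names. It only builds a parameterized statement in the `$1, $2` placeholder style PostgreSQL uses and does not talk to a database.

```typescript
type SqlValue = string | number | boolean | null;

const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Quote an identifier so camelCase column names survive in PostgreSQL,
// rejecting anything that is not a plain identifier.
function quoteIdent(name: string): string {
  if (!IDENTIFIER.test(name)) throw new Error(`Unsafe identifier: ${name}`);
  return `"${name}"`;
}

// Build an INSERT that fills "updatedAt" with now(), since Prisma Client
// is not involved to do it. Assumes `data` does not already contain updatedAt.
function buildInsertWithUpdatedAt(
  table: string,
  data: Record<string, SqlValue>,
): { text: string; values: SqlValue[] } {
  const columns = Object.keys(data);
  const columnList = [...columns.map((c) => quoteIdent(c)), '"updatedAt"'].join(', ');
  const valueList = [...columns.map((_, i) => `$${i + 1}`), 'now()'].join(', ');
  return {
    text: `INSERT INTO ${quoteIdent(table)} (${columnList}) VALUES (${valueList})`,
    values: columns.map((c) => data[c]),
  };
}
```

For example, `buildInsertWithUpdatedAt('AnalysisRun', { id: 'run_1', status: 'pending' })` yields the text `INSERT INTO "AnalysisRun" ("id", "status", "updatedAt") VALUES ($1, $2, now())` with values `['run_1', 'pending']`. Going through the Prisma client remains the simpler and safer default.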
The Road Ahead
With these features shipped, our system is significantly more robust and user-friendly. However, the journey continues:
- SSE Abort Handling: A crucial consideration from code review: if a client disconnects from an SSE stream, we should ideally abort any in-progress LLM operations to save tokens and resources. This is a high priority.
- Polling Optimization: While 5s polling for active processes is fine for now, we'll consider increasing `refetchInterval` to 10-15s and adding `staleTime` to reduce client-side polling load.
- Unit Tests: Orchestration logic in `analysis-runner.ts` (especially `batch_error`/`doc_error` handling) would benefit greatly from dedicated unit tests.
- RLS Verification: The `code_patterns` table relies on a repository foreign key for Row Level Security (RLS) instead of an explicit `tenantId`. Verifying the RLS policy works correctly is important for data isolation.
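For the SSE abort item, one possible shape is cooperative cancellation: check a signal before starting each batch and stop early once the client is gone. This is only a sketch of the idea, not a committed design. `runBatches` and `AbortLike` are hypothetical names. `AbortLike` is a structural subset of the standard `AbortSignal`, so the `signal` of an `AbortController` could be passed in directly.

```typescript
// Minimal structural subset of the standard AbortSignal.
interface AbortLike {
  readonly aborted: boolean;
}

// Process batches in order, but stop before starting a new one
// once the signal reports that the client has disconnected.
async function runBatches<T>(
  batches: T[],
  processBatch: (batch: T) => Promise<void>,
  signal: AbortLike,
): Promise<{ completed: number; aborted: boolean }> {
  let completed = 0;
  for (const batch of batches) {
    if (signal.aborted) {
      return { completed, aborted: true };
    }
    await processBatch(batch);
    completed++;
  }
  return { completed, aborted: false };
}
```

This only avoids starting new LLM calls. Cancelling a call that is already in flight would additionally require the LLM client to accept an abort signal, which this sketch does not assume.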
This session was a great example of tackling critical backend reliability issues and frontend UX improvements in tandem. The result is a more stable, more transparent, and ultimately more usable product.
{"thingsDone":[
"Implemented robust error handling for LLM-powered code analysis, capturing specific errors and allowing partial completion.",
"Developed and integrated a real-time 'Active Processes' sidebar widget showing progress and status of long-running tasks.",
"Created an internal CLI script to trigger analysis runs, bypassing browser authentication.",
"Fixed minor UI issues related to progress display and sidebar layout."
],"pains":[
"Authenticating CLI tools with NextAuth-protected SSE endpoints.",
"Understanding Prisma's `@updatedAt` directive behavior with raw SQL inserts.",
"Handling non-relational `String[]` fields in dashboard queries."
],"successes":[
"Significant improvement in code analysis resilience and debuggability.",
"Enhanced user experience with real-time visibility into background operations.",
"Creation of useful internal development tooling.",
"Successful integration of complex data from multiple sources into a unified UI component."
],"techStack":[
"TypeScript",
"Next.js",
"tRPC",
"Prisma",
"PostgreSQL",
"LLMs (Large Language Models)",
"React (for UI components)"
]}