Debugging the Fabric: A Multi-Tenant Odyssey Through Code, AI, and Hard-Won Lessons
Come along for a deep dive into a recent development session where we untangled multi-tenant access bugs, expanded our AI model catalog, and unearthed critical insights into our data flows – all while learning a few hard lessons along the way.
Every developer knows that some days are about building shiny new features, and some days are about diving deep into the existing codebase, untangling knots, and fortifying the foundations. My most recent session was definitely the latter – a multi-faceted sprint focused on stability, scalability, and preparing for an exciting rollout.
The mission? To squash multi-tenant access bugs, broaden our AI model horizons, analyze a critical note enrichment data flow, and gear up for a significant enrichment and ProviderModelPicker rollout. It was a dense session, but incredibly rewarding, and I'm excited to share some of the key challenges and breakthroughs.
Fortifying the Gates: Multi-Tenant Safety & Data Integrity
Building multi-tenant applications introduces a fascinating layer of complexity. Ensuring that users only see their data, and that automated processes don't step on each other's toes, is paramount. This session had a strong focus on precisely that.
1. Guarding Against Duplicates: Analysis Run Protection
Imagine a scenario where a user, or an automated project sync, triggers a code analysis. What if it gets triggered multiple times in quick succession? Or what if a previous run is still active? This leads to wasted resources, potential data corruption, and a poor user experience.
We tackled this by adding robust guards before codeAnalysisRun.create() calls.
- Location: src/server/trpc/routers/code-analysis.ts and src/server/services/project-sync-service.ts.
- Mechanism: Before initiating a new analysis, we now check for any active runs or any completed runs within the last 3 minutes. If one exists, we simply skip the new creation.
- Bonus: We also added a cleanup mechanism to prune any analysis runs that get stuck in an "active" state for more than 10 minutes (commits b6837e5, 73c1671). Database hygiene is key!
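To make the guard concrete, here is a minimal sketch of the decision logic as a pure function. The AnalysisRun shape, the status names, and the helper names are assumptions for illustration, not the actual code in code-analysis.ts; only the 3-minute and 10-minute thresholds come from the description above. The sketch also assumes that a run stuck past the 10-minute mark should not block a new run, since it is due to be pruned anyway.

```typescript
// Hypothetical shape of an analysis run; the real Prisma model may differ.
type RunStatus = "pending" | "running" | "completed" | "failed";

interface AnalysisRun {
  status: RunStatus;
  createdAt: Date;
  completedAt: Date | null;
}

const RECENT_WINDOW_MS = 3 * 60 * 1000; // a run completed within 3 minutes blocks a new one
const STUCK_THRESHOLD_MS = 10 * 60 * 1000; // active runs older than 10 minutes count as stuck

function isActive(run: AnalysisRun): boolean {
  return run.status === "pending" || run.status === "running";
}

// A stuck run is an active run that started more than 10 minutes ago.
function isStuck(run: AnalysisRun, now: Date): boolean {
  return isActive(run) && now.getTime() - run.createdAt.getTime() > STUCK_THRESHOLD_MS;
}

// Returns true when creating a new run should be skipped.
function shouldSkipNewRun(runs: AnalysisRun[], now: Date): boolean {
  return runs.some((run) => {
    if (isActive(run)) {
      // Assumption: stuck runs are cleanup candidates and do not block new work.
      return !isStuck(run, now);
    }
    if (run.status === "completed" && run.completedAt !== null) {
      return now.getTime() - run.completedAt.getTime() <= RECENT_WINDOW_MS;
    }
    return false;
  });
}
```

In the real router, the list of runs would come from a database query scoped to the project, and the create call would only happen when shouldSkipNewRun returns false.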
2. The userId Filter Fiasco: A Multi-Tenant Gotcha
This was a particularly thorny bug. Users were reporting "project not found" errors, especially during imports, but it quickly became clear it was far more widespread, affecting all mutations (update, delete, notes.create, blog operations).
The root cause? Our findFirst guards in mutations were checking userId: ctx.user.id. Sounds logical, right? Except that in a multi-tenant setup, a single tenant can have several user accounts. For example, oliver.baer+ckbnyx and oliver.baer may be the same person in different contexts, but they are distinct userIds. If a project was created by one userId within a tenant and another userId from the same tenant then tried to modify it, the userId filter blocked access.
The Fix: A systematic grep and purge! We removed userId: ctx.user.id from ALL findFirst guards in src/server/trpc/routers/projects.ts (commits 0c0a15c, fe08f17, 51ea577). Now, all access guards correctly check tenantId only. The userId is, of course, still used when creating new projects or associating specific actions with a user, but not for tenant-level resource ownership checks. This was a critical architectural clarification.
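Here is a small sketch of the difference, using an in-memory array in place of the database. The Ctx and Project shapes, the location of tenantId on the context, and all helper names are hypothetical; the only thing taken from the fix above is the rule that guards filter on tenantId while creation still records userId.

```typescript
// Hypothetical shapes; the real tRPC context and Prisma model may differ.
interface Ctx {
  tenantId: string;
  user: { id: string };
}

interface Project {
  id: string;
  name: string;
  tenantId: string;
  userId: string; // creator, kept for attribution only
}

interface ProjectGuard {
  id: string;
  tenantId: string;
}

// Access guard: scoped to the tenant, deliberately without userId.
function projectGuardWhere(ctx: Ctx, projectId: string): ProjectGuard {
  return { id: projectId, tenantId: ctx.tenantId };
}

// Creation still records which user made the project.
function newProjectData(ctx: Ctx, id: string, name: string): Project {
  return { id, name, tenantId: ctx.tenantId, userId: ctx.user.id };
}

// In-memory stand-in for a findFirst lookup, for demonstration only.
function findFirstProject(projects: Project[], where: ProjectGuard): Project | null {
  return projects.find((p) => p.id === where.id && p.tenantId === where.tenantId) ?? null;
}

const creator: Ctx = { tenantId: "tenant-1", user: { id: "user-a" } };
const colleague: Ctx = { tenantId: "tenant-1", user: { id: "user-b" } };
const outsider: Ctx = { tenantId: "tenant-2", user: { id: "user-c" } };

const projects: Project[] = [newProjectData(creator, "p1", "Demo")];

// A second account in the same tenant can reach the project; another tenant cannot.
const sameTenant = findFirstProject(projects, projectGuardWhere(colleague, "p1"));
const otherTenant = findFirstProject(projects, projectGuardWhere(outsider, "p1"));
```

With the old userId filter in the guard, the lookup for the second account would have returned null, which is exactly the "project not found" symptom.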
3. Taming Orphaned Syncs with Raw SQL
Our project synchronization process is vital, but like any long-running task, it can sometimes leave behind orphaned records or get stuck. Specifically, if a sync failed mid-way, it could leave a previousSyncId referring to a non-existent record, violating a unique constraint on our projectSync table.
The Fix: In the sync.start procedure of src/server/trpc/routers/projects.ts, we added a cleanup step. Initially, I tried prisma.projectSync.updateMany() to null previousSyncId for orphaned syncs, but quickly hit a Prisma limitation: updateMany can't set relation fields to null in batch operations.
The Workaround: When Prisma reaches its limits, $executeRaw comes to the rescue! We dropped down to raw SQL to find projectSync records older than 10 minutes and null their previousSyncId, freeing up the unique constraint. The same commit (0c0a15c) also extended our status check to correctly identify "running" syncs.
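// Example of the raw SQL approach (simplified).
// Nulls previousSyncId on syncs still marked 'running' after 10 minutes,
// which frees the unique constraint for the next sync.
await prisma.$executeRaw`
  UPDATE "ProjectSync"
  SET "previousSyncId" = NULL
  WHERE "createdAt" < NOW() - INTERVAL '10 minutes'
    AND "status" = 'running';
`;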
Expanding Horizons: AI Models & UX Refinements
The AI landscape moves at warp speed, and staying current with the latest models is crucial for our platform's value proposition.
1. A Blooming Model Catalog
We significantly expanded our MODEL_CATALOG in src/lib/constants.ts (commit 6d182c5). Say hello to:
- Claude: Opus 4.6, Sonnet 4.6, Sonnet 4.5
- GPT: GPT-5, GPT-5 Mini, o3, o4 Mini, GPT-4.1, GPT-4.1 Mini, GPT-4.1 Nano
- Gemini: 2.5 Flash-Lite
We also updated default adapters to leverage these new models: anthropic.ts now points to claude-sonnet-4-6, and openai.ts defaults to gpt-4.1-mini. Our FAST_MODELS list also received gpt-4.1-mini as its new OpenAI entry. Keeping our users at the cutting edge!
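For readers who haven't seen this pattern, here is a rough sketch of how a catalog, per-provider defaults, and a fast-model list can fit together. The entry shape, the fast flag, and resolveModel are assumptions for illustration and not the real contents of src/lib/constants.ts. Only the two model ids named above are listed, and deriving FAST_MODELS from a flag is my own simplification.

```typescript
type Provider = "anthropic" | "openai";

// Hypothetical entry shape; the real MODEL_CATALOG may carry different fields.
interface ModelEntry {
  provider: Provider;
  id: string;
  fast: boolean;
}

// Only the two ids named in the post are listed; the real catalog is larger.
const MODEL_CATALOG: ModelEntry[] = [
  { provider: "anthropic", id: "claude-sonnet-4-6", fast: false },
  { provider: "openai", id: "gpt-4.1-mini", fast: true },
];

// Per-provider defaults, matching the adapter defaults described above.
const DEFAULT_MODEL: Record<Provider, string> = {
  anthropic: "claude-sonnet-4-6",
  openai: "gpt-4.1-mini",
};

// Assumption: the fast list is derived from a flag rather than maintained by hand.
const FAST_MODELS: string[] = MODEL_CATALOG.filter((m) => m.fast).map((m) => m.id);

// Returns the requested model if the catalog knows it for this provider,
// otherwise falls back to the provider default.
function resolveModel(provider: Provider, requested?: string): string {
  const known = MODEL_CATALOG.some((m) => m.provider === provider && m.id === requested);
  return known && requested !== undefined ? requested : DEFAULT_MODEL[provider];
}
```

Falling back to the provider default keeps an unknown or retired model id from reaching the adapter.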
2. Streamlining GitHub Source Management
A small but impactful UX improvement went into src/app/(dashboard)/dashboard/projects/[id]/page.tsx (commits d93fe5a, 1cb7225). The "Sources" tab now features an "Import from GitHub" button in its empty state, making it more intuitive to get started.
Additionally, we added an editable "Scan path" input for githubPath in the project settings form. This lets users point the analysis at a subdirectory, with a clear "/" displayed for an empty (root) path. A quick production database update for our CodeMCP project (a9568f7d) changed its githubPath from .memory to "" (root), so it now scans the entire repository.
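The empty-means-root convention is easy to get subtly wrong, so here is a minimal sketch of how the stored value and the displayed value can be kept apart. Both helper names are hypothetical, and stripping surrounding slashes before saving is my assumption rather than something the actual form is known to do.

```typescript
// Normalizes user input before saving: trims whitespace and strips
// leading/trailing slashes, so "/" and "" both mean the repository root.
function normalizeScanPath(input: string): string {
  return input.trim().replace(/^\/+|\/+$/g, "");
}

// Renders the stored githubPath for display: an empty path is shown as "/".
function displayScanPath(githubPath: string | null): string {
  const stored = normalizeScanPath(githubPath ?? "");
  return stored === "" ? "/" : stored;
}
```

With these two helpers, typing "/" into the input stores an empty string, and an empty stored value renders as "/" again.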
Unearthing Critical Gaps: The Note Enrichment Deep Dive
Not everything in a dev session is about fixing bugs or adding features; sometimes it's about understanding existing systems better. I traced the full note enrichment flow through src/server/services/note-enrichment.ts to prepare for upcoming enhancements.
The Finding: A critical gap was discovered. While enrichment correctly loaded consolidation patterns, it completely missed other crucial data points like code_patterns (354 in one project!), memory_entries (748!), and workflow_insights (605). This means our enrichment process wasn't leveraging the full spectrum of available project intelligence. This discovery immediately queued up a critical next step.
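To picture what closing that gap could look like, here is a rough sketch that assembles a context string from all four sources instead of just one. Everything in it is hypothetical: the consolidation_patterns key, the string-array shape, the function name, and the per-source cap are my assumptions, not the structure of note-enrichment.ts. I added the cap because sources with hundreds of entries would likely need some limit.

```typescript
type SourceName =
  | "consolidation_patterns"
  | "code_patterns"
  | "memory_entries"
  | "workflow_insights";

// Hypothetical: each source is reduced to a list of short text entries.
type EnrichmentSources = Record<SourceName, string[]>;

// Builds one text block per non-empty source, capped at maxPerSource entries.
function buildEnrichmentContext(sources: EnrichmentSources, maxPerSource: number = 20): string {
  const order: SourceName[] = [
    "consolidation_patterns",
    "code_patterns",
    "memory_entries",
    "workflow_insights",
  ];
  const sections: string[] = [];
  for (const source of order) {
    const entries = sources[source];
    if (entries.length === 0) continue; // skip sources with nothing to contribute
    const shown = entries.slice(0, maxPerSource);
    const header = `${source} (showing ${shown.length} of ${entries.length}):`;
    sections.push([header, ...shown.map((entry) => `- ${entry}`)].join("\n"));
  }
  return sections.join("\n\n");
}
```

A real implementation would probably rank entries by relevance to the note before applying the cap, rather than simply taking the first few.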
Lessons from the Trenches: My "Pain Log" Transformed
Not every attempt sails smoothly. Here are some of the "pain points" from the session, reframed as valuable lessons learned:
Lesson 1: The Cardinal Rule of Deployment - git push Before git pull
The Problem: I deployed a batch of 6 commits via ssh root@... && git pull && docker compose build. Production showed old code, and git pull reported "up to date."
The Realization: My local commits were never pushed to GitHub! The deploy script pulls from the remote repository, not my local changes.
The Takeaway: Always, always git push origin main BEFORE initiating a remote git pull-based deployment. This sounds obvious, but in the heat of the moment, it's easy to overlook. I've now added git push to my commit+deploy one-liners to prevent future mishaps.
Lesson 2: Prisma's Nuances - When to Go Raw
The Problem: As mentioned earlier, prisma.projectSync.updateMany() failed to set relation fields (previousSyncId) to null.
The Realization: While Prisma is fantastic for most ORM operations, it has limitations, especially with batch updates involving relations or complex SQL logic.
The Takeaway: Don't be afraid to use $executeRaw for raw SQL queries when the ORM abstraction becomes a hindrance. It's a powerful escape hatch that allows you to leverage the full capabilities of your database directly.
Lesson 3: Systemic Bugs & Grep's Power
The Problem: What started as "project not found" errors on imports quickly escalated to "project not found" errors on all mutations.
The Realization: A seemingly isolated bug was actually a systemic architectural flaw (the userId filter everywhere).
The Takeaway: When a bug appears in multiple places or for multiple actions, it's a strong indicator of a deeper, more fundamental issue. Don't patch symptoms; hunt for the root cause. Tools like grep (or your IDE's global search) are invaluable for quickly identifying all instances of a problematic pattern and ensuring a comprehensive fix.
What's Next? Maintaining Momentum
With these challenges tackled, we're now poised for some significant advancements:
- Critical: Expand Note Enrichment: Wiring code_patterns, workflow_insights, and memory_entries into src/server/services/note-enrichment.ts is the immediate top priority. This will unlock a huge amount of latent project intelligence.
- ProviderModelPicker Rollout: Migrating the remaining old provider/model UIs in docs-pipeline, refactor, and discussions to our new ProviderModelPicker component. This will standardize our model selection UX.
- Cleanup: Removing deprecated components (src/components/discussion/provider-picker.tsx) and server procedures (discussions.availableProviders).
- API Verification: Keeping an eye on Anthropic API access once the pending payment clears.
This session was a testament to the dynamic nature of software development – a blend of meticulous bug-fixing, strategic feature expansion, and continuous learning. Every line of code, every bug squashed, and every lesson learned strengthens the foundation for what's next.