Project Sync: Phase 1 — From Blueprint to Production (and a Near-Disaster Recovery)
We just wrapped up Phase 1 of our ambitious Project Sync feature, bringing branch selection and real-time progress to our users. Join us for a deep dive into the implementation, the triumphs, and a critical lesson learned about production database migrations.
It’s been an intense but incredibly rewarding sprint! We've successfully launched Phase 1 of our Project Sync feature, a crucial step towards giving users granular control over how their codebase is ingested and managed within our platform. The initial feedback has been fantastic, and we're already gearing up for Phases 2 and 3.
This post isn't just a celebration of what we built, but also a candid look at the challenges we faced, particularly a hair-raising moment with our production database. Let's dive in!
Project Sync Phase 1: What We Built
The core idea behind Project Sync is to allow users to select a specific branch from their GitHub repository and initiate a "smart" synchronization process. This isn't just a blind import; it's designed to be diff-aware, only processing what's changed and providing real-time feedback.
Here's a breakdown of the key components we shipped in Phase 1:
1. The Data Foundation
At the heart of any new feature is the data model. We extended our prisma/schema.prisma to introduce a new ProjectSync model, which tracks the history and status of each sync operation. Crucially, we also extended existing models like MemoryEntry, RepositoryFile, and Repository with sync-related fields, enabling us to link specific data back to its sync origin and manage superseded entries effectively.
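To give a flavor of the shape (the field names below are illustrative guesses, not the actual schema), the sync-tracking additions to an existing model look roughly like:

```prisma
// Illustrative only — the real prisma/schema.prisma fields may differ.
model MemoryEntry {
  id           String  @id @default(cuid())
  status       String  @default("active") // "active" | "superseded"
  originSyncId String?                    // links the entry back to the sync that produced it
  // ... existing fields
}
```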
2. GitHub Integration & The Sync Engine
To enable branch selection, we beefed up our src/server/services/github-connector.ts. New functions like fetchBranches(), fetchBranchHead(), and fetchRepoTreeWithSha() were essential for interacting with GitHub's API to get the necessary repository metadata.
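In spirit, a branch-listing helper boils down to one call against GitHub's REST API. Here's a hedged sketch (the endpoint `GET /repos/{owner}/{repo}/branches` is GitHub's documented API; the types, token handling, and helper names are assumptions, not our actual github-connector.ts):

```typescript
// Sketch of a fetchBranches-style helper against GitHub's REST API.
// GET /repos/{owner}/{repo}/branches is the documented endpoint;
// everything else here is illustrative.
interface BranchInfo {
  name: string;
  headSha: string;
}

// Pure helper so URL construction is easy to test in isolation.
function branchesUrl(owner: string, repo: string): string {
  return `https://api.github.com/repos/${owner}/${repo}/branches?per_page=100`;
}

async function fetchBranches(
  owner: string,
  repo: string,
  token: string,
): Promise<BranchInfo[]> {
  const res = await fetch(branchesUrl(owner, repo), {
    headers: {
      Authorization: `Bearer ${token}`,
      Accept: "application/vnd.github+json",
    },
  });
  if (!res.ok) throw new Error(`GitHub API error: ${res.status}`);
  const body = (await res.json()) as Array<{ name: string; commit: { sha: string } }>;
  return body.map((b) => ({ name: b.name, headSha: b.commit.sha }));
}
```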
The real magic happens in src/server/services/project-sync-service.ts. This is where we implemented a full AsyncGenerator pipeline:
- Prepare: Initial setup and validation.
- Scan: Traverse the repository tree to identify changes.
- Import: Process new or modified files.
- Finalize: Clean up and mark the sync as complete.
This pipeline is inherently diff-aware, meaning it intelligently compares the current repository state with the last successful sync, preventing redundant processing and ensuring efficiency.
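The four-phase flow above can be sketched as an AsyncGenerator that yields a progress event as it moves through each phase. The phase names match the list; the event shape and payloads are assumptions, not the actual project-sync-service.ts:

```typescript
// Minimal sketch of a phase-based sync pipeline as an AsyncGenerator.
type SyncPhase = "prepare" | "scan" | "import" | "finalize";

interface SyncEvent {
  phase: SyncPhase;
  message: string;
}

async function* runSyncPipeline(changedFiles: string[]): AsyncGenerator<SyncEvent> {
  yield { phase: "prepare", message: "Validating repository access" };
  yield { phase: "scan", message: `Found ${changedFiles.length} changed file(s)` };
  for (const file of changedFiles) {
    // Only changed files reach this phase — that's the diff-aware part.
    yield { phase: "import", message: `Imported ${file}` };
  }
  yield { phase: "finalize", message: "Sync complete" };
}
```

A consumer (such as an SSE route) can simply `for await` over the generator and forward each event as it arrives.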
3. Real-time Feedback & User Interface
A complex, potentially long-running operation like a project sync demands real-time feedback. We built an SSE (Server-Sent Events) endpoint at src/app/api/v1/events/project-sync/[syncId]/route.ts to stream progress updates directly to the client.
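The SSE wire format itself is pleasantly simple: each event is a set of `field: value` lines terminated by a blank line. A small formatting helper captures it (the `progress` event name below is an assumption about our payloads, not the actual route code):

```typescript
// Formats one Server-Sent Events frame per the SSE wire format:
// an "event:" line, a "data:" line, terminated by a blank line.
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}
```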
On the frontend, src/hooks/use-project-sync.ts consumes these SSE events, powering a dynamic sync-banner.tsx that displays phase indicators, a progress bar, and sync statistics. Users can initiate a sync and select their desired branch through src/components/project/sync-controls.tsx, seamlessly integrated into src/components/project/project-overview.tsx.
4. API & Data Integrity
Our tRPC router (src/server/trpc/routers/projects.ts) now includes a dedicated sync sub-router, exposing endpoints for fetching branches, checking sync status, starting new syncs, viewing history, and even restoring memory.
Finally, to protect against "ghost" or superseded entries, we updated 9 files to include a status: "active" filter, ensuring that only the most relevant data is presented to the user.
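Conceptually, the active-only rule is a one-line Prisma `where: { status: "active" }` clause; shown here as a pure function so the rule is easy to see in isolation (the field names are illustrative):

```typescript
// Sketch of the superseded-entry filtering. Field names are illustrative.
interface Entry {
  id: string;
  status: "active" | "superseded";
}

function activeOnly<T extends Entry>(entries: T[]): T[] {
  return entries.filter((e) => e.status === "active");
}
```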
Phase 1 is now live on production, with the schema applied, the app rebuilt, and all existing embeddings thankfully intact. But getting there wasn't without its dramatic moments...
Navigating the Treacherous Waters: Lessons Learned
Every significant feature development comes with its share of challenges. For Project Sync, one particular hurdle stood out, teaching us a critical lesson about production database management.
Lesson 1: Never, Ever prisma db push on Production (Seriously!)
The Problem: We needed to apply new schema changes to our production PostgreSQL database. My initial thought was to use prisma db push --accept-data-loss for a quick application, assuming Prisma would intelligently handle existing data where possible.
The Disaster: This was a grave error. prisma db push, especially with --accept-data-loss, is designed for development environments. It dropped the embedding vector(1536) column on our workflow_insights table – taking all 382 production embeddings with it! This was a moment of sheer panic.
The Recovery:
- Immediate action: We recreated the embedding column using raw SQL: ALTER TABLE workflow_insights ADD COLUMN embedding vector(1536);
- Data restoration: We triggered our src/app/api/v1/admin/backfill-embeddings/route.ts endpoint, which thankfully restored all 382 lost embeddings.
- Schema application: The remaining schema changes were then applied individually via raw SQL statements, ensuring no further data loss.
The Takeaway: prisma db push is a development tool. For production, always use carefully crafted manual SQL migrations or a robust, tested migration script like our existing ./scripts/db-migrate-safe.sh. This script needs to be updated to protect these new sync-related columns as well. This was a stark reminder of the importance of disciplined database deployment processes.
Lesson 2: Escaping SQL in Nested SSH/Docker Contexts
The Problem: After the prisma db push incident, I needed to run raw SQL commands on the production database, which lives inside a Docker container on a remote server. My initial attempt pushed heredoc SQL with escaped quotes through SSH, along the lines of ssh root@... "docker exec -i ... psql <<EOF ... EOF".
The Failure: The nested SSH and docker exec context proved problematic for quote escaping. The SQL parser inside the Docker container received malformed statements.
The Solution: Simplicity won. Instead of complex heredocs, I ran individual docker exec psql -c "..." commands for each SQL statement. This ensured proper parsing and execution.
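The working pattern looks roughly like this (the host, container, and credentials below are placeholders, and echo stands in for the real invocation so the sketch only prints the command instead of executing it):

```shell
# One docker exec per statement keeps quoting to a single nesting level.
# Host, container, and credentials are placeholders, not our real setup.
run_sql() {
  # echo instead of a live ssh call, so this sketch is safe to run anywhere
  echo ssh root@prod-host "docker exec -i app-db psql -U postgres -d app -c \"$1\""
}
run_sql 'ALTER TABLE workflow_insights ADD COLUMN embedding vector(1536);'
```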
Lesson 3: tRPC Context Typing - ctx.user.id vs ctx.userId
The Problem: Inside a tRPC router, I instinctively tried to access the user ID via ctx.userId. TypeScript immediately flagged this as an error, indicating the property didn't exist on the context.
The Solution: A quick check revealed the correct path: ctx.user.id. A common pattern when user authentication middleware enriches the context with a user object.
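In shape, the enriched context looks something like this (a simplified sketch, not our actual middleware or tRPC setup):

```typescript
// Simplified sketch of a tRPC context enriched by auth middleware.
// There is no flat `userId` property — the middleware attaches a
// `user` object, so the id lives at ctx.user.id.
interface AuthedContext {
  user: { id: string; email: string };
}

function getUserId(ctx: AuthedContext): string {
  return ctx.user.id; // ctx.userId would be a type error
}
```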
Lesson 4: Prisma Self-Relation with @unique
The Problem: When defining the ProjectSync model, I included a previousSyncId field to link syncs into a chain. Prisma's validation failed: in a one-to-one relation, including a self-relation, the scalar foreign-key field must be marked @unique.
The Solution: Adding @unique to the previousSyncId field resolved the validation error, allowing us to correctly model the relationship.
model ProjectSync {
  // ... other fields
  previousSyncId String?      @unique
  previousSync   ProjectSync? @relation("SyncHistory", fields: [previousSyncId], references: [id])
  // A @unique FK makes this one-to-one, so the back-relation must be a
  // single optional record, not a list.
  nextSync       ProjectSync? @relation("SyncHistory")
}
What's Next? Phases 2 & 3!
With Phase 1 successfully deployed and our lessons learned, we're already looking forward to expanding Project Sync's capabilities:
- Phase 2: Code Analysis & Docs Regeneration: Extend the sync pipeline to include steps for deeper code analysis and automatic documentation regeneration based on the synced codebase.
- Phase 3: Consolidation, Axiom & Embedding Refresh: Further enhance the pipeline with consolidation steps, integration with our "Axiom" feature, and intelligent embedding refreshes to keep our knowledge base up-to-date.
- Security: Implement RLS (Row-Level Security) policies for the new project_syncs table to ensure data privacy and access control.
- Testing: Conduct thorough end-to-end testing of the entire sync feature on a real-world GitHub project.
- Database Safety: Revisit and update our ./scripts/db-migrate-safe.sh to explicitly protect the new sync-related columns during future migrations.
This journey has been a fantastic blend of technical innovation and practical problem-solving. We're excited to continue building out Project Sync and deliver even more value to our users!