Deep Dive: Building a Project Sync Engine, One Task at a Time (Part 1)
Ever wondered what it takes to build a robust system that keeps your internal 'memory' in perfect sync with external code repositories? Join me as I recount the first critical steps, from schema design to the core async pipeline, and the inevitable database hiccups along the way.
Building complex features is rarely a straight line. It's a series of focused sprints, unexpected detours, and those 'aha!' moments that make it all worthwhile. Today, I want to share a recent development session where I dove headfirst into implementing a critical new feature: Project Sync.
Our goal with Project Sync is ambitious: to seamlessly integrate our internal knowledge base (what we call 'memory entries') with the actual code in our repositories. This means intelligently fetching code, tracking changes, and ensuring our system's understanding of a project is always up-to-date. This session marked the beginning of Phase 1, tackling the first 13 tasks from our design document.
Here's a look at how the session unfolded, what we accomplished, and the valuable lessons learned.
Laying the Foundation: Schema and External Connectivity
The initial tasks focused on establishing the bedrock for our sync mechanism. You can't sync what you can't model, nor can you sync what you can't fetch.
Task 1: The Data Model – Extending Our Schema
The very first step was to define how Project Sync would live within our database. Using Prisma, I updated prisma/schema.prisma to introduce the ProjectSync model. This model will track each synchronization event, its status, and crucial metadata.
Crucially, I also extended existing models like MemoryEntry, RepositoryFile, and Repository with new fields to link them to our sync process. This ensures that every piece of data knows which sync operation brought it into existence or last updated it.
// Simplified snippet from prisma/schema.prisma
model ProjectSync {
  id           String     @id @default(cuid())
  repositoryId String
  repository   Repository @relation(fields: [repositoryId], references: [id])
  branch       String
  status       SyncStatus @default(PENDING)
  startedAt    DateTime   @default(now())
  finishedAt   DateTime?
  error        String?
  // ... other fields to track sync progress and details

  // Self-relation to track previous syncs for diffing
  previousSyncId String?      @unique // This became important later!
  previousSync   ProjectSync? @relation("ProjectSyncHistory", fields: [previousSyncId], references: [id])
  nextSync       ProjectSync? @relation("ProjectSyncHistory")

  // Link to files and memory entries created/updated by this sync
  repositoryFiles RepositoryFile[]
  memoryEntries   MemoryEntry[]
}
// ... other models extended with sync-related fields
// ... other models extended with sync-related fields
Tasks 2 & 3: Connecting to GitHub – The Source of Truth
With the data model in place, the next logical step was to build the bridge to our code repositories. For us, that means GitHub. I enhanced src/server/services/github-connector.ts with three vital functions:
- fetchBranches(): Lists all available branches for a given repository. Essential for users to select which branch they want to sync.
- fetchBranchHead(): Given a branch name, retrieves its latest commit SHA. This is crucial for knowing what state we're syncing against.
- fetchRepoTreeWithSha(): This is where the magic starts. It fetches the entire file tree for a repository at a specific SHA, returning a list of TreeEntry[] (path, SHA, size). This gives us a granular view of every file and its unique identifier, enabling efficient change detection later.
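To make the SHA-based change detection concrete, here is a minimal sketch of how those tree entries can be diffed against previously stored state. The TreeEntry shape mirrors what the connector returns; diffTrees is a hypothetical helper, not the actual service code. Because Git blob SHAs are content hashes, a matching SHA means the file is byte-identical and can be skipped.

```typescript
// Shape of one entry from a tree fetch: path, blob SHA, and size.
interface TreeEntry {
  path: string;
  sha: string;
  size: number;
}

interface TreeDiff {
  added: TreeEntry[];     // paths not present in the previous sync
  modified: TreeEntry[];  // same path, different blob SHA
  deleted: string[];      // paths that disappeared since the previous sync
  unchanged: TreeEntry[]; // same path, same SHA -- safe to skip entirely
}

// Hypothetical helper: compare stored state (path -> sha) against a
// freshly fetched tree and bucket every file by what happened to it.
function diffTrees(previous: Map<string, string>, current: TreeEntry[]): TreeDiff {
  const diff: TreeDiff = { added: [], modified: [], deleted: [], unchanged: [] };
  const seen = new Set<string>();
  for (const entry of current) {
    seen.add(entry.path);
    const prevSha = previous.get(entry.path);
    if (prevSha === undefined) diff.added.push(entry);
    else if (prevSha !== entry.sha) diff.modified.push(entry);
    else diff.unchanged.push(entry);
  }
  for (const path of previous.keys()) {
    if (!seen.has(path)) diff.deleted.push(path);
  }
  return diff;
}
```

The scan stage described below builds on exactly this kind of comparison: only added and modified entries need their content fetched.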
The Engine Room: Crafting the Project Sync Service
Task 4: The Core Logic – project-sync-service.ts
This was the most substantial part of the initial sprint. I created src/server/services/project-sync-service.ts, which now houses the full asynchronous pipeline for Project Sync. This isn't just a simple function; it's a robust AsyncGenerator that orchestrates the entire sync process in distinct, observable stages:
- prepare: Initializes the sync, fetches necessary metadata, and sets up the environment.
- scan: Compares the fetched GitHub tree with our existing RepositoryFile entries. It identifies new files, modified files, deleted files, and unchanged files. This is where the TreeEntry SHAs become incredibly useful for efficient diffing.
- import: For new or modified files, it fetches their content, processes them, and creates/updates RepositoryFile and MemoryEntry records in our database.
- finalize: Cleans up, updates the ProjectSync record with the final status, and handles any post-sync actions.
The use of an AsyncGenerator here is key. It allows us to stream updates about the sync's progress back to the client in real-time, providing a much better user experience than a fire-and-forget background job.
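As a rough illustration of that shape (not the real service code, whose stages hit GitHub and the database), the pipeline boils down to an async generator where each yield is a progress event a caller can forward to the client as it arrives:

```typescript
// Stages of the sync pipeline; each stage yields at least one event.
type SyncStage = "prepare" | "scan" | "import" | "finalize";

interface SyncEvent {
  stage: SyncStage;
  message: string;
}

// Minimal sketch of an AsyncGenerator-based pipeline. In the real service
// each stage does actual work between yields; here each stage just reports
// what it would do, to show the streaming control flow.
async function* runProjectSync(branch: string): AsyncGenerator<SyncEvent> {
  yield { stage: "prepare", message: `preparing sync for ${branch}` };
  yield { stage: "scan", message: "comparing tree SHAs against stored files" };
  yield { stage: "import", message: "importing new and modified files" };
  yield { stage: "finalize", message: "recording final sync status" };
}

// A consumer drains the generator with for-await, forwarding each event
// (e.g. over SSE) the moment it is produced.
async function collectStages(branch: string): Promise<SyncStage[]> {
  const stages: SyncStage[] = [];
  for await (const event of runProjectSync(branch)) {
    stages.push(event.stage);
  }
  return stages;
}
```

The nice property of this design is that cancellation and errors propagate naturally: throwing inside a stage ends the generator, and the consumer's loop simply stops.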
Navigating the Trenches: A Prisma Self-Relation Gotcha
Not everything was smooth sailing, and that's often where the best lessons are learned.
The Problem: Prisma Self-Relation Without @unique
I designed the ProjectSync model with a self-referencing relation (previousSyncId) to link a sync operation to its predecessor. This is vital for calculating diffs and understanding the evolution of a project's state over time.
// My initial (problematic) schema design for the relation:
model ProjectSync {
  // ...
  previousSyncId String?
  previousSync   ProjectSync? @relation("ProjectSyncHistory", fields: [previousSyncId], references: [id])
  nextSync       ProjectSync? @relation("ProjectSyncHistory")
}
My intention was for previousSyncId to be a foreign key that could be null (for the very first sync) and point to another ProjectSync record. However, when I tried to generate a migration or push the schema, Prisma threw a validation error: "A one-to-one relation needs unique fields."
The Lesson: Understanding Prisma's Relation Constraints
Prisma requires that the relation scalar on a one-to-one relation (in this case, previousSyncId on the ProjectSync model itself) be unique. Without @unique, nothing would stop multiple ProjectSync records from claiming the same predecessor, which would silently turn the relation into a many-to-one. My mental model was slightly off: I had pictured previousSyncId as an ordinary nullable foreign key, but a linear sync history demands that each record be the previousSync of at most one other record, and @unique is what enforces that.
The Fix: Adding @unique
The solution was straightforward but critical: add @unique to previousSyncId.
// The corrected schema snippet:
model ProjectSync {
  // ...
  previousSyncId String?      @unique // The fix!
  previousSync   ProjectSync? @relation("ProjectSyncHistory", fields: [previousSyncId], references: [id])
  nextSync       ProjectSync? @relation("ProjectSyncHistory")
}
This ensures that each ProjectSync record can only point to one previousSync, and crucially, only one ProjectSync record can claim a specific ProjectSync as its previousSync. It enforces the strict chronological chain we need.
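With previousSyncId unique, sync records form a singly linked list. A small hypothetical helper (the field names match the schema, but this traversal code is illustrative, not from the service) shows how that chain can be walked back from the latest sync to reconstruct history:

```typescript
// Minimal shape of a sync record for chain traversal; the real model
// carries many more fields (status, timestamps, relations).
interface SyncRecord {
  id: string;
  previousSyncId: string | null; // @unique in the schema: at most one successor
}

// Walk the chain from the newest sync back to the first one (whose
// previousSyncId is null), returning ids in reverse-chronological order.
function syncHistory(latest: SyncRecord, byId: Map<string, SyncRecord>): string[] {
  const chain: string[] = [];
  let current: SyncRecord | undefined = latest;
  while (current) {
    chain.push(current.id);
    current = current.previousSyncId ? byId.get(current.previousSyncId) : undefined;
  }
  return chain;
}
```

The uniqueness constraint is what guarantees this walk never branches: every record has at most one successor, so the history is always a straight line.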
Operational Notes for Prisma
- Schema not pushed yet: This schema change is still local. It requires npx prisma@5.22.0 db push for local development (which I'll do on my fresh Docker setup) and a carefully managed safe migration for production. This is a reminder that schema changes, especially on production, need respect and a plan.
- Local .env: A minor but common friction point: my local node_modules were installed, but I forgot to set up my .env file. For Prisma commands, I often just prefix them with DATABASE_URL=... to quickly get things done without a full .env setup when I'm just poking around.
The Road Ahead: What's Next for Project Sync
With the core backend logic and data modeling largely in place, the immediate next steps are to expose this functionality and integrate it into our application's frontend.
- Task 5: SSE endpoint (/api/v1/events/project-sync/[syncId]/route.ts): This will be the conduit for streaming those AsyncGenerator updates to the frontend, giving users real-time feedback on their sync operations.
- Task 6: tRPC sync sub-router: Building out the API endpoints for initiating syncs, checking status, viewing history, and potentially restoring memory from previous syncs.
- Tasks 7-10: Frontend integration: Developing the useProjectSync hook, SyncBanner and SyncControls components, and integrating them into the Project Overview page. This is where the user experience truly comes alive.
- Tasks 11-13: Refinements and deployment: Filtering superseded entries from active queries, thorough type-checking and build verification, and finally, the production deployment, complete with safe migrations.
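The SSE bridge in Task 5 is mostly a matter of encoding each yielded event in the SSE wire format and piping it into a streamed response. A hypothetical sketch (assuming a runtime with the web-standard Response, ReadableStream, and TextEncoder, as in Node 18+ and Next.js route handlers; none of this is the actual route code):

```typescript
// Encode one sync progress event as a Server-Sent Events message.
// The SSE wire format is "field: value" lines terminated by a blank line;
// the browser's EventSource surfaces the "data" payload to listeners.
function toSseMessage(event: { stage: string; message: string }): string {
  return `event: sync-progress\ndata: ${JSON.stringify(event)}\n\n`;
}

// Hypothetical route-handler shape: drain an async iterable of sync events
// into a ReadableStream served as text/event-stream.
function sseResponse(events: AsyncIterable<{ stage: string; message: string }>): Response {
  const encoder = new TextEncoder();
  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      for await (const event of events) {
        controller.enqueue(encoder.encode(toSseMessage(event)));
      }
      controller.close(); // generator finished: end the event stream
    },
  });
  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
  });
}
```

This is the payoff of the AsyncGenerator design: the route handler doesn't need to know anything about sync stages, it just forwards whatever the pipeline yields.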
Wrapping Up
This session was a significant leap forward for Project Sync. We've laid down the crucial database schema, built the GitHub integration, and—most importantly—crafted the core asynchronous pipeline that will power the entire feature. The Prisma self-relation hiccup served as a valuable reminder to always understand the nuances of your ORM's constraints.
I'm incredibly excited about the next steps, particularly bringing the real-time feedback to life with SSE and the frontend components. Stay tuned for more updates as Project Sync evolves!