Building Project Sync: Our First Sprint Towards Smarter Repos
Join us as we recount the initial sprint for Project Sync, tackling schema design, GitHub API integration, and the first iteration of our powerful new sync service, complete with a few unexpected Prisma challenges.
Development is rarely a straight line. It's a journey of discovery, problem-solving, and sometimes, wrestling with database constraints. Today, we're pulling back the curtain on the initial phase of one of our most anticipated features: Project Sync. Our goal is to empower developers with an always up-to-date understanding of their codebase, directly integrated into their workflow.
This feature is a big one, broken down into 13 distinct tasks for Phase 1, focusing on core synchronization logic and branch selection. We've just wrapped up the first four crucial tasks, laying down the foundational backend architecture. Let's dive into what we've accomplished and the lessons we've learned along the way.
Laying the Data Groundwork: Schema and GitHub Integration
Any robust feature starts with a solid data model. For Project Sync, this meant a significant update to our prisma/schema.prisma.
Task 1: Evolving Our Data Model for Sync
We introduced the ProjectSync model, which will track each synchronization event for a given project. This model is critical for understanding the history and state of a project's synced data. But it wasn't just about adding a new table; we also extended existing models like MemoryEntry, RepositoryFile, and Repository with new sync-related fields. These fields allow us to link individual files and memory entries back to specific sync operations, enabling powerful historical tracking and diffing capabilities.
model ProjectSync {
  id            String    @id @default(cuid())
  projectId     String
  repositoryId  String
  branchName    String
  commitSha     String
  syncStartTime DateTime  @default(now())
  syncEndTime   DateTime?
  status        String    @default("PENDING") // e.g., PENDING, IN_PROGRESS, COMPLETED, FAILED
  errorMessage  String?
  // ... other fields for sync metrics

  // Self-relation to track the previous sync in a chain
  previousSyncId String?      @unique // This @unique was a critical discovery!
  previousSync   ProjectSync? @relation("ProjectSyncChain", fields: [previousSyncId], references: [id])
  nextSync       ProjectSync? @relation("ProjectSyncChain")

  project         Project          @relation(fields: [projectId], references: [id])
  repository      Repository       @relation(fields: [repositoryId], references: [id])
  memoryEntries   MemoryEntry[]
  repositoryFiles RepositoryFile[]
}
// ... existing models extended with sync-related fields
Tasks 2 & 3: Connecting to the GitHub API
To sync a project, we first need to know what's in the project. This meant enhancing our src/server/services/github-connector.ts to interact more deeply with the GitHub API:
- fetchBranches(): Retrieves all available branches for a given repository. This is crucial for the "branch selection" aspect of Phase 1.
- fetchBranchHead(): Once a branch is selected, we need its latest commit SHA. This function fetches that specific detail.
- fetchRepoTreeWithSha(): The real heavy lifting for content discovery happens here. This function takes a commit SHA and returns a flat list of TreeEntry[], each containing the file path, its unique sha (content hash), and size. This gives us a complete manifest of the repository's files at a specific commit, without having to clone the entire repo.
These GitHub API integrations are the eyes and ears of our sync service, providing the raw data needed to understand repository state.
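To make the manifest idea concrete, here is a minimal sketch of how a function like fetchRepoTreeWithSha() could be built on GitHub's git/trees endpoint. The TreeEntry shape and the toTreeEntries helper are illustrative assumptions, not the actual connector code:

```typescript
// Shape of one file in the repository manifest (illustrative).
interface TreeEntry {
  path: string;
  sha: string;  // content hash of the blob
  size: number; // bytes
}

// Raw item as returned by GitHub's GET /repos/{owner}/{repo}/git/trees/{sha}?recursive=1
interface GitTreeItem {
  path: string;
  sha: string;
  size?: number; // present for blobs only
  type: "blob" | "tree" | "commit";
}

// Pure helper: keep only file blobs and normalize them into TreeEntry[].
function toTreeEntries(items: GitTreeItem[]): TreeEntry[] {
  return items
    .filter((item) => item.type === "blob")
    .map((item) => ({ path: item.path, sha: item.sha, size: item.size ?? 0 }));
}

// Sketch of the fetch itself (requires a token with read access to the repo).
async function fetchRepoTreeWithSha(
  owner: string,
  repo: string,
  commitSha: string,
  token: string
): Promise<TreeEntry[]> {
  const res = await fetch(
    `https://api.github.com/repos/${owner}/${repo}/git/trees/${commitSha}?recursive=1`,
    { headers: { Authorization: `Bearer ${token}`, Accept: "application/vnd.github+json" } }
  );
  if (!res.ok) throw new Error(`GitHub tree fetch failed: ${res.status}`);
  const body = (await res.json()) as { tree: GitTreeItem[]; truncated: boolean };
  // GitHub truncates very large trees; a real implementation would need a fallback.
  if (body.truncated) throw new Error("Tree too large; response truncated");
  return toTreeEntries(body.tree);
}
```

Note that the endpoint returns directories ("tree" entries) alongside files, which is why the blob filter matters before building the manifest.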
The Brains of the Operation: Our Sync Service
Task 4: Introducing project-sync-service.ts
With the data model in place and GitHub connectivity established, we built the core logic for the synchronization process in src/server/services/project-sync-service.ts. This service is designed as a full AsyncGenerator pipeline, enabling efficient, stream-based processing of repository files.
The pipeline comprises four key stages:
- prepare: Initializes the sync operation, fetches initial repository metadata, and sets up the ProjectSync record.
- scan: Uses fetchRepoTreeWithSha() to get the current state of the repository's files, then performs a diff-aware comparison against the previous successful sync (if one exists). This is where the magic happens: we identify new, modified, or deleted files, preventing unnecessary re-processing of unchanged content.
- import: For new or modified files identified in the scan phase, this stage fetches their content, processes them (e.g., extracts code, generates embeddings), and updates MemoryEntry and RepositoryFile records.
- finalize: Cleans up temporary resources, updates the ProjectSync record with its final status, and handles any post-sync operations.
This diff-aware approach is critical for performance and scalability, ensuring our system only processes what's changed, rather than re-indexing entire repositories on every sync.
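The four stages above can be sketched as a single AsyncGenerator. The event types, diffManifests helper, and runSync function here are simplified placeholders to show the shape of the pipeline, not the actual service code:

```typescript
// Simplified sync event emitted by each stage (illustrative).
type SyncEvent =
  | { stage: "prepare"; syncId: string }
  | { stage: "scan"; added: string[]; modified: string[]; deleted: string[] }
  | { stage: "import"; path: string }
  | { stage: "finalize"; status: "COMPLETED" | "FAILED" };

interface FileEntry { path: string; sha: string }

// Diff-aware comparison: compare the current manifest against the previous sync's.
function diffManifests(previous: FileEntry[], current: FileEntry[]) {
  const prev = new Map(previous.map((f) => [f.path, f.sha]));
  const curr = new Map(current.map((f) => [f.path, f.sha]));
  const added = [...curr.keys()].filter((p) => !prev.has(p));
  const modified = [...curr.keys()].filter((p) => prev.has(p) && prev.get(p) !== curr.get(p));
  const deleted = [...prev.keys()].filter((p) => !curr.has(p));
  return { added, modified, deleted };
}

// The four stages as one AsyncGenerator pipeline.
async function* runSync(
  syncId: string,
  previous: FileEntry[],
  current: FileEntry[]
): AsyncGenerator<SyncEvent> {
  yield { stage: "prepare", syncId };
  const diff = diffManifests(previous, current);
  yield { stage: "scan", ...diff };
  // Only new or modified files are re-imported; unchanged files are skipped.
  for (const path of [...diff.added, ...diff.modified]) {
    yield { stage: "import", path };
  }
  yield { stage: "finalize", status: "COMPLETED" };
}
```

A consumer (such as an SSE endpoint) can simply `for await` over the generator to stream progress as each stage yields.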
A Quick Win: Backfill Success
Before diving into Project Sync, we also deployed a backfill endpoint that successfully restored 382 embeddings on our production system. This was a valuable test of our underlying data processing capabilities and gave us confidence in the robustness of our embedding generation pipeline.
Lessons Learned: Navigating Prisma's Unique Constraints
Even with careful planning, development throws curveballs. Our biggest "aha!" moment came during the schema design for the ProjectSync model's self-relation.
The Prisma Self-Relation Gotcha
We initially defined ProjectSync.previousSyncId intending for it to be a foreign key pointing to an earlier ProjectSync record, forming a chain of sync operations. We thought a simple previousSyncId String? would suffice.
// Initial attempt (failed)
model ProjectSync {
  // ...
  previousSyncId String?
  previousSync   ProjectSync? @relation("ProjectSyncChain", fields: [previousSyncId], references: [id])
  nextSync       ProjectSync? @relation("ProjectSyncChain")
}
However, Prisma's validation threw an error: "Error: A one-to-one relation must use unique fields on both sides."
This was a critical reminder of how Prisma interprets relations. For a one-to-one relationship (which a self-referencing previousSyncId implies, as each sync can only have one previous sync), the foreign key must also be unique. If previousSyncId wasn't unique, multiple ProjectSync records could point to the same previous sync, effectively making it a one-to-many relationship from the perspective of the previousSync record.
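To see why the constraint matters, consider what a non-unique foreign key actually permits. This is a plain in-memory illustration of the database rows, not Prisma code:

```typescript
// A minimal stand-in for a ProjectSync database row (illustrative).
interface SyncRow {
  id: string;
  previousSyncId: string | null;
}

// Without a UNIQUE constraint on previousSyncId, nothing stops two rows
// from claiming the same predecessor:
const rows: SyncRow[] = [
  { id: "sync-1", previousSyncId: null },
  { id: "sync-2", previousSyncId: "sync-1" },
  { id: "sync-3", previousSyncId: "sync-1" }, // also points at sync-1!
];

// From sync-1's perspective, "nextSync" is now ambiguous: the relation
// has silently become one-to-many, so a single optional nextSync field
// can no longer be well-defined.
const successors = rows.filter((r) => r.previousSyncId === "sync-1");
console.log(successors.length); // 2, not the single successor a chain implies
```

The @unique constraint pushes this invariant down into the database itself, so a broken chain becomes a constraint violation rather than silent ambiguity.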
The fix was straightforward but crucial: adding @unique to previousSyncId.
// Corrected schema
model ProjectSync {
  // ...
  previousSyncId String?      @unique // This is the fix!
  previousSync   ProjectSync? @relation("ProjectSyncChain", fields: [previousSyncId], references: [id])
  nextSync       ProjectSync? @relation("ProjectSyncChain")
}
This highlighted the importance of understanding Prisma's strict interpretation of relation types and the underlying database constraints they enforce. While the schema isn't pushed to production yet, this experience serves as a valuable reminder for future database migrations – always test locally with npx prisma@5.22.0 db push (remembering to prefix with DATABASE_URL=... if you don't have a .env file) and plan for safe migrations on production.
What's Next on the Horizon?
With the backend foundation firmly in place, our immediate next steps involve bringing this powerful sync functionality to life in the user interface:
- Task 5: Real-time Updates with SSE: An SSE endpoint (/api/v1/events/project-sync/[syncId]/route.ts) will provide real-time status updates as a sync operation progresses.
- Task 6: tRPC Sync Sub-Router: Building out our tRPC API to expose functionalities like fetching branches, initiating syncs, checking status, viewing history, and restoring memory.
- Tasks 7-10: Frontend Integration: Developing the useProjectSync hook, SyncBanner and SyncControls components, and integrating them directly into the Project Overview page.
- Task 11: Smarter Queries: Implementing logic to filter superseded memory entries from active queries, ensuring users always see the most relevant, up-to-date information.
- Tasks 12 & 13: Polish and Deploy: Comprehensive type checking, build verification, and finally, deploying Project Sync to production with a safe database migration.
This initial sprint has been incredibly productive, laying down the core backend infrastructure for a feature that promises to significantly enhance how developers interact with their codebases. We're excited to continue building and bring Project Sync to you soon!