Building Project Sync: Our First Sprint Towards Smarter Repos
Join us as we recount the initial sprint for Project Sync, tackling schema design, GitHub API integration, and the first iteration of our powerful new sync service, complete with a few unexpected Prisma challenges.
Development is rarely a straight line. It's a journey of discovery, problem-solving, and sometimes, wrestling with database constraints. Today, we're pulling back the curtain on the initial phase of one of our most anticipated features: Project Sync. Our goal is to empower developers with an always up-to-date understanding of their codebase, directly integrated into their workflow.
This feature is a big one, broken down into 13 distinct tasks for Phase 1, focusing on core synchronization logic and branch selection. We've just wrapped up the first four crucial tasks, laying down the foundational backend architecture. Let's dive into what we've accomplished and the lessons we've learned along the way.
Laying the Data Groundwork: Schema and GitHub Integration
Any robust feature starts with a solid data model. For Project Sync, this meant a significant update to our prisma/schema.prisma.
Task 1: Evolving Our Data Model for Sync
We introduced the ProjectSync model, which will track each synchronization event for a given project. This model is critical for understanding the history and state of a project's synced data. But it wasn't just about adding a new table; we also extended existing models like MemoryEntry, RepositoryFile, and Repository with new sync-related fields. These fields allow us to link individual files and memory entries back to specific sync operations, enabling powerful historical tracking and diffing capabilities.
model ProjectSync {
  id            String    @id @default(cuid())
  projectId     String
  repositoryId  String
  branchName    String
  commitSha     String
  syncStartTime DateTime  @default(now())
  syncEndTime   DateTime?
  status        String    @default("PENDING") // e.g., PENDING, IN_PROGRESS, COMPLETED, FAILED
  errorMessage  String?
  // ... other fields for sync metrics

  // Self-relation to track the previous sync in a chain
  previousSyncId String?      @unique // This @unique was a critical discovery!
  previousSync   ProjectSync? @relation("ProjectSyncChain", fields: [previousSyncId], references: [id])
  nextSync       ProjectSync? @relation("ProjectSyncChain")

  project         Project          @relation(fields: [projectId], references: [id])
  repository      Repository       @relation(fields: [repositoryId], references: [id])
  memoryEntries   MemoryEntry[]
  repositoryFiles RepositoryFile[]
}
// ... existing models extended with sync-related fields
Tasks 2 & 3: Connecting to the GitHub API
To sync a project, we first need to know what's in the project. This meant enhancing our src/server/services/github-connector.ts to interact more deeply with the GitHub API:
- fetchBranches(): Retrieves all available branches for a given repository. This is crucial for the "branch selection" aspect of Phase 1.
- fetchBranchHead(): Once a branch is selected, we need its latest commit SHA. This function fetches that specific detail.
- fetchRepoTreeWithSha(): The real heavy lifting for content discovery happens here. This function takes a commit SHA and returns a flat list of TreeEntry[], each containing the file path, its unique sha (content hash), and size. This gives us a complete manifest of the repository's files at a specific commit, without having to clone the entire repo.
These GitHub API integrations are the eyes and ears of our sync service, providing the raw data needed to understand repository state.
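To make the manifest idea concrete, here is a minimal sketch of how a function like fetchRepoTreeWithSha() could be built on GitHub's git/trees endpoint. The TreeEntry shape and the toTreeEntries helper are illustrative assumptions, not the actual connector code:

```typescript
// Shape of one file in the repository manifest (illustrative).
interface TreeEntry {
  path: string;
  sha: string;  // content hash of the blob
  size: number; // bytes
}

// Raw item as returned by GitHub's GET /repos/{owner}/{repo}/git/trees/{sha}?recursive=1
interface GitTreeItem {
  path: string;
  sha: string;
  size?: number; // present for blobs only
  type: "blob" | "tree" | "commit";
}

// Pure helper: keep only file blobs and normalize them into TreeEntry[].
function toTreeEntries(items: GitTreeItem[]): TreeEntry[] {
  return items
    .filter((item) => item.type === "blob")
    .map((item) => ({ path: item.path, sha: item.sha, size: item.size ?? 0 }));
}

// Sketch of the fetch itself (requires a token with read access to the repo).
async function fetchRepoTreeWithSha(
  owner: string,
  repo: string,
  commitSha: string,
  token: string
): Promise<TreeEntry[]> {
  const res = await fetch(
    `https://api.github.com/repos/${owner}/${repo}/git/trees/${commitSha}?recursive=1`,
    { headers: { Authorization: `Bearer ${token}`, Accept: "application/vnd.github+json" } }
  );
  if (!res.ok) throw new Error(`GitHub tree fetch failed: ${res.status}`);
  const body = (await res.json()) as { tree: GitTreeItem[]; truncated: boolean };
  // GitHub truncates very large trees; a real implementation would need a fallback.
  if (body.truncated) throw new Error("Tree too large; response truncated");
  return toTreeEntries(body.tree);
}
```

Note that the endpoint returns directories ("tree" entries) alongside files, which is why the blob filter matters before building the manifest.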
The Brains of the Operation: Our Sync Service
Task 4: Introducing project-sync-service.ts
With the data model in place and GitHub connectivity established, we built the core logic for the synchronization process in src/server/services/project-sync-service.ts. This service is designed as a full AsyncGenerator pipeline, enabling efficient, stream-based processing of repository files.
The pipeline comprises four key stages:
- prepare: Initializes the sync operation, fetches initial repository metadata, and sets up the ProjectSync record.
- scan: Uses fetchRepoTreeWithSha() to get the current state of the repository's files, then performs a diff-aware comparison against the previous successful sync (if one exists). This is where the magic happens: we identify new, modified, or deleted files, preventing unnecessary re-processing of unchanged content.
- import: For new or modified files identified in the scan phase, this stage fetches their content, processes them (e.g., extracts code, generates embeddings), and updates MemoryEntry and RepositoryFile records.
- finalize: Cleans up temporary resources, updates the ProjectSync record with its final status, and handles any post-sync operations.
This diff-aware approach is critical for performance and scalability, ensuring our system only processes what's changed, rather than re-indexing entire repositories on every sync.
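The four stages above can be sketched as a single AsyncGenerator. The event types, diffManifests helper, and runSync function here are simplified placeholders to show the shape of the pipeline, not the actual service code:

```typescript
// Simplified sync event emitted by each stage (illustrative).
type SyncEvent =
  | { stage: "prepare"; syncId: string }
  | { stage: "scan"; added: string[]; modified: string[]; deleted: string[] }
  | { stage: "import"; path: string }
  | { stage: "finalize"; status: "COMPLETED" | "FAILED" };

interface FileEntry { path: string; sha: string }

// Diff-aware comparison: compare the current manifest against the previous sync's.
function diffManifests(previous: FileEntry[], current: FileEntry[]) {
  const prev = new Map(previous.map((f) => [f.path, f.sha]));
  const curr = new Map(current.map((f) => [f.path, f.sha]));
  const added = [...curr.keys()].filter((p) => !prev.has(p));
  const modified = [...curr.keys()].filter((p) => prev.has(p) && prev.get(p) !== curr.get(p));
  const deleted = [...prev.keys()].filter((p) => !curr.has(p));
  return { added, modified, deleted };
}

// The four stages as one AsyncGenerator pipeline.
async function* runSync(
  syncId: string,
  previous: FileEntry[],
  current: FileEntry[]
): AsyncGenerator<SyncEvent> {
  yield { stage: "prepare", syncId };
  const diff = diffManifests(previous, current);
  yield { stage: "scan", ...diff };
  // Only new or modified files are re-imported; unchanged files are skipped.
  for (const path of [...diff.added, ...diff.modified]) {
    yield { stage: "import", path };
  }
  yield { stage: "finalize", status: "COMPLETED" };
}
```

A consumer (such as an SSE endpoint) can simply `for await` over the generator to stream progress as each stage yields.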
A Quick Win: Backfill Success
Before diving into Project Sync, we also deployed a backfill endpoint that successfully restored 382 embeddings on our production system. This was a valuable test of our underlying data processing capabilities and gave us confidence in the robustness of our embedding generation pipeline.
Lessons Learned: Navigating Prisma's Unique Constraints
Even with careful planning, development throws curveballs. Our biggest "aha!" moment came during the schema design for the ProjectSync model's self-relation.
The Prisma Self-Relation Gotcha
We initially defined ProjectSync.previousSyncId intending for it to be a foreign key pointing to an earlier ProjectSync record, forming a chain of sync operations. We thought a simple previousSyncId String? would suffice.
// Initial attempt (failed)
model ProjectSync {
  // ...
  previousSyncId String?
  previousSync   ProjectSync? @relation("ProjectSyncChain", fields: [previousSyncId], references: [id])
  nextSync       ProjectSync? @relation("ProjectSyncChain")
}
However, Prisma's validation threw an error: "Error: A one-to-one relation must use unique fields on both sides."
This was a critical reminder of how Prisma interprets relations. For a one-to-one relationship (which a self-referencing previousSyncId implies, as each sync can only have one previous sync), the foreign key must also be unique. If previousSyncId wasn't unique, multiple ProjectSync records could point to the same previous sync, effectively making it a one-to-many relationship from the perspective of the previousSync record.
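To see why the constraint matters, consider what a non-unique foreign key actually permits. This is a plain in-memory illustration of the database rows, not Prisma code:

```typescript
// A minimal stand-in for a ProjectSync database row (illustrative).
interface SyncRow {
  id: string;
  previousSyncId: string | null;
}

// Without a UNIQUE constraint on previousSyncId, nothing stops two rows
// from claiming the same predecessor:
const rows: SyncRow[] = [
  { id: "sync-1", previousSyncId: null },
  { id: "sync-2", previousSyncId: "sync-1" },
  { id: "sync-3", previousSyncId: "sync-1" }, // also points at sync-1!
];

// From sync-1's perspective, "nextSync" is now ambiguous: the relation
// has silently become one-to-many, so a single optional nextSync field
// can no longer be well-defined.
const successors = rows.filter((r) => r.previousSyncId === "sync-1");
console.log(successors.length); // 2, not the single successor a chain implies
```

The @unique constraint pushes this invariant down into the database itself, so a broken chain becomes a constraint violation rather than silent ambiguity.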
The fix was straightforward but crucial: adding @unique to previousSyncId.
// Corrected schema
model ProjectSync {
  // ...
  previousSyncId String?      @unique // This is the fix!
  previousSync   ProjectSync? @relation("ProjectSyncChain", fields: [previousSyncId], references: [id])
  nextSync       ProjectSync? @relation("ProjectSyncChain")
}
This highlighted the importance of understanding Prisma's strict interpretation of relation types and the underlying database constraints they enforce. While the schema isn't pushed to production yet, this experience serves as a valuable reminder for future database migrations – always test locally with npx prisma@5.22.0 db push (remembering to prefix with DATABASE_URL=... if you don't have a .env file) and plan for safe migrations on production.
What's Next on the Horizon?
With the backend foundation firmly in place, our immediate next steps involve bringing this powerful sync functionality to life in the user interface:
- Task 5: Real-time Updates with SSE: An SSE endpoint (/api/v1/events/project-sync/[syncId]/route.ts) will provide real-time status updates as a sync operation progresses.
- Task 6: tRPC Sync Sub-Router: Building out our tRPC API to expose functionalities like fetching branches, initiating syncs, checking status, viewing history, and restoring memory.
- Tasks 7-10: Frontend Integration: Developing the useProjectSync hook, SyncBanner and SyncControls components, and integrating them directly into the Project Overview page.
- Task 11: Smarter Queries: Implementing logic to filter superseded memory entries from active queries, ensuring users always see the most relevant, up-to-date information.
- Tasks 12 & 13: Polish and Deploy: Comprehensive type checking, build verification, and finally, deploying Project Sync to production with a safe database migration.
This initial sprint has been incredibly productive, laying down the core backend infrastructure for a feature that promises to significantly enhance how developers interact with their codebases. We're excited to continue building and bring Project Sync to you soon!