nyxcore-systems

Deep Dive: Building a Project Sync Engine, One Task at a Time (Part 1)

Ever wondered what it takes to build a robust system that keeps your internal 'memory' in perfect sync with external code repositories? Join me as I recount the first critical steps, from schema design to the core async pipeline, and the inevitable database hiccups along the way.

Prisma, TypeScript, GitHub API, System Design, Backend, Database, AsyncGenerator, DevLog

Building complex features is rarely a straight line. It's a series of focused sprints, unexpected detours, and those 'aha!' moments that make it all worthwhile. Today, I want to share a recent development session where I dove headfirst into implementing a critical new feature: Project Sync.

Our goal with Project Sync is ambitious: to seamlessly integrate our internal knowledge base (what we call 'memory entries') with the actual code in our repositories. This means intelligently fetching code, tracking changes, and ensuring our system's understanding of a project is always up-to-date. This session marked the beginning of Phase 1, which spans 13 tasks in our design document; the work described here covers the first four.

Here's a look at how the session unfolded, what we accomplished, and the valuable lessons learned.

Laying the Foundation: Schema and External Connectivity

The initial tasks focused on establishing the bedrock for our sync mechanism. You can't sync what you can't model, nor can you sync what you can't fetch.

Task 1: The Data Model – Extending Our Schema

The very first step was to define how Project Sync would live within our database. Using Prisma, I updated prisma/schema.prisma to introduce the ProjectSync model. This model will track each synchronization event, its status, and crucial metadata.

Crucially, I also extended existing models like MemoryEntry, RepositoryFile, and Repository with new fields to link them to our sync process. This ensures that every piece of data knows which sync operation brought it into existence or last updated it.

prisma
// Simplified snippet from prisma/schema.prisma
model ProjectSync {
  id               String      @id @default(cuid())
  repositoryId     String
  repository       Repository  @relation(fields: [repositoryId], references: [id])
  branch           String
  status           SyncStatus  @default(PENDING)
  startedAt        DateTime    @default(now())
  finishedAt       DateTime?
  error            String?
  // ... other fields to track sync progress and details

  // Self-relation to track previous syncs for diffing
  previousSyncId   String?     @unique // This became important later!
  previousSync     ProjectSync? @relation("ProjectSyncHistory", fields: [previousSyncId], references: [id])
  nextSync         ProjectSync? @relation("ProjectSyncHistory")

  // Link to files and memory entries created/updated by this sync
  repositoryFiles  RepositoryFile[]
  memoryEntries    MemoryEntry[]
}

// ... other models extended with sync-related fields

Tasks 2 & 3: Connecting to GitHub – The Source of Truth

With the data model in place, the next logical step was to build the bridge to our code repositories. For us, that means GitHub. I enhanced src/server/services/github-connector.ts with three vital functions:

  1. fetchBranches(): This function allows us to list all available branches for a given repository. Essential for users to select which branch they want to sync.
  2. fetchBranchHead(): Given a branch name, this retrieves its latest commit SHA. This is crucial for knowing what state we're syncing against.
  3. fetchRepoTreeWithSha(): This is where the magic starts. It fetches the entire file tree for a repository at a specific SHA, returning a list of TreeEntry[] (path, SHA, size). This gives us a granular view of every file and its unique identifier, enabling efficient change detection later.
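
To make the tree fetch concrete, here is a minimal sketch of what a function like fetchRepoTreeWithSha() could look like. It is not the actual github-connector.ts: the parameter list, the HttpGetJson helper type, and the error handling are assumptions made for illustration. The only things taken as given are the TreeEntry shape (path, SHA, size) described above and GitHub's REST "Get a tree" endpoint with recursive=1.

```typescript
// Hypothetical sketch; names and signatures are assumptions, not the real connector.
interface TreeEntry {
  path: string;
  sha: string;
  size: number;
}

// The parts of GitHub's "Get a tree" response that this sketch relies on.
interface GitTreeResponse {
  sha: string;
  truncated: boolean;
  tree: Array<{ path: string; type: string; sha: string; size?: number }>;
}

// Minimal HTTP abstraction so the network layer can be swapped out in tests.
type HttpGetJson = (url: string, headers: Record<string, string>) => Promise<unknown>;

// Keep only blobs (files); "tree" (directory) and "commit" (submodule) items have no content.
function toTreeEntries(response: GitTreeResponse): TreeEntry[] {
  return response.tree
    .filter((item) => item.type === "blob")
    .map((item) => ({ path: item.path, sha: item.sha, size: item.size ?? 0 }));
}

async function fetchRepoTreeWithSha(
  getJson: HttpGetJson,
  owner: string,
  repo: string,
  sha: string,
  token: string,
): Promise<TreeEntry[]> {
  const url = `https://api.github.com/repos/${owner}/${repo}/git/trees/${sha}?recursive=1`;
  const body = (await getJson(url, {
    Accept: "application/vnd.github+json",
    Authorization: `Bearer ${token}`,
  })) as GitTreeResponse;
  if (body.truncated) {
    // GitHub truncates very large trees; a real connector would need to walk subtrees.
    throw new Error(`Tree for ${sha} was truncated by the GitHub API`);
  }
  return toTreeEntries(body);
}

// One possible HttpGetJson, backed by the global fetch available in Node 18+.
const fetchJson: HttpGetJson = async (url, headers) => {
  const res = await fetch(url, { headers });
  if (!res.ok) throw new Error(`GitHub request failed with status ${res.status}`);
  return res.json();
};
```

Injecting the HTTP function keeps the mapping logic testable without network access. Because the blob SHA is a content hash, two entries with the same path and SHA can be treated as identical without downloading the file.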

The Engine Room: Crafting the Project Sync Service

Task 4: The Core Logic – project-sync-service.ts

This was the most substantial part of the initial sprint. I created src/server/services/project-sync-service.ts, which now houses the full asynchronous pipeline for Project Sync. This isn't just a simple function; it's an async generator function whose AsyncGenerator orchestrates the entire sync process in four distinct, observable stages:

  1. prepare: Initializes the sync, fetches necessary metadata, and sets up the environment.
  2. scan: Compares the fetched GitHub tree with our existing RepositoryFile entries. It identifies new files, modified files, deleted files, and unchanged files. This is where the TreeEntry SHAs become incredibly useful for efficient diffing.
  3. import: For new or modified files, it fetches their content, processes them, and creates/updates RepositoryFile and MemoryEntry records in our database.
  4. finalize: Cleans up, updates the ProjectSync record with the final status, and handles any post-sync actions.
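
The scan stage is essentially a set comparison keyed by path, with the SHA deciding whether a file changed. The sketch below shows one way to do that classification. The StoredFile and ScanResult types and the scanTree name are hypothetical stand-ins, since the real RepositoryFile model has more fields than shown here.

```typescript
interface TreeEntry {
  path: string;
  sha: string;
  size: number;
}

// Hypothetical subset of a stored RepositoryFile row.
interface StoredFile {
  path: string;
  sha: string;
}

interface ScanResult {
  added: TreeEntry[];
  modified: TreeEntry[];
  deleted: StoredFile[];
  unchanged: TreeEntry[];
}

// Classify every path by comparing the remote tree against what is already stored.
function scanTree(remote: TreeEntry[], stored: StoredFile[]): ScanResult {
  const storedByPath = new Map(stored.map((file) => [file.path, file] as const));
  const result: ScanResult = { added: [], modified: [], deleted: [], unchanged: [] };

  for (const entry of remote) {
    const existing = storedByPath.get(entry.path);
    if (!existing) {
      result.added.push(entry);
    } else if (existing.sha !== entry.sha) {
      result.modified.push(entry);
    } else {
      result.unchanged.push(entry);
    }
    // Whatever is left in the map after the loop no longer exists remotely.
    storedByPath.delete(entry.path);
  }

  result.deleted = Array.from(storedByPath.values());
  return result;
}
```

Only the added and modified buckets need their content fetched in the import stage, which is why comparing SHAs up front saves most of the API traffic on a repeat sync.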

The use of an AsyncGenerator here is key. It allows us to stream updates about the sync's progress back to the client in real-time, providing a much better user experience than a fire-and-forget background job.
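
As a rough illustration of the pattern, the following sketch yields one progress event per stage and per imported file. The event shape, the SyncDeps interface, and the runProjectSync name are assumptions for the example; the real service works against GitHub and Prisma rather than injected callbacks.

```typescript
type SyncStage = "prepare" | "scan" | "import" | "finalize";

// Hypothetical progress event; the real payload may carry different fields.
interface SyncProgressEvent {
  stage: SyncStage;
  message: string;
  done: number;
  total: number;
}

// Hypothetical dependencies standing in for the GitHub and database calls.
interface SyncDeps {
  listChangedPaths(): Promise<string[]>;
  importFile(path: string): Promise<void>;
}

async function* runProjectSync(
  deps: SyncDeps,
): AsyncGenerator<SyncProgressEvent, void, void> {
  yield { stage: "prepare", message: "Preparing sync", done: 0, total: 0 };

  const changed = await deps.listChangedPaths();
  yield {
    stage: "scan",
    message: `Found ${changed.length} changed files`,
    done: 0,
    total: changed.length,
  };

  let done = 0;
  for (const path of changed) {
    await deps.importFile(path);
    done += 1;
    // Each yield hands control back to the consumer, which can forward the event immediately.
    yield { stage: "import", message: `Imported ${path}`, done, total: changed.length };
  }

  yield { stage: "finalize", message: "Sync complete", done, total: changed.length };
}
```

A consumer reads the events with for await (const event of runProjectSync(deps)) { ... }. Because the generator only advances when the consumer asks for the next value, a slow client naturally applies backpressure to the pipeline.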

Navigating the Trenches: A Prisma Self-Relation Gotcha

Not everything was smooth sailing, and that's often where the best lessons are learned.

The Problem: Prisma Self-Relation Without @unique

I designed the ProjectSync model with a self-referencing relation (previousSyncId) to link a sync operation to its predecessor. This is vital for calculating diffs and understanding the evolution of a project's state over time.

prisma
// My initial (problematic) schema design for the relation:
model ProjectSync {
  // ...
  previousSyncId   String?
  previousSync     ProjectSync? @relation("ProjectSyncHistory", fields: [previousSyncId], references: [id])
  nextSync         ProjectSync? @relation("ProjectSyncHistory")
}

My intention was for previousSyncId to be a foreign key that could be null (for the very first sync) and point to another ProjectSync record. However, when I tried to generate a migration or push the schema, Prisma threw a validation error: "A one-to-one relation needs unique fields."

The Lesson: Understanding Prisma's Relation Constraints

Prisma infers a relation's cardinality from the field types on both sides. Because previousSync and nextSync are both singular (ProjectSync?), Prisma treats the relation as one-to-one and requires the foreign key that defines it (previousSyncId) to be unique. That constraint guarantees a ProjectSync record can be the previousSync of at most one other record. My mental model was slightly off: without @unique, nothing stops several records from sharing a predecessor, which is a one-to-many relation (and would need nextSync declared as a list, ProjectSync[]) rather than the strict chain that previousSyncId implies.

The Fix: Adding @unique

The solution was straightforward but critical: add @unique to previousSyncId.

prisma
// The corrected schema snippet:
model ProjectSync {
  // ...
  previousSyncId   String?     @unique // The fix!
  previousSync     ProjectSync? @relation("ProjectSyncHistory", fields: [previousSyncId], references: [id])
  nextSync         ProjectSync? @relation("ProjectSyncHistory")
}

Each ProjectSync record could already point to at most one previousSync, since previousSyncId is a single scalar foreign key. What @unique adds is the other direction: at most one ProjectSync record can claim a given ProjectSync as its previousSync. Together, these enforce the strict chronological chain we need.
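
To see what the unique constraint rules out, here is a small in-memory check that flags any predecessor claimed by more than one record. It is an illustration only: the SyncRow type and findSharedPredecessors function are hypothetical, and in practice the database's unique index does this work. The sketch also assumes PostgreSQL's default behavior, where NULL values never collide in a unique index, so several first syncs (for example, one per repository) can coexist.

```typescript
// Hypothetical minimal view of a ProjectSync row.
interface SyncRow {
  id: string;
  previousSyncId: string | null;
}

// Returns the ids of syncs that more than one row names as its predecessor.
// With @unique on previousSyncId, the database rejects any write that would make this non-empty.
function findSharedPredecessors(rows: SyncRow[]): string[] {
  const claims = new Map<string, number>();
  for (const row of rows) {
    if (row.previousSyncId === null) continue; // NULLs are exempt from the uniqueness check
    claims.set(row.previousSyncId, (claims.get(row.previousSyncId) ?? 0) + 1);
  }
  return Array.from(claims.entries())
    .filter(([, count]) => count > 1)
    .map(([id]) => id);
}

// A linear chain: s1 <- s2 <- s3. Nothing is shared.
const linear: SyncRow[] = [
  { id: "s1", previousSyncId: null },
  { id: "s2", previousSyncId: "s1" },
  { id: "s3", previousSyncId: "s2" },
];

// A fork: both s2 and s3 claim s1, which the unique constraint forbids.
const forked: SyncRow[] = [
  { id: "s1", previousSyncId: null },
  { id: "s2", previousSyncId: "s1" },
  { id: "s3", previousSyncId: "s1" },
];
```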

Operational Notes for Prisma

  • Schema Not Pushed Yet: This schema change is still local. It requires npx prisma@5.22.0 db push for local development (which I'll do on my fresh Docker setup) and a carefully managed safe migration for production. This is a reminder that schema changes, especially on production, need respect and a plan.
  • Local .env: A minor but common friction point: my local node_modules were installed, but I forgot to set up my .env file. For Prisma commands, I often just prefix them with DATABASE_URL=... to quickly get things done without a full .env setup when I'm just poking around.

The Road Ahead: What's Next for Project Sync

With the core backend logic and data modeling largely in place, the immediate next steps are to expose this functionality and integrate it into our application's frontend.

  1. Task 5: SSE Endpoint (/api/v1/events/project-sync/[syncId]/route.ts): This will be the conduit for streaming those AsyncGenerator updates to the frontend, giving users real-time feedback on their sync operations.
  2. Task 6: tRPC Sync Sub-router: Building out the API endpoints for initiating syncs, checking status, viewing history, and potentially restoring memory from previous syncs.
  3. Tasks 7-10: Frontend Integration: Developing the useProjectSync hook, SyncBanner and SyncControls components, and integrating them into the Project Overview page. This is where the user experience truly comes alive.
  4. Tasks 11-13: Refinements and Deployment: Filtering superseded entries from active queries, thorough type-checking and build verification, and finally, the production deployment, complete with safe migrations.
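
For the SSE endpoint planned as Task 5, the core job is turning each generator event into a Server-Sent Events frame. The sketch below shows one possible adapter; it is not the planned implementation. The toSseFrame and sseStreamFrom names and the "progress" event name are assumptions, and the code relies only on the web-standard ReadableStream and TextEncoder globals available in Node 18+.

```typescript
// Serialize one event as an SSE frame: an "event:" line, a "data:" line, then a blank line.
// JSON.stringify escapes newlines inside strings, so the payload always fits on one data line.
function toSseFrame(eventName: string, payload: unknown): string {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Adapt any async iterable (such as an AsyncGenerator) into a byte stream for a streaming response body.
function sseStreamFrom<T>(
  events: AsyncIterable<T>,
  eventName: string,
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  const iterator = events[Symbol.asyncIterator]();

  return new ReadableStream<Uint8Array>({
    // pull is called whenever the consumer wants more data, so the source is only advanced on demand.
    async pull(controller) {
      const next = await iterator.next();
      if (next.done) {
        controller.close();
        return;
      }
      controller.enqueue(encoder.encode(toSseFrame(eventName, next.value)));
    },
    // If the client disconnects, give the generator a chance to run its cleanup.
    async cancel() {
      await iterator.return?.();
    },
  });
}
```

In a route handler that returns a web-standard Response, the stream could be sent as new Response(sseStreamFrom(events, "progress"), { headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" } }), and the browser side would listen with an EventSource.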

Wrapping Up

This session was a significant leap forward for Project Sync. We've laid down the crucial database schema, built the GitHub integration, and—most importantly—crafted the core asynchronous pipeline that will power the entire feature. The Prisma self-relation hiccup served as a valuable reminder to always understand the nuances of your ORM's constraints.

I'm incredibly excited about the next steps, particularly bringing the real-time feedback to life with SSE and the frontend components. Stay tuned for more updates as Project Sync evolves!


json
{
  "thingsDone": [
    "Implemented ProjectSync model and extended related models in Prisma schema.",
    "Added fetchBranches(), fetchBranchHead(), and fetchRepoTreeWithSha() to GitHub connector service.",
    "Created ProjectSyncService with a full AsyncGenerator pipeline (prepare, scan, import, finalize) for diff-aware memory and repo file synchronization.",
    "Restored 382 embeddings on production via a backfill endpoint (earlier work)."
  ],
  "pains": [
    "Encountered Prisma validation error for one-to-one self-relation (ProjectSync.previousSyncId) requiring a unique field.",
    "Temporary friction with local Prisma commands due to missing .env file."
  ],
  "successes": [
    "Successfully resolved Prisma schema validation by adding @unique to previousSyncId.",
    "Established a robust AsyncGenerator-based backend pipeline for project synchronization.",
    "Integrated essential GitHub API calls for repository introspection."
  ],
  "techStack": [
    "Prisma (ORM)",
    "TypeScript",
    "GitHub API",
    "Node.js (Backend)",
    "PostgreSQL (Database)",
    "Redis",
    "Docker (Local Dev)",
    "tRPC (API Layer)",
    "SSE (Server-Sent Events)"
  ]
}