From GitHub to Your App: Orchestrating a Real-time Sync Engine with tRPC and SSE

It’s 5:30 PM on a Thursday, and the commit messages are flowing. My fingers are still warm from the keyboard, but there’s a distinct feeling of accomplishment in the air. We just wrapped Phase 1 of our project synchronization feature, and it feels good – really good.

This isn't just about pushing code; it's about bringing a complex system to life, ensuring our application stays perfectly in sync with external sources (in this case, GitHub repositories). It’s a delicate dance of fetching, diffing, persisting, and providing real-time feedback, and we've just nailed the core choreography.

The Mission: Project Sync Phase 1

Our primary goal for this session was to get the initial "Project Sync" up and running, complete with branch selection. This means users can connect a project to a GitHub repository, pick a branch, and our system will pull down the necessary files and metadata, keeping our internal state a faithful reflection of the chosen repository.

Phase 1 focused on laying down the robust foundation: defining the data models, building the GitHub integration, crafting a sophisticated sync pipeline, and wiring up a responsive user interface. And as of today, tasks 1 through 12 are complete. We're staring down Task 13: the production deployment.

Deconstructing the Sync Engine: What We Built

Bringing a feature like this to life requires a full-stack approach, touching almost every part of our system. Here’s a breakdown of the key components we shipped:

Schema Design: We introduced the ProjectSync model to track each synchronization attempt, along with extensions for MemoryEntry, RepositoryFile, and Repository. This allows us to store not just the current state of a repository, but also the historical context of each sync, crucial for future features like rollback or detailed auditing.
github-connector: Our dedicated service for talking to the GitHub API. We implemented fetchBranches to let users select their desired branch, fetchBranchHead to get the latest commit SHA, and fetchRepoTreeWithSha to retrieve the entire file tree for a given commit. This layer abstracts away the complexities of GitHub's API, making our sync service cleaner.
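
To make the connector a bit more concrete, here is a minimal sketch of what those three functions could look like on top of GitHub's REST endpoints for listing branches, reading a single branch, and reading a recursive Git tree. The `HttpGet` type, the `createGithubConnector` factory, the `owner`/`repo` parameters, and the return shapes are all assumptions made for illustration, not our actual service code. Authentication, pagination, and truncated trees are left out.

```typescript
// Hypothetical transport: takes a URL and resolves to parsed JSON.
// Injecting it keeps the sketch testable without network access.
type HttpGet = (url: string) => Promise<unknown>;

interface TreeEntry {
  path: string;
  sha: string;
  type: 'blob' | 'tree' | 'commit';
}

const GITHUB_API = 'https://api.github.com';

function createGithubConnector(get: HttpGet, owner: string, repo: string) {
  const base = `${GITHUB_API}/repos/${owner}/${repo}`;
  return {
    // Branch names for the branch picker.
    async fetchBranches(): Promise<string[]> {
      const branches = (await get(`${base}/branches`)) as { name: string }[];
      return branches.map((b) => b.name);
    },
    // Commit SHA currently at the tip of the branch.
    async fetchBranchHead(branch: string): Promise<string> {
      const data = (await get(`${base}/branches/${branch}`)) as { commit: { sha: string } };
      return data.commit.sha;
    },
    // Full file listing for one commit; directory entries are dropped.
    async fetchRepoTreeWithSha(sha: string): Promise<{ sha: string; files: TreeEntry[] }> {
      const data = (await get(`${base}/git/trees/${sha}?recursive=1`)) as { tree: TreeEntry[] };
      return { sha, files: data.tree.filter((entry) => entry.type === 'blob') };
    },
  };
}
```

Pinning the tree fetch to the SHA returned by `fetchBranchHead`, rather than to the branch name, means a push that lands mid-sync cannot change what the sync sees.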

project-sync-service.ts: The Heartbeat of Sync. This is where the magic happens. We built a four-phase AsyncGenerator pipeline with diff-awareness. Why an AsyncGenerator? Because synchronization is a long-running process: we need to fetch, process, and persist while streaming progress to the user and leaving room for cancellation. An AsyncGenerator fits because it yields a progress update at each step and the consumer can stop iterating at any point:

Fetch: Pulling the latest data from GitHub.
Diff: Comparing the fetched data with our current internal state to identify changes. This "diff-awareness" is key to efficiency, ensuring we only process what's actually changed.
Process: Applying the identified changes (e.g., creating new files, updating existing ones, marking old ones as superseded).
Persist: Saving the new state and sync history to our database.
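
The Diff phase is the easiest one to show in isolation. Since every blob in a Git tree carries a content SHA, comparing by path and SHA is enough to tell what changed without downloading file contents. The sketch below assumes both sides can be reduced to `{ path, sha }` pairs; the `FileRef` and `TreeDiff` names and the `diffTrees` function are hypothetical stand-ins, not the real `diffService`.

```typescript
interface FileRef {
  path: string;
  sha: string;
}

interface TreeDiff {
  added: FileRef[];
  modified: FileRef[];
  removed: FileRef[];
  unchanged: number;
}

// Compare the tree fetched from GitHub against what is already stored.
function diffTrees(remote: FileRef[], local: FileRef[]): TreeDiff {
  const localShaByPath = new Map<string, string>();
  for (const file of local) localShaByPath.set(file.path, file.sha);

  const diff: TreeDiff = { added: [], modified: [], removed: [], unchanged: 0 };
  const remotePaths = new Set<string>();

  for (const file of remote) {
    remotePaths.add(file.path);
    const knownSha = localShaByPath.get(file.path);
    if (knownSha === undefined) diff.added.push(file); // new path
    else if (knownSha !== file.sha) diff.modified.push(file); // same path, new content
    else diff.unchanged += 1; // nothing to do
  }

  // Anything stored locally that no longer exists remotely was deleted.
  for (const file of local) {
    if (!remotePaths.has(file.path)) diff.removed.push(file);
  }
  return diff;
}
```

One consequence of keying on path: a rename shows up as one removal plus one addition, which is fine for a first pass but worth knowing.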

typescript

// Conceptual AsyncGenerator for a project sync process
async function* projectSyncGenerator(syncId: string, projectId: string, branch: string): AsyncGenerator<SyncProgressEvent, void, void> {
  try {
    yield { syncId, status: 'FETCHING', progress: 10, message: 'Fetching latest from GitHub...' };
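    // Resolve the branch to a commit first so the whole sync sees one snapshot (argument shapes are illustrative)
    const headSha = await githubConnector.fetchBranchHead(projectId, branch);
    const githubTree = await githubConnector.fetchRepoTreeWithSha(projectId, headSha);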

    yield { syncId, status: 'DIFFING', progress: 30, message: 'Calculating differences...' };
    const { added, modified, removed } = diffService.calculateChanges(githubTree, await dbService.getCurrentState(projectId));

    yield { syncId, status: 'PROCESSING', progress: 60, message: `Processing ${added.length + modified.length + removed.length} changes...` };
    await dbService.applyChanges(projectId, added, modified, removed); // This might involve multiple yields for granular updates

    yield { syncId, status: 'PERSISTING', progress: 90, message: 'Saving sync history...' };
    await dbService.recordSyncCompletion(syncId);

    yield { syncId, status: 'COMPLETE', progress: 100, message: 'Project sync complete!' };
  } catch (error) {
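    // Caught values are `unknown` under strict TypeScript, so narrow before reading .message
    const reason = error instanceof Error ? error.message : String(error);
    yield { syncId, status: 'FAILED', progress: 0, message: `Sync failed: ${reason}` };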
    throw error; // Re-throw to propagate the error
  }
}
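
The cancellation half of the argument deserves its own sketch. When a consumer leaves a `for await` loop early, JavaScript calls the generator's `return()` method, which runs any pending `finally` blocks inside it. The `fakeSync` generator, the `runSync` helper, and the `shouldCancel` callback below are hypothetical; they stand in for the real pipeline and whatever cancel signal the UI sends.

```typescript
interface SyncProgressEvent {
  syncId: string;
  status: string;
  progress: number;
  message: string;
}

// Records cleanup so we can see that cancellation reached the generator.
const cleanupLog: string[] = [];

// Stand-in for the real pipeline: one event per phase, cleanup in `finally`.
async function* fakeSync(syncId: string): AsyncGenerator<SyncProgressEvent, void, void> {
  try {
    yield { syncId, status: 'FETCHING', progress: 10, message: 'Fetching...' };
    yield { syncId, status: 'DIFFING', progress: 30, message: 'Diffing...' };
    yield { syncId, status: 'PROCESSING', progress: 60, message: 'Processing...' };
    yield { syncId, status: 'COMPLETE', progress: 100, message: 'Done' };
  } finally {
    cleanupLog.push(`released ${syncId}`);
  }
}

// Drive a sync, forwarding progress and stopping as soon as cancellation is requested.
async function runSync(
  events: AsyncGenerator<SyncProgressEvent, void, void>,
  onProgress: (event: SyncProgressEvent) => void,
  shouldCancel: () => boolean,
): Promise<'completed' | 'cancelled'> {
  for await (const event of events) {
    onProgress(event);
    // Returning from inside the loop calls events.return() for us.
    if (shouldCancel()) return 'cancelled';
  }
  return 'completed';
}
```

Note that cancellation only takes effect between yields, so a phase that awaits one long database call cannot be interrupted halfway through. Finer-grained yields give finer-grained cancellation.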

SSE Endpoint (/api/v1/events/project-sync/[syncId]): Real-time Feedback. To keep the user informed, we hooked the AsyncGenerator up to a Server-Sent Events (SSE) endpoint. As the generator yields progress updates, each one is streamed to the frontend as an event, giving a live view of the sync status.
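
For a feel of the plumbing, here is a rough sketch of bridging a progress generator to an SSE body. It assumes a runtime with the Web Streams API (`ReadableStream`, `TextEncoder`), and the `formatSseEvent`, `sseStream`, and `SSE_HEADERS` names are made up for this example rather than taken from our endpoint. Each event goes out as a single `data:` line followed by a blank line, which is the SSE frame terminator.

```typescript
interface SyncProgressEvent {
  syncId: string;
  status: string;
  progress: number;
  message: string;
}

// Headers a client needs to treat the response as an event stream.
const SSE_HEADERS = {
  'Content-Type': 'text/event-stream',
  'Cache-Control': 'no-cache',
};

// JSON.stringify escapes newlines, so the payload always fits on one data line.
function formatSseEvent(event: SyncProgressEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Pull-based bridge: the generator only advances when the client is ready for more.
function sseStream(
  events: AsyncGenerator<SyncProgressEvent, void, void>,
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      const result = await events.next();
      if (result.done) {
        controller.close();
        return;
      }
      controller.enqueue(encoder.encode(formatSseEvent(result.value)));
    },
    // Client disconnected: stop the generator so its cleanup can run.
    async cancel() {
      await events.return(undefined);
    },
  });
}
```

In a handler that returns a Web `Response`, this could be wired up as `new Response(sseStream(generator), { headers: SSE_HEADERS })`. On the browser side, unnamed `data:` frames like these arrive through `EventSource`'s `onmessage` handler.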

tRPC Integration (projects.sync sub-router): Type-Safe API. Our API layer for sync operations leverages tRPC for end-to-end type safety. We defined a projects.sync sub-router with procedures for:

branches: Get available branches for a repo.
status: Check the current status of a specific sync.
start: Initiate a new sync.
history: Retrieve past syncs.
restoreMemory: A future-facing method for reverting to a previous state.

typescript

import { z } from 'zod';
import { t } from '../trpc'; // Assuming your tRPC context setup

const projectSyncRouter = t.router({
  branches: t.procedure
    .input(z.object({ projectId: z.string() }))
    .query(async ({ input, ctx }) => {
      // Fetch branches from github-connector
      return ['main', 'dev', 'feature-x'];
    }),
  start: t.procedure
    .input(z.object({ projectId: z.string(), branchName: z.string() }))
    .mutation(async ({ input, ctx }) => {
      // Logic to start the AsyncGenerator and return a syncId
      const syncId = await projectSyncService.startSync(input.projectId, input.branchName, ctx.user.id);
      return { syncId };
    }),
  status: t.procedure
    .input(z.object({ syncId: z.string() }))
    .query(async ({ input }) => {
      // Retrieve current status from in-memory or DB store
      return { status: 'PROCESSING', progress: 60, message: 'Processing files...' };
    }),
  // ... other procedures like history, restoreMemory
});

export const appRouter = t.router({
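  projects: t.router({
    sync: projectSyncRouter, // Nested so clients call projects.sync.start, projects.sync.status, ...
    // ... other project procedures
  }),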
  // ... other routers
});

Frontend Components: We built a useProjectSync hook to encapsulate the logic for subscribing to SSE events and interacting with the tRPC API. This powers our SyncBanner (showing overall sync status) and SyncControls (buttons to start/stop/view syncs).
Integration: The SyncControls component was integrated into project-overview.tsx, making the sync functionality a central part of the project management experience.
Superseded Entry Filtering: A subtle but crucial detail. When files change or are deleted on GitHub, we don't just delete them from our database. Instead, we mark previous versions as "superseded" and ensure our queries only return "active" files. This maintains historical context and prevents data loss, crucial for features like version history.
Quality Assurance: The cherry on top? The build passes, all 180 tests pass, and the typecheck is clean. This gives us the confidence to move forward.
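
The superseded entry filtering from the list above is simple to sketch in memory. The idea: a sync never deletes rows, it stamps the rows it replaces and appends the new versions, and reads filter on the stamp. The `StoredFile` shape and the `supersededBySyncId` field are assumed names for illustration; the real schema may model this differently.

```typescript
interface StoredFile {
  path: string;
  sha: string;
  syncId: string; // sync that created this version
  supersededBySyncId: string | null; // null means this is the active version
}

// Apply one sync's changes without deleting anything.
function applySync(
  stored: StoredFile[],
  syncId: string,
  changed: { path: string; sha: string }[], // added or modified files
  removedPaths: string[],
): StoredFile[] {
  const retiredPaths = new Set<string>([...changed.map((c) => c.path), ...removedPaths]);

  // Stamp currently active rows whose path was changed or removed.
  const next = stored.map((row) =>
    row.supersededBySyncId === null && retiredPaths.has(row.path)
      ? { ...row, supersededBySyncId: syncId }
      : row,
  );

  // Append the new active versions.
  for (const file of changed) {
    next.push({ path: file.path, sha: file.sha, syncId, supersededBySyncId: null });
  }
  return next;
}

// What every "current state" query should return.
function activeFiles(rows: StoredFile[]): StoredFile[] {
  return rows.filter((row) => row.supersededBySyncId === null);
}
```

Because old rows stay put, "what did the project look like after sync N" becomes a query rather than a restore from backup.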

Lessons Learned from the Trenches (The "Pain Log" Transformed)

No significant development sprint is without its bumps. Here are a few "aha!" moments that turned into valuable lessons:

The Case of the Missing previousSyncId Uniqueness:
- The Pain: I initially forgot to add a @unique constraint on previousSyncId in our schema. This field links each sync to the one it supersedes, forming a chain. Without the constraint, two syncs could point at the same predecessor, so the chain could fork into an invalid history.
- The Lesson: Database schema design is paramount, especially when dealing with relational integrity and historical data. Always consider how records relate to each other and enforce those relationships with appropriate constraints (@unique, @relation, etc.) before writing too much application logic. It saves headaches down the line.
- Actionable Takeaway: When designing complex data models, sketch out entity-relationship diagrams and think through the lifecycle of your data. Use your ORM's features (like Prisma's @unique and @relation) to enforce data integrity at the database level.
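
To illustrate what the constraint rules out, here is a small sketch that scans sync records for forks, meaning predecessors claimed by more than one sync. The `SyncRecord` shape and `findForkedPredecessors` are hypothetical and exist only for this example; with @unique in place the database rejects such rows itself.

```typescript
interface SyncRecord {
  id: string;
  previousSyncId: string | null; // null for the first sync of a project
}

// Returns the ids of syncs that more than one later sync claims as its predecessor.
function findForkedPredecessors(syncs: SyncRecord[]): string[] {
  const claims = new Map<string, number>();
  for (const sync of syncs) {
    // Nulls are skipped: PostgreSQL unique constraints treat NULLs as distinct
    // by default, so many chain roots can coexist.
    if (sync.previousSyncId === null) continue;
    claims.set(sync.previousSyncId, (claims.get(sync.previousSyncId) ?? 0) + 1);
  }
  const forked: string[] = [];
  claims.forEach((count, id) => {
    if (count > 1) forked.push(id);
  });
  return forked;
}
```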
ctx.userId vs. ctx.user.id in tRPC:
- The Pain: I kept trying to access ctx.userId in a tRPC procedure, only to find it undefined. Our tRPC context is set up to provide ctx.user, which is an object containing id, email, etc.
- The Lesson: Always double-check your context interfaces, especially when working with authentication and authorization. It's a common oversight, but TypeScript is there to help!
- Actionable Takeaway: Leverage TypeScript's type inference and explicit type definitions for your tRPC context. Define a clear interface Context and ensure all parts of your application adhere to it. This catches these simple access errors at compile time.
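
A tiny sketch of that takeaway, assuming a context shaped roughly like ours (the `SessionUser` and `Context` interfaces and the `getActorId` helper are illustrative names). Once the context is typed, the wrong access stops being a runtime `undefined` and becomes a compile error, which the `@ts-expect-error` line demonstrates.

```typescript
interface SessionUser {
  id: string;
  email: string;
}

// The shape every procedure receives as `ctx`.
interface Context {
  user: SessionUser;
}

// Correct access: the id lives on the nested user object.
function getActorId(ctx: Context): string {
  return ctx.user.id;
}

// The mistake from the pain log. The directive below only compiles because
// the next line really is a type error; at runtime it evaluates to undefined.
// @ts-expect-error Property 'userId' does not exist on type 'Context'.
const readUserIdWrong = (ctx: Context) => ctx.userId;
```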
Schema NOT Pushed to Any DB Yet:
- The Pain: After all the schema changes, I realized the new ProjectSync model and extensions hadn't actually been migrated to any database instance yet. All local testing was against an in-memory or ephemeral DB that rebuilt on startup.
- The Lesson: Database migrations are a distinct, critical step in the development and deployment process. Defining your schema in code doesn't magically update your live database.
- Actionable Takeaway: Establish a clear and robust database migration strategy (e.g., using Prisma Migrate, Flyway, etc.). Make it an explicit part of your development workflow and deployment pipeline. Always test migrations thoroughly in a staging environment before pushing to production.

Active State & Next Steps

All the work is currently on our main branch, about 12 commits ahead of our current production state. This represents a significant feature drop, ready to go.

Our immediate next step is clear:

Task 13: Production Deployment
- Push: Get the code to our remote main branch.
- Safe Migration: Carefully run the database migrations on production. This is the most critical step, given the new schema.
- Rebuild: Deploy the updated application.
- Verify: Thoroughly test the new sync functionality in the production environment.

The journey continues, but reaching this milestone feels fantastic. It's a testament to incremental progress, robust architecture, and learning from every little snag along the way.

What are your go-to strategies for building robust synchronization features? I'd love to hear them!

json

{
  "thingsDone": [
    "ProjectSync model + MemoryEntry/RepositoryFile/Repository schema extensions",
    "github-connector: fetchBranches, fetchBranchHead, fetchRepoTreeWithSha",
    "project-sync-service.ts: Full 4-phase AsyncGenerator pipeline with diff-awareness",
    "SSE endpoint: /api/v1/events/project-sync/[syncId] for real-time updates",
    "tRPC: projects.sync sub-router (branches, status, start, history, restoreMemory)",
    "Frontend: useProjectSync hook, SyncBanner, SyncControls components",
    "Integration: SyncControls in project-overview.tsx",
    "Superseded entry filtering logic for active files",
    "All builds passing, 180/180 tests passing, typecheck clean"
  ],
  "pains": [
    "Self-relation needed @unique on previousSyncId (schema design oversight)",
    "ctx.userId doesn't exist on tRPC context — use ctx.user.id (context access error)",
    "Schema NOT pushed to any DB yet (missing migration step)"
  ],
  "successes": [
    "Achieved Phase 1 goal of Project Sync with branch selection",
    "Successful implementation of AsyncGenerator for long-running sync processes",
    "Real-time progress updates via SSE",
    "Comprehensive test coverage and clean typecheck",
    "Robust data modeling for sync history and file versions"
  ],
  "techStack": [
    "TypeScript",
    "tRPC",
    "Next.js",
    "Prisma",
    "PostgreSQL",
    "GitHub API",
    "Server-Sent Events (SSE)",
    "AsyncGenerators"
  ]
}