From GitHub to Your App: Orchestrating a Real-time Sync Engine with tRPC and SSE
Just hit a major milestone on our project sync feature! Dive into the journey of building a robust, real-time synchronization engine using AsyncGenerators, tRPC, and Server-Sent Events.
It’s 5:30 PM on a Thursday, and the commit messages are flowing. My fingers are still warm from the keyboard, but there’s a distinct feeling of accomplishment in the air. We just wrapped Phase 1 of our project synchronization feature, and it feels good – really good.
This isn't just about pushing code; it's about bringing a complex system to life, ensuring our application stays perfectly in sync with external sources (in this case, GitHub repositories). It’s a delicate dance of fetching, diffing, persisting, and providing real-time feedback, and we've just nailed the core choreography.
The Mission: Project Sync Phase 1
Our primary goal for this session was to get the initial "Project Sync" up and running, complete with branch selection. This means users can connect a project to a GitHub repository, pick a branch, and our system will pull down the necessary files and metadata, keeping our internal state a faithful reflection of the chosen repository.
Phase 1 focused on laying down the robust foundation: defining the data models, building the GitHub integration, crafting a sophisticated sync pipeline, and wiring up a responsive user interface. And as of today, tasks 1 through 12 are complete. We're staring down Task 13: the production deployment.
Deconstructing the Sync Engine: What We Built
Bringing a feature like this to life requires a full-stack approach, touching almost every part of our system. Here’s a breakdown of the key components we shipped:
- **Schema Design:** We introduced the `ProjectSync` model to track each synchronization attempt, along with extensions to `MemoryEntry`, `RepositoryFile`, and `Repository`. This lets us store not just the current state of a repository but also the historical context of each sync, which is crucial for future features like rollback and detailed auditing.
- **github-connector:** Our dedicated service for talking to the GitHub API. We implemented `fetchBranches` to let users select their desired branch, `fetchBranchHead` to get the latest commit SHA, and `fetchRepoTreeWithSha` to retrieve the entire file tree for a given commit. This layer abstracts away the complexities of GitHub's API, keeping our sync service clean.
- **`project-sync-service.ts`, the heartbeat of sync:** This is where the magic happens. We engineered a full 4-phase `AsyncGenerator` pipeline with diff-awareness. Why an `AsyncGenerator`? Because synchronization is a long-running process: we need to fetch, process, and persist, all while providing real-time updates and allowing for potential cancellation. An `AsyncGenerator` is perfect for this, yielding progress updates at each step:
  - **Fetch:** Pull the latest data from GitHub.
  - **Diff:** Compare the fetched data with our current internal state to identify changes. This "diff-awareness" is key to efficiency, ensuring we only process what's actually changed.
  - **Process:** Apply the identified changes (e.g., creating new files, updating existing ones, marking old ones as superseded).
  - **Persist:** Save the new state and sync history to our database.
```typescript
// Conceptual AsyncGenerator for a project sync process
async function* projectSyncGenerator(
  syncId: string,
  projectId: string,
  branch: string
): AsyncGenerator<SyncProgressEvent, void, void> {
  try {
    yield { syncId, status: 'FETCHING', progress: 10, message: 'Fetching latest from GitHub...' };
    const githubTree = await githubConnector.fetchRepoTree(projectId, branch);

    yield { syncId, status: 'DIFFING', progress: 30, message: 'Calculating differences...' };
    const { added, modified, removed } = diffService.calculateChanges(
      githubTree,
      await dbService.getCurrentState(projectId)
    );

    yield {
      syncId,
      status: 'PROCESSING',
      progress: 60,
      message: `Processing ${added.length + modified.length + removed.length} changes...`,
    };
    // This might involve multiple yields for granular updates
    await dbService.applyChanges(projectId, added, modified, removed);

    yield { syncId, status: 'PERSISTING', progress: 90, message: 'Saving sync history...' };
    await dbService.recordSyncCompletion(syncId);

    yield { syncId, status: 'COMPLETE', progress: 100, message: 'Project sync complete!' };
  } catch (error) {
    // `error` is `unknown` in strict TypeScript, so narrow before reading `.message`
    const message = error instanceof Error ? error.message : String(error);
    yield { syncId, status: 'FAILED', progress: 0, message: `Sync failed: ${message}` };
    throw error; // Re-throw to propagate the error
  }
}
```
- **SSE endpoint (`/api/v1/events/project-sync/[syncId]`) for real-time feedback:** To keep the user informed, we hooked the `AsyncGenerator` up to a Server-Sent Events (SSE) endpoint. As the generator yields progress updates, the events are streamed directly to the frontend, providing a seamless, real-time view of the sync status.
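  Bridging a generator to an SSE response is mostly mechanical. Here's a minimal, framework-agnostic sketch of the stream construction (the `sseStream` helper is an illustration, not our actual route code; in a route handler you'd return it as the response body with `Content-Type: text/event-stream`):

  ```typescript
  // Bridge any AsyncGenerator of JSON-serializable events to an SSE body.
  function sseStream<T>(events: AsyncGenerator<T>): ReadableStream<Uint8Array> {
    const encoder = new TextEncoder();
    return new ReadableStream<Uint8Array>({
      async start(controller) {
        try {
          for await (const event of events) {
            // Each SSE message is a "data: <payload>" line followed by a blank line
            controller.enqueue(encoder.encode(`data: ${JSON.stringify(event)}\n\n`));
          }
        } finally {
          controller.close();
        }
      },
    });
  }
  ```

  Remember to also send `Cache-Control: no-cache` and keep the connection open; most frameworks that accept a `ReadableStream` body handle the rest.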
- **tRPC integration (`projects.sync` sub-router) for a type-safe API:** Our API layer for sync operations leverages tRPC for end-to-end type safety. We defined a `projects.sync` sub-router with procedures for:
  - `branches`: get available branches for a repo.
  - `status`: check the current status of a specific sync.
  - `start`: initiate a new sync.
  - `history`: retrieve past syncs.
  - `restoreMemory`: a future-facing method for reverting to a previous state.
```typescript
import { z } from 'zod';
import { t } from '../trpc'; // Assuming your tRPC context setup

const projectSyncRouter = t.router({
  branches: t.procedure
    .input(z.object({ projectId: z.string() }))
    .query(async ({ input, ctx }) => {
      // Fetch branches from github-connector
      return ['main', 'dev', 'feature-x'];
    }),
  start: t.procedure
    .input(z.object({ projectId: z.string(), branchName: z.string() }))
    .mutation(async ({ input, ctx }) => {
      // Start the AsyncGenerator pipeline and return a syncId
      const syncId = await projectSyncService.startSync(input.projectId, input.branchName, ctx.user.id);
      return { syncId };
    }),
  status: t.procedure
    .input(z.object({ syncId: z.string() }))
    .query(async ({ input }) => {
      // Retrieve current status from an in-memory or DB store
      return { status: 'RUNNING', progress: 75, message: 'Processing files...' };
    }),
  // ... other procedures like history, restoreMemory
});

export const appRouter = t.router({
  projects: t.router({
    sync: projectSyncRouter, // exposed as projects.sync
  }),
  // ... other routers
});
```
- **Frontend components:** We built a `useProjectSync` hook to encapsulate the logic for subscribing to SSE events and interacting with the tRPC API. It powers our `SyncBanner` (showing overall sync status) and `SyncControls` (buttons to start, stop, and view syncs).
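  On the client, the hook's core job is turning raw SSE messages back into typed progress events before updating state. That parsing step can be sketched like this (the event shape and validation rules are assumptions about our payloads):

  ```typescript
  interface SyncProgressEvent {
    syncId: string;
    status: string;
    progress: number;
    message: string;
  }

  // Parse the `event.data` payload of an SSE MessageEvent into a typed event.
  // Returns null for malformed payloads instead of throwing inside the stream handler.
  function parseSyncEvent(data: string): SyncProgressEvent | null {
    try {
      const parsed = JSON.parse(data);
      if (typeof parsed.syncId === 'string' && typeof parsed.progress === 'number') {
        return parsed as SyncProgressEvent;
      }
      return null;
    } catch {
      return null;
    }
  }
  ```

  Inside the hook, an `EventSource` pointed at the sync endpoint calls this from its `onmessage` handler and feeds the result into React state.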
- **Integration:** The `SyncControls` component was integrated into `project-overview.tsx`, making sync functionality a central part of the project management experience.
- **Superseded entry filtering:** A subtle but crucial detail. When files change or are deleted on GitHub, we don't just delete them from our database. Instead, we mark previous versions as "superseded" and ensure our queries only return "active" files. This preserves historical context and prevents data loss, which is essential for features like version history.
- **Quality assurance:** The cherry on top? Build passes, 180/180 tests pass, and typecheck is clean. That gives us the confidence to move forward.
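In application code, the superseded-entry rule boils down to "a file is active iff no newer version supersedes it." A standalone sketch of the filter (the record shape and `supersededById` field are assumptions about our schema):

```typescript
interface RepoFileRecord {
  id: string;
  path: string;
  supersededById: string | null; // null = still the active version
}

// Return only the active (non-superseded) version of each file.
function activeFiles(records: RepoFileRecord[]): RepoFileRecord[] {
  return records.filter((r) => r.supersededById === null);
}

const history: RepoFileRecord[] = [
  { id: 'f1', path: 'src/app.ts', supersededById: 'f2' }, // old version, kept for history
  { id: 'f2', path: 'src/app.ts', supersededById: null }, // current version
  { id: 'f3', path: 'README.md', supersededById: null },
];

console.log(activeFiles(history).map((r) => r.id)); // [ 'f2', 'f3' ]
```

In the database the same predicate becomes a simple `WHERE supersededById IS NULL` filter, so "active" queries stay cheap while history is preserved.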
Lessons Learned from the Trenches (The "Pain Log" Transformed)
No significant development sprint is without its bumps. Here are a few "aha!" moments that turned into valuable lessons:
- **The case of the missing `previousSyncId` uniqueness**
  - **The pain:** I initially forgot to add a `@unique` constraint on `previousSyncId` in our schema. This field is vital for linking syncs in a chain, showing how one sync supersedes another. Without the constraint, it was possible to create invalid sync chains.
  - **The lesson:** Database schema design is paramount, especially when dealing with relational integrity and historical data. Always consider how records relate to each other and enforce those relationships with appropriate constraints (`@unique`, `@relation`, etc.) before writing too much application logic. It saves headaches down the line.
  - **Actionable takeaway:** When designing complex data models, sketch out entity-relationship diagrams and think through the lifecycle of your data. Use your ORM's features (like Prisma's `@unique` and `@relation`) to enforce data integrity at the database level.
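  For a one-to-one self-relation like a sync chain, Prisma requires the foreign key side to be unique. A minimal sketch of the shape (model and relation names are illustrative, not our exact schema):

  ```prisma
  model ProjectSync {
    id             String       @id @default(cuid())
    status         String
    previousSyncId String?      @unique // without @unique, two syncs could claim the same predecessor
    previousSync   ProjectSync? @relation("SyncChain", fields: [previousSyncId], references: [id])
    nextSync       ProjectSync? @relation("SyncChain")
  }
  ```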
- **`ctx.userId` vs. `ctx.user.id` in tRPC**
  - **The pain:** I kept trying to access `ctx.userId` in a tRPC procedure, only to find it undefined. Our tRPC context is set up to provide `ctx.user`, an object containing `id`, `email`, etc.
  - **The lesson:** Always double-check your context interfaces, especially when working with authentication and authorization. It's a common oversight, but TypeScript is there to help!
  - **Actionable takeaway:** Leverage TypeScript's type inference and explicit type definitions for your tRPC context. Define a clear `interface Context` and ensure all parts of your application adhere to it. This catches such access errors at compile time.
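  Once the context is explicitly typed, the mistake becomes impossible to ship. A minimal sketch (the context shape here is an assumption about a typical setup, not our exact code):

  ```typescript
  // Hypothetical tRPC context shape — adjust to whatever your createContext returns.
  interface Context {
    user: { id: string; email: string };
  }

  // `ctx.user.id` typechecks; `ctx.userId` would be a compile-time error,
  // which is exactly the safety net we want.
  function ownerId(ctx: Context): string {
    return ctx.user.id;
  }

  const ctx: Context = { user: { id: 'u_123', email: 'dev@example.com' } };
  console.log(ownerId(ctx)); // "u_123"
  ```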
- **Schema NOT pushed to any DB yet**
  - **The pain:** After all the schema changes, I realized the new `ProjectSync` model and extensions hadn't actually been migrated to any database instance yet. All local testing was against an in-memory or ephemeral DB that rebuilt on startup.
  - **The lesson:** Database migrations are a distinct, critical step in the development and deployment process. Defining your schema in code doesn't magically update your live database.
  - **Actionable takeaway:** Establish a clear, robust database migration strategy (e.g., Prisma Migrate, Flyway). Make it an explicit part of your development workflow and deployment pipeline, and always test migrations in a staging environment before pushing to production.
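  With Prisma, that workflow looks roughly like this (the migration name is illustrative):

  ```shell
  # Locally: create a new migration from schema changes and apply it to the dev DB
  npx prisma migrate dev --name add-project-sync

  # In CI / on production: apply pending migrations only (no new migration files)
  npx prisma migrate deploy
  ```

  Keeping `migrate deploy` as an explicit pipeline step is what prevents the "schema exists in code but not in the database" trap described above.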
Active State & Next Steps
All the work is currently on our main branch, about 12 commits ahead of our current production state. This represents a significant feature drop, ready to go.
Our immediate next step is clear:
- **Task 13: Production Deployment**
  - **Push:** Get the code to our remote `main` branch.
  - **Safe migration:** Carefully run the database migrations on production. This is the most critical step, given the new schema.
  - **Rebuild:** Deploy the updated application.
  - **Verify:** Thoroughly test the new sync functionality in the production environment.
The journey continues, but reaching this milestone feels fantastic. It's a testament to incremental progress, robust architecture, and learning from every little snag along the way.
What are your go-to strategies for building robust synchronization features? I'd love to hear them!