Real-time Syncing with AsyncGenerators & SSE: A Deep Dive into Project Sync Phase 1
Join me as I recount the journey of building a robust, real-time project synchronization feature, from intricate schema design and GitHub API integration to streaming updates with AsyncGenerators and Server-Sent Events.
It's 5 PM, and the IDE is still humming. The coffee cup is empty, but the satisfaction of a solid development session is palpable. Today, we pushed hard on a critical new feature: Project Sync with Branch Selection (Phase 1). This isn't just about pulling code; it's about creating a living, breathing connection between our application's understanding of a project and its actual state on GitHub, with real-time feedback every step of the way.
We set out to tackle 13 tasks for this initial phase, and as the dust settles, 9 of them are firmly in the "done" column. The remaining four are integration and polish – the kind of tasks that feel like unwrapping a gift you've been meticulously crafting.
The Core Challenge: Bridging the Gap
Our goal was clear: allow users to select a branch from their connected GitHub repository and trigger a full synchronization. This sync needed to update not just the repository's files but also our internal MemoryEntry system, which processes and stores contextual information about the code. Crucially, this had to be a real-time, user-friendly experience, providing immediate feedback on progress.
Here's how we broke it down and what we built:
1. Laying the Data Foundation: Schema Evolution
Any robust feature starts with its data model. We introduced the ProjectSync model to track each synchronization attempt, its status, and the branch it targeted. But it wasn't just a new table; it required extending existing models:
- MemoryEntry: To associate processed memory items with specific syncs.
- RepositoryFile: To link file versions to the sync that brought them in.
- Repository: To establish the one-to-many relationship with ProjectSync records.
This careful extension ensures a clear historical record and allows us to filter data based on its synchronization context later on.
2. Speaking to GitHub: The Connector's New Voice
To enable branch selection, our github-connector.ts needed some new muscles:
- fetchBranches(repoId): To list all available branches for a given repository.
- fetchBranchHead(repoId, branchName): To get the latest commit SHA for a specific branch.
- fetchRepoTreeWithSha(repoId, treeSha): To fetch the entire file tree for a given commit.
These functions form the backbone of our ability to inspect and retrieve repository data directly from GitHub. Handling API rate limits, authentication, and potential network errors became a critical consideration here.
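As a rough illustration of what two of these connector functions might look like, here is a minimal sketch against the GitHub REST endpoints `GET /repos/{owner}/{repo}/branches` and `GET /repos/{owner}/{repo}/git/trees/{tree_sha}`. The `owner`/`repo` parameters, the `githubGet` helper, the `FetchLike` type, and the return shapes are assumptions made for this example. The real connector takes a `repoId` and may be structured quite differently.

```typescript
// Minimal fetch-like signature so the sketch does not depend on DOM or Node typings.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface BranchSummary {
  name: string;
  commitSha: string;
}

interface TreeEntry {
  path: string;
  type: string;
  sha: string;
}

const GITHUB_API = 'https://api.github.com';

// Hypothetical helper: authenticated GET that turns non-2xx responses into errors.
async function githubGet(fetchFn: FetchLike, token: string, path: string): Promise<unknown> {
  const res = await fetchFn(`${GITHUB_API}${path}`, {
    headers: {
      Accept: 'application/vnd.github+json',
      Authorization: `Bearer ${token}`,
    },
  });
  if (!res.ok) {
    // 403 and 429 typically indicate rate limiting; the caller decides whether to retry.
    throw new Error(`GitHub request failed: ${res.status} for ${path}`);
  }
  return res.json();
}

// Lists branches together with the SHA of each branch's head commit.
async function fetchBranches(
  fetchFn: FetchLike,
  token: string,
  owner: string,
  repo: string,
): Promise<BranchSummary[]> {
  const body = (await githubGet(fetchFn, token, `/repos/${owner}/${repo}/branches`)) as Array<{
    name: string;
    commit: { sha: string };
  }>;
  return body.map((b) => ({ name: b.name, commitSha: b.commit.sha }));
}

// Fetches the recursive file tree for a commit or tree SHA.
async function fetchRepoTree(
  fetchFn: FetchLike,
  token: string,
  owner: string,
  repo: string,
  treeSha: string,
): Promise<{ entries: TreeEntry[]; truncated: boolean }> {
  const path = `/repos/${owner}/${repo}/git/trees/${treeSha}?recursive=1`;
  const body = (await githubGet(fetchFn, token, path)) as { tree: TreeEntry[]; truncated: boolean };
  // GitHub sets `truncated` when the tree is too large to return in one response.
  return { entries: body.tree, truncated: body.truncated };
}
```

Passing the fetch implementation in as a parameter is a choice made for this sketch: production code can hand over the global `fetch`, while tests can supply a stub that never touches the network.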
3. The Real-time Engine: AsyncGenerator & SSE
This is where things got really interesting. A full repository sync can be a long-running process. We needed a way to:
- Perform complex, asynchronous operations (fetching files, processing them).
- Emit progress updates incrementally.
- Stream these updates to the frontend in real-time.
Enter the AsyncGenerator pipeline in project-sync-service.ts:
// Conceptual example of an AsyncGenerator for syncing
async function* syncProject(repoId: string, branch: string): AsyncGenerator<SyncProgressEvent, void, unknown> {
  yield { type: 'PHASE_START', phase: 'Fetching Branches' };
  const branches = await githubConnector.fetchBranches(repoId);
  yield { type: 'PHASE_COMPLETE', phase: 'Fetching Branches', data: branches };

  // Resolve the branch to the SHA whose tree we sync
  // (assumes fetchBranchHead resolves to the SHA string)
  const treeSha = await githubConnector.fetchBranchHead(repoId, branch);

  // ... more phases like fetching tree, processing files, saving to DB
  for await (const fileEvent of processFilesStream(repoId, treeSha)) {
    yield { type: 'FILE_PROGRESS', data: fileEvent };
  }

  yield { type: 'SYNC_COMPLETE', message: 'Project synced successfully!' };
}
This pattern is incredibly powerful. It allows us to orchestrate a complex sequence of asynchronous steps, yielding progress events along the way.
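To make the pattern concrete, here is a small self-contained sketch of a phased pipeline that can be run without GitHub or a database. The event shapes, the `SYNC_FAILED` event, and the `runSync`, `processFiles`, and `collect` names are assumptions for illustration and are not taken from project-sync-service.ts.

```typescript
// Discriminated union of progress events; the field names are illustrative.
type SyncProgressEvent =
  | { type: 'PHASE_START'; phase: string }
  | { type: 'PHASE_COMPLETE'; phase: string }
  | { type: 'FILE_PROGRESS'; path: string; processed: number; total: number }
  | { type: 'SYNC_COMPLETE'; message: string }
  | { type: 'SYNC_FAILED'; message: string };

// Stand-in for the real file-processing stage: emits one event per file.
async function* processFiles(paths: string[]): AsyncGenerator<SyncProgressEvent> {
  let processed = 0;
  for (const path of paths) {
    await Promise.resolve(); // placeholder for fetching and parsing the file
    processed += 1;
    yield { type: 'FILE_PROGRESS', path, processed, total: paths.length };
  }
}

// Orchestrates the phases. Errors become a final event instead of a thrown
// exception, so whoever consumes the stream always sees a terminal event.
async function* runSync(listFiles: () => Promise<string[]>): AsyncGenerator<SyncProgressEvent> {
  try {
    yield { type: 'PHASE_START', phase: 'Fetching Tree' };
    const paths = await listFiles();
    yield { type: 'PHASE_COMPLETE', phase: 'Fetching Tree' };

    yield { type: 'PHASE_START', phase: 'Processing Files' };
    yield* processFiles(paths); // delegate to the nested generator
    yield { type: 'PHASE_COMPLETE', phase: 'Processing Files' };

    yield { type: 'SYNC_COMPLETE', message: 'Project synced successfully!' };
  } catch (err) {
    yield { type: 'SYNC_FAILED', message: err instanceof Error ? err.message : String(err) };
  }
}

// Drains any async iterable into an array; handy in tests.
async function collect<T>(source: AsyncIterable<T>): Promise<T[]> {
  const out: T[] = [];
  for await (const item of source) out.push(item);
  return out;
}
```

Because the generator is pull-based, nothing runs until a consumer iterates it, and a slow consumer naturally slows the producer down. Whether a failure should surface as an event or as a thrown error is a design choice; the sketch picks the event so the client can render a clear end state.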
To get these events to the client, we implemented a Server-Sent Events (SSE) endpoint at /api/v1/events/project-sync/[syncId]. SSE fits this one-way stream of updates: it runs over plain HTTP, is simpler than WebSockets when only the server needs to push, and the browser's EventSource client reconnects automatically after a dropped connection, which makes it well suited to displaying live progress.
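On the wire, each SSE message is a few `field: value` lines followed by a blank line. The sketch below shows one way to turn a stream of progress events into those frames. The `encodeSseFrame` and `toSseStream` names and the choice to send an incrementing `id` are assumptions for this example, not a description of the actual endpoint.

```typescript
// Any event with a string `type` discriminator can be framed.
interface SseEvent {
  type: string;
  [key: string]: unknown;
}

// Encodes one event as an SSE frame: optional id, event name, one data line,
// then a blank line that terminates the message.
function encodeSseFrame(event: SseEvent, id?: number): string {
  const lines: string[] = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  lines.push(`event: ${event.type}`);
  // JSON.stringify without indentation never emits raw newlines,
  // so the payload always fits on a single data line.
  lines.push(`data: ${JSON.stringify(event)}`);
  return lines.join('\n') + '\n\n';
}

// Adapts an async stream of events into an async stream of SSE frames.
async function* toSseStream(events: AsyncIterable<SseEvent>): AsyncGenerator<string> {
  let id = 0;
  for await (const event of events) {
    id += 1;
    yield encodeSseFrame(event, id);
  }
}
```

An HTTP handler would respond with `Content-Type: text/event-stream` and write each frame as it is produced. Note that once frames carry an `event:` field, a browser `EventSource` delivers them to listeners registered with `addEventListener` for that event name rather than to `onmessage`.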
4. The API Gateway: tRPC's projects.sync Sub-router
Our API layer uses tRPC, giving us end-to-end type safety. We created a dedicated projects.sync sub-router with endpoints for:
- getBranches: To populate the branch dropdown.
- getStatus: To check the current sync state.
- startSync: To initiate a new sync for a selected branch.
- getHistory: To view past syncs.
- restoreMemory: A future-proofing endpoint for restoring specific memory states.
tRPC's autocompletion and type inference make working with these endpoints a breeze on both the backend and frontend.
5. Bringing it to Life: The Frontend Experience
On the React side, we built the user interface to interact with this real-time system:
- useProjectSync Hook: This custom hook wraps our useSSE hook, abstracting away the SSE connection details and providing a clean interface for any component to subscribe to sync events and status updates. It manages the connection lifecycle and parses incoming messages.
- SyncBanner Component: This component provides dynamic visual feedback. Think phase dots, a progress bar, and real-time statistics (files processed, errors encountered). It's crucial for managing user expectations during a potentially long operation.
- SyncControls Component: This is where the user interacts. It features a branch selection dropdown (populated by getBranches) and a "Sync Now" button (triggering startSync).
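One way to keep a hook like this easy to test is to fold incoming events into view state with a pure reducer and let the banner render from that state. The sketch below is framework-free; the `SyncViewState` shape, the `ok` flag on file events, and the `reduceSyncState` name are assumptions for illustration rather than the actual hook internals.

```typescript
// Events as they might arrive over SSE after JSON parsing (shapes are illustrative).
type SyncEvent =
  | { type: 'PHASE_START'; phase: string }
  | { type: 'PHASE_COMPLETE'; phase: string }
  | { type: 'FILE_PROGRESS'; ok: boolean }
  | { type: 'SYNC_COMPLETE'; message: string };

// Everything the banner needs: phase dots, counters, and overall status.
interface SyncViewState {
  status: 'idle' | 'running' | 'done';
  currentPhase: string | null;
  completedPhases: string[];
  filesProcessed: number;
  errors: number;
}

const initialSyncState: SyncViewState = {
  status: 'idle',
  currentPhase: null,
  completedPhases: [],
  filesProcessed: 0,
  errors: 0,
};

// Pure reducer: returns a new state for each event and never mutates the old one.
function reduceSyncState(state: SyncViewState, event: SyncEvent): SyncViewState {
  switch (event.type) {
    case 'PHASE_START':
      return { ...state, status: 'running', currentPhase: event.phase };
    case 'PHASE_COMPLETE':
      return {
        ...state,
        currentPhase: null,
        completedPhases: [...state.completedPhases, event.phase],
      };
    case 'FILE_PROGRESS':
      return {
        ...state,
        filesProcessed: state.filesProcessed + 1,
        errors: state.errors + (event.ok ? 0 : 1),
      };
    case 'SYNC_COMPLETE':
      return { ...state, status: 'done', currentPhase: null };
    default:
      // Unknown event types from the wire are ignored rather than crashing the UI.
      return state;
  }
}
```

In a React hook this reducer could be handed to `useReducer`, with the SSE message handler dispatching each parsed event.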
Lessons Learned & Potholes Navigated
No development session is complete without a few "aha!" moments or "d'oh!" realizations.
1. The @unique Constraint: Ensuring Sync History Integrity
We hit a snag early on with our ProjectSync model. We wanted a previousSyncId field to link syncs in a chain, creating a clear history. However, without proper constraints, it was possible for multiple ProjectSync records to point to the same previousSyncId, breaking the single-chain history we envisioned.
The Fix: Adding an @unique constraint to previousSyncId in our Prisma schema.
model ProjectSync {
  id           String     @id @default(cuid())
  projectId    String
  branch       String
  status       SyncStatus
  startedAt    DateTime   @default(now())
  completedAt  DateTime?
  errorMessage String?

  // Ensures only one sync can directly follow a previous sync
  previousSync   ProjectSync? @relation("SyncHistory", fields: [previousSyncId], references: [id])
  previousSyncId String?      @unique // <-- The critical unique constraint!
  nextSync       ProjectSync? @relation("SyncHistory")

  // ... other fields
}
This subtle but crucial change ensures that no two syncs can claim the same predecessor, so the history stays a single chain instead of branching into a tree, maintaining a pristine lineage of project states.
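To see what the constraint protects against, the following sketch models sync records in memory and checks the chain property directly. The `SyncRecord` shape mirrors only the two fields that matter here, and `findForkedPredecessors` and `historyFrom` are hypothetical helpers written for this example.

```typescript
// Only the fields relevant to the history chain.
interface SyncRecord {
  id: string;
  previousSyncId: string | null;
}

// Returns predecessor ids claimed by more than one sync.
// With the unique constraint in place this list is always empty.
function findForkedPredecessors(records: SyncRecord[]): string[] {
  const counts = new Map<string, number>();
  for (const r of records) {
    if (r.previousSyncId !== null) {
      counts.set(r.previousSyncId, (counts.get(r.previousSyncId) ?? 0) + 1);
    }
  }
  return Array.from(counts.entries())
    .filter(([, n]) => n > 1)
    .map(([id]) => id);
}

// Walks from a sync back to the first one, newest to oldest.
// The `seen` set guards against accidental cycles in bad data.
function historyFrom(records: SyncRecord[], startId: string): string[] {
  const byId = new Map(records.map((r) => [r.id, r] as const));
  const chain: string[] = [];
  const seen = new Set<string>();
  let current = byId.get(startId);
  while (current && !seen.has(current.id)) {
    seen.add(current.id);
    chain.push(current.id);
    current = current.previousSyncId === null ? undefined : byId.get(current.previousSyncId);
  }
  return chain;
}
```

Walking backwards is unambiguous even without the constraint, since each record has at most one `previousSyncId`. It is the forward direction (the `nextSync` side of the relation) that needs uniqueness to stay single-valued.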
2. Schema Migrations: Don't Forget to Push!
It's a classic: you define your beautiful new schema, build out all your services and components, and then realize you haven't actually applied the changes to the database!
The Takeaway: Always integrate your schema migration process into your development workflow. For Prisma, it's npx prisma migrate dev --name <migration_name> early and often when making schema changes. Forgetting this leads to confusing runtime errors (Table '...' doesn't exist) that can waste precious debugging time.
3. Consistent Database Access: ctx.prisma vs. Bare prisma
When working within our tRPC routers, there was a question about how to access the Prisma client. Should we import a global prisma instance, or use ctx.prisma which is passed in through the tRPC context?
The Decision: Always use ctx.prisma within tRPC resolvers.
// context.ts
import type { CreateNextContextOptions } from '@trpc/server/adapters/next';
import { prisma } from './db'; // assumed location of the shared Prisma client

export const createContext = async ({ req, res }: CreateNextContextOptions) => {
  // ... authentication, etc.
  return {
    req,
    res,
    prisma, // The Prisma client instance
    // ... other context values
  };
};

// tRPC router
import { z } from 'zod';
// `t` is the initTRPC instance created elsewhere in the project

export const projectsRouter = t.router({
  sync: t.procedure
    .input(z.object({ projectId: z.string(), branch: z.string() }))
    .mutation(async ({ input, ctx }) => {
      // Use ctx.prisma for consistency and potential transaction benefits
      const newSync = await ctx.prisma.projectSync.create({
        data: { projectId: input.projectId, branch: input.branch, status: 'STARTED' },
      });
      // ... start the async generator
      return newSync.id;
    }),
});
Using ctx.prisma ensures consistency, allows for easier testing (you can mock ctx.prisma), and paves the way for advanced patterns like transaction management within a request context, should we need it in the future.
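The testing benefit is easiest to see with the resolver body pulled out into a plain function that depends only on the slice of the context it uses. The sketch below does that with a hand-written in-memory fake. The `SyncContext` interface, `startSyncHandler`, and `createFakeContext` are hypothetical names for this example; a real setup would more likely type `ctx` from the tRPC context and use a mocking library.

```typescript
interface ProjectSyncRow {
  id: string;
  projectId: string;
  branch: string;
  status: string;
}

type NewProjectSync = Omit<ProjectSyncRow, 'id'>;

// Narrow slice of the Prisma client that this handler actually touches.
interface SyncContext {
  prisma: {
    projectSync: {
      create(args: { data: NewProjectSync }): Promise<ProjectSyncRow>;
    };
  };
}

// Resolver body as a plain function: all database access goes through ctx.
async function startSyncHandler(
  ctx: SyncContext,
  input: { projectId: string; branch: string },
): Promise<string> {
  const newSync = await ctx.prisma.projectSync.create({
    data: { projectId: input.projectId, branch: input.branch, status: 'STARTED' },
  });
  return newSync.id;
}

// In-memory fake that records created rows so a test can inspect them.
function createFakeContext(): SyncContext & { rows: ProjectSyncRow[] } {
  const rows: ProjectSyncRow[] = [];
  return {
    rows,
    prisma: {
      projectSync: {
        async create(args: { data: NewProjectSync }): Promise<ProjectSyncRow> {
          const row: ProjectSyncRow = { id: `sync_${rows.length + 1}`, ...args.data };
          rows.push(row);
          return row;
        },
      },
    },
  };
}
```

A handler that imported a global `prisma` directly could not be exercised this way without patching module imports, which is the practical argument for routing access through the context.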
What's Next: Wrapping Up Phase 1
With the core logic and UI components largely in place, our immediate next steps are focused on integration and validation:
- Integrate SyncControls into project-overview.tsx: Hook up the new UI to the main project view.
- Filter superseded entries from active queries: This is crucial. Once a new sync completes, older MemoryEntry records might become irrelevant. We need to implement logic to ensure our active queries only show the most pertinent, up-to-date information.
- Typecheck + build verification: The first full compilation check after significant changes is always a moment of truth. This will catch any lurking type errors or integration mismatches.
- Production deployment: The final frontier for Phase 1.
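The filtering step is not built yet, so the following is only one possible shape for it, sketched in memory. It assumes each MemoryEntry carries a nullable `syncId`, that syncs have `COMPLETED` and `FAILED` statuses alongside `STARTED`, and that "active" means "belongs to the most recently completed sync". None of those details are settled in the design described here.

```typescript
interface SyncInfo {
  id: string;
  status: 'STARTED' | 'COMPLETED' | 'FAILED'; // status names are assumed
  completedAt: Date | null;
}

interface MemoryEntryRow {
  id: string;
  syncId: string | null; // null for entries that did not come from a sync (assumed)
  content: string;
}

// Picks the most recently completed sync, ignoring running and failed ones.
function latestCompletedSyncId(syncs: SyncInfo[]): string | null {
  let latestId: string | null = null;
  let latestTime = -Infinity;
  for (const s of syncs) {
    if (s.status === 'COMPLETED' && s.completedAt !== null && s.completedAt.getTime() > latestTime) {
      latestTime = s.completedAt.getTime();
      latestId = s.id;
    }
  }
  return latestId;
}

// Keeps entries from the active sync plus entries not tied to any sync.
function activeEntries(entries: MemoryEntryRow[], syncs: SyncInfo[]): MemoryEntryRow[] {
  const activeSyncId = latestCompletedSyncId(syncs);
  if (activeSyncId === null) return entries; // nothing has completed yet: show everything
  return entries.filter((e) => e.syncId === activeSyncId || e.syncId === null);
}
```

In the real application the same rule would more likely live in a database query than in application code, but the in-memory version makes the intended semantics easy to check.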
Conclusion
Building real-time features like Project Sync is a complex but incredibly rewarding endeavor. From the database schema to the frontend's reactive components, every piece plays a vital role. Leveraging tools like AsyncGenerator for backend streaming and SSE for real-time client updates proved to be a powerful combination.
The journey isn't over, but Phase 1 has laid a robust foundation. We've learned valuable lessons about schema design, API integration, and the nuances of real-time communication. Now, it's time to iron out the final kinks and get this feature into the hands of users.
What are your experiences building real-time sync features? Share your thoughts and challenges in the comments below!