Unlocking Developer Memories: Building an AI-Powered Blog Pipeline from GitHub
We just hit a major milestone on our journey to automate content creation from developer notes. Discover how we built an end-to-end pipeline to transform GitHub memories into engaging blog posts, the challenges we overcame, and what's next!
Imagine a world where your detailed development notes, your "memory files," could automatically transform into polished, public-ready blog posts. That's the vision driving our latest project, "Letter to Myself." It's an ambitious endeavor to bridge the gap between raw development insights and engaging content, and I'm thrilled to share that we've just hit a significant milestone: the core pipeline is fully functional and tested end-to-end!
This past session, we brought the dream to life: a user successfully created a project, imported 10 memory files directly from a GitHub repository, and generated 9+ compelling blog posts. Let's dive into how we did it, the tech stack that made it possible, and the crucial lessons we learned along the way.
The Vision: From GitHub Commits to Blog Posts
Our goal was clear: to create an automated pipeline that takes your development "memories" (think markdown files, session logs, or detailed notes) stored in GitHub, processes them, and generates blog posts using AI. But it's more than just automation; it's about providing a structured, project-based UI that lets you manage these memories, preview generated content, and publish with ease, all in a mobile-first experience.
Here's the high-level flow we implemented:
- GitHub Integration (BYOK): Connect securely to your GitHub repositories using your own token.
- Memory Import: Browse and select specific "memory files" (markdown, etc.) from your repos.
- AI Blog Generation: Feed these memories to an AI, which crafts them into engaging blog posts.
- Project-Based UI: Organize all your imported memories and generated blogs within distinct projects.
- Markdown Rendering: Display the generated content beautifully with syntax highlighting and robust markdown support.
- Mobile-First Design: Ensure the entire experience is seamless on any device.
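The flow above can be sketched as a chain of small, typed stages. All names and shapes here are illustrative stand-ins, not the project's actual service APIs:

```typescript
// Illustrative types for the pipeline stages (names are hypothetical).
interface MemoryFile {
  path: string;
  content: string;
}

interface BlogPost {
  title: string;
  body: string;
  sourcePath: string;
}

// Stage 1: import memory files (stubbed with in-memory data here;
// the real pipeline pulls these from a GitHub repo).
function importMemories(files: MemoryFile[]): MemoryFile[] {
  // Only markdown files count as "memories" in this sketch.
  return files.filter((f) => f.path.endsWith(".md"));
}

// Stage 2: turn one memory into a blog post
// (a real system would call an AI provider here).
function generateBlogPost(memory: MemoryFile): BlogPost {
  const firstLine = memory.content.split("\n")[0] ?? "Untitled";
  return { title: firstLine, body: memory.content, sourcePath: memory.path };
}

// End-to-end: import, then generate one post per memory.
function runPipeline(files: MemoryFile[]): BlogPost[] {
  return importMemories(files).map(generateBlogPost);
}
```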
Under the Hood: Crafting the Pipeline
Bringing this vision to life required a robust full-stack architecture. Here’s a look at the key components we built and integrated:
The Data Foundation: Prisma & PostgreSQL
At the heart of our application are the Project and BlogPost models, meticulously defined with Prisma. This allowed us to quickly set up our PostgreSQL database schema, handling tenant and user relations to keep everything organized and secure.
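A minimal sketch of what those two models might look like in Prisma schema terms. The field names and relations here are guesses for illustration, not the project's actual schema:

```prisma
// Hypothetical sketch, not the real schema: the actual models also carry
// tenant and user relations plus the imported memory entries.
model Project {
  id        String     @id @default(cuid())
  name      String
  tenantId  String
  userId    String
  blogPosts BlogPost[]
  createdAt DateTime   @default(now())
}

model BlogPost {
  id        String   @id @default(cuid())
  title     String
  content   String   // generated markdown
  projectId String
  project   Project  @relation(fields: [projectId], references: [id])
  createdAt DateTime @default(now())
}
```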
Connecting to the Source: GitHub Connector Service
One of the most critical pieces is src/server/services/github-connector.ts. This service is our gateway to GitHub, implementing a "Bring Your Own Key" (BYOK) approach for security and user control. It handles everything from resolving user tokens to:
- Fetching a list of all accessible repositories.
- Checking for a specified "memory path" within a repo.
- Listing all memory files in that path.
- Fetching the raw content of selected files.
- Synchronizing this content into our database, ready for AI processing.
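The listing step can be sketched as a pure function over the shape of GitHub's repository-contents API response. The real connector's method names differ, and it fetches this listing with the user's own token (BYOK):

```typescript
// Minimal shape of one entry in GitHub's "get repository content" response.
interface GitHubContentEntry {
  name: string;
  path: string;
  type: "file" | "dir";
}

// Pick out the markdown "memory files" from a directory listing.
function listMemoryFiles(entries: GitHubContentEntry[]): string[] {
  return entries
    .filter((e) => e.type === "file" && e.name.endsWith(".md"))
    .map((e) => e.path);
}

// Build the API URL a connector would call for a given memory path.
function contentsUrl(owner: string, repo: string, memoryPath: string): string {
  return `https://api.github.com/repos/${owner}/${repo}/contents/${memoryPath}`;
}
```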
The Brain: AI Blog Generator Service
The magic happens in src/server/services/blog-generator.ts. We ported our core blog generation logic from a Python prototype (blog_gen.py) into TypeScript, leveraging the AnthropicProvider for powerful AI capabilities. This service takes your raw memory content and, following a carefully crafted prompt, transforms it into a coherent and engaging blog post.
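In outline, that generation step is: assemble a prompt around the memory content, then hand it to the provider. The prompt text and provider interface below are hypothetical; the real prompt was ported from blog_gen.py and the real provider is the AnthropicProvider:

```typescript
// Hypothetical prompt builder; the actual prompt ported from blog_gen.py differs.
function buildBlogPrompt(memoryContent: string): string {
  return [
    "You are a technical blogger.",
    "Turn the following development notes into an engaging blog post.",
    "Keep code snippets intact and use markdown headings.",
    "",
    "--- NOTES ---",
    memoryContent,
  ].join("\n");
}

// The provider is sketched as an interface so the generator stays testable
// without a real API call.
interface AIProvider {
  complete(prompt: string): Promise<string>;
}

async function generatePost(provider: AIProvider, memory: string): Promise<string> {
  return provider.complete(buildBlogPrompt(memory));
}
```

Keeping the provider behind an interface like this also makes it easy to swap models or stub the AI out in tests.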
API & UI Orchestration: tRPC & React
Our API layer, built with tRPC, is exposed via src/server/trpc/routers/projects.ts. This router provides a comprehensive set of endpoints for:
- CRUD operations on Project and BlogPost models.
- Interacting with the GitHub connector (listing repos, checking paths, importing files).
- Triggering single or batch blog post generations.
- Managing blog post states (listing, getting, updating, deleting).
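To give a feel for that surface area, here is the router modeled as plain async functions over an in-memory store. The real implementation uses tRPC procedures with Zod input validation and Prisma; only the endpoint shape is meant to carry over:

```typescript
// Simplified stand-in for the tRPC router: plain async functions,
// in-memory storage instead of Prisma.
interface BlogPostRecord {
  id: string;
  title: string;
}

const store: BlogPostRecord[] = [];

const projectsRouter = {
  blogPosts: {
    async create(input: BlogPostRecord): Promise<BlogPostRecord> {
      store.push(input);
      return input;
    },
    async list(): Promise<BlogPostRecord[]> {
      return [...store];
    },
    async delete(input: { id: string }): Promise<void> {
      const i = store.findIndex((p) => p.id === input.id);
      if (i >= 0) store.splice(i, 1);
    },
  },
};
```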
On the frontend, we built out four core pages: a projects list, a new project creation flow, a detailed project view (with tabs for memories and blog posts), and a dedicated blog post viewer. We integrated react-markdown with remark-gfm and a custom Nyx theme in src/components/markdown-renderer.tsx to ensure our generated content looks stunning. Navigation was streamlined with a sidebar for desktop and a mobile bottom nav for touch devices.
Navigating the Rapids: Lessons Learned
No significant development session is without its challenges. Here are some of the key "pain points" we encountered and the solutions that emerged, offering valuable lessons for any developer:
1. The Elusive refetch() on Disabled Queries
The Problem: We wanted to trigger a tRPC query (github.repos.useQuery) to fetch repositories only when a user clicked a button. Our initial approach was to use enabled: false and call reposQuery.refetch() on button click.
The Pitfall: refetch() on a disabled query (especially in React Query v5, which tRPC leverages) is unreliable and often does nothing. The query simply remains disabled.
The Solution: Instead of relying on refetch(), we introduced a useState flag (loadRepos). This flag is initialized to false, passed as the enabled prop to useQuery, and set to true when the button is clicked. This correctly enables the query and triggers the fetch.
```tsx
// Before (Problematic)
const reposQuery = trpc.projects.github.repos.useQuery(undefined, {
  enabled: false, // Query is disabled
});
// In button handler:
reposQuery.refetch(); // Often does nothing

// After (Solution)
const [loadRepos, setLoadRepos] = useState(false);
const reposQuery = trpc.projects.github.repos.useQuery(undefined, {
  enabled: loadRepos, // Query is enabled when loadRepos is true
});
// In button handler:
setLoadRepos(true); // This enables the query and triggers the fetch
```
2. Real-time Progress with Batch AI Generation
The Problem: We initially designed a generateBatch mutation that would send a list of memory files to the server, expecting the server to generate posts sequentially. However, the client would get stuck at "0/X generated," providing no real-time feedback. Additionally, a server-side Zod validation (.max(10)) prevented selecting more than 10 memories.
The Pitfall: Long-running server-side batch operations, when initiated by a single request, don't naturally provide progress updates to the client unless specifically designed for streaming or polling.
The Solution: We shifted the batching logic to the client. Instead of a single generateBatch call, we now make sequential generateSingle.mutateAsync() calls in a for loop on the client side. After each successful generation, we update a client-side progress state. This provides immediate, live feedback to the user ("1/10 generated," "2/10 generated," etc.) and bypasses the Zod .max(10) validation for the batch (it still applies per single generation, which is fine).
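The client-side loop boils down to a small helper like the one below. The function names mirror the post (`generateSingle` stands in for `generateSingle.mutateAsync()`), but the wiring is a simplified sketch, with the React `setState` call abstracted into a progress callback:

```typescript
// Client-side batching: run one generation at a time and report progress.
async function generateBatchOnClient(
  memoryIds: string[],
  generateSingle: (id: string) => Promise<void>,
  onProgress: (done: number, total: number) => void,
): Promise<number> {
  let done = 0;
  for (const id of memoryIds) {
    await generateSingle(id); // sequential: next starts only after this resolves
    done += 1;
    onProgress(done, memoryIds.length); // e.g. setState for "3/10 generated"
  }
  return done;
}
```

Because each iteration awaits its mutation before updating the counter, the UI can show live "N/total generated" feedback without any server-side streaming or polling.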
3. The Next.js Cache Corruption Dance
The Problem: After making Prisma schema changes, we ran rm -rf .next to clear the Next.js cache while the development server was still running.
The Pitfall: Deleting the .next directory while the dev server is active can lead to internal Next.js cache corruption, often manifesting as obscure clientModules errors. Furthermore, Prisma schema changes require prisma generate to update the client before the dev server starts.
The Solution: Always stop the Next.js development server first. Then, delete .next. After any Prisma schema changes, run prisma generate. Finally, restart the dev server. This ensures a clean slate and correctly generated Prisma client.
4. Taming TypeScript's Type Inference
The Problem: We tried to infer the type of a component prop using ReturnType<typeof trpc.projects.blogPosts.unblogged.useQuery>.
The Pitfall: TypeScript, in certain contexts, can infer data from useQuery as a generic {} type, losing the specific structure of the query's return value. This made it impossible to safely access properties like .length or .map() on the prop.
The Solution: For complex query return types that need to be passed as props, it's often clearer and more reliable to define an explicit interface or type for the prop. This ensures TypeScript has the exact shape of the data, providing robust type checking.
```tsx
// Before (Problematic)
// type UnbloggedEntriesProp = ReturnType<typeof trpc.projects.blogPosts.unblogged.useQuery>;
// Then trying to access unbloggedEntries.data.length might be an error

// After (Solution)
interface UnbloggedEntry {
  id: string;
  // ... other properties of an unblogged memory entry
  fileName: string;
  contentLength: number;
}

interface UnbloggedEntriesProp {
  unbloggedEntries: UnbloggedEntry[];
  // ... other props
}
```
What's Next?
With the core pipeline stable and functional, our immediate next steps involve refining the user experience and adding more advanced features:
- Mobile Layout Refinements: Thoroughly test and polish the mobile layout at 375px, ensuring sticky buttons, touch targets, and collapsible blog cards are perfect.
- Error Notifications: Implement proper toast notifications for failed generations and other critical errors.
- Pagination: Consider adding pagination for projects with a large number of blog posts to improve performance and user experience.
- Blog Post Editing: Integrate and test an edit mode for blog posts, allowing users to fine-tune the AI-generated content (raw markdown textarea toggle is ready!).
- Regeneration Flow: Test the regenerate feature, ensuring it correctly replaces existing blog post content from a memory entry.
- Fixing Known Issues: Address a pre-existing TypeScript error related to a Badge variant in another part of the dashboard.
This session marked a huge leap forward for "Letter to Myself." We've moved from concept to a tangible, working system that can transform raw developer memories into engaging blog content. The journey continues, and I'm excited to see where it takes us next!