From Brainstorm to Breakthrough: Shipping a Full LLM Memory System with pgvector
We just hit a massive milestone: the complete end-to-end LLM-powered memory system, featuring pgvector for blazing-fast semantic search, is live and operational. Dive into the tech, the triumphs, and the hard-won lessons.
It's 2 AM, the coffee's cold, but the code is hot. Tonight marked a significant victory: the full project-workflow memory system, including the advanced pgvector Phase 2 implementation, is officially done. Everything's pushed to origin/main at 9b924f9, and the system is humming end-to-end.
This isn't just about storing text; it's about giving our LLM-powered workflows a genuine, searchable, and semantically rich long-term memory. Let's unpack how we got here, what we built, and the invaluable lessons learned along the way.
The Quest for LLM Memory: Why pgvector?
Imagine an AI assistant that truly remembers your past projects, insights, and decisions. That's the core problem we're solving. Our LLM-driven workflows needed a way to pull in relevant historical context, not just keyword matches, but semantically similar information.
Enter pgvector. PostgreSQL is already our reliable workhorse, and pgvector transforms it into a powerful vector database capable of handling high-dimensional embeddings and performing lightning-fast similarity searches. Phase 2 of this integration was all about bringing pgvector to its full potential: implementing HNSW indexing and verifying its performance.
pgvector Phase 2: From Zero to Semantic Search
This was the technical centerpiece of the session. Here's how we upgraded our database to handle the demands of semantic memory:
- **Docker Migration:** We swapped out our plain `postgres:16-alpine` Docker image for the specialized `pgvector/pgvector:pg16`. Crucially, we ensured data integrity by preserving our named Docker volume. No data loss, just a better engine.

- **Enabling the Extension:** With the new image, enabling `pgvector` was a single command:

  ```sql
  CREATE EXTENSION vector;
  ```

  (Verified `v0.8.1` was installed.)

- **Schema Evolution:** We added a new `embedding` column to our `workflow_insights` table to store the vector representations of our insights:

  ```sql
  ALTER TABLE workflow_insights ADD COLUMN embedding vector(1536);
  ```

  The `vector(1536)` type corresponds to the output dimension of OpenAI's `text-embedding-3-small` model, which we're using for generating embeddings.

- **HNSW Indexing for Speed:** This is where the magic happens for performance. We created an HNSW (Hierarchical Navigable Small World) index on our `embedding` column:

  ```sql
  CREATE INDEX ON workflow_insights
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
  ```

  HNSW is a cutting-edge approximate nearest neighbor (ANN) algorithm. It allows us to search through millions of high-dimensional vectors in milliseconds, crucial for a responsive memory system. `m` and `ef_construction` are parameters that balance index size, build time, and search quality/speed.

- **Backfilling & Verification:** We backfilled all 10 existing insights with embeddings generated by OpenAI's `text-embedding-3-small` model. A quick test confirmed that cosine similarity searches were returning "sensible clustering"—meaning related insights were indeed grouped together, just as we hoped.
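For a rough feel of the backfill step, here's a hedged sketch, not our exact script: the model name comes from this post, but the `openai` and database calls are assumed shapes and shown only as comments. The generic batching helper is the reusable core:

```typescript
// Split an array into batches so we don't send every insight to the
// embeddings API in one oversized request.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Sketch of the backfill loop (assumed client shapes, not runnable as-is):
//
// for (const batch of chunk(insights, 100)) {
//   const res = await openai.embeddings.create({
//     model: "text-embedding-3-small",
//     input: batch.map((i) => i.content),
//   });
//   // pgvector accepts a '[x,y,...]' literal, so each vector can be
//   // written back with a raw UPDATE against workflow_insights.
// }
```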
This setup provides the backbone for our LLM to semantically understand and retrieve relevant historical context, moving far beyond simple keyword searches.
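For intuition, pgvector's `vector_cosine_ops` (and its `<=>` operator) computes cosine distance, i.e. 1 − cosine similarity. A minimal TypeScript rendition of what the database evaluates per comparison:

```typescript
// Cosine distance between two equal-length vectors:
// 0 = same direction, 1 = orthogonal, 2 = opposite.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

The HNSW index gives an approximate answer to "smallest `cosineDistance` to the query vector" without scanning every row.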
Building the Interface: The MemoryPicker
What's a powerful backend without a usable front end? We built the MemoryPicker (`src/components/workflow/memory-picker.tsx`) to allow users to interact with this new memory system.
The MemoryPicker allows users to:
- Search through insights (soon to be powered by hybrid semantic search!).
- Filter by category chips and severity badges.
- View expandable details for each insight.
- See a live `{{memory}}` preview of what the LLM will receive.
This component was then seamlessly integrated into our `new/page.tsx` workflow page, allowing selected `memoryIds` to feed directly into the LLM's context during workflow creation.
We also added CollapsibleSection components to wrap areas like Consolidations, Personas, Docs, and Memory, making the UI cleaner and showing item counts with badges.
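The `{{memory}}` preview boils down to rendering the selected insights into a single context string. A simplified sketch (the `Insight` field names here are assumptions for illustration, not our exact schema):

```typescript
type Insight = {
  id: string;
  category: string;
  severity: string;
  content: string;
};

// Render the insights chosen in the MemoryPicker into the text that
// replaces the {{memory}} placeholder in the LLM prompt.
function renderMemory(insights: Insight[], selectedIds: string[]): string {
  return insights
    .filter((i) => selectedIds.includes(i.id))
    .map((i) => `[${i.category}/${i.severity}] ${i.content}`)
    .join("\n");
}
```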
Lessons from the Trenches: The "Pain Log" Refactored
No development session is complete without a few head-scratching moments. Here's what we encountered and how we overcame it, with some actionable takeaways:
- **The Elusive `SaveInsightsDialog` Filter:**
  - Problem: The `SaveInsightsDialog` wasn't appearing as expected when trying to filter key points. Our initial filter was `kp.action === "keep"`.
  - Insight: Key points extracted by the LLM often don't have an `action` field initially, meaning `kp.action === "keep"` would always be `false`.
  - Solution: We adjusted the logic to `!kp.action || kp.action === "keep" || kp.action === "edit"`. This correctly handles cases where the action is `undefined`, allowing the dialog to show up for newly extracted insights.
  - Lesson: Always account for `undefined` or `null` states, especially when dealing with data that might be partially processed or user-generated.
- **`tsx` Module Resolution Woes:**
  - Problem: Attempting to run a backfill script directly from `/tmp` (`npx tsx /tmp/backfill-embeddings.ts`) resulted in `Cannot find module '@prisma/client'`.
  - Insight: `tsx`, like `ts-node`, resolves modules relative to its current working directory or the `tsconfig.json` it finds. When run from `/tmp`, it couldn't locate our project's `node_modules`.
  - Workaround: Copying the script into our project's `scripts/` directory and running it from there resolved the issue.
  - Lesson: Be mindful of the execution context for Node.js/TypeScript scripts. Pathing and module resolution can be tricky when moving scripts outside the main project structure.
- **zsh's Reserved Variables:**
  - Problem: A `while true; do status=$(...)` loop in a shell script failed in zsh with `read-only variable: status`.
  - Insight: `status` is a reserved variable in zsh (and some other shells) for the exit status of the last command.
  - Workaround: Simply changing the variable name to `step_status` fixed it.
  - Lesson: Always be aware of shell-specific keywords and reserved variables. A quick search can save a lot of debugging time!
- **Anticipating Edge Cases: Duplicate Saves:**
  - Problem (identified, not fixed yet): The "Save" button on the `SaveInsightsDialog` can be clicked multiple times, potentially creating duplicate entries.
  - Next Step: Add a dedup guard or disable the button after the first click.
  - Lesson: It's good practice to note down known edge cases and plan for their resolution, even if not immediately tackled.
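The key-point filter fix from the first pain above distills nicely into a small predicate, which makes the `undefined` handling explicit and easy to unit-test. A sketch (the `KeyPoint` shape and the `"discard"` action value are simplifications, not our exact types):

```typescript
// "discard" is a hypothetical third action value for illustration.
type KeyPoint = { text: string; action?: "keep" | "edit" | "discard" };

// Show the SaveInsightsDialog for key points that are kept, edited, or
// have no action yet (i.e. freshly extracted by the LLM).
const isSavable = (kp: KeyPoint): boolean =>
  !kp.action || kp.action === "keep" || kp.action === "edit";
```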
Confidence Through E2E Testing
We also shipped an E2E test (b5588a67) specifically for `{{memory}}` injection. It verified that when 5 insights are selected in the UI, the LLM engine correctly receives and processes the injected content. This kind of end-to-end validation is crucial for complex systems involving multiple moving parts (UI, API, database, LLM).
What's Next? The Journey Continues
With the core memory system fully operational, our immediate next steps involve refining and enhancing its capabilities:
- Duplicate-Save Guard: Implement the dedup guard on the `SaveInsightsDialog`.
- Hybrid Search: Wire `insight-search.ts` to use a powerful hybrid search approach (70% vector similarity + 30% `tsvector` text search) in the MemoryPicker for truly intelligent retrieval.
- Auto-Embeddings: Ensure `insight-persistence.ts` reliably auto-generates embeddings for new insights as they're saved.
- Project-Scoped Filtering: Add filtering to the MemoryPicker so users can easily focus on memories relevant to their current project.
- Cleanup: Tidy up stale `.log` files.
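The planned hybrid search just blends the two signals with a weighted sum; in SQL that would mean combining `1 - (embedding <=> query)` with a normalized `ts_rank`. The scoring itself is trivial to pin down (a sketch using the 70/30 weights from the plan above, both inputs assumed pre-normalized to 0..1):

```typescript
// Blend vector similarity and full-text rank with the planned
// 70/30 weighting; higher is a better match.
function hybridScore(vectorSimilarity: number, textRank: number): number {
  return 0.7 * vectorSimilarity + 0.3 * textRank;
}
```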
This session was a huge step forward in building a truly intelligent, context-aware workflow system. The power of pgvector combined with thoughtful UI design and robust testing is truly exciting. Onwards to the next challenge!
```json
{
  "thingsDone": [
    "Completed full project-workflow memory system with pgvector Phase 2 (embeddings, HNSW index, similarity search)",
    "Implemented MemoryPicker component with search, category chips, severity badges, expandable detail, and {{memory}} preview",
    "Integrated MemoryPicker into new/page.tsx for workflow creation",
    "Consolidated saveInsights logic to memory.saveInsights only",
    "Fixed SaveInsightsDialog logic for key point action filtering",
    "Developed CollapsibleSection component for UI organization",
    "Created memory-pull.sh script for remote memory fetching",
    "Implemented E2E test for {{memory}} injection into LLM workflow",
    "Migrated Docker Postgres to pgvector/pgvector:pg16 and preserved data",
    "Enabled pgvector extension (v0.8.1)",
    "Added embedding column (vector(1536)) to workflow_insights table",
    "Created HNSW index on embedding column (m=16, ef_construction=64, vector_cosine_ops)",
    "Backfilled all existing workflow insights with OpenAI embeddings",
    "Verified cosine similarity search returns sensible clustering",
    "Removed obsolete version from docker-compose.yml"
  ],
  "pains": [
    "Debugging SaveInsightsDialog filter logic due to undefined 'action' field on key points",
    "Resolving '@prisma/client' module not found error when running tsx scripts from /tmp",
    "Encountering 'read-only variable: status' in zsh shell scripts",
    "Identified potential for duplicate saves on SaveInsightsDialog (TODO)"
  ],
  "successes": [
    "Achieved full end-to-end operational memory system",
    "Successfully implemented and verified pgvector Phase 2 features",
    "Created intuitive MemoryPicker UI component",
    "Resolved critical UI and scripting bugs through careful debugging",
    "Ensured data integrity during Docker image migration",
    "Validated LLM content injection with E2E tests"
  ],
  "techStack": [
    "PostgreSQL",
    "pgvector",
    "HNSW",
    "OpenAI Embeddings (text-embedding-3-small)",
    "TypeScript",
    "Next.js",
    "React",
    "Prisma",
    "Docker",
    "Zsh",
    "LLM (Large Language Model) integration"
  ]
}
```