From Brainstorm to Breakthrough: Shipping a Full LLM Memory System with pgvector
We just hit a massive milestone: the complete end-to-end LLM-powered memory system, featuring pgvector for blazing-fast semantic search, is live and operational. Dive into the tech, the triumphs, and the hard-won lessons.
It's 2 AM, the coffee's cold, but the code is hot. Tonight marked a significant victory: the full project-workflow memory system, including the advanced pgvector Phase 2 implementation, is officially done. Everything's pushed to origin/main at 9b924f9, and the system is humming end-to-end.
This isn't just about storing text; it's about giving our LLM-powered workflows a genuine, searchable, and semantically rich long-term memory. Let's unpack how we got here, what we built, and the invaluable lessons learned along the way.
The Quest for LLM Memory: Why pgvector?
Imagine an AI assistant that truly remembers your past projects, insights, and decisions. That's the core problem we're solving. Our LLM-driven workflows needed a way to pull in relevant historical context, not just keyword matches, but semantically similar information.
Enter pgvector. PostgreSQL is already our reliable workhorse, and pgvector transforms it into a powerful vector database capable of handling high-dimensional embeddings and performing lightning-fast similarity searches. Phase 2 of this integration was all about bringing pgvector to its full potential: implementing HNSW indexing and verifying its performance.
pgvector Phase 2: From Zero to Semantic Search
This was the technical centerpiece of the session. Here's how we upgraded our database to handle the demands of semantic memory:
- **Docker Migration:** We swapped out our plain `postgres:16-alpine` Docker image for the specialized `pgvector/pgvector:pg16`. Crucially, we ensured data integrity by preserving our named Docker volume. No data loss, just a better engine.

- **Enabling the Extension:** With the new image, enabling `pgvector` was a single command:

  ```sql
  CREATE EXTENSION vector;
  ```

  (Verified `v0.8.1` was installed.)

- **Schema Evolution:** We added a new `embedding` column to our `workflow_insights` table to store the vector representations of our insights:

  ```sql
  ALTER TABLE workflow_insights ADD COLUMN embedding vector(1536);
  ```

  The `vector(1536)` type corresponds to the output dimension of OpenAI's `text-embedding-3-small` model, which we're using for generating embeddings.

- **HNSW Indexing for Speed:** This is where the magic happens for performance. We created an HNSW (Hierarchical Navigable Small World) index on our `embedding` column:

  ```sql
  CREATE INDEX ON workflow_insights
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
  ```

  HNSW is a cutting-edge approximate nearest neighbor (ANN) algorithm. It allows us to search through millions of high-dimensional vectors in milliseconds, crucial for a responsive memory system. `m` and `ef_construction` are parameters that balance index size, build time, and search quality/speed.

- **Backfilling & Verification:** We backfilled all 10 existing insights with embeddings generated by OpenAI's `text-embedding-3-small` model. A quick test confirmed that cosine similarity searches were returning "sensible clustering"—meaning related insights were indeed grouped together, just as we hoped.
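For a rough feel of the backfill step, here's a hedged sketch, not our exact script: the model name comes from this post, but the `openai` and database calls are assumed shapes and shown only as comments. The generic batching helper is the reusable core:

```typescript
// Split an array into batches so we don't send every insight to the
// embeddings API in one oversized request.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Sketch of the backfill loop (assumed client shapes, not runnable as-is):
//
// for (const batch of chunk(insights, 100)) {
//   const res = await openai.embeddings.create({
//     model: "text-embedding-3-small",
//     input: batch.map((i) => i.content),
//   });
//   // pgvector accepts a '[x,y,...]' literal, so each vector can be
//   // written back with a raw UPDATE against workflow_insights.
// }
```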
This setup provides the backbone for our LLM to semantically understand and retrieve relevant historical context, moving far beyond simple keyword searches.
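For intuition, pgvector's `vector_cosine_ops` (and its `<=>` operator) computes cosine distance, i.e. 1 − cosine similarity. A minimal TypeScript rendition of what the database evaluates per comparison:

```typescript
// Cosine distance between two equal-length vectors:
// 0 = same direction, 1 = orthogonal, 2 = opposite.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

The HNSW index gives an approximate answer to "smallest `cosineDistance` to the query vector" without scanning every row.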
Building the Interface: The MemoryPicker
What's a powerful backend without a usable front end? We built the MemoryPicker (`src/components/workflow/memory-picker.tsx`) to allow users to interact with this new memory system.
The MemoryPicker allows users to:
- Search through insights (soon to be powered by hybrid semantic search!).
- Filter by category chips and severity badges.
- View expandable details for each insight.
- See a live `{{memory}}` preview of what the LLM will receive.
This component was then seamlessly integrated into our `new/page.tsx` workflow page, allowing selected `memoryIds` to feed directly into the LLM's context during workflow creation.
We also added CollapsibleSection components to wrap areas like Consolidations, Personas, Docs, and Memory, making the UI cleaner and showing item counts with badges.
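The `{{memory}}` preview boils down to rendering the selected insights into a single context string. A simplified sketch (the `Insight` field names here are assumptions for illustration, not our exact schema):

```typescript
type Insight = {
  id: string;
  category: string;
  severity: string;
  content: string;
};

// Render the insights chosen in the MemoryPicker into the text that
// replaces the {{memory}} placeholder in the LLM prompt.
function renderMemory(insights: Insight[], selectedIds: string[]): string {
  return insights
    .filter((i) => selectedIds.includes(i.id))
    .map((i) => `[${i.category}/${i.severity}] ${i.content}`)
    .join("\n");
}
```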
Lessons from the Trenches: The "Pain Log" Refactored
No development session is complete without a few head-scratching moments. Here's what we encountered and how we overcame it, with some actionable takeaways:
- **The Elusive `SaveInsightsDialog` Filter:**
  - Problem: The `SaveInsightsDialog` wasn't appearing as expected when trying to filter key points. Our initial filter was `kp.action === "keep"`.
  - Insight: Key points extracted by the LLM often don't have an `action` field initially, meaning `kp.action === "keep"` would always be `false`.
  - Solution: We adjusted the logic to `!kp.action || kp.action === "keep" || kp.action === "edit"`. This correctly handles cases where the action is `undefined`, allowing the dialog to show up for newly extracted insights.
  - Lesson: Always account for `undefined` or `null` states, especially when dealing with data that might be partially processed or user-generated.
- **`tsx` Module Resolution Woes:**
  - Problem: Attempting to run a backfill script directly from `/tmp` (`npx tsx /tmp/backfill-embeddings.ts`) resulted in `Cannot find module '@prisma/client'`.
  - Insight: `tsx`, like `ts-node`, resolves modules relative to its current working directory or the `tsconfig.json` it finds. When run from `/tmp`, it couldn't locate our project's `node_modules`.
  - Workaround: Copying the script into our project's `scripts/` directory and running it from there resolved the issue.
  - Lesson: Be mindful of the execution context for Node.js/TypeScript scripts. Pathing and module resolution can be tricky when moving scripts outside the main project structure.
- **zsh's Reserved Variables:**
  - Problem: A `while true; do status=$(...)` loop in a shell script failed in zsh with `read-only variable: status`.
  - Insight: `status` is a reserved variable in zsh (and some other shells) for the exit status of the last command.
  - Workaround: Simply changing the variable name to `step_status` fixed it.
  - Lesson: Always be aware of shell-specific keywords and reserved variables. A quick search can save a lot of debugging time!
- **Anticipating Edge Cases: Duplicate Saves:**
  - Problem (identified, not fixed yet): The "Save" button on the `SaveInsightsDialog` can be clicked multiple times, potentially creating duplicate entries.
  - Next Step: Add a dedup guard or disable the button after the first click.
  - Lesson: It's good practice to note down known edge cases and plan for their resolution, even if not immediately tackled.
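The key-point filter fix from the first pain above distills nicely into a small predicate, which makes the `undefined` handling explicit and easy to unit-test. A sketch (the `KeyPoint` shape and the `"discard"` action value are simplifications, not our exact types):

```typescript
// "discard" is a hypothetical third action value for illustration.
type KeyPoint = { text: string; action?: "keep" | "edit" | "discard" };

// Show the SaveInsightsDialog for key points that are kept, edited, or
// have no action yet (i.e. freshly extracted by the LLM).
const isSavable = (kp: KeyPoint): boolean =>
  !kp.action || kp.action === "keep" || kp.action === "edit";
```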
Confidence Through E2E Testing
We also shipped an E2E test (b5588a67) specifically for `{{memory}}` injection. It verified that when 5 insights are selected in the UI, the LLM engine correctly receives and processes the injected content. This kind of end-to-end validation is crucial for complex systems involving multiple moving parts (UI, API, database, LLM).
What's Next? The Journey Continues
With the core memory system fully operational, our immediate next steps involve refining and enhancing its capabilities:
- Duplicate-Save Guard: Implement the dedup guard on the `SaveInsightsDialog`.
- Hybrid Search: Wire `insight-search.ts` to use a powerful hybrid search approach (70% vector similarity + 30% `tsvector` text search) in the MemoryPicker for truly intelligent retrieval.
- Auto-Embeddings: Ensure `insight-persistence.ts` reliably auto-generates embeddings for new insights as they're saved.
- Project-Scoped Filtering: Add filtering to the MemoryPicker so users can easily focus on memories relevant to their current project.
- Cleanup: Tidy up stale `.log` files.
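The planned hybrid search just blends the two signals with a weighted sum; in SQL that would mean combining `1 - (embedding <=> query)` with a normalized `ts_rank`. The scoring itself is trivial to pin down (a sketch using the 70/30 weights from the plan above, both inputs assumed pre-normalized to 0..1):

```typescript
// Blend vector similarity and full-text rank with the planned
// 70/30 weighting; higher is a better match.
function hybridScore(vectorSimilarity: number, textRank: number): number {
  return 0.7 * vectorSimilarity + 0.3 * textRank;
}
```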
This session was a huge step forward in building a truly intelligent, context-aware workflow system. The power of pgvector combined with thoughtful UI design and robust testing is truly exciting. Onwards to the next challenge!
```json
{
  "thingsDone": [
    "Completed full project-workflow memory system with pgvector Phase 2 (embeddings, HNSW index, similarity search)",
    "Implemented MemoryPicker component with search, category chips, severity badges, expandable detail, and {{memory}} preview",
    "Integrated MemoryPicker into new/page.tsx for workflow creation",
    "Consolidated saveInsights logic to memory.saveInsights only",
    "Fixed SaveInsightsDialog logic for key point action filtering",
    "Developed CollapsibleSection component for UI organization",
    "Created memory-pull.sh script for remote memory fetching",
    "Implemented E2E test for {{memory}} injection into LLM workflow",
    "Migrated Docker Postgres to pgvector/pgvector:pg16 and preserved data",
    "Enabled pgvector extension (v0.8.1)",
    "Added embedding column (vector(1536)) to workflow_insights table",
    "Created HNSW index on embedding column (m=16, ef_construction=64, vector_cosine_ops)",
    "Backfilled all existing workflow insights with OpenAI embeddings",
    "Verified cosine similarity search returns sensible clustering",
    "Removed obsolete version from docker-compose.yml"
  ],
  "pains": [
    "Debugging SaveInsightsDialog filter logic due to undefined 'action' field on key points",
    "Resolving '@prisma/client' module not found error when running tsx scripts from /tmp",
    "Encountering 'read-only variable: status' in zsh shell scripts",
    "Identified potential for duplicate saves on SaveInsightsDialog (TODO)"
  ],
  "successes": [
    "Achieved full end-to-end operational memory system",
    "Successfully implemented and verified pgvector Phase 2 features",
    "Created intuitive MemoryPicker UI component",
    "Resolved critical UI and scripting bugs through careful debugging",
    "Ensured data integrity during Docker image migration",
    "Validated LLM content injection with E2E tests"
  ],
  "techStack": [
    "PostgreSQL",
    "pgvector",
    "HNSW",
    "OpenAI Embeddings (text-embedding-3-small)",
    "TypeScript",
    "Next.js",
    "React",
    "Prisma",
    "Docker",
    "Zsh",
    "LLM (Large Language Model) integration"
  ]
}
```