nyxcore-systems

Giving AI Workflows a Brain: Building Persistent Context with Vector Search & pgvector

We're building an AI workflow system that remembers. This post dives into how we implemented persistent context for LLM workflows using vector search and hybrid queries, and how we tackled some hairy infrastructure and security challenges along the way.

Tags: LLM, Vector Search, PostgreSQL, pgvector, Prisma, tRPC, TypeScript, Docker, Security, AI, Workflow Automation


In the world of AI-powered applications, especially those orchestrating complex LLM workflows, context is king. But what happens when your AI completes a task, and then needs to start a new, related one? Does it forget everything it just learned? Most current systems operate like goldfish, requiring fresh context with each prompt. That's a huge waste of accumulated wisdom.

Our latest mission: to give our AI workflow engine a persistent memory. Imagine an LLM reviewing a design proposal, identifying key strengths and weaknesses. Instead of these insights vanishing into the ether, we want them captured, stored, and made available as context for future workflows – perhaps when refining that design, or even starting a new, similar project.

Enter the Workflow Memory System. Our goal was ambitious: allow users to review past insights, persist them as WorkflowInsight records, and then select these memories to inject directly into future prompts via a simple {{memory}} template variable.

This isn't just about dumping old text into a prompt; it's about intelligent retrieval using the power of vector embeddings and hybrid search. Here's a deep dive into how we're bringing this to life.

The Core Idea: {{memory}} as a Superpower

At its heart, the system revolves around capturing "insights" – distilled learnings, pain points, solutions, or key observations from any step in a workflow. Once captured, these insights are vectorized and stored. Later, when initiating a new workflow, a user can search and select relevant past insights. These selected insights are then dynamically injected into the LLM's prompt via our custom {{memory}} template variable, acting as highly targeted, relevant context.

Architecting Memory: From Schema to Search

Building this required touching almost every part of our stack.

1. The WorkflowInsight Data Model

We started with the WorkflowInsight Prisma schema. This model needed to be robust, capturing not just the insight's content (title, detail, suggestion, category, severity, insightType), but also its origin (workflowId, stepId, stepLabel, projectId) for traceability.

Crucially, it included a tsvector column for efficient full-text search and a vector(1536) column for our OpenAI embeddings. The vector column was the first hint of the adventures to come...

typescript
// Simplified WorkflowInsight Prisma schema excerpt
model WorkflowInsight {
  id              String    @id @default(uuid())
  projectId       String
  workflowId      String
  stepId          String
  stepLabel       String
  title           String
  detail          String?
  suggestion      String?
  category        String?
  insightType     InsightType // e.g., PAIN, STRENGTH
  severity        Severity?   // e.g., HIGH, LOW
  createdAt       DateTime  @default(now())
  updatedAt       DateTime  @updatedAt

  // For full-text search
  searchableContent       Unsupported("tsvector")? // Populated by a DB trigger
  // For vector search
  embedding               Unsupported("vector(1536)")? // Populated at runtime via raw SQL

  @@index([projectId])
  @@index([workflowId])
  // Full-text search runs against searchableContent (tsvector), maintained by a
  // database trigger; Prisma's @@fulltext attribute is not supported on PostgreSQL.
}

Note the Unsupported types – more on that in the "Lessons Learned" section!

2. The Embedding Service: Turning Text into Vectors

To enable semantic search, we built a dedicated embedding-service.ts. This service leverages OpenAI's text-embedding-3-small model, generating 1536-dimensional vectors. We added batching support for efficiency and implemented buildEmbeddingText() to create structured prefixes for our insight content. This helps the embedding model understand the context of the text it's vectorizing.

typescript
// src/server/services/embedding-service.ts (simplified)
import { OpenAI } from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function createEmbeddings(texts: string[]): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts,
  });
  return response.data.map(item => item.embedding);
}

// Example of structured text for embedding
function buildEmbeddingText(insight: { title: string; detail?: string; suggestion?: string }): string {
  let text = `Title: ${insight.title}`;
  if (insight.detail) text += `\nDetail: ${insight.detail}`;
  if (insight.suggestion) text += `\nSuggestion: ${insight.suggestion}`;
  return text;
}
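The batching mentioned above can be sketched as a small, dependency-free helper. The chunk size of 100 and the function names here are our illustration, not the actual embedding-service.ts API; the embed callback is injected so the sketch stays decoupled from the OpenAI client (createEmbeddings above fits the shape).

```typescript
// Hypothetical batching helper: splits inputs into fixed-size chunks so each
// request stays under the embeddings API's per-request input limit.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Sketch of batched embedding: sends one API call per chunk and flattens
// the resulting vectors back into a single array, in input order.
async function embedBatched(
  texts: string[],
  embed: (batch: string[]) => Promise<number[][]>,
  batchSize = 100, // assumed limit, tune for your provider
): Promise<number[][]> {
  const results: number[][] = [];
  for (const batch of chunk(texts, batchSize)) {
    results.push(...(await embed(batch)));
  }
  return results;
}
```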

3. Insight Persistence & Auto-Pairing

Our insight-persistence.ts service handles the heavy lifting of converting raw ReviewKeyPoints (our temporary review artifacts) into durable WorkflowInsight records. A neat feature here is the auto-pairing logic: if a "pain point" insight is identified, the system attempts to find a related "strength" insight from the same workflow step and category, linking them via pairedInsightId. This enriches the contextual understanding.
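The auto-pairing rule can be stated compactly. This is an illustrative sketch of the matching logic described above, not the actual insight-persistence.ts code; the InsightLike shape and findPairedStrength name are ours.

```typescript
// Minimal shape needed for pairing; mirrors fields from the WorkflowInsight model.
interface InsightLike {
  id: string;
  insightType: 'PAIN' | 'STRENGTH';
  stepId: string;
  category: string | null;
}

// For a PAIN insight, find a STRENGTH insight from the same workflow step and
// the same (non-null) category; its id would be stored as pairedInsightId.
function findPairedStrength(
  pain: InsightLike,
  candidates: InsightLike[],
): InsightLike | undefined {
  if (pain.insightType !== 'PAIN') return undefined;
  return candidates.find(
    c =>
      c.insightType === 'STRENGTH' &&
      c.stepId === pain.stepId &&
      c.category !== null &&
      c.category === pain.category,
  );
}
```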

4. The Brain's Search Engine: Hybrid Vector + Full-Text

This was a critical component. Pure vector search is great for semantic similarity, but sometimes you just need to find exact keywords. We opted for a hybrid approach: 70% vector similarity + 30% tsvector full-text search. This gives us the best of both worlds.

The search logic lives in insight-search.ts, using Prisma.sql tagged templates for safe, parameterized SQL queries. (A critical lesson learned here, detailed below!)
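The blend itself is a one-line formula. In our stack the database computes it inside the SQL query, roughly 0.7 * (1 - (embedding <=> query_vector)) + 0.3 * ts_rank(searchableContent, query), but the same formula is easy to state (and unit-test) in plain TypeScript. The weights come from this post; the function name and the clamping are our illustration.

```typescript
// Hybrid relevance score: 70% semantic similarity, 30% full-text rank.
// Both inputs are clamped to [0, 1] so a runaway ts_rank can't dominate.
function hybridScore(vectorSimilarity: number, textRank: number): number {
  const clamp = (x: number) => Math.min(1, Math.max(0, x));
  return 0.7 * clamp(vectorSimilarity) + 0.3 * clamp(textRank);
}
```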

5. Context Resolution: {{memory}} Comes to Life

The workflow-engine.ts is where the magic happens. When a prompt is being prepared, it checks for the {{memory}} template variable. If present, it triggers loadMemoryContent(memoryIds) from our workflow-insights.ts service. This service fetches the selected insights, formats them into a clean, grouped markdown block, and injects them directly into the prompt.

typescript
// Simplified prompt resolution (workflow-engine.ts)
async function resolvePrompt(promptTemplate: string, context: Record<string, any>): Promise<string> {
  let resolvedPrompt = promptTemplate;

  // ... resolve other context variables ...

  if (resolvedPrompt.includes('{{memory}}') && context.memoryIds) {
    const memoryContent = await loadMemoryContent(context.memoryIds);
    // replaceAll: String.prototype.replace would only substitute the first occurrence
    resolvedPrompt = resolvedPrompt.replaceAll('{{memory}}', memoryContent);
  }

  return resolvedPrompt;
}
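The "clean, grouped markdown block" step can be sketched too. This is a hypothetical take on what loadMemoryContent() produces: insights bucketed by category and rendered as markdown; the exact formatting in workflow-insights.ts may differ, and the names here are ours.

```typescript
interface MemoryInsight {
  title: string;
  detail?: string;
  category?: string;
}

// Groups insights by category (falling back to "General") and renders each
// group as a markdown section with one bullet per insight.
function formatMemoryBlock(insights: MemoryInsight[]): string {
  const groups = new Map<string, MemoryInsight[]>();
  for (const insight of insights) {
    const key = insight.category ?? 'General';
    const bucket = groups.get(key) ?? [];
    bucket.push(insight);
    groups.set(key, bucket);
  }
  const sections: string[] = [];
  for (const [category, items] of groups) {
    const lines = items.map(i => `- ${i.title}${i.detail ? `: ${i.detail}` : ''}`);
    sections.push(`### ${category}\n${lines.join('\n')}`);
  }
  return sections.join('\n\n');
}
```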

6. User Interface for Insight Capture

On the front-end, the SaveInsightsDialog is where users interact with the system after a review step. It presents a checkbox list of potential insights derived from the workflow's output, allowing users to keep, edit, or discard them before saving. This ensures only valuable, human-curated insights make it into the long-term memory.
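The keep/discard state behind those checkboxes boils down to a tiny pure model. The real SaveInsightsDialog is a React component; the reducer-style helpers below are our illustration of its selection logic, with names we made up.

```typescript
type Decision = 'keep' | 'discard';

// Toggling an insight flips it between keep and discard; untouched insights
// default to keep on first toggle. Returns a new object (no mutation).
function toggleDecision(
  decisions: Record<string, Decision>,
  insightId: string,
): Record<string, Decision> {
  const next: Decision = decisions[insightId] === 'keep' ? 'discard' : 'keep';
  return { ...decisions, [insightId]: next };
}

// Only insights marked "keep" are persisted as WorkflowInsight records.
function keptIds(decisions: Record<string, Decision>): string[] {
  return Object.keys(decisions).filter(id => decisions[id] === 'keep');
}
```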

Lessons Learned: The Bumps in the Road

No complex feature build is without its challenges. Here are some of the key "aha!" moments and critical fixes we encountered:

Lesson 1: pgvector and Prisma Don't Play Nice (Yet)

Our first attempt to add the vector(1536) column through the Prisma schema (via npm run db:push) failed: Prisma can only model the column as Unsupported("vector(1536)"), and when we tried to apply the change, the database rejected it with ERROR: type "vector" does not exist.

The Problem: Our standard postgres:16-alpine Docker image didn't have the pgvector extension installed. Prisma, while aware of the vector type, requires the underlying database to actually support it.

The Workaround & Solution:

  1. Initial workaround: We temporarily removed the embedding column from the Prisma schema and added it via raw SQL after manually installing the pgvector extension in a running container. This was a stop-gap.
  2. The proper fix: Upgrade our Docker setup! We switched our docker-compose.yml from postgres:16-alpine to the specialized pgvector/pgvector:pg16 image. This image comes pre-bundled with the pgvector extension, simplifying the setup significantly. Once the new container is up, we can install the extension with CREATE EXTENSION vector; and then add the column safely. (Note: This still requires a docker compose down && docker compose up -d cycle, which needs careful volume migration planning in production).
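The raw-SQL step amounts to two fixed DDL statements. Sketched here as a pure function so the statements are easy to inspect and test; the helper name is ours, and in the real codebase these strings were executed against the database via Prisma's raw-query escape hatch (fixed DDL, no user input involved).

```typescript
// Returns the idempotent statements that enable pgvector and add the embedding
// column; IF NOT EXISTS makes both safe to re-run.
function pgvectorMigrationStatements(
  table = 'WorkflowInsight',
  dims = 1536, // matches text-embedding-3-small's output dimension
): string[] {
  return [
    'CREATE EXTENSION IF NOT EXISTS vector;',
    `ALTER TABLE "${table}" ADD COLUMN IF NOT EXISTS embedding vector(${dims});`,
  ];
}
```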

Lesson 2: Inline tsx and Zsh History Expansion

A minor but frustrating hiccup: trying to run npx tsx -e '...' with a ! character in the inline code. Zsh's history expansion interpreted the ! as an event reference, mangling the snippet before tsx even saw the code.

The Workaround: Instead of inline execution, we now write the TypeScript snippet to a temporary file and execute it via npx tsx /path/to/temp/file.ts. Simple, but a reminder of shell interaction quirks.

Lesson 3: The SQL Injection Scare (CRITICAL)

This was the most important lesson. Our initial implementation for hybrid search in insight-search.ts used $queryRawUnsafe() with string interpolation for user-supplied filter arrays.

typescript
// DANGEROUS - DO NOT USE THIS PATTERN! (simplified)
const query = `SELECT * FROM "WorkflowInsight" WHERE id IN (${userSuppliedIds.map(id => `'${id}'`).join(',')})`;
await prisma.$queryRawUnsafe(query);

The Problem: This is a classic SQL injection vulnerability. If userSuppliedIds contained malicious input, an attacker could execute arbitrary SQL commands.

The Solution (Thanks to Security Review!): Our diligent security reviewer agent flagged this immediately. We switched to using Prisma's Prisma.sql tagged templates, which correctly parameterize inputs, preventing injection. We also tightened tRPC input validation with enums, regex patterns, and max lengths.

typescript
// SAFE - Use parameterized queries! (simplified)
import { Prisma } from '@prisma/client';

// Prisma.join parameterizes each value in the array for us
const query = Prisma.sql`SELECT * FROM "WorkflowInsight" WHERE id IN (${Prisma.join(userSuppliedIds)})`;
await prisma.$queryRaw(query); // $queryRaw is safe when fed Prisma.sql fragments

This was a stark reminder that even in a modern TypeScript/Prisma stack, vigilance against SQL injection is paramount, especially when constructing dynamic queries.
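The tightened input rules read roughly like the predicates below. In the real code these constraints live in the tRPC procedures' zod schemas; the exact limits (the 500-character query cap, for instance) and names here are illustrative.

```typescript
// UUID v-agnostic pattern: 8-4-4-4-12 hex groups.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const INSIGHT_TYPES = ['PAIN', 'STRENGTH'] as const;

function isValidInsightId(id: string): boolean {
  return UUID_RE.test(id); // regex check rejects injection payloads outright
}

function isValidSearchInput(input: {
  query: string;
  insightType?: string;
  ids?: string[];
}): boolean {
  // max-length cap on free text (assumed limit)
  if (input.query.length === 0 || input.query.length > 500) return false;
  // enum check on the discriminator
  if (input.insightType && !(INSIGHT_TYPES as readonly string[]).includes(input.insightType)) {
    return false;
  }
  // every id must match the UUID pattern
  if (input.ids && !input.ids.every(isValidInsightId)) return false;
  return true;
}
```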

What's Next on the Roadmap

We've laid a solid foundation, but the journey isn't over. Our immediate next steps include:

  1. UI Completion: Building the MemoryPicker component for the workflow creation page and integrating it.
  2. End-to-End Testing: Verifying the full flow, from saving insights to resolving {{memory}} in LLM prompts.
  3. Phase 2: Full pgvector Integration: Recreating the Docker Postgres container with the pgvector image, installing the extension, and finally adding the embedding column directly in Prisma. This will enable full vector search capabilities.
  4. Refinement: Consolidating tRPC procedures and optimizing the hybrid search for performance.

Conclusion

Building a persistent memory system for AI workflows is a complex but incredibly rewarding endeavor. It transforms our LLM applications from stateless tools into intelligent partners that learn and adapt over time. While we've navigated infrastructure quirks, secured against critical vulnerabilities, and architected a robust data and search pipeline, the core vision – giving our AI a brain – is steadily becoming a reality.

Stay tuned for more updates as we continue to evolve our workflow intelligence!

json
{
  "thingsDone": [
    "Implemented WorkflowInsight model with tsvector and pgvector placeholders",
    "Developed embedding service using OpenAI text-embedding-3-small",
    "Created insight persistence logic with auto-pairing",
    "Built hybrid 70% vector + 30% full-text search",
    "Integrated {{memory}} template variable resolution in workflow engine",
    "Developed SaveInsightsDialog UI for insight capture",
    "Upgraded Docker to pgvector/pgvector:pg16 image",
    "Applied RLS and tsvector trigger to workflow_insights table",
    "Fixed critical SQL injection vulnerability using Prisma.sql"
  ],
  "pains": [
    "Prisma 'Unsupported(\"vector(1536)\")' error due to missing pgvector extension",
    "Docker Postgres image lacking pgvector extension ('type vector does not exist')",
    "Zsh glob expansion breaking inline tsx script execution",
    "SQL injection vulnerability with $queryRawUnsafe and string interpolation"
  ],
  "successes": [
    "Successful implementation of hybrid search strategy",
    "Robust insight persistence and auto-pairing logic",
    "Secure handling of dynamic SQL queries after review",
    "Clear path for pgvector integration with dedicated Docker image",
    "Functional UI for capturing insights"
  ],
  "techStack": [
    "TypeScript",
    "Prisma",
    "PostgreSQL",
    "pgvector",
    "OpenAI API",
    "tRPC",
    "Next.js",
    "React",
    "Docker",
    "Zsh",
    "esbuild"
  ]
}