nyxcore-systems

Bringing Vision to Your Notes: How We Built AI-Powered Image Uploads

We just rolled out a major upgrade: image uploads with AI-powered auto-descriptions for Project Notes. Dive into the technical journey, the challenges we overcame, and the lessons learned in bringing visual context to your development workflows.

LLM · Vision AI · Full-stack · Deployment · Prisma · TypeScript · Next.js · AI Workflows · Product Development

In the world of project management and development, context is king. While text-based notes are invaluable, sometimes a screenshot, a diagram, or a quick sketch can convey more information than a thousand words. That's why we're thrilled to announce a significant new feature: image uploads with AI-powered auto-descriptions for your Project Notes!

This isn't just about sticking an image into your notes. We've integrated cutting-edge LLM vision capabilities to automatically describe what's in your images, making your notes richer, more searchable, and incredibly powerful for subsequent AI-driven workflows.

Seeing is Believing: The Feature in Action

Imagine you're debugging a tricky UI bug or documenting a new feature. You take a screenshot, drag it directly into your Project Note, and voilà! Our system immediately sends it to your chosen LLM (like GPT-4o-mini or Anthropic's latest vision model). Within moments, a detailed description of the image appears, ready to be edited or used as-is.

This description isn't just static text. Thanks to our new {{notes}} template variable, these visual insights can now flow directly into your AI workflows. Need to convert an annotated screenshot into a list of action points? An "Enrich" workflow can now take the image description and generate precise tasks for you.

Here's what makes this feature shine for our users:

  • Seamless Drag-and-Drop: Uploading images is as intuitive as dragging them from your desktop directly into the notes editor.
  • Intelligent Auto-Description: Leverage state-of-the-art LLM vision models to automatically generate descriptive text for your images.
  • User-Selected Providers: With our new ProviderModelPicker, you have control. Choose your preferred LLM provider and model for image description, allowing you to balance cost, speed, and detail.
  • Workflow Integration: The {{notes}} template variable now includes image descriptions, opening up a new dimension for AI-driven analysis and task generation based on visual input.
  • Visual Context: Thumbnails are displayed directly within your notes, providing quick visual cues.

Under the Hood: A Glimpse into the Architecture

Building this feature was a journey involving several core components of our platform. Here’s a quick rundown of the technical decisions and implementations:

  1. Data Model Expansion: We updated our Prisma schema, adding imageKey (for storage reference) and imageDescription (for the LLM-generated text) to both MemoryEntry and ProjectNote models.
  2. LLM Vision Adapters: This was a crucial piece. We extended our LLMMessage.content type to a union (string | Array<TextBlock | ImageBlock>).
    • OpenAI Adapter: Maps our ImageBlock format to OpenAI's image_url format, handling base64 data URIs for direct uploads.
    • Anthropic Adapter: Maps ImageBlock to Anthropic's native format, ensuring compatibility and efficient processing. We also added a getTextContent() helper for extracting text from mixed-content messages, especially useful for system prompts.
  3. Robust Storage Layer: Our StorageAdapter interface was extended with getFileBuffer(key). Both our LocalStorage and S3 implementations were updated to support retrieving image data for LLM processing. Image uploads are stored securely in a notes/{tenantId}/{uuid}-… structure.
  4. Dedicated Image Description Service: A new image-describe.ts service orchestrates the entire process: retrieving the image buffer, selecting the correct LLM provider/model (based on user choice), sending the image for description, and returning the result.
  5. Workflow Engine Integration: The {{notes}} template variable was registered within our workflow-engine.ts to include the imageDescription when resolving prompts, truly enabling AI to "see" what's in your notes.
  6. API & UI: We added tRPC mutations (getImageUploadUrl, describeImage) to both memory and projects.notes routers. The UI in the Project Notes tab now features drag-and-drop upload, the ProviderModelPicker, and displays image thumbnails and descriptions.
  7. Design & Planning: As always, robust design specifications (docs/superpowers/specs/2026-03-15-image-notes-design.md) and implementation plans (docs/superpowers/plans/2026-03-15-image-notes.md) guided our development, ensuring a clear path from concept to code.
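The adapter work in step 2 is easier to picture with code. Here's a simplified sketch of the mixed-content message type, the getTextContent() helper, and the OpenAI image_url mapping; the block shapes and function signatures are assumptions for illustration, not our exact definitions:

```typescript
// Simplified content blocks -- our real types carry more metadata.
type TextBlock = { type: "text"; text: string };
type ImageBlock = { type: "image"; mimeType: string; base64: string };

interface LLMMessage {
  role: "system" | "user" | "assistant";
  content: string | Array<TextBlock | ImageBlock>;
}

// Extract only the text from a mixed-content message -- useful for
// system prompts, which providers generally expect as plain strings.
function getTextContent(message: LLMMessage): string {
  if (typeof message.content === "string") return message.content;
  return message.content
    .filter((block): block is TextBlock => block.type === "text")
    .map((block) => block.text)
    .join("\n");
}

// Map our ImageBlock to an OpenAI-style image_url part,
// encoding the raw bytes as a base64 data URI.
function toOpenAIImagePart(block: ImageBlock) {
  return {
    type: "image_url" as const,
    image_url: { url: `data:${block.mimeType};base64,${block.base64}` },
  };
}
```

The Anthropic adapter does the analogous translation to that provider's native image block format; the point of the union type is that everything above the adapter layer stays provider-agnostic.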

Lessons from the Trenches: Navigating Challenges

No development sprint is without its bumps. We encountered a couple of significant challenges that offered valuable lessons:

The pgvector Scare: A Reminder on Safe Deployments

In a rush to deploy, an npx prisma db push --accept-data-loss command was inadvertently run against our production environment. As many experienced developers know, this is a dangerous move. Because Prisma does not natively support the vector(1536) type used by pgvector, it treated the embedding column on our workflow_insights table as an "extra" column and dropped it!

The Lesson: NEVER use db push --accept-data-loss on production. This incident reinforced the critical importance of our existing safe migration script (./scripts/db-migrate-safe.sh), which specifically handles unsupported column types and ensures schema changes are applied incrementally and safely. We quickly restored the column via raw SQL (ALTER TABLE ... ADD COLUMN IF NOT EXISTS embedding vector(1536); and CREATE INDEX ... USING hnsw ...) and moved forward, but it was a stark reminder of why robust deployment processes are paramount.
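One cheap guard worth illustrating (this is a sketch of the idea, not the contents of our actual db-migrate-safe.sh, which a real pipeline would feed from something like prisma migrate diff) is to scan pending migration SQL and refuse to auto-apply anything destructive:

```typescript
// Patterns for SQL statements that can silently destroy data.
const DESTRUCTIVE_PATTERNS: RegExp[] = [
  /\bDROP\s+TABLE\b/i,
  /\bDROP\s+COLUMN\b/i,
  /\bTRUNCATE\b/i,
];

// Returns true if the migration SQL contains a destructive statement
// and should require explicit human sign-off instead of auto-applying.
function isDestructive(sql: string): boolean {
  return DESTRUCTIVE_PATTERNS.some((pattern) => pattern.test(sql));
}

// The change Prisma would have generated for our "extra" column:
isDestructive('ALTER TABLE "workflow_insights" DROP COLUMN "embedding";'); // → true

// The additive restore we ran instead is waved through:
isDestructive(
  "ALTER TABLE workflow_insights ADD COLUMN IF NOT EXISTS embedding vector(1536);"
); // → false
```

A check like this would have flagged the dropped embedding column before it ever reached production.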

User Feedback Guides the Way: Feature Placement

We initially built the image upload functionality for our general "Memory Hub" page. However, internal feedback and a closer look at core user workflows made it clear that the most impactful home for this feature was directly within the "Project Notes" section.

The Lesson: User-centric development is key. While the initial implementation wasn't wasted (many components were reusable), we quickly pivoted, adding the necessary fields to the ProjectNote model and integrating the tRPC mutations into the projects.notes sub-router. This agility ensured the feature landed where it would provide the most value to our users.

What's Next?

With the image upload feature now live, our immediate next steps involve rigorous testing on production with real-world scenarios. We'll be uploading annotated screenshots, using our "Enrich" workflows to convert their descriptions into action points, and generating implementation prompts for our upcoming "Project Onboarding" feature.

This continuous cycle of building, deploying, and refining is what drives us forward. We're excited to see how these new visual capabilities empower your development workflows and look forward to sharing more updates soon!