From Protocol to Pixels: A Sprint of Shipping and Vision
Our latest dev sprint saw the full deployment of a complex arbitration protocol and API, alongside a breakthrough design for multimodal AI-powered image uploads. Dive into the details of shipping big, and thinking bigger.
The hum of the servers, the satisfying click of a git merge main, and the immediate pivot to the next big challenge – that's the rhythm of a productive development sprint. This past session was a masterclass in duality: the intense focus required to ship a complex, multi-faceted protocol, immediately followed by the creative freedom of designing a brand-new, AI-powered feature.
Let's unpack what went down.
The Big Ship: IPCHA Protocol & Its Ecosystem
Shipping a new protocol is never a small feat. This session saw the full deployment of our IPCHA (Inter-Process CHAllenge) protocol, complete with its API service and a comprehensive dashboard. This wasn't just a minor update; it was a substantial chunk of work, touching multiple layers of our stack.
Under the Hood of IPCHA:
- Protocol Implementation: The core of IPCHA lives across 15 distinct modules, rigorously tested with 78 Python tests. These live in ipcha/, src/arbitration/, benchmarks/, and sdrl_claims/. The b34b44f commit now stands proudly on our main branch, deployed and humming along.
- API & Dashboard: To make IPCHA accessible and observable, we built a robust ecosystem:
  - FastAPI Sidecar: A dedicated FastAPI service (ipcha/api.py) exposing 10 core endpoints on port 8100, containerized in its own Dockerfile.
  - REST Proxy: Our main application's API gateway (src/app/api/v1/ipcha/) now offers 11 endpoints, secured with nyx_ip_ token authentication, acting as a proxy to the sidecar.
  - Data Persistence: We extended our Prisma schema with new models like IpchaApiToken, IpchaUsageLog, and IpchaJob, complete with Row Level Security (RLS) for robust data isolation.
  - BYOK Integration: A key feature for our users: the API seamlessly integrates Bring Your Own Key (BYOK) for LLM providers via the X-LLM-Api-Key header, managed by our ProviderModelPicker.
  - Observability: An 8-tab dashboard provides real-time insights into IPCHA operations, token usage, and job status.
- The Proof is in the Passing: After merging PR #133, all 14/14 integration tests passed with flying colors, a testament to the thoroughness of the implementation. You can dive into the full report at docs/reports/2026-03-15-ipcha-api-integration-test.md.
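One aside on authentication before the endpoint example below. The actual details live in the IpchaApiToken model; the following is purely an illustrative sketch, assuming nyx_ip_ tokens are opaque bearer strings persisted only as hashes:

import hashlib
import secrets

def mint_ipcha_token() -> tuple[str, str]:
    """Return (token, token_hash). Only the hash would be persisted
    (e.g. on IpchaApiToken); the raw token is shown to the user once."""
    token = "nyx_ip_" + secrets.token_urlsafe(32)
    return token, hashlib.sha256(token.encode()).hexdigest()

def verify_ipcha_token(presented: str, stored_hash: str) -> bool:
    """Hash the presented token and compare against the stored hash in constant time."""
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return secrets.compare_digest(candidate, stored_hash)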
Here's a glimpse of what a token-authenticated IPCHA API endpoint might look like, demonstrating the X-LLM-Api-Key integration:
# ipcha/api.py (FastAPI sidecar)
from fastapi import FastAPI, Header, HTTPException
from pydantic import BaseModel

app = FastAPI()


class IpchaJobRequest(BaseModel):
    challenge_data: str
    llm_provider: str = "openai"  # Default provider


@app.post("/jobs/create")
async def create_ipcha_job(
    request: IpchaJobRequest,
    x_llm_api_key: str | None = Header(None, alias="X-LLM-Api-Key"),
):
    """
    Initiates an IPCHA job, optionally using a user-provided LLM API key.
    """
    if not request.challenge_data:
        raise HTTPException(status_code=400, detail="Challenge data is required")

    # In a real scenario, we'd validate the token and persist the job.
    # For LLM interaction, we'd pass x_llm_api_key to the ProviderModelPicker.
    llm_key_status = "provided" if x_llm_api_key else "using default"
    print(f"Received IPCHA job for {request.llm_provider}. LLM key: {llm_key_status}")

    # ... logic to create and track the job ...
    return {"status": "job_created", "llm_key_usage": llm_key_status}
# src/app/api/v1/ipcha/ (REST proxy, simplified)
# This layer would handle nyx_ip_ token authentication and forward to the sidecar
# e.g., using httpx.post(f"http://ipcha-sidecar:8100/jobs/create", headers={"X-LLM-Api-Key": user_llm_key})
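To make that forwarding step concrete, here's a minimal Python sketch of the hand-off using httpx. The function name, timeout, and the ipcha-sidecar hostname are illustrative assumptions; the real proxy lives in the TypeScript gateway and validates the caller's nyx_ip_ token before anything is forwarded.

import httpx

async def forward_job_to_sidecar(challenge_data: str,
                                 user_llm_key: str | None = None) -> dict:
    """Forward a job request to the IPCHA sidecar, passing a BYOK key through if present."""
    headers = {}
    if user_llm_key:
        headers["X-LLM-Api-Key"] = user_llm_key  # BYOK pass-through
    async with httpx.AsyncClient(timeout=30.0) as client:
        resp = await client.post(
            "http://ipcha-sidecar:8100/jobs/create",  # assumed Compose service name
            json={"challenge_data": challenge_data},
            headers=headers,
        )
    resp.raise_for_status()
    return resp.json()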
Peering into the Future: Designing Image Uploads for Project Notes
No sooner had the IPCHA dust settled than we pivoted to a challenge of a different kind: multimodal AI. Our goal? To enable image uploads for Project Notes, transforming static visuals into searchable, actionable insights.
After some intense brainstorming, we landed on Option A: describe-on-upload. The core idea is simple yet powerful: when a user uploads an image, we immediately send it to a vision-capable Large Language Model (LLM) to generate a detailed textual description. This description is then stored in our MemoryEntry alongside a reference to the image.
Why "describe-on-upload"?
- Searchability: Images become first-class citizens in our knowledge base. You can search for "UI elements with red annotations" and find the exact screenshot.
- Contextual Understanding: The LLM doesn't just tag; it understands relationships, identifies UI components, and even extracts action items directly from visual cues.
- Efficiency: By processing at upload time, we avoid on-the-fly processing delays when retrieving notes.
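To make the shape of the pipeline concrete, here's a minimal Python sketch of describe-on-upload. Everything here is a stand-in: store_image, describe_image, and the MemoryEntry dataclass are hypothetical names, and the production path runs through our TypeScript services and Prisma models.

from dataclasses import dataclass

@dataclass
class MemoryEntry:
    project_id: str
    content: str    # the LLM-generated description
    image_key: str  # reference to the stored image

def store_image(project_id: str, filename: str, data: bytes) -> str:
    """Stand-in for the storage adapter (local or S3); returns the object key."""
    return f"{project_id}/notes/{filename}"

def describe_image(data: bytes) -> str:
    """Stand-in for the vision-LLM call; see the adapter sketch further below."""
    return "Settings page screenshot with red-circled fields and a handwritten TODO."

def handle_note_image_upload(project_id: str, filename: str, data: bytes) -> MemoryEntry:
    image_key = store_image(project_id, filename, data)     # 1. persist the image
    description = describe_image(data)                      # 2. describe once, at upload time
    return MemoryEntry(project_id, description, image_key)  # 3. description becomes note content

The key property: the expensive vision call happens exactly once per image, and everything downstream (search, retrieval, display) treats the description like any other note content.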
The Proof of Concept was genuinely exciting: We tested three annotated screenshots, and the LLM didn't just see pixels; it understood. It correctly identified:
- Specific UI elements (pages, fields, values, statistics).
- Handwritten annotations – colored circles, arrows, and text in different hues.
- Cross-references between images (e.g., "the feature checklist shown in image A relates to the settings page in image B").
- Actionable insights derived from the annotations.
Our existing src/server/services/storage.ts adapter already handles local and S3 storage for JPEG/PNG/WebP up to 50MB, so the storage layer is solid. The immediate next step is extending our LLM adapters to support vision capabilities (hello, Anthropic's Claude 3 Vision, or OpenAI's GPT-4V!). We'll also need to add an imageKey field to our MemoryEntry model.
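As a rough sketch of what that vision support could look like, here's a describe_image variant against the official openai Python SDK. The model name and prompt are placeholder assumptions, and an Anthropic version would be structurally similar; note how an optional api_key parameter lets a BYOK key from X-LLM-Api-Key flow straight through:

import base64
from openai import OpenAI

def describe_image_openai(image_bytes: bytes,
                          api_key: str | None = None,
                          media_type: str = "image/png") -> str:
    """Send an image to a vision-capable model and return its textual description.

    If api_key is None, the SDK falls back to the OPENAI_API_KEY environment
    variable; passing a key explicitly supports the BYOK path.
    """
    client = OpenAI(api_key=api_key)
    b64 = base64.b64encode(image_bytes).decode("ascii")
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any vision-capable model works
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this screenshot in detail: UI elements, "
                         "handwritten annotations, and any action items they imply."},
                {"type": "image_url",
                 "image_url": {"url": f"data:{media_type};base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content or ""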
Lessons from the Trenches: The Unseen Edges
Even with the high of deploying a major new protocol, the developer's journey is one of continuous vigilance. While IPCHA's specific pain points were meticulously documented in letter_20260315_0001.md (a good practice to keep separate, focused logs!), a new, subtle issue surfaced:
- The "Hanging" Analysis Feature: A user reported that our project analysis feature "hängt manchmal" (sometimes hangs). This isn't a showstopper for IPCHA, but it's a critical signal. It reminds us that even as we build new, exciting features, the stability and performance of existing systems must be continuously monitored and refined. Shipping new doesn't mean forgetting the old. This bug is now on our radar for immediate investigation.
What's Next on the Horizon
Our immediate focus shifts to bringing the image upload feature to life, which will then power a significant redesign of our Project Onboarding process:
- Build Image Upload for Notes:
  - Extend MemoryEntry with an imageKey field.
  - Add vision support to at least one LLM adapter (Anthropic or OpenAI).
  - Implement a user-friendly upload UI (drag-and-drop or file picker) on the Notes page.
  - On upload: store the image, send it to the vision LLM, save the description as content in the MemoryEntry, and display a thumbnail in the note card.
- Redesign Project Onboarding: Use the new image upload feature to create a rich Project Onboarding note, complete with those three annotated screenshots.
- Automate Onboarding Action Points: Convert the visual information in the onboarding note into concrete action points via our workflow, covering areas like:
- Sources refactoring (e.g., a "Letters" menu item, Git filesystem view).
- Debugging and auto-running analysis on onboarding projects.
- Auto-filling settings from README files and implementing a branch selector.
- Refining the onboarding flow for existing vs. new repositories.
- Creating demo projects for one-shot Claude Code execution.
- Generate Implementation Prompts: Turn those action points into detailed prompts for our development workflow.
- Minor IPCHA Cleanup: A few small but important tasks remain, like setting up an env_file for Docker Compose and ensuring the RLS SQL is idempotent.
This sprint was a fantastic blend of solid execution and visionary design. We shipped a core protocol that will underpin future capabilities, and we laid the groundwork for a multimodal AI feature that promises to transform how we interact with project knowledge. The journey continues, and we're excited for what's next!
{
"thingsDone": [
"IPCHA protocol implementation (15 modules, 78 tests)",
"IPCHA API service deployed (FastAPI sidecar, REST proxy, 10+11 endpoints)",
"Prisma models for IPCHA (IpchaApiToken, IpchaUsageLog, IpchaJob) with RLS",
"BYOK integration via X-LLM-Api-Key header",
"IPCHA Dashboard (8 tabs)",
"Image upload feature designed (Option A: describe-on-upload)",
"Image upload proof of concept (LLM correctly describes UI, annotations, cross-references, action items)"
],
"pains": [
"IPCHA-specific pain points documented separately",
"User reported 'analysis feature hangs sometimes' bug (needs investigation)"
],
"successes": [
"IPCHA fully deployed on main (b34b44f)",
"14/14 IPCHA API integration tests pass",
"Successful PoC of LLM-powered image description for Project Notes",
"Existing storage adapter supports image uploads up to 50MB"
],
"techStack": [
"Python",
"FastAPI",
"REST",
"Docker",
"Prisma",
"tRPC",
"TypeScript",
"LLMs (Anthropic, OpenAI vision models)",
"S3",
"Git",
"main branch deployment"
]
}