nyxcore-systems
7 min read

Optimizing AI Workflows: Speed, Metrics, and the Road Ahead

We're constantly refining our AI-powered platform. This session focused on boosting efficiency with smaller models, adding crucial team metrics, and laying the groundwork for exciting new features like automated action point generation and a robust RAG system.

AI · LLM · Optimization · Productivity · Software Development · RAG · NextJS · TypeScript

As developers building AI-powered tools, our work is never truly "done." It's a continuous cycle of building, optimizing, measuring, and envisioning the next big leap. Every development session is a chance to chip away at the todo list, squash a bug, or lay the groundwork for a transformative feature.

This past session was a prime example of that iterative process. We tackled a mix of immediate performance gains, crucial metric additions, and some foundational research for the future. The core goals were clear: deploy faster models for auxiliary tasks, implement a team success rate metric, fix an action-points limit, and kick off research for a seamless "Notes to Action Points" flow and a comprehensive RAG system.

I'm happy to report significant progress on the first three, with the groundwork firmly laid for the two research tracks.

The Need for Speed: Smaller Models, Bigger Impact

One of the biggest challenges with integrating large language models (LLMs) is balancing their power with the practicalities of cost and latency. While a powerful model like GPT-4 or Claude Opus is essential for core, complex tasks, many auxiliary functions — like quality scoring, simple data extraction, or content enrichment — don't require such heavy lifting. Using expensive models for these tasks is like driving a supercar to pick up groceries: overkill and inefficient.

Our solution was to introduce a FAST_MODELS constant. This simple map allows us to quickly switch to more economical and faster models for specific providers:

typescript
// src/lib/constants.ts
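// Cheaper, lower-latency models per provider, used for auxiliary tasks
// (quality scoring, enrichment, extraction) that don't need a flagship model.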
export const FAST_MODELS = {
  anthropic: 'claude-haiku-4-5-20251001',
  openai: 'gpt-4o-mini',
  google: 'gemini-2.5-flash',
  kimi: 'kimi-k2-0711-preview',
  // ... other providers
};

With this in place, we updated six key auxiliary services to leverage these cheaper, faster alternatives (the per-service swap is sketched just after this list):

  • quality-scorer.ts: Now uses FAST_MODELS[provider.name] instead of the provider's expensive default.
  • quality-gates.ts: Three critical gate functions (security, documentation, letter generation) now benefit from the speed.
  • note-enrichment.ts: Faster processing for user notes.
  • discussion-knowledge.ts: Both digest generation and insight extraction are now more efficient.
  • action-point-extraction.ts: Quicker identification of actionable items.
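
For illustration, here's roughly what the swap looks like inside one of these services. The names below (scoreQuality, callLLM, Provider) are hypothetical stand-ins for our internal API, not the actual signatures:

typescript
// Hypothetical sketch of the per-service change, not the real internals.
import { FAST_MODELS } from '@/lib/constants'; // path alias assumed

interface Provider {
  name: string;         // e.g. 'anthropic', 'openai'
  defaultModel: string; // the provider's (more expensive) default
}

// Stand-in for our LLM adapter's call signature.
declare function callLLM(opts: {
  provider: Provider;
  model?: string;
  prompt: string;
}): Promise<{ text: string }>;

async function scoreQuality(provider: Provider, content: string): Promise<number> {
  // Before: no model passed, so the adapter used provider.defaultModel.
  // After: prefer the fast model; an unmapped provider yields undefined,
  // which the adapter treats as "use the default" (see Lessons Learned).
  const fast = (FAST_MODELS as Record<string, string | undefined>)[provider.name];
  const response = await callLLM({
    provider,
    model: fast,
    prompt: `Score this content's quality from 0 to 100:\n\n${content}`,
  });
  return Number(response.text);
}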

This change not only reduces our operational costs but also significantly improves the responsiveness of these features, leading to a snappier user experience. We also added a model?: string field to our StepTemplate interface, providing even finer-grained control for future per-template model overrides. This foresight ensures we can continue to optimize model usage as our system evolves.
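
One plausible resolution order (our assumption for this sketch, not shipped behavior) is: template override, then fast model, then provider default. The StepTemplate fields other than model are illustrative:

typescript
// Sketch of per-template model resolution. Field names besides `model`
// are assumptions, not the real StepTemplate shape.
interface StepTemplate {
  id: string;
  prompt: string;
  model?: string; // optional per-template override (the new field)
}

function resolveModel(
  template: StepTemplate,
  providerName: string,
  fastModels: Record<string, string | undefined>
): string | undefined {
  // Returning undefined is deliberate: the LLM adapter falls back to the
  // provider's default model when no explicit model is supplied.
  return template.model ?? fastModels[providerName];
}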

Measuring What Matters: Team Success Rate

In any collaborative environment, understanding performance is key to improvement. We added a new teamSuccessRate() function to our dashboard, specifically within src/app/(dashboard)/dashboard/personas/teams/page.tsx.

This function computes the average success rate across all team member personas, providing a quick, at-a-glance metric. Displayed as a color-coded percentage next to the member count badge, it offers immediate feedback on overall team effectiveness. This simple addition empowers team leads and members to monitor progress and identify areas for coaching or process refinement.
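
Conceptually, the function is a straightforward average. This minimal sketch assumes each persona exposes a successRate number from 0 to 100; the real dashboard types may differ:

typescript
// Minimal sketch of teamSuccessRate(). The Persona shape is assumed.
interface Persona {
  name: string;
  successRate: number; // 0-100, per team member persona
}

function teamSuccessRate(members: Persona[]): number {
  if (members.length === 0) return 0;
  const total = members.reduce((sum, p) => sum + p.successRate, 0);
  return Math.round(total / members.length);
}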

Unblocking Creativity: Action Points Description Limit

Sometimes, the most impactful changes are the simplest. We noticed that our 2000-character limit for action point descriptions was occasionally stifling users who needed to provide more detailed context or instructions.

The fix was straightforward: bump the description validation from 2000 to 10000 characters across create, update, and auto-extraction processes. This minor adjustment removes an arbitrary constraint, allowing users to fully articulate their action points without unnecessary truncation or workarounds. It's a small victory for user experience, demonstrating that paying attention to seemingly minor friction points can lead to significant quality-of-life improvements.
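
If you validate with a zod-style schema (ours is similar in spirit; the schema below is illustrative, not our actual code), the fix really is a one-liner applied in three places:

typescript
import { z } from 'zod';

// Illustrative schema: the same .max() bump is applied to the create,
// update, and auto-extraction validators.
const actionPointSchema = z.object({
  title: z.string().min(1).max(200),
  description: z.string().max(10000), // was .max(2000)
});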

Lessons Learned: Robustness by Design

While we encountered no critical issues during this session, a valuable defensive coding pattern emerged from the FAST_MODELS implementation. We noted that FAST_MODELS[provider.name] might return undefined if a provider isn't explicitly mapped (e.g., a local Ollama instance).

Crucially, this isn't an error. Our LLM adapter is designed to gracefully fall back to the provider's default model when model: undefined is passed. This built-in resilience means we don't need extensive error handling for unmapped providers; the system simply defaults to a sensible, albeit potentially more expensive, option. It's a good reminder that anticipating edge cases and designing for graceful degradation makes for a much more robust and maintainable system.
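
In adapter terms, the graceful degradation is just a nullish coalesce at the point where the model is chosen. This sketch assumes the adapter knows the provider's default; the names are placeholders:

typescript
// Sketch of the adapter-side fallback, names assumed.
interface LLMCallOptions {
  model?: string; // undefined for providers not mapped in FAST_MODELS
  prompt: string;
}

function pickModel(opts: LLMCallOptions, providerDefault: string): string {
  // model: undefined is not an error; quietly use the provider default,
  // trading a bit of cost for guaranteed availability.
  return opts.model ?? providerDefault;
}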

The Road Ahead: Intelligent Automation and Knowledge Injection

With the immediate optimizations deployed, our focus now shifts to two exciting, larger-scale initiatives that promise to significantly enhance the platform's intelligence and utility.

1. Notes → Action Points Flow

Currently, when users enrich project notes with AI wisdom, the system returns suggested action points, but the user must manually apply them. This is a friction point we aim to eliminate. Our next step is to automate this process (a sketch of the planned status field follows the list):

  • enrichmentStatus Field: We'll add a new field to the ProjectNote model to track whether a note has been processed into action points.
  • Auto-Apply or Selective UI: The goal is to auto-apply these action points directly after enrichment. We'll explore keeping a selective UI (e.g., "Apply All" button) for user control, but the default will lean towards automation.
  • Mark as Processed: Once action points are created, the note will be marked as "processed to action" in the notes list, providing clear visual feedback.
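
In TypeScript terms, the planned tracking could be as simple as a status union on the note. The values below are a working assumption, not a finalized schema:

typescript
// Working sketch of the planned enrichmentStatus field, values assumed.
type EnrichmentStatus = 'unprocessed' | 'enriched' | 'processed_to_action';

interface ProjectNote {
  id: string;
  content: string;
  enrichmentStatus: EnrichmentStatus;
}

// After auto-applying the extracted action points:
function markProcessed(note: ProjectNote): ProjectNote {
  return { ...note, enrichmentStatus: 'processed_to_action' };
}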

This will transform note enrichment from a manual step into a seamless, intelligent automation that directly drives productivity.

2. Project/Tenant RAG System

This is arguably the most ambitious item on our immediate roadmap: building a comprehensive Retrieval-Augmented Generation (RAG) system. The vision is to allow users to upload various files (.md, .pdf, .docx, even entire code repositories) and inject that knowledge directly into their workflows. Imagine an AI assistant that truly understands your project's specific documentation, codebase, or historical context.

This will involve significant research and development across several areas:

  • File Upload & Storage: Deciding on a robust solution for file upload and storage (e.g., S3 for scalability, or local storage for specific deployments).
  • Document Parsing: Developing reliable parsers for different file types (.md, .pdf, .docx) to extract clean text.
  • Chunking Strategy: Determining optimal strategies for breaking down documents into manageable chunks suitable for embedding (a baseline sketch follows this list).
  • Embedding & Vector Storage: Leveraging existing pgvector capabilities for storing and retrieving document embeddings.
  • API Endpoint Design: Creating a secure API endpoint with token-based authentication to manage file uploads and knowledge retrieval.
  • Workflow Integration: Seamlessly integrating this RAG system with our existing workflow template variables, allowing users to reference their uploaded knowledge bases within prompts.
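
As a first pass at the chunking question, a fixed-size window with overlap is the baseline we'd measure smarter strategies against. The sizes here are placeholders to tune, not settled values:

typescript
// Naive baseline chunker: fixed-size character windows with overlap.
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  if (overlap >= size) throw new Error('overlap must be smaller than size');
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}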

The RAG system will be a game-changer, moving beyond generic LLM capabilities to truly personalized, context-aware AI assistance.

Onwards and Upwards

This session was a microcosm of our development philosophy: continuously seeking efficiency, enhancing user experience with thoughtful features, and meticulously planning for the future. We've made our AI workflows faster and cheaper, added valuable team insights, and unblocked user creativity. Now, with the research for automated action points and a powerful RAG system underway, we're excited to see how these next advancements will further empower our users. Stay tuned for more updates as we dive deeper into these exciting challenges!

json
{
  "thingsDone": [
    "Implemented FAST_MODELS for auxiliary AI tasks, reducing cost and latency.",
    "Updated 6 core services to utilize cheaper, faster LLMs.",
    "Added 'model?: string' to StepTemplate for future granular control.",
    "Deployed teamSuccessRate() function to display average success rate across team member personas.",
    "Increased action points description limit from 2000 to 10000 characters."
  ],
  "pains": [
    "No critical issues encountered, but learned to appreciate default fallbacks for unmapped providers in FAST_MODELS for robust design."
  ],
  "successes": [
    "Significant cost reduction and performance improvement for auxiliary AI tasks.",
    "Enhanced visibility into team performance with a new dashboard metric.",
    "Improved user experience by removing an arbitrary character limit.",
    "Proactive planning and research initiated for major new features (Notes -> Action Points, RAG system)."
  ],
  "techStack": [
    "TypeScript",
    "Next.js",
    "LLMs (Anthropic, OpenAI, Google, Kimi)",
    "pgvector",
    "S3 (planned for RAG)",
    "Frontend (React/Next.js)",
    "Backend (Node.js/Next.js API routes)"
  ]
}