Beyond 'Creative': Engineering Better LLM Expert Teams for Robust Workflows
We just wrapped up a critical session, refining our LLM-powered 'expert team' system. Discover how we moved from whimsical titles to professional roles, ironed out tricky workflow bugs, and leveled up our agentic architecture.
The promise of LLM-powered agentic workflows is immense, but building them reliably, efficiently, and professionally is a journey. Our recent development sprint focused on finalizing a core component of our system: the "expert team" templates. This isn't just about generating text; it's about orchestrating a team of specialized AI agents to tackle complex development tasks.
After several sessions of iterating, refining, and occasionally wrestling with stubborn bugs, we've hit a major milestone. This post dives into the final tweaks, the critical lessons learned, and the architectural decisions that shaped our expert team feature.
The Evolution of Our AI Expert Team
At its heart, our system leverages the concept of an "expert team." When a complex task comes in, the LLM first assembles a dynamic team of specialized agents, each with a specific role and expertise. These agents then collaborate, with their prompts tailored to their assigned skills, to deliver a comprehensive solution.
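To make the idea concrete, here is a minimal sketch of what an expert team member might look like as data, with a helper that renders the roster into a prompt preamble. The field names and the `renderRoster` helper are illustrative, not our actual types:

```typescript
// Hypothetical shape of an expert team member. Field names are
// illustrative, not the real types from our codebase.
interface Expert {
  name: string;
  role: string;
  expertise: string[];
}

// Render the roster into the prompt preamble each step receives,
// so every agent knows who else is on the team.
function renderRoster(team: Expert[]): string {
  return team
    .map((e) => `- ${e.name} — ${e.role} (${e.expertise.join(", ")})`)
    .join("\n");
}

const team: Expert[] = [
  { name: "Taylor Kim", role: "Senior Full-Stack Engineer", expertise: ["Go", "CLI"] },
  { name: "Morgan Lee", role: "Application Security Engineer", expertise: ["AppSec"] },
];

console.log(renderRoster(team));
```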
Our goal for this final session was clear:
- Refine the Expert Team Templates: Ensure gender-neutral agents with standard, professional titles.
- Verify End-to-End Workflow: Run new test cases to confirm the system behaves as expected.
From "Creative" to "Professional": A Shift in Philosophy
Initially, we experimented with more "creative, unconventional" titles for our AI experts. While fun, we realized that for a system designed to tackle real-world engineering problems, clarity and professionalism trump whimsy. Our users, mostly developers, need to quickly understand the roles and responsibilities of the AI agents assisting them.
This led to a significant update in our src/lib/constants.ts file, where our core prompt templates reside. We made the following key changes:
- Gender-Neutral Names & Standard Professional Roles: We replaced specific gendered names and creative titles with a diverse set of gender-neutral names paired with common, recognizable professional roles. This not only promotes inclusivity but also grounds the agents in a more relatable, professional context.

  ```typescript
  // src/lib/constants.ts (simplified snippet showing the new style)

  // Example for extensionPrompt:
  const extensionPrompt = `...
  Examples of expert team members:
  - Alex Chen — Senior Plugin Architect
  - Jordan Rivera — API Integration Lead
  - Sam Nakamura — Test Automation Engineer
  ...`;

  // Example for deepPrompt:
  const deepPrompt = `...
  Examples of expert team members:
  - Taylor Kim — Senior Full-Stack Engineer
  - Robin Andersen — Database Architect
  - Jamie Okafor — UX Engineer
  - Quinn Reyes — DevOps Lead
  ...`;

  // Example for secPrompts:
  const secPrompts = `...
  Examples of expert team members:
  - Morgan Lee — Application Security Engineer
  - Riley Tanaka — Cryptography Specialist
  - Casey Okoye — Auth & IAM Lead
  ...`;
  ```

- Updated Field Labels: We changed the internal field label from **Name & Title** to **Name & Role** across all templates, aligning with our new focus on professional roles over abstract titles.
- Removed Ambiguous Instructions: The instruction "Give each expert a creative, unconventional title" was explicitly removed from all templates. This simplifies the LLM's task and steers it towards our desired output.
Validating the Vision: Successful Workflow Runs
To confirm these changes, we ran two critical test workflows:
- "Expert Team v2" (K8s CLI Project): A 3-step workflow generating a Kubernetes CLI project.
- The LLM successfully generated a team: Taylor Kim (Go/CLI), Jordan Chen (K8s Platform), Alex Rivera (TUI/Systems), Sam Okafor (AI Integration), Morgan Liu (DevOps Reliability).
- Crucially, each subsequent step's prompt was correctly assigned to the relevant expert(s) based on their context-specific expertise. This validated the agentic routing.
These successful runs confirmed that our expert team templates are now robust, professional, and effectively guide the LLM's agentic behavior.
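The routing idea can be sketched in a few lines. In our system the LLM itself performs the assignment; this hypothetical `routeStep` helper just illustrates matching a step's declared tags against each expert's skills:

```typescript
interface Expert {
  name: string;
  expertise: string[];
}

// Hypothetical routing sketch: select the experts whose expertise
// overlaps the tags a workflow step declares. In practice the LLM
// makes this assignment; this only illustrates the matching logic.
function routeStep(step: { tags: string[] }, team: Expert[]): Expert[] {
  return team.filter((e) => e.expertise.some((skill) => step.tags.includes(skill)));
}

const team: Expert[] = [
  { name: "Taylor Kim", expertise: ["Go", "CLI"] },
  { name: "Jordan Chen", expertise: ["Kubernetes"] },
  { name: "Morgan Liu", expertise: ["DevOps"] },
];

// A step tagged with Kubernetes and DevOps routes to Jordan and Morgan.
console.log(routeStep({ tags: ["Kubernetes", "DevOps"] }, team).map((e) => e.name));
```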
Broader Enhancements Across Sessions
This final session also brought to a close several other improvements made across the last four sprints:
- Enhanced Output Presentation: We replaced our old `PromptSectionCard` UI with a full inline `MarkdownRenderer`. This means richer, more readable output directly in the UI.
- Developer Toolbar: Added a handy toolbar for completed step outputs, featuring `downloadMarkdown()`, Copy, Edit, and Retry actions.
- Improved Workflow Initialization: Integrated "Step 0: Assemble the Expert Team" into our `extensionPrompt`, `deepPrompt`, and `secPrompts` to explicitly guide the initial phase of any workflow.
- System Prompt Alignment: All `systemPrompt` fields were updated to consistently mention the expert team assembly process, reinforcing the core architectural pattern.
Navigating the Minefield: Lessons Learned
No complex system is built without its share of head-scratching moments. Our "Pain Log" from this session (and prior ones) offers some valuable lessons:
- Playwright and Project Roots:
  - The Struggle: Trying to run Playwright tests from a `/tmp/` directory.
  - The Lesson: Playwright, like many Node.js tools, expects to be run from the project root to properly resolve `node_modules` and other project-specific paths. Always mind your current working directory (cwd) when executing tests or build scripts.
- Environment Variable Naming Consistency:
  - The Struggle: Expecting `NEXTAUTH_SECRET` to be the auth secret.
  - The Lesson: In our project, the environment variable for Auth.js (formerly NextAuth.js) is `AUTH_SECRET`. Always double-check the specific configuration or documentation for the libraries you're using, especially when dealing with sensitive environment variables. A quick `grep` can save a lot of time.
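One way to catch this class of mistake early is a fail-fast guard at startup. The helper name here is our own invention; Auth.js v5 itself reads `AUTH_SECRET` automatically, while the old NextAuth.js name was `NEXTAUTH_SECRET`:

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical fail-fast guard for the Auth.js secret. It even hints
// at the exact mix-up we hit: the variable was renamed from
// NEXTAUTH_SECRET to AUTH_SECRET in Auth.js v5.
function requireAuthSecret(env: Env = process.env): string {
  const secret = env.AUTH_SECRET;
  if (!secret) {
    const hint = env.NEXTAUTH_SECRET
      ? " (found NEXTAUTH_SECRET; Auth.js v5 expects AUTH_SECRET)"
      : "";
    throw new Error(`Missing AUTH_SECRET${hint}`);
  }
  return secret;
}

console.log(requireAuthSecret({ AUTH_SECRET: "example-secret" }));
```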
- Asynchronous Workflows: Polling vs. Server-Sent Events (SSE):
  - The Struggle: Attempting to poll a workflow after a `start` mutation.
  - The Lesson: Our workflow engine is designed around an `AsyncGenerator`. This means it streams events as they happen, rather than providing a static state to poll. The correct way to drive execution and get real-time updates is to consume the SSE endpoint (`/api/v1/events/workflows/[id]`). Polling an `AsyncGenerator` is fundamentally at odds with its design; you need to listen for events, not repeatedly ask for state. This was a critical architectural understanding for our real-time UI updates.
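For a sense of what the stream carries, here is a minimal parser for a single SSE message block. The event name and payload are made up for illustration; in the browser you would normally skip this and just attach listeners to a standard `EventSource`:

```typescript
// Minimal parser for one Server-Sent Events message block, to show
// the wire format an SSE endpoint streams. Real consumers would use
// `new EventSource(url)` and listen for events instead of polling.
function parseSseMessage(raw: string): { event: string; data: string } {
  let event = "message"; // SSE default when no event: field is present
  const data: string[] = [];
  for (const line of raw.split("\n")) {
    if (line.startsWith("event:")) event = line.slice(6).trim();
    else if (line.startsWith("data:")) data.push(line.slice(5).trim());
  }
  return { event, data: data.join("\n") };
}

const msg = parseSseMessage('event: step.completed\ndata: {"stepId":1}');
console.log(msg.event); // step.completed
```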
- Zod and API Input Validation:
  - The Struggle: Passing a plain string as `input` in a `create` mutation.
  - The Lesson: Our API uses Zod for robust input validation, expecting `z.record(z.string())` for the `input` field. This means simple string inputs need to be wrapped in an object, e.g., `{ text: "..." }`. Always adhere to your API's defined schema; it prevents unexpected errors and ensures data integrity.
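The constraint is easy to see in isolation. This plain type guard mirrors what a `z.record(z.string())` schema enforces, written without pulling in Zod itself so the sketch stays dependency-free:

```typescript
// A plain type guard mirroring what a Zod z.record(z.string())
// schema enforces: the input must be an object whose values are
// all strings, so a bare string gets rejected.
function isStringRecord(value: unknown): value is Record<string, string> {
  return (
    typeof value === "object" &&
    value !== null &&
    !Array.isArray(value) &&
    Object.values(value).every((v) => typeof v === "string")
  );
}

console.log(isStringRecord("just a string")); // false: must be wrapped
console.log(isStringRecord({ text: "just a string" })); // true
```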
- The Ever-Present Legacy Bug:
  - Note: A pre-existing TypeScript error in `discussions/[id]/page.tsx:139` (`"outline"` not assignable to Badge variant) remains. It's not in our current scope, but a good reminder that projects always have their quirks.
What's Next?
Even with these significant milestones, the journey continues. Our immediate next steps include:
- Cleaning up stale workflows.
- Testing alternative selection flows (e.g., generating multiple solutions and selecting one).
- Refining cost estimation to account for multiple generation attempts.
- Enabling prompt editing on pending workflows.
- Considering navigation aids for extremely long prompt outputs.
This session marked a crucial point in solidifying our LLM expert team architecture. By focusing on clarity, professionalism, and robust error handling, we're building a more reliable and effective system for developers. The evolution from "creative" to "professional" isn't just a stylistic change; it's a reflection of our commitment to building serious tools for serious problems.
```json
{
  "thingsDone": [
    "Updated expert team templates to use gender-neutral names and standard professional roles (e.g., Senior Plugin Architect, Database Architect, Application Security Engineer)",
    "Removed 'Give each expert a creative, unconventional title' instruction from templates",
    "Changed field label from '**Name & Title**' to '**Name & Role**'",
    "Successfully ran 'Expert Team v2' workflow (K8s CLI project), validating expert assignment and context-specific expertise",
    "Replaced collapsed/truncated prompt section cards with full inline MarkdownRenderer",
    "Added download, copy, edit, and retry toolbar for completed step outputs",
    "Integrated 'Step 0: Assemble the Expert Team' instruction into all major prompt templates",
    "Updated all system prompts to mention expert team assembly"
  ],
  "pains": [
    "Attempting to run Playwright from a temporary directory instead of the project root",
    "Incorrect environment variable name for Auth.js secret (`NEXTAUTH_SECRET` vs `AUTH_SECRET`)",
    "Trying to poll an AsyncGenerator-based workflow engine instead of consuming its SSE endpoint",
    "Passing a plain string as API input when Zod expected a record (e.g., `{ text: '...' }`)"
  ],
  "successes": [
    "Finalization of expert team prompt templates",
    "Successful end-to-end verification of new workflow patterns",
    "Improved UI for workflow output and interaction",
    "Enhanced clarity and professionalism in AI agent roles"
  ],
  "techStack": [
    "Next.js",
    "TypeScript",
    "LLM (Large Language Models)",
    "Prompt Engineering",
    "Agentic AI",
    "PostgreSQL",
    "Kubernetes (implied by K8s CLI project)",
    "Auth.js",
    "Zod",
    "Playwright"
  ]
}
```