nyxcore-systems
6 min read

Building Smarter, More Inclusive AI Workflows: A Deep Dive into Expert Teams

We've reached a significant milestone in refining our AI workflow engine, focusing on creating more professional, gender-neutral 'expert teams' and ensuring robust end-to-end execution. Discover how we're making AI collaboration more precise and inclusive.

AI · LLM · Workflow · TypeScript · Next.js · Software Development · Inclusivity · Agentic AI

In the rapidly evolving landscape of AI-powered development, creating systems that are not only powerful but also precise, robust, and inclusive is paramount. Our latest development sprint centered on a crucial aspect of our AI workflow engine: the "expert teams" that guide our Large Language Models (LLMs) through complex tasks. This session marked the culmination of several weeks of focused effort, bringing us to a complete and verified solution.

The core idea behind our "expert teams" is to simulate a collaborative environment for the LLM. Instead of a single, monolithic prompt, we define a team of specialized "agents" with distinct roles and expertise. The LLM then leverages this team, assigning parts of a problem to the most relevant experts, leading to more structured, accurate, and context-aware outputs.
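To make this concrete, here is a minimal sketch of how task-to-expert routing can work. The types and the keyword-matching heuristic below are illustrative assumptions, not our engine's actual code:

```typescript
// Hypothetical types -- illustrative only, not the engine's real definitions.
interface Expert {
  name: string;
  role: string;
  keywords: string[]; // topics this expert should handle
}

// Pick the expert whose keywords best match the task description.
function assignExpert(task: string, team: Expert[]): Expert {
  const lower = task.toLowerCase();
  let best = team[0];
  let bestScore = -1;
  for (const expert of team) {
    const score = expert.keywords.filter((k) => lower.includes(k)).length;
    if (score > bestScore) {
      best = expert;
      bestScore = score;
    }
  }
  return best;
}

const team: Expert[] = [
  { name: "Alex Chen", role: "Senior Plugin Architect", keywords: ["plugin", "architecture"] },
  { name: "Sam Nakamura", role: "Test Automation Engineer", keywords: ["test", "ci"] },
];

assignExpert("Add CI coverage and unit tests", team);
// → Sam Nakamura (matches "test" and "ci")
```

In the real system the LLM performs this assignment itself from the roster in the prompt; the heuristic above just illustrates the intent of giving each expert a clearly scoped role.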

The Evolution of Our Expert Teams: Precision Meets Inclusivity

Our primary goal for this final session was to refine these expert team templates, moving towards a more professional and inclusive standard. Previously, our templates included some creative, female-coded titles. While imaginative, we recognized the importance of promoting gender neutrality and standard professional roles to enhance clarity, reduce potential biases, and better align with real-world project teams.

Here's how we transformed our expert team definitions:

  1. Gender-Neutrality and Standard Professional Roles: We meticulously updated the expert examples in src/lib/constants.ts. Instead of creative titles, we now feature a diverse set of gender-neutral names paired with standard, recognizable professional roles. This not only fosters inclusivity but also provides clearer guidance to the LLM on the specific expertise each "team member" brings.

    For instance, in our extensionPrompt template, you'll now find:

    ```typescript
    // src/lib/constants.ts (excerpt from extensionPrompt examples)
    // ...
    // - Alex Chen — Senior Plugin Architect
    // - Jordan Rivera — API Integration Lead
    // - Sam Nakamura — Test Automation Engineer
    // ...
    ```

    Our deepPrompt and secPrompts (for security-focused tasks) received the same treatment:

    ```typescript
    // src/lib/constants.ts (excerpt from deepPrompt examples)
    // ...
    // - Taylor Kim — Senior Full-Stack Engineer
    // - Robin Andersen — Database Architect
    // - Jamie Okafor — UX Engineer
    // - Quinn Reyes — DevOps Lead
    // ...
    ```

    ```typescript
    // src/lib/constants.ts (excerpt from secPrompts examples)
    // ...
    // - Morgan Lee — Application Security Engineer
    // - Riley Tanaka — Cryptography Specialist
    // - Casey Okoye — Auth & IAM Lead
    // ...
    ```
  2. Refined Instructions: We updated the field label from **Name & Title** to **Name & Role** across all templates, reinforcing the focus on functional expertise. Crucially, we removed the instruction "Give each expert a creative, unconventional title" from all templates. This subtle but significant change ensures the LLM adheres to the professional role guidelines we've established.
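As a sketch of what this looks like in practice, a hypothetical helper (not the real constants.ts code) could render the roster under the new **Name & Role** label:

```typescript
// Hypothetical helper -- illustrates the "Name & Role" convention,
// not the actual template code in src/lib/constants.ts.
interface TeamMember {
  name: string;
  role: string; // standard professional role, not a creative title
}

// Render the expert roster block that gets embedded in a prompt template.
function renderRoster(members: TeamMember[]): string {
  const lines = members.map((m) => `- ${m.name} — ${m.role}`);
  return ["**Name & Role**", ...lines].join("\n");
}

const roster = renderRoster([
  { name: "Alex Chen", role: "Senior Plugin Architect" },
  { name: "Jordan Rivera", role: "API Integration Lead" },
]);
```

Keeping the roster generation in one place like this makes the label change a one-line edit rather than a hunt across templates.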

End-to-End Verification: Putting the New System to the Test

With the templates updated, the next critical step was to verify everything end-to-end with a new workflow. We ran a dedicated "Expert Team v2" workflow (ID 8cf4402b-e450-4624-a5d2-fb7c33ad1c79) designed for a Kubernetes CLI project.

The results were precisely what we aimed for:

  • The LLM successfully generated a diverse and relevant expert team for the Kubernetes CLI project, including roles like Taylor Kim (Go/CLI), Jordan Chen (K8s Platform), Alex Rivera (TUI/Systems), Sam Okafor (AI Integration), and Morgan Liu (DevOps Reliability).
  • Each prompt within the workflow was properly assigned to the relevant expert(s), demonstrating the system's ability to interpret context and distribute tasks effectively based on the new, refined roles.
  • The entire 3-step workflow completed successfully in just 208.2 seconds, confirming the efficiency of our updated engine.

This successful run, alongside another "Expert Team Test" workflow, validated that our changes not only improved the quality and inclusivity of our expert team definitions but also seamlessly integrated into our existing workflow execution engine.

Broader Enhancements: A Holistic Improvement

This final session also capped off a series of broader improvements across the application, enhancing the overall developer and user experience:

  • Streamlined Output Display: We replaced our previous parsePromptSections() and PromptSectionCard components with a full inline MarkdownRenderer. Now, all completed step outputs are beautifully rendered in markdown, making them much easier to read and digest.
  • Enhanced Output Control: A new toolbar for all completed step outputs provides convenient options: downloadMarkdown(), Copy, Edit, and Retry. This significantly improves the usability of our workflow results.
  • Explicit Expert Team Assembly: To ensure the LLM always correctly initializes the expert team, we added "Step 0: Assemble the Expert Team" to our extensionPrompt, deepPrompt, and secPrompts. This explicit instruction at the beginning of each workflow guides the LLM to set up its internal "team" before tackling the core problem.
  • Consistent System Prompts: All three systemPrompt fields were updated to consistently mention expert team assembly, reinforcing this crucial first step.
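As an illustration of the download action, here's one possible shape for downloadMarkdown() (only the function name comes from our toolbar; the implementation below is a sketch). The filename helper is pure, while the download itself relies on browser APIs and only runs in a DOM environment:

```typescript
// Turn a step title into a safe markdown filename.
function markdownFilename(stepTitle: string): string {
  const slug = stepTitle
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")   // collapse non-alphanumeric runs
    .replace(/^-+|-+$/g, "");      // trim leading/trailing dashes
  return `${slug || "output"}.md`;
}

// Hypothetical implementation of the toolbar's download action.
function downloadMarkdown(stepTitle: string, content: string): void {
  const blob = new Blob([content], { type: "text/markdown" });
  const url = URL.createObjectURL(blob);
  const a = document.createElement("a");
  a.href = url;
  a.download = markdownFilename(stepTitle);
  a.click();
  URL.revokeObjectURL(url);
}
```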

Lessons Learned: Navigating the Development Path

Development sessions are rarely without their challenges. Overcoming these "pains" provides invaluable lessons that strengthen our understanding and our codebase. Here are some key insights from this sprint:

  • Playwright Execution Context: We initially tried running Playwright tests from a /tmp/ directory, only to find that it must be executed from the project root to correctly resolve node_modules. A classic pathing gotcha!
  • Environment Variable Nuances: A small but significant detail: our project uses AUTH_SECRET for authentication, not NEXTAUTH_SECRET. Double-checking environment variable names specific to the project's configuration is always a good practice.
  • Asynchronous Workflow Execution with SSE: Early attempts to poll for workflow status after a start mutation proved inefficient. We quickly realized that our engine leverages an AsyncGenerator, meaning the correct approach is to consume the Server-Sent Events (SSE) endpoint (/api/v1/events/workflows/[id]) to drive and monitor execution in real-time. This provides a much more responsive and efficient user experience.
  • Zod Schema Validation for API Inputs: When creating a mutation, we initially passed input as a simple string. However, our Zod schema expected z.record(z.string()), requiring the input to be an object like { text: "..." }. Adhering strictly to API schema definitions is vital for robust data handling.
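That last lesson is easy to illustrate. A plain-TypeScript stand-in for the z.record(z.string()) check (illustrative only; the real validation is done by Zod) makes the mismatch obvious:

```typescript
// Mimics z.record(z.string()): the input must be a plain object whose
// values are all strings. (Stand-in for illustration; the real schema
// lives in our Zod definitions.)
function isStringRecord(value: unknown): value is Record<string, string> {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  return Object.values(value).every((v) => typeof v === "string");
}

// What we passed at first -- rejected by the schema:
isStringRecord("Build a K8s CLI");            // false
// What the mutation actually expects:
isStringRecord({ text: "Build a K8s CLI" });  // true
```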

It's also worth noting a pre-existing TypeScript error in discussions/[id]/page.tsx:139 regarding a badge variant ("outline" not assignable). While not introduced by our changes, it's on the radar for future cleanup.

The Road Ahead

With these significant improvements deployed, our journey continues. We've identified several immediate next steps to further enhance the system:

  • Clean up any stale workflows that might be stuck in a running state.
  • Test the alternatives selection flow end-to-end, ensuring users can generate multiple solutions and select the best one.
  • Update our estimateWorkflowCost to accurately account for the generateCount multiplier, providing transparent cost estimates.
  • Test prompt editing on a pending workflow, empowering users to refine their instructions before execution.
  • Consider implementing a Table of Contents (TOC) or navigation aid for extremely long implementation prompt outputs (10k+ tokens), improving readability and scannability.
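For that last item, a hypothetical first cut at a TOC builder could simply collect ATX headings from the rendered markdown (note this naive version would also pick up #-prefixed lines inside fenced code blocks, which a real implementation would need to skip):

```typescript
// Hypothetical TOC builder for long markdown outputs -- not yet part of the app.
interface TocEntry {
  level: number; // 1 for "#", 2 for "##", ...
  text: string;
}

// Collect ATX-style headings (#, ##, ... up to ######) line by line.
function extractToc(markdown: string): TocEntry[] {
  const entries: TocEntry[] = [];
  for (const line of markdown.split("\n")) {
    const match = /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      entries.push({ level: match[1].length, text: match[2].trim() });
    }
  }
  return entries;
}

const toc = extractToc("# Plan\n## Step 1\nbody\n## Step 2");
// [{ level: 1, text: "Plan" }, { level: 2, text: "Step 1" }, { level: 2, text: "Step 2" }]
```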

Conclusion

This development session marks a significant step forward in our mission to build sophisticated, user-friendly, and ethically responsible AI development tools. By refining our expert team templates for gender neutrality and professional roles, we're not only enhancing the precision of our LLM outputs but also fostering a more inclusive environment. Coupled with robust workflow verification and continuous improvements to the user experience, we're excited about the future of AI-powered collaboration.