nyxcore-systems

Supercharging LLM Workflows: Integrating Expert Personas and Multi-Provider A/B Testing in nyxCore

Dive into the recent development sprint for nyxCore, where we implemented powerful workflow personas and multi-provider LLM comparison, transforming raw ideas into robust features for smarter AI interactions.

LLM · Workflow Engine · AI Development · TypeScript · Prisma · tRPC · Next.js · Frontend Development

Building intelligent applications with Large Language Models (LLMs) often means navigating a complex landscape of prompt engineering, model selection, and iterative refinement. At nyxCore, our mission is to simplify this for developers and users alike, providing a flexible workflow engine that empowers sophisticated AI interactions.

Recently, we pushed a significant update to nyxCore that tackles two critical challenges in LLM-driven workflows: injecting specialized knowledge and enabling robust model comparison. This post chronicles the journey from concept to code, detailing the technical decisions, the implementation, and the inevitable "lessons learned" along the way.

The Challenge: Smarter Prompts, Better Models

Our goal for this sprint was clear:

  1. Workflow Personas: How can we allow users to inject specific "expert" knowledge or roles into their workflow steps? Imagine a legal expert, a creative writer, or a technical reviewer – each with their own unique system prompt that can be dynamically included in any workflow step. This moves beyond a single static system prompt to a dynamic, composable one.
  2. Multi-Provider A/B Comparison: With the rapid evolution of LLMs, choosing the "best" model for a specific task is a moving target. We needed a way for users to compare outputs from multiple providers (e.g., OpenAI, Anthropic, Google, Ollama) side-by-side for a given step, allowing them to visually assess and select the most suitable alternative. This is crucial for both robustness and cost-effectiveness.

After a focused development session, these features are now live on main (aa37799), and our dev server is purring on port 3000. Let's break down how we got there.

Building the Brains and the UI

Implementing these features touched almost every layer of the nyxCore stack, from the database schema to the user interface.

1. The Data Foundation: Prisma Schema & tRPC API

First, we needed to store our new configurations. For personas, we added personaIds to the Workflow model, allowing a workflow to reference multiple expert personas. For multi-provider comparison, compareProviders was added to WorkflowStep, an array to hold the identifiers of LLM providers to compare.

```prisma
// prisma/schema.prisma
model Workflow {
  // ... other fields
  personaIds    String[]       @db.Uuid
  WorkflowSteps WorkflowStep[]
}

model WorkflowStep {
  // ... other fields
  compareProviders String[] @default([]) // e.g., ["anthropic", "openai"]
}
```

With the schema updated, we extended our tRPC API:

  • A new personasRouter was created (src/server/trpc/routers/personas.ts) to list and retrieve persona definitions.
  • Our existing workflows.ts router was updated to handle personaIds during workflow creation and updates, and compareProviders for step creation and updates. We also bumped the selectAlternative limit to support up to four providers, reflecting the expanded comparison capabilities.
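As a rough sketch of the personasRouter's surface, here are its two procedures modeled as plain functions, with an injected in-memory store standing in for the Prisma client. The `Persona` shape and function names are illustrative assumptions, not nyxCore's actual types:

```typescript
// Hypothetical persona shape; the real model lives in prisma/schema.prisma.
interface Persona {
  id: string;
  name: string;
  systemPrompt: string;
}

// Stand-in for the router's `list` procedure.
function listPersonas(store: Persona[]): Persona[] {
  return store;
}

// Stand-in for the router's `get` procedure.
function getPersona(store: Persona[], id: string): Persona | undefined {
  return store.find((p) => p.id === id);
}
```

In the real router these would be tRPC queries backed by Prisma, but the input/output contract is the same.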

2. The Core Logic: The Workflow Engine

The heart of these features lies within src/server/services/workflow-engine.ts. This is where the magic of prompt composition and parallel execution happens.

  • Persona System Prompt Injection:

    • A new loadPersonaSystemPrompts() function fetches the details of selected personas.
    • It formats them into a standardized string, like ## Expert: [Name]\n[SystemPrompt].
    • When executeStep() runs, these formatted persona prompts are prepended to the step's own system prompt, creating a powerful, combined instruction set for the LLM. This ensures the expert's perspective is established before the specific step instruction.
  • Multi-Provider Parallel Execution:

    • Inside executeStep(), we introduced a conditional fork: if compareProviders has more than one entry, we switch into a "one-per-provider" mode.
    • Instead of just running one LLM call, we now fire off multiple requests in parallel, one for each specified provider, each with its own providerOverride parameter.
    • The results are then collected and presented as distinct alternatives.
    • Crucially, the estimateWorkflowCost() function was updated to account for this, multiplying the estimated cost by the number of providers being compared. This ensures users have a clear understanding of potential expenses.

3. Bringing it to Life: The Frontend Experience

User-facing changes were essential to make these powerful features accessible.

  • New Workflow Creation (/dashboard/workflows/new):

    • During workflow setup, users are now presented with a "Persona picker" – a multi-select checklist where they can assign one or more expert personas to their workflow.
    • Within the SortableStepCard component, a new "Compare Providers" multi-toggle allows users to select which LLM providers they want to compare for that specific step. This toggle intelligently auto-syncs with the generateCount field: if you select 3 providers, generateCount will automatically adjust to 3 (or more, if you want temperature variations per provider!).
  • Workflow Detail View (/dashboard/workflows/[id]):

    • The settings panel now proudly displays "Persona badges" with a Users icon, showing at a glance which experts are guiding the workflow.
    • On the alternative cards generated by a step, we now show distinct "Provider + Model" badges, making it clear which LLM generated which output.
    • Step headers also sport an "N providers" badge when comparisons are active, providing quick visual feedback.

These UI updates ensure that the underlying complexity of persona injection and multi-provider execution is presented intuitively to the user.
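The generateCount auto-sync rule described above boils down to a one-liner (a sketch; the real component wires this into React state on each toggle change):

```typescript
// generateCount must cover at least one generation per compared provider,
// but the user may raise it further for temperature variations per provider.
function syncGenerateCount(generateCount: number, compareProviders: string[]): number {
  return Math.max(generateCount, compareProviders.length);
}
```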

Lessons from the Trenches: The "Pain Log" Reframed

No development sprint is without its snags. These moments, while frustrating in real-time, often yield the most valuable lessons.

1. Type Safety vs. Runtime Reality: The Zod Enum Conundrum

  • The Problem: We defined compareProviders in our stepConfigSchema using a Zod enum (e.g., z.enum(["anthropic", "openai"])). In the frontend, we initially tried to type compareProviders as string[]. TypeScript rightly threw an error: string[] is not assignable to ("anthropic"|"openai")[]. While string[] could hold valid enum values, it doesn't guarantee it, which is the point of the enum.
  • The Solution: We explicitly typed compareProviders in our frontend StepConfig, LocalStep, and StepTemplate interfaces using a literal union type: ("anthropic" | "openai" | "google" | "ollama")[]. This aligns the frontend types perfectly with the backend's Zod schema, ensuring compile-time safety.
  • Lesson Learned: When dealing with strict enums from the backend (like Zod's), ensure your frontend types reflect that strictness. Don't try to loosen it with a generic string[]; embrace the explicit union for robust type checking.

2. The Silent Killer: Missing Defaults

  • The Problem: When creating new workflows, our default step configuration in the workflows.ts mutation initially omitted the new compareProviders field. TypeScript, being the diligent guardian it is, promptly flagged TS2741: Property 'compareProviders' is missing in type....
  • The Solution: A simple fix: adding compareProviders: [] to our default step configurations.
  • Lesson Learned: Always consider the default state of new fields, especially when adding them to existing data structures or creation processes. Even optional fields often benefit from an explicit empty array or null default to prevent type errors.

3. Dev Environment Headaches: The Port 3000 Dance

  • The Problem: During a rapid iteration cycle, I tried running two dev servers simultaneously (an older branch and the new one) on what I thought were different ports. One of them apparently fell back to port 3000 or hit a conflict, leading to odd style glitches and unexpected behavior.
  • The Solution: The classic developer reset:
    1. lsof -ti:3000 | xargs kill to forcefully terminate all processes on the offending port.
    2. Clear the .next cache directory.
    3. Perform a clean restart of the dev server.
  • Lesson Learned: When things get weird in your local environment, don't hesitate to perform a full cleanup. A fresh start is often the quickest path to debugging phantom issues, especially with caching and port conflicts.

What's Next? Robustness Through Testing

With the features implemented, the immediate next steps involve thorough testing:

  • Verifying styles after the cache clear.
  • Seeding the database with test personas.
  • Creating workflows with multiple personas and comparing providers, ensuring outputs are correct and alternatives are presented as expected.
  • Confirming backward compatibility for generateCount without compareProviders.
  • Testing workflow duplication to ensure all new configurations are preserved.

This ensures that our new features are not just functional, but also stable and intuitive for our users.

Conclusion

This sprint for nyxCore has significantly enhanced our workflow engine, moving us closer to truly intelligent and adaptable AI applications. By enabling dynamic persona injection, we empower users to guide LLMs with specialized knowledge. By integrating multi-provider A/B comparison, we provide the tools to make informed decisions about model performance and cost. These are crucial steps in building a robust, future-proof platform for AI development.

Happy coding!

```json
{
  "thingsDone": [
    "Implemented Workflow.personaIds and WorkflowStep.compareProviders in Prisma schema",
    "Created tRPC router for personas (list, get)",
    "Updated workflows tRPC router for personaIds and compareProviders in create, update, steps.update, duplicate, and selectAlternative logic",
    "Enhanced workflow-engine.ts to load and inject persona system prompts into LLM calls",
    "Developed multi-provider fork logic in executeStep() for parallel LLM execution and result collection",
    "Updated cost estimation to account for multi-provider comparisons",
    "Integrated persona picker and compare providers toggle into new workflow creation UI",
    "Added persona badges, provider/model badges, and 'N providers' badges to workflow detail and step views",
    "Updated StepTemplate and related frontend types for compareProviders"
  ],
  "pains": [
    "TypeScript type incompatibility between Zod enum and generic string[] for compareProviders",
    "Missing default value for compareProviders in initial step creation mutation",
    "Development server port conflicts and stale cache issues requiring full process kill and cache clear"
  ],
  "successes": [
    "Successfully implemented dynamic persona injection into LLM system prompts",
    "Enabled robust multi-provider LLM output comparison side-by-side",
    "Ensured type safety across frontend and backend for new features",
    "Maintained backward compatibility for existing workflow configurations",
    "Created intuitive UI components for complex backend logic"
  ],
  "techStack": [
    "TypeScript",
    "Next.js",
    "React",
    "Prisma",
    "tRPC",
    "PostgreSQL",
    "LLMs (OpenAI, Anthropic, Google, Ollama)",
    "Zod"
  ]
}
```