Supercharging LLM Workflows: Unleashing Expert Personas & Side-by-Side Provider Comparison in nyxCore
Dive into nyxCore's latest features: injecting expert personas into LLM prompts and comparing multiple LLM providers side-by-side for optimal workflow results and informed decision-making.
Building robust and reliable applications with Large Language Models (LLMs) often feels like a delicate dance between creativity and control. How do you ensure consistent, high-quality output? How do you pick the best model for a specific task, especially when the landscape of providers is constantly evolving? These are the questions we set out to answer in our latest development sprint for nyxCore, our powerful workflow engine.
I'm thrilled to share that we've successfully implemented two game-changing features: Workflow Personas and Multi-Provider A/B Comparison. These additions empower developers and users to exert finer control over their LLM interactions, leading to more predictable, higher-quality, and cost-effective results.
The Vision: Smarter, More Informed LLM Workflows
Our core goal was to inject expert knowledge directly into workflows and provide a clear mechanism for evaluating LLM performance across different providers.
1. Workflow Personas: Injecting Expertise Directly into Prompts
Imagine you're building a workflow that needs to generate technical documentation. Instead of hoping the LLM "understands" the context, what if you could tell it, "Act as a senior software architect specializing in distributed systems, providing concise and accurate explanations"? This is the power of workflow personas.
By allowing users to define and inject custom system prompts associated with specific "expert" roles, we can guide the LLM's behavior and tone for an entire workflow. This leads to:
- Consistent Output: Maintain a specific voice and style throughout complex workflows.
- Higher Quality: Leverage specialized knowledge to get more relevant and accurate responses.
- Reduced Prompt Engineering Overhead: Define personas once and apply them across many workflows.
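To make this concrete, here is a minimal sketch of persona injection. The `Persona` shape and helper names are illustrative assumptions, not nyxCore's actual implementation; the `## Expert:` header format follows the convention described in the "Under the Hood" section below.

```typescript
// Illustrative persona shape (not nyxCore's real data model).
interface Persona {
  id: string;
  name: string;
  systemPrompt: string;
}

// Combine the selected personas into a single system-prompt preamble,
// using a "## Expert: Name" header per persona.
function buildPersonaPreamble(personas: Persona[]): string {
  return personas
    .map((p) => `## Expert: ${p.name}\n${p.systemPrompt}`)
    .join("\n\n");
}

// Prepend the persona preamble to a step's own system prompt, so the
// "expert" context is established before step-specific instructions.
function composeSystemPrompt(personas: Persona[], stepPrompt: string): string {
  const preamble = buildPersonaPreamble(personas);
  return preamble ? `${preamble}\n\n${stepPrompt}` : stepPrompt;
}
```

Defining the persona once and composing it at call time is what makes the "define once, apply across many workflows" property cheap to maintain.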
2. Multi-Provider A/B Comparison: The Side-by-Side Showdown
The LLM market is dynamic, with new models and providers emerging constantly. Choosing between OpenAI, Anthropic, Google, or even self-hosted Ollama models can be a significant decision, impacting cost, latency, and output quality. Our new multi-provider comparison feature solves this by allowing users to:
- Compare Outputs Side-by-Side: For a given workflow step, trigger multiple LLM calls using different providers (e.g., OpenAI's GPT-4 vs. Anthropic's Claude 3 Opus).
- Make Informed Decisions: Visually inspect the alternatives and select the best one to continue the workflow.
- Optimize for Cost & Performance: Easily evaluate which provider offers the best balance for a specific task without manual switching and testing.
- Reduce Vendor Lock-in: Maintain flexibility to switch providers as your needs or the market changes.
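For a concrete picture, a step opting into comparison might carry configuration along these lines (field names other than `compareProviders` are illustrative, not nyxCore's exact shape):

```typescript
// Illustrative step configuration enabling side-by-side comparison.
const comparisonStep = {
  name: "Draft technical summary",
  prompt: "Summarize the findings for an engineering audience.",
  // One alternative will be generated per listed provider.
  compareProviders: ["openai", "anthropic"],
};
```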
Under the Hood: Bringing the Vision to Life
This ambitious feature set required a holistic approach, touching our data model, backend services, and frontend UI.
The Data Foundation: Schema Updates
First, we extended our Prisma schema to store the new configurations:
- `Workflow.personaIds`: An array of UUIDs linking a workflow to selected personas.
- `WorkflowStep.compareProviders`: An array of provider names (e.g., `["openai", "anthropic"]`) to indicate which providers should generate alternatives for a specific step.
```prisma
// prisma/schema.prisma
model Workflow {
  // ... existing fields
  personaIds String[] @db.Uuid
}

model WorkflowStep {
  // ... existing fields
  stepConfig       Json
  compareProviders String[] @default([]) // e.g., ["openai", "anthropic"]
}
```
Backend Logic: The Brains of the Operation
The core intelligence resides in `src/server/services/workflow-engine.ts` and `src/server/trpc/routers/workflows.ts`.
- Persona Management: We introduced a new tRPC router (`src/server/trpc/routers/personas.ts`) to manage persona definitions (name, system prompt). The `workflow-engine` now includes a `loadPersonaSystemPrompts()` function that fetches these records and formats them as a combined system prompt string (e.g., `## Expert: Name\nSystemPrompt`). This combined prompt is then injected into the `ChainContext` for the LLM call.
- Dynamic System Prompt Construction: Within `executeStep()`, the LLM's system prompt is now dynamically built. Persona prompts are prepended, ensuring the "expert" context is established before the step's specific system prompt.
- Multi-Provider Forking: This was a key architectural change. When `compareProviders` is set for a step, `executeStep()` no longer generates multiple alternatives based on temperature variations. Instead, it forks the execution, making one LLM call per specified provider, each with a `providerOverride` parameter. This results in a distinct alternative output for each chosen provider, ready for side-by-side comparison.
- Cost Awareness: Running multiple LLM calls naturally increases cost. Our `estimateWorkflowCost()` function was updated to accurately account for this multiplication when `compareProviders` is active.
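The multi-provider fork described above can be sketched roughly like this. `Provider`, `Alternative`, and `callLLM` are illustrative stand-ins for nyxCore's real types and LLM client, not its actual API:

```typescript
// Provider names mirror the literal union used in the post.
type Provider = "anthropic" | "openai" | "google" | "ollama";

interface Alternative {
  provider: Provider;
  output: string;
}

// Stand-in for the real LLM client; providerOverride selects the backend.
async function callLLM(prompt: string, providerOverride: Provider): Promise<string> {
  return `[${providerOverride}] response to: ${prompt}`;
}

// When compareProviders is set, fork into one call per provider and
// return one distinct alternative per provider for side-by-side review.
async function generateAlternatives(
  prompt: string,
  compareProviders: Provider[],
): Promise<Alternative[]> {
  return Promise.all(
    compareProviders.map(async (provider) => ({
      provider,
      output: await callLLM(prompt, provider),
    })),
  );
}
```

Fanning out with `Promise.all` keeps the per-provider calls concurrent, so comparing two providers costs roughly the latency of the slower one rather than the sum of both.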
Frontend Experience: Intuitive Control
The user interface was updated to expose these powerful new features:
- Workflow Creation/Editing (`new/page.tsx`, `[id]/page.tsx`):
  - A new Persona Picker UI allows users to easily select multiple personas from a checklist, similar to our existing consolidation picker.
  - A Compare Providers toggle in the `SortableStepCard` enables users to select which providers to compare for a given step. This intelligently auto-syncs with the `generateCount` to ensure the correct number of alternatives is displayed.
- Visual Cues:
  - Persona Badges in the workflow settings panel clearly indicate which expert roles are active.
  - Provider + Model Badges now adorn each alternative card, making it clear which LLM generated which output.
  - A concise "N providers" badge appears on step headers when multi-provider comparison is enabled, offering an at-a-glance summary.
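The auto-sync between the provider toggle and `generateCount` could look something like the sketch below; the `StepConfig` fields and helper are assumptions for illustration, not the actual `SortableStepCard` code:

```typescript
type Provider = "anthropic" | "openai" | "google" | "ollama";

interface StepConfig {
  generateCount: number;
  compareProviders: Provider[];
}

// Keep generateCount in lockstep with the selected providers: one
// alternative per provider when comparison is on, otherwise leave the
// user's chosen count alone.
function withProviders(config: StepConfig, providers: Provider[]): StepConfig {
  return {
    ...config,
    compareProviders: providers,
    generateCount: providers.length > 0 ? providers.length : config.generateCount,
  };
}
```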
Navigating the Rapids: Lessons Learned
No development sprint is without its challenges. Here are a couple of key lessons we learned along the way:
TypeScript's Strictness: Literal Union Types
Initially, we defined `compareProviders` in our frontend types as `string[]`. This seemed straightforward, but TypeScript quickly reminded us of its power when it flagged an error: `string[]` is not assignable to `("anthropic" | "openai" | "google" | "ollama")[]`.
Lesson: While `string[]` is technically an array of strings, Zod schemas (which we use for validation) are often more precise, expecting an array of specific string literals. Adopting the literal union type `("anthropic" | "openai" | "google" | "ollama")[]` for `StepConfig.compareProviders` and `StepTemplate.compareProviders` aligned our frontend types with our backend schema expectations, ensuring robust type safety end to end.
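To illustrate what the literal union buys at runtime, here is a hedged sketch of a guard that mirrors the narrowing a Zod enum schema performs at the API boundary (names are illustrative, and this stands in for, rather than reproduces, our actual Zod validation):

```typescript
// The literal union that replaced string[].
type Provider = "anthropic" | "openai" | "google" | "ollama";

const PROVIDERS: readonly Provider[] = ["anthropic", "openai", "google", "ollama"];

// Type predicate: narrows an arbitrary string to the Provider union.
function isProvider(value: string): value is Provider {
  return (PROVIDERS as readonly string[]).includes(value);
}

// Reject unknown strings up front; the return type is the narrowed union,
// so downstream code never handles a bare string[].
function parseProviders(values: string[]): Provider[] {
  const bad = values.filter((v) => !isProvider(v));
  if (bad.length > 0) throw new Error(`Unknown providers: ${bad.join(", ")}`);
  return values.filter(isProvider);
}
```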
Default Values: The Silent Schema Requirement
Another common pitfall emerged when creating new workflow steps. Our default step configuration in the `workflows.ts` `create` mutation was missing the `compareProviders` field, leading to TS2741 errors.
Lesson: When adding new fields, especially optional ones, always ensure they have a default value (`compareProviders: []` in this case) in any object that represents a complete instance of that type. This prevents downstream type errors and ensures consistency when new records are created.
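A minimal sketch of the fix: every field of the full type gets a value in the default object, so newly created steps type-check. The fields other than `compareProviders` are illustrative, not our real `StepConfig`:

```typescript
type Provider = "anthropic" | "openai" | "google" | "ollama";

// Illustrative complete step-config type.
interface StepConfig {
  systemPrompt: string;
  generateCount: number;
  compareProviders: Provider[];
}

// Because StepConfig requires every field, omitting compareProviders
// here is exactly what triggered TS2741 before the fix.
const defaultStepConfig: StepConfig = {
  systemPrompt: "",
  generateCount: 1,
  compareProviders: [],
};
```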
(As a side note, we did spot an unrelated, pre-existing TypeScript error regarding a Badge variant in `src/app/(dashboard)/dashboard/discussions/[id]/page.tsx`, but it was confirmed to be outside the scope of this work.)
The Outcome: A More Intelligent & Controllable Workflow Engine
This sprint culminated in a fully implemented and committed feature set (commit `aa37799` on `main`). The nyxCore workflow engine is now significantly more powerful, offering unprecedented control over LLM interactions. Users can now:
- Build workflows that consistently leverage expert knowledge.
- Empirically compare different LLM providers to find the best fit for their needs.
- Make informed decisions that optimize for quality, cost, and performance.
We're incredibly excited about the possibilities these new features unlock and look forward to seeing the innovative ways our users will leverage them!