Building Multi-Provider LLM Comparison: When Your AI Needs Expert Personas

Have you ever wished you could inject domain expertise into your AI workflows, or compare how different LLM providers handle the same problem? Last week I worked on exactly that while building two features for our workflow engine: workflow personas (expert team injection) and multi-provider A/B comparison.

The goal was simple: let users assemble a team of expert personas to guide their workflows, and compare outputs from different LLM providers side-by-side. The implementation? Well, that's where things got interesting.

The Vision: AI Workflows with Expert Teams

Imagine you're building a content strategy workflow. Instead of just prompting a generic AI, you could inject personas like:

Sarah the SEO Expert: "I optimize content for search visibility and user intent..."
Marcus the Brand Strategist: "I ensure all content aligns with brand voice and positioning..."
Dr. Chen the Subject Matter Expert: "I provide technical accuracy and industry insights..."

Each persona becomes part of the system prompt, creating a virtual expert panel that guides the AI's responses. Combined with multi-provider comparison, you can see how Claude, GPT-4, and Gemini each interpret your expert team's guidance.

The Architecture: From Database to UI

Database Foundation

The schema changes were small: an array of persona IDs on the workflow, and a list of providers to compare on each step:

prisma

model Workflow {
  // ... existing fields
  personaIds    String[] @db.Uuid  // Array of expert persona IDs
}

model WorkflowStep {
  // ... existing fields  
  compareProviders String[] @default([])  // e.g. ["anthropic", "openai", "google"]
}

The Persona System

I created a new tRPC router for persona management:

typescript

// src/server/trpc/routers/personas.ts
export const personasRouter = createTRPCRouter({
  list: publicProcedure.query(async ({ ctx }) => {
    return await ctx.db.persona.findMany({
      select: { id: true, name: true, description: true }
    });
  }),
  
  get: publicProcedure
    .input(z.object({ id: z.string().uuid() }))  // persona IDs are UUIDs in the schema
    .query(async ({ ctx, input }) => {
      return await ctx.db.persona.findUnique({
        where: { id: input.id }
      });
    }),
});

The magic happens in the workflow engine, where personas get formatted and injected:

typescript

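// Load the selected personas and format them as one system-prompt section.
async function loadPersonaSystemPrompts(personaIds: string[]): Promise<string> {
  if (personaIds.length === 0) return "";  // no team selected: skip the query

  const personas = await db.persona.findMany({
    where: { id: { in: personaIds } }
  });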
  
  return personas
    .map(p => `## Expert: ${p.name}\n${p.systemPrompt}`)
    .join('\n\n');
}
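
One detail worth keeping in mind: a `findMany` with an `in` filter makes no promise about returning rows in the order of `personaIds`. Here is a minimal sketch of keeping the team order stable and combining the expert panel with a step's own prompt. It assumes a simplified `Persona` shape, and `orderPersonas` and `buildSystemPrompt` are hypothetical helpers, not part of the engine code above.

```typescript
// Simplified persona shape (assumption); the real Prisma model may carry more fields.
interface Persona {
  id: string;
  name: string;
  systemPrompt: string;
}

// Return personas in the order the user selected them, skipping unknown IDs.
function orderPersonas(personaIds: string[], rows: Persona[]): Persona[] {
  const byId = new Map<string, Persona>();
  for (const row of rows) byId.set(row.id, row);

  const ordered: Persona[] = [];
  for (const id of personaIds) {
    const persona = byId.get(id);
    if (persona !== undefined) ordered.push(persona);
  }
  return ordered;
}

// Format the panel with the same "## Expert: ..." headings, then append the step prompt.
function buildSystemPrompt(personas: Persona[], stepPrompt: string): string {
  const panel = personas
    .map((p) => `## Expert: ${p.name}\n${p.systemPrompt}`)
    .join("\n\n");
  // Drop empty parts so a workflow without personas gets only the step prompt.
  return [panel, stepPrompt].filter((part) => part.length > 0).join("\n\n");
}
```

Keeping the formatting in a pure function like this also makes it easy to unit test without touching the database.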

Multi-Provider Comparison Logic

The real complexity came in the execution engine. When a step has more than one provider configured, the engine skips the usual temperature variations and instead runs the step once per provider, concurrently:

typescript

// Multi-provider comparison mode
if (step.compareProviders.length > 1) {
  const alternatives = await Promise.all(
    step.compareProviders.map(async (provider) => {
      const result = await executeStep(step, context, provider);
      return {
        ...result,
        provider,
        model: getModelForProvider(provider)
      };
    })
  );
  
  return { alternatives, requiresSelection: true };
}
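
One caveat with `Promise.all`: it rejects as soon as any provider call fails, which discards the results that did succeed. Below is a hedged sketch of a more forgiving variant built on `Promise.allSettled`. Here `runProvider` is a stand-in for the engine's `executeStep`, and the `Alternative` and `ComparisonResult` shapes are assumptions made for illustration.

```typescript
type Provider = "anthropic" | "openai" | "google" | "ollama";

// Illustrative result shapes (assumptions, not the engine's real types).
interface Alternative {
  provider: Provider;
  output: string;
}

interface ComparisonResult {
  alternatives: Alternative[];
  failures: { provider: Provider; reason: string }[];
  requiresSelection: boolean;
}

// Run every provider concurrently and keep whatever succeeded.
async function compareAcrossProviders(
  providers: Provider[],
  runProvider: (provider: Provider) => Promise<string>, // stand-in for executeStep
): Promise<ComparisonResult> {
  // allSettled waits for every call and preserves input order.
  const settled = await Promise.allSettled(providers.map((p) => runProvider(p)));

  const alternatives: Alternative[] = [];
  const failures: { provider: Provider; reason: string }[] = [];

  providers.forEach((provider, index) => {
    const outcome = settled[index];
    if (outcome === undefined) return;
    if (outcome.status === "fulfilled") {
      alternatives.push({ provider, output: outcome.value });
    } else {
      failures.push({ provider, reason: String(outcome.reason) });
    }
  });

  // Asking the user to choose only makes sense with more than one candidate.
  return { alternatives, failures, requiresSelection: alternatives.length > 1 };
}
```

Whether a partial comparison is acceptable is a product decision; failing the whole step, as the `Promise.all` version does, is the stricter alternative.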

Frontend Polish

The UI needed to handle two new interaction patterns:

Persona Selection: A multi-select checklist during workflow creation
Provider Comparison Toggle: Per-step provider selection with automatic generateCount synchronization

tsx

// Persona picker in workflow creation
<div className="space-y-2">
  <Label>Expert Team</Label>
  {personas.map((persona) => (
    <div key={persona.id} className="flex items-center space-x-2">
      <Checkbox
        checked={selectedPersonas.includes(persona.id)}
        onCheckedChange={(checked) => togglePersona(persona.id, checked === true)}
      />
      <span>{persona.name}</span>
    </div>
  ))}
</div>

// Provider comparison in step configuration  
<div className="flex gap-2">
  {PROVIDERS.map((provider) => (
    <Button
      key={provider}
      variant={compareProviders.includes(provider) ? "default" : "outline"}
      onClick={() => toggleProvider(provider)}
    >
      {provider}
    </Button>
  ))}
</div>
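
The handlers behind those two controls (`togglePersona`, `toggleProvider`) are not shown above. Here is a minimal sketch of what they might look like as pure state updates. It assumes that `generateCount` tracks the number of selected providers whenever comparison mode is active (two or more providers); `toggleInList` and `toggleProviderInConfig` are hypothetical names.

```typescript
type Provider = "anthropic" | "openai" | "google" | "ollama";

interface StepConfig {
  systemPrompt: string;
  generateCount: number;
  compareProviders: Provider[];
}

// Add the item if absent, remove it if present; never mutates the input.
function toggleInList<T>(list: readonly T[], item: T): T[] {
  return list.includes(item) ? list.filter((x) => x !== item) : [...list, item];
}

// Toggle a provider and keep generateCount in step with the selection.
// Assumption: in comparison mode (2+ providers) one output is generated per
// provider; otherwise the existing generateCount is left untouched.
function toggleProviderInConfig(config: StepConfig, provider: Provider): StepConfig {
  const compareProviders = toggleInList(config.compareProviders, provider);
  const generateCount =
    compareProviders.length > 1 ? compareProviders.length : config.generateCount;
  return { ...config, compareProviders, generateCount };
}
```

The same `toggleInList` helper works for persona IDs, so both pickers can share one piece of logic.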

Lessons Learned: TypeScript Precision Matters

The most interesting challenges weren't architectural. They came from TypeScript's type system being stricter, and more helpful, than I initially gave it credit for.

Challenge 1: Union Type Precision

The Problem: I initially defined compareProviders as string[] in the frontend, but the Zod schema expected a specific union type.

typescript

// Before: rejected, because string[] is wider than what the Zod schema accepts
interface StepConfig {
  compareProviders: string[];  // ❌ Too generic
}

// After: matches the schema's provider union
interface StepConfig {
  compareProviders: ("anthropic" | "openai" | "google" | "ollama")[];  // ✅ Precise
}

The Lesson: TypeScript's literal union types aren't just for show—they prevent runtime errors by catching invalid provider names at compile time.
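
A compile-time union only protects values the compiler can see. For provider names that arrive as plain strings (a query parameter or a stored row, for example), a runtime guard derived from the same list keeps the two in sync. This is a sketch, assuming the four providers named above are the full set; `isProvider` and `parseProviders` are hypothetical helpers.

```typescript
// Single source of truth: the union type is derived from this array.
const PROVIDERS = ["anthropic", "openai", "google", "ollama"] as const;
type Provider = (typeof PROVIDERS)[number];

// Type guard: narrows an arbitrary string to Provider at runtime.
function isProvider(value: string): value is Provider {
  return (PROVIDERS as readonly string[]).includes(value);
}

// Keep only recognized provider names, dropping duplicates and unknown values.
function parseProviders(raw: string[]): Provider[] {
  const seen = new Set<Provider>();
  for (const value of raw) {
    if (isProvider(value)) seen.add(value);
  }
  return Array.from(seen); // a Set iterates in insertion order
}
```

Adding a provider then means editing one array, and both the type and the guard follow.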

Challenge 2: Default Values Everywhere

The Problem: Adding new required fields to existing interfaces means updating every place those interfaces are constructed.

typescript

// Had to add compareProviders to default step configs
const defaultStepConfig = {
  systemPrompt: "",
  generateCount: 1,
  compareProviders: [],  // ✅ Don't forget the new field!
};

The Lesson: When extending data structures, grep for all construction sites. TypeScript will catch most of them, but default objects that are not annotated with the interface can slip through.
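
One way to keep default objects from slipping through is to annotate them with the interface, so a missing field becomes a compile error at the definition itself. A minimal sketch follows, assuming the `StepConfig` shape from the snippets above; `createStepConfig` is a hypothetical factory.

```typescript
type Provider = "anthropic" | "openai" | "google" | "ollama";

interface StepConfig {
  systemPrompt: string;
  generateCount: number;
  compareProviders: Provider[];
}

// The explicit annotation is the point: omit a required field here and the
// compiler reports it at this line instead of at some distant use site.
const defaultStepConfig: StepConfig = {
  systemPrompt: "",
  generateCount: 1,
  compareProviders: [],
};

// Build a config from the defaults plus overrides, copying the array so
// callers never share (and accidentally mutate) the default instance.
function createStepConfig(overrides: Partial<StepConfig> = {}): StepConfig {
  const merged = { ...defaultStepConfig, ...overrides };
  return { ...merged, compareProviders: [...merged.compareProviders] };
}
```

Routing every construction site through one factory also means the next new field only needs a default in one place.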

The Result: Workflows That Think Like Expert Teams

The finished feature transforms how users interact with AI workflows. Instead of single-shot prompts, they can:

Assemble Expert Teams: Select relevant personas that inject domain expertise
Compare Provider Approaches: See how different LLMs interpret the same expert guidance
Make Informed Decisions: Choose the best output from side-by-side comparisons
Maintain Backward Compatibility: Existing temperature-based variations still work
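
The backward-compatibility rule can be pictured as a single planning step that decides which mode applies. This is a rough sketch, assuming comparison mode takes over only when two or more providers are selected. The `PlannedRun` shape, the `stepProvider` parameter, the `DEFAULT_TEMPERATURE` value, and the evenly spaced temperatures are illustrative assumptions, not the engine's actual values.

```typescript
type Provider = "anthropic" | "openai" | "google" | "ollama";

interface PlannedRun {
  provider: Provider;
  temperature: number;
}

// Illustrative default; the real engine's temperature is not given in this post.
const DEFAULT_TEMPERATURE = 0.7;

function planRuns(
  generateCount: number,
  compareProviders: Provider[],
  stepProvider: Provider, // the provider the step would use on its own
): PlannedRun[] {
  // Comparison mode: one run per selected provider at a shared temperature.
  if (compareProviders.length > 1) {
    return compareProviders.map((provider) => ({
      provider,
      temperature: DEFAULT_TEMPERATURE,
    }));
  }

  // Legacy mode: a single provider, varied by temperature.
  const count = Math.max(1, Math.floor(generateCount));
  if (count === 1) {
    return [{ provider: stepProvider, temperature: DEFAULT_TEMPERATURE }];
  }
  // Spread temperatures evenly from 0 to 1 (assumed spacing).
  return Array.from({ length: count }, (_, i) => ({
    provider: stepProvider,
    temperature: i / (count - 1),
  }));
}
```

Because existing steps have an empty `compareProviders` array, they always take the legacy branch and behave as before.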

What's Next?

The foundation is solid, but there's room to grow:

Dynamic Persona Creation: Let users create custom expert personas on the fly
Provider Performance Analytics: Track which providers perform best for different persona combinations
Consensus Mode: Automatically blend outputs from multiple providers instead of requiring manual selection

Key Takeaways

Start with the Data Model: Good schema design makes complex features feel natural
Type Safety is Your Friend: Embrace TypeScript's strictness—it prevents runtime surprises
Build in Layers: Database → API → Engine → UI. Each layer should work independently
Backward Compatibility Matters: New features shouldn't break existing workflows

Building AI tooling means constantly balancing power with usability. Workflow personas and multi-provider comparison add significant capability while keeping the user experience intuitive. Sometimes the best features are the ones that feel obvious in hindsight—even when the implementation details are anything but simple.

Want to see more deep dives into AI workflow architecture? Follow along as we continue building tools that make AI more powerful and accessible for everyone.