Building Multi-Provider LLM Comparison: When Your AI Needs Expert Personas
How we implemented workflow personas and side-by-side LLM provider comparison in our workflow engine, turning AI outputs into expert-guided conversations.
Have you ever wished you could inject domain expertise into your AI workflows, or compare how different LLM providers tackle the same problem? Last week, I took on exactly that challenge while building two major features for our workflow engine: workflow personas (expert team injection) and multi-provider A/B comparison.
The goal was simple: let users assemble a team of expert personas to guide their workflows, and compare outputs from different LLM providers side-by-side. The implementation? Well, that's where things got interesting.
The Vision: AI Workflows with Expert Teams
Imagine you're building a content strategy workflow. Instead of just prompting a generic AI, you could inject personas like:
- Sarah the SEO Expert: "I optimize content for search visibility and user intent..."
- Marcus the Brand Strategist: "I ensure all content aligns with brand voice and positioning..."
- Dr. Chen the Subject Matter Expert: "I provide technical accuracy and industry insights..."
Each persona becomes part of the system prompt, creating a virtual expert panel that guides the AI's responses. Combined with multi-provider comparison, you can see how Claude, GPT-4, and Gemini each interpret your expert team's guidance.
The Architecture: From Database to UI
Database Foundation
The schema changes were straightforward but powerful:
```prisma
model Workflow {
  // ... existing fields
  personaIds String[] @db.Uuid // Array of expert persona IDs
}

model WorkflowStep {
  // ... existing fields
  compareProviders String[] @default([]) // ["anthropic", "openai", "google"]
}
```
The Persona System
I created a new tRPC router for persona management:
```typescript
// src/server/trpc/routers/personas.ts
export const personasRouter = createTRPCRouter({
  list: publicProcedure.query(async ({ ctx }) => {
    return await ctx.db.persona.findMany({
      select: { id: true, name: true, description: true },
    });
  }),

  get: publicProcedure
    .input(z.object({ id: z.string() }))
    .query(async ({ ctx, input }) => {
      return await ctx.db.persona.findUnique({
        where: { id: input.id },
      });
    }),
});
```
The magic happens in the workflow engine, where personas get formatted and injected:
```typescript
async function loadPersonaSystemPrompts(personaIds: string[]): Promise<string> {
  const personas = await db.persona.findMany({
    where: { id: { in: personaIds } },
  });

  return personas
    .map((p) => `## Expert: ${p.name}\n${p.systemPrompt}`)
    .join('\n\n');
}
```
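The formatted persona block then has to be combined with the step's own system prompt before the LLM call. Here's a minimal sketch of that composition step; `buildSystemPrompt` and the framing sentence are illustrative, not the exact strings from our engine:

```typescript
// Hypothetical composition step: prepend the expert panel to the step's
// own system prompt. The wrapper text is an illustrative choice.
function buildSystemPrompt(personaBlock: string, stepPrompt: string): string {
  if (!personaBlock) return stepPrompt;
  return `You are advised by the following expert panel:\n\n${personaBlock}\n\n${stepPrompt}`;
}

// Example with two formatted personas:
const personaBlock = [
  "## Expert: Sarah the SEO Expert\nI optimize content for search visibility...",
  "## Expert: Marcus the Brand Strategist\nI ensure brand voice alignment...",
].join("\n\n");

const prompt = buildSystemPrompt(
  personaBlock,
  "Draft an outline for the landing page."
);
```

Keeping the composition in one pure function makes it trivial to unit-test, and steps with no personas fall through untouched.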
Multi-Provider Comparison Logic
The real complexity came in the execution engine. When a step has multiple providers configured, instead of generating temperature variations, we fork the execution:
```typescript
// Multi-provider comparison mode
if (step.compareProviders.length > 1) {
  const alternatives = await Promise.all(
    step.compareProviders.map(async (provider) => {
      const result = await executeStep(step, context, provider);
      return {
        ...result,
        provider,
        model: getModelForProvider(provider),
      };
    })
  );

  return { alternatives, requiresSelection: true };
}
```
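`getModelForProvider` can be as simple as a typed lookup table. A minimal sketch, where the default model IDs are illustrative placeholders rather than our actual configuration:

```typescript
// Sketch of getModelForProvider: a typed lookup with one default model
// per provider. The specific model IDs here are illustrative defaults.
type Provider = "anthropic" | "openai" | "google" | "ollama";

const DEFAULT_MODELS: Record<Provider, string> = {
  anthropic: "claude-3-5-sonnet-latest",
  openai: "gpt-4o",
  google: "gemini-1.5-pro",
  ollama: "llama3",
};

function getModelForProvider(provider: Provider): string {
  return DEFAULT_MODELS[provider];
}
```

Using `Record<Provider, string>` means adding a new provider to the union forces you to add its default model, or the build fails.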
Frontend Polish
The UI needed to handle two new interaction patterns:
- Persona Selection: A multi-select checklist during workflow creation
- Provider Comparison Toggle: Per-step provider selection with automatic `generateCount` synchronization
```tsx
// Persona picker in workflow creation
<div className="space-y-2">
  <Label>Expert Team</Label>
  {personas.map((persona) => (
    <div key={persona.id} className="flex items-center space-x-2">
      <Checkbox
        checked={selectedPersonas.includes(persona.id)}
        onCheckedChange={(checked) => togglePersona(persona.id, checked)}
      />
      <span>{persona.name}</span>
    </div>
  ))}
</div>
```
```tsx
// Provider comparison in step configuration
<div className="flex gap-2">
  {PROVIDERS.map((provider) => (
    <Button
      key={provider}
      variant={compareProviders.includes(provider) ? "default" : "outline"}
      onClick={() => toggleProvider(provider)}
    >
      {provider}
    </Button>
  ))}
</div>
```
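The `generateCount` synchronization mentioned above can be modeled as a pure state update. This sketch assumes the sync rule is "one generation per selected provider while comparison mode is active"; the names and the exact rule are my illustration, not the literal store code:

```typescript
// Sketch of the toggle logic. Assumption: while more than one provider is
// selected, generateCount tracks the provider count (one output each);
// otherwise the user's own count is preserved.
interface StepConfigDraft {
  compareProviders: string[];
  generateCount: number;
}

function toggleProvider(config: StepConfigDraft, provider: string): StepConfigDraft {
  const selected = config.compareProviders.includes(provider)
    ? config.compareProviders.filter((p) => p !== provider)
    : [...config.compareProviders, provider];

  return {
    ...config,
    compareProviders: selected,
    generateCount: selected.length > 1 ? selected.length : config.generateCount,
  };
}
```

Returning a new object instead of mutating keeps the function friendly to React state setters like `setStepConfig(toggleProvider(stepConfig, "openai"))`.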
Lessons Learned: TypeScript Precision Matters
The most interesting challenges weren't architectural—they were about TypeScript's type system being smarter than I initially gave it credit for.
Challenge 1: Union Type Precision
The Problem: I initially defined compareProviders as string[] in the frontend, but the Zod schema expected a specific union type.
```typescript
// This failed
interface StepConfig {
  compareProviders: string[]; // ❌ Too generic
}

// This worked
interface StepConfig {
  compareProviders: ("anthropic" | "openai" | "google" | "ollama")[]; // ✅ Precise
}
```
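One way to keep the frontend union and the validation schema from drifting apart is to derive both from a single `as const` array. A sketch of the pattern (Zod's `z.enum` can consume the same tuple, so the runtime schema stays in lockstep too):

```typescript
// One source of truth for the provider list; the union type is derived,
// never hand-written a second time.
const PROVIDERS = ["anthropic", "openai", "google", "ollama"] as const;
type Provider = (typeof PROVIDERS)[number]; // "anthropic" | "openai" | "google" | "ollama"

// Runtime guard that narrows plain strings (e.g. values read back from
// the database's String[] column) to the Provider union.
function isProvider(value: string): value is Provider {
  return (PROVIDERS as readonly string[]).includes(value);
}
```

This also closes the gap between Prisma's untyped `String[]` column and the typed frontend: filter database values through `isProvider` at the boundary.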
The Lesson: TypeScript's literal union types aren't just for show—they prevent runtime errors by catching invalid provider names at compile time.
Challenge 2: Default Values Everywhere
The Problem: Adding new required fields to existing interfaces means updating every place those interfaces are constructed.
```typescript
// Had to add compareProviders to default step configs
const defaultStepConfig = {
  systemPrompt: "",
  generateCount: 1,
  compareProviders: [], // ✅ Don't forget the new field!
};
```
The Lesson: When extending data structures, grep for all construction sites. TypeScript will catch most of them, but default objects can slip through.
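For the default-object case specifically, TypeScript's `satisfies` operator (4.9+) can turn that grep into a compile error. A sketch, using a trimmed-down `StepConfig` for illustration:

```typescript
interface StepConfig {
  systemPrompt: string;
  generateCount: number;
  compareProviders: string[];
}

// `satisfies` checks the literal against StepConfig without widening its
// inferred type. Omitting compareProviders becomes a build error, so a
// newly added required field can't slip through a default object.
const defaultStepConfig = {
  systemPrompt: "",
  generateCount: 1,
  compareProviders: [],
} satisfies StepConfig;
```

Unlike annotating the constant as `: StepConfig`, `satisfies` preserves the narrow inferred type, so downstream code still sees the literal shape.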
The Result: Workflows That Think Like Expert Teams
The finished feature transforms how users interact with AI workflows. Instead of single-shot prompts, they can:
- Assemble Expert Teams: Select relevant personas that inject domain expertise
- Compare Provider Approaches: See how different LLMs interpret the same expert guidance
- Make Informed Decisions: Choose the best output from side-by-side comparisons
- Maintain Backward Compatibility: Existing temperature-based variations still work
What's Next?
The foundation is solid, but there's room to grow:
- Dynamic Persona Creation: Let users create custom expert personas on the fly
- Provider Performance Analytics: Track which providers perform best for different persona combinations
- Consensus Mode: Automatically blend outputs from multiple providers instead of requiring manual selection
Key Takeaways
- Start with the Data Model: Good schema design makes complex features feel natural
- Type Safety is Your Friend: Embrace TypeScript's strictness—it prevents runtime surprises
- Build in Layers: Database → API → Engine → UI. Each layer should work independently
- Backward Compatibility Matters: New features shouldn't break existing workflows
Building AI tooling means constantly balancing power with usability. Workflow personas and multi-provider comparison add significant capability while keeping the user experience intuitive. Sometimes the best features are the ones that feel obvious in hindsight—even when the implementation details are anything but simple.
Want to see more deep dives into AI workflow architecture? Follow along as we continue building tools that make AI more powerful and accessible for everyone.