Building Dynamic AI Workflows: Personas, Multi-Provider Testing, and Hard-Won Lessons
A deep dive into implementing workflow personas and multi-provider A/B testing in an AI pipeline, complete with the debugging challenges and architectural decisions that shaped the final solution.
Last week, I tackled one of those satisfying development sessions where multiple complex features come together—implementing workflow personas and multi-provider A/B testing for an AI workflow engine. What started as a straightforward feature request turned into a masterclass in TypeScript wrestling, cache debugging, and the art of incremental problem-solving.
The Vision: Smarter Workflows with Personality
The goal was ambitious but clear: allow users to inject specific personas into their AI workflows while simultaneously testing different LLM providers side-by-side. Imagine running the same prompt through Claude, GPT-4, and Gemini in parallel, all while maintaining consistent expert personas across the entire pipeline.
Here's what we built:
Workflow Personas
The persona system allows users to define expert identities that get injected into every step of their workflow. Instead of generic AI responses, you get outputs from a "Senior Software Architect" or "DevOps Security Specialist"—whatever expertise your workflow demands.
```typescript
// Added to the Workflow model
personaIds: string[]

// New persona loading system
async function loadPersonaSystemPrompts(personaIds: string[]) {
  const personas = await getPersonas(personaIds);
  return personas.map((p) => p.systemPrompt).join("\n\n");
}
```
The UI got a sleek persona picker that displays selected personas as badges, giving users immediate visual feedback about which experts are "in the room" for their workflow.
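To make the injection concrete, here is a minimal sketch of how the combined persona prompt could be prepended to each step's system prompt. The `Persona` shape and `buildStepSystemPrompt` name are illustrative, not the actual implementation:

```typescript
// Hypothetical sketch: prepend the combined persona prompt to every
// step's system prompt so each step sees the same expert framing.
interface Persona {
  name: string;
  systemPrompt: string;
}

function buildStepSystemPrompt(basePrompt: string, personas: Persona[]): string {
  // No personas selected: fall through to the step's own prompt unchanged.
  if (personas.length === 0) return basePrompt;
  const personaBlock = personas.map((p) => p.systemPrompt).join("\n\n");
  // Personas come first so they frame everything the step asks for.
  return `${personaBlock}\n\n${basePrompt}`;
}
```

Keeping this as a pure function makes it trivial to unit-test the injection logic without touching any provider SDKs.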
Multi-Provider A/B Testing
The second major feature lets users compare LLM providers directly within a single workflow step. Toggle on "Compare Providers," select your providers (Anthropic, OpenAI, Google, Ollama), and watch as each alternative gets processed by a different model.
```typescript
// Enhanced step configuration
interface StepConfig {
  compareProviders?: ("anthropic" | "openai" | "google" | "ollama")[];
  // ... other config
}
```
The workflow engine forks execution when it hits a comparison step, running identical prompts against different providers and presenting the results side-by-side. Perfect for evaluating which model handles your specific use case best.
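The fan-out described above can be sketched roughly as follows. The `callProvider` helper and `runComparisonStep` name are assumptions for illustration; the key idea is running providers in parallel and capturing per-provider failures instead of letting one vendor sink the whole step:

```typescript
// Sketch of a comparison step, assuming a callProvider(provider, prompt)
// helper that wraps each vendor SDK. All names here are illustrative.
type Provider = "anthropic" | "openai" | "google" | "ollama";

async function runComparisonStep(
  prompt: string,
  providers: Provider[],
  callProvider: (p: Provider, prompt: string) => Promise<string>,
): Promise<Record<string, string>> {
  // Fire all providers in parallel; allSettled keeps partial results
  // even when one provider errors or times out.
  const settled = await Promise.allSettled(
    providers.map((p) => callProvider(p, prompt)),
  );
  const results: Record<string, string> = {};
  settled.forEach((r, i) => {
    results[providers[i]] =
      r.status === "fulfilled" ? r.value : `Error: ${r.reason}`;
  });
  return results;
}
```

`Promise.allSettled` (rather than `Promise.all`) is the important choice here: a side-by-side comparison with three answers and one error is still useful; an all-or-nothing failure is not.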
The Implementation Journey
Building the Foundation
First, we needed a solid data layer. I created a new tRPC router for personas with the usual suspects:
```typescript
// src/server/trpc/routers/personas.ts
export const personasRouter = createTRPCRouter({
  list: publicProcedure.query(async ({ ctx }) => {
    return ctx.db.persona.findMany();
  }),
  get: publicProcedure
    .input(z.object({ id: z.string() }))
    .query(async ({ ctx, input }) => {
      return ctx.db.persona.findUnique({ where: { id: input.id } });
    }),
});
```
The workflow engine needed updates to handle persona injection at runtime. The key insight was making the existing "Assemble the Expert Team" step persona-aware—when personas are provided, use those; otherwise, fall back to the AI's own expert selection.
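The use-personas-or-fall-back decision can be captured in a small function. This is a sketch under assumed names: `resolveExpertTeam` and `generateExpertTeam` are hypothetical, standing in for the real "Assemble the Expert Team" step:

```typescript
// Sketch of the persona-aware team assembly: user-selected personas
// win; otherwise the AI picks its own experts. Both callbacks are
// injected so the decision logic stays trivially testable.
async function resolveExpertTeam(
  personaIds: string[],
  loadPersonaSystemPrompts: (ids: string[]) => Promise<string>,
  generateExpertTeam: () => Promise<string>,
): Promise<string> {
  if (personaIds.length > 0) {
    // Respect the user's explicit choice of experts.
    return loadPersonaSystemPrompts(personaIds);
  }
  // No personas selected: fall back to AI-driven expert selection.
  return generateExpertTeam();
}
```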
Solving the Token Truncation Problem
During testing, we discovered outputs were getting cut off mid-sentence. The culprit? Conservative token limits that made sense for quick iterations but not for comprehensive analysis.
```typescript
// Bumped limits across deep processing templates
maxTokens: 16384, // previously 8192
```
Sometimes the simplest fixes have the biggest impact on user experience.
Lessons Learned: When Code Fights Back
Not everything went smoothly. Here are the challenges that taught me the most:
TypeScript vs. Dynamic Enums
The Problem: I initially tried using string[] for the compareProviders field, thinking TypeScript would be flexible about provider names.
The Reality: Zod's enum validation is strict. Very strict.
The Solution: Explicit literal union types everywhere:
```typescript
compareProviders?: ("anthropic" | "openai" | "google" | "ollama")[];
```
It's more verbose but eliminates an entire class of runtime errors. TypeScript's strictness pays dividends when you're dealing with external API integrations.
Template Literal Wrestling
The Problem: Updating large template literal strings with automated tools proved surprisingly difficult. The edit tool couldn't reliably match content that spanned multiple lines within backticks.
The Solution: Target unique substrings within the templates rather than trying to replace entire blocks. Sometimes the path of least resistance is the right path.
Development Server Mysteries
The Problem: Running multiple dev servers led to port conflicts and stale styling that persisted across restarts.
The Solution: A more aggressive cleanup routine:
```bash
# Kill everything on port 3000
lsof -ti:3000 | xargs kill -9

# Clear the Next.js cache
rm -rf .next

# Fresh start
npm run dev
```
This experience reminded me why having reliable development scripts matters. Inconsistent local environments kill productivity faster than complex bugs do.
The Results
After two solid commits to main, we had:
- Workflow personas that inject expert knowledge into every step
- Multi-provider A/B testing for direct model comparison
- Increased token limits that prevent truncated outputs
- Persona-aware team assembly that respects user-defined experts
- Enhanced UI with clear visual indicators for active personas and providers
What's Next
The immediate roadmap includes:
- Creating a robust `dev-start.sh` script to eliminate environment inconsistencies
- Comprehensive testing of persona injection across different workflow types
- Performance optimization for multi-provider steps (parallel execution is your friend)
- Expanding the persona library based on user feedback
Takeaways for Fellow Developers
- Embrace TypeScript's strictness—it catches integration issues before they become user problems
- Cache invalidation is still hard—have a nuclear option ready
- Template literals need special handling—plan your editing strategy accordingly
- Token limits matter—monitor your AI outputs for truncation
- Visual feedback is crucial—users need to see what's happening in complex workflows
Building AI-powered developer tools means constantly balancing flexibility with reliability. This session reinforced that the best solutions often come from embracing constraints rather than fighting them.
The codebase is healthier, the features work as intended, and I've got a few more debugging war stories to share. Not a bad way to spend a development session.
Have you built similar AI workflow systems? I'd love to hear about your approach to multi-provider testing and persona management. Find me on Twitter [@yourhandle] or drop a comment below.