Supercharging LLM Workflows: Integrating Expert Personas and Multi-Provider A/B Testing in nyxCore
Dive into the recent development sprint for nyxCore, where we implemented powerful workflow personas and multi-provider LLM comparison, transforming raw ideas into robust features for smarter AI interactions.
Building intelligent applications with Large Language Models (LLMs) often means navigating a complex landscape of prompt engineering, model selection, and iterative refinement. At nyxCore, our mission is to simplify this for developers and users alike, providing a flexible workflow engine that empowers sophisticated AI interactions.
Recently, we pushed a significant update to nyxCore that tackles two critical challenges in LLM-driven workflows: injecting specialized knowledge and enabling robust model comparison. This post chronicles the journey from concept to code, detailing the technical decisions, the implementation, and the inevitable "lessons learned" along the way.
The Challenge: Smarter Prompts, Better Models
Our goal for this sprint was clear:
- Workflow Personas: How can we allow users to inject specific "expert" knowledge or roles into their workflow steps? Imagine a legal expert, a creative writer, or a technical reviewer – each with their own unique system prompt that can be dynamically included in any workflow step. This moves beyond a single static system prompt to a dynamic, composable one.
- Multi-Provider A/B Comparison: With the rapid evolution of LLMs, choosing the "best" model for a specific task is a moving target. We needed a way for users to compare outputs from multiple providers (e.g., OpenAI, Anthropic, Google, Ollama) side-by-side for a given step, allowing them to visually assess and select the most suitable alternative. This is crucial for both robustness and cost-effectiveness.
After a focused development session, these features are now live on main (aa37799), and our dev server is purring on port 3000. Let's break down how we got there.
Building the Brains and the UI
Implementing these features touched almost every layer of the nyxCore stack, from the database schema to the user interface.
1. The Data Foundation: Prisma Schema & tRPC API
First, we needed to store our new configurations. For personas, we added personaIds to the Workflow model, allowing a workflow to reference multiple expert personas. For multi-provider comparison, compareProviders was added to WorkflowStep, an array to hold the identifiers of LLM providers to compare.
```prisma
// prisma/schema.prisma
model Workflow {
  // ... other fields
  personaIds    String[]       @db.Uuid
  WorkflowSteps WorkflowStep[]
}

model WorkflowStep {
  // ... other fields
  compareProviders String[] @default([]) // e.g., ["anthropic", "openai"]
}
```
With the schema updated, we extended our tRPC API:
- A new `personasRouter` was created (`src/server/trpc/routers/personas.ts`) to list and retrieve persona definitions.
- Our existing `workflows.ts` router was updated to handle `personaIds` during workflow creation and updates, and `compareProviders` for step creation and updates. We also bumped the `selectAlternative` limit to support up to four providers, reflecting the expanded comparison capabilities.
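To make the router's responsibilities concrete, here is a minimal sketch of the lookup logic a `personasRouter` might wrap. This is illustrative only: the `Persona` shape, the in-memory store, and the function names are assumptions standing in for the real Prisma-backed tRPC procedures.

```typescript
// Hypothetical sketch of the persona lookup logic behind personasRouter.
// In nyxCore the data comes from Prisma; here an in-memory array stands in.
interface Persona {
  id: string;
  name: string;
  systemPrompt: string;
}

const personaStore: Persona[] = [
  { id: "p1", name: "Legal Expert", systemPrompt: "You are a meticulous legal reviewer." },
  { id: "p2", name: "Creative Writer", systemPrompt: "You write vivid, engaging prose." },
];

// "list" procedure: return every persona definition.
function listPersonas(): Persona[] {
  return personaStore;
}

// "get" procedure: return a single persona by id, or undefined if missing.
function getPersona(id: string): Persona | undefined {
  return personaStore.find((p) => p.id === id);
}
```

In the real router these would be `query` procedures with Zod-validated inputs; the point is simply that personas are read-only reference data the workflow layer resolves by id.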
2. The Core Logic: The Workflow Engine
The heart of these features lies within src/server/services/workflow-engine.ts. This is where the magic of prompt composition and parallel execution happens.
- Persona System Prompt Injection:
  - A new `loadPersonaSystemPrompts()` function fetches the details of the selected personas.
  - It formats each one into a standardized string, like `## Expert: [Name]\n[SystemPrompt]`.
  - When `executeStep()` runs, these formatted persona prompts are prepended to the step's own system prompt, creating a powerful, combined instruction set for the LLM. This ensures the expert's perspective is established before the specific step instruction.
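The composition step can be sketched as a pure function. This is a simplified stand-in, assuming only the `## Expert: [Name]\n[SystemPrompt]` format described above; `buildSystemPrompt` and the `PersonaPrompt` shape are illustrative names, not the actual engine API.

```typescript
// Sketch: compose persona prompts ahead of the step's own system prompt.
interface PersonaPrompt {
  name: string;
  systemPrompt: string;
}

// Each persona becomes a "## Expert: ..." block; the step prompt comes last,
// so the expert context is established before the specific instruction.
function buildSystemPrompt(personas: PersonaPrompt[], stepPrompt: string): string {
  const personaBlocks = personas.map(
    (p) => `## Expert: ${p.name}\n${p.systemPrompt}`
  );
  return [...personaBlocks, stepPrompt].join("\n\n");
}
```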
- Multi-Provider Parallel Execution:
  - Inside `executeStep()`, we introduced a conditional fork: if `compareProviders` has more than one entry, we switch into a "one-per-provider" mode.
  - Instead of running a single LLM call, we now fire off multiple requests in parallel, one for each specified provider, each with its own `providerOverride` parameter.
  - The results are then collected and presented as distinct alternatives.
  - Crucially, the `estimateWorkflowCost()` function was updated to account for this, multiplying the estimated cost by the number of providers being compared. This ensures users have a clear understanding of potential expenses.
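The fan-out and the cost rule above can be sketched as follows. This is a minimal illustration, not the real `executeStep()`: `runComparison`, `callLLM`, and `Alternative` are stand-in names, and the real engine passes a full request per provider rather than a bare callback.

```typescript
// Sketch of the "one-per-provider" mode: one request per provider, in parallel.
type Provider = "anthropic" | "openai" | "google" | "ollama";

interface Alternative {
  provider: Provider;
  output: string;
}

// Promise.all preserves input order, so alternatives line up with providers.
async function runComparison(
  providers: Provider[],
  callLLM: (providerOverride: Provider) => Promise<string>
): Promise<Alternative[]> {
  return Promise.all(
    providers.map(async (provider) => ({
      provider,
      output: await callLLM(provider),
    }))
  );
}

// Estimated cost scales linearly with the number of compared providers
// (a single-provider step keeps its base cost).
function estimateStepCost(baseCost: number, compareProviders: Provider[]): number {
  return baseCost * Math.max(1, compareProviders.length);
}
```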
3. Bringing it to Life: The Frontend Experience
User-facing changes were essential to make these powerful features accessible.
- New Workflow Creation (`/dashboard/workflows/new`):
  - After the initial workflow setup, users are now presented with a "Persona picker", a multi-select checklist where they can assign one or more expert personas to their workflow.
  - Within the `SortableStepCard` component, a new "Compare Providers" multi-toggle lets users select which LLM providers they want to compare for that specific step. This toggle intelligently auto-syncs with the `generateCount` field: if you select 3 providers, `generateCount` automatically adjusts to 3 (or more, if you want temperature variations per provider!).
- Workflow Detail View (`/dashboard/workflows/[id]`):
  - The settings panel now proudly displays "Persona badges" with a `Users` icon, showing at a glance which experts are guiding the workflow.
  - On the alternative cards generated by a step, we now show distinct "Provider + Model" badges, making it clear which LLM generated which output.
  - Step headers also sport an "N providers" badge when comparisons are active, providing quick visual feedback.
These UI updates ensure that the underlying complexity of persona injection and multi-provider execution is presented intuitively to the user.
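The auto-sync behavior between the provider toggle and `generateCount` reduces to a one-line rule. This sketch assumes the semantics described above (the count is raised to cover every selected provider but never lowered below a larger user choice); the function name is illustrative.

```typescript
// Sketch of the auto-sync rule: generateCount covers at least one output per
// selected provider, but a larger user-chosen count (e.g., extra temperature
// variations per provider) is preserved.
function syncGenerateCount(generateCount: number, compareProviders: string[]): number {
  return Math.max(generateCount, compareProviders.length, 1);
}
```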
Lessons from the Trenches: The "Pain Log" Reframed
No development sprint is without its snags. These moments, while frustrating in real-time, often yield the most valuable lessons.
1. Type Safety vs. Runtime Reality: The Zod Enum Conundrum
- The Problem: We defined `compareProviders` in our `stepConfigSchema` using a Zod enum (e.g., `z.enum(["anthropic", "openai"])`). In the frontend, we initially tried to type `compareProviders` as `string[]`. TypeScript rightly threw an error: `string[]` is not assignable to `("anthropic" | "openai")[]`. While a `string[]` could hold valid enum values, it doesn't guarantee it, which is the whole point of the enum.
- The Solution: We explicitly typed `compareProviders` in our frontend `StepConfig`, `LocalStep`, and `StepTemplate` interfaces using a literal union type: `("anthropic" | "openai" | "google" | "ollama")[]`. This aligns the frontend types with the backend's Zod schema, ensuring compile-time safety.
- Lesson Learned: When dealing with strict enums from the backend (like Zod's), ensure your frontend types reflect that strictness. Don't loosen them to a generic `string[]`; embrace the explicit union for robust type checking.
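The fix can be sketched without pulling in Zod itself: derive the literal union from a `const` array, and add a runtime guard for narrowing untyped form data. `COMPARE_PROVIDERS`, `CompareProvider`, and `isCompareProvider` are illustrative names, not nyxCore's actual exports.

```typescript
// Sketch: a literal union mirroring the backend Zod enum.
const COMPARE_PROVIDERS = ["anthropic", "openai", "google", "ollama"] as const;
type CompareProvider = (typeof COMPARE_PROVIDERS)[number];

interface StepConfig {
  // Typing this as string[] would compile, but silently discard the
  // guarantee the backend's z.enum provides.
  compareProviders: CompareProvider[];
}

// Runtime guard: narrows arbitrary strings (e.g., from a form) to the union.
function isCompareProvider(value: string): value is CompareProvider {
  return (COMPARE_PROVIDERS as readonly string[]).includes(value);
}
```

Deriving the type from the array keeps a single source of truth: adding a fifth provider to `COMPARE_PROVIDERS` updates the type, the guard, and every interface that uses it.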
2. The Silent Killer: Missing Defaults
- The Problem: When creating new workflows, our default step configuration in the `workflows.ts` mutation initially omitted the new `compareProviders` field. TypeScript, being the diligent guardian it is, promptly flagged `TS2741: Property 'compareProviders' is missing in type...`.
- The Solution: A simple fix: adding `compareProviders: []` to our default step configurations.
- Lesson Learned: Always consider the default state of new fields, especially when adding them to existing data structures or creation processes. Even optional fields often benefit from an explicit empty-array or null default to prevent type errors.
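A minimal illustration of the pattern, with hypothetical names (`DEFAULT_STEP_CONFIG`, `createStep`) standing in for the real mutation code:

```typescript
// Sketch: every new field gets an explicit default, so freshly created
// steps always satisfy the type.
interface NewStepConfig {
  prompt: string;
  generateCount: number;
  compareProviders: string[];
}

// Omitting compareProviders here is exactly what triggered TS2741.
const DEFAULT_STEP_CONFIG: NewStepConfig = {
  prompt: "",
  generateCount: 1,
  compareProviders: [],
};

function createStep(overrides: Partial<NewStepConfig> = {}): NewStepConfig {
  return { ...DEFAULT_STEP_CONFIG, ...overrides };
}
```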
3. Dev Environment Headaches: The Port 3000 Dance
- The Problem: During a rapid iteration cycle, I tried running two dev servers simultaneously (an older branch and the new one) on what I thought were different ports. Turns out, one might have silently reverted or had a conflict, leading to weird style issues and unexpected behavior on port 3000.
- The Solution: The classic developer reset:
  - Run `lsof -ti:3000 | xargs kill` to forcefully terminate all processes on the offending port.
  - Clear the `.next` cache directory.
  - Perform a clean restart of the dev server.
- Lesson Learned: When things get weird in your local environment, don't hesitate to perform a full cleanup. A fresh start is often the quickest path to debugging phantom issues, especially with caching and port conflicts.
What's Next? Robustness Through Testing
With the features implemented, the immediate next steps involve thorough testing:
- Verifying styles after the cache clear.
- Seeding the database with test personas.
- Creating workflows with multiple personas and comparing providers, ensuring outputs are correct and alternatives are presented as expected.
- Confirming backward compatibility for `generateCount` without `compareProviders`.
- Testing workflow duplication to ensure all new configurations are preserved.
This ensures that our new features are not just functional, but also stable and intuitive for our users.
Conclusion
This sprint for nyxCore has significantly enhanced our workflow engine, moving us closer to truly intelligent and adaptable AI applications. By enabling dynamic persona injection, we empower users to guide LLMs with specialized knowledge. By integrating multi-provider A/B comparison, we provide the tools to make informed decisions about model performance and cost. These are crucial steps in building a robust, future-proof platform for AI development.
Happy coding!
Appendix: Sprint Summary

```json
{
  "thingsDone": [
    "Implemented Workflow.personaIds and WorkflowStep.compareProviders in Prisma schema",
    "Created tRPC router for personas (list, get)",
    "Updated workflows tRPC router for personaIds and compareProviders in create, update, steps.update, duplicate, and selectAlternative logic",
    "Enhanced workflow-engine.ts to load and inject persona system prompts into LLM calls",
    "Developed multi-provider fork logic in executeStep() for parallel LLM execution and result collection",
    "Updated cost estimation to account for multi-provider comparisons",
    "Integrated persona picker and compare providers toggle into new workflow creation UI",
    "Added persona badges, provider/model badges, and 'N providers' badges to workflow detail and step views",
    "Updated StepTemplate and related frontend types for compareProviders"
  ],
  "pains": [
    "TypeScript type incompatibility between Zod enum and generic string[] for compareProviders",
    "Missing default value for compareProviders in initial step creation mutation",
    "Development server port conflicts and stale cache issues requiring full process kill and cache clear"
  ],
  "successes": [
    "Successfully implemented dynamic persona injection into LLM system prompts",
    "Enabled robust multi-provider LLM output comparison side-by-side",
    "Ensured type safety across frontend and backend for new features",
    "Maintained backward compatibility for existing workflow configurations",
    "Created intuitive UI components for complex backend logic"
  ],
  "techStack": [
    "TypeScript",
    "Next.js",
    "React",
    "Prisma",
    "tRPC",
    "PostgreSQL",
    "LLMs (OpenAI, Anthropic, Google, Ollama)",
    "Zod"
  ]
}
```