Leveling Up Our AI Workflows: Personas, A/B Tests, and Token Bumps
We just wrapped a significant development session, shipping new features like AI workflow personas, multi-provider A/B comparisons, and crucial performance boosts, all while learning valuable lessons about TypeScript, tooling, and dev server management.
Just finished a pretty intense development sprint, and it's always good to pause, reflect, and document the journey. This session was all about making our AI workflow engine smarter, more robust, and more developer-friendly. We tackled everything from giving our AI agents distinct 'personalities' to enabling side-by-side model comparisons, and even wrestling with some token limit dragons.
Here's a breakdown of what landed, what we learned, and where we're headed.
Empowering AI Agents with Personas
One of the big goals was to allow users to define personas for their AI workflows. Imagine having an "Expert Coder" persona, a "Creative Writer" persona, or a "Security Auditor" persona that can be injected into the AI's system prompt. This allows our AI agents to adapt their tone, knowledge, and focus dynamically based on the task at hand.
How We Built It:
- Core Data Model: Added `personaIds: string[]` to our `Workflow` schema. This allows a workflow to reference multiple pre-defined personas.
- Prompt Injection Logic: Implemented `loadPersonaSystemPrompts()` in `workflow-engine.ts`. This function now fetches the system prompts associated with the selected personas and injects them into the initial "Assemble the Expert Team" steps (`deepPrompt`, `extensionPrompt`, `secPrompts`). This ensures the AI's initial setup is persona-aware.
- UI Integration: Added a persona picker to `workflows/new/page.tsx`, allowing users to select and assign personas when creating or editing a workflow. Persona badges also now appear on `dashboard/workflows/[id]/page.tsx` for quick visual context.
- API Layer: Created a new tRPC router, `src/server/trpc/routers/personas.ts`, with `list` and `get` endpoints to manage our persona definitions. This was then registered in `src/server/trpc/router.ts`. Our `workflows.ts` router was also updated to handle `personaIds` in create/update/duplicate operations.
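To picture the injection step, here's a minimal sketch of the idea behind `loadPersonaSystemPrompts()`. The `Persona` shape, the in-memory store, and the `buildSystemPrompt` helper are hypothetical stand-ins; the real implementation in `workflow-engine.ts` resolves personas from the database:

```typescript
// Sketch of persona prompt injection. The Persona shape and in-memory
// store are stand-ins for the real DB-backed lookup.
interface Persona {
  id: string;
  name: string;
  systemPrompt: string;
}

const personaStore = new Map<string, Persona>([
  ["expert-coder", {
    id: "expert-coder",
    name: "Expert Coder",
    systemPrompt: "You are a meticulous senior engineer.",
  }],
  ["security-auditor", {
    id: "security-auditor",
    name: "Security Auditor",
    systemPrompt: "You review every design for vulnerabilities.",
  }],
]);

// Resolve persona ids to their system prompts, skipping unknown ids.
function loadPersonaSystemPrompts(personaIds: string[]): string[] {
  return personaIds
    .map((id) => personaStore.get(id)?.systemPrompt)
    .filter((p): p is string => Boolean(p));
}

// Prepend persona prompts to a step's base system prompt.
function buildSystemPrompt(basePrompt: string, personaIds: string[]): string {
  return [...loadPersonaSystemPrompts(personaIds), basePrompt].join("\n\n");
}
```

Prepending (rather than appending) keeps the persona framing in front of the step's task instructions, so the model adopts the role before reading the task.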
The result? Our AI agents can now truly embody different roles, making our workflows far more flexible and powerful.
The Quest for AI Model Certainty: A/B Comparison
In the world of LLMs, choosing the right provider and model for a specific task is often a guessing game. We wanted to make that process data-driven. Enter multi-provider A/B comparison.
The Implementation:
- Step-Level Configuration: Introduced `compareProviders: string[]` (though with a specific type workaround, more on that in "Lessons Learned") on `WorkflowStep`. This flag tells our engine to run a particular step against multiple specified providers.
- Forking Logic: When `compareProviders` is active, the `workflow-engine` now forks the execution of that step, sending the same prompt to each selected provider.
- UI for Comparison: We updated the `SortableStepCard` to include a "Compare Providers" toggle. When activated, the alternatives block in `dashboard/workflows/[id]/page.tsx` now displays the outputs from different providers side-by-side, complete with provider+model badges. This makes evaluating and selecting the best output incredibly straightforward. We also bumped the `selectAlternative` max to 3, giving us more room for comparison.
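The forking logic boils down to fanning the same prompt out to each selected provider in parallel. A simplified sketch — the `ProviderCall` clients here are stubs standing in for the real vendor SDK wrappers:

```typescript
// Sketch of the compareProviders fork. The clients record is a stub;
// the real engine dispatches to the Anthropic/OpenAI/Google/Ollama SDKs.
type ProviderName = "anthropic" | "openai" | "google" | "ollama";

interface StepResult {
  provider: ProviderName;
  output: string;
}

type ProviderCall = (prompt: string) => Promise<string>;

const clients: Record<ProviderName, ProviderCall> = {
  anthropic: async (p) => `anthropic says: ${p}`,
  openai: async (p) => `openai says: ${p}`,
  google: async (p) => `google says: ${p}`,
  ollama: async (p) => `ollama says: ${p}`,
};

// Fan the same prompt out to every selected provider in parallel.
async function runStepComparison(
  prompt: string,
  compareProviders: ProviderName[],
): Promise<StepResult[]> {
  return Promise.all(
    compareProviders.map(async (provider) => ({
      provider,
      output: await clients[provider](prompt),
    })),
  );
}
```

`Promise.all` preserves the order of the selected providers, so each output lines up with its provider+model badge in the UI.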
This feature is a game-changer for evaluating model performance and ensuring the robustness of our AI pipelines across different LLM ecosystems.
Battling Truncation and Boosting Limits
During testing, we noticed some of our more complex AI steps were consistently truncating their outputs. The culprit? Hardcoded `maxTokens` limits.
The Fix:
Our `deepExtend`, `deepWisdom`, and `deepImprove` models in `src/lib/constants.ts` were capped at 8192 tokens. For involved tasks, especially when injecting verbose personas, this just wasn't enough. We bumped these limits to 16384.
```typescript
// src/lib/constants.ts (simplified)
export const models = {
  deepExtend: {
    // ... other properties
    maxTokens: 16384, // Previously 8192
  },
  deepWisdom: {
    // ... other properties
    maxTokens: 16384, // Previously 8192
  },
  deepImprove: {
    // ... other properties
    maxTokens: 16384, // Previously 8192
  },
  // ... other models
};
```
This simple but critical change ensures our AI agents can now produce comprehensive, untruncated responses, especially when dealing with richer context provided by personas.
Lessons from the Trenches: Our "Pain Log"
No dev session is complete without hitting a few snags. Here's what we learned along the way:
1. Type Safety vs. Flexibility with Zod Enums
- The Problem: We initially tried to define `compareProviders` as `string[]` in our frontend `StepConfig` type. However, our backend Zod schema for `WorkflowStep` used a literal union like `("anthropic" | "openai" | "google" | "ollama")[]` to ensure only valid providers could be specified. TypeScript rightly threw an error, complaining that `string[]` wasn't assignable to the more specific Zod enum union type.
- The Workaround: The solution was to explicitly use the literal union type across the frontend `StepConfig`, `LocalStep`, and `StepTemplate` definitions. This ensures type consistency end-to-end.

```typescript
// Before (failed attempt):
// compareProviders: string[];

// After (successful workaround):
type ProviderName = "anthropic" | "openai" | "google" | "ollama";
compareProviders: ProviderName[]; // Now type-safe with the Zod schema
```

- Lesson Learned: When dealing with Zod schemas and specific enum-like unions, ensure your frontend types mirror that specificity. Generic `string[]` might seem convenient but will cause type conflicts with stricter Zod validations.
2. The Fickle Nature of Template Literals and Edit Tools
- The Problem: We were trying to use an "Edit" tool to modify large, backtick-delimited template literal strings in our code. The tool struggled to match strings that spanned across the internal structure of these literals (e.g., trying to find a substring that was broken by an interpolated variable).
- The Workaround: Instead of trying to match long, complex strings, we switched to using smaller, unique substrings within the template literal. This allowed the edit tool to reliably find and replace the targeted content.
- Lesson Learned: When automating code edits, especially with template literals, identify unique, short, and non-interpolated anchor points for your search-and-replace operations. Don't rely on matching large, structurally complex strings.
3. Dev Server Hygiene is Crucial
- The Problem: Running two dev servers simultaneously led to port conflicts (naturally), but also some unexpected stale styles and inconsistent behavior that wasn't immediately obvious.
- The Workaround: A full teardown and clean restart became our go-to. This involved:
  - Killing all processes on port 3000 (`kill-port 3000` or similar).
  - Clearing the Next.js cache (`rm -rf .next`).
  - A clean restart.
- Lesson Learned: When things get weird in local development, especially with frontend frameworks, a clean slate is often the fastest path to resolution. Don't fight stale caches or lingering processes; just nuke 'em and restart. We're now creating a `scripts/dev-start.sh` to automate this cleanup and restart process.
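The teardown above is easy to script. A minimal sketch of what `scripts/dev-start.sh` could look like — the port, the `npx kill-port` invocation, and the `npm run dev` start command are assumptions about this project's setup:

```shell
#!/usr/bin/env bash
# scripts/dev-start.sh -- clean-slate dev server start (sketch).
# Assumes: npx on PATH, kill-port available, dev server on port 3000.
set -euo pipefail

PORT="${1:-3000}"

echo "Freeing port ${PORT}..."
npx kill-port "${PORT}" || true  # ignore failure if nothing is listening

echo "Clearing the Next.js cache..."
rm -rf .next

echo "Starting the dev server..."
npm run dev
```

The `|| true` guard matters: on a machine where nothing is listening on the port, `kill-port` exits nonzero, and without the guard `set -e` would abort the whole script.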
Active State & Next Steps
All the changes mentioned above have been committed to `main` (`aa3799` and `6623c29`). Our dev server is running clean, and the database schema has been pushed.
Our immediate next steps involve:
- Finalizing and testing the `scripts/dev-start.sh` script for consistent developer onboarding.
- Pushing these commits to origin.
- Thoroughly testing the new persona injection to verify expert team behavior.
- Validating multi-provider A/B comparison with different models.
- Re-running our deep build pipelines to confirm no more truncation issues with the bumped token limits.
- (Minor housekeeping) Fixing a pre-existing `Badge` variant type error in `discussions/[id]/page.tsx`.
It was a productive session, pushing our AI workflow capabilities significantly forward. Onwards!
```json
{
  "thingsDone": [
    "Implemented workflow persona injection (personaIds on Workflow, loadPersonaSystemPrompts, persona picker UI)",
    "Implemented multi-provider A/B comparison (compareProviders on WorkflowStep, provider fork, compare providers toggle UI)",
    "Created tRPC router for personas (list, get)",
    "Updated tRPC workflows router for personaIds and compareProviders",
    "Updated dashboard UI for persona badges, provider+model badges, 'N providers' badge",
    "Bumped maxTokens from 8192 to 16384 for deepExtend, deepWisdom, deepImprove models to fix truncation",
    "Made Step 0 expert team prompts persona-aware",
    "Updated system prompts for deepPrompt, extensionPrompt, secPrompts"
  ],
  "pains": [
    "TS error: string[] not assignable to Zod enum union type for compareProviders",
    "Edit tool failed to match strings crossing template literal boundaries",
    "Port conflict and stale styles when running two dev servers simultaneously"
  ],
  "successes": [
    "Successfully implemented complex features like persona injection and A/B comparison",
    "Resolved critical token truncation issue",
    "Improved developer experience by identifying and documenting common dev environment pitfalls",
    "Established a clear path for automating dev server startup"
  ],
  "techStack": [
    "Next.js",
    "TypeScript",
    "tRPC",
    "Prisma",
    "Zod",
    "LLMs (Anthropic, OpenAI, Google, Ollama)",
    "React"
  ]
}
```