The Art of Intelligent LLM Selection: A Deep Dive into Our Latest Feature
Explore how we implemented dynamic LLM provider and model selection, enabling tenant defaults, mid-discussion switching, and a seamless fallback experience for smarter AI conversations.
In the rapidly evolving landscape of AI, relying on a single Large Language Model (LLM) means bringing one tool to every job. Different models excel at different tasks, offer varying performance characteristics, and come with diverse cost implications. Forcing all conversations through a monolithic LLM is a compromise that limits potential and user agency.
That's why our latest feature focuses on empowering users and tenants with intelligent LLM provider and model selection. From setting tenant-wide defaults to allowing on-the-fly model switching during a discussion, and even offering graceful fallbacks in case of errors, we've built a robust system designed for flexibility and user control. This post details the journey, the technical decisions, and the lessons learned in bringing this dynamic capability to life.
The Core Problem: One Size Doesn't Fit All
Imagine a scenario where one LLM (like Anthropic's Claude) excels at creative writing, another (like OpenAI's GPT-4) at factual summarization, and a third (like Google's Gemini or a specialized Kimi model) offers unparalleled speed at a lower cost. Our users deserve the ability to choose the right tool for the job, and our platform needed the intelligence to guide them toward that choice.
Our goal was clear: implement a system for smart provider and model selection that includes:
- Tenant Defaults: Administrators should be able to set a default LLM for their entire tenant.
- Mid-Discussion Switching: Users should be able to change models during an ongoing conversation.
- Fallback UX: A graceful way to handle model failures and suggest alternatives.
- Model Catalog with Hints: A clear presentation of available models with relevant metadata (cost, speed, "best for").
Building Blocks: A Flexible Foundation
Our journey began at the foundation: the database schema and core data structures.
Database Schema Updates
We extended our `prisma/schema.prisma` to support this new flexibility:
- Tenant-level Defaults: We added `defaultProvider` and `defaultModel` fields to our `Tenant` model. This allows administrators to define the preferred LLM experience for their entire tenant, ensuring a consistent starting point for new discussions.
- Discussion Overrides: For individual discussions, we introduced a `model_override` field on the `Discussion` model. This ensures that specific conversations can deviate from the tenant default when needed, providing granular control.
After these schema changes, we ran `prisma db push` and `prisma generate` to apply the updates and regenerate our Prisma client, ensuring our backend services were ready to interact with the new data.
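In outline, the relevant portion of the schema looks roughly like this. The field names (`defaultProvider`, `defaultModel`, `model_override`) and model names match the post; the surrounding fields and attributes are illustrative, not the actual schema:

```prisma
model Tenant {
  id              String       @id @default(cuid())
  // Tenant-wide LLM defaults; null means "fall back to the catalog default"
  defaultProvider String?
  defaultModel    String?
  discussions     Discussion[]
}

model Discussion {
  id             String  @id @default(cuid())
  tenantId       String
  tenant         Tenant  @relation(fields: [tenantId], references: [id])
  // Per-discussion override; takes precedence over the tenant default
  model_override String?
}
```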
The Dynamic Model Catalog
To manage the growing array of LLMs, we created a static `MODEL_CATALOG` within `src/lib/constants.ts`. This catalog currently houses six models across major providers like Anthropic, OpenAI, Google, and Kimi, complete with crucial metadata like cost and speed hints. While static for now, this design offers a clear path for future expansion and ensures consistent model information across our application.
We also developed helper functions like `getModelsForProvider`, `getDefaultModel`, and `getModelInfo` to easily query and present this catalog data throughout the application.
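A minimal sketch of the catalog shape and these helpers follows. The helper names come from the post; the `ModelInfo` fields and the catalog entries themselves are illustrative stand-ins for the real six-model catalog:

```typescript
// Illustrative shape of the catalog in src/lib/constants.ts.
type ModelInfo = {
  id: string;
  provider: "anthropic" | "openai" | "google" | "kimi";
  costHint: "low" | "medium" | "high";
  speedHint: "fast" | "medium" | "slow";
  bestFor: string;       // surfaced as a UI hint, e.g. "creative writing"
  isDefault?: boolean;   // the provider's preferred starting model
};

const MODEL_CATALOG: ModelInfo[] = [
  { id: "claude-sonnet", provider: "anthropic", costHint: "medium", speedHint: "medium", bestFor: "creative writing", isDefault: true },
  { id: "gpt-4o", provider: "openai", costHint: "high", speedHint: "medium", bestFor: "factual summarization", isDefault: true },
];

function getModelsForProvider(provider: ModelInfo["provider"]): ModelInfo[] {
  return MODEL_CATALOG.filter((m) => m.provider === provider);
}

function getDefaultModel(provider: ModelInfo["provider"]): ModelInfo | undefined {
  // Prefer the flagged default, otherwise fall back to the first entry
  const models = getModelsForProvider(provider);
  return models.find((m) => m.isDefault) ?? models[0];
}

function getModelInfo(id: string): ModelInfo | undefined {
  return MODEL_CATALOG.find((m) => m.id === id);
}
```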
API Endpoints for Control
To orchestrate these choices, we built a suite of tRPC queries and mutations:
- `discussions.availableProviders`: A crucial query that dynamically checks which LLM providers have active API keys configured for the current tenant. This ensures users only see viable options, preventing frustration.
- `discussions.updateProvider` and `discussions.updateModel`: For real-time adjustments to an ongoing discussion's LLM choice.
- `discussions.create`: Extended to accept a `modelOverride` input, allowing new discussions to be initiated with a specific model from the outset.
- `admin.getDefaults` and `admin.updateDefaults`: For administrators, we introduced dedicated endpoints to manage tenant-wide LLM defaults. These endpoints include robust validation and audit logging. Importantly, while all authenticated users can read these defaults (`enforceTenant`), only administrators have the privilege to update them (`enforceRole`).
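The filtering behind `discussions.availableProviders` can be sketched as plain logic, separate from tRPC plumbing. In the real query the key lookup reads the tenant's stored credentials; here it is a simple map so the check itself is visible (types and names are illustrative):

```typescript
// Sketch of the availability check behind discussions.availableProviders.
type Provider = "anthropic" | "openai" | "google" | "kimi";

const ALL_PROVIDERS: Provider[] = ["anthropic", "openai", "google", "kimi"];

function availableProviders(apiKeys: Partial<Record<Provider, string>>): Provider[] {
  // A provider is viable only if the tenant has a non-empty API key for it;
  // everything else is hidden from the picker entirely.
  return ALL_PROVIDERS.filter((p) => Boolean(apiKeys[p]?.trim()));
}
```

Because the UI only ever sees this filtered list, users can never pick a provider that would fail immediately for lack of credentials.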
User Experience: Putting Choice in Their Hands
The true power of this feature lies in its user experience. We focused on seamless integration and intuitive controls.
The ProviderPicker Component
The centerpiece of our user interface is `src/components/discussion/provider-picker.tsx`. This reusable React component offers:
- Provider Grouping and Model Listing: Clear categorization of models by their providers.
- Cost/Speed Badges and Hints: Visual cues to help users make informed decisions (e.g., "Best for creative writing").
- Availability Filtering: Dynamically hides providers for which the tenant doesn't have an API key configured.
- Flexible Props: `defaultOpen` and `onClose` props for seamless integration into various parent components.
Integrating the Picker Across the Application
- New Discussion Page (`new/page.tsx`): When starting a new discussion, the `ProviderPicker` intelligently pre-selects the tenant's default provider and model, while clearly indicating provider availability and offering hints to guide model selection.
- Discussion Detail Page (`[id]/page.tsx`): Mid-conversation flexibility was a key requirement. Users can now click on provider labels within the StreamFlow UI to instantly bring up the `ProviderPicker`. This allows them to switch providers or models on the fly, adapting the conversation's intelligence as needed.
- Inline Error Retry UI: We also integrated the picker into our error handling. If an LLM call fails (e.g., due to an invalid API key), an inline retry UI appears, presenting the `ProviderPicker` so users can select an alternative provider/model, alongside a "Retry same" button for convenience.
- Admin Page: For tenant administrators, a new "LLM Defaults" tab was added to the admin page. Here, they can easily configure the default provider and model for all new discussions within their tenant using intuitive selection cards and a save button.
Under the Hood: Orchestrating LLM Calls
Ensuring that the chosen model was actually used downstream was critical. Our `discussion-service.ts` was updated to pass the `model_override` as `LLMCompletionOptions` to all provider stream/complete calls. This applies across all four discussion modes: `single`, `parallel`, `consensus`, and `autoRound`, guaranteeing that the user's or tenant's model preference is respected throughout the entire conversation flow.
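The precedence rule at the heart of this, discussion override first, tenant default second, hard fallback last, can be sketched as a small pure function. `LLMCompletionOptions` here is a simplified stand-in for the real type, and the fallback values are illustrative assumptions:

```typescript
// Sketch of how discussion-service.ts could resolve the effective options.
type LLMCompletionOptions = { provider: string; model: string };
type TenantDefaults = { defaultProvider?: string | null; defaultModel?: string | null };
type DiscussionRecord = { model_override?: string | null };

// Illustrative last-resort values when neither override nor default is set
const FALLBACK: LLMCompletionOptions = { provider: "anthropic", model: "claude-sonnet" };

function resolveCompletionOptions(
  discussion: DiscussionRecord,
  tenant: TenantDefaults
): LLMCompletionOptions {
  return {
    provider: tenant.defaultProvider ?? FALLBACK.provider,
    // Per-discussion override wins over the tenant default
    model: discussion.model_override ?? tenant.defaultModel ?? FALLBACK.model,
  };
}
```

Because every mode (`single`, `parallel`, `consensus`, `autoRound`) funnels through the same resolution before calling a provider, the preference cannot silently be dropped in one code path.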
Lessons Learned & Overcoming Hurdles
No development journey is without its bumps. One particular challenge arose during the integration of our `ProviderPicker` component:
- The Problem: Initially, we tried rendering the `ProviderPicker` conditionally inside a `showProviderPicker` state wrapper on the discussion detail page. The idea was to show it only when a user clicked to change the provider. However, this led to a frustrating double-click issue: the first click would render the component, and only the second click would open its internal dropdown. The component's internal click-outside detection mechanism was fighting with its parent's rendering logic.
- The Solution: We addressed this by enhancing the `ProviderPicker` with `defaultOpen` and `onClose` props. Now, when the parent component renders the `ProviderPicker` with `defaultOpen={true}`, the dropdown immediately opens, bypassing the need for an extra click. The `onClose` prop allows the parent to be notified when the picker's internal click-outside detection closes the dropdown, enabling clean state management. This experience reinforced the importance of designing reusable components with flexible state control for seamless integration, anticipating how they might be used in different contexts.
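The prop contract can be modeled without any React at all: the initial open state comes from `defaultOpen`, and click-outside both closes the dropdown and notifies the parent via `onClose`. This is a framework-free illustration of the design, not the actual component:

```typescript
// Model of the ProviderPicker open/close contract.
// defaultOpen seeds the initial state (so the parent's first click both
// mounts and opens the picker); onClose lets the parent clear its own state.
type PickerOptions = { defaultOpen?: boolean; onClose?: () => void };

function createPickerController({ defaultOpen = false, onClose }: PickerOptions) {
  let open = defaultOpen;
  return {
    isOpen: () => open,
    toggle: () => { open = !open; },
    // Simulates the component's internal click-outside detection
    clickOutside: () => {
      if (open) {
        open = false;
        onClose?.();
      }
    },
  };
}
```

With `defaultOpen: true`, a parent that mounts the picker on click gets an immediately open dropdown, which is exactly what eliminated the double-click bug.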
Looking Ahead: The Road Continues
With the core functionality now implemented and committed (commit 6744e4a), our immediate next steps involve rigorous testing and deployment:
- Push to origin: Get the code onto our main branch.
- End-to-end testing: Verify tenant defaults, confirm new discussions pre-select the correct model, test the error retry flow, and validate mid-discussion model switching.
- Future considerations: Expanding our `MODEL_CATALOG` to potentially include local LLMs served via Ollama (currently excluded since Ollama is local-only), and persisting the selected model in the `DiscussionMessage.model` field upon completion for even finer-grained historical context.
Conclusion
This feature marks a significant leap forward in making our platform more adaptable and user-centric. By providing intelligent LLM selection capabilities, we're not just offering more choices; we're empowering users to harness the specific strengths of various AI models, leading to richer, more effective, and ultimately, smarter conversations. We're excited to see how this flexibility enhances the user experience!