From Silent Failures to AI Roundtable: Overhauling Our LLM Discussion Experience
We tackled a critical OpenAI API bug, engineered a dynamic AI-to-AI consensus roundtable, and revamped our chat UI for a more intuitive and engaging user experience. Dive into the challenges and solutions of building sophisticated multi-agent interactions.
Building applications that leverage large language models (LLMs) is an exciting journey, often filled with unexpected twists, silent failures, and the rewarding challenge of orchestrating complex interactions. Our recent development sprint was a prime example, addressing everything from a deceptive API key error to crafting a dynamic AI-to-AI debate and refining our user interface for a smoother experience.
This post will walk you through our solutions for:
- Diagnosing and fixing a silent OpenAI API authentication failure.
- Engineering a multi-turn, AI-to-AI consensus roundtable discussion.
- Redesigning the chat layout for better readability and user control.
Let's dive into the details!
The Silent Killer: Tackling a Deceptive API Key Error
One of the most frustrating types of bugs is the "silent failure." Our application was designed to allow users to involve multiple LLM providers (like OpenAI and Anthropic) in a discussion, but OpenAI simply wasn't responding in parallel discussions. No error messages, no logs, just... silence.
The Challenge
After some deep debugging, we pinpointed the culprit: an invalid OpenAI API key. The API was returning a 401 invalid_api_key error. The real issue, however, wasn't just the invalid key itself (which the user could easily fix), but how our backend was handling the error.
Our streamParallelProviders function, designed to fetch responses concurrently, had a catch { return null } block. This seemingly innocuous line was swallowing all errors from the provider API calls, leading to a null response without any indication of what went wrong. The frontend simply saw no data from OpenAI and assumed it was still "thinking" or had finished without content.
The Fix
The solution involved a two-pronged approach to robust error handling:
- **Visible Error Logging:** We modified the `catch` blocks in both `streamParallelProviders` and `streamSingleProvider` within `src/server/services/discussion-service.ts` to log full error details using `console.error`. This ensured that even if the UI didn't show the error, we'd have a server-side record.
- **Propagating Errors to the Client:** Crucially, `streamParallelProviders` was updated to return a structured error object (`{ _error, provider, message }`) instead of `null`. This let us display a clear error message in the stream itself, like `--- PROVIDER (ERROR) --- <message>`, telling the user exactly what went wrong.
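The shape of that fix can be sketched roughly as follows. This is an illustrative example, not the project's actual code; the `ProviderResult` type and `streamProviderSafely` helper are hypothetical names:

```typescript
// Hypothetical sketch of the error-propagation fix. A provider call
// either succeeds with content or yields a structured error object
// instead of the old `catch { return null }` behavior.
type ProviderResult =
  | { provider: string; content: string }
  | { _error: true; provider: string; message: string };

async function streamProviderSafely(
  provider: string,
  call: () => Promise<string>,
): Promise<ProviderResult> {
  try {
    return { provider, content: await call() };
  } catch (err) {
    // Before the fix, this branch silently returned null and the 401
    // invalid_api_key error never surfaced anywhere.
    console.error(`[${provider}] request failed:`, err);
    const message = err instanceof Error ? err.message : String(err);
    return { _error: true, provider, message };
  }
}
```

Because the error result is a discriminated union rather than `null`, the caller can render `--- PROVIDER (ERROR) --- <message>` into the stream instead of showing nothing.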
Once the user updated their OpenAI API key via the Admin panel, the new key (verified with a sk-proj-Erev... token) worked perfectly, and discussions flowed smoothly.
Lesson Learned: Never silently swallow errors, especially when interacting with external APIs. Log them thoroughly and, where possible, propagate actionable messages to the user.
Orchestrating AI-to-AI Consensus: The Roundtable Discussion
Our initial "consensus" mode was a misnomer; it simply ran both providers in parallel, without any interaction between them. The real goal was to have LLMs engage in a discussion, responding to each other's points.
The Challenge & Vision
We wanted a true "roundtable" where LLMs take turns, each seeing the full conversation history, including the other's responses, before generating their own. This required a significant rewrite of how streamConsensus() operated.
The Engineering
We refactored streamConsensus() in src/server/services/discussion-service.ts to implement a sequential, turn-based system:
- **Sequential Turn-Taking:** Instead of delegating to `streamParallelProviders`, the new `streamConsensus()` iterates through providers, allowing each to respond in sequence.
- **Configurable Rounds:** A `CONSENSUS_ROUNDS` constant (currently `2`) dictates the depth of the discussion, meaning each user prompt now triggers 4 AI messages (2 providers × 2 rounds).
- **Contextual Prompts:** The system prompt is critical. It now explicitly identifies each provider by name and instructs it to engage with the other participants' points. For example, "You are Anthropic. Engage with OpenAI's points."
- **Conversation History Formatting:** To help the LLMs distinguish speakers in the conversation history, we prefixed messages with `[PROVIDER_NAME]: content`.
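The turn-taking loop described above can be sketched like this. The function and parameter names (`runConsensus`, `complete`) are assumptions for illustration; only the round count and the `[NAME]:` history format come from the actual design:

```typescript
// Illustrative sketch of the sequential consensus loop.
const CONSENSUS_ROUNDS = 2;

interface Turn {
  provider: string;
  content: string;
}

async function runConsensus(
  providers: string[],
  userPrompt: string,
  // `complete` stands in for the real provider streaming call.
  complete: (provider: string, history: string) => Promise<string>,
): Promise<Turn[]> {
  const turns: Turn[] = [];
  for (let round = 0; round < CONSENSUS_ROUNDS; round++) {
    for (const provider of providers) {
      // Each provider sees the full transcript so far, with every
      // speaker labeled so it can respond to the others' points.
      const history = [
        `[USER]: ${userPrompt}`,
        ...turns.map((t) => `[${t.provider.toUpperCase()}]: ${t.content}`),
      ].join("\n\n");
      turns.push({ provider, content: await complete(provider, history) });
    }
  }
  return turns; // 2 providers × 2 rounds = 4 turns per user prompt
}
```

With two providers and two rounds, each provider's second turn sees both first-round responses, which is what allows genuine back-and-forth rather than parallel monologues.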
A Quirky LLM Challenge
Initially, the LLMs, ever-eager to follow patterns, started echoing the `[ANTHROPIC]:` or `[OPENAI]:` prefix format in their own responses. This was an unwanted leak into the saved content.
The Fix: We added a specific instruction to the system prompt: "Do NOT prefix your own response with your name or brackets." This simple but crucial addition ensured that the [NAME]: prefix was used only for internal context, not for the LLM's generated output.
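A prompt builder capturing this might look like the sketch below. The exact wording and the `buildConsensusSystemPrompt` helper are hypothetical; only the "Do NOT prefix" instruction is taken from the fix described above:

```typescript
// Hypothetical system-prompt builder for a roundtable participant.
function buildConsensusSystemPrompt(self: string, others: string[]): string {
  return [
    `You are ${self}, in a roundtable discussion with ${others.join(", ")}.`,
    "Engage directly with the other participants' points.",
    "Messages in the history are labeled [NAME]: for context only.",
    "Do NOT prefix your own response with your name or brackets.",
  ].join("\n");
}
```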
The result is a dynamic, multi-turn discussion where LLMs genuinely build upon each other's arguments, moving closer to a true "consensus."
A Smoother Conversation: Redesigning the Chat UI
While the backend was busy orchestrating AI debates, the frontend needed an overhaul to match the enhanced interaction and provide a more intuitive user experience.
The Vision
Our goals for src/app/(dashboard)/dashboard/discussions/[id]/page.tsx were clarity, readability, and user control over the scrolling experience.
The Redesign
- **Dynamic Layout:** We implemented a clear visual distinction for speakers:
  - Provider A (first to respond): Left-aligned, `max-w-[45%]`.
  - Provider B (second to respond): Right-aligned, `max-w-[45%]`.
  - User messages: Centered, `max-w-[60%]`.

  This spatial separation makes it easy to follow who said what.
- **Intelligent Scrolling:** The old `useEffect` auto-scroll was often jarring. We replaced it with a more user-friendly system:
  - We track the scroll position using `scrollContainerRef` and an `isNearBottom()` check (100px threshold).
  - A floating pill button appears at the bottom center. When new messages arrive while the user is scrolled up, it displays "New messages"; otherwise it says "Scroll to bottom" and offers a quick way to jump to the latest content.
- **Streaming Indicators:** During live streaming, a small bubble appears, positioned left or right to match the active provider. It shows the provider's name and an animated pulse dot, clearly indicating which AI is currently "thinking."
- **Header Cues:** The discussion header now visually reinforces the layout with `← anthropic` and `openai →`, clearly mapping each provider to its side of the conversation.
- **Dynamic Provider Switching:** If the active streaming provider changes mid-stream (e.g., in a sequential consensus, moving from Anthropic to OpenAI), we trigger a `refetch()` to ensure the previous provider's content is flushed and saved to the database before the new provider's stream begins.
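The `isNearBottom()` check behind the intelligent scrolling can be sketched as a plain function over the scroll metrics. The 100px threshold matches the one described above; the simplified element shape is an assumption for illustration:

```typescript
// Minimal sketch of the scroll-position check. In the real component
// this would read from scrollContainerRef.current.
interface ScrollMetrics {
  scrollTop: number; // pixels scrolled from the top
  clientHeight: number; // visible height of the container
  scrollHeight: number; // total scrollable height
}

function isNearBottom(el: ScrollMetrics, threshold = 100): boolean {
  // Distance between the bottom of the viewport and the bottom of
  // the content; within the threshold counts as "at the bottom".
  return el.scrollHeight - (el.scrollTop + el.clientHeight) <= threshold;
}
```

When this returns `true`, new messages can safely auto-scroll into view; when `false`, the floating pill is shown instead of yanking the user down.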
These UI improvements significantly enhance the readability and interactive feel of the discussions, making the multi-AI experience much more engaging.
Lessons Learned & Future Steps
This session was a testament to the iterative nature of development. We learned crucial lessons:
- Robust Error Handling is Non-Negotiable: Silent failures are insidious. Always log errors, and provide clear feedback to users when things go wrong.
- Prompt Engineering is Key for Multi-Agent Systems: Carefully crafted system prompts are essential not just for guiding individual LLMs, but for orchestrating their interactions and preventing unwanted behaviors (like prefix echoing).
- UI/UX Matters Immensely: Even complex backend AI logic needs a thoughtful frontend. Small design choices can significantly impact user comprehension and satisfaction.
Looking ahead, our immediate next steps include thoroughly testing the continue flow in consensus mode, verifying parallel and single-provider discussions, and ensuring our production build compiles correctly. We're also considering making CONSENSUS_ROUNDS configurable per discussion, offering even more flexibility to our users.
It's been a challenging but incredibly rewarding sprint, pushing the boundaries of what's possible with multi-LLM interactions. Stay tuned for more updates as we continue to refine and expand our AI discussion platform!