nyxcore-systems

Debugging the Silence: Orchestrating AI Debates and a Smarter Chat UI

Join us as we recount a recent dev session where we wrestled with silent API errors, engineered an interactive AI-to-AI consensus roundtable, and gave our discussion chat a much-needed UI overhaul.

LLM · AI Agents · Fullstack Development · React.js · TypeScript · Prompt Engineering · Error Handling · UI/UX · SSE

Every developer knows the unique blend of frustration and triumph that comes with a deep debugging session. Recently, our team embarked on one such journey, tackling a trifecta of challenges: a mysteriously silent OpenAI provider, the ambitious goal of getting AIs to debate each other, and a critical UI redesign for our discussion chat. What started as a bug hunt quickly evolved into a full-stack renovation, and we're excited to share the insights and lessons learned.

The Case of the Silent AI: Unmasking the 401

Our primary headache was a seemingly unresponsive OpenAI provider within our discussion feature. Anthropic would chime in, but OpenAI remained stubbornly quiet, offering no explanation. The streamParallelProviders function, designed to fetch responses from multiple LLMs simultaneously, was supposed to be robust.

After some digging, the culprit emerged: a 401 invalid_api_key error. The real kicker? This error was being swallowed silently. Our catch block in streamParallelProviders was simply returning null:

typescript
// Before: A silent killer
async function streamParallelProviders(...) {
  try {
    // ... API call logic ...
  } catch (error) {
    // 🤫 No one will ever know...
    return null;
  }
}

This is a classic "lesson learned" moment: never swallow errors silently, especially in critical paths. It leads to hours of head-scratching and debugging.

Our fix involved two key steps:

  1. Logging the Error: A simple console.error(error) now ensures we see what's going on server-side.
  2. Propagating the Error to the Client: Instead of null, the catch block now returns a structured error object, allowing the client to display a user-friendly message like --- PROVIDER (ERROR) --- <message>. This immediate feedback is invaluable for diagnosing issues like an invalid API key.
typescript
// After: Errors speak up!
async function streamParallelProviders(provider, ...) {
  try {
    // ... API call logic ...
  } catch (error: any) {
    console.error(`Error streaming from ${provider}:`, error);
    return {
      _error: true,
      provider: provider.name,
      message: error.message || 'Unknown error'
    };
  }
}

Once the user updated their OpenAI API key via the Admin panel, everything sprang to life. Crisis averted, and a more robust error handling system was born.
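For completeness, here is a minimal sketch of how a client might turn that structured error object into the banner described above. The ProviderError interface and renderChunk helper are illustrative names, not the actual client code:

```typescript
// Shape of the structured error returned by streamParallelProviders' catch block.
interface ProviderError {
  _error: true;
  provider: string;
  message: string;
}

// Hypothetical helper: turn a stream result into display text.
// A successful chunk passes through unchanged; an error becomes the
// "--- PROVIDER (ERROR) --- <message>" banner shown in the chat.
function renderChunk(chunk: string | ProviderError): string {
  if (typeof chunk !== "string" && chunk._error) {
    return `--- ${chunk.provider.toUpperCase()} (ERROR) --- ${chunk.message}`;
  }
  return chunk as string;
}
```

Surfacing the error inline like this is what turned "OpenAI is silent" into "OpenAI has an invalid API key" in seconds rather than hours.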

Orchestrating a Digital Debate: The AI Roundtable

With our providers finally speaking, the next challenge was to get them to speak to each other. Our "consensus" mode initially just ran streamParallelProviders, meaning both AIs responded independently to the user, without acknowledging each other's presence. The goal was a true AI-to-AI roundtable.

We completely rewrote streamConsensus() in src/server/services/discussion-service.ts to implement a sequential, turn-based discussion:

  1. Sequential Turns: Instead of parallel calls, providers now take turns.
  2. Shared Context: Each provider's turn receives the full conversation history, including the previous AI's responses.
  3. Round Depth: A CONSENSUS_ROUNDS = 2 constant (currently hardcoded, but we're considering making it configurable) means each AI gets two turns, resulting in four AI messages per user prompt.

The magic, however, lay in prompt engineering. To facilitate a genuine debate, we crafted system prompts that:

  • Clearly identified each AI by name (e.g., "You are Anthropic.")
  • Instructed them to engage with and build upon the other's points.
  • Crucially, prefixed each message in the conversation history sent to the LLM with the speaker's name (e.g., [ANTHROPIC]: This is Anthropic's point.) so each model could tell who said what.
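Serializing the history with those prefixes is a one-liner; formatHistory below is a hypothetical helper showing the shape of what the LLM actually receives:

```typescript
interface Turn {
  speaker: string;
  content: string;
}

// Prefix each turn with its speaker so the LLM can distinguish voices
// in the shared history.
function formatHistory(turns: Turn[]): string {
  return turns
    .map((t) => `[${t.speaker.toUpperCase()}]: ${t.content}`)
    .join("\n");
}
```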

The Prefix Pitfall and the Prompt Fix

This [NAME]: prefix in the history led to an interesting, and initially frustrating, bug: the LLMs started echoing the prefix in their own responses, like [OPENAI]: I agree with Anthropic... This meant our saved content was polluted with these internal markers.

Lesson Learned: LLMs are powerful pattern matchers. If you show them a pattern in the conversation history, they might mimic it in their output.

The fix was elegant: we added an explicit instruction to the system prompt: "Do NOT prefix your own response with your name or brackets." This simple addition immediately resolved the issue. The [NAME]: prefix remains in the internal history passed to the LLM but is never leaked into the final output.
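As a sketch, the resulting system prompt might be assembled like this; buildSystemPrompt is an illustrative helper, and the exact wording in discussion-service.ts may differ:

```typescript
function buildSystemPrompt(name: string, otherName: string): string {
  return [
    `You are ${name}, in a roundtable discussion with ${otherName}.`,
    `Engage with and build upon ${otherName}'s points.`,
    // The one-line addition that stopped the prefix echoing:
    `Do NOT prefix your own response with your name or brackets.`,
  ].join("\n");
}
```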

A UI Refresh for Smarter Conversations

Backend logic is crucial, but the user experience makes or breaks a feature. Our discussion chat layout needed a significant overhaul to accommodate the multi-agent interactions and provide a smoother experience.

We redesigned src/app/(dashboard)/dashboard/discussions/[id]/page.tsx with several key improvements:

  1. Dynamic Layout:

    • Provider A (first to respond): Left-aligned, max-w-[45%]
    • Provider B (second to respond): Right-aligned, max-w-[45%]
    • User Messages: Centered, max-w-[60%]
    • The header reinforces this layout: ← anthropic and openai →.

    This visual separation makes it incredibly easy to follow who said what in the debate.
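The alignment rules above reduce to a small class-selection helper. This is a sketch with assumed names (bubbleClasses, providerOrder); the real page component is more involved:

```typescript
type Role = "user" | "assistant";

// providerOrder is assumed to list providers in order of first response,
// so providerOrder[0] is "Provider A" from the layout rules above.
function bubbleClasses(
  role: Role,
  provider: string | null,
  providerOrder: string[]
): string {
  if (role === "user") return "mx-auto max-w-[60%]"; // centered
  return provider === providerOrder[0]
    ? "mr-auto max-w-[45%]"  // Provider A: left-aligned
    : "ml-auto max-w-[45%]"; // Provider B: right-aligned
}
```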
  2. Smart Scrolling & "New Messages" Pill:

    • We replaced the aggressive useEffect auto-scroll with a more user-friendly approach.
    • A scrollContainerRef now tracks the user's scroll position.
    • A floating pill button at the bottom center appears when needed:
      • "New messages" (with a down arrow) if new content arrives while the user is scrolled up (beyond a 100px threshold from the bottom).
      • "Scroll to bottom" if the user has scrolled up and simply wants to jump back down.
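The pill logic reduces to a distance-from-bottom check against the 100px threshold. isNearBottom and pillLabel are illustrative helpers, sketching one plausible reading of the two labels described above:

```typescript
// Distance-from-bottom check; inputs mirror the scroll container's DOM
// properties (scrollTop, clientHeight, scrollHeight).
const NEAR_BOTTOM_PX = 100;

function isNearBottom(
  scrollTop: number,
  clientHeight: number,
  scrollHeight: number
): boolean {
  return scrollHeight - (scrollTop + clientHeight) <= NEAR_BOTTOM_PX;
}

// Which label the floating pill should show, if any.
function pillLabel(nearBottom: boolean, hasNewMessages: boolean): string | null {
  if (nearBottom) return null; // at the bottom: no pill needed
  return hasNewMessages ? "New messages" : "Scroll to bottom";
}
```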
  3. Real-time Streaming Bubbles:

    • To indicate which AI is currently "thinking," a streaming bubble (provider name + animated pulse dot) now appears, correctly positioned left or right to match the active provider.
  4. Mid-Stream Provider Switching:

    • When the streamProvider changes mid-stream (e.g., Anthropic finishes, and OpenAI begins its turn), we trigger a refetch() to flush the previous provider's content to the saved messages, ensuring a clean slate for the new provider's stream.
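The switch detection itself is a simple comparison; the sketch below uses assumed names, with the actual refetch() wiring living in a useEffect on the page:

```typescript
// Returns true when one provider's stream has ended and another has begun.
function providerSwitched(prev: string | null, next: string | null): boolean {
  return prev !== null && next !== null && prev !== next;
}

// Illustrative wiring inside the component:
// useEffect(() => {
//   if (providerSwitched(prevProvider.current, streamProvider)) {
//     refetch(); // flush the finished provider's content to saved messages
//   }
//   prevProvider.current = streamProvider;
// }, [streamProvider]);
```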

This combination of backend orchestration and frontend polish has transformed the discussion feature into a truly engaging and intuitive experience.

What's Next?

This session laid down some critical infrastructure. Our immediate next steps involve rigorous testing of all discussion modes (consensus, parallel, single provider), ensuring the "continue" flow works seamlessly, and tackling some minor but persistent issues like missing icons and placeholder error pages. We're also considering making CONSENSUS_ROUNDS configurable, which would open up even more dynamic discussion possibilities.

Building complex AI-driven applications is a journey of continuous learning. From debugging silent errors to fine-tuning prompt engineering for multi-agent interactions and refining the user interface, every step presents its own set of challenges and rewarding breakthroughs. We're excited to see where these digital debates lead!
