From Gate Bug to Green Lights: Shipping Dual-Provider AI Workflows with CI/CD
Join me on a recent dev session where we tackled a subtle gate bug in our dual-provider AI workflow, deployed a robust CI/CD pipeline, and verified multi-model LLM inference end-to-end on production.
Building robust AI-driven applications often means navigating complex logic, integrating external services, and ensuring reliable deployment. Recently, I dedicated a session to address a critical bug in our dual-provider AI workflow and, in the same breath, establish an automated deployment pipeline. It was a journey from a subtle logical error to a fully verified, production-ready system, complete with a few detours through the thorny landscape of server configurations and platform limitations.
Let's dive into how we transformed a development session into a significant leap forward for our system.
The Elusive Dual-Provider Gate Bug
Our system is designed to leverage multiple Large Language Models (LLMs) from different providers, comparing their outputs and selecting the best one based on predefined criteria. This "dual-provider" capability is a core feature, offering resilience, cost optimization, and quality assurance. However, despite the logic being in place, we noticed the dual-provider path wasn't always being hit.
The culprit? A subtle logical gate within our workflow engine.
// src/server/services/workflow-engine.ts:1292 (simplified)
// Original condition (bugged):
// if (step.generateCount && step.generateCount > 1) {
//   // ... dual-provider logic ...
// } else {
//   // ... single-provider logic ...
// }
// The fix:
if (
  (step.generateCount && step.generateCount > 1) ||
  (step.compareProviders && step.compareProviders.length > 1)
) {
  // Dual-provider logic: engage multiple models for comparison
  // ...
} else {
  // Single-provider logic: use the primary model
  // ...
}
The original condition, step.generateCount && step.generateCount > 1, looked only at generateCount. If generateCount was 1 (often the default for a single generation request) or unset, the dual-provider logic was never reached, even when compareProviders was explicitly set.
The fix was straightforward: include step.compareProviders.length > 1 in the alternatives block condition. This ensures that if we've specified multiple providers for comparison, the system correctly engages the dual-provider workflow, regardless of the generateCount.
Commit: 1cfbad0 - "Fix dual-provider gate: include compareProviders in alternatives condition"
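To make the corrected gate easier to reason about, here is a minimal sketch of it as a standalone predicate. The `StepLike` interface and the `needsAlternatives` name are hypothetical and exist only for illustration; the real step type in workflow-engine.ts is not shown in this post.

```typescript
// Hypothetical minimal shape of a workflow step (assumption: both fields
// are optional, as the truthiness checks in the original gate suggest).
interface StepLike {
  generateCount?: number;
  compareProviders?: string[];
}

// True when the step should take the alternatives (dual-provider) branch.
function needsAlternatives(step: StepLike): boolean {
  const wantsManyGenerations = (step.generateCount ?? 0) > 1;
  const wantsComparison = (step.compareProviders?.length ?? 0) > 1;
  return wantsManyGenerations || wantsComparison;
}

// The case the original gate missed: one generation, two providers.
console.log(
  needsAlternatives({ generateCount: 1, compareProviders: ["anthropic", "google"] }),
); // true
console.log(needsAlternatives({ generateCount: 1 })); // false
```

Pulling the condition into a named function like this would also make it straightforward to unit-test the exact combination that regressed.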
Beyond the Bug: Building an Automated Deployment Pipeline
With the core logic corrected, the next logical step was to ensure rapid, reliable deployment. Manual SSH and Docker commands are fine for ad-hoc fixes, but for a production system, CI/CD is non-negotiable.
We set up a new GitHub Actions workflow: .github/workflows/deploy.yml. This pipeline is designed to:
- Trigger: After CI passes on the main branch.
- SSH to Hetzner: Connects to our production server.
- Pull Latest Code: Fetches the latest changes from the repository.
- Build Docker Image: Rebuilds our application's Docker image.
- Recreate App: Tears down and brings up the application containers using docker compose.
- Health Check: Verifies that both HTTP and HTTPS endpoints are responding correctly.
To secure the deployment, we generated an ed25519 deploy key on the production server (~/.ssh/deploy_key) and added its public counterpart to ~/.ssh/authorized_keys. The private key, the deployment host, and the deployment user are stored as GitHub secrets: DEPLOY_SSH_KEY, DEPLOY_HOST, and DEPLOY_USER.
Commit: 80f18cd - "Add CI/CD deploy workflow for automated production deployment"
This automation dramatically reduces the friction of pushing updates and provides a clear, repeatable deployment process.
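The health-check step is the part of the pipeline that decides whether a deploy counts as successful, so it is worth sketching. The workflow's actual check is not shown in this post (it may well be a plain curl call); the version below is a TypeScript illustration, and `Probe`, `checkEndpoints`, and `allHealthy` are hypothetical names.

```typescript
// Minimal fetch-like signature so the check can be exercised without a network.
type Probe = (url: string) => Promise<{ ok: boolean; status: number }>;

interface HealthResult {
  url: string;
  healthy: boolean;
  detail: string;
}

// Probes each URL once and reports per-endpoint health. A thrown error
// (DNS failure, refused connection, TLS problem) counts as unhealthy.
async function checkEndpoints(urls: string[], probe: Probe): Promise<HealthResult[]> {
  return Promise.all(
    urls.map(async (url) => {
      try {
        const res = await probe(url);
        return { url, healthy: res.ok, detail: `HTTP ${res.status}` };
      } catch (err) {
        return { url, healthy: false, detail: String(err) };
      }
    }),
  );
}

// The deploy should only be considered green when every endpoint passed.
function allHealthy(results: HealthResult[]): boolean {
  return results.every((r) => r.healthy);
}
```

On Node 18 or later the global `fetch` can serve as the probe, for example `checkEndpoints([httpUrl, httpsUrl], (u) => fetch(u))`. Note that `fetch` follows redirects by default, so an HTTP endpoint that redirects to HTTPS would report the status of the final response.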
Putting It to the Test: Production Verification
The ultimate proof is in the pudding. After deploying the fix and the new CI/CD pipeline, it was time to test the dual-provider feature end-to-end on production.
I created a specific workflow, "Dual-Provider Test v2 (Gate Fix)," and ran it. The results were exactly what we hoped for:
- Anthropic (claude-sonnet-4): 1,423 tokens, $0.0124
- Google (gemini-2.5-flash): 1,241 tokens, $0.0004
- Cael review (claude-sonnet-4): 2,332 tokens, $0.0088
- Total: 4,996 tokens, $0.0217, 23.5s
Crucially, our internal "Cael" review (an automated evaluation step) correctly selected Anthropic as the preferred provider. The reason? Anthropic's output offered "more concrete technical details, SQL examples, cost tables" – exactly the kind of nuanced quality we aim to capture with dual-provider comparisons.
The workflow checkpoint also correctly contained caelReview and selectedProvider fields, confirming the entire process worked as expected. Green lights all around!
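As a quick sanity check on the figures above, the per-step numbers can be summed directly. This is a small sketch using only the values reported in this post; `StepUsage` and `summarize` are hypothetical names. The rounded per-step costs add up to $0.0216, one ten-thousandth of a dollar below the reported total of $0.0217, which would be consistent with the total having been computed from unrounded costs (an assumption, since the raw values are not shown).

```typescript
interface StepUsage {
  label: string;
  tokens: number;
  costUsd: number;
}

// Figures copied from the run reported above.
const usage: StepUsage[] = [
  { label: "Anthropic (claude-sonnet-4)", tokens: 1423, costUsd: 0.0124 },
  { label: "Google (gemini-2.5-flash)", tokens: 1241, costUsd: 0.0004 },
  { label: "Cael review (claude-sonnet-4)", tokens: 2332, costUsd: 0.0088 },
];

function summarize(rows: StepUsage[]): { tokens: number; costUsd: number } {
  const tokens = rows.reduce((sum, r) => sum + r.tokens, 0);
  const cost = rows.reduce((sum, r) => sum + r.costUsd, 0);
  // Round to four decimals to match the precision shown in the post.
  return { tokens, costUsd: Math.round(cost * 10_000) / 10_000 };
}

console.log(summarize(usage)); // { tokens: 4996, costUsd: 0.0216 }
```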
Lessons from the Trenches: The "Pain Log" Transformed
No development session is without its challenges. Here's a look at some of the snags we hit and the valuable lessons learned:
1. The Perils of Production-Specific Compose Files
- Attempt: Ran docker compose build app on production.
- Failure: no such service: app.
- Lesson Learned: Production environments often use specialized Docker Compose files (e.g., docker-compose.production.yml) to manage services differently (e.g., using Nginx, Certbot, different resource limits). Always specify the correct compose file with -f when interacting with production Docker setups.
- Takeaway: Always use docker compose -f docker-compose.production.yml on production.
2. Taming the SSH Beast: Rate Limits and Workflow Efficiency
- Attempt: Multiple rapid SSH connections from GitHub Actions to Hetzner.
- Failure: Connection drops after 2-3 rapid connections (SSH rate limiting on the server).
- Lesson Learned: Servers often have security measures to blunt brute-force attacks, such as MaxStartups in sshd_config, which caps the number of concurrent unauthenticated connections. Rapid, successive SSH calls from an automated system can trigger these.
- Takeaway: Combine multiple commands into a single SSH call, or add sleep commands between distinct SSH sessions to avoid hitting these limits.
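The "single SSH call" takeaway can be sketched as follows. Everything here is illustrative: `DeployTarget`, `buildRemoteScript`, and `buildSshArgv` are hypothetical names, the application directory is a placeholder, and the exact remote commands in the real deploy.yml are not shown in this post. Only the compose file name and the `app` service come from the text above.

```typescript
// Hypothetical deploy parameters. Values are assumed to be trusted and free
// of spaces or shell metacharacters; no quoting is applied in this sketch.
interface DeployTarget {
  user: string;
  host: string;
  appDir: string;
  composeFile: string;
}

// Joins the remote steps with `&&` so they run in one SSH session and stop
// at the first failing command.
function buildRemoteScript(t: DeployTarget): string {
  const compose = `docker compose -f ${t.composeFile}`;
  return [
    `cd ${t.appDir}`,
    "git pull",
    `${compose} build app`,
    `${compose} up -d --force-recreate app`,
  ].join(" && ");
}

// Arguments for a single `ssh` invocation: one connection instead of four.
function buildSshArgv(t: DeployTarget): string[] {
  return [`${t.user}@${t.host}`, buildRemoteScript(t)];
}
```

The resulting array could be handed to something like Node's `child_process.spawn("ssh", buildSshArgv(target))`. Because the whole sequence travels over one connection, it also sidesteps the need for sleep calls between steps.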
3. Navigating Next.js Standalone Builds: Auth Tokens and External Systems
- Attempt: Tried generating JWT tokens for SSE authentication using jose or next-auth/jwt inside a standalone Docker container.
- Failure: Neither library could be loaded there. The standalone output is a minimal build that ships only the files Next.js traces as needed at runtime, so packages that are available during development are not necessarily resolvable from an ad-hoc script inside the container.
- Lesson Learned: When dealing with Next.js's standalone output, dependencies for one-off tasks (like JWT generation) might need to be handled outside the application container or in a separate utility. For Auth.js, generating the JWE locally with an HKDF-derived key is the robust approach.
- Takeaway: For future programmatic triggering, generate the JWE locally, deriving its encryption key with HKDF: hkdf("sha256", AUTH_SECRET, salt="__Secure-authjs.session-token", info="Auth.js Generated Encryption Key (__Secure-authjs.session-token)", length=64). The AUTH_SECRET and the cookie name (__Secure-authjs.session-token) are both inputs to the derivation, so both must match production exactly.
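The key derivation quoted above maps directly onto Node's built-in `crypto.hkdfSync`, so it needs no third-party packages. The sketch below covers only the derivation step; `deriveSessionKey` is a hypothetical helper name and the secret is a placeholder. Encrypting the session payload into a JWE with the derived key is deliberately left out. A 64-byte key is the size the A256CBC-HS512 content-encryption algorithm takes; whether the deployed Auth.js version uses that algorithm is an assumption to verify.

```typescript
import { hkdfSync } from "node:crypto";

// Derives the 64-byte encryption key using the parameters quoted above:
// SHA-256, the secret as input keying material, the cookie name as salt,
// and an info string that embeds the same cookie name.
function deriveSessionKey(authSecret: string, cookieName: string): Uint8Array {
  const info = `Auth.js Generated Encryption Key (${cookieName})`;
  return new Uint8Array(hkdfSync("sha256", authSecret, cookieName, info, 64));
}

// Placeholder secret for illustration only; use the real AUTH_SECRET in practice.
const key = deriveSessionKey("example-secret-not-real", "__Secure-authjs.session-token");
console.log(key.length); // 64
```

Since the cookie name feeds both the salt and the info string, a key derived for a differently named cookie (for example, one without the `__Secure-` prefix) will not decrypt production sessions.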
4. GitHub Environments on a Budget: Free Tier Limitations
- Attempt: Used gh api to create a GitHub environment named production with wait_timer: 0 for deployment protection.
- Failure: GitHub's free plan doesn't support environment protection rules (like wait timers or required reviewers), at least for private repositories.
- Lesson Learned: Enterprise features often come with enterprise pricing. It's important to understand the limitations of your current plan when designing CI/CD workflows, especially concerning advanced security or approval steps.
- Takeaway: For now, repository-level secrets are sufficient. If environment protection rules become a critical need, we will have to upgrade to a paid GitHub plan.
What's Next?
With the dual-provider gate fixed and CI/CD humming, our focus shifts to enhancing the user experience and further system hardening:
- Certbot Auto-Renewal: Set up a cron job for automated SSL certificate renewal on production.
- SSHD Configuration: Adjust MaxStartups in sshd_config to prevent future SSH connection drops.
- UI Toggle: Add a simple checkbox to the workflow builder UI to easily enable/disable dualProviderAutoSelect.
- NerdStats: Complete the per-provider cost table UI (src/components/shared/nerd-stats.tsx) to visualize the cost breakdown.
- Expand Dual-Provider: Test this feature in other critical pipelines (auto-fix, refactor, docs, code-analysis) to ensure broad compatibility.
This session was a prime example of how tackling a specific bug can cascade into significant infrastructure improvements. By addressing the dual-provider gate and simultaneously building out our CI/CD, we've made our system more reliable, efficient, and ready for future growth.
Happy coding!
{
"thingsDone": [
"Fixed dual-provider gate bug in workflow-engine.ts",
"Created and deployed CI/CD pipeline for automated production deployment",
"Configured GitHub secrets for deployment credentials",
"Successfully tested and verified dual-provider functionality on production",
"Documented pain points and their solutions/lessons learned"
],
"pains": [
"Incorrect Docker Compose file usage on production",
"SSH rate limiting causing connection drops during rapid deployments",
"Inability to generate JWT/JWE tokens within Next.js standalone Docker container for SSE auth",
"GitHub Free plan limitations on environment protection rules"
],
"successes": [
"Dual-provider feature now fully functional and verified",
"Automated production deployments via GitHub Actions",
"Clear insights into LLM performance and cost comparison",
"Improved understanding of production environment specificities and GitHub plan features"
],
"techStack": [
"TypeScript",
"Next.js",
"Docker",
"Docker Compose",
"GitHub Actions",
"SSH",
"Hetzner",
"Anthropic (Claude Sonnet)",
"Google (Gemini Flash)",
"NextAuth.js"
]
}