nyxcore-systems

Adversarial AI for Integrations: Building Our New 10-Step Workflow

Dive into the journey of building our new 10-step 'Integration Analysis' workflow, featuring adversarial AI challenges and hardened synthesis, and learn the critical lessons we uncovered while wrestling with our workflow engine.

AI · Workflow Engine · TypeScript · System Design · Integration Analysis · LLM · Adversarial AI · Developer Experience

Modern software systems are rarely monoliths. They're intricate tapestries woven from countless services, APIs, and micro-applications, often spanning multiple repositories and even different organizations. Understanding how these pieces truly integrate, identifying hidden dependencies, and uncovering potential vulnerabilities is a non-trivial, often manual, and error-prone task.

That's why we embarked on a mission to automate and enhance this process with AI. Our latest achievement: the "Integration Analysis" workflow template. This isn't just another checklist; it's a sophisticated, 10-step pipeline designed to unearth the deepest truths about cross-repo integrations, culminating in an adversarial challenge (Ipcha Mistabra) and a hardened synthesis (Cael).

The Challenge: Navigating the Integration Maze

Imagine trying to map the full surface area of an integration between two complex services. It's not just about API endpoints; it's about data flow, trust boundaries, security implications, potential side effects, and even organizational wisdom accumulated over years. Doing this manually is slow, inconsistent, and often misses critical nuances.

Our goal was to create a structured, AI-assisted approach that could:

  1. Discover: Automatically map integration points and data flows.
  2. Analyze: Perform security and trust boundary assessments.
  3. Strategize: Propose robust integration strategies.
  4. Adversarially Test: Challenge the proposed integration with a "red team" LLM agent (Ipcha Mistabra).
  5. Synthesize: Produce a hardened, resilient integration report (Cael).

Designing the "Integration Analysis" Workflow

We designed a comprehensive 10-step pipeline, each step building on the last, leveraging our internal workflow engine's capabilities for multi-provider LLM comparisons and human review gates.

Here's a look at the core steps we implemented in src/lib/constants.ts:

  1. intRecon — Dual Structural Reconnaissance: A broad initial sweep to understand both projects' contexts (think Google-scale context windows).
  2. intSurfaceMap — Integration Surface Discovery: Identifies explicit and implicit integration points, comparing outputs from Anthropic, OpenAI, and Google LLMs for a comprehensive view. This step is crucial: its structured output (numbered ### N. headings) feeds directly into subsequent fan-out steps, as sketched just after this list.
  3. intSecurityAnalysis — Security & Trust Boundary Analysis: A deep dive into potential vulnerabilities and trust assumptions, again leveraging multiple LLM providers.
  4. intReview1 — Integration Review Gate: A human pause to review the initial findings before proceeding.
  5. intStrategies — Integration Strategy Development: Based on the intSurfaceMap, this step fans out to propose multiple integration strategies.
  6. intWisdom — Cross-Project Wisdom Protocol: Incorporates historical project wisdom and memory to refine strategies.
  7. intAdvisory — Integration Advisory Report: Generates an initial advisory report, comparing insights from Anthropic and OpenAI.
  8. intIpchaAnalysis — Ipcha Mistabra Analysis: This is where things get interesting. Our adversarial LLM agent (powered by Anthropic, OpenAI, and Google) actively seeks to break or find flaws in the proposed integration.
  9. intIpchaReview — Ipcha Mistabra Review: A critical human review step to evaluate the adversarial findings and decide on necessary mitigations.
  10. intCaelSynthesis — Cael Final Hardened Report: The ultimate output – a synthesized, hardened report incorporating all findings, strategies, and adversarial mitigations, comparing outputs from Anthropic and OpenAI for robustness.
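
One mechanical detail from step 2 is worth showing concretely: the fan-out from intSurfaceMap works because downstream steps can split its output on the numbered ### N. headings. Here's a hypothetical sketch of such a splitter; the engine's actual parser may well differ.

```typescript
// Hypothetical sketch of how a fan-out step could split intSurfaceMap's
// output into per-item work units; the engine's real parser may differ.
function splitSurfaceMap(output: string): string[] {
  return output
    .split(/^### \d+\.\s*/m) // each integration point starts with '### N. '
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0);
}

// Three numbered headings in, three fan-out work items out.
const items = splitSurfaceMap(
  "### 1. Webhook callbacks\n...\n### 2. Shared auth tokens\n...\n### 3. Event bus\n..."
);
console.log(items.length); // 3
```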

This entire workflow is now a built-in template, ready for use by our teams.
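
For context, the registration might look roughly like this. This is a hypothetical sketch: the exact shape of BUILT_IN_WORKFLOW_TEMPLATES entries may differ from what we actually ship, and we reference steps by their template ids here purely for illustration.

```typescript
// Hypothetical sketch of registering the template in src/lib/constants.ts;
// the entry's exact shape may differ. Strings are the step template ids.
export const BUILT_IN_WORKFLOW_TEMPLATES = [
  // ...existing templates
  {
    name: "Integration Analysis",
    description: "10-step cross-repo integration analysis with adversarial hardening.",
    steps: [
      "intRecon", "intSurfaceMap", "intSecurityAnalysis", "intReview1",
      "intStrategies", "intWisdom", "intAdvisory", "intIpchaAnalysis",
      "intIpchaReview", "intCaelSynthesis",
    ],
  },
];
```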

Lessons from the Trenches: Wrestling with the Workflow Engine

While the design was clear, translating it into a functional workflow within our engine presented a few interesting challenges, all of which surfaced during our post-code-review session.

Lesson 1: Template Limitations & Provider Strategies

The Problem: We initially designed our StepTemplates to include fields like providerFanOutConfig (for parallel execution across providers) and dualProviderAutoSelect (for automatic best-provider selection). However, our StepTemplate interface and the stepTemplateToLocalStep() mapping function in new/page.tsx didn't support these fields directly. The workflow engine (workflow-engine.ts) only executes true fan-out on specific llm step types with explicit configurations.

The Workaround: We leveraged compareProviders instead. While compareProviders doesn't fan out in parallel to pick the best automatically, it allows the user to see and compare outputs from multiple LLM providers (e.g., Anthropic, OpenAI, Google) for a given step and then manually select the most relevant one. This still achieves a multi-perspective analysis, just with a human in the loop for final selection.

Takeaway: Always understand the existing interface contracts and engine capabilities. Sometimes a "good enough" alternative that fits the current system is better than blocking progress for a full engine re-architecture. Plan for future enhancements to extend interfaces when the need becomes critical.

```typescript
// Simplified StepTemplate for illustration
interface StepTemplate {
  id: string;
  name: string;
  description: string;
  stepType: "llm" | "review" | "human";
  promptTemplate: string;
  // ... other core fields

  // This was available and proved crucial for multi-provider insights!
  compareProviders?: ("anthropic" | "openai" | "google")[];

  // We wanted these, but they weren't directly supported on the StepTemplate interface:
  // providerFanOutConfig?: { providers: string[]; outputFormat: string; };
  // dualProviderAutoSelect?: boolean;
}
```
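
To illustrate why the unsupported fields never took effect, here's a hypothetical sketch of the mapping. The real stepTemplateToLocalStep() in new/page.tsx differs in detail, and LocalStep is an illustrative type we've invented for this example.

```typescript
// Hypothetical sketch; the real stepTemplateToLocalStep() in new/page.tsx
// differs in detail. LocalStep is an illustrative type, not the real one.
interface LocalStep {
  id: string;
  name: string;
  stepType: "llm" | "review" | "human";
  prompt: string;
  compareProviders?: ("anthropic" | "openai" | "google")[];
}

function stepTemplateToLocalStep(tpl: StepTemplate): LocalStep {
  return {
    id: tpl.id,
    name: tpl.name,
    stepType: tpl.stepType,
    prompt: tpl.promptTemplate,
    // compareProviders survives the mapping, so multi-provider comparison works...
    compareProviders: tpl.compareProviders,
    // ...but providerFanOutConfig and dualProviderAutoSelect have no target
    // field here, so any values set on a template would be silently dropped.
  };
}
```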

Lesson 2: The Nuance of review Steps and LLM Interactions

The Problem: Our initial design had a single intIpchaChallenge step, intended as a review step that would also run compareProviders to show adversarial outputs from different LLMs. We quickly discovered that compareProviders (and indeed, any LLM interaction beyond simple display) only executes on stepType: "llm" steps. review steps are designed purely for human interaction and decision-making, not for initiating new LLM calls.

The Solution: We split the concept into two distinct steps, sketched in code after this list:

  • intIpchaAnalysis (stepType: "llm"): This step now handles the actual LLM call to perform the adversarial Ipcha Mistabra challenge, complete with compareProviders.
  • intIpchaReview (stepType: "review"): This step is a pure human pause point for reviewing the adversarial findings generated by intIpchaAnalysis and approving the next steps.
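
A hypothetical sketch of the split (the prompt text here is illustrative, not our production prompts):

```typescript
// Hypothetical sketch of the two-step split; prompt text is illustrative.
const intIpchaAnalysis: StepTemplate = {
  id: "intIpchaAnalysis",
  name: "Ipcha Mistabra Analysis",
  description: "Adversarial challenge of the proposed integration.",
  stepType: "llm", // an llm step, so compareProviders actually executes
  promptTemplate:
    "Act as an adversary: find ways the proposed integration fails under " +
    "race conditions, trust-boundary violations, and partial outages.",
  compareProviders: ["anthropic", "openai", "google"],
};

const intIpchaReview: StepTemplate = {
  id: "intIpchaReview",
  name: "Ipcha Mistabra Review",
  description: "Human review of the adversarial findings.",
  stepType: "review", // a pure human gate: no LLM calls are made here
  promptTemplate: "Evaluate the adversarial findings and approve mitigations.",
};
```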

Takeaway: Be precise with stepType definitions. Each type has specific behaviors and limitations within the workflow engine. Designing your workflow to align with these inherent behaviors prevents unexpected runtime issues.

Lesson 3: The Untamed insightScope

A Known Limitation: We wanted to explicitly tag insights generated by our Ipcha Mistabra steps with an insightScope: "ethic". Currently, this tagging primarily happens implicitly if the workflow name contains "Ipcha Mistabra" or if providerFanOutConfig is present (which we couldn't use directly on the template). While we can ensure the workflow name includes "Ipcha Mistabra" when created, an explicit insightScope field on StepTemplate would provide more robust control.
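
For reference, the implicit tagging behaves roughly like this. This is a hypothetical sketch of the behavior described above; the actual logic in workflow-engine.ts may differ.

```typescript
// Hypothetical sketch of the implicit insightScope tagging described above;
// the real engine logic in workflow-engine.ts may differ in detail.
function inferInsightScope(
  workflowName: string,
  step: { providerFanOutConfig?: unknown }
): "ethic" | undefined {
  // Tagging is triggered by the workflow's name or by a fan-out config,
  // not by an explicit field on the StepTemplate itself.
  if (workflowName.includes("Ipcha Mistabra") || step.providerFanOutConfig) {
    return "ethic";
  }
  return undefined;
}
```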

Takeaway: Document these minor limitations. They aren't blockers but represent opportunities for future enhancements to improve explicitness and developer control.

What's Next?

With the "Integration Analysis" workflow template implemented and all code review issues addressed, we're ready for the next phase:

  • Commit & Deploy: Get these changes into main!
  • End-to-End Testing: Create an actual workflow in the UI, link some repos, and run it to verify everything works as expected, especially the fan-out pattern from intSurfaceMap and the multi-provider comparisons.
  • Future Enhancements: We'll consider extending the StepTemplate interface to explicitly support providerFanOutConfig, dualProviderAutoSelect, and insightScope, giving developers even more granular control over workflow behavior. A sketch of what that could look like follows this list.
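
Concretely, the extension we have in mind might look like this. This is a sketch of a possible future interface, reusing the providerFanOutConfig shape from Lesson 1; nothing here is implemented yet.

```typescript
// A sketch of a possible future StepTemplate extension; not implemented yet.
interface ExtendedStepTemplate extends StepTemplate {
  // True parallel fan-out across providers, with a declared output format.
  providerFanOutConfig?: { providers: string[]; outputFormat: string; };
  // Let the engine auto-select the better of two provider outputs.
  dualProviderAutoSelect?: boolean;
  // Explicit insight tagging, replacing the name-based inference.
  insightScope?: "ethic";
}
```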

Building robust, AI-powered developer tools is a continuous journey of design, implementation, and learning. Each challenge, like wrestling with StepTemplate limitations, provides valuable insights that ultimately make our platform more powerful and resilient. The "Integration Analysis" workflow is a significant step forward in helping our teams navigate the complex world of modern software integrations with greater confidence and clarity.

```json
{
  "thingsDone": [
    "Designed 10-step Integration Analysis workflow pipeline",
    "Implemented 10 step templates in src/lib/constants.ts",
    "Added 'Integration Analysis' to BUILT_IN_WORKFLOW_TEMPLATES",
    "Wrote comprehensive design doc at docs/plans/2026-03-08-integration-analysis-design.md",
    "Fixed all code review issues (splitting Ipcha, replacing config fields, adding fan-out instruction, removing dead references)",
    "Updated design doc to reflect all fixes"
  ],
  "pains": [
    "providerFanOutConfig and dualProviderAutoSelect not supported on StepTemplate interface",
    "compareProviders only runs on 'llm' step types, not 'review'",
    "insightScope: 'ethic' cannot be set directly via StepTemplate"
  ],
  "successes": [
    "Successfully implemented a complex 10-step AI workflow template",
    "Achieved multi-provider comparison functionality using compareProviders",
    "Addressed all critical code review feedback",
    "Maintained clean TypeScript and no schema/DB changes"
  ],
  "techStack": [
    "TypeScript",
    "Workflow Engine (custom)",
    "LLM Providers (Anthropic, OpenAI, Google)",
    "Markdown",
    "Mermaid Diagrams"
  ]
}
```
}