Building Ipcha's Watchdog: Crafting a Smart Repository Audit System
Dive into the design decisions behind Ipcha's new repository-level audit system, exploring smart file prioritization, layered analysis, and seamless integration with existing self-testing workflows.
The codebase is the heart of any software project. As it grows, maintaining quality and consistency, and catching subtle regressions, becomes a monumental task. This is the challenge we're tackling with our self-testing system, Ipcha. While Ipcha already excels at testing specific components, we recognized a critical gap: comprehensive, repository-level auditing and stress testing.
This past session, late on March 9th, 2026, was entirely dedicated to bridging that gap. We moved from initial brainstorming to a fully approved and committed design for Ipcha's new repo-level audit feature. It was a focused session, laying down the architectural blueprints before we dive into implementation.
The Challenge: Auditing a Living, Breathing Repository
How do you effectively audit an entire repository? It's not just about running a linter. We needed a system that could:
- Understand Context: Differentiate between core application code, configuration files, and tests.
- Prioritize: Not all files are equally important or risky. We needed to focus our analytical power where it matters most.
- Scale: Handle large repositories efficiently without overwhelming our systems or generating noise.
- Integrate: Seamlessly extend Ipcha's existing self-testing and reporting capabilities.
Designing the Solution: Smart Prioritization & Layered Analysis
Our design addresses these challenges head-on with a multi-pronged approach.
Hybrid Scope: Local vs. Linked Repos
First, we defined the scope. Ipcha needs to audit its own core (nyxCore) which lives on the local filesystem. But it also needs to extend its reach to linked project repositories, which are typically hosted on platforms like GitHub.
Key Decision:
- nyxCore: Local filesystem access for deep introspection.
- Linked Projects: GitHub API integration for remote repository metadata and content fetching. This allows us to connect Ipcha to any project we're actively developing or supporting.
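To make the split concrete, here is a minimal sketch of what the file-listing layer could look like, assuming Octokit for the GitHub side. The function names are illustrative assumptions, not the final implementation.
// Illustrative sketch of the hybrid scope: local filesystem for nyxCore,
// GitHub API for linked repos. Function names and the Octokit dependency
// are assumptions for this example.
import { promises as fs } from "node:fs";
import path from "node:path";
import { Octokit } from "@octokit/rest";

// Walk a local root (nyxCore) and collect file paths.
async function listLocalFiles(root: string): Promise<string[]> {
  const files: string[] = [];
  for (const entry of await fs.readdir(root, { withFileTypes: true })) {
    const full = path.join(root, entry.name);
    if (entry.isDirectory()) files.push(...(await listLocalFiles(full)));
    else if (entry.isFile()) files.push(full);
  }
  return files;
}

// List file paths in a linked GitHub repo via the Git Trees API (single recursive call).
async function listLinkedRepoFiles(owner: string, repo: string, ref = "HEAD"): Promise<string[]> {
  const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
  const { data } = await octokit.rest.git.getTree({
    owner,
    repo,
    tree_sha: ref,
    recursive: "true",
  });
  return data.tree
    .filter((node) => node.type === "blob" && typeof node.path === "string")
    .map((node) => node.path as string);
}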
The Brains of the Operation: Smart File Prioritization
Auditing every single file in every single run is inefficient and often unnecessary. We needed a heuristic-driven approach to identify files most likely to benefit from an audit. Our solution combines user-defined globs with a smart prioritization algorithm.
We settled on a weighted scoring system based on four key factors:
- Churn (0.35 weight): Files that change frequently are often areas of active development and thus, potential new bugs or regressions.
- Size (0.25 weight): Larger files tend to be more complex and harder to reason about, increasing the risk of hidden issues.
- Imports (0.20 weight): Files with many import/dependency relationships are critical integration points; issues here can have cascading effects.
- Staleness (0.20 weight): Old, untouched files might contain deprecated patterns, security vulnerabilities, or simply be ripe for refactoring.
This means a small, frequently changed file with many imports will score higher than a large, stable configuration file that's rarely touched.
// Conceptual scoring logic for file prioritization.
// All inputs are normalized to 0-1 so the weights are directly comparable.
interface FileMetadata {
  path: string;
  churnNormalized: number;         // 0-1, commit count in recent history relative to the busiest file
  sizeNormalized: number;          // 0-1, relative to the largest file
  importCountNormalized: number;   // 0-1, relative to the most heavily imported file
  stalenessDaysNormalized: number; // 0-1, days since last modification relative to the oldest file
}

interface FileAuditScore {
  path: string;
  score: number;
}

function calculateFilePriority(file: FileMetadata): FileAuditScore {
  const churnWeight = 0.35;
  const sizeWeight = 0.25;
  const importsWeight = 0.20;
  const stalenessWeight = 0.20;

  const score =
    file.churnNormalized * churnWeight +
    file.sizeNormalized * sizeWeight +
    file.importCountNormalized * importsWeight +
    file.stalenessDaysNormalized * stalenessWeight;

  return { path: file.path, score };
}
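A minimal usage sketch, assuming candidate metadata has already been gathered and normalized:
// Rank candidates by descending priority score (illustrative usage).
function rankCandidates(files: FileMetadata[]): FileAuditScore[] {
  return files.map(calculateFilePriority).sort((a, b) => b.score - a.score);
}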
Layered Analysis & Two-Tier Workflows
Once files are prioritized, how do we analyze them? We adopted a layered approach to optimize resource usage:
- Summary Pass (Tier 1): This is a batched, directory-level analysis. It performs quick, high-level checks (e.g., linting, basic static analysis, identifying suspicious patterns) on a broader set of files. The goal is to quickly flag potential areas of concern.
- Deep-Dive Analysis (Tier 2): Only triggered for files flagged by Tier 1, or for the highest-priority files. This is a per-file, in-depth analysis. Crucially, this tier leverages Axiom RAG (Retrieval Augmented Generation) chunking. This means we intelligently break down the file content into relevant chunks, retrieve additional context (e.g., related documentation, previous audit findings), and feed this highly focused information to our LLM-powered analysis engine. This helps overcome token limits and significantly reduces hallucinations, leading to more accurate and actionable insights.
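Below is a rough sketch of how the two tiers could hand off to each other, assuming the tier implementations are injected. The helper signatures and the deep-dive budget cap are assumptions for this example; the workflow engine will own the real orchestration.
// Illustrative two-tier dispatch; FileAuditScore comes from the prioritization sketch above.
// The injected tier functions and the deep-dive budget are assumptions for this sketch.
interface SummaryFinding {
  path: string;
  flagged: boolean; // true when the quick Tier 1 checks spot something suspicious
}

interface TierRunners {
  runSummaryPass: (paths: string[]) => Promise<SummaryFinding[]>; // batched, directory-level
  runDeepDive: (path: string) => Promise<void>;                   // per-file, RAG-chunked LLM analysis
}

async function auditRepo(
  prioritized: FileAuditScore[],
  deepDiveBudget: number,
  tiers: TierRunners
): Promise<void> {
  // Tier 1: quick, batched checks over the broader candidate set.
  const summaries = await tiers.runSummaryPass(prioritized.map((f) => f.path));

  // Tier 2 candidates: everything Tier 1 flagged plus the top-priority files,
  // deduplicated and capped so a single run stays cheap and low-noise.
  const flagged = summaries.filter((s) => s.flagged).map((s) => s.path);
  const topPriority = prioritized.slice(0, deepDiveBudget).map((f) => f.path);
  const deepDiveTargets = [...new Set([...flagged, ...topPriority])].slice(0, deepDiveBudget);

  for (const filePath of deepDiveTargets) {
    await tiers.runDeepDive(filePath);
  }
}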
Integrating with Ipcha's Existing System
To make this a seamless extension, we integrated the new functionality directly into Ipcha's existing target system:
- New repo Target Type: A dedicated target type for specifying a repository to audit.
- Flexible source Field: The source field for repo targets will now store a JSON configuration. This allows us to specify details like the GitHub organization/repo, specific file globs to include/exclude, and other audit parameters.
// Example 'source' configuration for a 'repo' audit target
{
"type": "github",
"owner": "our-org",
"repo": "critical-api-service",
"includeGlobs": ["src/**/*.ts", "config/*.yaml", "package.json"],
"excludeGlobs": ["src/__tests__/**", "node_modules/**"],
"auditProfile": "strict" // Could define different audit profiles
}
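For type safety on the Ipcha side, the parsed source JSON could be modeled with a small interface. The field names mirror the example above; the local variant and the profile values are assumptions.
// Illustrative shape of the parsed 'source' configuration for repo targets.
// Field names mirror the JSON example above; the "local" variant and the
// auditProfile values are assumptions for this sketch.
interface RepoAuditSource {
  type: "github" | "local";  // "local" would cover the nyxCore filesystem case
  owner?: string;            // required when type is "github"
  repo?: string;
  includeGlobs: string[];
  excludeGlobs: string[];
  auditProfile?: "strict" | "standard" | "lenient";
}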
- Shared Schedule Rotation: repo targets will join the existing targetsPerRun rotation, ensuring they are regularly audited alongside other Ipcha targets without requiring a separate scheduling mechanism.
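As a sketch of what joining the rotation could mean in practice (AuditTarget and pickTargetsForRun are hypothetical names, not the existing Ipcha API):
// Illustrative rotation: repo targets sit in the same queue as every other target type.
// AuditTarget and pickTargetsForRun are hypothetical names for this sketch.
interface AuditTarget {
  id: string;
  type: "component" | "repo";
  lastAuditedAt: Date | null; // null means never audited
}

function pickTargetsForRun(targets: AuditTarget[], targetsPerRun: number): AuditTarget[] {
  // Least-recently-audited first, so repo targets rotate in without special-casing.
  return [...targets]
    .sort((a, b) => (a.lastAuditedAt?.getTime() ?? 0) - (b.lastAuditedAt?.getTime() ?? 0))
    .slice(0, targetsPerRun);
}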
Lessons Learned from a Smooth Design Session
Interestingly, our "Pain Log" for this session was empty. This isn't a sign of naivety, but rather a testament to the value of structured brainstorming and a clear problem definition upfront. By dedicating significant time to answering the six core design questions and documenting key decisions in docs/plans/2026-03-09-repo-audit-design.md (commit a350ad8), we managed to pre-empt many potential architectural roadblocks. A smooth design phase often translates to a smoother, faster implementation.
What's Next: From Blueprint to Code
With the design locked in, the real fun begins: implementation. Our immediate next steps involve:
- Writing the Implementation Plan: Leveraging our superpowers:writing-plans skill to detail the execution strategy.
- Subagent-Driven Development: We'll be using our subagent system to drive the coding process, breaking down the plan into manageable tasks.
- New Files: Expect to see repo-audit-service.ts for core logic, file-prioritizer.ts for our smart scoring, and accompanying tests.
- Modifications: Updates to audit-service.ts, the main audit.ts router, workflow-engine.ts to handle the new tiers, schema changes, and the Ipcha UI page.
- Schema Evolution: The AuditRun model will gain parentRunId, tier, and filePath to accurately track the layered audit process (a rough shape is sketched after this list).
- Operational Setup: We still need to configure the hourly cron job for audit triggers and set the AUDIT_CRON_SECRET on our production server – a critical pending task from the previous session.
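For reference, a rough TypeScript view of an AuditRun record after the planned change; only the three new fields come from the design, everything else is an assumed baseline.
// Illustrative AuditRun shape after the schema change. Only parentRunId, tier,
// and filePath come from the design; the remaining fields are assumed baseline columns.
interface AuditRun {
  id: string;
  targetId: string;
  status: "pending" | "running" | "completed" | "failed"; // assumed existing field
  createdAt: Date;                                         // assumed existing field
  parentRunId: string | null; // links a Tier 2 deep-dive back to its Tier 1 parent run
  tier: 1 | 2 | null;         // null for non-repo audit runs
  filePath: string | null;    // set only on per-file Tier 2 runs
}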
We're incredibly excited to bring this new capability to Ipcha. Repository-level auditing isn't just about finding bugs; it's about fostering a culture of continuous quality and gaining deeper insights into the health of our entire codebase. Stay tuned for updates as we move into the development phase!
{"thingsDone":["Brainstormed repo audit feature","Answered 6 design questions","Wrote and committed design doc (a350ad8)","Decided on local fs for nyxCore, GitHub API for linked repos","Defined hybrid file selection with user globs and smart prioritization (churn, size, imports, staleness)","Established layered analysis with summary pass and deep-dive on flagged files","Designed two-tier workflows: batched by directory (Tier 1) and per-file with Axiom RAG chunking (Tier 2)","Extended existing target system with new 'repo' target type and JSON config 'source' field","Integrated repo targets into existing schedule rotation","Approved design for implementation"],
"pains":["No major issues encountered during design phase; design was smooth and efficient"],
"successes":["Successfully completed and approved a comprehensive design for a complex new system feature","Achieved consensus on key architectural decisions","Designed a smart prioritization algorithm for efficient resource use","Integrated advanced techniques like Axiom RAG chunking for deep analysis"],
"techStack":["TypeScript","Node.js","GitHub API","Axiom RAG (LLM integration)","Cron (for scheduling)","PostgreSQL (for schema changes)","Ipcha (internal self-testing system)"]}