Automating Compliance: From Raw Data to GitHub PRs (and the Lessons We Learned)
Dive into how we built an automated compliance report export, generating structured Markdown and pushing it directly to GitHub PRs, tackling tricky TypeScript and testing challenges along the way.
Compliance reports. The words alone often conjure images of endless spreadsheets, manual copy-pasting, and the tedious process of turning raw data into an auditable document. For developers, it often means writing scripts, generating files, and then—the ultimate manual step—creating a pull request to get that report into source control.
That's precisely the challenge we set out to tackle in our recent development sprint: automating the generation and submission of compliance reports for our internal analysis workflows. The goal was ambitious: take the structured output of a completed workflow, format it into a human-readable Markdown report, and then offer a seamless way to either download it or, even better, push it directly to a GitHub repository as a new branch and pull request.
This wasn't just about saving clicks; it was about ensuring consistency, auditability, and freeing up valuable time for more strategic work. And like any interesting feature, it came with its own set of technical puzzles and "aha!" moments.
Building the Engine: The Compliance Report Formatter
The heart of this feature is the ComplianceReportFormatter. We chose Markdown as our output format for its simplicity, readability, and version control friendliness. The formatter's job was to take the complex, nested data from our compliance workflow steps and transform it into a well-structured document.
This involved several key sections:
- Executive Metrics Table: A high-level overview of key performance indicators from the workflow.
- Numbered Compliance Sections: Detailed breakdowns, dynamically generated from the workflow's outputs.
- Quality Gates: Analysis of pass/fail criteria.
- Hallucination & Consistency Analysis: Crucial sections for AI-driven workflows, detailing potential inconsistencies or fabricated information.
- Review Key Points: Summarized findings.
Here’s a conceptual peek at what the formatter produces:
# Compliance Report for Workflow: "Q4 Financial Audit"
## Executive Summary
| Metric | Value |
|---|---|
| Total Steps Executed | 12 |
| Total Duration | 1h 45m |
| Critical Findings | 2 |
| Quality Gate Status | ✅ Passed |
## 1. Section: Data Ingestion & Validation
### Step: "Load Financial Records"
* **Status:** Completed
* **Output:** Successfully loaded 8,000 records from ERP.
* **Key Finding:** All records passed initial schema validation.
## 2. Section: Hallucination Analysis
### Step: "AI-Powered Discrepancy Detection"
* **Analysis:** No significant hallucination detected in generated summaries.
* **Confidence Score:** 0.98
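To make that structure a little more concrete, here is a minimal sketch of how a formatter could assemble the metrics table and the numbered sections. The `WorkflowReport` shape and the `formatReport` function are assumptions made up for this illustration; the real ComplianceReportFormatter works from a workflow ID and covers more sections. Table cell escaping is left out to keep it short.

```typescript
// Hypothetical input shape, invented for this sketch.
type ReportStep = { name: string; fields: Record<string, string> };
type ReportSection = { title: string; steps: ReportStep[] };
type WorkflowReport = {
  workflowName: string;
  metrics: { label: string; value: string }[];
  sections: ReportSection[];
};

function formatReport(report: WorkflowReport): string {
  const lines: string[] = [];
  lines.push(`# Compliance Report for Workflow: "${report.workflowName}"`);

  // Executive metrics table: one row per metric.
  lines.push("## Executive Summary", "| Metric | Value |", "|---|---|");
  for (const metric of report.metrics) {
    lines.push(`| ${metric.label} | ${metric.value} |`);
  }

  // Numbered sections, generated from whatever the workflow produced.
  report.sections.forEach((section, index) => {
    lines.push(`## ${index + 1}. Section: ${section.title}`);
    for (const step of section.steps) {
      lines.push(`### Step: "${step.name}"`);
      for (const [key, value] of Object.entries(step.fields)) {
        lines.push(`* **${key}:** ${value}`);
      }
    }
  });

  return lines.join("\n");
}

const sample = formatReport({
  workflowName: "Q4 Financial Audit",
  metrics: [{ label: "Total Steps Executed", value: "12" }],
  sections: [
    {
      title: "Data Ingestion & Validation",
      steps: [{ name: "Load Financial Records", fields: { Status: "Completed" } }],
    },
  ],
});
```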
Connecting the Dots: The exportComplianceReport Mutation
Once we had our formatter, the next step was to expose this functionality via our API. We used tRPC, which provides a fantastic developer experience by offering end-to-end type safety. We added an exportComplianceReport mutation to our workflows router.
This mutation had to support two distinct modes:
- Download-only: Simply return the Markdown content for direct download.
- GitHub PR Creation: This was the more complex part. It involved:
  - Fetching the report content.
  - Interacting with the GitHub API to check for an existing branch, create one if needed, commit the file, and then create a pull request.
  - Crucially, this entire flow needed to be idempotent. If a user tried to create a PR twice, it should update the existing one rather than creating duplicates. This meant handling 422 Unprocessable Entity errors from GitHub when a branch or PR already existed.
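The idempotent flow described above can be sketched roughly as follows. This is not our production code: the `GitHubClient` interface, `publishReport`, `HttpError`, and the in-memory fake are all hypothetical names invented for the example, and a real client would wrap GitHub's REST API. The only behavior assumed from GitHub is the one described above: a 422 when the branch or PR already exists, and the need to pass the current file SHA when updating an existing file.

```typescript
// Hypothetical client interface; a real implementation would wrap GitHub's REST API.
interface GitHubClient {
  createBranch(branch: string): Promise<void>; // rejects with status 422 if the branch exists
  getFileSha(branch: string, path: string): Promise<string | undefined>;
  putFile(branch: string, path: string, content: string, sha?: string): Promise<void>;
  createPullRequest(branch: string, title: string): Promise<string>; // rejects with 422 if a PR is open
  findOpenPullRequest(branch: string): Promise<string | undefined>;
}

class HttpError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

// Treat any error object carrying status 422 as "already exists".
function isAlreadyExists(error: unknown): boolean {
  return typeof error === "object" && error !== null &&
    (error as { status?: unknown }).status === 422;
}

async function publishReport(
  client: GitHubClient, branch: string, path: string, content: string, title: string,
): Promise<string> {
  try {
    await client.createBranch(branch);
  } catch (error) {
    if (!isAlreadyExists(error)) throw error; // a 422 here means the branch can be reused
  }

  // Passing the current SHA turns the commit into an update of the existing file.
  const sha = await client.getFileSha(branch, path);
  await client.putFile(branch, path, content, sha);

  try {
    return await client.createPullRequest(branch, title);
  } catch (error) {
    if (!isAlreadyExists(error)) throw error;
    const existing = await client.findOpenPullRequest(branch);
    if (existing === undefined) throw error;
    return existing; // the commit above already updated the open PR
  }
}

// In-memory fake, used only to exercise the flow without network access.
function createFakeClient() {
  const branches = new Set<string>();
  const files = new Map<string, string>(); // "branch:path" -> sha
  const pulls = new Map<string, string>(); // branch -> PR URL
  let commits = 0;
  const client: GitHubClient = {
    async createBranch(branch) {
      if (branches.has(branch)) throw new HttpError(422, "Reference already exists");
      branches.add(branch);
    },
    async getFileSha(branch, path) {
      return files.get(`${branch}:${path}`);
    },
    async putFile(branch, path, _content, sha) {
      const key = `${branch}:${path}`;
      if (files.get(key) !== sha) throw new HttpError(409, "SHA does not match");
      commits += 1;
      files.set(key, `sha-${commits}`);
    },
    async createPullRequest(branch) {
      if (pulls.has(branch)) throw new HttpError(422, "A pull request already exists");
      const url = `https://example.invalid/pull/${pulls.size + 1}`;
      pulls.set(branch, url);
      return url;
    },
    async findOpenPullRequest(branch) {
      return pulls.get(branch);
    },
  };
  return { client, pulls };
}
```

Calling `publishReport` twice with the same branch should return the same PR URL both times, with the second call committing an update instead of opening a duplicate.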
Here's a simplified view of the tRPC mutation signature:
// src/server/trpc/routers/workflows.ts
import { z } from 'zod'; // For input validation

// Inside the workflows router definition:
exportComplianceReport: publicProcedure
  .input(z.object({
    workflowId: z.string(),
    mode: z.enum(['download', 'github-pr']),
    githubRepoOwner: z.string().optional(),
    githubRepoName: z.string().optional(),
    // ... other optional fields for PR title, branch name, etc.
  }))
  .mutation(async ({ input, ctx }) => {
    const reportContent = await complianceReportFormatter.format(input.workflowId);
    if (input.mode === 'download') {
      return { type: 'download', content: reportContent };
    }
    // mode is 'github-pr' here, so every path returns a value.
    // Logic for creating/updating branch, file, and PR on GitHub
    // Handles idempotency: check if branch exists, get file SHA for updates
    return { type: 'github-pr', prUrl: 'https://github.com/...' };
  }),
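One wrinkle with the schema above: because the repository fields are optional, a `github-pr` request can pass validation without them. A small sketch of narrowing the input into a discriminated union before doing any work is shown here in plain TypeScript so it runs without dependencies. `ExportInput` and `parseExportInput` are hypothetical names; with zod, the same idea could probably be expressed with `z.discriminatedUnion` on `mode`.

```typescript
// Raw shape, mirroring the optional fields of the mutation input.
type RawExportInput = {
  workflowId: string;
  mode: "download" | "github-pr";
  githubRepoOwner?: string;
  githubRepoName?: string;
};

// Narrowed shape: repository fields are required exactly when they are needed.
type ExportInput =
  | { mode: "download"; workflowId: string }
  | { mode: "github-pr"; workflowId: string; githubRepoOwner: string; githubRepoName: string };

function parseExportInput(raw: RawExportInput): ExportInput {
  if (raw.mode === "download") {
    return { mode: "download", workflowId: raw.workflowId };
  }
  if (!raw.githubRepoOwner || !raw.githubRepoName) {
    throw new Error("githubRepoOwner and githubRepoName are required when mode is 'github-pr'");
  }
  return {
    mode: "github-pr",
    workflowId: raw.workflowId,
    githubRepoOwner: raw.githubRepoOwner,
    githubRepoName: raw.githubRepoName,
  };
}
```

After this step, code handling the `github-pr` branch can read `githubRepoOwner` and `githubRepoName` without non-null assertions.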
Bringing it to Life: The ComplianceExportPanel
On the frontend, we built a self-contained React component, ComplianceExportPanel. This component provides an expandable card in the workflow details page, appearing only for completed compliance-focused workflows. It features:
- A clear "Export Report" button.
- An optional checkbox to "Create GitHub Pull Request," which reveals additional input fields for repository details.
- Loading, success, and error states, providing immediate feedback to the user.
Integrating this panel into our dashboard was straightforward, ensuring it only rendered when relevant, keeping our UI clean and contextual.
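The loading, success, and error states lend themselves to a small state machine. The sketch below is one possible way to model them, not the component's actual code; `ExportState`, `ExportEvent`, and `exportReducer` are hypothetical names, and the reducer is written so it could be handed to React's `useReducer`.

```typescript
type ExportState =
  | { status: "idle" }
  | { status: "loading" }
  | { status: "success"; prUrl: string | null } // prUrl is null for download-only exports
  | { status: "error"; message: string };

type ExportEvent =
  | { type: "submit" }
  | { type: "resolved"; prUrl: string | null }
  | { type: "rejected"; message: string }
  | { type: "reset" };

function exportReducer(state: ExportState, event: ExportEvent): ExportState {
  switch (event.type) {
    case "submit":
      // Ignore repeated clicks while a request is in flight.
      return state.status === "loading" ? state : { status: "loading" };
    case "resolved":
      return state.status === "loading" ? { status: "success", prUrl: event.prUrl } : state;
    case "rejected":
      return state.status === "loading" ? { status: "error", message: event.message } : state;
    case "reset":
      return { status: "idle" };
  }
}
```

Modeling the states as a union means the UI cannot show a spinner and an error at the same time, which keeps the feedback logic easy to reason about.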
Ensuring Quality: Comprehensive Testing & Refinements
No feature is complete without robust testing. We added 12 dedicated unit tests for the ComplianceReportFormatter, covering every section, edge cases like digest fallbacks, and even ensuring Markdown table cell escaping worked correctly (a critical detail to prevent rendering issues).
Code review also played a vital role in refining the feature. We identified and fixed several issues, including:
- Improving Markdown table cell escaping to handle special characters.
- Strengthening the idempotent GitHub flow by properly handling 422 errors when a branch already exists and fetching the file's SHA for updates.
- Wrapping TRPCError instances to ensure safe and user-friendly error messages were exposed to the client.
- Refining runtime validation for Prisma Json fields and removing potentially unsafe non-null assertions.
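For the table cell escaping fix, a helper along these lines captures the idea. It is a sketch, not our exact implementation: `escapeTableCell` is a hypothetical name, and replacing newlines with `<br>` assumes the renderer allows inline HTML in table cells (GitHub's does).

```typescript
function escapeTableCell(value: string): string {
  return value
    .replace(/\\/g, "\\\\") // escape backslashes first so the escapes added below stay intact
    .replace(/\|/g, "\\|") // a raw pipe would be read as a column separator
    .replace(/\r?\n/g, "<br>"); // a raw newline would end the table row
}
```

Each dynamic value would pass through the helper before being placed between the `|` delimiters of a row.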
The Bumpy Road: Lessons Learned from the Trenches
Even with careful planning, development always throws a few curveballs. These "pain points" often become the most valuable "lessons learned."
Lesson 1: Taming Prisma's JsonValue for Runtime Type Safety
When working with Prisma's Json fields, you often store arbitrary data. In our case, a docRefs field was a JsonArray containing objects with owner and url properties. TypeScript initially types this as JsonValue, which is a union of JsonObject | JsonArray | string | number | boolean | null.
Our initial attempt to validate elements within this array looked something like this:
// Initial, problematic approach (simplified)
const docRefs = workflow.data.docRefs as Prisma.JsonArray; // Cast for array type
if (Array.isArray(docRefs)) {
  docRefs.forEach(item => {
    // Reading item.owner directly fails with TS2339:
    // Property 'owner' does not exist on type 'JsonObject | JsonArray'.
    // An `as` cast only asserts the shape; nothing checks at runtime
    // that 'owner' exists or is a string.
    const owner = (item as { owner: string }).owner;
    console.log(owner);
  });
}
The issue is that JsonObject (which item could be) doesn't guarantee the presence of an owner property, so TypeScript rejects direct property access. Casting item as { owner: string } is no real fix: the cast is an unchecked assertion, and at runtime item could be any JsonValue, not necessarily an object with owner.
The robust workaround involved a two-step approach: first, casting the element to a generic Record<string, unknown>, and then using typeof checks within a type guard predicate.
// The robust solution for runtime validation of Prisma Json fields
if (Array.isArray(docRefs)) {
  const validatedDocRefs = docRefs.filter((item): item is { owner: string; url: string } => {
    // 1. Cast to a generic object to safely access properties at runtime
    const obj = item as Record<string, unknown>;
    // 2. Perform runtime type checks
    return typeof obj === 'object' && obj !== null &&
      typeof obj.owner === 'string' &&
      typeof obj.url === 'string';
  });
  // Now, `validatedDocRefs` is correctly typed as Array<{ owner: string; url: string }>
  validatedDocRefs.forEach(ref => {
    console.log(ref.owner, ref.url); // Type-safe access!
  });
}
Takeaway: When runtime-validating dynamic JSON structures from Prisma, always use an intermediate Record<string, unknown> cast combined with typeof checks in a type guard. Avoid direct as casts that don't align with the actual runtime possibilities.
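To see the pattern on concrete data, here is a standalone version that runs without Prisma. The local `JsonValue` alias is a stand-in for `Prisma.JsonValue`, and `DocRef`, `isDocRef`, and `parseDocRefs` are hypothetical names chosen for this sketch.

```typescript
// Local stand-in for Prisma.JsonValue so the example has no dependencies.
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };

type DocRef = { owner: string; url: string };

function isDocRef(item: JsonValue): item is DocRef {
  // Rule out primitives, null, and nested arrays before looking at properties.
  if (typeof item !== "object" || item === null || Array.isArray(item)) return false;
  const obj = item as Record<string, unknown>;
  return typeof obj.owner === "string" && typeof obj.url === "string";
}

function parseDocRefs(value: JsonValue): DocRef[] {
  // Anything that is not an array yields an empty list instead of throwing.
  return Array.isArray(value) ? value.filter(isDocRef) : [];
}

const mixed: JsonValue = [
  { owner: "acme", url: "https://example.invalid/doc" },
  { owner: 42, url: "https://example.invalid/bad-owner" },
  { owner: "acme" },
  null,
  "just a string",
  ["nested"],
];
const docRefsFromJson = parseDocRefs(mixed);
```

Only the first element of `mixed` satisfies the guard; the malformed entries are dropped rather than trusted.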
Lesson 2: Locale-Agnostic Testing for toLocaleString() Output
Our compliance report included numbers formatted with toLocaleString(), like "8,000" for a count of records. A unit test asserted the presence of this formatted number:
// The brittle assertion
expect(reportContent).toContain("8,000"); // This looks fine, right?
This test passed locally on my machine (US locale). However, when run in a CI environment or by a colleague with a different default locale (e.g., German), it failed! Why? Because toLocaleString() outputs 8.000 in many European locales for the number eight thousand.
// The robust, locale-agnostic assertion
expect(reportContent).toMatch(/8[,.]000/);
Takeaway: Never rely on exact string matching for toLocaleString() output in tests. Use regular expressions to account for locale-specific variations in number formatting (e.g., comma vs. period for thousands separators). Better yet, if possible, test the raw number before formatting, or mock the toLocaleString behavior if the formatted string itself is crucial.
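One caveat with the regex: it only covers comma and period. Some locales may group digits with a space-like character or leave four-digit numbers ungrouped, so the pattern can still miss. If the report does not need to follow the machine's locale, another option is to pin the locale in the formatter itself, which makes the output deterministic. The `formatCount` helper below is a hypothetical sketch of that idea using the standard `Intl.NumberFormat` API.

```typescript
// Formats a count with an explicit locale instead of the environment default.
function formatCount(value: number, locale: string = "en-US"): string {
  return new Intl.NumberFormat(locale).format(value);
}

const pinned = formatCount(8000); // same result on every machine with en-US locale data
```

With the locale pinned, a test can go back to an exact match such as `expect(reportContent).toContain("8,000")` without depending on where it runs.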
What's Next on Our Journey?
With the core feature implemented and tested, our immediate next steps involve:
- Pushing to origin: Getting these changes out of my local branch and into our shared codebase.
- End-to-End Testing: Running actual compliance workflows, verifying report downloads, and thoroughly testing the GitHub PR creation flow.
- Audit Logging: Considering adding audit logs for PR creation actions, a valuable suggestion from code review.
- Utility Consolidation: Looking into creating a shared formatDuration utility to avoid duplication across services.
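As a starting point for that shared utility, something like the following might do. It is only a sketch: the millisecond input and the exact output style (modeled on the "1h 45m" value in the sample report) are assumptions, and the existing per-service implementations may differ.

```typescript
// Formats a duration in milliseconds as "1h 45m", "3m 20s", or "12s".
function formatDuration(ms: number): string {
  const totalSeconds = Math.max(0, Math.floor(ms / 1000)); // clamp negatives to zero
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  if (hours > 0) return `${hours}h ${minutes}m`;
  if (minutes > 0) return `${minutes}m ${seconds}s`;
  return `${seconds}s`;
}
```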
This sprint was a fantastic blend of building a high-impact feature and deepening our understanding of TypeScript's type system and robust testing practices. Automating compliance reports not only streamlines a critical process but also paves the way for even more sophisticated workflow management in the future.