Unlocking Code Intelligence: A Deep Dive into Our New CKB Integration
I just wrapped up a monumental sprint, integrating our new Code Knowledge Backend (CKB) from core infrastructure to a shiny new UI. Here's a look at the journey, the tech stack, and the invaluable lessons learned along the way.
It’s just past midnight, and I've finally hit a major milestone: full integration of our new Code Knowledge Backend (CKB). This wasn't just another feature; it was a multi-phase, full-stack endeavor to bake deep code intelligence directly into our platform. We’re talking everything from Docker containers and Prisma models to a snazzy new UI and even a custom template variable for AI prompts.
Let's break down how we got here, the technical decisions we made, and some of the thorny problems we navigated.
Phase 1: Laying the Foundation – The Core CKB Integration
The CKB is designed to be an external, heavy-lifting analysis tool. Our primary goal in Phase 1 was to integrate it seamlessly into our existing backend architecture.
Dockerizing the Brains
First up, getting the CKB itself running. We opted for a dedicated Docker worker container (ghcr.io/simplyliz/ckb:latest). This keeps the CKB isolated and scalable.
# Excerpt from docker-compose.yml
ckb:
  image: ghcr.io/simplyliz/ckb:latest
  command: ["sleep", "infinity"] # Keep it alive, we'll `docker exec` into it
  volumes:
    - ckb_repos:/app/repos # Shared volume for codebases
  healthcheck:
    test: ["CMD", "ckb", "version"]
    interval: 30s
    timeout: 10s
    retries: 3
The sleep infinity command is a classic pattern for worker containers we intend to interact with via docker exec. This way, the container stays up, but doesn't consume CPU until we explicitly tell it to do something. A shared volume (ckb_repos) was crucial for the CKB to store and access cloned repositories. And, of course, a healthcheck ensures we know the CKB binary is actually available.
Data Modeling with Prisma
To track the analysis status and cache results for each project, we introduced a new ProjectCkbIndex model in Prisma. This model links directly to our Project and Tenant entities, ensuring proper data relationships and enabling our Row-Level Security (RLS) policies.
// prisma/schema.prisma
model ProjectCkbIndex {
  id            String    @id @default(uuid())
  projectId     String    @unique
  project       Project   @relation(fields: [projectId], references: [id], onDelete: Cascade)
  status        CkbStatus @default(PENDING)
  analysisCache Json?
  createdAt     DateTime  @default(now())
  updatedAt     DateTime  @updatedAt
  tenantId      String
  tenant        Tenant    @relation(fields: [tenantId], references: [id])

  @@index([tenantId])
}

enum CkbStatus {
  PENDING
  PROCESSING
  COMPLETED
  FAILED
}
The analysisCache field, a Json? type, is where we store the aggregated results of various CKB analyses. This allows us to serve cached data quickly without re-running computations on every request.
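To make the "serve from cache" idea concrete, here is a minimal sketch of how a caller might decide whether the cache is usable. The CkbIndexRow type and the readCachedAnalysis helper are hypothetical names for illustration; they mirror the model above but are not taken from the actual codebase.

```typescript
// Hypothetical row shape mirroring the ProjectCkbIndex model above.
type CkbStatus = "PENDING" | "PROCESSING" | "COMPLETED" | "FAILED";

interface CkbIndexRow {
  projectId: string;
  status: CkbStatus;
  analysisCache: unknown; // Json? in Prisma: a JSON value or null
}

type CacheLookup =
  | { kind: "hit"; data: unknown }
  | { kind: "miss"; reason: CkbStatus | "NOT_INDEXED" | "EMPTY_CACHE" };

// Serve the cached analysis only when indexing finished and a cache exists.
function readCachedAnalysis(row: CkbIndexRow | null): CacheLookup {
  if (row === null) return { kind: "miss", reason: "NOT_INDEXED" };
  if (row.status !== "COMPLETED") return { kind: "miss", reason: row.status };
  if (row.analysisCache === null || row.analysisCache === undefined) {
    return { kind: "miss", reason: "EMPTY_CACHE" };
  }
  return { kind: "hit", data: row.analysisCache };
}
```

Returning a reason on a miss lets the UI distinguish "still processing" from "never indexed" without a second query.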
The CKB Client Service
This was the heart of the integration: src/server/services/ckb-client.ts. This service acts as our internal wrapper, orchestrating docker exec commands to interact with the CKB container. It handles repository cloning, pulling, deletion, and exposes 13 distinct analysis functions (e.g., architecture, hotspots, coupling, complexity). A central runFullAnalysis() method orchestrates a sequential execution of these.
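// Excerpt from src/server/services/ckb-client.ts
// A simplified example of how we shell out to Docker
import { execFile as execFileCallback } from 'node:child_process';
import { promisify } from 'node:util';

// Assumption: execFile is the promisified variant, since the excerpt awaits it.
// logger, CKB_COMMAND_TIMEOUT, and validateProjectPath are defined elsewhere in the service.
const execFile = promisify(execFileCallback);

async function runCkbCommand(
  projectId: string,
  command: string,
  args: string[]
): Promise<string> {
  const containerName = process.env.CKB_CONTAINER_NAME || 'nyxcore-ckb-1';

  // Basic path validation to prevent traversal attacks
  validateProjectPath(projectId);

  // execFile passes arguments as an array, so no shell interpolation takes place
  const { stdout, stderr } = await execFile(
    'docker',
    ['exec', containerName, 'ckb', command, ...args],
    { timeout: CKB_COMMAND_TIMEOUT }
  );

  if (stderr) {
    logger.warn(`CKB command stderr for project ${projectId}: ${stderr}`);
  }
  return stdout;
}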
Security was paramount here, especially with docker exec and arbitrary project IDs. Robust path validation (validateProjectPath) was implemented to prevent any potential path traversal vulnerabilities.
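The body of validateProjectPath isn't shown in this post, so the following is only a sketch of one way such a check could look. It assumes repositories live under /app/repos (the mount point from the compose excerpt) and that project IDs are UUID-like; resolveProjectPath, REPOS_ROOT, and SAFE_ID are hypothetical names.

```typescript
import * as path from "node:path";

// Assumed root of the shared volume inside the CKB container.
const REPOS_ROOT = "/app/repos";

// Hypothetical allow-list: letters, digits, hyphens, underscores (covers UUIDs).
const SAFE_ID = /^[A-Za-z0-9_-]+$/;

// Returns the repository path for a project, or throws if the ID could escape REPOS_ROOT.
function resolveProjectPath(projectId: string): string {
  if (!SAFE_ID.test(projectId)) {
    throw new Error(`Invalid project ID: ${JSON.stringify(projectId)}`);
  }
  // Defense in depth: resolve the path and confirm it is still under the root.
  const resolved = path.posix.resolve(REPOS_ROOT, projectId);
  if (!resolved.startsWith(REPOS_ROOT + "/")) {
    throw new Error("Resolved path escapes the repository root");
  }
  return resolved;
}
```

The allow-list rejects separators and dots outright; the resolve-and-compare step is a second guard in case the allow-list is ever loosened.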
A Robust tRPC API
To expose CKB functionality to our frontend, we built a comprehensive tRPC router (src/server/trpc/routers/ckb.ts). This includes procedures for checking status, triggering re-indexing, and fetching cached analysis results. Critically, some analyses like coupling or callGraph are run live by the CKB client for real-time insights based on user input.
Intelligent Content Loading with {{ckb}}
One of the cooler features is integrating CKB insights directly into our AI prompt engine. We introduced a {{ckb}} template variable that the workflow engine resolves. This allows users to dynamically inject relevant code insights (like security audit findings or hotspots) into their AI prompts, providing crucial context.
This content is loaded via src/server/services/ckb-content-loader.ts, which includes Redis caching (1-hour TTL, 8K char max) for performance and invalidateCkbCache() for freshness. We also added specific formatting and truncation logic, with a keen eye on security to ensure sensitive data (like full file paths in audit findings) isn't accidentally leaked.
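Here is a rough sketch of the three ideas in that loader: path redaction, truncation, and a TTL cache. It uses an in-memory Map as a stand-in for Redis so it runs on its own. The AuditFinding shape, the cache key format, and the function bodies are assumptions for illustration; only the 1-hour TTL, the 8K limit, and the invalidateCkbCache name come from the description above.

```typescript
import * as path from "node:path";

const MAX_CHARS = 8000;         // "8K char max"
const TTL_MS = 60 * 60 * 1000;  // 1-hour TTL

// Hypothetical shape of a security audit finding.
interface AuditFinding {
  severity: string;
  file: string;
  message: string;
}

// Reduce a full path to its file name so the directory layout is not exposed.
function redactPath(file: string): string {
  return path.posix.basename(file);
}

// Format findings one per line, then cut the result to the character limit.
function formatFindings(findings: AuditFinding[]): string {
  const text = findings
    .map((f) => `[${f.severity}] ${redactPath(f.file)}: ${f.message}`)
    .join("\n");
  return text.length > MAX_CHARS ? text.slice(0, MAX_CHARS) : text;
}

// In-memory stand-in for Redis: value plus expiry timestamp.
const cache = new Map<string, { value: string; expiresAt: number }>();

// Return the cached content if it has not expired; otherwise load and store it.
function getCkbContent(
  projectId: string,
  load: () => string,
  now: number = Date.now(),
): string {
  const key = `ckb:content:${projectId}`;
  const entry = cache.get(key);
  if (entry && entry.expiresAt > now) return entry.value;
  const value = load();
  cache.set(key, { value, expiresAt: now + TTL_MS });
  return value;
}

// Drop the cached entry so the next read reloads fresh content.
function invalidateCkbCache(projectId: string): void {
  cache.delete(`ckb:content:${projectId}`);
}
```

With Redis the expiry would normally be set on the key itself rather than stored next to the value, but the read-through flow is the same.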
Auto-Indexing & RLS
Finally, we wired up automatic CKB indexing to project creation/updates when a GitHub repository is linked. This "fire-and-forget" operation ensures that projects are analyzed without manual intervention. On the security front, Row-Level Security (tenant_isolation_project_ckb_indexes policy) was applied to the new project_ckb_indexes table, ensuring strict multi-tenant data isolation.
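The "fire-and-forget" part deserves a small illustration, because an un-awaited promise that rejects can surface as an unhandled rejection. A minimal sketch, assuming a hypothetical indexProject function that returns a promise (the real indexing call is not shown in this post):

```typescript
// Hypothetical signature standing in for the real CKB indexing call.
type IndexFn = (projectId: string) => Promise<void>;

// Start indexing without awaiting it. Failures are routed to onError
// instead of becoming unhandled promise rejections.
function triggerIndexing(
  projectId: string,
  indexProject: IndexFn,
  onError: (err: unknown) => void = (err) =>
    console.error("CKB indexing failed", err),
): void {
  try {
    void indexProject(projectId).catch(onError);
  } catch (err) {
    // indexProject threw synchronously before returning a promise.
    onError(err);
  }
}
```

The caller (for example a project create or update handler) returns immediately, and the index row's status is what tells the UI how the background run went.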
Phase 2: Bringing it to Life – The Code Intelligence Page
With the backend humming, Phase 2 was all about making these insights accessible and actionable for our users. This culminated in the new Code Intelligence tab (src/components/projects/code-intelligence-tab.tsx).
This 500+ line component is a powerhouse, offering a holistic view of a project's codebase:
- Overview Cards: Quick summaries of Architecture (module/layer count), Hotspots (top 10 by risk, color-coded), Security Audit (severity breakdown + finding details), and Dead Code (list of unused symbols).
- Detail Sections: Deeper dives into specific aspects:
  - Coupling Analysis: Search for a file and see its co-change partners.
  - File Complexity: Cyclomatic and cognitive complexity metrics per function.
  - Ownership: Author percentages for files and modules.
- Robust UX: We built in graceful degradation for various states: "CKB not configured," "Link a GitHub repository," a processing spinner during analysis, and clear error displays. A "Re-analyze" button triggers the ckb.reindex mutation, giving users control.
Adding this tab to our dashboard/projects/[id]/page.tsx was the final touch, making Code Intelligence a first-class citizen in our project views.
Navigating the Treacherous Waters: Lessons Learned
No complex integration goes off without a hitch. Here are a few "gotchas" and the solutions we implemented:
1. Prisma Client Type Inference in Helper Functions
- Problem: When creating helper functions for our tRPC router, I initially tried to use import("@prisma/client").PrismaClient as an inline type for the Prisma parameter. This quickly became verbose and fragile, especially with nested types.
- Solution: After feedback, we opted for the any type with an eslint-disable comment for the Prisma parameter in helper functions. This might sound counter-intuitive, but tRPC's context inference keeps the call sites typed, so the trade-off is confined to the helper body, where the Prisma parameter is unchecked. In exchange, the boilerplate shrinks significantly.
- Takeaway: Sometimes, pragmatic workarounds are necessary when type inference is strong at the boundaries, especially if a more robust type utility isn't immediately feasible.
2. The Elusive /s Regex Flag in Production
- Problem: Our harden-persona-prompts.ts script used the /s regex flag (dotAll) for pattern matching. This worked fine locally, but production builds failed with TS1501: This regular expression flag is only available when targeting 'es2018' or later. Our tsconfig was set to es2017.
- Solution: Dropped the /s flag and replaced each . that needed to match newlines with [\s\S], so a pattern such as /a.*b/s becomes /a[\s\S]*b/. The [\s\S] character class matches any character, including line breaks, and works in older JavaScript environments.
- Takeaway: Always be mindful of your target ES version in tsconfig.json and verify language feature compatibility, especially for newer syntax like regex flags.
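A quick sketch of the [\s\S] substitution in action. The persona tags and the extractPersona helper are made up for this example; the actual patterns in the script are not shown here.

```typescript
// Matches a tagged block that may span several lines.
// With an ES2018+ target this could be written as /<persona>(.*?)<\/persona>/s.
const PERSONA_BLOCK = /<persona>([\s\S]*?)<\/persona>/;

// Return the trimmed block content, or null when no block is present.
function extractPersona(input: string): string | null {
  const match = PERSONA_BLOCK.exec(input);
  return match ? match[1].trim() : null;
}

// Without the s flag a plain dot stops at line breaks,
// so this variant fails on multi-line input.
const DOT_ONLY = /<persona>(.*?)<\/persona>/;
```

Both forms compile for an es2017 target because neither uses the s flag; only the first one handles multi-line blocks.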
3. Vitest Mock Hoisting with child_process.execFile
- Problem: When trying to mock child_process.execFile in our ckb-client.test.ts using vi.fn(), Vitest hoisted the vi.mock() factories above our variable declarations, leading to "variable not defined" errors.
- Solution: We used vi.hoisted() to explicitly declare the mock function, ensuring it's available before the vi.mock call.
- Takeaway: Understanding your test runner's mocking and hoisting mechanisms is crucial. vi.hoisted() is your friend for intricate mocking scenarios in Vitest.
4. for...of and --downlevelIteration
- Problem: Using for...of on Array.entries() in the content loader caused issues because our tsconfig didn't enable --downlevelIteration.
- Solution: Switched to the more traditional forEach((item, index) => ...) call.
- Takeaway: TypeScript's --downlevelIteration compiler option is important for for...of loops targeting older ES versions. If it's not enabled, stick to forEach or ensure your target is modern enough.
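For reference, the two forms side by side. The section titles and the numberSections helper are invented for this example; the content loader's real data is not shown in this post.

```typescript
const sections = ["Architecture", "Hotspots", "Security Audit"];

// The iterator-based form depends on the compiler's iteration support:
//   for (const [index, title] of sections.entries()) { ... }

// forEach receives the index as its second argument and needs no iterator support.
function numberSections(titles: string[]): string[] {
  const lines: string[] = [];
  titles.forEach((title, index) => {
    lines.push(`${index + 1}. ${title}`);
  });
  return lines;
}
```

One caveat: forEach can't break early or await between iterations, so a plain indexed for loop is the fallback when either is needed.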
Where We Stand & What's Next
The good news is that Phase 1 and Phase 2 are fully deployed to production! The project_ckb_indexes table exists, RLS is active, and our 17 new CKB-related tests are all passing.
Immediate next steps:
- Activate CKB Container: The CKB container is defined but needs to be started on production (docker compose -f docker-compose.production.yml up -d ckb).
- Phase 3: Webhook Auto-Reindex: This is the next big push. We'll implement a webhook endpoint (POST /api/v1/webhooks/ckb) for GitHub push events to trigger automatic re-indexing, keeping analyses fresh.
- Webhook UI & Secrets: A UI for generating and setting up webhook secrets will be needed.
- PR Summaries: Leveraging the CKB for automatic PR summaries on pull_request events.
- CKB Image Verification: A quick check to ensure ghcr.io/simplyliz/ckb:latest is robust and working as expected.
- End-to-End Test: The ultimate validation: link a repo, trigger a reindex, and verify the UI updates correctly.
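Phase 3 isn't built yet, but one piece can be sketched ahead of time: verifying that a webhook request really came from GitHub. GitHub signs each delivery with HMAC-SHA256 over the raw request body and sends the result in the X-Hub-Signature-256 header as "sha256=" followed by the hex digest. The verifyGithubSignature helper below is a hypothetical name and only one possible shape for that check.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Check the X-Hub-Signature-256 header against an HMAC of the raw request body.
function verifyGithubSignature(
  rawBody: string,
  signatureHeader: string | undefined,
  secret: string,
): boolean {
  if (!signatureHeader || !signatureHeader.startsWith("sha256=")) return false;

  const expected =
    "sha256=" + createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");

  const expectedBuf = Buffer.from(expected, "utf8");
  const receivedBuf = Buffer.from(signatureHeader, "utf8");

  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return (
    expectedBuf.length === receivedBuf.length &&
    timingSafeEqual(expectedBuf, receivedBuf)
  );
}
```

The check has to run against the raw body bytes before any JSON parsing, since re-serializing the payload can change the bytes and invalidate the signature.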
This integration marks a huge leap forward in our platform's ability to provide deep, actionable insights into codebases. It was a challenging but incredibly rewarding journey, and I'm excited to see the impact it has on our users.