Unlocking Code Intelligence: Our Journey to a Full CKB Integration
Dive into a recent development sprint where we fully integrated our Code Knowledge Backend (CKB), bringing powerful code insights and AI-driven capabilities to our platform. Learn about the technical challenges we tackled and the exciting features we rolled out.
The pace of modern software development demands more than just writing code; it requires understanding it deeply, anticipating issues, and leveraging intelligence to build better products faster. That's where our Code Knowledge Backend (CKB) comes in. Think of CKB as the brain that processes your codebase, extracts crucial insights, and makes them available to both developers and our AI-powered workflows.
Recently, we embarked on an ambitious sprint: achieving a full CKB integration. The scope ran end to end: containerizing the CKB engine, exposing its insights via API, storing its data, presenting it in our UI, and making it accessible to our AI. I'm thrilled to share that we've successfully deployed Phase 1 and Phase 2 of this integration to production, laying a robust foundation for truly intelligent development.
The Vision: What is CKB and Why Does It Matter?
At its core, CKB is designed to be a powerful analytical engine. It ingests your project's source code and performs various analyses: identifying architectural patterns, pinpointing hotspots, auditing for security vulnerabilities, detecting dead code, analyzing coupling, measuring complexity, and even attributing code ownership.
Why is this critical?
- Empowering Developers: Provides immediate, data-driven insights to help developers understand complex codebases, make informed decisions, and improve code quality.
- Supercharging AI: Feeds our AI workflows with rich, contextual code knowledge, allowing them to generate more accurate and relevant responses, whether for code reviews, refactoring suggestions, or prompt resolutions.
This recent sprint was all about making these capabilities a tangible reality within our platform.
Phase 1: Building the Foundation – The Core Integration
Our journey began by laying the bedrock for CKB. This phase was all about connecting the raw analytical power of CKB to our application's backend, data layer, and AI engine.
Containerizing CKB: A Dedicated Worker
To run CKB reliably and scale it independently, we containerized it. A dedicated CKB worker container is defined in our docker-compose.yml and docker-compose.production.yml files. It uses a sleep infinity entrypoint to stay alive between commands, a shared ckb_repos volume for storing cloned repositories, and a healthcheck that runs ckb version to confirm the binary responds. This setup lets us manage CKB as an independent service.
Data Persistence: The ProjectCkbIndex Model
For CKB to be truly useful, its analysis results need to be stored and accessible. We introduced the ProjectCkbIndex model in our prisma/schema.prisma. This model stores a JSON blob of analysis data (analysisCache), tracks the indexing status (e.g., PENDING, COMPLETED, FAILED), and is uniquely linked to a projectId, ensuring tenant isolation and data integrity.
The CKB Bridge: Our Service Layer
The heart of the backend integration lives in src/server/services/ckb-client.ts. This service is our primary interface to the CKB container: it wraps docker exec commands with child_process.execFile to run CKB operations inside the running container. It handles repository management (cloning, pulling, deleting) and provides 13 distinct analysis functions, plus a runFullAnalysis() method that runs the analyses sequentially. We backed this client with 10 unit tests covering command building, JSON parsing, and path traversal rejection.
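To make the command-building and path-traversal checks concrete, here is a minimal sketch of how such a wrapper could assemble its docker exec arguments. It is not the code in ckb-client.ts: the container name, the /repos root, and all function names are assumptions made for illustration. Only docker exec, execFile, and the ckb version command come from the description above.

```typescript
// Hypothetical values: the real container name and repos mount point may differ.
const CKB_CONTAINER = "ckb";
const REPOS_ROOT = "/repos";

/** Rejects repo keys that could escape the shared repos volume. */
function resolveRepoPath(repoKey: string): string {
  const segments = repoKey.split("/");
  const unsafe =
    repoKey.length === 0 ||
    repoKey.startsWith("/") ||
    repoKey.includes("\\") ||
    segments.some((s) => s === "" || s === "." || s === "..");
  if (unsafe) {
    throw new Error(`Invalid repository path: ${repoKey}`);
  }
  return `${REPOS_ROOT}/${repoKey}`;
}

/**
 * Builds the argv array that would be passed to
 * child_process.execFile("docker", argv). Passing an array (rather than a
 * shell string) means no shell interpolation happens on the arguments.
 */
function buildDockerExecArgs(repoKey: string, ckbArgs: string[]): string[] {
  return ["exec", "-w", resolveRepoPath(repoKey), CKB_CONTAINER, "ckb", ...ckbArgs];
}

/** Parses CKB's stdout as JSON, turning malformed output into a clear error. */
function parseCkbJson<T>(stdout: string): T {
  try {
    return JSON.parse(stdout) as T;
  } catch (err) {
    throw new Error("CKB returned output that is not valid JSON");
  }
}
```

With these assumptions, buildDockerExecArgs("acme/widgets", ["version"]) yields the arguments for docker exec -w /repos/acme/widgets ckb ckb version, while a key such as "../etc" throws before any process is spawned.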
API Exposure: tRPC Procedures
To make CKB's capabilities accessible to our frontend and other services, we exposed its functionality via a ckbRouter in our tRPC API. This router offers 13 procedures for querying CKB's status, triggering re-indexing, and retrieving analysis results. Architecture, hotspots, audit, and dead code are served from the cached index; coupling, complexity, call graph, search, impact, ownership, and PR summary are computed live.
AI Integration: The {{ckb}} Template Variable
One of the most exciting aspects is how CKB integrates with our AI workflows. We wired the {{ckb}} template variable into our workflow-engine.ts. This means our AI can now dynamically pull rich, contextual code insights directly into its prompts. To optimize performance and resource usage, loadCkbContent() (in ckb-content-loader.ts) leverages Redis caching with a 1-hour TTL and an 8K character limit, ensuring quick access to CKB summaries. Seven dedicated tests ensure proper formatting, truncation, and security (e.g., preventing file path leaks in audit sections).
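The caching, truncation, and substitution steps described above can be sketched as follows. This is an illustration under stated assumptions, not the contents of ckb-content-loader.ts: an in-memory Map stands in for Redis, "8K" is read as 8,000 characters, and the function names, cache key format, and truncation marker are all hypothetical.

```typescript
const CKB_CACHE_TTL_MS = 60 * 60 * 1000; // 1-hour TTL, as described above
const CKB_MAX_CHARS = 8000; // assumption: "8K" taken as 8,000 characters

interface CkbCacheEntry {
  value: string;
  expiresAt: number;
}

// Stand-in for Redis so the sketch runs without external services.
const ckbCache = new Map<string, CkbCacheEntry>();

/** Cuts the summary to the character budget and marks where it was cut. */
function truncateCkbContent(content: string, maxChars: number = CKB_MAX_CHARS): string {
  const marker = "\n[truncated]";
  if (content.length <= maxChars) return content;
  return content.slice(0, maxChars - marker.length) + marker;
}

/** Returns the cached summary while it is fresh; otherwise builds and caches it. */
function loadCkbContentSketch(
  projectId: string,
  build: () => string,
  now: number = Date.now(),
): string {
  const key = `ckb:content:${projectId}`;
  const hit = ckbCache.get(key);
  if (hit && hit.expiresAt > now) return hit.value;
  const value = truncateCkbContent(build());
  ckbCache.set(key, { value, expiresAt: now + CKB_CACHE_TTL_MS });
  return value;
}

/** Replaces every {{ckb}} placeholder in a prompt template. */
function renderCkbVariable(template: string, ckbContent: string): string {
  // split/join avoids the special "$" handling of String.prototype.replace.
  return template.split("{{ckb}}").join(ckbContent);
}
```

Truncating before caching means the stored value is already within budget, so every cache hit is cheap and the prompt never receives more than the limit.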
Automation: Auto-Indexing on Project Changes
To keep CKB data fresh, we implemented fire-and-forget CKB indexing within src/server/trpc/routers/projects.ts. Now, whenever a project is created or updated with githubOwner and githubRepo defined, a CKB analysis is automatically triggered in the background.
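A fire-and-forget trigger of this kind might look like the sketch below. The function and type names are hypothetical, and the indexing function is injected so the example runs on its own; only the githubOwner and githubRepo fields come from the description above.

```typescript
// Hypothetical shape: only githubOwner and githubRepo are named in the post.
interface ProjectRepoFields {
  id: string;
  githubOwner?: string | null;
  githubRepo?: string | null;
}

/** True only when both GitHub fields are set, i.e. there is a repo to analyze. */
function hasLinkedRepo(project: ProjectRepoFields): boolean {
  return Boolean(project.githubOwner && project.githubRepo);
}

/**
 * Starts indexing without awaiting it, so the create/update mutation can
 * return immediately. Failures are routed to onError instead of becoming
 * unhandled promise rejections. runIndex is assumed to be an async function.
 * Returns whether indexing was started.
 */
function triggerCkbIndexInBackground(
  project: ProjectRepoFields,
  runIndex: (projectId: string) => Promise<void>,
  onError: (error: unknown) => void,
): boolean {
  if (!hasLinkedRepo(project)) return false;
  void runIndex(project.id).catch(onError);
  return true;
}
```

The explicit catch matters: without it, a failed background analysis would surface as an unhandled rejection rather than a logged error.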
Security and Reliability
Data security is paramount. We implemented Row-Level Security (RLS) with the tenant_isolation_project_ckb_indexes policy in prisma/rls.sql to ensure that tenants can only access their own CKB data. The project_ckb_indexes table was also created directly in production with RLS applied, ready for action.
Phase 2: Visualizing Insights – The Code Intelligence Page
With the backend infrastructure in place, Phase 2 brought CKB's power to life for our users through the new Code Intelligence tab. This comprehensive UI, implemented in src/components/projects/code-intelligence-tab.tsx (a hefty 519 lines!), provides a rich visual experience.
We added a SidebarTab for "Code Intelligence" within the Development group in src/app/(dashboard)/dashboard/projects/[id]/page.tsx, making it easily discoverable.
Key Insights at a Glance
The Code Intelligence tab offers immediate value through several overview cards:
- Architecture: Visualizes module and layer counts, providing a high-level structural overview.
- Hotspots: Identifies the top 10 riskiest areas in the codebase, color-coded for quick identification.
- Security Audit: Breaks down findings by severity and provides detailed insights into potential vulnerabilities.
- Dead Code: Lists symbols identified as unused, helping to streamline the codebase.
Deep Dives into Code Quality
Beyond the overview, users can explore detailed sections:
- Coupling Analysis: Search for a file and see its co-change partners, revealing hidden dependencies.
- File Complexity: Get cyclomatic and cognitive complexity metrics per function, aiding in refactoring efforts.
- Ownership: Understand author percentages for specific files or modules, useful for team coordination.
User Experience: Graceful Degradation and Control
We prioritized a smooth user experience. The UI gracefully handles various states:
- "CKB not configured" if the CKB container isn't running.
- "Link a GitHub repository" if the project isn't connected.
- A processing spinner during analysis.
- Clear error messages if something goes wrong.
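One way to model these states is a discriminated union derived from a few inputs, sketched below. The type names, field names, and the choice to show the spinner when no index exists yet are assumptions for illustration; the actual component may be organized differently.

```typescript
// Status values as listed for the ProjectCkbIndex model earlier in this post.
type IndexStatus = "PENDING" | "COMPLETED" | "FAILED";

type CodeIntelViewState =
  | { kind: "not_configured" } // "CKB not configured"
  | { kind: "link_repository" } // "Link a GitHub repository"
  | { kind: "processing" } // spinner
  | { kind: "error"; message: string }
  | { kind: "ready" };

interface CodeIntelInput {
  ckbAvailable: boolean;
  hasGithubRepo: boolean;
  status: IndexStatus | null; // null: no index row yet (assumed to mean "in progress")
  errorMessage?: string;
}

/** Picks exactly one UI state, checking the blocking conditions first. */
function deriveViewState(input: CodeIntelInput): CodeIntelViewState {
  if (!input.ckbAvailable) return { kind: "not_configured" };
  if (!input.hasGithubRepo) return { kind: "link_repository" };
  if (input.status === "FAILED") {
    return { kind: "error", message: input.errorMessage || "Analysis failed" };
  }
  if (input.status === "COMPLETED") return { kind: "ready" };
  return { kind: "processing" };
}
```

Because each state is a distinct kind, the rendering code can switch on it and the compiler will flag any state that is left unhandled.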
Users also have control with a "Re-analyze" button, triggering the ckb.reindex mutation to refresh data on demand.
Lessons Learned & Challenges Overcome
No complex integration is without its hurdles. Here are some of the key "pains" that became valuable lessons:
1. Type Safety vs. Pragmatism with Prisma Client
Challenge: Initially, we tried using import("@prisma/client").PrismaClient as an inline type for helper functions in our CKB tRPC router to ensure strict type safety.
Lesson: This pattern proved overly verbose and fragile, drawing flags from reviewers. We learned that sometimes, pragmatism wins.
Workaround: We opted for any with an eslint-disable for the Prisma parameter in helper functions. Thanks to tRPC's context inference, type safety is still maintained at the call sites, providing a good balance.
2. Cross-Environment Regex Compatibility
Challenge: Using the /s regex flag (dotAll) in scripts/harden-persona-prompts.ts caused production builds to fail with TS1501: This regular expression flag is only available when targeting 'es2018' or later. Our build environment was targeting an earlier ES version.
Lesson: Always be mindful of language feature compatibility across different build targets and environments.
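Workaround: We dropped the flag and used the character class [\s\S] wherever the pattern needed to match across newlines, for example /pattern[\s\S]*/ in place of /pattern.*/s. The class matches any character, including line breaks, so the behavior is the same and the script compiles for an ES2017 target.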
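The difference is easy to see on a small multi-line string. The sample text and variable names below are made up for the demonstration and are not taken from harden-persona-prompts.ts.

```typescript
const sampleText = "BEGIN\nline one\nline two\nEND";

// Without the s flag, "." stops at line breaks, so this finds no match.
const dotOnlyMatch = /BEGIN(.*)END/.exec(sampleText);

// Portable alternative: [\s\S] matches any character, including "\n",
// and needs no flag, so it compiles for targets older than ES2018.
const portableMatch = /BEGIN([\s\S]*)END/.exec(sampleText);

// Captured body spans both lines when the match succeeds.
const capturedBody = portableMatch ? portableMatch[1] : null;
```

Here dotOnlyMatch is null, while capturedBody holds everything between the two markers, line breaks included.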
3. Vitest Mocking Quirks with vi.hoisted()
Challenge: Our tests mock child_process.execFile with a vi.fn() declared at module scope. Vitest hoists vi.mock() calls to the top of the file, above variable declarations, so the mock factory referenced the vi.fn() before it was initialized.
Lesson: Vitest's module mocking requires careful handling of variable scope and hoisting.
Workaround: We learned to use vi.hoisted() to explicitly declare the mock function, ensuring it's hoisted correctly and available where needed in our tests.
4. TypeScript Iteration Flags and for...of
Challenge: Using for...of loops on Array.entries() in our content loader failed to compile because our tsconfig did not enable downlevelIteration.
Lesson: Be aware of TypeScript compiler options that affect how modern JavaScript features are transpiled, especially when targeting older environments.
Workaround: We switched to the more universally compatible forEach((item, index) => ...) pattern, avoiding the need for specific compiler flags.
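A small before-and-after shows the change. The array contents and variable names are illustrative and not taken from the content loader.

```typescript
const sectionTitles = ["Architecture", "Hotspots", "Security Audit"];

// Needs downlevelIteration (or an ES2015+ target) to compile:
//   for (const [index, title] of sectionTitles.entries()) { ... }

// Compiles for any target with no extra compiler flags:
const numberedSections: string[] = [];
sectionTitles.forEach((title, index) => {
  numberedSections.push(`${index + 1}. ${title}`);
});
```

The callback receives the same item and index pair that entries() would have produced, so the loop body carries over unchanged.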
What's Next: Looking Ahead to Phase 3
While Phase 1 and 2 are deployed, our CKB journey continues. Here's what's immediately on the horizon:
- Activate CKB on Production: The CKB container is defined but needs to be manually started on our production server: docker compose -f docker-compose.production.yml up -d ckb.
- Phase 3: Webhook Auto-Reindex: Implement a POST /api/v1/webhooks/ckb endpoint to automatically trigger CKB re-indexing on GitHub push events.
- Webhook Setup UI: Build out the UI on the Code Intelligence page for generating and configuring webhook secrets.
- Phase 3: PR Summaries: Extend CKB to provide insightful summaries on pull_request events, further integrating AI into our code review process.
- CKB Image Verification: Ensure the ghcr.io/simplyliz/ckb:latest image is robust and functional.
- End-to-End Testing: A critical step to link a repository, trigger a re-index, and verify that all analyses appear correctly in the UI.
Beyond CKB, we're also working on extracting attack scenarios from workflow 8e7356b7 and keeping our Anthropic API credits topped up!
Conclusion
This sprint has been a monumental step forward in our mission to bring advanced code intelligence and AI capabilities to our platform. By fully integrating our Code Knowledge Backend, we've not only enhanced our ability to understand and analyze code but also significantly empowered our AI to deliver more insightful and actionable assistance. The journey to truly intelligent development is long, but with milestones like this, we're building a powerful future, one line of code – and one analysis – at a time.