Unlocking Code Intelligence: Our Journey Integrating CKB from Scratch to Production
A detailed account of our journey integrating CKB for code intelligence, covering core backend services, a dynamic UI, and crucial lessons learned from overcoming deployment and data challenges on the path to production.
Today marks a significant milestone: CKB, our new Code Intelligence Engine, is fully operational on production! This wasn't just a simple library integration; it was an end-to-end journey, from spinning up Docker containers to building a dynamic UI, all while navigating a maze of technical challenges.
Our goal was ambitious: integrate CKB to provide deep code insights – architecture, hotspots, security audits, and dead code detection – directly within our platform. This meant not just running the analysis, but making it accessible, understandable, and actionable for our users. We tackled this in two main phases, both now proudly deployed.
Phase 1: The Core Integration – Laying the Foundation
The first phase was all about connecting the dots, establishing the backbone for CKB's operations. This involved a series of meticulous steps to ensure CKB could live and breathe within our existing infrastructure.
- Dockerization: We containerized CKB, building `ghcr.io/simplyliz/ckb:latest` from its source, ensuring it was a self-contained, deployable unit on our production server.
- Database Schema: A `ProjectCkbIndex` model was added to Prisma, complete with Row-Level Security (RLS) and back-relations, providing a robust storage layer for CKB's output.
- Service Layer: Our `ckb-client.ts` service became the central hub, a `docker exec` wrapper exposing 13 distinct analysis functions to interact with the CKB container.
- API Exposure: Thirteen tRPC procedures in `ckb.ts` provided the API endpoints, orchestrating asynchronous, fire-and-forget indexing processes.
- Workflow Integration: We wired the `{{ckb}}` template into our workflow engine via `ckb-content-loader.ts`, allowing CKB insights to be dynamically injected into reports and summaries.
- Automated Indexing: A critical UX improvement: CKB analysis automatically kicks off whenever a new repository link is added or updated in `projects.ts`.
- Testing: With 17 passing tests (10 client, 7 content loader), we had a solid foundation of confidence in our backend.
Phase 2: Bringing Code Intelligence to Life – The UI
With the backend humming, Phase 2 focused on delivering these powerful insights to our users through an intuitive interface.
- Dedicated Code Intelligence Tab: `code-intelligence-tab.tsx` became the home for all CKB data, featuring engaging overview cards and detailed sections for deep dives.
- Sidebar Integration: A new "Code Intel" tab was added to the Development group in the project sidebar, making it easily discoverable.
- Dynamic Progress Bar: To keep users informed during analysis, we implemented a 5-step progress bar (Clone → Architecture → Hotspots → Audit → Dead Code), providing real-time feedback.
- Real-time Polling: A `useEffect`- and `useState`-driven polling mechanism refreshes data every 2 seconds during processing, ensuring users always see the latest analysis status.
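The progress bar boils down to mapping the current analysis stage onto the five steps. The sketch below shows one way to do that; the `StepView` shape and `toStepViews` are hypothetical names, and it assumes the backend reports which step is currently running, which the post does not spell out.

```typescript
// The five stages named above, in display order.
const CKB_STEPS = ["Clone", "Architecture", "Hotspots", "Audit", "Dead Code"] as const;
type CkbStep = (typeof CKB_STEPS)[number];

// Hypothetical view model for a single segment of the progress bar.
interface StepView {
  label: CkbStep;
  state: "done" | "active" | "pending";
}

/** Mark every step before the current one as done and every later one as pending. */
function toStepViews(current: CkbStep): StepView[] {
  const activeIndex = CKB_STEPS.indexOf(current);
  return CKB_STEPS.map((label, i): StepView => ({
    label,
    state: i < activeIndex ? "done" : i === activeIndex ? "active" : "pending",
  }));
}

// Example: the analysis is currently computing hotspots.
console.log(toStepViews("Hotspots"));
```

Keeping this as a pure function means the component only has to render the returned array on each poll.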
Navigating the Treacherous Waters: Lessons Learned from the Trenches
No integration of this scale comes without its share of head-scratching moments. Our "Pain Log" quickly transformed into a "Lessons Learned" diary, highlighting critical insights gained during testing and deployment.
1. The Elusive CLI Command & Workdir
- The Challenge: Initially, we tried `docker exec nyxcore-ckb-1 ckb architecture --repo /data/repos/xxx --format json`. This failed with "unknown command 'architecture'" and an unrecognized `--repo` flag.
- The Lesson: External tools often have subtle CLI syntax differences. Always double-check the exact command names (`architecture` vs. `arch`) and how they expect repository paths (via a `--repo` flag vs. setting the `WORKDIR` for `docker exec`).
- The Fix: We switched to `docker exec -w /data/repos/xxx nyxcore-ckb-1 ckb arch --format json`, setting the working directory directly in the `docker exec` command.
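As a rough illustration of the fixed invocation, the argument list can be built in one place so every analysis function uses the same `-w` convention. `buildCkbExecArgs` is a hypothetical helper, not the actual contents of `ckb-client.ts`.

```typescript
/**
 * Hypothetical helper: build the argument list for `docker exec`.
 * The repository path is passed via -w (working directory inside the
 * container) rather than as a --repo flag, which ckb rejected.
 */
function buildCkbExecArgs(container: string, repoDir: string, subcommand: string): string[] {
  return ["exec", "-w", repoDir, container, "ckb", subcommand, "--format", "json"];
}

const ckbArgs = buildCkbExecArgs("nyxcore-ckb-1", "/data/repos/xxx", "arch");
console.log(["docker"].concat(ckbArgs).join(" "));
// prints: docker exec -w /data/repos/xxx nyxcore-ckb-1 ckb arch --format json
```

Passing the array to Node's `child_process.execFile("docker", ckbArgs)` instead of concatenating a shell string keeps repository paths from being interpreted by a shell.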
2. Docker-in-Docker (ish) Permissions
- The Challenge: Our main application container needed to run `docker exec` against the CKB container. This led to "docker: executable not found in $PATH" and then "permission denied" errors on `/var/run/docker.sock`.
- The Lesson: When your application container needs to interact with the Docker daemon, it requires both the `docker-cli` installed within the app container and proper permissions to access the Docker socket. Simply mounting the socket isn't enough; the user running the app inside the container needs to be part of the `docker` group.
- The Fix: We added `docker-cli` to our app's Dockerfile and, crucially, mounted `/var/run/docker.sock:ro` while adding `group_add: "988"` (the Docker GID on the host) to the app container's configuration.
3. Robust GitHub Token Resolution
- The Challenge: CKB needed to clone private GitHub repositories. Our initial `resolveGitHubToken()` function only queried the `github_tokens` table (for individual user tokens), failing for tenant-level API keys stored in `api_keys`.
- The Lesson: Authentication logic needs to be comprehensive, accounting for all possible token sources and implementing intelligent fallbacks. Don't assume one token type fits all scenarios.
- The Fix: We implemented a fallback mechanism: first, try `github_tokens`, and if that fails, query `api_keys` specifically for `provider: "github"`.
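The fallback order can be sketched as a pure function over rows that have already been fetched. The row shapes and the name `pickGitHubToken` are assumptions made for this example; in the real `resolveGitHubToken()` the two lookups are database queries, and the second one presumably only runs when the first finds nothing.

```typescript
// Hypothetical row shapes; the real tables are github_tokens and api_keys.
interface GitHubTokenRow {
  token: string;
}
interface ApiKeyRow {
  provider: string;
  key: string;
}

/** Prefer an individual user token, then fall back to a tenant-level GitHub key. */
function pickGitHubToken(userTokens: GitHubTokenRow[], tenantKeys: ApiKeyRow[]): string | null {
  // 1. Individual user tokens win when present.
  for (const row of userTokens) {
    if (row.token) return row.token;
  }
  // 2. Otherwise use the first tenant-level key registered for GitHub.
  for (const row of tenantKeys) {
    if (row.provider === "github" && row.key) return row.key;
  }
  // 3. No usable credential: the caller decides how to report this.
  return null;
}

console.log(pickGitHubToken([], [{ provider: "github", key: "tenant-key" }]));
// prints: tenant-key
```

Returning `null` instead of throwing lets the indexing job distinguish "no credentials configured" from a failed clone.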
4. Defensive Data Handling for External APIs
- The Challenge: CKB's JSON output sometimes varied from our initial assumptions. For instance, `hotspots` were wrapped in an object `{ hotspots: [...] }` instead of being a raw array, and field names like `file`, `risk`, `severity` were actually `filePath`, `score`, `riskLevel`. Also, numerical values sometimes came back as strings or `null`.
- The Lesson: Always treat data from external APIs with suspicion. Implement strong type guards (`Array.isArray()`), use `Record<string, unknown>` for initial parsing, and apply nullish coalescing or explicit type conversions (`Number(h.score) || 0`) to ensure data integrity.
- The Fix: We refactored our data parsing to defensively check for array types, correctly map field names, and cast values to their expected types, providing resilience against minor API output variations.
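Here is a minimal sketch of that defensive parsing for hotspots. It uses the field names mentioned above (`filePath`, `score`, `riskLevel`); the `Hotspot` interface, the `"unknown"` default, and the choice to skip entries without a path are assumptions for the example rather than our exact production code.

```typescript
interface Hotspot {
  filePath: string;
  score: number;
  riskLevel: string;
}

/** Type guard: a non-null, non-array object that can be inspected key by key. */
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

/** Accept both a bare array and the wrapped form { hotspots: [...] }. */
function extractHotspotList(raw: unknown): unknown[] {
  if (Array.isArray(raw)) return raw;
  if (isRecord(raw)) {
    const inner = raw["hotspots"];
    if (Array.isArray(inner)) return inner;
  }
  return [];
}

function parseHotspots(raw: unknown): Hotspot[] {
  const result: Hotspot[] = [];
  for (const item of extractHotspotList(raw)) {
    if (!isRecord(item)) continue;
    const filePath = item["filePath"];
    const riskLevel = item["riskLevel"];
    // Skip entries that cannot be tied to a file at all.
    if (typeof filePath !== "string") continue;
    result.push({
      filePath,
      // Strings such as "12.5" become numbers; null, undefined and NaN become 0.
      score: Number(item["score"]) || 0,
      riskLevel: typeof riskLevel === "string" ? riskLevel : "unknown",
    });
  }
  return result;
}

console.log(parseHotspots({ hotspots: [{ filePath: "src/a.ts", score: "12.5", riskLevel: "high" }] }));
```

Anything that is neither an array nor a `{ hotspots: [...] }` wrapper yields an empty list, so the UI can render an empty state instead of crashing.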
5. Conditional Polling in React Query
- The Challenge: We attempted to use `refetchInterval: (query) => { ... }` for conditional polling with `react-query`, but encountered an "(intermediate value) is not iterable" error, suggesting a version incompatibility or misuse.
- The Lesson: While powerful, advanced features of libraries can sometimes be tricky or have unexpected behaviors across versions. Sometimes, a simpler, more direct approach using React's core hooks is more reliable.
- The Fix: We reverted to a simpler `useState(polling)` and `useEffect` to toggle a boolean, then used `refetchInterval: polling ? 2000 : false`. This achieved the desired conditional polling without fighting the library.
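The decision logic behind that boolean is small enough to isolate and test without React. In this sketch the status values and the helper names `shouldPoll` and `refetchIntervalFor` are hypothetical; only the 2-second interval and the `polling ? 2000 : false` expression come from our actual fix.

```typescript
const POLL_INTERVAL_MS = 2000;

// Hypothetical status values; the real index row may use different names.
type IndexStatus = "pending" | "processing" | "completed" | "failed";

/** Poll only while the analysis is still in flight. */
function shouldPoll(indexStatus: IndexStatus | undefined): boolean {
  return indexStatus === "pending" || indexStatus === "processing";
}

/** A plain value for refetchInterval, so no callback form is needed. */
function refetchIntervalFor(polling: boolean): number | false {
  return polling ? POLL_INTERVAL_MS : false;
}

// Inside the component this would be wired up roughly as follows:
//   const [polling, setPolling] = useState(false);
//   const query = useQuery({ /* query options */ refetchInterval: refetchIntervalFor(polling) });
//   useEffect(() => setPolling(shouldPoll(query.data?.status)), [query.data?.status]);

console.log(refetchIntervalFor(shouldPoll("processing")));
// prints: 2000
```

Because `refetchInterval` receives a plain number or `false`, the behavior does not depend on which callback signature the installed query library version expects.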
Current State & The Road Ahead
As of today, our production environment (root@46.225.232.35, commit 8af3329) is running strong. The CKB container (nyxcore-ckb-1) is live with CKB v8.1.0, built fresh on the server. The project_ckb_indexes table is populated, and even the background ckb-docs agent is hard at work generating technical and executive summaries.
Our immediate next steps include:
- Waiting for the documentation agent to finish its work.
- Committing and pushing the generated docs.
- Phase 3: Implementing a webhook endpoint for automated re-indexing on GitHub pushes.
- Updating our `content-loader` tests to reflect the actual CKB field names.
- Extracting valuable attack scenarios from a specific workflow.
- Exploring SCIP index support for even richer symbol analysis, addressing current "INDEX_MISSING" warnings.
This journey has been a testament to iterative development, problem-solving, and the power of a dedicated team. We're excited to see how CKB empowers our users with deeper insights into their codebases!