The Quest for Real Paths: Grounding LLMs in Your Codebase
LLMs are powerful, but their tendency to hallucinate file paths and architecture can derail code generation. Discover how we tackled this challenge by grounding our LLM prompts with real codebase context using `{{claudemd}}` and `{{fileTree}}`.
As developers, we're constantly pushing the boundaries of what AI can do, especially when it comes to assisting with code generation and understanding. Large Language Models (LLMs) are incredible tools, capable of generating sophisticated code, refactoring suggestions, and even entire architectural proposals. But there's a recurring nightmare for anyone building LLM-powered developer tools: hallucination.
Specifically, when an LLM tries to generate code or suggest changes within an existing codebase, it often invents file paths, assumes directory structures, or references non-existent modules. This isn't just annoying; it's a productivity killer. You get a brilliant suggestion, only to find half the paths are completely made up, forcing you to manually correct them or discard the output entirely.
We faced this exact challenge in our workflow engine. Our goal was to empower users to leverage LLMs for complex tasks like extending features or analyzing security vulnerabilities within their GitHub repositories. The problem? Our LLM was a bit too creative with its file system knowledge, often fabricating paths that simply didn't exist in the target repository. We needed a way to ground its responses in reality.
The Problem: LLMs Living in a Dream World
Imagine asking an LLM to "Add a new API endpoint to handle user profiles." Without direct knowledge of your project's structure, it might confidently suggest creating `src/api/user/profile.ts` when your actual structure uses `src/routes/users/profile.ts` or `server/controllers/userController.ts`. This mismatch means extra work, breaking the flow of an otherwise helpful AI assistant.
Our previous sessions saw path accuracy hovering around 40-50%. That's a lot of manual correction, and it undermined trust in the AI's output. We needed to bridge the gap between the LLM's vast general knowledge and the specific, nuanced reality of a user's unique codebase.
Our Solution: Grounding with `{{claudemd}}` and `{{fileTree}}`
The core idea was simple: provide the LLM with explicit, up-to-date context directly from the target repository before it generates a response. We decided to introduce two new template variables into our prompt resolution engine:
- `{{claudemd}}`: This variable fetches the content of a `CLAUDE.md` file (or falls back to `README.md`) from the root of each linked repository. This file is intended to provide a high-level overview of the project, its architecture, key technologies, and any specific LLM interaction guidelines. It's like giving the LLM a project brief.
- `{{fileTree}}`: This variable provides a structured, current snapshot of the repository's file tree. This is the crucial piece for eliminating path hallucinations. By showing the LLM exactly what files and directories exist, we give it a concrete map to navigate.
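To make this concrete, here's a hypothetical template showing where the variables slot in. The wording and the `{{task}}` placeholder are illustrative, not our production prompts:

```typescript
// Hypothetical prompt template; {{claudemd}} and {{fileTree}} are replaced
// at runtime by the prompt resolution engine described below.
const exampleTemplate = `
Project brief (from CLAUDE.md or README.md):
{{claudemd}}

Repository file tree:
{{fileTree}}

Task: {{task}}
Reference only files that appear in the tree above.
`;
```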
Behind the Scenes: How We Built It
Implementing these context providers required a few key modifications to our workflow engine:
- GitHub Integration: We extended our `src/server/services/github-connector.ts` with a new `fetchRepoTree()` function. This leverages the GitHub Git Trees API, which is incredibly efficient for fetching repository structure. We added logic to filter out common noise directories (like `node_modules`, `.git`, `dist`) and to gracefully fall back from the `main` branch to `master` if `main` isn't found. (A connector sketch follows this list.)
- Context Loading: In `src/server/services/workflow-engine.ts`, we introduced `loadClaudeMdContent()` and `loadFileTreeContent()`. `loadClaudeMdContent()` intelligently checks for `CLAUDE.md` first, then `README.md`, giving preference to the LLM-specific documentation. `loadFileTreeContent()` fetches the tree (capped at 500 entries to prevent overwhelming the LLM with massive repositories) and formats it into a clean Markdown code block, perfect for prompt injection. (See the loader sketch after this list.)
- Parallel Execution: To ensure these context loaders didn't slow down the workflow, we designed them to run in parallel using `Promise.all` during the initial workflow startup. (See the startup sketch after this list.)
- Context Extension: Our `ChainContext` (the object holding all the dynamic data for prompt resolution) was extended to include `claudeMdContent` and `fileTreeContent` fields.
- Prompt Resolution: The core of our system, `resolvePrompt()`, was updated to recognize and substitute `{{claudemd}}` and `{{fileTree}}` alongside existing variables like `{{docs}}` and `{{consolidations}}`:

  ```typescript
  // Simplified example of resolvePrompt logic
  async function resolvePrompt(template: string, context: ChainContext): Promise<string> {
    let resolved = template;
    resolved = resolved.replace(/{{claudemd}}/g, context.claudeMdContent || '');
    resolved = resolved.replace(/{{fileTree}}/g, context.fileTreeContent || '');
    // ... handle other variables like {{docs}}, {{consolidations}}
    return resolved;
  }
  ```

- Prompt Template Updates: We updated five critical prompt templates (`extensionAnalyze`, `extensionPrompt`, `secRecon`, `secPrompts`, `deepPrompt`) in `src/lib/constants.ts` to include these new variables. Crucially, we also modified the associated system prompts to explicitly instruct the LLM: "MUST reference real file paths from the provided file tree — never invent or guess paths." This direct instruction is vital for guiding the LLM's behavior.
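To make the connector change concrete, here's a minimal sketch of `fetchRepoTree()`. It assumes Octokit as the GitHub client; the exact error handling and filter rules in our `github-connector.ts` may differ:

```typescript
import { Octokit } from '@octokit/rest';

// Illustrative noise filter; the real list lives in github-connector.ts.
const IGNORED_DIRS = ['node_modules', '.git', 'dist'];

// Fetches the repository's file listing via the GitHub Git Trees API.
// One recursive call returns the whole tree, which is far cheaper than
// walking the contents API directory by directory.
export async function fetchRepoTree(
  octokit: Octokit,
  owner: string,
  repo: string
): Promise<string[]> {
  let branch = 'main';
  try {
    await octokit.rest.repos.getBranch({ owner, repo, branch });
  } catch {
    branch = 'master'; // graceful fallback when `main` doesn't exist
  }

  const { data } = await octokit.rest.git.getTree({
    owner,
    repo,
    tree_sha: branch, // the Trees API accepts a branch name as the ref
    recursive: 'true',
  });

  return data.tree
    .filter((entry) => entry.type === 'blob' && entry.path !== undefined)
    .map((entry) => entry.path as string)
    .filter((path) => !IGNORED_DIRS.some((dir) => path.split('/').includes(dir)));
}
```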
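The two loaders in `workflow-engine.ts` might then look roughly like this. The names `loadClaudeMdContent`, `loadFileTreeContent`, and the 500-entry cap come from the steps above; `fetchRepoFile` is a hypothetical connector helper we introduce here for illustration:

```typescript
import { Octokit } from '@octokit/rest';
// fetchRepoTree is the connector sketched above; fetchRepoFile is a
// hypothetical helper returning a file's content, or null if absent.
import { fetchRepoTree, fetchRepoFile } from './github-connector';

const MAX_TREE_ENTRIES = 500; // avoid overwhelming the LLM on huge repos

// Prefer the LLM-specific CLAUDE.md, then fall back to README.md.
export async function loadClaudeMdContent(
  octokit: Octokit,
  owner: string,
  repo: string
): Promise<string> {
  const claudeMd = await fetchRepoFile(octokit, owner, repo, 'CLAUDE.md');
  if (claudeMd) return claudeMd;
  return (await fetchRepoFile(octokit, owner, repo, 'README.md')) ?? '';
}

// Cap the tree and wrap it in a Markdown code block for prompt injection.
export async function loadFileTreeContent(
  octokit: Octokit,
  owner: string,
  repo: string
): Promise<string> {
  const paths = await fetchRepoTree(octokit, owner, repo);
  const capped = paths.slice(0, MAX_TREE_ENTRIES);
  return '```\n' + capped.join('\n') + '\n```';
}
```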
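Finally, the startup wiring. This is a sketch under the same assumptions; `buildStartupContext` is a hypothetical name, and the rest of the context object is elided:

```typescript
import { Octokit } from '@octokit/rest';
import { loadClaudeMdContent, loadFileTreeContent } from './workflow-engine';

interface ChainContext {
  // ...existing fields backing {{docs}}, {{consolidations}}, etc.
  claudeMdContent?: string;
  fileTreeContent?: string;
}

// Both loaders run concurrently, so grounding costs roughly the slower
// of the two fetches rather than their sum.
async function buildStartupContext(
  octokit: Octokit,
  owner: string,
  repo: string
): Promise<ChainContext> {
  const [claudeMdContent, fileTreeContent] = await Promise.all([
    loadClaudeMdContent(octokit, owner, repo),
    loadFileTreeContent(octokit, owner, repo),
  ]);
  return { claudeMdContent, fileTreeContent };
}
```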
The Results: A Leap in Accuracy!
After implementing and testing, the difference was dramatic. We ran an end-to-end "Extension Builder" workflow, which involves multiple steps of code generation and analysis.
- Before Grounding: Path accuracy was in the 40-50% range.
- After Grounding:
- Step 1: 97% real paths (28 out of 29 paths were correct!)
- Step 3: 91% real paths (31 out of 34 paths were correct!)
This is a massive win. The LLM is now consistently referencing actual files and directories, making its suggestions directly actionable and significantly reducing the post-generation cleanup work.
Lessons Learned (from the "Pain Log")
No development session is without its bumps. Here are a few takeaways from our struggles that might save you some headaches:
- `npx tsx` and Top-Level Await: While `tsx` is fantastic for quick TypeScript script execution, running top-level await directly via `npx tsx -e '...'` can be tricky. We hit "cjs output format" errors. If you need top-level await, it's often more reliable to write your script to a file (e.g., `temp-script.ts`) and then run `npx tsx temp-script.ts`. This ensures `tsx` can correctly handle the module environment. (A minimal example follows this list.)
- Module Resolution and Project Root: When running ad-hoc scripts, especially ones that depend on packages like `@prisma/client`, always ensure you're executing them from your project's root directory. Running from a temporary directory like `/tmp/` will often lead to "Cannot find module" errors because your `node_modules` are not in the expected path.
- Pre-existing Tech Debt: It's a common developer experience: discovering unrelated existing errors (`"outline"` not assignable to Badge variant in `discussions/[id]/page.tsx:139`). It's a good reminder that not every corner of a large codebase is perfectly polished, and sometimes you just have to note it and move on (or create a separate ticket!).
What's Next?
Our journey to a perfectly grounded LLM isn't over. We're already looking at:
- Expanding Context Usage: Integrating `{{claudemd}}` and `{{fileTree}}` into more prompt templates if we observe further hallucinations in downstream workflow steps.
- Documenting the Power: Updating our `CLAUDE.md` documentation to clearly list `{{claudemd}}` and `{{fileTree}}` as supported variables, encouraging users to leverage them.
- Refining Workflow Costs: Updating our `estimateWorkflowCost` function to properly account for `generateCount` multipliers, ensuring accurate resource predictions.
This session marked a significant step forward in making our LLM-powered workflows truly robust and reliable. By explicitly grounding the LLM with real codebase context, we've transformed its output from creative fiction into actionable reality. If you're building similar systems, don't underestimate the power of direct, structured context!
{"thingsDone":[
"Added fetchRepoTree() using GitHub Git Trees API, with branch fallbacks",
"Implemented loadClaudeMdContent() for CLAUDE.md/README.md loading",
"Implemented loadFileTreeContent() for structured file tree fetching (capped at 500 entries)",
"Extended ChainContext with claudeMdContent and fileTreeContent",
"Added {{claudemd}} and {{fileTree}} resolution to resolvePrompt()",
"Enabled parallel loading of context variables with Promise.all",
"Updated 5 key prompt templates to include new variables",
"Updated system prompts to explicitly instruct LLM to use provided file paths",
"Verified path accuracy in end-to-end test (97% and 91% accuracy)"
],"pains":[
"npx tsx -e '...' with top-level await fails with 'cjs output format'",
"Scripts from /tmp/ fail with 'Cannot find module @prisma/client'",
"Pre-existing TS error in discussions/[id]/page.tsx"
],"successes":[
"Dramatic improvement in LLM path accuracy (from ~40-50% to >90%)",
"Successfully grounded LLM prompts in real codebase context",
"Workflow engine is more robust and reliable for code generation tasks"
],"techStack":[
"TypeScript",
"Node.js",
"GitHub API",
"LLM Workflow Engine",
"Prompt Engineering",
"Prisma"
]}