The Case of the Missing Upload: Debugging ENOENT in a Distributed File Flow
We chased a mysterious ENOENT error in our document upload system, unraveling a classic race condition and a crucial missing API route. Here's how we fixed it and what we learned about robust file handling.
There are few errors as frustratingly vague and yet definitively fatal as ENOENT: No such file or directory. Especially when you're sure you just uploaded that file. Recently, my team was wrestling with just such a phantom ENOENT in our Axiom document upload system, specifically when users tried to upload critical ISO 27001 compliance documents. The files simply weren't there when our backend tried to process them.
This wasn't just a minor glitch; it was a blocker for a core feature. What started as a simple document upload spiraled into an investigation revealing a classic race condition and a crucial gap in our API design.
The Setup: Presigned URLs, tRPC, and a Local Storage Adapter
Our application, Axiom, needed to allow users to upload various document types. To handle this efficiently and securely, we adopted a common pattern:
- Client Request: The frontend, built with Next.js and tRPC, initiates an upload.
- Presigned URL: The backend generates a "presigned" URL. This URL isn't for direct S3 access yet (we're using a local adapter for development); instead, it's a temporary, authenticated endpoint on our own server that receives the file data. This keeps the actual file transfer out of the main tRPC request-response cycle, allowing for larger files and better separation of concerns.
- Client PUT: The client then uses this presigned URL to PUT the file data directly to our server.
- Confirmation: Once the client thinks the file is uploaded, it sends a final confirmation to the backend, triggering document processing.
This seemed like a solid plan. The backend was supposed to store these files locally in /tmp/nyxcore-uploads/axiom/{tenantId}/{projectId}/{timestamp}-{filename}.
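To make that path layout concrete, here is a minimal sketch of how such a key and its on-disk path could be assembled. The helper names (`buildStorageKey`, `toDiskPath`) and the filename sanitizing are assumptions for illustration, not the actual Axiom code; only the root directory and the key layout come from the description above.

```typescript
import * as path from 'node:path';

// Root directory as described in the post; the helpers below are hypothetical.
const UPLOAD_ROOT = '/tmp/nyxcore-uploads/axiom';

// Builds "{tenantId}/{projectId}/{timestamp}-{filename}".
function buildStorageKey(
  tenantId: string,
  projectId: string,
  filename: string,
  now: number = Date.now(),
): string {
  // Keep only the base name so a filename like "../../x" cannot add directories.
  const safeName = path.posix.basename(filename.replace(/\\/g, '/'));
  return `${tenantId}/${projectId}/${now}-${safeName}`;
}

// Maps a storage key to its location under the upload root.
function toDiskPath(storageKey: string): string {
  return path.posix.join(UPLOAD_ROOT, storageKey);
}

const key = buildStorageKey('tenant-abc', 'project-123', 'iso-27001-policy.pdf', 1700000000000);
console.log(key);
// tenant-abc/project-123/1700000000000-iso-27001-policy.pdf
console.log(toDiskPath(key));
// /tmp/nyxcore-uploads/axiom/tenant-abc/project-123/1700000000000-iso-27001-policy.pdf
```

Passing the timestamp in as a parameter keeps the helper deterministic, which makes it easy to check in a unit test.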
The Symptom: ENOENT and the Phantom File
Users would upload their ISO 27001 documents, the UI would show a success message, but then later processing jobs would fail with ENOENT. The files just weren't there. It was as if they vanished into the ether.
My initial thoughts drifted to permissions, disk space, or perhaps a misconfigured path. But a quick check confirmed none of these were the culprits. The error consistently pointed to the file simply not existing when our processDocument function (which internally uses fs.readFile()) tried to access it.
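One thing that helps while investigating is to separate "the file is missing" from every other read failure. Node.js attaches a `code` property to file system errors, so a small wrapper can make the ENOENT case explicit. The sketch below uses the synchronous `fs.readFileSync` for brevity (our real code uses `fs.readFile()`), and `readUpload` is a hypothetical name.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

type ReadResult =
  | { ok: true; data: Buffer }
  | { ok: false; reason: 'missing' };

// Hypothetical wrapper: turns ENOENT into an explicit "missing" result
// and rethrows anything else (permissions, I/O errors, ...).
function readUpload(filePath: string): ReadResult {
  try {
    return { ok: true, data: fs.readFileSync(filePath) };
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === 'ENOENT') {
      return { ok: false, reason: 'missing' };
    }
    throw err;
  }
}

// A path that was never written behaves like our phantom upload.
const missing = readUpload(path.join(os.tmpdir(), 'axiom-sketch-does-not-exist.pdf'));
console.log(missing.ok); // false
```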
The Investigation: Two Critical Flaws Uncovered
This debugging session quickly became a deep dive into our system's interactions. The "pain log" from this session tells the story best:
Pain Point 1: The Missing Route Handler
Our upload flow looked like this: tRPC axiom.upload → presigned URL → client PUT → confirmUpload → processDocument.
The presigned URL pointed to /api/v1/uploads/{storageKey}. This was the critical insight. While we had a REST API path (/api/v1/rag/ingest) that handled direct file writes within its own request handler, the specific PUT request generated by the tRPC-presigned URL flow was hitting a non-existent route handler.
The client was dutifully sending the file data, but there was no one home to receive it. The PUT request was silently 404'ing, meaning the file data was never actually written to disk. Naturally, fs.readFile() would then throw ENOENT.
Lesson Learned: When introducing new data flow paths, especially those involving external HTTP requests (even to your own server), rigorously verify that every single endpoint in that chain is correctly implemented and reachable. Don't assume a similar-looking path covers all bases. A silent 404 on an upload can be incredibly misleading!
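A 404 is only silent if nothing checks the response. As a rough sketch of the idea, a small guard can turn any non-2xx PUT response into a thrown error that names the URL and status. `assertPutSucceeded` and the `PutResult` shape are hypothetical; `PutResult` just mirrors the `ok`, `status`, and `statusText` fields of a fetch `Response`.

```typescript
// Minimal shape of the fields we need from a fetch Response.
interface PutResult {
  ok: boolean;
  status: number;
  statusText: string;
}

// Throws with the URL and status so a missing route handler is impossible to overlook.
function assertPutSucceeded(url: string, res: PutResult): void {
  if (!res.ok) {
    throw new Error(`PUT ${url} failed: ${res.status} ${res.statusText}`);
  }
}

try {
  assertPutSucceeded('/api/v1/uploads/tenant-abc/project-123/doc.pdf', {
    ok: false,
    status: 404,
    statusText: 'Not Found',
  });
} catch (err) {
  console.log((err as Error).message);
  // PUT /api/v1/uploads/tenant-abc/project-123/doc.pdf failed: 404 Not Found
}
```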
Pain Point 2: The Client-Side Race Condition
Even if the route handler had existed, we had another problem. The client-side logic in src/app/(dashboard)/dashboard/projects/[id]/page.tsx was structured like this:
// ... inside uploadMutation.onSuccess ...
// This fires when *our client-side upload mutation* completes,
// NOT necessarily when the file has been successfully PUT to the server.
confirmMutation.mutate(); // This was the culprit!
// ... rest of the client-side logic ...
The confirmMutation.mutate() call, which tells the backend to start processing the document, was being triggered immediately after our client-side uploadMutation finished its initial setup (getting the presigned URL). The actual PUT request of the file data to the presigned URL was happening asynchronously and independently of this onSuccess callback.
This meant confirmUpload was often racing ahead, telling the backend to processDocument before the PUT request had finished, or, in our case, when the PUT could never succeed at all (due to the 404!).
Lesson Learned: Be extremely careful with onSuccess callbacks in asynchronous operations, especially in multi-step processes. Ensure onSuccess truly signifies the completion of the critical step it's meant to confirm, not just the initiation of a sub-process. Distributed operations require careful choreography.
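The ordering problem is easy to reproduce in isolation. The simulation below is only a sketch: `fakePut` and `fakeConfirm` are stand-ins for the real PUT and for `confirmMutation.mutate()`, and the 10 ms delay is arbitrary. It records the order of events for a flow that confirms right after starting the PUT and for one that waits for it.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Stand-in for the file PUT: takes a little time to finish.
async function fakePut(log: string[]): Promise<void> {
  log.push('put:start');
  await sleep(10);
  log.push('put:done');
}

// Stand-in for confirmMutation.mutate().
function fakeConfirm(log: string[]): void {
  log.push('confirm');
}

// Racy: confirm fires as soon as the PUT has been started.
async function racyFlow(): Promise<string[]> {
  const log: string[] = [];
  const pending = fakePut(log); // started, but not awaited yet
  fakeConfirm(log);
  await pending;
  return log;
}

// Sequenced: confirm waits for the PUT to finish.
async function sequencedFlow(): Promise<string[]> {
  const log: string[] = [];
  await fakePut(log);
  fakeConfirm(log);
  return log;
}

racyFlow().then((log) => console.log(log.join(' -> ')));
// put:start -> confirm -> put:done
sequencedFlow().then((log) => console.log(log.join(' -> ')));
// put:start -> put:done -> confirm
```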
The Fix: Building the Receiver and Orchestrating the Client
With these two critical issues identified, the solution became clear.
1. The Missing PUT Handler: src/app/api/v1/uploads/[...path]/route.ts
We created a dedicated PUT handler to receive the file data. This handler is more than just a file writer; it's a gatekeeper, ensuring incoming data is authenticated, authorized, and safe.
// src/app/api/v1/uploads/[...path]/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { authenticateRequest } from '@/lib/auth'; // Custom auth middleware
import { storeFileLocally } from '@/lib/storage/localStorageAdapter'; // Our storage utility
import path from 'path'; // For path resolution and security

const UPLOAD_ROOT = '/tmp/nyxcore-uploads/axiom';
const ALLOWED_EXTENSIONS = new Set([
  '.md', '.txt', '.pdf', '.ts', '.js', '.py', '.json', '.yaml', '.yml', '.toml', '.html', '.css',
]);

export async function PUT(req: NextRequest, { params }: { params: { path: string[] } }) {
  const { tenantId, userId } = await authenticateRequest(req); // Authenticate & get user context

  // 1. Path Validation & Security: resolve the key and reject anything outside the upload root
  const resolvedPath = path.resolve(UPLOAD_ROOT, params.path.join('/'));
  if (!resolvedPath.startsWith(UPLOAD_ROOT + path.sep)) {
    return new NextResponse('Path traversal attempt detected.', { status: 400 });
  }
  // Use the normalized key from here on, so `..` segments can't slip past the later checks
  const storageKey = path.relative(UPLOAD_ROOT, resolvedPath);

  // 2. Tenant Isolation: the first segment of the normalized key must be the authenticated tenant
  const pathTenantId = storageKey.split(path.sep)[0];
  if (pathTenantId !== tenantId) {
    return new NextResponse('Tenant mismatch.', { status: 403 });
  }

  // 3. Extension Allowlist
  const fileExtension = path.extname(storageKey).toLowerCase();
  if (!ALLOWED_EXTENSIONS.has(fileExtension)) {
    return new NextResponse(`File type ${fileExtension} not allowed.`, { status: 400 });
  }

  // 4. Database Verification (e.g., check if a ProjectDocument with this storageKey is pending)
  // This step ensures we're only accepting uploads for documents we expect.
  // (Simplified for blog post, actual implementation would query DB for status: "pending")
  const isExpectedUpload = await checkPendingDocumentInDB(storageKey);
  if (!isExpectedUpload) {
    return new NextResponse('No pending document found for this upload key.', { status: 404 });
  }

  try {
    const buffer = await req.arrayBuffer(); // Read the whole file into memory
    await storeFileLocally(storageKey, Buffer.from(buffer)); // Write to disk
    const response = new NextResponse('File uploaded successfully.', { status: 200 });
    response.headers.set('X-Content-Type-Options', 'nosniff'); // Security header
    return response;
  } catch (error) {
    console.error(`Error uploading file ${storageKey}:`, error);
    return new NextResponse('Internal server error during upload.', { status: 500 });
  }
}

// Dummy function for illustration
async function checkPendingDocumentInDB(storageKey: string): Promise<boolean> {
  // In a real app, you'd query your database (e.g., Prisma) here
  // to find a ProjectDocument where `storageKey` matches and `status` is 'pending'.
  // This prevents arbitrary uploads.
  return Promise.resolve(true); // Always true for this example
}
This handler ensures that any file arriving via a presigned URL is properly authenticated, validated, and stored securely.
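The path, tenant, and extension checks are easier to test when pulled out of the request handler into a pure function. The sketch below is one way to do that, not our production code: `validateUploadKey` is a hypothetical helper, the allowlist is shortened, and the tenant check runs against the normalized key so that `..` segments can't point the write at another tenant's directory.

```typescript
import * as path from 'node:path';

const UPLOAD_ROOT = '/tmp/nyxcore-uploads/axiom';
const ALLOWED_EXTENSIONS = new Set(['.md', '.txt', '.pdf']); // shortened for the sketch

type Validation =
  | { ok: true; storageKey: string }
  | { ok: false; status: 400 | 403; message: string };

function validateUploadKey(segments: string[], tenantId: string): Validation {
  // Resolve first, then compare against the root with a trailing separator,
  // so a sibling like ".../axiom-evil" does not count as inside ".../axiom".
  const resolved = path.posix.resolve(UPLOAD_ROOT, segments.join('/'));
  if (!resolved.startsWith(UPLOAD_ROOT + '/')) {
    return { ok: false, status: 400, message: 'Path traversal attempt detected.' };
  }
  // Every later check uses the normalized key, never the raw segments.
  const storageKey = path.posix.relative(UPLOAD_ROOT, resolved);
  if (storageKey.split('/')[0] !== tenantId) {
    return { ok: false, status: 403, message: 'Tenant mismatch.' };
  }
  if (!ALLOWED_EXTENSIONS.has(path.posix.extname(storageKey).toLowerCase())) {
    return { ok: false, status: 400, message: 'File type not allowed.' };
  }
  return { ok: true, storageKey };
}

console.log(validateUploadKey(['tenant-a', 'p1', 'doc.pdf'], 'tenant-a'));
// { ok: true, storageKey: 'tenant-a/p1/doc.pdf' }
console.log(validateUploadKey(['..', 'axiom-evil', 'doc.pdf'], 'tenant-a').ok);
// false (escapes the upload root)
console.log(validateUploadKey(['tenant-a', '..', 'tenant-b', 'doc.pdf'], 'tenant-a').ok);
// false (normalizes into tenant-b's directory)
```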
2. Client-Side Choreography: src/app/(dashboard)/dashboard/projects/[id]/page.tsx
We adjusted the client-side logic to ensure confirmMutation is only called after the PUT request to the presigned URL successfully completes.
// src/app/(dashboard)/dashboard/projects/[id]/page.tsx (Simplified)
// ... inside the component ...
const uploadFile = async (file: File, storageKey: string) => {
  // 1. Get the presigned URL from our tRPC backend
  const presignedUrl = await axiom.upload.getPresignedUrl.mutate({ storageKey });

  // 2. Perform the actual file PUT request
  try {
    const putResponse = await fetch(presignedUrl, {
      method: 'PUT',
      headers: {
        'Content-Type': file.type,
        // Potentially add auth headers if presigned URL doesn't fully embed auth
      },
      body: file,
    });

    if (!putResponse.ok) {
      console.error('File PUT failed:', putResponse.status, putResponse.statusText);
      // IMPORTANT: Stop here on PUT failure so we don't trigger processing for a broken upload.
      // This prevents the backend from trying to process a non-existent file.
      return;
    }

    // 3. ONLY NOW, after successful PUT, confirm the upload with the backend
    confirmMutation.mutate({ storageKey }); // This was moved here!
  } catch (error) {
    console.error('Error during file PUT:', error);
    // Handle network errors, etc.
  }
};

// ... elsewhere in the component, when an upload is triggered ...
// e.g., in an onChange handler for an input type="file"
const handleFileUpload = async (event: React.ChangeEvent<HTMLInputElement>) => {
  const file = event.target.files?.[0];
  if (file) {
    // Generate a unique storage key (simplified for example)
    const storageKey = `tenant-abc/project-123/${Date.now()}-${file.name}`;
    await uploadFile(file, storageKey);
  }
};
By moving confirmMutation.mutate() to after the fetch call's success check, we eliminated the race condition. We also return early when putResponse.ok is false, so a failed upload never triggers backend processing and yet another ENOENT error.
Immediate Next Steps and Future Enhancements
While the critical ENOENT bug is squashed (commit baf22ab is ready to push!), our session also highlighted a few areas for improvement:
- Download Functionality: Our LocalStorageAdapter.getDownloadUrl() currently won't work because there's no GET handler for /api/v1/uploads/[...path]. This will be crucial for users to retrieve their documents.
- Actual Deletion: LocalStorageAdapter.delete() is a no-op. For proper data hygiene and compliance, implementing actual file deletion is a must.
- Streaming Uploads: For very large files, req.arrayBuffer() can lead to Out-Of-Memory (OOM) errors. Moving to streaming body reads would make our upload system more robust.
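For the streaming item, one possible direction is to pipe the request body to disk chunk by chunk instead of buffering it. The sketch below is an assumption about how that could look, not something we have shipped: `streamToDisk` is a hypothetical helper built on Node's `Readable.fromWeb` (available in recent Node.js versions, flagged experimental in some releases) and `pipeline` from `node:stream/promises`. An in-memory stream stands in for the request body.

```typescript
import { createWriteStream, readFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';
import { ReadableStream } from 'node:stream/web';

// Streams a web ReadableStream (such as a request body) to disk chunk by chunk,
// so the whole file never has to sit in memory at once.
async function streamToDisk(body: ReadableStream<Uint8Array>, destPath: string): Promise<void> {
  await pipeline(Readable.fromWeb(body), createWriteStream(destPath));
}

// Demo: an in-memory stream standing in for a request body.
const demoBody = new ReadableStream<Uint8Array>({
  start(controller) {
    controller.enqueue(new TextEncoder().encode('hello '));
    controller.enqueue(new TextEncoder().encode('world'));
    controller.close();
  },
});
const demoPath = join(tmpdir(), `axiom-stream-sketch-${process.pid}.txt`);
const demoDone = streamToDisk(demoBody, demoPath).then(() => readFileSync(demoPath, 'utf8'));
demoDone.then((text) => console.log(text)); // hello world
```

In a route handler, the body would come from the incoming request after a null check, and a size limit would still be worth enforcing, since streaming protects memory but not disk space.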
Conclusion
This debugging journey was a stark reminder that even seemingly straightforward features like file uploads can hide complex interactions in distributed systems. The ENOENT error, often dismissed as a simple file-not-found, can be a symptom of deeper architectural misunderstandings or subtle race conditions. By meticulously tracing the data flow, implementing robust API handlers with security in mind, and carefully orchestrating client-side interactions, we transformed a critical bug into a more resilient and secure upload system.
Happy coding, and may your files always be found!
{
"thingsDone": [
"Created PUT handler for local file uploads at `/api/v1/uploads/[...path]`",
"Implemented authentication and tenant isolation in the PUT handler",
"Added path traversal prevention using `path.resolve()` boundary assertion",
"Enforced an extension allowlist for uploaded files",
"Included database verification for pending documents in the upload flow",
"Set `X-Content-Type-Options: nosniff` header for security",
"Fixed client upload flow by moving `confirmMutation.mutate()` to after successful PUT response",
"Added a `return` mechanism on PUT failure to prevent premature processing"
],
"pains": [
"Presigned URL pointed to a non-existent API route handler, causing silent 404s for actual file PUT requests",
"Client-side `confirmMutation` was racing ahead, triggering backend processing before the file upload itself had completed or even started",
"The combination of the missing handler and race condition led to consistent `ENOENT` errors when the backend tried to read non-existent files"
],
"successes": [
"Successfully implemented a secure and functional local file upload API endpoint",
"Resolved critical `ENOENT` errors that were preventing document processing",
"Improved client-side upload workflow robustness and error handling for multi-step operations",
"Enhanced understanding of distributed system interactions, API design, and race conditions within our team"
],
"techStack": [
"Next.js",
"tRPC",
"Node.js",
"TypeScript",
"File System (fs)",
"API Routes",
"React Query (for mutations)"
]
}