Deep dive

Why large file uploads fail and how CsvKit handles them

Network timeouts, browser memory limits, and single-request constraints are the main reasons large file uploads fail. Here's how chunked uploads solve each of these problems — and how CsvKit implements them.

April 18, 2026 · 8 min read

The three reasons large uploads fail

If you have ever tried uploading a multi-gigabyte file through a web interface and had it fail, it was almost certainly one of three problems: a timeout, a memory limit, or a network hiccup. Understanding each one makes it clear why the conventional single-request upload approach breaks down at scale.

1. HTTP request timeouts

Every HTTP request is subject to a timeout: a maximum amount of time the server will wait before giving up and closing the connection. For most web servers and proxies, the default is somewhere between 30 seconds and a few minutes. Uploading a 10GB file over a typical broadband connection (say, 50 Mbps upload) takes roughly 27 minutes (80,000 megabits ÷ 50 Mbps ≈ 1,600 seconds). That is far longer than most default timeouts.

When the timeout fires, the connection drops, the partial upload is discarded on the server side, and the browser reports a generic network error. There is no way to resume. You start over.

2. Browser memory limits

The naive way to prepare a file for upload in client-side code is to read all of it into memory, for example as an ArrayBuffer. Browsers limit how much memory a single tab can use, often somewhere between 2GB and 4GB depending on the browser, OS, and available RAM. A 10GB file simply cannot fit. The tab crashes or the upload fails silently before a single byte is sent.

Even if the file is smaller than the memory cap, loading gigabytes of data into a browser tab competes with every other tab and system process for RAM. The result is a slow, unresponsive machine while the upload is in progress.

3. Network instability

A long-running upload is a single HTTP request. If the connection drops anywhere along the way (a Wi-Fi dropout, a router rebooting, a switch between networks), the entire request fails. You have no way to resume from the point of failure. The only option is to start over from zero.

This is especially painful for large files: a 30-minute upload that fails at minute 28 costs you 28 minutes of upload bandwidth, and you get to do it all over again.

The solution: chunked multipart upload

Chunked uploading solves all three problems by breaking the file into smaller pieces and uploading them independently. Here is how it works conceptually:

Initialise the upload. The client tells the server it wants to upload a file. The server creates a placeholder and returns an upload_id that identifies this particular upload session.
Upload each chunk. The client reads the file in small segments (CsvKit uses 10MB per chunk) and uploads each one as a separate HTTP request, tagged with its upload_id and a part number.
Complete the upload. Once all chunks have been sent, the client sends a final request listing all the parts. The server reassembles them in the correct order into the original file.

Why this matters: Each individual chunk upload takes seconds, not minutes, so it fits comfortably within typical timeout windows. If one chunk fails, only that chunk needs to be re-sent, not the entire file.
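The three steps above can be sketched as a small client-side routine. This is a minimal illustration, not CsvKit's actual code: the `UploadTransport` interface, its method names, and the `Part` shape are hypothetical stand-ins for whatever endpoints a real backend exposes, and the chunk size assumes decimal megabytes.

```typescript
// Hypothetical shapes: each transport method stands in for one HTTP call.
interface Part {
  partNumber: number;
  etag: string;
}

// Anything with a size and a slice(start, end) method, such as a browser File or Blob.
interface Sliceable {
  size: number;
  slice(start: number, end: number): unknown;
}

interface UploadTransport {
  init(fileName: string, size: number): Promise<{ uploadId: string }>;
  uploadPart(uploadId: string, partNumber: number, chunk: unknown): Promise<{ etag: string }>;
  complete(uploadId: string, parts: Part[]): Promise<void>;
}

const CHUNK_SIZE = 10 * 1000 * 1000; // 10MB, assuming decimal megabytes

async function chunkedUpload(
  file: Sliceable,
  fileName: string,
  transport: UploadTransport,
  chunkSize: number = CHUNK_SIZE,
): Promise<Part[]> {
  // Step 1: initialise the session and obtain its identifier.
  const { uploadId } = await transport.init(fileName, file.size);

  // Step 2: upload each chunk as its own request (sequentially here, for clarity).
  const parts: Part[] = [];
  for (let start = 0, partNumber = 1; start < file.size; start += chunkSize, partNumber++) {
    const end = Math.min(start + chunkSize, file.size);
    const { etag } = await transport.uploadPart(uploadId, partNumber, file.slice(start, end));
    parts.push({ partNumber, etag });
  }

  // Step 3: tell the server which parts make up the file.
  await transport.complete(uploadId, parts);
  return parts;
}
```

Passing the transport in as a parameter keeps the flow independent of any particular endpoint layout and makes it easy to exercise with a fake in tests.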

How CsvKit implements chunked uploads

CsvKit uses a three-stage API behind the scenes for files above 5MB. Here is what happens when you upload a large file:

1. Upload initialisation

CsvKit calls the backend to register the upload. The backend returns an upload_id, a file_id (the UUID that will identify this file in future operations), and an s3_key pointing to the cloud storage destination. This request takes milliseconds.

2. Parallel chunk uploads

The file is sliced into 10MB chunks using the browser's built-in Blob.slice() API — this is done on-the-fly and never loads the full file into memory. CsvKit uploads up to 5 chunks simultaneously using a concurrency limiter. This maximises throughput without overwhelming the network or the server.
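A rough sketch of the two ideas in this step: computing the byte ranges to hand to `Blob.slice()`, and capping the number of uploads in flight. The function names `chunkRanges` and `runWithConcurrency` are illustrative assumptions, not CsvKit's internals.

```typescript
// Byte ranges [start, end) that together cover a file of `size` bytes.
function chunkRanges(size: number, chunkSize: number): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < size; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, size)]);
  }
  return ranges;
}

// Runs `worker` over every item with at most `limit` calls in flight.
// Results come back in the same order as `items`.
async function runWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each lane repeatedly claims the next unprocessed index until none remain.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claimed synchronously, so no two lanes share an index
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

With a real `File`, each range would be passed to `file.slice(start, end)` and the resulting Blob handed to the upload call, with `limit` set to 5 to match the concurrency described above. Because `slice()` only creates a reference to a byte range, the chunk data is read when the request is sent rather than up front.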

Each chunk upload includes automatic retry logic: if a chunk fails, CsvKit retries it up to 3 times with an increasing backoff (1 second, then 2 seconds, then 3 seconds between attempts) before reporting an error. Transient network blips are handled transparently.
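The retry behaviour described above (waits of 1, 2, and 3 seconds) could look something like the following. This is a sketch under assumptions: `withRetry` and its parameters are hypothetical names, and the `sleep` argument exists only so the delays can be replaced in tests.

```typescript
const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Tries `task` once, then retries after each delay in `delaysMs`.
// With the default delays that is 1 initial attempt plus up to 3 retries.
async function withRetry<T>(
  task: () => Promise<T>,
  delaysMs: number[] = [1000, 2000, 3000], // the waits quoted in the article
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= delaysMs.length; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      // Wait before the next attempt, unless this was the final one.
      if (attempt < delaysMs.length) {
        await sleep(delaysMs[attempt]);
      }
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

Keeping the delay schedule as a plain array makes it easy to swap in a different one, such as a doubling schedule of `[1000, 2000, 4000]`.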

3. Upload completion

Once all chunks have been confirmed, CsvKit sends a completion request with the list of all parts. Each part has an ETag: an identifier, typically a checksum, returned by the server when the chunk was received. The backend verifies the parts are all present and reassembles the file. The file_id is now ready for processing.
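Because chunks uploaded in parallel can finish in any order, the part list usually needs sorting before it is sent. The helper below is a hypothetical client-side pre-check, not CsvKit's API: it orders the parts and fails early if any part number from 1 to the expected count is missing or duplicated.

```typescript
interface UploadedPart {
  partNumber: number; // 1-based position of the chunk in the file
  etag: string;       // value the server returned when it accepted the chunk
}

// Returns the parts sorted by part number, or throws if any part from
// 1..expectedCount is missing or duplicated.
function buildCompletionParts(parts: UploadedPart[], expectedCount: number): UploadedPart[] {
  const sorted = [...parts].sort((a, b) => a.partNumber - b.partNumber);
  if (sorted.length !== expectedCount) {
    throw new Error(`expected ${expectedCount} parts, got ${sorted.length}`);
  }
  sorted.forEach((part, i) => {
    if (part.partNumber !== i + 1) {
      throw new Error(`part ${i + 1} is missing or duplicated`);
    }
  });
  return sorted;
}
```

The server still performs its own verification; a check like this only avoids sending a completion request that is certain to be rejected.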

Small files: a simpler path

For files under 5MB, chunked upload is unnecessary overhead. CsvKit uses a direct upload instead: the file is sent as a single multipart form request, and an operation stream is polled to confirm when the upload processing is complete. This path is faster for small files and avoids the three-step handshake.

What about upload progress?

With chunked uploads, progress tracking is straightforward: as each chunk completes, CsvKit increments the progress counter. If 4 of 10 chunks have uploaded, the progress bar shows 40%. This gives you a reliable, accurate indicator of where the upload stands.
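The calculation is a simple ratio. A minimal sketch, with `chunkProgressPercent` as a hypothetical name:

```typescript
// Percentage of chunks confirmed by the server, rounded down to a whole number.
function chunkProgressPercent(completedChunks: number, totalChunks: number): number {
  if (totalChunks <= 0) return 0;
  // Multiply before dividing so whole-number results stay exact.
  return Math.floor((completedChunks * 100) / totalChunks);
}

console.log(chunkProgressPercent(4, 10)); // 40
```

Counting chunks slightly overstates the weight of the final chunk, which is usually smaller than the rest; summing confirmed bytes instead would remove that small skew.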

Compare this to a single-request upload, where the browser's XMLHttpRequest.upload.onprogress event fires as bytes leave the browser — but there is no feedback on whether those bytes actually reached the server intact. The progress bar can show 100% while the server is still processing or the connection is about to time out.

Security and session isolation

Every request to CsvKit — including each individual chunk upload — includes a session key in the X-Session-Key header. This key is generated when you first visit the tool and is unique to your browser session. It ensures that your chunks cannot be mixed up with another user's upload, and that your files are only accessible during your active session.
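As an illustration, attaching the key to a chunk request might look like this. Only the header name `X-Session-Key` comes from the description above; the helper name, the PUT method, and the shape of the options object are assumptions.

```typescript
// Hypothetical helper: builds the options for one chunk request.
// Only the X-Session-Key header name is taken from the article.
function chunkRequestOptions<B>(
  sessionKey: string,
  chunk: B,
): { method: string; headers: Record<string, string>; body: B } {
  return {
    method: "PUT", // assumed; the real method is not stated
    headers: { "X-Session-Key": sessionKey },
    body: chunk,
  };
}
```

In a browser the result could be passed as the second argument to `fetch()`, with a Blob produced by `Blob.slice()` as the body.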

Practical limits

CsvKit currently supports files up to 50GB. At 10MB per chunk, a 50GB file requires 5,000 individual chunk uploads. With 5 concurrent uploads and a typical latency of around 200ms per chunk, the per-chunk overhead adds up to roughly 200 seconds in total (5,000 × 200ms ÷ 5), so the total upload time is limited primarily by your available upload bandwidth, not by any server-side constraint.

For reference, a 10GB file over a 100 Mbps upload connection takes approximately 13 minutes. The same file over a 10 Mbps connection takes around 2.2 hours: slow, but it can still complete reliably, because each chunk is a short request that can be retried on its own rather than one multi-hour transfer.
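These figures can be reproduced with a few lines of arithmetic. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes, 1 Mbps = 10^6 bits per second) and gives ideal times that ignore protocol overhead and latency, so real transfers will take somewhat longer.

```typescript
const GB = 1e9; // bytes, decimal units assumed
const MB = 1e6; // bytes, decimal units assumed

// Number of chunk uploads needed to cover the whole file.
function chunkCount(fileBytes: number, chunkBytes: number): number {
  return Math.ceil(fileBytes / chunkBytes);
}

// Ideal transfer time in seconds, ignoring protocol overhead and latency.
function transferSeconds(fileBytes: number, uploadMbps: number): number {
  return (fileBytes * 8) / (uploadMbps * 1e6);
}

console.log(chunkCount(50 * GB, 10 * MB));        // 5000
console.log(transferSeconds(10 * GB, 100) / 60);  // about 13.3 (minutes)
console.log(transferSeconds(10 * GB, 10) / 3600); // about 2.2 (hours)
```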

Summary

Large file uploads fail because of timeouts, browser memory limits, and network instability — all of which are properties of trying to move a huge file in a single HTTP request. Chunked uploads solve this by breaking the file into small, independently uploadable segments. CsvKit applies this approach automatically for files above 5MB: initialise, upload 10MB chunks in parallel with retry logic, then complete. The result is reliable uploads for files up to 50GB, accurate progress tracking, and no crashes.
