Merging CSVs with mismatched columns — a practical guide
When your CSV files come from different sources, the columns rarely line up perfectly. Here's how to handle naming differences, extra columns, and missing fields — and merge everything into one clean file.
The real-world problem
You have exported customer records from two different systems. One file has columns named first_name, last_name, and email_address. The other has FirstName, Surname, and Email. Both contain the same kind of data, but the column names are completely different.
A naive merge — just stacking the files on top of each other — will produce a file with six columns instead of three, because the tool treats each unique name as a separate column. Rows from the first file will have empty values in FirstName, Surname, and Email, and rows from the second file will have empty values in first_name, last_name, and email_address. The resulting file is unusable.
This is the mismatched columns problem, and it comes up constantly when you work with data from multiple sources: different CRMs, different regions, different teams, different eras of the same product.
Two approaches to merging
Simple merge (when headers already match)
If your files share the same column names — even if the values are in different orders, or there are some extra columns — a simple merge stacks rows from all files into one output. CsvKit aligns columns by name, so email in file A will land in the same column as email in file B even if their column positions differ.
Use simple merge when:
- All files are exports from the same system
- You have already standardised column names manually
- The files have identical schemas (same columns, same names)
Mapped merge (when headers differ)
When column names differ across files, you need to tell the tool: "the column called Surname in file B is the same thing as last_name in file A." This is called column mapping, and it gives you precise control over the output schema.
With mapped merge you:
- Define the target columns — what the output file's headers should be
- For each input file, map each source column to one of the target columns (or mark it as skipped)
The result is a single clean file with exactly the columns you defined.
Step-by-step: mapped merge with CsvKit
Go to csvkit.net/merge and upload each file you want to merge. You can add as many files as you need. Each one uploads independently with its own progress bar.
Once all files are ready, click Configure Merge. On the configuration screen, select Mapped Merge. This unlocks the column mapping interface.
In the Output Column Names section, add the columns you want in the final file. You can type them manually, or click Copy from [filename] to pull the headers directly from one of your uploaded files and use that as the starting point.
For each file, a mapping table appears. The left column shows the file's source columns. The right column is a dropdown where you choose which target column it maps to — or (skip) to exclude it from the output.
Give your merged file a name (e.g. all_customers_merged.csv) and click Run Merge. When complete, a download button appears for the merged file.
Handling common scenarios
One file has extra columns the others don't
Include those columns in your target headers. For the other files, leave those columns unmapped — CsvKit will fill them with empty values for rows from files that do not have that column. This is the correct behaviour: it faithfully represents that the data was not available in those files.
You want to drop certain columns from all files
Simply do not include those columns in your target headers. Any source column that is not mapped to a target is automatically excluded from the output. There is no need to edit the original files.
Two columns from the same file should merge into one
This is not something a merge tool can handle — it is a data transformation problem. You would need to pre-process the file (for example, concatenating first_name and last_name into a full_name column) before uploading it to the merge tool.
Files have different date formats
CsvKit merges files as-is without modifying values. If file A has dates as 2024-01-15 and file B has 15/01/2024, the merged file will contain both formats in the same column. Date normalisation is a separate step best done in your database or with a data wrangling tool after the merge.
Tips for cleaner merges
- Standardise before merging where possible. If you control the exports, configure them to use the same column names up front — it eliminates the mapping step entirely.
- Use lowercase with underscores. Consistent naming conventions (
snake_case) prevent case-sensitivity issues in databases and analytics tools that receive the merged file. - Keep a note of your mappings. If you merge these files regularly, document the source-to-target mapping so future you (or a colleague) can reproduce it quickly.
- Check row counts after merging. The merged file should have a row count equal to the sum of all input files minus the header rows. If it is lower, a file may have been partially processed.
Summary
Mismatched columns are normal when merging data from different sources. Simple merge works when headers already align. Mapped merge gives you full control when they do not — you define the output schema and map each file's columns to it. The result is a single, clean file that is ready to import, analyse, or pass to the next stage of your pipeline.
Ready to try it yourself?
Free, no sign-up required. Works with files up to 50 GB.