/etc

I thought I would write a short post describing some common operations you can do with simple CSV files on the command line, using only POSIX utilities which should be available on every *nix operating system (Mac OS X, Linux, etc.).

For more robust operations, see csvkit or csvfix.

Filter CSV Columns

cut -d, -fCOLUMNS

Sort CSV By Column

sort -t, -kCOLUMN,COLUMN [-kCOLUMN,COLUMN]

See examples here.

Set operations on CSVs

Note that these operations expect lexically sorted input. If your files aren’t sorted, you can achieve this by process substitution, e.g. comm -12 <(sort < A.csv) <(sort < B.csv). You can then sort the output by your desired column(s).

A.csv:

a,b,c
d,e,f
g,h,i

B.csv:

g,h,i
j,k,l
m,n,o

Intersection of two CSVs

comm -12 A.csv B.csv

Output:

g,h,i

Complement of two CSVs

comm -13 A.csv B.csv

Output:

j,k,l
m,n,o

comm -23 A.csv B.csv

Output:

a,b,c
d,e,f

Union of two CSVs

sort -u A.csv B.csv

Output:

a,b,c
d,e,f
g,h,i
j,k,l
m,n,o

Merge of two CSVs

sort -m A.csv B.csv

Output:

a,b,c
d,e,f
g,h,i
g,h,i
j,k,l
m,n,o

Merge CSV columns

paste -d, A.csv B.csv

Output:

a,b,c,g,h,i
d,e,f,j,k,l
g,h,i,m,n,o

This can also be used to perform a sort of key-based join of two CSV files, assuming they are already sorted by the key you want to join on (that is, each line in each file corresponds to the line you want to join on in the other file).

CSV to TSV

tr , "\\t" < FILENAME.csv > FILENAME.tsv

TSV to CSV

tr "\\t" , < FILENAME.tsv > FILENAME.csv

Re-order/remove/modify CSV columns

awk -F, '{print $3 "," $1 ".txt," $2}' A.csv

Output:

c,a.txt,b
f,d.txt,e
i,g.txt,h