Linux’s AWK

October 15, 2014 · by dbversity · in Linux

Linux: Adding Numbers in a File

Here’s a Linux command to add numbers inside a file:
[root@myserver misc]# cat num.test
1
2
3
4
5

This sample file has only 5 lines, but imagine a file with millions of lines. You could try MS Excel, which may hang(!!), or use the Linux command ‘awk’, which is powerful and easier:

[root@myserver misc]# awk '{s+=$1} END {print s}' num.test
15
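A slightly more defensive variant of the same idea, as a sketch: it skips lines whose first field is not a plain integer and also reports the count and the mean. The sample data, the temporary file, and the variable names are assumptions made for illustration.

```shell
# Hypothetical sample: the five numbers from num.test plus a blank line and a stray word
tmpfile=$(mktemp)
printf '1\n2\n3\n\n4\noops\n5\n' > "$tmpfile"

# Add up only the lines whose first field looks like an integer
summary=$(awk '
    $1 ~ /^-?[0-9]+$/ { s += $1; n++ }   # integer first field: accumulate and count
    END { if (n) printf "sum=%d count=%d mean=%.2f\n", s, n, s / n }
' "$tmpfile")

echo "$summary"
rm -f "$tmpfile"
```

The blank line and the word are ignored rather than silently counted as 0, which is what the plain one-liner would do.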

Linux: ‘awk’ Command – Group By

While it’s pretty easy to do a ‘Group By’ at the database level, ‘awk’ lets us do the same at the file level.

Consider the file below:

# cat test.csv
aa 1 qwer
ab 2 tyui
aa 3 poiu
ab 2 mnb
bb 1 njio
ba 2 njtwe

test.csv is a tab-separated file with 3 columns. Here, I want to split its lines into separate files, one file for each distinct combination of the 1st and 2nd columns.

Like below:

# cat file_bb_1.csv
bb 1 njio

# cat file_ba_2.csv
ba 2 njtwe

# cat file_ab_2.csv
ab 2 tyui
ab 2 mnb

# cat file_aa_3.csv
aa 3 poiu

# cat file_aa_1.csv
aa 1 qwer

Though you can do this manually, think of a file with more than a million lines.

Here ‘awk’, being a powerful data manipulation tool, comes to our help.

Below is the command we can use:

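# cat test.csv | awk '{a=$1; b=$2; print $0 >> ("file_" a "_" b ".csv")}'

[You can use any prefix instead of ‘file_’. The parentheses around the file name keep the concatenation unambiguous across awk implementations.]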
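Some awk implementations limit how many output files can be open at once, so on an input with many distinct key combinations the one-liner may run out of file handles. A minimal sketch of one way around that, assuming a throwaway working directory and sample data that mirror test.csv: it builds the file name in a variable and closes each file right after writing to it. The names workdir, prefix, and out are made up for this example.

```shell
# Hypothetical working directory and sample data (tab-separated, as described above)
workdir=$(mktemp -d)
printf 'aa\t1\tqwer\nab\t2\ttyui\naa\t3\tpoiu\nab\t2\tmnb\nbb\t1\tnjio\nba\t2\tnjtwe\n' > "$workdir/test.csv"

# Split on tabs; write each line to a file named after its first two columns
awk -F'\t' -v prefix="$workdir/file_" '{
    out = prefix $1 "_" $2 ".csv"   # build the file name once, in a variable
    print $0 >> out                 # append, so lines with the same key accumulate
    close(out)                      # release the file handle before the next line
}' "$workdir/test.csv"

ls "$workdir"
cat "$workdir/file_ab_2.csv"
```

Closing after every line is slower than keeping files open, so this trades speed for robustness. Because the files are opened in append mode, rerunning the command adds to existing files rather than replacing them.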

One Response

Jackson · May 22, 2015 at 07:00:00

You don’t need to pipe cat into awk, since awk can read the file just fine.

These two commands do exactly the same thing:
cat test.csv | awk '{a=$1; b=$2; print $0 >> ("file_" a "_" b ".csv")}'
awk '{a=$1; b=$2; print $0 >> ("file_" a "_" b ".csv")}' test.csv

Also, awk has grep-like abilities. Just place the pattern between two slashes.

# cat file1
hello world
bye later
good nice
bad mean

# awk '/b/' file1
bye later
bad mean

What if you want to get the word “bye” but not “bad”?

# awk '/b/ && !/ad/' file1
bye later

What if you want to also get the word “world”?

# awk '/b|w/ && !/ad/' file1
hello world
bye later

Just print the second column?
# awk '/b|w/ && !/ad/ {print $2}' file1
world
later
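The patterns above match anywhere on the line, so /b/ would also hit a “b” in the second column. As a sketch, the match can be tied to a single field with the ~ operator. The sample data repeats file1, and the temporary file and variable name are assumptions for illustration.

```shell
# Recreate file1 from above in a temporary file
tmpfile=$(mktemp)
printf 'hello world\nbye later\ngood nice\nbad mean\n' > "$tmpfile"

# Print column 2 only when column 1 starts with "b" and is not exactly "bad"
second_col=$(awk '$1 ~ /^b/ && $1 != "bad" { print $2 }' "$tmpfile")

echo "$second_col"
rm -f "$tmpfile"
```

Here only the “bye later” line qualifies, so the command prints “later”.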

What if we had a fifth column and wanted to print that with the second column as well?
# awk '/b|w/ && !/ad/ {print $2, $5}' file1
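Since file1 as shown has only two columns, $5 would be empty there. A small sketch with made-up five-column data, which also uses the built-in NF (number of fields) to skip lines that are too short:

```shell
tmpfile=$(mktemp)
# Hypothetical data: three lines with five columns, one with only two
printf 'hello world a b c\nbye later d e f\nbad mean g h i\nwow short\n' > "$tmpfile"

# Same filter as above, but only for lines that really have a fifth field
cols=$(awk '/b|w/ && !/ad/ && NF >= 5 { print $2, $5 }' "$tmpfile")

echo "$cols"
rm -f "$tmpfile"
```

The “bad” line is excluded by !/ad/ and the two-column line by the NF check, leaving “world c” and “later f”.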

Also, in my experience, some old versions of AWK cannot parse a file that is too large. Try using NAWK instead. Don’t have NAWK? Try GAWK.
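One way to follow that advice in a script, as a rough sketch: look for the implementations in the order suggested above and fall back to plain awk. Which of these commands exist depends on the system, and the variable names are made up for this example.

```shell
# Try nawk first, then gawk, then whatever "awk" is installed
awk_bin=""
for candidate in nawk gawk awk; do
    if command -v "$candidate" >/dev/null 2>&1; then
        awk_bin=$candidate
        break
    fi
done

if [ -n "$awk_bin" ]; then
    echo "using: $awk_bin"
    # Quick check that the chosen implementation works
    total=$(printf '1\n2\n3\n' | "$awk_bin" '{ s += $1 } END { print s }')
    echo "$total"
else
    echo "no awk implementation found" >&2
fi
```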