Linux’s AWK
Linux: Adding Numbers in a File
Here’s a Linux command to add numbers inside a file:
[root@myserver misc]# cat num.test
1
2
3
4
5
This sample file has only 5 lines, but think of a file with millions of lines. You could load it into MS Excel, which may hang(!!), or use the Linux ‘awk’ command, which is powerful and easier:
[root@myserver misc]# awk '{s+=$1} END {print s}' num.test
15
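As a minimal sketch of the same summation, the script below recreates the five-line sample in a temporary file and stores the result in a shell variable. The temporary path and the names numfile and total are my own choices, not part of the original example. Printing s + 0 in the END block is an optional tweak so that an empty file prints 0 instead of a blank line.

```shell
#!/bin/sh
# Recreate the five-line sample in a temporary file (hypothetical path).
numfile=$(mktemp)
printf '%s\n' 1 2 3 4 5 > "$numfile"

# Add up the first field of every line; "s + 0" forces numeric output,
# so an empty input file prints 0 rather than an empty line.
total=$(awk '{ s += $1 } END { print s + 0 }' "$numfile")
echo "$total"

rm -f "$numfile"
```

Since awk reads the input one line at a time, memory use should stay flat no matter how many lines the file has.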
Linux: ‘awk’ Command – Group By!
While it’s pretty easy to do a ‘Group By’ at the database level, ‘awk’ lets us do the same at the file level.
Consider the file below:
# cat test.csv
aa 1 qwer
ab 2 tyui
aa 3 poiu
ab 2 mnb
bb 1 njio
ba 2 njtwe
test.csv is a tab-separated file with 3 columns. Here, I want to split it so that all lines sharing the same 1st and 2nd columns end up together in their own file.
Like below:
# cat file_bb_1.csv
bb 1 njio
# cat file_ba_2.csv
ba 2 njtwe
# cat file_ab_2.csv
ab 2 tyui
ab 2 mnb
# cat file_aa_3.csv
aa 3 poiu
# cat file_aa_1.csv
aa 1 qwer
Though you can do this manually for a small file, think of a file with more than a million lines.
Here ‘awk’, being a powerful data manipulation tool, comes to our help.
Below is the command we can use:
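# cat test.csv | awk '{a=$1; b=$2; print $0 >> ("file_" a "_" b ".csv")}'
[You can use any prefix instead of ‘file_’. The parentheses around the file name make sure every awk implementation treats the whole concatenation as the redirection target.]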
You don’t need to pipe cat into awk, since awk can read the file itself.
These two commands do exactly the same thing:
cat test.csv | awk '{a=$1; b=$2; print $0 >> ("file_" a "_" b ".csv")}'
awk '{a=$1; b=$2; print $0 >> ("file_" a "_" b ".csv")}' test.csv
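The split-by-key command can also be tried end to end with the sketch below. It rebuilds the tab-separated sample in a scratch directory (the directory and the variable names workdir and out are my own, hypothetical choices), sets the field separator to a tab explicitly, and calls close() after each write. The close() call is an addition of mine, on the assumption that a file with many distinct keys could otherwise run into the limit on open files.

```shell
#!/bin/sh
# Work in a scratch directory so the generated files are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the tab-separated sample from the article.
printf 'aa\t1\tqwer\nab\t2\ttyui\naa\t3\tpoiu\nab\t2\tmnb\nbb\t1\tnjio\nba\t2\tnjtwe\n' > test.csv

# Append each line to a file named after its first two columns.
# Building the name in a variable first avoids any ambiguity in how the
# redirection target is parsed; close() releases the file handle so a
# large number of distinct keys does not exhaust the open-file limit.
awk -F '\t' '{
    out = "file_" $1 "_" $2 ".csv"
    print $0 >> out
    close(out)
}' test.csv

# Show one of the resulting groups.
cat file_ab_2.csv
```

Closing the file after every line costs some speed. If the number of distinct key combinations is small, the close() call can probably be dropped. Because the command appends with >>, running it twice in the same directory would duplicate the lines in each output file.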
Also, awk has “grep”-like abilities. Just place the pattern between two slashes.
# cat file1
hello world
bye later
good nice
bad mean
# awk '/b/' file1
bye later
bad mean
What if you want to get the line with “bye” but not the one with “bad”?
# awk '/b/ && !/ad/' file1
bye later
What if you also want to get the line with “world”?
# awk '/b|w/ && !/ad/' file1
hello world
bye later
Just print the second column?
# awk '/b|w/ && !/ad/ {print $2}' file1
world
later
What if we had a fifth column and wanted to print it along with the second column?
# awk '/b|w/ && !/ad/ {print $2, $5}' file1
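Since file1 has only two columns, here is a small sketch of the same filter against a five-column file. The extra columns and the variable names sample and result are invented purely for illustration and do not come from the original file.

```shell
#!/bin/sh
# Hypothetical five-column variant of file1 (the last three columns are made up).
sample=$(mktemp)
cat > "$sample" <<'EOF'
hello world 10 x east
bye later 20 y west
good nice 30 z north
bad mean 40 x south
EOF

# Keep lines containing "b" or "w", drop those containing "ad", and
# print the second and fifth fields separated by a space (the default OFS).
result=$(awk '/b|w/ && !/ad/ { print $2, $5 }' "$sample")
echo "$result"

rm -f "$sample"
```

The patterns are matched against the whole line, not just the first column, so a “b” or “w” anywhere in the line is enough to select it.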
Also, in my experience, some old versions of AWK cannot parse very large files. Try using NAWK instead. Don’t have NAWK? Try GAWK.