Today at work, I had to munge some bash output together and pass the resulting data as an argument to another bash function. I had always seen coworkers use awk, and seen examples of its use on StackOverflow, but I never learned how to really use it or how it worked. As part of my effort to do more exploring, I decided to write a quick tutorial on how to use awk.
awk follows a simple workflow: read, execute, and repeat. It reads a line from a stream of data, executes the awk program on it, and repeats until it hits the end of the input. An awk program consists of a BEGIN section, a body, and an END section. The BEGIN section executes once, before any text is processed. The END section executes once, after all the text has been processed. The body section executes for every line of input. Additionally, the body can be prefixed with a pattern so that it only runs on lines matching that pattern, and a program can contain several pattern-prefixed bodies so that different commands run for different lines.
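To make the three sections concrete, here is a minimal sketch. The sample lines and the matched variable are made up for illustration and are not from my real data.

```shell
# Hypothetical sample input: three lines, two of which contain an @.
summary=$(printf 'alice@example.com\nnot an email\nbob@example.com\n' | awk '
  BEGIN { print "start" }                              # runs once, before any input is read
  /@/   { matched++ }                                  # body with a pattern: runs only on lines containing @
  END   { print matched " of " NR " lines matched" }   # runs once, after the last line
')
echo "$summary"
```

NR is a built-in variable holding the number of lines read so far, so in the END section it is the total line count.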
In my specific case, I wanted to take the output of a PostgreSQL query and turn it into a CSV. I know that psql has capabilities to write directly to a CSV file, but I wanted to try it with awk. I took my output:
email | date
kunal@mightyapp.com |
kunal@gmail.com | 1-26-2021
...
There were some rows with null values in the date column. I wanted to take this output and turn it into a CSV.
I wrote this command to do it:
awk '/@/{ print $1 "," $3 }' data.txt
This command takes any line that contains the @ character and prints its first field, a comma, and then its third field (the | separator counts as the second field). In this case, it would output something like
kunal@mightyapp.com,
kunal@gmail.com,1-26-2021
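It took me a moment to see why the date is $3 and not $2. By default awk splits each line on whitespace, so the | sitting between the two columns becomes a field of its own. A small sketch that prints the field count (the built-in NF) and each field in brackets; show_fields is just a helper name I made up:

```shell
# Hypothetical helper: print the number of fields, then the first three fields in brackets.
show_fields() {
  awk '{ print NF ": [" $1 "] [" $2 "] [" $3 "]" }'
}

with_date=$(printf '%s\n' 'kunal@gmail.com | 1-26-2021' | show_fields)
without_date=$(printf '%s\n' 'kunal@mightyapp.com |' | show_fields)

echo "$with_date"      # 3: [kunal@gmail.com] [|] [1-26-2021]
echo "$without_date"   # 2: [kunal@mightyapp.com] [|] []
```

A row with a null date only has two fields, so $3 is simply an empty string there.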
Additionally, I wanted to ignore any rows that didn’t have a date in them. I could have modified the SQL query, but that wouldn’t have been as fun.
awk '/@.*-/{ print $1 "," $3 }' data.txt
This matches any row that has an @ followed somewhere later by a -, which most likely means the row has a date (good enough for me!). I was successfully able to take my data and turn it into CSV format.
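The "good enough" pattern can be fooled by an email whose domain contains a hyphen. If that ever matters, one option is to test the date field itself instead of the whole line. This is only a sketch: the second row is made-up data, and the regex assumes dates always look like digits-digits-digits.

```shell
# Sample rows: one real-looking row with a date, one hypothetical row with a hyphenated domain and no date.
rows=$(printf '%s\n' 'kunal@gmail.com | 1-26-2021' 'someone@my-domain.example |')

# Whole-line pattern: also matches the second row, because a - appears after the @.
loose=$(printf '%s\n' "$rows" | awk '/@.*-/ { print $1 "," $3 }')

# Field pattern: only matches when the third field itself looks like a date.
strict=$(printf '%s\n' "$rows" | awk '$3 ~ /^[0-9]+-[0-9]+-[0-9]+$/ { print $1 "," $3 }')

echo "$loose"
echo "$strict"
```

The ~ operator matches a single field against a regex, which keeps the filter from depending on what the email happens to contain.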
However, I wanted to go deeper down the rabbit hole: I wanted to try formatting my output so that it would be more readable to me. awk also has a printf function, which takes the same formatting arguments as C's printf. For example,
awk '/@.*-/ { printf "%-50s %10s\n", $1, $3 }' data.txt
would print the email left-justified in a 50-character field and the date right-justified in a 10-character field. My output looked similar to this:
kunal@gmail.com                                     1-26-2021
kunal+hi@gmail.com                                  1-26-2021
kunal+hello@gmail.com                               1-26-2021
kunal+goodbye@gmail.com                             1-26-2021
This was much more readable!
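The width and the minus sign in the format are easier to see with small numbers. A quick sketch with made-up strings, using brackets to mark the edges of each field:

```shell
# %-8s pads on the right to 8 characters (left-justified);
# %5s pads on the left to 5 characters (right-justified).
padded=$(awk 'BEGIN { printf "[%-8s][%5s]\n", "ab", "cd" }')
echo "$padded"   # [ab      ][   cd]
```

Since the whole program lives in a BEGIN section, awk runs it without needing any input.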
While I was figuring out text formatting with awk, I also came across some other cool things that you can use awk for. For example, I could include an index next to each line that I print.
awk '/@.*-/ { cnt++ } /@.*-/ { printf "%-6s %-50s %10s\n", cnt, $1, $3 }' data.txt
This command increments the cnt variable any time a line matches the pattern. For every one of those lines, it prints the cnt variable followed by the email and date.
1      kunal@gmail.com                                     1-26-2021
2      kunal+hi@gmail.com                                  1-26-2021
3      kunal+hello@gmail.com                               1-26-2021
4      kunal+goodbye@gmail.com                             1-26-2021
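Since both blocks use the same pattern, they can also be folded into a single block that increments and prints in one go. A sketch of that variant, fed with a few sample rows instead of data.txt; I also switched the counter to %d since it is a number:

```shell
# Sample rows: a header, two rows with dates, and one row without a date.
numbered=$(printf '%s\n' \
  'email | date' \
  'kunal@gmail.com | 1-26-2021' \
  'kunal@mightyapp.com |' \
  'kunal+hi@gmail.com | 1-26-2021' \
  | awk '/@.*-/ { printf "%-6d %-50s %10s\n", ++cnt, $1, $3 }')
echo "$numbered"
```

Only the two rows with dates are printed, numbered 1 and 2. The \n at the end of the format matters: unlike print, printf does not add a newline on its own.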
You can do many other powerful things with awk using loops, control flow, and arrays as well.
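As a small taste of those features, here is a sketch that counts email addresses per domain and prints only the domains that show up more than once. The rows, and the names parts, count, and domain, are my own choices for illustration.

```shell
repeated=$(printf '%s\n' \
  'kunal@gmail.com | 1-26-2021' \
  'kunal+hi@gmail.com | 1-26-2021' \
  'kunal@mightyapp.com |' \
  | awk '
    /@/ {
      split($1, parts, "@")          # parts[1] is the local part, parts[2] is the domain
      count[parts[2]]++              # associative array keyed by domain
    }
    END {
      for (domain in count) {        # loop over the array keys (order is unspecified)
        if (count[domain] > 1) {     # control flow: keep only repeated domains
          print domain, count[domain]
        }
      }
    }')
echo "$repeated"   # gmail.com 2
```

Arrays in awk are keyed by strings, so there is nothing to declare up front: using count[parts[2]] creates the entry on first use.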