AWK - Text Processing Utility

Jan 26, 2022 6:36 AM

Today at work, I had to munge some bash output together and pass the result as an argument to another bash function. I had often seen coworkers use awk, and seen examples of it on StackOverflow, but I never learned how to really use it or how it worked. As part of my effort to do more exploring, I decided to do a quick tutorial on how to use awk.

awk follows a simple workflow: read, execute, and repeat. It reads a line from the input, executes the awk program on it, and repeats until it reaches the end of the input. An awk program consists of a BEGIN section, a body, and an END section. The BEGIN section executes once, before any input is processed. The body executes for every line of input. The END section executes once, after all input has been processed. The body can also be prefixed with a pattern so that it executes only for lines that match, and a program can contain several pattern and body pairs so that different commands run on different lines.
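To make the three sections concrete, here is a minimal sketch. The input lines and addresses are made up for illustration; only the BEGIN, pattern, and END structure matters.

```shell
# Hypothetical sample input: three lines, two of which contain "@".
out=$(printf 'alice@example.com\nnot an email\nbob@example.com\n' |
awk '
  BEGIN { print "start" }                          # runs once, before any input is read
  /@/   { n++; print "match: " $0 }                # runs for each line that matches /@/
  END   { print "matched " n " of " NR " lines" }  # runs once, after the last line
')
printf '%s\n' "$out"
```

This should print "start", then the two matching lines, then "matched 2 of 3 lines". NR is awk's built-in count of lines read so far.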

In my specific case, I wanted to take the output of a PostgreSQL query and turn it into a CSV. I know that psql has capabilities to write directly to a CSV file, but I wanted to try it with awk. I took my output:

email | date | | 1-26-2021

Some rows had null values in the date column. I wanted to take this output and turn it into a CSV.

I wrote this command to do it:

awk '/@/{ print $1 "," $3 }' data.txt

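This command takes any line that contains the @ character and prints its first column, a comma, and then the third column. By default awk splits columns on whitespace, so the | separator counts as the second column, which is why the date is $3. In this case, it would output something like    1-26-2021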
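If it is unclear which column is which, it can help to print every field with its index. This is a small sketch on a single made-up row in psql's aligned format; the address is a placeholder, not real data.

```shell
# Hypothetical row; the address is a placeholder.
line=' alice@example.com | 1-26-2021'

# Print each field with its index. NF is the number of fields on the line.
fields=$(printf '%s\n' "$line" | awk '{ for (i = 1; i <= NF; i++) print i ": " $i }')
printf '%s\n' "$fields"

# Same row through the CSV command: $2 is the "|", so the date is $3.
csv=$(printf '%s\n' "$line" | awk '/@/ { print $1 "," $3 }')
printf '%s\n' "$csv"
```

The first command lists three fields (the address, the |, and the date), and the second prints alice@example.com,1-26-2021.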

Additionally, I wanted to ignore any rows that didn’t have a date. I could have modified the SQL query but that wouldn’t have been as fun.

awk '/@.*-/{ print $1 "," $3 }' data.txt

This matched any row that had an @ followed somewhere later by a -, which most likely meant a date (good enough for me!). I was successfully able to take my data and turn it into CSV format.
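One caveat with the @ and - heuristic is that an address whose domain contains a hyphen would also match, even with no date. A stricter variant, sketched here on made-up rows, tests each column separately. The date regex is an assumption about the format (digits, hyphen, digits, hyphen, digits).

```shell
# Hypothetical rows; the second has a hyphen in the domain and no date.
rows=' alice@example.com | 1-26-2021
 bob@my-domain.example |
 carol@example.com | 1-27-2021'

# Require an @ in column 1 and a date-shaped value in column 3.
csv=$(printf '%s\n' "$rows" |
  awk '$1 ~ /@/ && $3 ~ /^[0-9]+-[0-9]+-[0-9]+$/ { print $1 "," $3 }')
printf '%s\n' "$csv"
```

Only the alice and carol rows are printed; the bob row is skipped because its third column is empty.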

However, I wanted to go deeper down the rabbit hole: I wanted to try formatting my output so that it would be more readable to me. awk also has a printf function, which takes basically the same formatting arguments as C's printf. For example:

awk '/@.*-/ { printf "%-50s %10s\n", $1, $3 }' data.txt

This prints the first column left-justified in a field 50 characters wide and the third column right-justified in a field 10 characters wide. My output looked similar to this:

                1-26-2021
             1-26-2021
          1-26-2021
        1-26-2021

This was much more readable!
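The width and justification flags are easier to see with a narrow field. This sketch uses a width of 8 instead of 50 and 10, and adds square brackets purely to make the padding visible.

```shell
# %-8s pads on the right (left-justified); %8s pads on the left (right-justified).
formatted=$(awk 'BEGIN { printf "[%-8s][%8s]\n", "ab", "cd" }')
printf '%s\n' "$formatted"
# prints: [ab      ][      cd]
```

A BEGIN-only program like this reads no input, which makes it handy for trying out format strings.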

While I was figuring out text formatting with awk I also came across some other cool functions that you could use awk for. For example, I could include an index next to each line that I print.

awk '/@.*-/ { cnt++ } /@.*-/ { printf "%-6s %-50s %10s\n", cnt, $1, $3 }' data.txt

This command increments the cnt variable every time a line matches the pattern. For each of those lines, it prints the cnt variable followed by the email and date.

1                1-26-2021
2             1-26-2021
3          1-26-2021
4        1-26-2021

You can do many other powerful things with awk as well, using loops, control flow, and arrays.
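As one small taste of arrays and loops, this sketch counts how many rows share each date. The rows are made up, and the array name seen is just an illustrative choice.

```shell
# Hypothetical rows in the same email | date layout.
rows=' alice@example.com | 1-26-2021
 bob@example.com | 1-27-2021
 carol@example.com | 1-26-2021'

# seen is an associative array keyed by the date in column 3.
# The order of a for-in loop is unspecified, so sort makes the output stable.
counts=$(printf '%s\n' "$rows" |
  awk '/@/ { seen[$3]++ }
       END { for (d in seen) print d "," seen[d] }' | sort)
printf '%s\n' "$counts"
```

This prints 1-26-2021,2 followed by 1-27-2021,1.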