How to Write Scripts Using the Awk Programming Language


Awk is a powerful text-processing language named after its three original authors: Alfred Aho, Peter Weinberger, and Brian Kernighan. It's a versatile language primarily used for pattern scanning and processing. Awk is a staple of Unix scripting and is commonly used for tasks like data extraction, reporting, and data transformation.

Awk scripts are quick to write and perform well for small to medium-sized tasks. In this article, we will introduce you to the basics of writing scripts using the Awk programming language.

Basic Syntax

An Awk program consists of a sequence of pattern-action pairs, written as −

pattern { action }

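Here, the pattern is a condition tested against each input line (record). If the line matches, the action is performed. If the pattern is omitted, the action runs for every line; if the action is omitted, each matching line is printed.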

For example −

awk '/search_pattern/ { print $0 }' file_name

In this example, awk reads file_name line by line and prints every line ($0) that matches the regular expression search_pattern.
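
Either half of a pattern-action pair can be left out, and a quick experiment shows what awk does in each case. The sketch below assumes a throwaway file named words.txt, which is not part of the article's examples −

```shell
# words.txt is a hypothetical sample file created only for this illustration
printf 'alpha\nbeta\ngamma\n' > words.txt

# Pattern only: the default action prints each matching line
awk '/et/' words.txt

# Action only: with no pattern, the action runs for every line
awk '{ print length($0) }' words.txt
```

The first command should print beta, and the second should print the line lengths 5, 4, and 5.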

Using Variables

Awk has built-in variables that describe the current input and control how output is formatted. Some of the most common are −

  • $0 − The entire line.

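  • $1, $2, ... − Each individual field (delimited by whitespace by default).

  • FS − Input field separator (defaults to a single space, which makes awk split fields on runs of spaces and tabs).

  • OFS − Output field separator (defaults to a space).

  • NR − Number of records read so far, that is, the current line number.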

  • NF − Number of fields in the current record.

Let's look at a practical example using some of these variables. Assume we have a text file named 'students.txt' with the following content −

John Doe 18
Jane Smith 19

We can use awk to print each student's name and age with labels −

awk '{ print "Name: " $1 " " $2 ", Age: " $3 }' students.txt

The output will be −

Name: John Doe, Age: 18
Name: Jane Smith, Age: 19
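
The example above uses only the field variables. As a rough sketch of the others, the following assumes a comma-separated variant of the data in a hypothetical file named students.csv, and shows FS (set through the -F option), OFS, NR, and NF working together −

```shell
# students.csv is a hypothetical comma-separated copy of the sample data
printf 'John,Doe,18\nJane,Smith,19\n' > students.csv

# -F',' sets FS to a comma. OFS is inserted between the comma-separated
# arguments of print. NR is the record number and NF the field count.
awk -F',' 'BEGIN { OFS = " | " } { print NR, $2, $1, "fields=" NF }' students.csv
```

This should print "1 | Doe | John | fields=3" followed by "2 | Smith | Jane | fields=3". Note that OFS applies only where print arguments are separated by commas; strings joined by juxtaposition are concatenated with nothing in between.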

Control Flow

Awk also supports common control flow mechanisms like if, else, while, and for. Here's an example using if and else −

awk '{ if ($3 >= 18) print $1 " is an adult"; else print $1 " is a minor" }' students.txt

The output will be −

John is an adult
Jane is an adult

Both students are at least 18, so the else branch never runs for this file.
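
The loop constructs mentioned above work much as they do in C. As a small sketch using the same students.txt data, a for loop can walk over every field of a record, and a while loop can print the fields in reverse order −

```shell
# Recreate the sample file from the article
printf 'John Doe 18\nJane Smith 19\n' > students.txt

# for: visit each field of every record, from 1 up to NF
awk '{ for (i = 1; i <= NF; i++) print NR "." i ": " $i }' students.txt

# while: print the fields of each record from last to first
awk '{ i = NF; while (i > 0) { printf "%s%s", $i, (i > 1 ? " " : "\n"); i-- } }' students.txt
```

The second command should print "18 Doe John" and "19 Smith Jane".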

Functions

Awk has built-in functions for string manipulation, arithmetic operations, and input/output, among others. You can also define your own functions.
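
As a brief illustration of the built-in string functions, the sketch below applies toupper(), length(), and substr() to the students.txt data used earlier −

```shell
# Recreate the sample file from the article
printf 'John Doe 18\nJane Smith 19\n' > students.txt

# toupper(s) upper-cases s, length(s) counts its characters, and
# substr(s, start, len) extracts len characters beginning at position start
awk '{ print toupper($1), length($2), substr($2, 1, 2) }' students.txt
```

This should print "JOHN 3 Do" and "JANE 5 Sm". String positions in awk start at 1, not 0.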

Here's an example of a user-defined function that converts temperatures from Fahrenheit to Celsius −

function toCelsius(fahrenheit) {
   return (fahrenheit - 32) * 5/9
}

BEGIN { print "Fahrenheit Celsius" }
{ print $1, toCelsius($1) }

If we have an input file 'temperatures.txt' with Fahrenheit temperatures −

32
212

Saving the program in a file (for example, temp.awk) and running it with awk -f temp.awk temperatures.txt will output −

Fahrenheit Celsius
32 0
212 100
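
Temperatures that do not convert to a whole number are printed by print using awk's default numeric format. A possible variation, sketched below, saves the program under the arbitrary name to_celsius.awk and uses printf to round to one decimal place; the extra input value 100 is added only for illustration −

```shell
# to_celsius.awk is a hypothetical file name chosen for this sketch
cat > to_celsius.awk <<'EOF'
function toCelsius(fahrenheit) {
   return (fahrenheit - 32) * 5/9
}

BEGIN { print "Fahrenheit Celsius" }
# %.1f rounds the converted value to one decimal place
{ printf "%s %.1f\n", $1, toCelsius($1) }
EOF

# -f reads the awk program from a file instead of the command line
printf '32\n100\n212\n' | awk -f to_celsius.awk
```

After the header line, this should print "32 0.0", "100 37.8", and "212 100.0".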

Regular Expressions

Awk supports regular expression syntax which can be used in pattern matching. Here is a basic example where we are searching for lines in our 'students.txt' that start with the letter 'J' −

awk '/^J/ { print $0 }' students.txt

In this case, the caret (^) symbol represents the start of a line. This script will output −

John Doe 18
Jane Smith 19

Arrays

Awk supports associative arrays, which are indexed by strings or numbers and can be used for more complex data manipulation. Let's consider a case where we want to count the occurrence of ages in our 'students.txt' file. Here's how you can do it −

awk '{ count[$3]++ } END { for (age in count) print age " appears " count[age] " times." }' students.txt

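This will output the following (the order in which a for...in loop visits the keys is not specified, so the lines may appear in a different order) −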

18 appears 1 times.
19 appears 1 times.

In this script, count[$3]++ uses the age (third field) as the key to the array and increments its value each time it appears.
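
Because awk does not guarantee the order in which for (key in array) visits the keys, one common approach is to pipe the result through sort. The sketch below assumes a third, hypothetical student so that one age appears twice −

```shell
# students3.txt is a hypothetical file; "Alex Roe" is not in the original data
printf 'John Doe 18\nJane Smith 19\nAlex Roe 18\n' > students3.txt

# Count each age, then sort numerically so the output order is predictable
awk '{ count[$3]++ } END { for (age in count) print age, count[age] }' students3.txt | sort -n
```

This should print "18 2" followed by "19 1".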

Advanced Data Manipulation

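Awk also provides several built-in functions for more advanced data manipulation. For example, the split() function splits a string into an array. Passing an empty string as the separator splits the string into individual characters in gawk and mawk, although POSIX leaves this behavior unspecified −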

awk '{ split($1, array, ""); print "First letter of the name: " array[1] }' students.txt

This script will output −

First letter of the name: J
First letter of the name: J
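
If portability matters, substr() is a more dependable way to take the first character, and split() is better suited to an explicit separator. A minimal sketch, where the date string is just an illustrative value −

```shell
# Recreate the sample file from the article
printf 'John Doe 18\nJane Smith 19\n' > students.txt

# substr(s, 1, 1) returns the first character of s
awk '{ print "First letter of the name: " substr($1, 1, 1) }' students.txt

# split() returns the number of pieces and fills the array starting at index 1
echo '2023-07-17' | awk '{ n = split($0, parts, "-"); print n, parts[1], parts[3] }'
```

The last command should print "3 2023 17".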

Combining Awk with Other Unix Commands

You can combine Awk scripts with other Unix commands using pipes (|), which makes it an even more powerful tool −

cat students.txt | awk '{ print $1 }' | sort | uniq

This command will print the first names of the students, sort them, and then remove any duplicates. In this case, the output will be −

Jane
John

Using Scripts in Awk

While using Awk directly in the terminal is common for simple tasks, for more complex operations it can be more convenient to write scripts. Awk scripts follow the same pattern-action structure, but are written in a separate file.

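First, create a new file; the .awk extension is a common convention rather than a requirement. The first line of the script should be the shebang line, pointing to the Awk interpreter (the path may differ on your system; command -v awk shows where it is installed) −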

#!/usr/bin/awk -f

Let's create an Awk script called 'students.awk' that calculates the average age of students −

#!/usr/bin/awk -f
BEGIN { 
   sum = 0
   count = 0
}
{ 
   sum += $3
   count++ 
}
END {
   print "Average age: " sum/count
}

To run the script, make it executable with chmod +x students.awk, and then run it with ./students.awk students.txt. This will print −

Average age: 18.5
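
One caveat: with an empty input file, count stays at zero and the division in the END block fails. A possible guard, sketched here under the hypothetical file name average.awk, also skips lines that have fewer than three fields −

```shell
# average.awk is a hypothetical variant of students.awk with a zero-count guard
cat > average.awk <<'EOF'
# Only count records that actually have an age in the third field
NF >= 3 { sum += $3; count++ }
END {
   if (count > 0)
      print "Average age: " sum / count
   else
      print "No records found"
}
EOF

printf 'John Doe 18\nJane Smith 19\n' > students.txt
awk -f average.awk students.txt   # prints: Average age: 18.5
awk -f average.awk /dev/null      # prints: No records found
```

Variables in awk start out empty and behave as zero in numeric contexts, so the explicit initialization in a BEGIN block is optional.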

Debugging Awk Scripts

Debugging Awk scripts can be a bit tricky because standard awk has no built-in debugger (GNU awk provides one through its --debug option). In any implementation, print statements that display the value of variables at different points in the script can be helpful.

GNU awk also offers the --dump-variables[=file] option (also accepted as -W dump-variables[=file]), which writes the global variables and their final values to a file for debugging. To use this option, you would execute gawk --dump-variables=dump.txt -f script.awk input_file.
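
When adding print statements for debugging, it helps to keep them apart from the script's real output. One way, sketched below, is to redirect them to "/dev/stderr", a name that gawk and mawk treat specially and that also exists as a device on most Unix-like systems; the file name debug.log is an arbitrary choice −

```shell
# Recreate the sample file from the article
printf 'John Doe 18\nJane Smith 19\n' > students.txt

# Debug lines go to standard error; only the final sum goes to standard output
awk '{ sum += $3; print "DEBUG: NR=" NR " sum=" sum > "/dev/stderr" }
     END { print sum }' students.txt 2> debug.log

# Inspect the captured debug trace
cat debug.log
```

The awk command should print 37 on standard output, while debug.log receives one DEBUG line per record.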

Advanced Pattern Matching

Awk also supports advanced pattern matching with regular expressions. For example, you can use the ~ operator to match a field against a regular expression.

Consider a students.txt file with an additional field for the course they're studying −

John Doe 18 ComputerScience
Jane Smith 19 Mathematics

To find students studying Computer Science, you can write −

awk '$4 ~ /ComputerScience/ { print $1 " " $2 " is studying Computer Science." }' students.txt

This will output −

John Doe is studying Computer Science.
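
The negated form of the match operator is !~, and patterns can be combined with && and ||. As a small sketch on the same four-field data, the following selects students who are at least 19 and whose course does not start with "Computer" −

```shell
# Recreate the four-field version of students.txt shown above
printf 'John Doe 18 ComputerScience\nJane Smith 19 Mathematics\n' > students.txt

# !~ is true when the field does NOT match the regular expression
awk '$4 !~ /^Computer/ && $3 >= 19 { print $1, $4 }' students.txt
```

This should print "Jane Mathematics".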

Conclusion

Awk is a powerful tool for text processing on Unix-based systems. Its power lies in its simplicity and the straightforward nature of its syntax. Whether you are manipulating text or performing arithmetic computations, Awk is an excellent tool to have in your programming toolkit.

Remember, the best way to learn Awk (or any language) is to use it. Try creating your own Awk scripts, starting with simple tasks and gradually moving to more complex ones as you get more comfortable with the language.

Updated on: 17-Jul-2023
