Guide to the AWK Programming Language on Linux


Introduction

AWK is a scripting language for text processing on Linux and other Unix-like systems. It reads input line by line, splits each line into fields, and applies pattern-action rules to them, which makes it well suited to data extraction and reporting tasks. With its built-in functions and operators, AWK can handle anything from simple search-and-replace jobs to complex data transformations, all in a concise syntax.

This article is a beginner's guide to the AWK programming language on Linux. It covers the basic syntax of the language, the kinds of operations AWK can perform, and how to use AWK to process text files, with code examples and their output.

Installing AWK in Linux Operating System

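Most Linux distributions already ship an AWK implementation. If it is missing, or if we want GNU AWK (gawk) specifically, we can install it with the distribution's package manager. On Debian- and Ubuntu-based systems, the command is −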

$ sudo apt-get install gawk

After installing, we can check the AWK version −

$ awk --version

If AWK is installed, the command will display the version number.

Variables in AWK Programming Language

AWK provides a number of built-in variables that can be used in patterns and actions. The most used variables are −

  • NR − The number of the current record (line).

  • NF − The number of fields in the current record.

  • $0 − The entire current record.

  • $1, $2, $3, … − The first, second, third, … field in the current record.

In addition to these built-in variables, AWK also allows user-defined variables. User-defined variables can be assigned values using the = operator.
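As a small sketch of how these variables behave, the commands below build a throwaway file (the name sample.txt and its contents are made up for illustration) and print the record number, the field count, and the first field of every line. The second command keeps a running sum in a user-defined variable named total.

```shell
# Create a hypothetical three-line sample file.
printf 'alpha beta\ngamma delta epsilon\nzeta\n' > sample.txt

# NR is the line number, NF the field count, $1 the first field.
summary=$(awk '{ print NR, NF, $1 }' sample.txt)
echo "$summary"

# total is a user-defined variable; it starts out empty, which counts as 0.
field_total=$(awk '{ total = total + NF } END { print total }' sample.txt)
echo "$field_total"
```

User-defined variables need no declaration; they come into existence the first time they are used.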

AWK Command-Line Options

AWK is typically invoked from the command line using the awk command, which accepts various options and arguments. Here is a list of commonly used options −

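  • -F − Specifies the field separator for the input.

  • -v − Assigns a value to a variable before the program starts running (-v name=value).

  • -f − Specifies the AWK script file to be executed.

  • -W − Reserved by POSIX for implementation-specific options. In GNU AWK, for example, -W lint (equivalent to --lint) enables warnings about questionable or non-portable constructs.

Unlike sed, AWK has no -n option for suppressing automatic printing. An AWK program prints only what its actions print; a pattern with no action prints the matching record.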
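These options can be combined in one invocation. The sketch below assumes a hypothetical colon-separated file people.txt and a script file greet.awk, both created here only for illustration; the variable name greeting is likewise made up.

```shell
# Hypothetical colon-separated input.
printf 'Papan:22\nPriya:27\n' > people.txt

# Hypothetical script file: prints the greeting followed by the first field.
printf '{ print greeting, $1 }\n' > greet.awk

# -F sets the separator, -v assigns greeting before the program starts,
# and -f reads the program from greet.awk.
greetings=$(awk -F: -v greeting="Hello" -f greet.awk people.txt)
echo "$greetings"
```

Keeping the program in a file with -f tends to be easier to maintain than a long one-liner once the script grows beyond a few statements.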

Extracting Fields From a File Using AWK

Suppose we have a CSV file named ‘filename.csv’ with the following content (note that there are no spaces after the commas) −

name,age,gender
Papan,22,Male
Priya,27,Female

To extract the age and gender fields, we can use the following command −

$ awk -F, '{ print $2, $3 }' filename.csv

This AWK code will print the second and third fields (columns) of a CSV file called ‘filename.csv’, with the delimiter being a comma (‘,’).

Here is the explanation −

  • ‘awk’ − The command used to run an AWK script

  • ‘-F,’ − This option sets the field separator to a comma. This tells AWK to treat the CSV file as having commas as delimiters between fields.

  • ‘{ print $2, $3 }’ − This is the AWK program itself. It tells AWK to print the second and third fields of each line in the file, separated by a space.

Output

age gender
22 Male
27 Female

The commas do not appear in the output because print separates its arguments with the output field separator, which is a space by default.
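If the output should stay comma-separated, one option is to set the built-in OFS (output field separator) variable. The sketch below assumes a copy of the sample data written without spaces after the commas, and it also skips the header line with the condition NR > 1.

```shell
# Recreate the sample data (assumed here to have no spaces after the commas).
printf 'name,age,gender\nPapan,22,Male\nPriya,27,Female\n' > filename.csv

# OFS=, makes print join its arguments with commas; NR > 1 skips the header.
rows=$(awk -F, -v OFS=, 'NR > 1 { print $2, $3 }' filename.csv)
echo "$rows"
```

Here the pattern NR > 1 selects every record after the first, so only the two data rows reach the print statement.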

Filtering Data Using AWK Command

Suppose we have a file called "data.txt" that contains student records, each with a name, an age, and a grade. We want to select only the students who have a grade of "A".

Example

Put the following content in the "data.txt" file −

John, 18, A
Sara, 19, B
Mike, 20, A
Lisa, 18, C
Tom, 19, A

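To select the students who have a grade of "A", we can use the following AWK command. No -F option is needed here: AWK splits fields on whitespace by default, so the third field is the grade −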

$ awk '$3=="A" {print}' data.txt

Output

John, 18, A
Mike, 20, A
Tom, 19, A

As we can see, only the rows that have a grade of "A" have been printed.

Example

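We can also combine several conditions in the pattern with the && (logical AND) operator. For example, to select only the students who are older than 18 and have a grade of "A", we can use the following AWK command. The -F', ' option splits fields on a comma followed by a space, so that the age field is a plain number such as 18 rather than the string "18," (which AWK would compare as text) −

$ awk -F', ' '$2>18 && $3=="A" {print}' data.txt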

Output

Mike, 20, A
Tom, 19, A

As we can see, only the rows that meet both conditions have been printed.
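The same kind of selection can also be written with an explicit if statement inside the action. This is a sketch rather than the only way to do it; it assumes data.txt has the contents shown above and uses ', ' (a comma plus a space) as the field separator so that the age field compares as a number.

```shell
# Recreate data.txt as shown above.
printf 'John, 18, A\nSara, 19, B\nMike, 20, A\nLisa, 18, C\nTom, 19, A\n' > data.txt

# With -F', ' the fields are "John", "18", "A", so $2 compares numerically.
# Only the name ($1) of each matching student is printed.
older_a=$(awk -F', ' '{ if ($2 > 18 && $3 == "A") print $1 }' data.txt)
echo "$older_a"
```

A pattern in front of the action is usually shorter, while an if statement is convenient when one action has to handle several different cases.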

Implementing for Loops using AWK command

AWK supports three kinds of loops −

  • while Loop

  • do-while Loop

  • for Loop

This section demonstrates only the for loop.

First, create an ‘input.txt’ file with the following content −

Soumen,25
Bob,30
Papan,22
Aditya,40
Joy,35

Now, let's explore how we can use loops in AWK to process the data in this file.

Create another file named ‘program.awk’.

The following AWK program uses a for loop to print each field of every line in the file −

{
   for (i = 1; i <= NF; i++) {
      printf("%s ", $i);
   }
   printf("\n");
}

In this program, the pattern is blank, which means that the action will be applied to each line of the file. The action consists of a for loop that iterates over each field (column) of the line using the NF (number of fields) variable. Inside the loop, the printf function is used to print each field followed by a space. After the loop, the printf function is used again to print a newline character.

To run this program, we can use the following command −

$ awk -F, -f program.awk input.txt

Where "program.awk" is the filename of the AWK program and "-F," specifies that the field separator is a comma.

Output

Soumen 25
Bob 30
Papan 22
Aditya 40
Joy 35

We can see that the program prints every field of each line in the ‘input.txt’ file, separated by spaces instead of commas.
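The other two loop types follow the same C-like syntax. As a rough sketch, the commands below walk over the fields of the same input.txt with a while loop, and then use a do-while loop to build the string 123; the variable names i, out, and s are arbitrary.

```shell
# Recreate input.txt as shown above.
printf 'Soumen,25\nBob,30\nPapan,22\nAditya,40\nJoy,35\n' > input.txt

# while loop: join the fields with " | ", starting from the first field.
joined=$(awk -F, '{
   out = $1
   i = 2
   while (i <= NF) {
      out = out " | " $i
      i++
   }
   print out
}' input.txt)
echo "$joined"

# do-while loop: the body always runs at least once before the test.
digits=$(awk 'BEGIN {
   i = 1
   do {
      s = s i
      i++
   } while (i <= 3)
   print s
}')
echo "$digits"
```

The while version checks its condition before each pass, so it does nothing extra for a line with a single field, whereas do-while checks only after the body has run.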

Conclusion

This article introduces the AWK programming language on Linux, offering practical code examples and their corresponding output. AWK is a highly efficient tool for processing text data and extracting relevant information from it. The guide covers AWK's fundamental programming concepts, including variables, field extraction, filtering, and loops. With this knowledge, we are equipped to write our own AWK programs on Linux.

Updated on: 29-Mar-2023
