Process Multiple Input Files Using Awk
Awk is a powerful text processing tool widely used by developers, system administrators, and analysts to manipulate data. It can process text files, extract fields, and transform them into different formats. One of its key features is the ability to process multiple input files in a single invocation, which makes it well suited to batch processing tasks.
How Awk Handles Multiple Input Files
When given multiple input files, Awk reads them one after another in the order listed on the command line and runs the same program against every line. A BEGIN block runs once before the first file is read, and an END block runs once after the last file. This lets you process files that hold the same kind of data in one pass, rather than invoking Awk once per file.
Awk provides several built-in variables to track file processing:
| Variable | Description |
|---|---|
| FILENAME | Contains the name of the current input file |
| FNR | Line number in the current file (resets for each new file) |
| NR | Total line number across all files |
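To see how the three variables change as Awk moves from one file to the next, here is a minimal sketch. The file names a.txt and b.txt and the shell variables tmpdir and counters are chosen for illustration only.

```shell
# Create two small sample files in a temporary directory (hypothetical names).
tmpdir=$(mktemp -d)
printf 'apple\nbanana\norange\n' > "$tmpdir/a.txt"
printf 'carrot\npotato\n' > "$tmpdir/b.txt"

# Print the file name, the per-file line number (FNR),
# the running line number (NR), and the line itself.
counters=$(cd "$tmpdir" && awk '{print FILENAME, FNR, NR, $0}' a.txt b.txt)
echo "$counters"
# a.txt 1 1 apple
# a.txt 2 2 banana
# a.txt 3 3 orange
# b.txt 1 4 carrot
# b.txt 2 5 potato
```

FNR restarts at 1 when b.txt begins, while NR keeps counting, so NR and FNR are equal only while the first file is being read.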
Basic File Processing
To read data from multiple input files, specify the filenames as arguments to Awk. Consider these sample files:
file1.txt
apple
banana
orange
file2.txt
carrot
potato
Process both files with this command:
awk '{print FILENAME ": " $0}' file1.txt file2.txt
This prints each line prefixed with its filename, producing:
file1.txt: apple
file1.txt: banana
file1.txt: orange
file2.txt: carrot
file2.txt: potato
Processing Structured Data
For structured data such as CSV files, the field separator you set with -F applies to every input file, so the same field references work across all of them. Consider these files:
sales1.csv
product,quantity,price
apple,10,0.50
banana,15,0.40
sales2.csv
product,quantity,price
orange,8,0.60
grape,12,0.80
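Calculate total revenue from both files:
awk -F',' 'FNR==1 {next} {total += $2 * $3} END {print "Total Revenue: $" total}' sales1.csv sales2.csv
This command skips the header row of each file and calculates the total revenue by multiplying quantity by price for each product. The test FNR==1 is sufficient on its own, because FNR resets to 1 at the start of every file. For the sample data it prints Total Revenue: $25.4.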
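As a worked check against the sample data, the sketch below recreates the two sales files in a temporary directory and formats the total to two decimal places with printf. The variables tmpdir and revenue are illustrative names, and the use of printf is a presentation choice, not something the command above requires.

```shell
# Recreate the sample sales files in a temporary directory.
tmpdir=$(mktemp -d)
printf 'product,quantity,price\napple,10,0.50\nbanana,15,0.40\n' > "$tmpdir/sales1.csv"
printf 'product,quantity,price\norange,8,0.60\ngrape,12,0.80\n' > "$tmpdir/sales2.csv"

revenue=$(cd "$tmpdir" && awk -F',' '
    FNR == 1 { next }                 # skip the header row of each file
    { total += $2 * $3 }              # quantity * price
    END { printf "Total Revenue: $%.2f\n", total }
' sales1.csv sales2.csv)
echo "$revenue"
# Total Revenue: $25.40
```

The total is 10 × 0.50 + 15 × 0.40 + 8 × 0.60 + 12 × 0.80 = 5.00 + 6.00 + 4.80 + 9.60 = 25.40.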
File-Specific Processing
You can perform different operations based on which file is being processed:
awk '{
    if (FILENAME == "file1.txt")
        print "Fruit: " $0
    else if (FILENAME == "file2.txt")
        print "Vegetable: " $0
}' file1.txt file2.txt
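The same dispatch can also be written with Awk's pattern-action rules instead of an if/else chain. The sketch below is one way to do it, assuming the sample file1.txt and file2.txt from earlier are recreated in a temporary directory. The variables tmpdir and labeled are illustrative names.

```shell
# Recreate the sample files in a temporary directory.
tmpdir=$(mktemp -d)
printf 'apple\nbanana\norange\n' > "$tmpdir/file1.txt"
printf 'carrot\npotato\n' > "$tmpdir/file2.txt"

# One rule per file: each pattern is tested against every input line.
labeled=$(cd "$tmpdir" && awk '
    FILENAME == "file1.txt" { print "Fruit: " $0 }
    FILENAME == "file2.txt" { print "Vegetable: " $0 }
' file1.txt file2.txt)
echo "$labeled"
# Fruit: apple
# Fruit: banana
# Fruit: orange
# Vegetable: carrot
# Vegetable: potato
```

FILENAME holds the name exactly as it was passed on the command line, so a comparison against "file1.txt" only matches when the file is given by that relative name. That is why the sketch changes into the directory before running Awk.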
Advanced Examples
Merging CSV Files with Headers
To merge multiple CSV files while keeping only one header:
awk 'FNR==1 && NR!=1 {next} {print}' file1.csv file2.csv > merged.csv
This skips the header row from the second file onwards, ensuring only one header appears in the merged output.
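A small end-to-end run makes the effect visible. The file contents below are made up for illustration, as are the variables tmpdir and merged.

```shell
# Two CSV files that share the same header (sample data, for illustration).
tmpdir=$(mktemp -d)
printf 'product,quantity,price\napple,10,0.50\n' > "$tmpdir/file1.csv"
printf 'product,quantity,price\norange,8,0.60\n' > "$tmpdir/file2.csv"

# FNR==1 is true on the first line of every file; NR!=1 excludes the
# very first line overall, so only later headers are skipped.
awk 'FNR==1 && NR!=1 {next} {print}' "$tmpdir/file1.csv" "$tmpdir/file2.csv" > "$tmpdir/merged.csv"

merged=$(cat "$tmpdir/merged.csv")
echo "$merged"
# product,quantity,price
# apple,10,0.50
# orange,8,0.60
```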
Calculating Statistics Across Files
Process log files to count errors per file:
awk '/ERROR/ {errors[FILENAME]++} END {
    for (file in errors)
        print file ": " errors[file] " errors"
}' log1.txt log2.txt log3.txt
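The following sketch runs the error counter on three small, made-up log files. The log contents and the variables tmpdir and error_report are assumptions for illustration. The output is piped through sort because Awk does not guarantee the order in which a for (key in array) loop visits its keys.

```shell
# Sample logs (hypothetical contents): two errors, none, and one.
tmpdir=$(mktemp -d)
printf 'INFO start\nERROR disk full\nERROR timeout\n' > "$tmpdir/log1.txt"
printf 'INFO start\nINFO done\n' > "$tmpdir/log2.txt"
printf 'ERROR crash\n' > "$tmpdir/log3.txt"

error_report=$(cd "$tmpdir" && awk '
    /ERROR/ { errors[FILENAME]++ }    # count matching lines per file
    END {
        for (file in errors)
            print file ": " errors[file] " errors"
    }
' log1.txt log2.txt log3.txt | sort)
echo "$error_report"
# log1.txt: 2 errors
# log3.txt: 1 errors
```

log2.txt is absent from the report because no line in it matched /ERROR/, so no array entry was ever created for it.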
Combining Data with File Tracking
Create a summary that tracks which file each record came from by appending the filename as a final field:
awk '{print $0 "," FILENAME}' data1.txt data2.txt > combined_with_source.csv
Best Practices
| Technique | Use Case | Example |
|---|---|---|
| Use FILENAME variable | File-specific processing | if (FILENAME == "config.txt") |
| Check FNR vs NR | Handle headers in multiple files | FNR==1 && NR!=1 {next} |
| Use associative arrays | Track data by filename | data[FILENAME]++ |
| END block processing | Generate final reports | END {print summary} |
Conclusion
Awk's ability to process multiple input files makes it an excellent tool for batch processing and data analysis tasks. With the built-in variables FILENAME, FNR, and NR, a single Awk program can tell its input files apart, skip repeated headers, and aggregate results across all of them.
