Extracting a substring using Linux bash

Extracting a substring from a string is a fundamental text processing operation in Linux. This involves isolating specific portions of text based on character positions (index-based) or patterns (delimiter-based).

We'll explore different methods to extract substrings from strings using the Linux command line, covering both index-based and pattern-based approaches.

Index-Based Substring Extraction

Index-based extraction involves specifying the starting position and length of the desired substring. Here are four common methods:

Using the cut Command

The cut command extracts the characters from position START through position END (inclusive) using the -c option. Note that cut uses 1-based indexing.

cut -c START-END

Example extracting "Linux" from positions 5-9:

$ cut -c 5-9 <<< '0123Linux9'
Linux
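cut also accepts open-ended ranges, which can help when only one boundary is known. The following is a minimal sketch, assuming single-byte (ASCII) input; the variable names are illustrative and not part of the original example:

```shell
str='0123Linux9'

# N- selects from position N to the end of the line
from_fifth=$(printf '%s\n' "$str" | cut -c 5-)

# -M selects from the first character through position M
first_four=$(printf '%s\n' "$str" | cut -c -4)

echo "$from_fifth"   # Linux9
echo "$first_four"   # 0123
```

The input is piped in with printf rather than a here-string so the sketch also runs in shells other than Bash.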

Using the awk Command

The awk command provides the substr(s, i, n) function, which takes three parameters:

s: The input string
i: The start index (1-based)
n: The length of the substring (optional; if omitted, the substring runs to the end of the string)

$ awk '{print substr($0, 5, 5)}' <<< '0123Linux9'
Linux
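In a script, the start index and length usually come from shell variables. A possible sketch, assuming the illustrative variable names start and len, passes them to awk with -v and also shows the two-argument form of substr():

```shell
str='0123Linux9'
start=5
len=5

# Pass shell values into awk with -v rather than splicing them into the program text
word=$(printf '%s\n' "$str" | awk -v s="$start" -v n="$len" '{ print substr($0, s, n) }')

# With the length omitted, substr() returns everything from the start index onward
rest=$(printf '%s\n' "$str" | awk -v s="$start" '{ print substr($0, s) }')

echo "$word"   # Linux
echo "$rest"   # Linux9
```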

Using Bash Substring Expansion

Bash provides built-in substring expansion using the syntax ${string:position:length}. This uses 0-based indexing.

$ STR="0123Linux9"
$ echo "${STR:4:5}"
Linux
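The same expansion supports a few variations that are easy to get wrong. The sketch below assumes Bash (these forms are not available in plain POSIX sh) and uses illustrative variable names:

```shell
#!/usr/bin/env bash
STR="0123Linux9"

# Omitting the length takes everything from the offset to the end
rest="${STR:4}"

# A negative offset counts from the end of the string; the space before the
# minus sign is required so Bash does not parse it as the ${var:-default} form
last_six="${STR: -6}"
word="${STR: -6:5}"

echo "$rest"       # Linux9
echo "$last_six"   # Linux9
echo "$word"       # Linux
```

Because the expansion happens inside the shell, no external process is started, which tends to matter in loops over many strings.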

Using the expr Command

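The expr substr command extracts substrings using 1-based indexing. Note that substr is a GNU extension to expr and is not specified by POSIX, so it may be unavailable on non-GNU systems: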

expr substr <input_string> <start_index> <length>

$ expr substr "0123Linux9" 5 5
Linux
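Where expr substr is not available, the standard expr match operator (:) can produce the same result with a regular expression. This is a different technique from the one shown above, sketched here under the assumption that the string does not begin with a hyphen and is not an expr keyword:

```shell
str='0123Linux9'

# Skip 4 characters, then capture the next 5; expr prints the captured group
word=$(expr "$str" : '.\{4\}\(.\{5\}\)')

echo "$word"   # Linux
```

The pattern is implicitly anchored at the start of the string, so the skip count plays the role of the start index minus one.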

Pattern-Based Substring Extraction

Pattern-based extraction uses delimiters or field separators to isolate specific portions of text. This is particularly useful for structured data such as simple CSV files, where fields do not contain quoted or embedded delimiters.

Using cut with Delimiters

The cut command can split input using delimiters (-d) and extract specific fields (-f):

$ cut -d ',' -f 3 <<< "Eric,Male,28,USA"
28
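The -f option also takes lists and ranges of fields. A short sketch using the same record (variable names are illustrative):

```shell
record='Eric,Male,28,USA'

# A comma-separated list selects several individual fields
name_age=$(printf '%s\n' "$record" | cut -d ',' -f 1,3)

# N- selects field N and everything after it
from_second=$(printf '%s\n' "$record" | cut -d ',' -f 2-)

echo "$name_age"      # Eric,28
echo "$from_second"   # Male,28,USA
```

Selected fields are joined in the output with the input delimiter.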

Using awk with Field Separators

The awk command excels at field-based processing using the -F option to specify field separators:

$ awk -F',' '{print $3}' <<< "Eric,Male,28,USA"
28

For more flexible pattern matching, awk supports regular expressions as field separators:

$ awk -F', ?' '{print $3}' <<< "Eric, Male, 28, USA"
28
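Once the separator is set, awk's built-in NF variable makes it possible to address fields relative to the end of the record. A minimal sketch with the same input (variable names are illustrative):

```shell
record='Eric, Male, 28, USA'

# $NF refers to the last field, whatever the field count is
country=$(printf '%s\n' "$record" | awk -F', ?' '{ print $NF }')

# Several fields can be printed together with a custom separator
name_age=$(printf '%s\n' "$record" | awk -F', ?' '{ print $1 ":" $3 }')

echo "$country"    # USA
echo "$name_age"   # Eric:28
```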

Advanced Pattern Extraction

For complex patterns like extracting text between specific markers, awk provides powerful solutions:

$ STR="whatever dataBEGIN:Interesting dataEND:something else"
$ awk -F'BEGIN:|END:' '{print $2}' <<< "$STR"
Interesting data

Alternative approach using substitution:

$ awk '{ sub(/.*BEGIN:/, ""); sub(/END:.*/, ""); print }' <<< "$STR"
Interesting data
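The same marker-based extraction can also be done without awk. The sketch below shows two alternatives that are not part of the original examples: shell parameter expansion and sed. The names tmp, between, and between_sed are illustrative.

```shell
STR="whatever dataBEGIN:Interesting dataEND:something else"

# Remove the shortest prefix ending in "BEGIN:"
tmp="${STR#*BEGIN:}"
# Remove the longest suffix starting at "END:"
between="${tmp%%END:*}"

# The same extraction with sed, capturing the text between the markers
between_sed=$(printf '%s\n' "$STR" | sed -n 's/.*BEGIN:\(.*\)END:.*/\1/p')

echo "$between"       # Interesting data
echo "$between_sed"   # Interesting data
```

If a marker can occur more than once in the input, the two forms may pick different occurrences, so it is worth testing against representative data.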

Comparison of Methods

Method	Indexing	Best For	Flexibility
cut	1-based	Simple character/field extraction	Limited
awk	1-based	Complex text processing	High
Bash expansion	0-based	Shell scripting	Moderate
expr	1-based	Basic substring operations	Low
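The indexing difference in the table is a common source of off-by-one errors when switching tools. As an illustration, the hypothetical helper substr0 below (not a standard command) accepts a 0-based offset and length, as Bash expansion does, and converts them to the 1-based inclusive range that cut -c expects:

```shell
# substr0 STRING OFFSET LENGTH
# Hypothetical helper: takes a 0-based offset (as in Bash expansion)
# and converts it to the 1-based inclusive range used by cut -c.
substr0() {
    first=$(( $2 + 1 ))    # 0-based offset -> 1-based start position
    last=$(( $2 + $3 ))    # inclusive end position
    printf '%s\n' "$1" | cut -c "${first}-${last}"
}

word=$(substr0 '0123Linux9' 4 5)
echo "$word"   # Linux
```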

Conclusion

Linux provides multiple powerful methods for substring extraction, each with specific strengths. Choose cut for simple field operations, awk for complex pattern matching, Bash expansion for shell scripts, and expr for basic substring needs.

Satish Kumar

Updated on: 2026-03-17T09:01:38+05:30
