Extracting a substring using Linux bash
Extracting a substring from a string is a fundamental text processing operation in Linux. This involves isolating specific portions of text based on character positions (index-based) or patterns (delimiter-based).
We'll explore different methods to extract substrings from strings using the Linux command line, covering both index-based and pattern-based approaches.
Index-Based Substring Extraction
Index-based extraction involves specifying the starting position of the desired substring together with either its length or its end position. Here are four common methods:
Using the cut Command
The cut command extracts characters from position N to position M using the -c option. Note that cut uses 1-based indexing.
cut -c START-END
Example extracting "Linux" from positions 5-9:
$ cut -c 5-9 <<< '0123Linux9'
Linux
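The range passed to -c can also be left open at either end. The following is a small sketch of that behavior; the variable names (s, middle, tail_part, head_part) are illustrative and not part of the original example.

```shell
s='0123Linux9'

# Characters 5 through 9, the same range as the example above
middle=$(printf '%s\n' "$s" | cut -c 5-9)

# Open-ended range: character 5 to the end of the line
tail_part=$(printf '%s\n' "$s" | cut -c 5-)

# Open-started range: first character through character 4
head_part=$(printf '%s\n' "$s" | cut -c -4)

printf '%s\n' "$middle" "$tail_part" "$head_part"
```

This should print Linux, Linux9, and 0123 on separate lines.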
Using the awk Command
The awk command provides the substr(s, i, n) function, which takes three parameters:
s: The input string
i: The start index (1-based)
n: The length of the substring (optional; if omitted, the rest of the string is returned)
$ awk '{print substr($0, 5, 5)}' <<< '0123Linux9'
Linux
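As a hedged illustration of the optional length parameter, the sketch below calls substr() once with a length and once without. Passing the start and length through -v variables (here named start and len, both hypothetical) is one way to avoid hard-coding them in the awk program.

```shell
s='0123Linux9'

# Start and length supplied as awk variables instead of literals
with_len=$(printf '%s\n' "$s" | awk -v start=5 -v len=5 '{ print substr($0, start, len) }')

# Length omitted: substr() returns everything from the start index onward
without_len=$(printf '%s\n' "$s" | awk '{ print substr($0, 5) }')

printf '%s\n' "$with_len" "$without_len"
```

The first call should yield Linux and the second Linux9.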
Using Bash Substring Expansion
Bash provides built-in substring expansion using the syntax ${string:position:length}. This uses 0-based indexing.
$ STR="0123Linux9"
$ echo "${STR:4:5}"
Linux
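Substring expansion also accepts an omitted length and a negative offset, which counts from the end of the string. The sketch below assumes Bash (this syntax is not available in a plain POSIX sh); the result variable names are illustrative.

```shell
#!/usr/bin/env bash
STR="0123Linux9"

# Length omitted: everything from offset 4 to the end
from_offset=${STR:4}

# Negative offset: start 6 characters from the end, take 5.
# The space before the minus sign is required so Bash does not
# read ":-" as the "use default value" expansion.
from_end=${STR: -6:5}

# ${#STR} gives the string length
len=${#STR}

printf '%s\n' "$from_offset" "$from_end" "$len"
```

This should print Linux9, Linux, and 10.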
Using the expr Command
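The expr substr command extracts substrings using 1-based indexing. Note that substr is a GNU extension to expr, so it may not be available in non-GNU implementations: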
expr substr <input_string> <start_index> <length>
$ expr substr "0123Linux9" 5 5
Linux
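Where expr substr is unavailable, a similar result can be approximated with the standard expr match operator (:), which prints the text captured by a \( \) group in a basic regular expression. This is an alternative technique rather than the method shown above, and the variable names are illustrative.

```shell
s='0123Linux9'

# Skip the first 4 characters, then capture the next 5.
# The pattern is anchored at the start of the string by expr itself.
part=$(expr "$s" : '.\{4\}\(.\{5\}\)')

printf '%s\n' "$part"
```

This should print Linux.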
Pattern-Based Substring Extraction
Pattern-based extraction uses delimiters or field separators to isolate specific portions of text. This is particularly useful for structured data like CSV files.
Using cut with Delimiters
The cut command can split input using delimiters (-d) and extract specific fields (-f):
$ cut -d ',' -f 3 <<< "Eric,Male,28,USA"
28
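The -f option accepts lists and ranges as well as a single field number. A minimal sketch using the same sample record (the variable names are illustrative):

```shell
record='Eric,Male,28,USA'

# A comma-separated list selects several fields; cut joins them
# with the input delimiter
name_age=$(printf '%s\n' "$record" | cut -d ',' -f 1,3)

# An open-ended range selects field 2 through the last field
rest=$(printf '%s\n' "$record" | cut -d ',' -f 2-)

printf '%s\n' "$name_age" "$rest"
```

This should print Eric,28 followed by Male,28,USA.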
Using awk with Field Separators
The awk command excels at field-based processing using the -F option to specify field separators:
$ awk -F',' '{print $3}' <<< "Eric,Male,28,USA"
28
For more flexible pattern matching, awk supports regular expressions as field separators:
$ awk -F', ?' '{print $3}' <<< "Eric, Male, 28, USA"
28
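With the same regular-expression separator, awk's built-in NF variable (the number of fields) can be used to reach the last field without knowing its position. A short sketch, with illustrative variable names:

```shell
line='Eric, Male, 28, USA'

# $NF refers to the last field on the line
last=$(printf '%s\n' "$line" | awk -F', ?' '{ print $NF }')

# NF on its own is the field count
count=$(printf '%s\n' "$line" | awk -F', ?' '{ print NF }')

printf '%s\n' "$last" "$count"
```

This should print USA and 4.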
Advanced Pattern Extraction
For complex patterns like extracting text between specific markers, awk provides powerful solutions:
$ STR="whatever dataBEGIN:Interesting dataEND:something else"
$ awk -F'BEGIN:|END:' '{print $2}' <<< "$STR"
Interesting data
Alternative approach using substitution:
$ awk '{ sub(/.*BEGIN:/, ""); sub(/END:.*/, ""); print }' <<< "$STR"
Interesting data
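The same extraction can also be sketched with shell parameter expansion alone, avoiding an external process. This is a different technique from the awk commands above: the prefix removal here stops at the first BEGIN: marker, whereas the greedy .*BEGIN: pattern in awk matches up to the last one. With a single pair of markers, as in this example, the results agree. The tmp and result names are illustrative.

```shell
STR="whatever dataBEGIN:Interesting dataEND:something else"

# Remove the shortest leading match of "*BEGIN:",
# i.e. everything up to and including the first BEGIN:
tmp=${STR#*BEGIN:}

# Remove the longest trailing match of "END:*",
# i.e. everything from the first END: onward
result=${tmp%%END:*}

printf '%s\n' "$result"
```

This should print Interesting data.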
Comparison of Methods
| Method | Indexing | Best For | Flexibility |
|---|---|---|---|
| cut | 1-based | Simple character/field extraction | Limited |
| awk | 1-based | Complex text processing | High |
| Bash expansion | 0-based | Shell scripting | Moderate |
| expr | 1-based | Basic substring operations | Low |
Conclusion
Linux provides multiple powerful methods for substring extraction, each with specific strengths. Choose cut for simple field operations, awk for complex pattern matching, Bash expansion for shell scripts, and expr for basic substring needs.
