How to replace a string in a large one-line text file in Linux?
Some software reads an entire input file into memory before processing it. If the input file contains a very long single-line string, the software may crash due to insufficient memory to hold the entire string.
We'll examine methods to replace strings in very large one-line files in Linux using approaches that never load the entire file into memory at once.
Target File
JavaScript build tools often minify all code into a single line. Consider a one-line JavaScript file called original.js with an error: it calls fliter instead of filter. We'll correct this mistake using memory-efficient techniques.
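To experiment without a real bundle, a small stand-in file can be generated first. This is only a sketch: the file name sample.js and its contents are hypothetical, chosen to imitate a one-line file with the fliter typo.

```shell
# Hypothetical stand-in for original.js: a single line with no newline at all.
printf '%s' 'var a=[1,2,3];var b=a.fliter(function(x){return x>1;});console.log(b);' > sample.js

# wc -l counts newline characters, so this file reports zero lines.
lines=$(wc -l < sample.js)

# grep -o prints every match on its own line; wc -l then counts the matches.
typos=$(grep -o 'fliter' sample.js | wc -l)

echo "newlines: $((lines)), typos: $((typos))"
```

Note that grep reads whole lines, so this check is only suitable for the small sample; on a truly huge one-line file it would hit the same memory problem the article describes.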
Using tr and sed
We can split the long line into smaller segments using tr, then substitute strings using sed, and finally rejoin the segments.
Splitting Long Lines
sed is the usual tool for in-place replacement, but it reads its input one line at a time, so with a one-line file it loads the entire file into RAM. To overcome this limitation, we break the line into multiple smaller lines, process them with sed, and join the results back together.
The key is choosing a delimiter character that doesn't exist in the content we want to modify. The tr command processes each character individually, making it memory-efficient for large files.
To replace semicolons with newlines, use this command:
$ echo "This is line one;This is line two" | tr ";" "\n"
This is line one
This is line two
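Before committing to a delimiter, it may help to check how the file actually splits on it. The sketch below assumes a hypothetical sample.js and the semicolon as delimiter. It reports the number of segments and the length of the longest one, then counts the segments that contain the typo. The search string itself must not contain the delimiter, otherwise it would be split across two segments and never match.

```shell
# Hypothetical sample; a real bundle would be far larger.
printf '%s' 'var a=[1,2,3];var b=a.fliter(function(x){return x>1;});console.log(b);' > sample.js

# Split on ';' and report the number of segments and the longest segment length.
stats=$(tr ';' '\n' < sample.js | awk '{ if (length($0) > max) max = length($0) } END { print NR, max }')
echo "segments and longest segment: $stats"

# Count the segments containing the typo; only one short segment is held in memory at a time.
hits=$(tr ';' '\n' < sample.js | grep -c 'fliter')
echo "segments containing the typo: $hits"
```

If the longest segment is still very large, a different delimiter that occurs more frequently is probably a better choice.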
If the original file contains newlines, we need to preserve them by swapping semicolons and newlines bidirectionally:
$ echo "This is line one;This is line two" | tr ";\n" "\n;" | tr "\n;" ";\n"
This is line one;This is line two
The output matches the original input, confirming our transformation preserves the file structure.
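Putting the pieces together, the whole tr and sed round trip might look like the following sketch. The file names sample.js and fixed.js are hypothetical, and the result is written to a new file rather than edited in place.

```shell
# Hypothetical sample file standing in for the real one-line bundle.
printf '%s' 'var a=[1,2,3];var b=a.fliter(function(x){return x>1;});console.log(b);' > sample.js

# 1. Swap ';' and newline so that every statement becomes its own short line.
# 2. Fix the typo with sed; in a basic regular expression only the dot needs escaping.
# 3. Swap the two characters back to restore the original layout.
tr ';\n' '\n;' < sample.js \
  | sed 's/\.fliter(/.filter(/g' \
  | tr '\n;' ';\n' > fixed.js

cat fixed.js
```

Because fliter and filter have the same length, fixed.js should be exactly as large as the input. Some sed implementations append a newline to a final segment that lacks one, which would come back as an extra semicolon after the second tr, so comparing the file sizes afterwards is a cheap sanity check.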
Using awk
The awk command provides another approach using its gsub function for string substitution. This involves two steps: setting up custom line delimiters and performing the string replacement.
Changing the Line Delimiter
We can replace the default newline character (\n) with any character not present in our target string. This is done by setting the RS (record separator) variable in awk's BEGIN block.
To use semicolons as line delimiters:
$ echo "This is line one;This is line two" | awk 'BEGIN{RS=";"}{print}'
This is line one
This is line two
The print statement appends a newline after every record. Using printf avoids this:
$ echo "This is line one;This is line two" | awk 'BEGIN{RS=";"}{printf "%s", $0}'
This is line oneThis is line two
To restore the original semicolon delimiters, we prepend them to all records except the first:
$ echo "This is line one;This is line two" | awk 'BEGIN{RS=";"}{
    if (NR != 1) {
        printf "%c", RS
    }
    printf "%s", $0
}'
This is line one;This is line two
The NR variable tracks the current record number, while RS contains our delimiter character.
Replacing the String
Now we can combine line splitting with string replacement. The gsub function in awk works similarly to sed's substitute command, taking a regular expression pattern and a replacement string.
To replace .fliter( with .filter( in our JavaScript file:
$ awk 'BEGIN{RS=";"} {
    gsub("\\.fliter\\(", ".filter(")
    if (NR != 1) {
        printf "%c", RS
    }
    printf "%s", $0
}' < original.js > fixed.js
Note that awk requires different escaping than sed: inside an awk string literal every backslash must be doubled, and because awk uses extended regular expressions, the opening parenthesis must be escaped as well as the dot.
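One way to check the result is to compare the input and output byte by byte. The sketch below runs the awk program from this section on a hypothetical sample.js and then uses cmp -l, which prints one line per differing byte. Since fliter and filter differ in exactly two positions, each corrected occurrence should account for two changed bytes.

```shell
# Hypothetical sample file; it ends with a newline, as most text files do.
printf '%s\n' 'var a=[1,2,3];var b=a.fliter(function(x){return x>1;});console.log(b);' > sample.js

# Split on ';', fix the typo in each record, and re-insert the delimiter.
awk 'BEGIN{RS=";"} {
    gsub("\\.fliter\\(", ".filter(")
    if (NR != 1) {
        printf "%c", RS
    }
    printf "%s", $0
}' < sample.js > fixed.js

# cmp exits non-zero when the files differ, which is expected here.
changed=$( { cmp -l sample.js fixed.js || true; } | wc -l )

# grep -c prints 0 and exits non-zero when nothing matches.
remaining=$(grep -c 'fliter' fixed.js || true)

echo "bytes changed: $((changed)), lines still containing the typo: $((remaining))"
```

One caveat worth checking on real data: awk does not produce an empty final record, so if the file ends with a semicolon and no trailing newline, that last semicolon is likely to be dropped. In that case cmp would report that fixed.js is shorter than the input.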
Comparison
| Method | Memory Usage | Complexity | Best For |
|---|---|---|---|
| tr + sed | Low | Medium | Simple character-based delimiters |
| awk | Low | Low | Complex pattern matching and replacement |
| sed alone | High | Low | Small files only |
Conclusion
Both tr + sed and awk provide memory-efficient methods to replace strings in extremely large one-line files. The key is splitting the content using delimiters, processing smaller segments, and reconstructing the original format. Choose awk for complex patterns or tr + sed for simpler character-based operations.
