How to replace a string in a large one-line text file in Linux?


Some software reads an entire input file into memory before processing it. If the input file contains a very long string, the software may crash if there isn't enough memory to hold the entire string.

We’ll look at ways to replace a string in a very large one-liner file in Linux. Some applications cannot handle very large one-liners, so we’ll examine our options.

Target File

Some modern JavaScript frameworks compress all of the code onto a single line. Let’s say we have a one-liner of JavaScript code called original.js with an error in it. It calls “fliter” instead of “filter”. We’ll correct this mistake in the following sections.
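To follow along without a real minified bundle, a small stand-in file can be created. The content below is hypothetical (the article does not show the actual file), and it is written without a trailing newline so that it is a true one-liner.

```shell
# Hypothetical stand-in for original.js: one line, no trailing newline.
printf '%s' 'var a=[1,2,3];var b=a.fliter(function(x){return x>1;});console.log(b);' > original.js

# wc -l counts newline characters, so it reports 0 for this file.
wc -l < original.js

# wc -c reports the size in bytes.
wc -c < original.js
```

A real minified file would be far larger, but the structure (many statements separated by ";" on one line) is the same.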

Using tr and sed

We can split the long line into many shorter lines using tr, replace the string in those shorter lines using sed, and then join the lines back together with tr.

Splitting Long Lines

We usually use sed -i to replace a string in a file, but sed reads a whole line into memory at a time, so with a one-line file it will try to load the entire file into RAM. To overcome this, let's break our line into multiple smaller lines, pass them to sed, and finally join the results back together.

In Linux, by default, lines are separated with the newline character (\n). In our case, we replace a chosen character with \n and feed the result into sed. We have to choose a character that does not occur in the string we wish to change. Also, the resulting lines should be relatively short after the replacement.

If we want to split the one line into multiple lines, we can use a command called tr, which translates characters one at a time. So, for example, if we wanted to replace every space with a newline, we could type tr ' ' '\n', where \n represents the newline character.

To replace ; with \n, use the following command:

Command

$ echo "This is line one;This is line two" | tr ";" "\n"

Output

This is line one
This is line two
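Before settling on ";" as the split character, it can help to check how often it occurs and how long the longest resulting piece would be. The following is a rough sketch under stated assumptions: sample.js and its content are hypothetical, and ";" is just the candidate being examined.

```shell
# Hypothetical one-line sample; substitute the real file when checking for real.
printf '%s' 'var a=[1,2,3];var b=a.fliter(function(x){return x>1;});console.log(b);' > sample.js

# Delete every character except ";" and count what is left.
delim_count=$(tr -cd ';' < sample.js | wc -c | tr -d ' ')

# Split on ";" and track the length of the longest piece.
longest=$(tr ';' '\n' < sample.js | awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }')

echo "delimiters: $delim_count, longest piece: $longest"
```

If the count is zero, or the longest piece is still close to the size of the whole file, another character is probably a better choice.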

In case there are any newlines in our document, a one-way replacement would lose track of them. Instead, we should swap the two characters: replace ";" with "\n" and "\n" with ";". Doing this allows us to restore the original newlines afterwards. A single call, tr ";\n" "\n;", performs the swap, and running the same swap a second time restores the input.

We're adding newlines to our input, which means we need to revert the change once the replacement is done by calling tr again with the two sets swapped back. So let's say we want to replace every ; with \n and every \n with ;, and then undo it. We'd write something like this:

Command

$ echo "This is line one;This is line two" | tr ";\n" "\n;" | tr "\n;" ";\n"

Output

This is line one;This is line two

We can see that the output is identical to the input.
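Putting the pieces together, the split, replace, and join steps might be chained as in the sketch below. The file content is a hypothetical stand-in for original.js, and fixed.js is an assumed output name. In sed's basic regular expressions, "(" is a literal character, so only the dot needs escaping.

```shell
# Hypothetical one-line input ending with ";" and no trailing newline.
printf '%s' 'var a=[1,2,3];var b=a.fliter(function(x){return x>1;});console.log(b);' > original.js

# 1. Swap ";" and newline so sed sees many short lines.
# 2. Fix the misspelled call on whichever short line contains it.
# 3. Swap the two characters back to restore the original layout.
tr ';\n' '\n;' < original.js \
  | sed 's/\.fliter(/.filter(/g' \
  | tr '\n;' ';\n' > fixed.js

cat fixed.js
```

One caveat: if the last piece does not end with the split character, some sed implementations may append a newline, which the final tr would then turn into a stray ";". Comparing the byte counts of the two files with wc -c is a cheap way to catch this.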

Using awk

There are other programs than sed that can replace strings in files. We can use awk and its gsub function to perform these steps. This is a two-step procedure: first we set up awk's line delimiter, then we substitute the string.

Changing the Line Delimiter

We can replace the default line delimiter (the newline character, \n) with any character that isn't part of the string we're trying to replace. For example, if the words in our input were separated by underscores, we could use an underscore (_) as the delimiter to process the input one word at a time.

To use a different line delimiter in awk, we'll set the RS (record separator) variable to the desired character within the BEGIN block. If we want to use semicolons as our line delimiters, for example, we'd set RS=";". Let's take a look at an example:

Command

$ echo "This is line one;This is line two" | awk 'BEGIN{RS=";"}{print}'

Output

This is line one
This is line two

As mentioned in the last section, we must produce output that matches the input. Even though awk splits lines using the “;” character, the result has to match the original input. We can see that awk’s print statement adds a newline after each record that wasn't in the original input.

Let’s use the printf function instead, so no newlines are added −

Command

$ echo "This is line one;This is line two" | awk 'BEGIN{RS=";"}{printf "%s", $0}'

Output

This is line oneThis is line two

We can see, we are only missing the “;” character. We know that all lines start with a line delimiter, except the first one. So, let’s prepend the “;” character to all lines unless it is the first one −

Command

$ echo "This is line one;This is line two" | awk 'BEGIN{RS=";"}{ if (NR != 1) { printf "%c", RS } printf "%s", $0 }'

Output

This is line one;This is line two

We use the NR variable (the number of the current record) to determine where we are in the input, and the RS variable to print the delimiter back out in front of every record except the first.

Replacing the String

We've seen how to use awk to split the input into records using any character other than the newline. Now let's see how to replace a string in each record.

To replace a string using awk, we'll use the gsub function. This function works similarly to sed's substitute command. In its basic form, it takes two parameters: the first one is a regular expression and the second one is what we want to put in place of each match. When no third parameter is given, it operates on the current record, $0. We'll add it to the code from the previous example.

Let’s replace “.fliter(” with “.filter(”.

$ awk 'BEGIN{RS=";"} {
   gsub("\\.fliter\\(", ".filter(")
   if (NR != 1) {
      printf "%c", RS
   }
   printf "%s", $0
   }' < original.js > fixed.js

Notice that escaping works differently here than in sed. Because awk uses extended regular expressions, we also need to escape the "(" character. And because the pattern is written as an awk string, each escape needs two backslashes.
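A quick way to check the result is to run the command on a small file and compare the input and output. The sketch below assumes a hypothetical original.js that ends with a trailing newline. The replacement has the same length as the misspelled string, so the two files should have the same byte count.

```shell
# Hypothetical one-line input with a trailing newline.
printf '%s\n' 'var a=[1,2,3];var b=a.fliter(function(x){return x>1;});console.log(b);' > original.js

# Split on ";", fix the call in each record, and re-insert the delimiters.
awk 'BEGIN{RS=";"} {
   gsub("\\.fliter\\(", ".filter(")
   if (NR != 1) {
      printf "%c", RS
   }
   printf "%s", $0
   }' < original.js > fixed.js

# The misspelling should be gone and the sizes should match.
grep -c 'fliter' fixed.js
wc -c < original.js
wc -c < fixed.js
```

The size comparison is worth keeping: if the input ends with the delimiter itself and has no trailing newline, awk does not produce an empty final record, so the last ";" may be missing from the output.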

Conclusion

We looked at two ways to replace a string inside an extremely long one-liner.

First, we split the line with tr, edited the pieces with sed, and joined them again. Then we did the same job with awk alone by changing its record separator and using gsub.

Updated on: 01-Dec-2022
