How to Search and Replace text in Python?


Problem

You want to search for and replace a text pattern in a string.

For simple literal patterns, the str.replace() method is usually the simplest solution.

Example

def sample():
    yield 'Is'
    yield 'USA'
    yield 'Colder'
    yield 'Than'
    yield 'Canada?'

text = ' '.join(sample())
print(f"Output \n {text}")

Output

Is USA Colder Than Canada?

Let us first see how to search within the text.

# check whether the whole string is exactly 'USA'
print(f"Output \n {text == 'USA'}")

Output

False

We can search for the text using the basic string methods, such as str.find(), str.endswith(), str.startswith().

# text starts with
print(f"Output \n {text.startswith('Is')}")

Output

True

# text ends with
print(f"Output \n {text.endswith('Canada?')}")

Output

True

# search text with find
print(f"Output \n {text.find('USA')}")

Output

3
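
Note that str.find() returns -1 when the substring is absent, so its result should not be used directly as a true/false test. The following is a minimal sketch of that behaviour; the variable names are illustrative and not part of the original example.

```python
text = 'Is USA Colder Than Canada?'

# find() returns the index of the first match, or -1 when nothing is found
found_at = text.find('USA')
missing_at = text.find('Mexico')

# the in operator is a simpler membership test when the position is not needed
has_usa = 'USA' in text
has_mexico = 'Mexico' in text

print(found_at, missing_at, has_usa, has_mexico)
```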

If the pattern to search for is more complicated, we can use regular expressions via the re module.

import re

# Let us create a date in string format
date1 = '22/10/2020'
# Let us check whether the text starts with a date-like pattern: digits/digits/digits.
# \d+ - match one or more digits
if re.match(r'\d+/\d+/\d+', date1):
    print('yes')
else:
    print('no')

Output

yes
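
One point worth keeping in mind is that re.match() only tries to match at the beginning of the string. When the pattern may appear anywhere in the text, re.search() or re.findall() is usually the better fit. A small sketch, using an illustrative sentence and variable names of our own:

```python
import re

sentence = 'Date is 22/11/2020. Tomorrow is 23/11/2020.'
date_pattern = r'\d+/\d+/\d+'

# match() only tries at the start of the string, so this returns None
at_start = re.match(date_pattern, sentence)

# search() scans the string and returns the first match found anywhere
first = re.search(date_pattern, sentence)

# findall() returns every non-overlapping match as a list of strings
all_dates = re.findall(date_pattern, sentence)

print(at_start)
print(first.group())
print(all_dates)
```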

Now, coming back to replacing text. If the text to find and its replacement are both simple literals, use str.replace().

Example

print(f"Output \n {text.replace('USA', 'Australia')}")

Output

Is Australia Colder Than Canada?
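
Two details of str.replace() are easy to miss: it returns a new string rather than modifying the original, and it accepts an optional third argument that limits the number of replacements. A brief sketch with an illustrative input string:

```python
text = 'USA, USA, USA'

# replace() returns a new string; the original is left unchanged
all_replaced = text.replace('USA', 'Canada')

# the optional third argument limits how many occurrences are replaced
first_only = text.replace('USA', 'Canada', 1)

print(all_replaced)
print(first_only)
print(text)
```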

For more complicated search-and-replace patterns, we can use the sub() function in the re module.

The first argument to sub() is the pattern to match, the second is the replacement, and the third is the string to process.

In the example below, we find dates written as dd/mm/yyyy and rewrite them as yyyy-dd-mm. Backslashed digits such as \3 refer to capture group numbers in the pattern.

import re
sentence = 'Date is 22/11/2020. Tommorow is 23/11/2020.'
# \1 is the day, \2 the month and \3 the year
replaced_text = re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', sentence)
print(f"Output \n {replaced_text}")

Output

Date is 2020-22-11. Tommorow is 2020-23-11.
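
Numbered groups can become hard to follow as patterns grow. As a possible alternative, the sketch below uses named groups with the \g<name> syntax, and also passes a function as the replacement, which sub() calls once per match. The group names and the to_iso helper are our own illustrative choices.

```python
import re

sentence = 'Date is 22/11/2020.'

# named groups make the replacement template easier to read than \1, \2, \3
pattern = r'(?P<day>\d+)/(?P<month>\d+)/(?P<year>\d+)'
by_name = re.sub(pattern, r'\g<year>-\g<day>-\g<month>', sentence)

# the replacement can also be a function that receives each match object
def to_iso(match):
    # hypothetical helper: builds a yyyy-mm-dd string from the named groups
    return f"{match.group('year')}-{match.group('month')}-{match.group('day')}"

iso = re.sub(pattern, to_iso, sentence)

print(by_name)
print(iso)
```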

If the same pattern is used repeatedly, another approach is to compile the expression once with re.compile() and reuse the resulting pattern object, which can improve performance.

Example

pattern = re.compile(r'(\d+)/(\d+)/(\d+)')
replaced_pattern = pattern.sub(r'\3-\1-\2', sentence)
print(f"Output \n {replaced_pattern}")

Output

Date is 2020-22-11. Tommorow is 2020-23-11.
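
Compiling pays off mainly when one pattern is applied to many strings. The following sketch assumes a small list of input lines (made up for illustration) and reuses a single compiled pattern for all of them.

```python
import re

# compile once, outside the loop
date_pattern = re.compile(r'(\d+)/(\d+)/(\d+)')

# illustrative input lines
lines = ['Start 01/02/2020', 'no date here', 'End 03/04/2021']

# reuse the same compiled pattern for every line
converted = [date_pattern.sub(r'\3-\1-\2', line) for line in lines]

print(converted)
```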

re.subn() works like sub() but returns a tuple containing the new string and the number of substitutions made.

Example

output, count = pattern.subn(r'\3-\1-\2', sentence)
print(f"Output \n {output}")

Output

Date is 2020-22-11. Tommorow is 2020-23-11.

Example

print(f"Output \n {count}")

Output

2
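
The count returned by subn() can be used, for instance, to tell whether a string was changed at all. This is only a sketch; convert_dates is a hypothetical helper name, not part of the re module.

```python
import re

pattern = re.compile(r'(\d+)/(\d+)/(\d+)')

def convert_dates(line):
    # hypothetical helper: returns the rewritten line and whether any date was replaced
    new_line, count = pattern.subn(r'\3-\1-\2', line)
    return new_line, count > 0

print(convert_dates('Due 22/11/2020'))
print(convert_dates('no dates'))
```
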
raja
Published on 10-Nov-2020 05:22:04