How to match whitespace in python using regular expressions


Regular expressions, often known as RegEx, are a string of characters corresponding to letters, words, or character patterns in text strings. It implies that you may use regular expressions to match and retrieve any string pattern from the text. Search and replace procedures benefit from the usage of regular expressions. The most common application is searching for a substring that matches a pattern and substituting something else.

What are whitespace characters?

"Whitespace" refers to any letter or set of characters representing either horizontal or vertical space. Using regular expressions, the metacharacter “\s” matches whitespace characters in python.

Algorithm

  • Import re functions

  • Initialize a string.

  • Use metacharacter \s for matching whitespace in python.

  • Use the findall method, ' \s’ metacharacter, and the string as the arguments.

  • Print the result and get the matching whitespaces.

Syntax

result = re.findall(r'[\s]', str)
re.findall(): Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found.

regx = re.compile('\W')
re.compile(): We can compile a regular expression into a regex object to look for occurrences of the same pattern inside various target strings without rewriting it.

result = regx.findall(str)
The re module provides a series of methods that let us look for matches in a string:

findall: returns a list of all matches.
split: Returns a list with the string split at each match.
sub: substitutes a string for one or more matches.
  • search − returns a Match object if the string has a match somewhere.

Example 1: How to match whitespace in python

#importing re function import re #initialising a string str str= 'The Psychology of Money.' #storing the value of findall method in a variable result result = re.findall(r'[\s]', str) #printing the result print('The give string is \n',str) print('It has',len(result),'WhiteSpaces') print (result)

Output

The string in the above code has 3 whitespaces. Likewise, the following is the output of the above commands −

('The give string is \n', 'The Psychology of Money.')
('It has', 3, 'WhiteSpaces')
[' ', ' ', ' ']

Code Explanation

We import the re-module to get started to match whitespace in python using regular expressions. The next step is to initialize the variable “str” with the string from which we want to match the whitespaces. The metacharacter “\s” is used for checking whitespaces using RegEx in python.

A variable defined as “result” stores the result of the python function findall(). This function searches the entire text for all instances in which the pattern is present. It takes two parameters, the metacharacter “[\s]” and the string “str.” The final step is to print the result as the output.

Example

#importing re function import re #initializing a string str str= "Honesty is the best policy." #storing the value of findall method in a variable result result = re.findall(r'[\s]', str) #printing the result print('The given string is \n',str) print('It has',len(result),'WhiteSpaces') print (result)

Output

The string in the above code has 4 whitespaces. Likewise, the following is the output of the above commands −

('The given string is \n', 'Honesty is the best policy.')
('It has', 4, 'WhiteSpaces')
[' ', ' ', ' ', ' ']

Example

#importing re function import re #Taking input from the user and storing it in a string str str= 'Honesty is the best policy' #initialising regex, which will compile all matching word characters regx = re.compile('\W') #storing the value of findall method in a variable result result = regx.findall(str) #printing the result print('The given string is \n',str) print('It has',len(result),'WhiteSpaces') print (result)

Output

The output of the above commands is as follows −

('The given string is \n', 'Honesty is the best policy')
('It has', 4, 'WhiteSpaces')
[' ', ' ', ' ', ' ']

Code Explanation

We load the re-module to begin utilizing regular expressions in Python to match whitespace. The next step is asking the user for a string input containing the whitespaces we wish to match. When using RegEx in Python, the metacharacter "s" is used to match whitespaces.

The Python method findall is stored in a variable named "result" (). This method looks for every instance of the pattern in the text. It requires the metacharacter "[\s]" and string "str" as two arguments. The output returns the whitespaces present in the string given by the user.

Conclusion

Regular expressions are specialized text strings that provide a search pattern. They are a series of characters representing certain letters, words, or character combinations in text strings. The re-module is used for working with regular expressions. The metacharacter ‘\s’ is used for matching whitespaces in python using regular expressions.

The most common functions used in RegEx are findall(), search(), split(), and sub(). Anchors, Character Sets, and Modifiers are the key components of the structure of a regular expression.

Updated on: 20-Sep-2022

13K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements