How to Match Whitespace in Python Using Regular Expressions
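Regular expressions (regex) are a powerful tool for matching patterns in text. In Python's re module, the metacharacter \s matches any single whitespace character: spaces, tabs, newlines, carriage returns, form feeds, and vertical tabs. For str patterns it also matches other Unicode whitespace, such as the non-breaking space, unless the re.ASCII flag is set.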
What are Whitespace Characters?
Whitespace characters represent horizontal or vertical space in text. Common whitespace characters include:
Space ( )
Tab (\t)
Newline (\n)
Carriage return (\r)
Form feed (\f)
Vertical tab (\v)
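As a quick check of the list above, the following minimal sketch confirms that \s matches each of the six characters. It also shows one assumption-dependent detail: with a str pattern and no flags, \s additionally matches Unicode whitespace such as the non-breaking space, while the re.ASCII flag restricts it to the six characters listed. The variable names are illustrative only.

```python
import re

# The six ASCII whitespace characters listed above
ascii_whitespace = " \t\n\r\f\v"
matched = re.findall(r"\s", ascii_whitespace)
print(len(matched))  # 6

# A non-breaking space is Unicode whitespace, not ASCII whitespace
nbsp = "\u00a0"
unicode_hit = re.search(r"\s", nbsp) is not None
ascii_hit = re.search(r"\s", nbsp, re.ASCII) is not None
print(unicode_hit, ascii_hit)  # True False
```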
Syntax
Here are the main patterns and functions for matching whitespace:
# Using \s metacharacter
result = re.findall(r'\s', text)
# Using character class
result = re.findall(r'[\s]', text)
# Using \W (any non-word character: whitespace, but also punctuation)
regx = re.compile(r'\W')
result = regx.findall(text)
Key Functions
findall() - Returns a list of all matches
search() - Returns a Match object for the first match, or None if there is none
split() - Splits the string at each match
sub() - Replaces each match with a string
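The examples in this article focus on findall(), so here is a short sketch of the other three functions applied to whitespace. It uses \s+ (one or more whitespace characters) so that a run of mixed whitespace is treated as a single match; the sample string is made up for illustration.

```python
import re

text = "Hello \t World\nPython"

# search(): locate the first run of whitespace
first = re.search(r"\s+", text)
print(first.span())  # (5, 8)

# split(): break the string on runs of whitespace
words = re.split(r"\s+", text)
print(words)  # ['Hello', 'World', 'Python']

# sub(): collapse each run of whitespace into a single space
normalized = re.sub(r"\s+", " ", text)
print(normalized)  # Hello World Python
```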
Example 1: Basic Whitespace Matching
import re
text = 'The Psychology of Money.'
result = re.findall(r'\s', text)
print('The given string is:', text)
print('It has', len(result), 'whitespaces')
print('Whitespaces found:', result)
The given string is: The Psychology of Money.
It has 3 whitespaces
Whitespaces found: [' ', ' ', ' ']
Example 2: Using Character Class
import re
text = "Honesty is the best policy."
result = re.findall(r'[\s]', text)
print('The given string is:', text)
print('It has', len(result), 'whitespaces')
print('Whitespaces found:', result)
The given string is: Honesty is the best policy.
It has 4 whitespaces
Whitespaces found: [' ', ' ', ' ', ' ']
Example 3: Using Compiled Regex
import re
text = 'Honesty is the best policy'
# \W matches any non-word character: whitespace, but also punctuation
regx = re.compile(r'\W')
result = regx.findall(text)
print('The given string is:', text)
print('It has', len(result), 'non-word characters')
print('Non-word characters found:', result)
The given string is: Honesty is the best policy
It has 4 non-word characters
Non-word characters found: [' ', ' ', ' ', ' ']
Because this string contains no punctuation, every non-word character found is a space.
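Note that \W is not a whitespace matcher as such. The sketch below reuses the sentence from Example 2, which ends with a period, to show how \W and \s diverge once punctuation is present. The variable names are illustrative.

```python
import re

text = "Honesty is the best policy."

non_word = re.findall(r"\W", text)   # spaces and the trailing period
whitespace = re.findall(r"\s", text) # spaces only

print(len(non_word))    # 5
print(len(whitespace))  # 4
print(non_word[-1])     # .
```

When the goal is to count or locate whitespace only, \s is the safer choice.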
Example 4: Different Types of Whitespace
import re
text = "Hello\tWorld\nPython\r\nRegex"
whitespaces = re.findall(r'\s', text)
print('Text with various whitespaces:', repr(text))
print('Total whitespaces found:', len(whitespaces))
print('Whitespace characters:', whitespaces)
Text with various whitespaces: 'Hello\tWorld\nPython\r\nRegex'
Total whitespaces found: 4
Whitespace characters: ['\t', '\n', '\r', '\n']
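If a per-type breakdown is more useful than a flat list, one possible approach is to feed the findall() result into collections.Counter. This is a sketch that reuses the string from Example 4; using Counter is an addition here, not part of the original example.

```python
import re
from collections import Counter

text = "Hello\tWorld\nPython\r\nRegex"

# Tally each distinct whitespace character found by \s
counts = Counter(re.findall(r"\s", text))

print(counts["\n"])  # 2
print(counts["\t"])  # 1
print(counts["\r"])  # 1
```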
Comparison of Methods
| Pattern | Matches | Best For |
|---|---|---|
| \s | All whitespace characters | General whitespace matching |
| [\s] | Same as \s (character class) | When combining with other patterns |
| \W | Non-word characters (includes whitespace) | Matching punctuation and whitespace |
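On its own, [\s] behaves exactly like \s. The character-class form becomes useful when other characters are added inside the brackets. As a small sketch, the pattern [\s,]+ below treats any run of whitespace and commas as one delimiter. The sample string is made up for illustration.

```python
import re

text = "red, green\tblue,yellow\npurple"

# Split on any run of whitespace and/or commas
colors = re.split(r"[\s,]+", text)
print(colors)  # ['red', 'green', 'blue', 'yellow', 'purple']
```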
Conclusion
Use the \s metacharacter to match whitespace characters in Python regular expressions. The re.findall() function returns all matches as a list, making it easy to count and analyze whitespace patterns in text.
