How to strip spaces/tabs/newlines using Python regular expressions?
Python's re module provides several ways to strip whitespace characters, including spaces, tabs, and newlines, from strings. Which one to use depends on whether you want to remove all whitespace, only the whitespace at the ends, or only certain characters.
This article explains how to strip various types of whitespace using regular expressions, covering the following methods:
- Stripping All Whitespace Using re.sub()
- Stripping Leading/Trailing Whitespace
- Splitting on Whitespace Characters
- Stripping Specific Whitespace Types
Stripping All Whitespace Using re.sub()
The re.sub() function replaces every match of a pattern with a replacement string. The pattern \s+ matches one or more consecutive whitespace characters (spaces, tabs, newlines), so replacing each match with an empty string removes all whitespace, while replacing it with a single space collapses each run into one space.
Example
Here's how to remove all whitespace from a string, or collapse extra whitespace into single spaces:
import re
text = " Hello \t World \n Python "
# Remove all whitespace
no_whitespace = re.sub(r'\s+', '', text)
print("No whitespace:", repr(no_whitespace))
# Replace multiple whitespace with single space
single_space = re.sub(r'\s+', ' ', text).strip()
print("Single space:", repr(single_space))
No whitespace: 'HelloWorldPython'
Single space: 'Hello World Python'
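It can help to see exactly which characters \s covers. The sketch below uses a made-up sample string (an assumption for illustration) containing a carriage return, a form feed, and a non-breaking space. For str patterns \s is Unicode-aware by default, while the re.ASCII flag restricts it to [ \t\n\r\f\v]:

```python
import re

# Hypothetical sample: carriage return + newline, form feed (\x0c),
# and a non-breaking space (\xa0)
text = "a\r\nb\x0cc\xa0d"

# Default behaviour for str patterns: \s also matches Unicode
# whitespace such as the non-breaking space
unicode_stripped = re.sub(r'\s+', '', text)
print("Unicode-aware:", repr(unicode_stripped))

# With re.ASCII, \s only matches [ \t\n\r\f\v], so \xa0 is kept
ascii_stripped = re.sub(r'\s+', '', text, flags=re.ASCII)
print("ASCII only:", repr(ascii_stripped))
```

If the input may contain non-breaking spaces that should be preserved, the re.ASCII variant is probably the safer choice.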
Stripping Leading/Trailing Whitespace
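To remove whitespace only from the beginning or end of a string, anchor the \s+ pattern with ^ (start of string) or $ (end of string). Joining the two anchored patterns with | strips both ends in a single call.
Example
The following example strips whitespace from the start, the end, and both ends: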
import re
text = " \t Hello World \n "
# Strip leading whitespace
leading_stripped = re.sub(r'^\s+', '', text)
print("Leading stripped:", repr(leading_stripped))
# Strip trailing whitespace
trailing_stripped = re.sub(r'\s+$', '', text)
print("Trailing stripped:", repr(trailing_stripped))
# Strip both leading and trailing
both_stripped = re.sub(r'^\s+|\s+$', '', text)
print("Both stripped:", repr(both_stripped))
Leading stripped: 'Hello World \n '
Trailing stripped: ' \t Hello World'
Both stripped: 'Hello World'
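The anchors above apply to the string as a whole. For multi-line text, a common variation is to trim the end of every line by adding the re.MULTILINE flag, which lets $ match before each newline as well. The sample text below is an assumption for illustration; using [ \t] rather than \s keeps the newlines themselves out of the match:

```python
import re

# Hypothetical three-line sample with trailing spaces and tabs
text = "alpha  \nbeta\t\n  gamma  "

# Without re.MULTILINE, only the end of the whole string is trimmed
end_only = re.sub(r'[ \t]+$', '', text)
print("End only:", repr(end_only))

# With re.MULTILINE, $ also matches before every newline,
# so trailing spaces and tabs are removed from each line
per_line = re.sub(r'[ \t]+$', '', text, flags=re.MULTILINE)
print("Per line:", repr(per_line))
```

Leading indentation (as on the last line) is left untouched because the pattern is anchored only at the line end.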
Splitting on Whitespace Characters
The re.split() function can split strings on various whitespace patterns. The \s+ pattern splits on one or more consecutive whitespace characters.
Example
Here's how to split text on different whitespace patterns:
import re
text = "Python\tis\n\nawesome programming"
# Split on any whitespace
words = re.split(r'\s+', text)
print("Split on whitespace:", words)
# Split on newlines only
lines = re.split(r'\n+', text)
print("Split on newlines:", lines)
# Split on tabs only
tab_split = re.split(r'\t+', text)
print("Split on tabs:", tab_split)
Split on whitespace: ['Python', 'is', 'awesome', 'programming']
Split on newlines: ['Python\tis', 'awesome programming']
Split on tabs: ['Python', 'is\n\nawesome programming']
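One detail worth knowing: when the string starts or ends with whitespace, re.split() returns empty strings at the edges of the result. The sketch below, using an assumed sample string, shows that behaviour and two possible ways around it:

```python
import re

# Hypothetical sample with leading and trailing whitespace
text = "  Python\tis \n awesome  "

# Splitting directly leaves empty strings at both ends
raw_parts = re.split(r'\s+', text)
print("Raw split:", raw_parts)

# Option 1: strip the ends before splitting
stripped_parts = re.split(r'\s+', text.strip())
print("Strip then split:", stripped_parts)

# Option 2: collect runs of non-whitespace instead of splitting
tokens = re.findall(r'\S+', text)
print("findall:", tokens)
```

Both options yield the same list of words here; re.findall() with \S+ avoids the extra strip() call.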
Stripping Specific Whitespace Types
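You can target specific whitespace characters by matching them literally or by listing them in a character class. Common patterns include a literal space (or [ ]), \t for tabs, and \n for newlines; a class such as [ \t] matches spaces and tabs but not newlines.
Example
The following example demonstrates stripping specific whitespace types: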
import re
text = " Hello\t\tWorld\n\nPython "
# Remove only spaces
no_spaces = re.sub(r' +', '', text)
print("No spaces:", repr(no_spaces))
# Remove only tabs
no_tabs = re.sub(r'\t+', '', text)
print("No tabs:", repr(no_tabs))
# Remove only newlines
no_newlines = re.sub(r'\n+', '', text)
print("No newlines:", repr(no_newlines))
# Collapse runs of spaces and tabs into a single space, keep newlines
no_space_tabs = re.sub(r'[ \t]+', ' ', text)
print("No space/tabs:", repr(no_space_tabs))
No spaces: 'Hello\t\tWorld\n\nPython'
No tabs: ' HelloWorld\n\nPython '
No newlines: ' Hello\t\tWorldPython '
No space/tabs: ' Hello World\n\nPython '
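These targeted patterns can also be combined into a small clean-up routine. The following is a minimal sketch, not a definitive implementation: the function name tidy and the choice to keep at most one blank line between paragraphs are assumptions made for illustration.

```python
import re

# Precompiled patterns (names are hypothetical)
SPACES_TABS = re.compile(r'[ \t]+')   # runs of spaces and tabs
EXTRA_NEWLINES = re.compile(r'\n{3,}')  # three or more newlines in a row

def tidy(text):
    # Collapse each run of spaces/tabs into a single space
    text = SPACES_TABS.sub(' ', text)
    # Drop the single space left at the start or end of any line
    text = re.sub(r'^ | $', '', text, flags=re.MULTILINE)
    # Keep at most one blank line between paragraphs
    return EXTRA_NEWLINES.sub('\n\n', text)

cleaned = tidy(" Hello\t\tWorld \n\n\n\nPython\t")
print(repr(cleaned))
```

Compiling the patterns once is optional here; it mainly helps readability when the same patterns are reused across many calls.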
Common Whitespace Patterns
| Pattern | Matches | Use Case |
|---|---|---|
| \s | Any whitespace character | General whitespace handling |
| \s+ | One or more whitespace characters | Runs of consecutive whitespace |
| [ \t] | Spaces and tabs only | Preserve line breaks |
| \n+ | One or more newlines | Paragraph separation |
Conclusion
Regular expressions provide flexible control over whitespace stripping in Python. Use re.sub() to replace patterns, re.split() to split on whitespace, and specific character classes to target particular whitespace types. Choose the appropriate pattern based on your specific whitespace handling needs.
