How to split strings on multiple delimiters with Python?
Problem
You need to split a string into fields, but the delimiters aren’t consistent throughout the string.
Solution
There are several ways to split a string on multiple delimiters in Python. The simplest is the built-in split() method; however, it is meant for simple cases with a single, fixed separator.
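To see why split() only covers the simple cases, here is a minimal sketch using the sample string from the example in this article. The variable names by_comma and by_space are illustrative only.

```python
# Sample string reused from the example in this article.
tennis_greats = 'Roger-federer, Rafael nadal, Novak Djokovic,Andy murray'

# str.split() takes one literal separator, so only the commas are handled;
# the hyphen and the spaces stay inside the fields.
by_comma = tennis_greats.split(',')
print(by_comma)
# ['Roger-federer', ' Rafael nadal', ' Novak Djokovic', 'Andy murray']

# With no argument it splits on runs of whitespace and leaves '-' and ',' alone.
by_space = tennis_greats.split()
print(by_space)
# ['Roger-federer,', 'Rafael', 'nadal,', 'Novak', 'Djokovic,Andy', 'murray']
```

Neither call separates all eight names, because each call can only use one kind of separator at a time.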
re.split() is more flexible than the plain split() method when the string is more complex.
With re.split() you can specify multiple patterns for the separator. In the solution shown here, the separator is a hyphen (-), a comma (,), or a whitespace character, followed by any amount of extra whitespace. The full pattern syntax is described in the documentation of Python's re module.
Whenever that pattern is found, the entire match becomes the delimiter between the fields on either side of the match.
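The effect of "the entire match becomes the delimiter" can be illustrated with a small sketch that compares the pattern with and without the trailing \s*. The names loose and tight are illustrative only.

```python
import re

tennis_greats = 'Roger-federer, Rafael nadal, Novak Djokovic,Andy murray'

# Without the trailing \s*, the ', ' sequence is two separate one-character
# matches, so an empty string appears between the comma and the space.
loose = re.split(r'[-,\s]', tennis_greats)
print(loose)
# ['Roger', 'federer', '', 'Rafael', 'nadal', '', 'Novak', 'Djokovic', 'Andy', 'murray']

# With \s*, the comma and the whitespace after it form a single match,
# and that whole match is treated as one delimiter.
tight = re.split(r'[-,\s]\s*', tennis_greats)
print(tight)
# ['Roger', 'federer', 'Rafael', 'nadal', 'Novak', 'Djokovic', 'Andy', 'murray']
```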
Extract only the text between the delimiters (no delimiters).
Example
import re

tennis_greats = 'Roger-federer, Rafael nadal, Novak Djokovic,Andy murray'

# -----------------------------------------------------------------------------
# Scenario 1 - Output the players
# Input - String with multiple delimiters (- , white space)
# Code - Specify the delimiters in []
# -----------------------------------------------------------------------------
players = re.split(r'[-,\s]\s*', tennis_greats)
print(f"The output is - {players}")
Output
The output is - ['Roger', 'federer', 'Rafael', 'nadal', 'Novak', 'Djokovic', 'Andy', 'murray']
Extract the text between the delimiters along with the delimiters.
Example
import re

tennis_greats = 'Roger-federer, Rafael nadal, Novak Djokovic,Andy murray'

# -----------------------------------------------------------------------------
# Scenario 2 - Output the players and the delimiters
# Input - String with multiple delimiters (- , white space)
# Code - Put the delimiters in a capture group (), separated by pipe (|);
#        the captured delimiter is then included in the result
# -----------------------------------------------------------------------------
players = re.split(r'(-|,|\s)\s*', tennis_greats)
print(f"The output is - {players}")
Output
The output is - ['Roger', '-', 'federer', ',', 'Rafael', ' ', 'nadal', ',', 'Novak', ' ', 'Djokovic', ',', 'Andy', ' ', 'murray']
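When the delimiters are captured, fields and delimiters alternate in the result, so they can be separated again with slicing. The following is a minimal sketch under that assumption. The names fields, names, delimiters and no_capture are illustrative and not part of the original example.

```python
import re

tennis_greats = 'Roger-federer, Rafael nadal, Novak Djokovic,Andy murray'

# The capture group keeps each matched delimiter in the result list.
fields = re.split(r'(-|,|\s)\s*', tennis_greats)

# Even positions hold the text, odd positions hold the captured delimiters.
names = fields[::2]
delimiters = fields[1::2]
print(names)
# ['Roger', 'federer', 'Rafael', 'nadal', 'Novak', 'Djokovic', 'Andy', 'murray']
print(delimiters)
# ['-', ',', ' ', ',', ' ', ',', ' ']

# A non-capturing group (?:...) groups the alternatives without keeping them.
no_capture = re.split(r'(?:-|,|\s)\s*', tennis_greats)
print(no_capture == names)
# True
```

The non-capturing form is useful when parentheses are needed only for grouping and the delimiters themselves are not wanted in the output.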