Find all the numbers in a string using regular expressions in Python
Extracting numbers from text is a common requirement in Python data analytics. Regular expressions provide a powerful way to define patterns for matching digits, decimal numbers, and numbers with signs.
Basic Number Extraction
The re.findall() function returns all non-overlapping matches of a pattern in a string as a list of strings. The pattern r'\d+' matches one or more consecutive digits:
import re

text = "Go to 13.8 miles and then -4.112 miles."
numbers = re.findall(r'\d+', text)
print(numbers)
['13', '8', '4', '112']
Note that this pattern matches only runs of digits, so 13.8 is split into '13' and '8', and the minus sign in -4.112 is dropped.
Extracting Complete Numbers with Decimals
To capture complete decimal numbers including signs, we need a more comprehensive pattern:
import re

text = "Go to 13.8 miles and then -4.112 miles."
numbers = re.findall(r'[-+]?\d*\.?\d+', text)
print(numbers)
['13.8', '-4.112']
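re.findall() returns the matches as strings, so numeric work usually needs a conversion step. The following is a minimal sketch of that step, assuming every match is a form that float() accepts (which holds for this pattern); the variable name values is only illustrative.

```python
import re

text = "Go to 13.8 miles and then -4.112 miles."

# findall() yields strings such as '13.8' and '-4.112'
matches = re.findall(r'[-+]?\d*\.?\d+', text)

# Convert each matched string to a float for arithmetic
values = [float(m) for m in matches]

print(values)       # [13.8, -4.112]
print(sum(values))  # total distance, sign included
```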
Pattern Breakdown
Let's understand the pattern r'[-+]?\d*\.?\d+':
import re
# Pattern components:
# [-+]? - Optional plus or minus sign
# \d* - Zero or more digits before the decimal point
# \.? - Optional decimal point
# \d+ - One or more digits (required, so every match contains at least one digit)
text = "Price: $25.99, Temperature: -10.5°C, Quantity: +100"
numbers = re.findall(r'[-+]?\d*\.?\d+', text)
print("Numbers found:", numbers)
# Signed-integer pattern; note that it splits decimals such as 25.99 into separate matches
integers = re.findall(r'[-+]?\d+', text)
print("Integers only:", integers)
Numbers found: ['25.99', '-10.5', '+100']
Integers only: ['25', '99', '-10', '5', '+100']
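The same breakdown can be kept inside the pattern itself. As a sketch, the re.VERBOSE flag lets the pattern span several lines with inline comments; the constant name NUMBER is an illustrative choice, not something from the original example.

```python
import re

# With re.VERBOSE, whitespace in the pattern is ignored and
# '#' starts a comment, so each component can be annotated.
NUMBER = re.compile(r"""
    [-+]?   # optional plus or minus sign
    \d*     # zero or more digits before the decimal point
    \.?     # optional decimal point
    \d+     # one or more digits
""", re.VERBOSE)

text = "Price: $25.99, Temperature: -10.5°C, Quantity: +100"
found = NUMBER.findall(text)
print(found)  # ['25.99', '-10.5', '+100']
```

Compiling the pattern once is also convenient when the same expression is applied to many strings.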
Different Number Formats
Here are patterns for various number formats:
import re
text = "Scientific: 1.5e-4, Percentage: 95%, Currency: $1,234.56"
# Scientific notation
scientific = re.findall(r'[-+]?\d*\.?\d+[eE][-+]?\d+', text)
print("Scientific:", scientific)
# Percentages
percentages = re.findall(r'\d+\.?\d*%', text)
print("Percentages:", percentages)
# Currency (with commas)
currency = re.findall(r'\$\d{1,3}(?:,\d{3})*\.?\d*', text)
print("Currency:", currency)
Scientific: ['1.5e-4']
Percentages: ['95%']
Currency: ['$1,234.56']
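These matches still carry their formatting characters ('%', '$', and commas). One possible way to turn them into numeric values is sketched below; treating a percentage as a fraction and stripping the currency symbol are assumptions about what the caller wants, and the variable names are illustrative.

```python
import re

text = "Scientific: 1.5e-4, Percentage: 95%, Currency: $1,234.56"

# float() understands scientific notation directly
sci = [float(s) for s in re.findall(r'[-+]?\d*\.?\d+[eE][-+]?\d+', text)]

# Drop the trailing '%' and scale to a fraction (assumed convention)
pct = [float(p.rstrip('%')) / 100 for p in re.findall(r'\d+\.?\d*%', text)]

# Remove the '$' and the thousands separators before converting
cur = [
    float(c.replace('$', '').replace(',', ''))
    for c in re.findall(r'\$\d{1,3}(?:,\d{3})*\.?\d*', text)
]

print(sci, pct, cur)
```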
Comparison of Patterns
| Pattern | Matches | Example Result |
|---|---|---|
| r'\d+' | Digits only | ['13', '8', '4', '112'] |
| r'[-+]?\d+' | Signed integers | ['13', '8', '-4', '112'] |
| r'[-+]?\d*\.?\d+' | Complete numbers | ['13.8', '-4.112'] |
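The comparison can be reproduced by running each pattern against the same sentence used earlier. This is a small sketch; the dictionary labels are illustrative names for the three rows.

```python
import re

text = "Go to 13.8 miles and then -4.112 miles."

# One entry per row of the comparison
patterns = {
    "digits only": r'\d+',
    "signed integers": r'[-+]?\d+',
    "complete numbers": r'[-+]?\d*\.?\d+',
}

# Apply every pattern to the same text and collect the matches
results = {label: re.findall(p, text) for label, p in patterns.items()}

for label, matches in results.items():
    print(f"{label:17} {matches}")
```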
Conclusion
Use r'\d+' for simple digit extraction and r'[-+]?\d*\.?\d+' for complete decimal numbers with signs. Adjust the pattern to match the number formats that appear in your data.
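As one example of such an adjustment, the signed pattern treats a hyphen between digits (as in a date) as a minus sign. The sketch below adds a negative lookbehind so that a match cannot start immediately after a digit; the sample text and this particular fix are assumptions for illustration, and other inputs may need a different rule.

```python
import re

text = "Released 2020-05-17, delta -3.5"

# Original pattern: hyphens inside the date are read as signs
plain = re.findall(r'[-+]?\d*\.?\d+', text)
print(plain)     # ['2020', '-05', '-17', '-3.5']

# (?<!\d) rejects a match start that directly follows a digit,
# so the date separators are no longer captured as signs
adjusted = re.findall(r'(?<!\d)[-+]?\d*\.?\d+', text)
print(adjusted)  # ['2020', '05', '17', '-3.5']
```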
