How to extract data from a string with Python Regular Expressions?

Extracting data from a string is a common task in Python. Regular expressions (regex) provide pattern matching that lets you identify and pull out specific parts of a string, and this article shows how to use them for that purpose.

Python's built-in re module provides regex support. Its most commonly used functions for extraction are re.search(), re.findall() and re.match().

Common Regular Expression Functions

Function	Purpose	Returns
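`re.findall()`	Find all non-overlapping matches	List of strings (tuples if the pattern has several groups)
`re.search()`	Find first match anywhere in the string	Match object or None
`re.match()`	Match only at the start of the string	Match object or None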
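
The difference between the three functions in the table is easiest to see on one input. The following is a minimal sketch; the sample text and variable names are illustrative only:

```python
import re

txt = "Order 66 shipped, order 77 pending"

# re.findall() returns every non-overlapping match as a list of strings
all_nums = re.findall(r"\d+", txt)
print(all_nums)

# re.search() scans the whole string and stops at the first match
first = re.search(r"\d+", txt)
print(first.group() if first else None)

# re.match() only tries at position 0, so it finds nothing here
# because the string starts with "Order", not with a digit
at_start = re.match(r"\d+", txt)
print(at_start)
```

Because re.search() and re.match() can return None, check the result before calling .group() on it.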

Extracting Digits from a String

The following example extracts numbers from a string using the \d+ regex pattern. This pattern matches one or more consecutive digits, so each run of digits is returned as one item:

import re

# Define your text here
txt = "My ID: 89456, Ref num: 7863"

# Extract all the numbers from the string using findall()
nums = re.findall(r"\d+", txt)

# Print the result
print("Extracted numbers:", nums)
print("Number of numbers found:", len(nums))

Extracted numbers: ['89456', '7863']
Number of numbers found: 2

Extracting Email Addresses

The pattern \b[\w.-]+@[\w.-]+\.\w+\b finds simple email addresses such as username@domain.com. It is a practical approximation, not a full validator of every legal address:

import re

# Define your text here which contains email IDs
txt = "Contact us at contact@tutorialspoint.com or info@tutorix.com for support"

# Extract email IDs
emails = re.findall(r"\b[\w.-]+@[\w.-]+\.\w+\b", txt)

# Print the result
print("Found emails:", emails)
for i, email in enumerate(emails, 1):
    print(f"Email {i}: {email}")

Found emails: ['contact@tutorialspoint.com', 'info@tutorix.com']
Email 1: contact@tutorialspoint.com
Email 2: info@tutorix.com

Extracting Hashtags

Hashtags are widely used on social media platforms. The pattern #\w+ matches a # symbol followed by one or more word characters:

import re

# Define your text here which contains hashtags
txt = "Latest trending topics are: #Python #Coding #AI #MachineLearning"

# Extract hashtags using the findall() method
tags = re.findall(r"#\w+", txt)

# Print the result
print("Hashtags found:", tags)
print("Total hashtags:", len(tags))

Hashtags found: ['#Python', '#Coding', '#AI', '#MachineLearning']
Total hashtags: 4

Extracting Dates

The pattern \d{4}-\d{2}-\d{2} matches dates in YYYY-MM-DD format: four digits, a dash, two digits, a dash, and two digits. It checks only the shape of the text, so a string such as 2024-99-99 also matches:

import re

# Define your text here which contains some dates
txt = "Important events: 2023-08-15, 2025-05-29, 2024-12-01 are scheduled"

# Extract dates using the findall() method
dates = re.findall(r"\d{4}-\d{2}-\d{2}", txt)

# Print the result
print("Dates found:", dates)
for date in dates:
    year, month, day = date.split('-')
    print(f"Year: {year}, Month: {month}, Day: {day}")

Dates found: ['2023-08-15', '2025-05-29', '2024-12-01']
Year: 2023, Month: 08, Day: 15
Year: 2025, Month: 05, Day: 29
Year: 2024, Month: 12, Day: 01
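
As an alternative to splitting each date afterwards, the pattern itself can capture the parts. This is a sketch of that variation, not the approach used above: when a pattern contains several capture groups, re.findall() returns a list of tuples instead of a list of strings.

```python
import re

txt = "Important events: 2023-08-15, 2025-05-29, 2024-12-01 are scheduled"

# Three capture groups, so findall() returns one 3-tuple per match
parts = re.findall(r"(\d{4})-(\d{2})-(\d{2})", txt)
print(parts)

# Convert each captured field from string to int
as_ints = [(int(y), int(m), int(d)) for y, m, d in parts]
print(as_ints)
```

Note that converting to int drops leading zeros, so '08' becomes 8.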

Pattern Syntax Summary

Pattern	Meaning	Example
`\d+`	One or more digits	123, 45
`\w+`	One or more word characters	Python, AI
`\b`	Word boundary	Start/end of word
`.`	Any character except newline	a, 1, @
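
The effect of \b and . from the table can be checked with a short experiment. The sample strings below are made up for illustration:

```python
import re

txt = "cat concat category cat"

# Without word boundaries, "cat" is also found inside longer words
loose = re.findall(r"cat", txt)
print(len(loose))

# With \b on both sides, only the standalone word "cat" matches
strict = re.findall(r"\bcat\b", txt)
print(len(strict))

# "." matches any single character except a newline,
# so "a\nc" is not matched by a.c
dots = re.findall(r"a.c", "abc a-c a\nc")
print(dots)
```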

Conclusion

Regular expressions provide powerful pattern matching for data extraction. Use re.findall() to extract all matches from text, and combine different patterns like \d+ for digits and #\w+ for hashtags to extract specific data types efficiently.
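
One possible way to combine the patterns from this article is to keep them in a dictionary and run each one over the same text. This is only a sketch: the helper name extract_all, the dictionary keys, and the sample text are hypothetical and not part of the re module.

```python
import re

# Patterns taken from the sections above; the keys are illustrative names
PATTERNS = {
    "emails": r"\b[\w.-]+@[\w.-]+\.\w+\b",
    "hashtags": r"#\w+",
    "dates": r"\d{4}-\d{2}-\d{2}",
}

def extract_all(text):
    """Run each pattern over text and collect the matches per category."""
    return {name: re.findall(pattern, text) for name, pattern in PATTERNS.items()}

txt = "Ticket opened by dev@example.com on 2024-12-01 #urgent #billing"
result = extract_all(txt)
for name, matches in result.items():
    print(f"{name}: {matches}")
```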

Nikitasha Shrivastava

Updated on: 2026-03-24T19:15:30+05:30
