Python program to check for URL in a string
This article shows how to determine whether a string contains a URL. In Python 3, a string (str) is a sequence of Unicode characters. Given a string, we first check whether it contains a URL and then extract it using regular expressions.
Using findall() Method
We will use regular expressions to solve this problem. Python supports them through the built-in re module. The findall() function returns a list of all non-overlapping matches of the pattern in the string, scanning from left to right.
Syntax
re.findall(pattern, string)
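One detail of findall() is worth illustrating before the example: when the pattern contains a capturing group, findall() returns the group's text instead of the whole match. This is why the URL pattern used in this article relies on non-capturing groups, written (?:...). The short sketch below uses a simplified pattern and sample text that are assumptions chosen for illustration, not the article's full URL pattern.

```python
import re

# Sample text and simplified patterns are assumptions for illustration only.
text = "See https://www.python.org and http://example.com/docs"

# With a capturing group, findall() returns only the text captured by the group.
capturing = re.findall(r'(https?)://\S+', text)

# With a non-capturing group (?:...), findall() returns the whole match.
non_capturing = re.findall(r'(?:https?)://\S+', text)

print(capturing)      # ['https', 'http']
print(non_capturing)  # ['https://www.python.org', 'http://example.com/docs']
```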
Example
Here's a simple function to check for and extract URLs from a string:
import re

def checkURL(text):
    # Regular expression pattern to match http and https URLs.
    # Note: inside the character class, $-_ is a range (from '$' to '_'),
    # which is what allows characters such as '/', ':' and '?' to match.
    regex = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
    urls = re.findall(regex, text)
    return urls

# Test string with URL
text = "Visit https://www.tutorialspoint.com/python for Python tutorials"
result = checkURL(text)
print("URLs found:", result)

Output
URLs found: ['https://www.tutorialspoint.com/python']
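Because the pattern accepts characters such as '.' and ',', a URL that sits at the end of a sentence is matched together with the trailing punctuation. A minimal sketch of one way to handle this is shown below; the helper name extract_urls_trimmed and the set of characters stripped are assumptions for illustration, not part of the original example.

```python
import re

URL_REGEX = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'

def extract_urls_trimmed(text):
    """Hypothetical helper: find URLs, then drop trailing sentence punctuation."""
    # The characters stripped here are an assumed, deliberately small set.
    return [url.rstrip('.,;:!?)') for url in re.findall(URL_REGEX, text)]

text = "Read the docs at https://docs.python.org/3/, then visit https://www.python.org."

raw_urls = re.findall(URL_REGEX, text)
trimmed_urls = extract_urls_trimmed(text)

print(raw_urls)      # ['https://docs.python.org/3/,', 'https://www.python.org.']
print(trimmed_urls)  # ['https://docs.python.org/3/', 'https://www.python.org']
```

The trade-off is that a URL which legitimately ends in one of the stripped characters, such as a closing parenthesis, would be shortened as well.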
Enhanced Example with Validation
This example checks whether any URLs exist and returns appropriate feedback:
import re

def find_urls(text):
    regex = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
    urls = re.findall(regex, text)
    if urls:
        return f"URLs found: {urls}"
    else:
        return "No URLs found in the string"

# Test with URL
text1 = "Check out https://www.python.org and https://www.tutorialspoint.com"
print(find_urls(text1))

# Test without URL
text2 = "This is just a plain text without any web links"
print(find_urls(text2))

Output
URLs found: ['https://www.python.org', 'https://www.tutorialspoint.com']
No URLs found in the string
Using search() Method
The search() method finds the first occurrence of a pattern in the string. It returns a match object if found, or None if no match is found.
Syntax
re.search(pattern, string)
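The match object returned by search() carries more than the matched text: it also records where the match sits in the string. The sketch below illustrates group(), start(), end() and span(); the sample text and the simplified pattern are assumptions for illustration.

```python
import re

# Sample text and simplified pattern are assumptions for illustration only.
text = "Docs: https://docs.python.org/3/ (official)"
match = re.search(r'https?://\S+', text)

if match:
    print(match.group())             # https://docs.python.org/3/
    print(match.start(), match.end())  # 6 32
    # Slicing the original string with span() gives back the matched text.
    start, end = match.span()
    print(text[start:end] == match.group())  # True
```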
Example
Using search() to find and extract the first URL:
import re

def find_first_url(text):
    regex = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
    match = re.search(regex, text)
    if match:
        return match.group()
    else:
        return "No URL found"

# Test string
text = "Visit https://www.tutorialspoint.com and also check https://www.python.org"
first_url = find_first_url(text)
print("First URL found:", first_url)

Output
First URL found: https://www.tutorialspoint.com
Comparison
| Method | Returns | Best For |
|---|---|---|
| findall() | List of all matches | Finding all URLs in text |
| search() | First match object | Finding first URL only |
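When the goal is only to check whether a URL is present, a yes/no answer is enough, and search() fits well because it stops at the first match. A minimal sketch follows; the helper name contains_url and the precompiled URL_PATTERN constant are assumptions introduced for illustration.

```python
import re

# Compile once so that repeated checks reuse the same pattern object.
URL_PATTERN = re.compile(
    r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
)

def contains_url(text):
    """Hypothetical helper: True if text contains at least one http(s) URL."""
    return URL_PATTERN.search(text) is not None

print(contains_url("Visit https://www.tutorialspoint.com"))  # True
print(contains_url("No links here"))                         # False
```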
Complete Example
Here's a comprehensive function that demonstrates both methods:
import re

def analyze_urls(text):
    regex = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'

    # Find all URLs
    all_urls = re.findall(regex, text)

    # Find first URL
    first_match = re.search(regex, text)
    first_url = first_match.group() if first_match else None

    print(f"Text: {text}")
    print(f"All URLs: {all_urls}")
    print(f"First URL: {first_url}")
    print(f"Total URLs found: {len(all_urls)}")
    print("-" * 50)

# Test cases
test_texts = [
    "Visit https://www.tutorialspoint.com for tutorials",
    "Check https://www.python.org and https://docs.python.org/3/",
    "No URLs in this plain text"
]

for text in test_texts:
    analyze_urls(text)

Output
Text: Visit https://www.tutorialspoint.com for tutorials
All URLs: ['https://www.tutorialspoint.com']
First URL: https://www.tutorialspoint.com
Total URLs found: 1
--------------------------------------------------
Text: Check https://www.python.org and https://docs.python.org/3/
All URLs: ['https://www.python.org', 'https://docs.python.org/3/']
First URL: https://www.python.org
Total URLs found: 2
--------------------------------------------------
Text: No URLs in this plain text
All URLs: []
First URL: None
Total URLs found: 0
--------------------------------------------------
Conclusion
Use re.findall() when you need to extract all URLs from a string, and re.search() when you only need the first URL. Both methods rely on the same regular expression, which in this article matches only URLs that begin with http:// or https://.
