The following code extracts the URL from an HTML link using a Python regex:
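    import re

    # Sample HTML link to search
    s = '''<a href="http://www.santa.com">Link</a>'''

    # Match href=, an optional opening quote, then everything up to a quote, space, or >
    match = re.search(r'href=[\'"]?([^\'" >]+)', s)
    if match:
        print(match.group(0))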
This gives the output:
href="http://www.santa.com
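
Note that `group(0)` returns the entire match, including the `href=` prefix, while `group(1)` returns only the text captured by the parentheses. The following is a minimal sketch of that difference, assuming the input is an anchor tag whose `href` points at the same URL. The names `HREF_RE` and `extract_href` and the sample string are illustrative assumptions, not part of the original code.

```python
import re

# Assumed sample input: an anchor tag with a double-quoted href attribute.
html = '<a href="http://www.santa.com">Link</a>'

# Same pattern as in the question, compiled once for reuse.
HREF_RE = re.compile(r'href=[\'"]?([^\'" >]+)')


def extract_href(text):
    """Return the first href value found in text, or None if there is none."""
    match = HREF_RE.search(text)
    return match.group(1) if match else None


match = HREF_RE.search(html)
if match:
    print(match.group(0))  # whole match: href="http://www.santa.com
    print(match.group(1))  # captured group only: http://www.santa.com
```

A regex like this is fine for quick one-off extraction, but it will likely misbehave on unusual markup (for example, unquoted attribute values containing spaces), so an HTML parser is usually the safer choice for arbitrary pages.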