Beautiful Soup - Search by text inside a Tag



Beautiful Soup offers several ways to search an HTML document for a given piece of text. Here, we use the string argument of the find() method.

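In the following example, we use the find() method to locate the first <p> tag whose text contains 'by'. The function passed as the string argument is called with the tag's string, not with the tag itself.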

Example

html = '''
   <p> The quick, brown fox jumps over a lazy dog.</p>
   <p> DJs flock by when MTV ax quiz prog.</p>
   <p> Junk MTV quiz graced by fox whelps.</p>
   <p> Bawds jog, flick quartz, vex nymphs.</p>
'''
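from bs4 import BeautifulSoup

# The function passed as string= receives the tag's .string value
# (None when the tag has more than one child), not the tag itself.
def search(text):
   return text is not None and 'by' in text

soup = BeautifulSoup(html, 'html.parser')
tag = soup.find('p', string=search)
print(tag)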

Output

<p> DJs flock by when MTV ax quiz prog.</p>

You can find all occurrences with the find_all() method, which returns a list of every matching tag:

tags = soup.find_all('p', string=search)
print(tags)

Output

[<p> DJs flock by when MTV ax quiz prog.</p>, <p> Junk MTV quiz graced by fox whelps.</p>]
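
The search() function above matches 'by' anywhere in the text, including inside a longer word. The string argument also accepts a compiled regular expression, so a word-boundary pattern can restrict the match to the whole word. The following is a minimal sketch; the paragraph 'A nearby stream.' is not part of the example above and is added here only to show the difference.

```python
import re
from bs4 import BeautifulSoup

# Illustrative markup: the last paragraph is an assumption added for this
# sketch; it contains "by" only as part of the word "nearby".
html = '''
   <p> The quick, brown fox jumps over a lazy dog.</p>
   <p> DJs flock by when MTV ax quiz prog.</p>
   <p> Junk MTV quiz graced by fox whelps.</p>
   <p> A nearby stream.</p>
'''

soup = BeautifulSoup(html, 'html.parser')

# \b marks a word boundary, so "nearby" does not count as the word "by".
whole_word = re.compile(r'\bby\b')
tags = soup.find_all('p', string=whole_word)

for tag in tags:
   print(tag)
```

A plain substring test such as 'by' in text would also have returned the 'nearby' paragraph, so the regular expression is the safer choice when the whole word matters.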

The required text may also sit in a child tag deep inside the document tree. In that case, pass a function to find_all() that first checks that a tag has no child tags (a leaf tag) and then checks whether the required text is in it.

Example

html = '''
   <p> The quick, brown fox jumps over a lazy dog.</p>
   <p> DJs flock by when MTV ax quiz prog.</p>
   <p> Junk MTV quiz graced by fox whelps.</p>
   <p> Bawds jog, flick quartz, vex nymphs.</p>
'''
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
# Keep only leaf tags (tags with no child tags) whose text contains "by"
tags = soup.find_all(lambda tag: len(tag.find_all()) == 0 and "by" in tag.text)
for tag in tags:
   print(tag)

Output

<p> DJs flock by when MTV ax quiz prog.</p>
<p> Junk MTV quiz graced by fox whelps.</p>
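
The markup in the example above is flat, so the leaf-tag check makes no visible difference there. The sketch below uses hypothetical nested markup (the <div> and <b> tags are assumptions, not part of the original example) to show where it matters: a tag with several children has no single string, so a string filter on it finds nothing, while the leaf-tag function returns the innermost tag that holds the text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration: "by" sits in a <b> nested in <div><p>.
html = '''
<div>
   <p>Junk MTV quiz graced <b>by fox</b> whelps.</p>
   <p>Bawds jog, flick quartz, vex nymphs.</p>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

# The first <p> has three children (text, <b>, text), so its .string is None.
first_p = soup.find('p')
print(first_p.string)   # None

# A string= filter on <p> therefore has nothing to match against.
string_matches = soup.find_all('p', string=lambda s: s is not None and 'by' in s)

# Restricting the search to leaf tags returns only the innermost match;
# without the len(...) == 0 check, <div> and <p> would match as well.
leaf_matches = soup.find_all(lambda tag: len(tag.find_all()) == 0 and 'by' in tag.text)
print(leaf_matches)     # [<b>by fox</b>]
```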