Beautiful Soup - Find all Children of an Element



The tags in an HTML document are arranged hierarchically, with elements nested one inside another. For example, the top-level <html> tag contains the <head> and <body> tags, each of which may contain other tags. An element that encloses other elements is called the parent, and the elements nested directly inside it are its children. With Beautiful Soup, we can find all the child elements of a parent element. This chapter shows how to obtain the children of an HTML element.

A Tag object in Beautiful Soup offers two ways to fetch child elements.

  • The .children property
  • The findChildren() method

Examples in this chapter use the following HTML document (index.html):

<html>
<head>
<title>TutorialsPoint</title>
</head>
<body>
<h2>Departmentwise Employees</h2>
<ul id="dept">
<li>Accounts</li>
   <ul id='acc'>
   <li>Anand</li>
   <li>Mahesh</li>
   </ul>
<li>HR</li>
   <ul id="HR">
   <li>Rani</li>
   <li>Ankita</li>
   </ul>
</ul>
</body>
</html>

Using .children property

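The .children property of a Tag object returns an iterator over the element's direct children. It does not recurse into nested elements; a nested tag appears once, as a single child, with its own contents inside it.

The following Python code lists the children of the top-level <ul> tag. We first obtain the Tag object corresponding to the <ul> tag, and then read its .children property. The whitespace between tags is also part of the parse tree, so the newline strings count as children too.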

Example

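from bs4 import BeautifulSoup

# Parse the file; the with block closes it once parsing is done
with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# soup.ul is the first <ul> in the document, i.e. the top-level list
tag = soup.ul
print(list(tag.children))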

Output

['\n', <li>Accounts</li>, '\n', <ul id="acc">
<li>Anand</li>
<li>Mahesh</li>
</ul>, '\n', <li>HR</li>, '\n', <ul id="HR">
<li>Rani</li>
<li>Ankita</li>
</ul>, '\n']

Since the .children property returns an iterator, we can also traverse the children directly with a for loop.

for child in tag.children:
    print(child)

Output

<li>Accounts</li>

<ul id="acc">
<li>Anand</li>
<li>Mahesh</li>
</ul>

<li>HR</li>

<ul id="HR">
<li>Rani</li>
<li>Ankita</li>
</ul>
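
The blank lines in the output above come from the newline strings that sit between the tags. A minimal sketch of one way to keep only the element children follows. It assumes the chapter's HTML is inlined as a string (the html variable is illustrative, used here so the snippet runs without index.html) and filters on the Tag class from bs4.element.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Inline copy of the chapter's index.html, so the sketch runs on its own.
html = """<html>
<head>
<title>TutorialsPoint</title>
</head>
<body>
<h2>Departmentwise Employees</h2>
<ul id="dept">
<li>Accounts</li>
   <ul id='acc'>
   <li>Anand</li>
   <li>Mahesh</li>
   </ul>
<li>HR</li>
   <ul id="HR">
   <li>Rani</li>
   <li>Ankita</li>
   </ul>
</ul>
</body>
</html>"""

soup = BeautifulSoup(html, 'html.parser')
tag = soup.ul

# Keep only Tag objects; the whitespace between tags is a NavigableString.
element_children = [child for child in tag.children if isinstance(child, Tag)]

# Summarise each direct child by tag name and id (None when there is no id).
summary = [(child.name, child.get('id')) for child in element_children]
print(summary)
```

Each inner list shows up once, as a single child; its <li> items are not listed separately because .children stops at the first level.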

Using findChildren() method

The findChildren() method is the older name for find_all(). Called without arguments, it returns every tag nested under the element it is called on, at any depth. Unlike .children, it skips text nodes such as the newlines between tags.

In the index.html document, there are two nested unordered lists. The top-level <ul> element has id="dept", and the two enclosed lists have id="acc" and id="HR" respectively.

In the following example, we first obtain a Tag object for the top-level <ul> element and then extract the list of tags under it.

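from bs4 import BeautifulSoup

# Parse the file; the with block closes it once parsing is done
with open('index.html') as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# Locate the top-level list by its id
tag = soup.find("ul", {"id": "dept"})

# With no arguments, findChildren() returns every nested tag
children = tag.findChildren()

for child in children:
    print(child)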

Note that the result set includes the tags under the element recursively. Hence, in the following output, each inner list appears in full, followed by the individual items in it.

<li>Accounts</li>
<ul id="acc">
<li>Anand</li>
<li>Mahesh</li>
</ul>
<li>Anand</li>
<li>Mahesh</li>
<li>HR</li>
<ul id="HR">
<li>Rani</li>
<li>Ankita</li>
</ul>
<li>Rani</li>
<li>Ankita</li>
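
If only the first level is wanted, the search can be told not to recurse. The sketch below is illustrative rather than part of the original example: it inlines just the <ul> portion of index.html as a string (the html variable is an assumption made so the snippet is self-contained) and calls find_all(), the name the current Beautiful Soup documentation uses. Since findChildren() is the older name for the same method, passing recursive=False to it is expected to behave the same way.

```python
from bs4 import BeautifulSoup

# Only the list portion of the chapter's index.html, inlined for the sketch.
html = """<ul id="dept">
<li>Accounts</li>
   <ul id="acc">
   <li>Anand</li>
   <li>Mahesh</li>
   </ul>
<li>HR</li>
   <ul id="HR">
   <li>Rani</li>
   <li>Ankita</li>
   </ul>
</ul>"""

soup = BeautifulSoup(html, 'html.parser')
tag = soup.find("ul", {"id": "dept"})

# Default behaviour: every nested tag, at any depth.
all_tags = tag.find_all()

# recursive=False restricts the search to direct children.
direct_tags = tag.find_all(recursive=False)

print(len(all_tags))
print([child.name for child in direct_tags])
```

The first call yields the same eight tags as the output above, while the second yields only the four first-level tags: the two department items and the two inner lists.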

Let us extract the children of the inner <ul> element with id="acc". Here is the code:

Example

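from bs4 import BeautifulSoup

# Parse the file; the with block closes it once parsing is done
with open('index.html') as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# Locate the inner list by its id
tag = soup.find("ul", {"id": "acc"})

children = tag.findChildren()

for child in children:
    print(child)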

When the above program is run, you'll obtain the <li> elements under the <ul> whose id is acc.

Output

<li>Anand</li>
<li>Mahesh</li>
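
Often the text inside the children is what is actually needed, not the tags themselves. As a hedged sketch, the snippet below narrows the search to <li> tags by name and reads each one's text with get_text(). The inlined html string and the names variable are illustrative assumptions, not part of the original example.

```python
from bs4 import BeautifulSoup

# Only the list portion of the chapter's index.html, inlined for the sketch.
html = """<ul id="dept">
<li>Accounts</li>
   <ul id="acc">
   <li>Anand</li>
   <li>Mahesh</li>
   </ul>
<li>HR</li>
   <ul id="HR">
   <li>Rani</li>
   <li>Ankita</li>
   </ul>
</ul>"""

soup = BeautifulSoup(html, 'html.parser')
tag = soup.find("ul", {"id": "acc"})

# Passing a tag name limits the results to matching tags only.
names = [li.get_text() for li in tag.find_all("li")]
print(names)
```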

Thus, Beautiful Soup makes it easy to retrieve the child elements under any HTML element.
