Beautiful Soup - Inspect Data Source



In order to scrape a web page with BeautifulSoup and Python, your first step for any web scraping project should be to explore the website that you want to scrape. So, first visit the website to understand the site structure before you start extracting the information that's relevant for you.

Let us visit TutorialsPoint's Python Tutorial home page. Open https://www.tutorialspoint.com/python3/index.htm in your browser.

Use Developer tools can help you understand the structure of a website. All modern browsers come with developer tools installed.

If using Chrome browser, open the Developer Tools from the top-right menu button (⋮) and selecting More Tools → Developer Tools.

Developer Tools

With Developer tools, you can explore the site's document object model (DOM) to better understand your source. Select the Elements tab in developer tools. You'll see a structure with clickable HTML elements.

The Tutorial page shows the table of contents in the left sidebar. Right click on any chapter and choose Inspect option.

tutorial_page

For the Elements tab, locate the tag that corresponds to the TOC list, as shown in the figure below −

TOC_list

Right click on the HTML element, copy the HTML element, and paste it in any editor.

html element

The HTML script of the <ul>..</ul> element is now obtained.

<ul class="toc chapters">
   <li class="heading">Python 3 Basic Tutorial</li>
   <li class="current-chapter"><a href="/python3/index.htm">Python 3 - Home</a></li>
   <li><a href="/python3/python3_whatisnew.htm">What is New in Python 3</a></li>
   <li><a href="/python3/python_overview.htm">Python 3 - Overview</a></li>
   <li><a href="/python3/python_environment.htm">Python 3 - Environment Setup</a></li>
   <li><a href="/python3/python_basic_syntax.htm">Python 3 - Basic Syntax</a></li>
   <li><a href="/python3/python_variable_types.htm">Python 3 - Variable Types</a></li>
   <li><a href="/python3/python_basic_operators.htm">Python 3 - Basic Operators</a></li>
   <li><a href="/python3/python_decision_making.htm">Python 3 - Decision Making</a></li>
   <li><a href="/python3/python_loops.htm">Python 3 - Loops</a></li>
   <li><a href="/python3/python_numbers.htm">Python 3 - Numbers</a></li>
   <li><a href="/python3/python_strings.htm">Python 3 - Strings</a></li>
   <li><a href="/python3/python_lists.htm">Python 3 - Lists</a></li>
   <li><a href="/python3/python_tuples.htm">Python 3 - Tuples</a></li>
   <li><a href="/python3/python_dictionary.htm">Python 3 - Dictionary</a></li>
   <li><a href="/python3/python_date_time.htm">Python 3 - Date & Time</a></li>
   <li><a href="/python3/python_functions.htm">Python 3 - Functions</a></li>
   <li><a href="/python3/python_modules.htm">Python 3 - Modules</a></li>
   <li><a href="/python3/python_files_io.htm">Python 3 - Files I/O</a></li>
   <li><a href="/python3/python_exceptions.htm">Python 3 - Exceptions</a></li>
</ul>

We can now load this script in a BeautifulSoup object to parse the document tree.

Advertisements