Implementing web scraping using lxml in Python Programming


In this article, we will learn how to scrape data from web pages using the lxml module in Python.

What is web scraping?

Web scraping is the process of extracting data from a website with the help of a crawler or scanner. It comes in handy for collecting data from web pages that don't offer an API. In Python, web scraping can be done with several modules, such as Beautiful Soup, Scrapy, and lxml.

Here we will discuss web scraping using the lxml module.

For that, we first need to install lxml.

Type the following in the terminal or command prompt (not in the Python interpreter):

pip install lxml
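A quick way to confirm that the installation worked is to import lxml and parse a tiny HTML string. This is a minimal sketch; the sample markup is made up purely for the check.

```python
# Sanity check: lxml imports and can parse a small HTML document.
from lxml import etree, html

# Installed lxml version as a tuple of integers.
print("lxml version:", etree.LXML_VERSION)

doc = html.fromstring("<html><body><p>Hello, lxml!</p></body></html>")
print(doc.findtext(".//p"))  # prints: Hello, lxml!
```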

We will use XPath expressions to locate and access the data in the page's HTML.
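As a minimal sketch of how XPath selects data with lxml, the example below pulls link text and `href` attributes out of a small hand-written list. The markup (the `games` id and `title` class) is hypothetical and not taken from Steam.

```python
from lxml import html

# Hypothetical sample markup used only to demonstrate XPath.
SAMPLE = """<html><body>
<ul id="games">
  <li><a href="/app/1" class="title">Game A</a></li>
  <li><a href="/app/2" class="title">Game B</a></li>
</ul>
</body></html>"""

tree = html.fromstring(SAMPLE)

# text() returns the text nodes of the matched elements.
titles = tree.xpath('//ul[@id="games"]//a[@class="title"]/text()')

# @href returns attribute values instead of elements.
links = tree.xpath('//ul[@id="games"]//a/@href')

print(titles)  # ['Game A', 'Game B']
print(links)   # ['/app/1', '/app/2']
```

The results are lists of strings, so they can be used directly or zipped together into records.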

In this article, we will extract data from Steam, an online store that lists information about many different games, using its free-to-play genre page:

https://store.steampowered.com/genre/Free%20to%20Play/

From this page, we will extract information from the popular new releases section: each game's name, price, associated tags, and target platforms.

Open the page in Chrome and use the Inspect Element feature to view the HTML of the New Releases tab. This shows which tags store the information we need.
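The same kind of inspection can also be done from Python: lxml can look up an element by its id and serialize it back to HTML. This is a minimal sketch on a hand-written page; the markup is a hypothetical stand-in for the Steam page, not a copy of it.

```python
from lxml import html

# Hypothetical stand-in for the real page; only the ids matter here.
page = html.fromstring("""<html><body>
<div id="tab_select_newreleases">
  <div id="tab_content"><a href="#">Example Game</a></div>
</div>
</body></html>""")

# Look up an element by its id, much like selecting it in the inspector.
element = page.get_element_by_id("tab_select_newreleases")

# Serialize just that element so its nested tags can be read.
markup = html.tostring(element, pretty_print=True, encoding="unicode")
print(markup)
```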

On this website, every list element is enclosed in a div tag with id=tab_content, which is in turn enclosed in a div tag with id=tab_select_newreleases.
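To make that nesting concrete, here is a minimal sketch of an XPath expression that walks from the outer div to the inner one and collects its child elements. The sample markup, including the `class="item"` children, is hypothetical.

```python
from lxml import html

# Hypothetical markup mirroring the nesting described above.
SAMPLE = """<html><body>
<div id="tab_select_newreleases">
  <div id="tab_content">
    <div class="item">Game A</div>
    <div class="item">Game B</div>
  </div>
</div>
</body></html>"""

tree = html.fromstring(SAMPLE)

# Descend from the outer div to the inner div, then take its child <div>s.
items = tree.xpath(
    '//div[@id="tab_select_newreleases"]//div[@id="tab_content"]/div'
)
names = [item.text_content().strip() for item in items]
print(names)  # ['Game A', 'Game B']
```

Using `//` between the two id predicates tolerates extra wrapper elements between them, which makes the expression a little less brittle if the page layout shifts.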

Now let's look at the implementation.
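The following is a minimal sketch of one possible implementation, not the author's original code. It downloads the page with the standard library's `urllib.request` and parses it with `lxml.html`. The two div ids come from the description above; the item class names (`tab_item`, `tab_item_name`, `discount_final_price`, `tab_item_top_tags`, `platform_img`) are hypothetical placeholders that must be checked against the live page with the inspector, since the site's markup can change at any time.

```python
# A sketch of scraping the new-releases list with lxml and XPath.
# ASSUMPTION: the item class names used below (tab_item, tab_item_name,
# discount_final_price, tab_item_top_tags, platform_img) are HYPOTHETICAL
# placeholders; confirm the real markup with the browser inspector.
from urllib.request import Request, urlopen

from lxml import html

URL = "https://store.steampowered.com/genre/Free%20to%20Play/"


def fetch_page(url):
    """Download the raw HTML of a page, sending a browser-like User-Agent."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return response.read()


def parse_new_releases(page_html):
    """Return one dict (name, price, tags, platforms) per listed game."""
    tree = html.fromstring(page_html)
    # Follow the nesting: div#tab_select_newreleases > ... > div#tab_content.
    containers = tree.xpath(
        '//div[@id="tab_select_newreleases"]//div[@id="tab_content"]'
    )
    if not containers:
        return []  # markup did not match; nothing to extract

    games = []
    # Each game is assumed to be a direct child <div class="tab_item">.
    for item in containers[0].xpath('./div[@class="tab_item"]'):
        name = item.xpath('string(.//div[@class="tab_item_name"])').strip()
        price = item.xpath('string(.//div[@class="discount_final_price"])').strip()
        tags = [
            tag.strip()
            for tag in item.xpath('.//div[@class="tab_item_top_tags"]/span/text()')
            if tag.strip()
        ]
        # e.g. class="platform_img win" -> "win"
        platforms = [
            cls.split()[-1]
            for cls in item.xpath('.//span[contains(@class, "platform_img")]/@class')
        ]
        games.append(
            {"name": name, "price": price, "tags": tags, "platforms": platforms}
        )
    return games
```

Calling `parse_new_releases(fetch_page(URL))` returns a list of dictionaries, one per game. If the XPath expressions do not match the live markup, you will get an empty list or empty fields, which is the signal to re-inspect the page and update the placeholder class names.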

raja
Published on 11-Sep-2019 15:09:48