HyperText Markup Language (HTML) support in Python
Python can process HTML files through the HTMLParser class in the html.parser module. The parser reports the kind of each tag (start or end), its position in the input (line number and offset), and its attributes. It also provides handler methods that receive the text data found in an HTML file.
In the example below, we subclass HTMLParser to create a custom parser that processes only the events for which we define handlers. Here we handle the start tag, the end tag and the data.
Below is the HTML that the custom Python parser processes.
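As a rough illustration of reading tag properties, the sketch below collects the href attribute of every anchor tag. The class name LinkCollector and the sample markup are assumptions made for this example; handle_starttag and its attrs argument (a list of (name, value) pairs) come from html.parser.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples.
        # Tag and attribute names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


collector = LinkCollector()
# Sample markup invented for this sketch.
collector.feed('<p><a href="https://example.com/a">A</a> <A HREF="/b">B</A></p>')
collector.close()
print(collector.links)
```

Because the parser lowercases tag and attribute names, the second link is collected even though it is written in upper case.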
Example
<html>
<head>
<title>welcome to Tutorials Point!</title>
</head>
<body>
<h1>Learn anything !</h1>
</body>
</html>
Below is the program which parses the above file and prints each event reported by the custom parser.
Example
from html.parser import HTMLParser

class Custom_Parser(HTMLParser):
    # Called for every opening tag, such as <html>
    def handle_starttag(self, tag, attrs):
        print("Line and Offset ==", self.getpos())
        print("Encountered a start tag:", tag)

    # Called for every closing tag, such as </html>
    def handle_endtag(self, tag):
        print("Line and Offset ==", self.getpos())
        print("Encountered an end tag :", tag)

    # Called for the text between tags, including newlines
    def handle_data(self, data):
        print("Line and Offset ==", self.getpos())
        print("Encountered some data :", data)

parser = Custom_Parser()
# Raw string so that "\t" in the Windows path is not read as a tab character
with open(r"E:\test.html", "r") as stream:
    parser.feed(stream.read())
Output
Running the above code gives us the following result −
Line and Offset == (1, 0)
Encountered a start tag: html
Line and Offset == (1, 6)
Encountered some data :
Line and Offset == (2, 0)
Encountered a start tag: head
Line and Offset == (2, 6)
Encountered some data :
Line and Offset == (3, 0)
Encountered a start tag: title
Line and Offset == (3, 7)
Encountered some data : welcome to Tutorials Point!
Line and Offset == (3, 34)
Encountered an end tag : title
Line and Offset == (3, 42)
Encountered some data :
Line and Offset == (4, 0)
Encountered an end tag : head
Line and Offset == (4, 7)
Encountered some data :
Line and Offset == (5, 0)
Encountered a start tag: body
Line and Offset == (5, 6)
Encountered some data :
Line and Offset == (6, 0)
Encountered a start tag: h1
Line and Offset == (6, 4)
Encountered some data : Learn anything !
Line and Offset == (6, 20)
Encountered an end tag : h1
Line and Offset == (6, 25)
Encountered some data :
Line and Offset == (7, 0)
Encountered an end tag : body
Line and Offset == (7, 7)
Encountered some data :
Line and Offset == (8, 0)
Encountered an end tag : html
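Many of the data events in this output are only the newlines between tags. As a minimal sketch of one way to filter them, the variant below stores events in a list and skips whitespace-only text. It feeds the parser from a string, so no file on disk is needed. The class name EventRecorder, the events list and the shortened sample markup are assumptions made for this illustration.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical variant that stores events instead of printing them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, self.getpos()))

    def handle_endtag(self, tag):
        self.events.append(("end", tag, self.getpos()))

    def handle_data(self, data):
        # Skip whitespace-only text such as the newlines between tags.
        if data.strip():
            self.events.append(("data", data, self.getpos()))


# Shortened sample markup, held in memory rather than read from a file.
sample = "<html>\n<body>\n<h1>Learn anything !</h1>\n</body>\n</html>"

recorder = EventRecorder()
recorder.feed(sample)
recorder.close()

for event in recorder.events:
    print(event)
```

Keeping the events in a list makes the parser easier to check in a test or to reuse in later processing, since nothing depends on reading the printed output.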