- Beautiful Soup Tutorial
- Beautiful Soup - Home
- Beautiful Soup - Overview
- Beautiful Soup - Web Scraping
- Beautiful Soup - Installation
- Beautiful Soup - Souping the Page
- Beautiful Soup - Kinds of objects
- Beautiful Soup - Inspect Data Source
- Beautiful Soup - Scrape HTML Content
- Beautiful Soup - Navigating by Tags
- Beautiful Soup - Find Elements by ID
- Beautiful Soup - Find Elements by Class
- Beautiful Soup - Find Elements by Attribute
- Beautiful Soup - Searching the Tree
- Beautiful Soup - Modifying the Tree
- Beautiful Soup - Parsing a Section of a Document
- Beautiful Soup - Find all Children of an Element
- Beautiful Soup - Find Element using CSS Selectors
- Beautiful Soup - Find all Comments
- Beautiful Soup - Scraping List from HTML
- Beautiful Soup - Scraping Paragraphs from HTML
- BeautifulSoup - Scraping Link from HTML
- Beautiful Soup - Get all HTML Tags
- Beautiful Soup - Get Text Inside Tag
- Beautiful Soup - Find all Headings
- Beautiful Soup - Extract Title Tag
- Beautiful Soup - Extract Email IDs
- Beautiful Soup - Scrape Nested Tags
- Beautiful Soup - Parsing Tables
- Beautiful Soup - Selecting nth Child
- Beautiful Soup - Search by text inside a Tag
- Beautiful Soup - Remove HTML Tags
- Beautiful Soup - Remove all Styles
- Beautiful Soup - Remove all Scripts
- Beautiful Soup - Remove Empty Tags
- Beautiful Soup - Remove Child Elements
- Beautiful Soup - find vs find_all
- Beautiful Soup - Specifying the Parser
- Beautiful Soup - Comparing Objects
- Beautiful Soup - Copying Objects
- Beautiful Soup - Get Tag Position
- Beautiful Soup - Encoding
- Beautiful Soup - Output Formatting
- Beautiful Soup - Pretty Printing
- Beautiful Soup - NavigableString Class
- Beautiful Soup - Convert Object to String
- Beautiful Soup - Convert HTML to Text
- Beautiful Soup - Parsing XML
- Beautiful Soup - Error Handling
- Beautiful Soup - Trouble Shooting
- Beautiful Soup - Porting Old Code
- Beautiful Soup - Functions Reference
- Beautiful Soup - contents Property
- Beautiful Soup - children Property
- Beautiful Soup - string Property
- Beautiful Soup - strings Property
- Beautiful Soup - stripped_strings Property
- Beautiful Soup - descendants Property
- Beautiful Soup - parent Property
- Beautiful Soup - parents Property
- Beautiful Soup - next_sibling Property
- Beautiful Soup - previous_sibling Property
- Beautiful Soup - next_siblings Property
- Beautiful Soup - previous_siblings Property
- Beautiful Soup - next_element Property
- Beautiful Soup - previous_element Property
- Beautiful Soup - next_elements Property
- Beautiful Soup - previous_elements Property
- Beautiful Soup - find Method
- Beautiful Soup - find_all Method
- Beautiful Soup - find_parents Method
- Beautiful Soup - find_parent Method
- Beautiful Soup - find_next_siblings Method
- Beautiful Soup - find_next_sibling Method
- Beautiful Soup - find_previous_siblings Method
- Beautiful Soup - find_previous_sibling Method
- Beautiful Soup - find_all_next Method
- Beautiful Soup - find_next Method
- Beautiful Soup - find_all_previous Method
- Beautiful Soup - find_previous Method
- Beautiful Soup - select Method
- Beautiful Soup - append Method
- Beautiful Soup - extend Method
- Beautiful Soup - NavigableString Method
- Beautiful Soup - new_tag Method
- Beautiful Soup - insert Method
- Beautiful Soup - insert_before Method
- Beautiful Soup - insert_after Method
- Beautiful Soup - clear Method
- Beautiful Soup - extract Method
- Beautiful Soup - decompose Method
- Beautiful Soup - replace_with Method
- Beautiful Soup - wrap Method
- Beautiful Soup - unwrap Method
- Beautiful Soup - smooth Method
- Beautiful Soup - prettify Method
- Beautiful Soup - encode Method
- Beautiful Soup - decode Method
- Beautiful Soup - get_text Method
- Beautiful Soup - diagnose Method
- Beautiful Soup Useful Resources
- Beautiful Soup - Quick Guide
- Beautiful Soup - Useful Resources
- Beautiful Soup - Discussion
Beautiful Soup - Find all Comments
Inserting comments in a computer code is supposed to be a good programming practice. Comments are helpful for understanding the logic of the program. They also serve as a documentation. You can put comments in a HTML as well as XML script, just as in a program written in C, Java, Python etc. BeautifulSoup API can be helpful to identify all the comments in a HTML document.
In HTML and XML, the comment text is written between <!-- and --> tags.
<!-- Comment Text -->
The BeutifulSoup package, whose internal name is bs4, defines Comment as an important object. The Comment object is a special type of NavigableString object. Hence, the string property of any Tag that is found between <!-- and --> is recognized as a Comment.
Example
from bs4 import BeautifulSoup markup = "<b><!--This is a comment text in HTML--></b>" soup = BeautifulSoup(markup, 'html.parser') comment = soup.b.string print (comment, type(comment))
Output
This is a comment text in HTML <class 'bs4.element.Comment'>
To search for all the occurrences of comment in a HTML document, we shall use find_all() method. Without any argument, find_all() returns all the elements in the parsed HTML document. You can pass a keyword argument 'string' to find_all() method. We shall assign the return value of a function iscomment() to it.
comments = soup.find_all(string=iscomment)
The iscomment() function verifies if the text in a tag is a comment object or not, with the help of isinstance() function.
def iscomment(elem): return isinstance(elem, Comment)
The comments variable shall store all the comment text occurrences in the given HTML document. We shall use the following index.html file in the example code −
<html> <head> <!-- Title of document --> <title>TutorialsPoint</title> </head> <body> <!-- Page heading --> <h2>Departmentwise Employees</h2> <!-- top level list--> <ul id="dept"> <li>Accounts</li> <ul id='acc'> <!-- first inner list --> <li>Anand</li> <li>Mahesh</li> </ul> <li>HR</li> <ul id="HR"> <!-- second inner list --> <li>Rani</li> <li>Ankita</li> </ul> </ul> </body> </html>
The following Python program scrapes the above HTML document, and finds all the comments in it.
Example
from bs4 import BeautifulSoup, Comment fp = open('index.html') soup = BeautifulSoup(fp, 'html.parser') def iscomment(elem): return isinstance(elem, Comment) comments = soup.find_all(string=iscomment) print (comments)
Output
[' Title of document ', ' Page heading ', ' top level list', ' first inner list ', ' second inner list ']
The above output shows a list of all comments. We can also use a for loop over the collection of comments.
Example
i=0 for comment in comments: i+=1 print (i,".",comment)
Output
1 . Title of document 2 . Page heading 3 . top level list 4 . first inner list 5 . second inner list
In this chapter, we learned how to extract all the comment strings in a HTML document.