
Python Program to crawl a web page and get most frequent words
Our task is to crawl a web page, count how often each word occurs, and report the most frequent words.
First, we use the requests and Beautiful Soup modules to build a simple web crawler that downloads the page, extracts its text, and stores the words in a list. The words are then stripped of unwanted symbols and counted.
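The extraction step described above can be tried offline before pointing a crawler at a live site. The following is a minimal sketch that uses only the standard library's `html.parser` in place of Beautiful Soup, so it is an illustration of the filtering idea rather than the tutorial's own method. The class `EntryContentText`, the helper `extract_words`, and the `sample_html` string are hypothetical names and data introduced for this example.

```python
from html.parser import HTMLParser


class EntryContentText(HTMLParser):
    """Collect text found inside <div class="entry-content"> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # div nesting depth inside a matching div; 0 means outside
        self.chunks = []  # pieces of text collected so far

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        # Start counting at a matching div, and keep counting nested divs.
        if self.depth > 0 or "entry-content" in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_words(html):
    """Return the lowercased words found in the entry-content divs of html."""
    parser = EntryContentText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks).lower().split()


# Hypothetical page fragment used instead of a real HTTP response.
sample_html = """
<html><body>
<div class="sidebar">Menu Home</div>
<div class="entry-content"><p>Python is easy.</p><div>Python is fun!</div></div>
</body></html>
"""
print(extract_words(sample_html))
```

The words still carry punctuation at this point (for example `easy.`), which is why a separate cleaning step is needed before counting.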
Example code
import requests
from bs4 import BeautifulSoup
from collections import Counter


def my_start(url):
    """Download the page and collect every word from its main content."""
    my_wordlist = []
    my_source_code = requests.get(url).text
    my_soup = BeautifulSoup(my_source_code, 'html.parser')
    # Only text inside <div class="entry-content"> is used; adjust the
    # class name if the target page marks its content differently.
    for each_text in my_soup.find_all('div', {'class': 'entry-content'}):
        content = each_text.text
        words = content.lower().split()
        for each_word in words:
            my_wordlist.append(each_word)
    clean_wordlist(my_wordlist)


# Function removes any unwanted symbols
def clean_wordlist(wordlist):
    clean_list = []
    # Raw string so the backslash is a literal character, not an escape.
    symbols = r'!@#$%^&*()_-+={[}]|\;:"<>?/., '
    for word in wordlist:
        for symbol in symbols:
            word = word.replace(symbol, '')
        # Keep only words that still have characters left.
        if len(word) > 0:
            clean_list.append(word)
    create_dictionary(clean_list)


def create_dictionary(clean_list):
    word_count = {}
    for word in clean_list:
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1
    c = Counter(word_count)
    # returns the most occurring elements
    top = c.most_common(10)
    print(top)


# Driver code
if __name__ == '__main__':
    my_start("https://www.tutorialspoint.com/python3/python_overview.htm/")
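Output
The program prints a list of up to ten (word, count) tuples, most frequent first. The exact words and counts depend on the content of the page at the time it is crawled.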
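The cleaning and counting steps can also be checked on their own, without any network access. Below is a compact sketch, assuming plain text has already been extracted from the page. The function name `most_frequent_words` and the `sample` string are hypothetical. It also differs from the example in two ways: it strips every character in `string.punctuation` (a slightly larger set than the symbol list used there), and it feeds the words straight into `Counter` instead of building a dictionary by hand.

```python
import string
from collections import Counter


def most_frequent_words(text, n=10):
    """Return the n most common words in text as (word, count) pairs."""
    # Build a translation table that deletes every ASCII punctuation character.
    strip_punctuation = str.maketrans("", "", string.punctuation)
    words = (w.translate(strip_punctuation) for w in text.lower().split())
    # Skip tokens that consisted only of punctuation and are now empty.
    return Counter(w for w in words if w).most_common(n)


# Hypothetical sample text standing in for the crawled page content.
sample = "Python is easy. Python is fun! Is Python fast?"
print(most_frequent_words(sample, 3))
```

Passing the words directly to `Counter` gives the same counts as the manual dictionary loop, so the choice between the two is mostly a matter of readability.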
