How to get the Daily News using Python?


The term "daily news" describes news that is published every day and covers global events and topics. Its purpose is to keep readers informed and up to date with events across the world. Politics, sports, entertainment, science, and technology are just a few of the many areas that daily news covers.

Python is widely used in both data analysis and web development. It can be used to write a program that gathers news articles from several sources and compiles them into a daily news summary. Python packages such as Requests and Beautiful Soup can be used for this.

Note − The output may change because the news is updated daily.

Algorithm to Get Daily News in Python

The steps below outline one way of obtaining daily news using Python.

Step 1 − Identify reputable news sources. Find trustworthy news sites that provide information on a variety of subjects, such as politics, sports, entertainment, science, and technology. These could include well-known news organisations like CNN, BBC, and The New York Times as well as smaller, more specialised publications that concentrate on particular locations or issues.

Step 2 − Set up your Python environment. To scrape news items with Python, you need a working Python environment on your computer. Install a recent version of Python along with the required libraries, including Beautiful Soup and Requests.

Step 3 − Write code to scrape news articles. Write Python code that uses the Requests library to make HTTP requests to the websites of the news sources you have identified. Use Beautiful Soup to parse the HTML content and extract the relevant information, such as the headline, article text, author, and publication date. Store this information in a database or data file, along with the URL of the original article.

Step 4 − Examine the articles' text. Apply natural language processing tools to each article's text to look for patterns or similarities that might point to plagiarism or spinning. Tools like NLTK (Natural Language Toolkit) can be used to compare the text of each article with the other scraped articles. Exclude any articles that appear to be recycled or plagiarised from the daily news summary.

Step 5 − Compile a summary of the day's news. Assemble the scraped news articles into a daily news summary that covers global events and topics. To keep the summary fair and helpful, include a wide range of subjects and points of view.

Step 6 − Make the daily news summary public. Publish the daily news summary online or on a social media platform so that anyone can access it. Include references to the source articles so that readers can find more details and context.

You can use Python to obtain Daily News by following these steps.
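Steps 4 and 5 can be tried offline before any scraping is done. The sketch below is only an illustration under stated assumptions: it swaps in the standard library's difflib.SequenceMatcher for NLTK as a very simple similarity measure, and the function names (is_near_duplicate, build_summary), the 0.85 threshold, and the sample articles with example.com URLs are all invented for the example.

```python
from difflib import SequenceMatcher

# Hypothetical threshold: ratios at or above this count as near-duplicates.
SIMILARITY_THRESHOLD = 0.85


def is_near_duplicate(text_a, text_b, threshold=SIMILARITY_THRESHOLD):
    """Return True when two texts are highly similar (case-insensitive)."""
    ratio = SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
    return ratio >= threshold


def build_summary(articles):
    """Drop near-duplicate articles, then list the rest with their source URLs."""
    kept = []
    for article in articles:
        if not any(is_near_duplicate(article["text"], k["text"]) for k in kept):
            kept.append(article)
    return "\n".join(f"{a['headline']} ({a['url']})" for a in kept)


# Made-up sample data standing in for scraped articles.
sample_articles = [
    {"headline": "Rover lands on Mars",
     "text": "The rover touched down safely on Mars today.",
     "url": "https://example.com/a"},
    {"headline": "Mars rover lands",
     "text": "The rover touched down safely on Mars today!",
     "url": "https://example.com/b"},
    {"headline": "Team wins cup final",
     "text": "A late goal decided the cup final last night.",
     "url": "https://example.com/c"},
]

if __name__ == "__main__":
    print(build_summary(sample_articles))
```

Character-level similarity of this kind only catches near-verbatim copies. Detecting reworded or "spun" articles would need the heavier text analysis that Step 4 describes.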

Modules Required

Command To Install Beautiful Soup (bs4)

bs4 − Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. This module does not come built-in with Python. It is published on PyPI under the name beautifulsoup4, while bs4 is the name used when importing it. To install it, type the command below in the terminal.

pip install beautifulsoup4

Command To Install Requests

Requests − Requests makes it simple to send HTTP/1.1 requests. Python does not include this module by default either. Enter the following command in the terminal to install it.

pip install requests
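After installing, a quick check can confirm that both modules are importable. This is a small sketch using only the standard library; the helper name is_installed is an assumption made for the example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "bs4" and "requests" are the import names of the two packages above.
    for name in ("bs4", "requests"):
        status = "installed" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If either module is reported as missing, rerun the corresponding pip command in the same Python environment that runs the script.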

Approaches to get Daily News Using Python

Approach 1

In this approach, we start by importing the modules −

import requests
from bs4 import BeautifulSoup

Next, to fetch the HTML contents of https://www.bbc.com/news, add these 2 lines of code −

url='https://www.bbc.com/news'
response = requests.get(url)
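A request can hang or come back with an error page, so it may help to set a timeout and check the status before parsing. The following is a minimal sketch rather than a required step: fetch_html and its session parameter are hypothetical names introduced here, while the timeout argument and raise_for_status() are part of Requests.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Download a page and return its HTML text, raising on HTTP errors."""
    # Use the given session if there is one, otherwise the requests module itself.
    client = session or requests
    response = client.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx and 5xx responses.
    response.raise_for_status()
    return response.text
```

Calling fetch_html('https://www.bbc.com/news') would return the same text as response.text above, but it fails loudly when the server does not answer successfully.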

To discover which HTML tags the news headlines are made of, visit https://www.bbc.com/news, right-click a news headline, and select "Inspect".

As a starting point, we define "soup" as the parsed HTML of the BBC news website. The next step is to define "headlines" as a list of all "<h3>" tags on the page. Finally, the script loops through the "headlines" list and prints each item in turn, using "text.strip()" to drop the surrounding HTML tags and whitespace so that only the text content is displayed.
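To see what find_all() and text.strip() do without depending on the live site, the same calls can be run on a small HTML string. The markup below is invented for illustration and is not the BBC's actual page structure.

```python
from bs4 import BeautifulSoup

# Invented markup for illustration; a real page is far larger.
sample_html = """
<html><body>
  <h3>  First headline </h3>
  <p>Some article teaser.</p>
  <h3><a href="/news/2">Second headline</a></h3>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")
# find_all returns every matching tag; .text drops the markup
# and strip() trims the surrounding whitespace.
headline_texts = [tag.text.strip() for tag in soup.find_all("h3")]
print(headline_texts)
```

Note that the link inside the second heading is reduced to its text, which is why nested tags do not show up in the printed headlines.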

Code for First Approach Implementation

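import requests
from bs4 import BeautifulSoup

# Download the BBC News front page
url = 'https://www.bbc.com/news'
response = requests.get(url)

# Parse the HTML and collect every <h3> tag inside <body>
soup = BeautifulSoup(response.text, 'html.parser')
headlines = soup.find('body').find_all('h3')

# Print the text of each tag without surrounding whitespace
for x in headlines:
    print(x.text.strip())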

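Output for First Approach

The script prints the text of every <h3> element it finds, one per line. The exact lines depend on the content of the page at the time the script runs.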

Approach 2

You may have noticed that the output contains duplicate news headlines as well as text that is not a news headline. In this approach, we remove the unneeded entries from the daily news data.

Syntax

unwanted = ['BBC World News TV', 'BBC World Service Radio', 'News daily newsletter', 'Mobile app', 'Get in touch']

Then print each text element only if it is not in this list, as the implementation below does.

Code Implementation of Second Approach

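import requests
from bs4 import BeautifulSoup

# Download the BBC News front page
url = 'https://www.bbc.com/news'
response = requests.get(url)

# Parse the HTML and collect every <h3> tag inside <body>
soup = BeautifulSoup(response.text, 'html.parser')
headlines = soup.find('body').find_all('h3')

# Text that appears in <h3> tags but is not a news headline
unwanted = ['BBC World News TV', 'BBC World Service Radio', 'News daily newsletter', 'Mobile app', 'Get in touch']

# Print only the entries that are not in the unwanted list
for x in headlines:
    headline_text = x.text.strip()
    if headline_text not in unwanted:
        print(headline_text)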

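Output for Second Approach

The script prints the same lines as the first approach, except for any text that exactly matches an entry in the unwanted list.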
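The filter above drops known non-headline labels, but it does not remove headlines that appear more than once. One way to handle both is to keep track of what has already been seen. This is a sketch that uses a hypothetical helper named clean_headlines and made-up input in place of the scraped text.

```python
def clean_headlines(headlines, unwanted):
    """Return headlines in original order, without unwanted labels or repeats."""
    seen = set()
    cleaned = []
    for text in headlines:
        text = text.strip()
        # Skip empty strings, known non-headline labels, and repeats.
        if not text or text in unwanted or text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned


# Made-up input standing in for the scraped <h3> texts.
raw = ["Story A", "Mobile app", "Story B", "Story A", "  ", "Get in touch"]
unwanted = {"Mobile app", "Get in touch"}
result = clean_headlines(raw, unwanted)
print(result)
```

A set is used for the seen entries because membership checks on a set stay fast as the number of headlines grows, while the separate list preserves the original order.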

Conclusion

Python programmers can obtain the day's news either through news organisations' APIs or by scraping news websites. For scraping, Python's Beautiful Soup package can parse HTML and XML documents to extract the articles. In either case, it is crucial to follow the terms and conditions of the websites involved and to avoid overloading their servers with excessive requests.
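As one illustration of scraping politely, the standard library's urllib.robotparser can check whether a path is permitted before it is requested. The robots.txt rules, the user agent name, and the example.com URLs below are invented for the example. A real script would load the site's own file with set_url() and read(). A robots.txt file is also separate from a site's terms and conditions, which still need to be read.

```python
from urllib.robotparser import RobotFileParser

# Invented rules for illustration; not any real site's robots.txt.
sample_rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(sample_rules)  # parse() accepts an iterable of robots.txt lines

# "daily-news-script" is a hypothetical user agent name.
allowed_news = parser.can_fetch("daily-news-script", "https://example.com/news")
allowed_private = parser.can_fetch("daily-news-script", "https://example.com/private/page")
print(allowed_news, allowed_private)
```

To keep the load on a server low, a pause between requests (for example with time.sleep) is a common companion to this check.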

Updated on: 24-Jul-2023
