How to fetch the top 10 starred repositories on GitHub using Python?
GitHub is the world's largest platform for version control and collaborative development. You can scrape GitHub's trending repositories page to fetch the top 10 most starred repositories within a specific timeframe using Python's requests and BeautifulSoup libraries.
This tutorial demonstrates how to scrape GitHub's trending page, extract repository information, and save the results to a file with proper formatting.
Required Libraries
First, ensure you have the necessary libraries installed:
pip install requests beautifulsoup4 lxml
Complete Implementation
Here's the complete code to fetch and display the top 10 trending repositories:
import requests
from bs4 import BeautifulSoup

# Fetch the trending repositories page
r = requests.get('https://github.com/trending/python?since=monthly')
bs = BeautifulSoup(r.text, 'lxml')

# Find all repository containers
repo_containers = bs.find_all('article', class_='Box-row')

# Open file to store results
with open('starred-repos.txt', 'w') as f1:
    # Write header
    f1.write('{}\t{}\t\t{}\n\n'.format('Position', 'Owner', 'Repository'))
    # Process top 10 repositories
    for i, container in enumerate(repo_containers[:10]):
        # Extract repository link
        repo_link = container.find('h2').find('a')
        if repo_link:
            href = repo_link.get('href')
            # Split the href to get owner and repo name
            parts = href.strip('/').split('/')
            if len(parts) >= 2:
                owner = parts[0]
                repo_name = parts[1]
                # Write to file
                f1.write('{}.\t{}\t\t{}\n'.format(i + 1, owner, repo_name))

# Read and display the results
print("Top 10 Trending Python Repositories:")
print("-" * 50)
with open('starred-repos.txt', 'r') as f1:
    print(f1.read())
How the Code Works
The script follows these key steps:
- Web Scraping: Uses requests.get() to fetch the GitHub trending page
- HTML Parsing: BeautifulSoup parses the HTML content with the lxml parser
- Data Extraction: Finds repository containers and extracts owner/repository names from href attributes
- File Operations: Saves formatted results to a text file and displays them
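The data-extraction step above hinges on how a repository link's href is split into owner and repository name. Here is a minimal, self-contained sketch of that parsing step, using a hypothetical href value of the form the trending page produces:

```python
# Hypothetical href value as it appears in a trending-page anchor tag
href = '/microsoft/vscode'

# strip('/') removes the leading slash, split('/') separates owner and repo
parts = href.strip('/').split('/')
owner, repo_name = parts[0], parts[1]

print(owner, repo_name)  # microsoft vscode
```

Checking len(parts) >= 2 before unpacking, as the full script does, guards against unexpected hrefs such as anchors or absolute URLs.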
Alternative Approach Using GitHub API
For more reliable data access, consider using the GitHub API instead of web scraping:
import requests
import json
from datetime import datetime, timedelta
# Calculate date for last month
last_month = (datetime.now() - timedelta(days=30)).strftime('%Y-%m-%d')
# GitHub API endpoint for searching repositories
url = f'https://api.github.com/search/repositories?q=created:>{last_month}&sort=stars&order=desc&per_page=10'
response = requests.get(url)
data = response.json()
print("Top 10 Most Starred Repositories (Last Month):")
print("-" * 50)
for i, repo in enumerate(data['items'], 1):
    print(f"{i}. {repo['owner']['login']}/{repo['name']} - {repo['stargazers_count']} stars")
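As a variation on the f-string URL above, the search qualifiers can be passed through requests' params argument, which handles URL encoding (for example, the > in the created qualifier) automatically. This sketch uses a fixed example date in place of the computed one:

```python
import requests

# Same query as above, expressed as a params dict instead of an f-string URL
params = {
    'q': 'created:>2024-01-01',  # example date; compute it as shown above
    'sort': 'stars',
    'order': 'desc',
    'per_page': 10,
}

# Prepare the request to inspect the final, percent-encoded URL
req = requests.Request('GET', 'https://api.github.com/search/repositories', params=params)
prepared = req.prepare()
print(prepared.url)
```

In practice you would simply call requests.get(url, params=params); the prepared request is shown here only to make the resulting URL visible.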
Key Points
- Web Scraping: GitHub's HTML structure may change, making scraped code fragile
- API Approach: More reliable and provides structured JSON data
- Rate Limits: GitHub API has rate limits; consider authentication for higher limits
- Error Handling: Add try-except blocks for production use
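The last two points can be combined into a small wrapper around the API call. This is a sketch, not part of the original script: build_headers and fetch_top_repos are hypothetical helper names, and passing a personal access token via the Authorization header is optional but raises the rate limit:

```python
import requests

def build_headers(token=None):
    # GitHub recommends an explicit Accept header; a token raises rate limits
    headers = {'Accept': 'application/vnd.github+json'}
    if token:
        headers['Authorization'] = f'Bearer {token}'
    return headers

def fetch_top_repos(url, token=None):
    try:
        response = requests.get(url, headers=build_headers(token), timeout=10)
        response.raise_for_status()  # raises on 4xx/5xx, e.g. a rate-limit 403
        return response.json().get('items', [])
    except requests.exceptions.RequestException as exc:
        print(f'Request failed: {exc}')
        return []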
Expected Output Format
Position	Owner		Repository

1.	microsoft	vscode
2.	tensorflow	tensorflow
3.	facebook	react
4.	vuejs		vue
5.	angular		angular
6.	nodejs		node
7.	kubernetes	kubernetes
8.	moby		moby
9.	golang		go
10.	atom		atom
Conclusion
This tutorial shows how to scrape GitHub's trending page using BeautifulSoup and save the results to a file. For production applications, consider using the GitHub API for more reliable and structured data access.
