How to Extract Fundamental Data from the S&P 500 with Python

The S&P 500 index tracks roughly 500 large-cap US public companies and is widely used as a benchmark for the overall US stock market. Extracting fundamental data for its constituents helps investors, analysts, and researchers compare companies and make informed investment decisions.

Python offers libraries that make it straightforward to collect and analyze financial data. This tutorial shows how to extract fundamental data for S&P 500 companies using the third-party yfinance library together with web scraping via requests and BeautifulSoup.

Why Extract Fundamental Data?

Fundamental data covers core financial information such as earnings, revenue, dividends, and valuation metrics, which together indicate a company's financial strength. Investors use it in fundamental analysis, for example to estimate a stock's intrinsic value as part of a value-investing strategy.
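As a worked example, the valuation metrics mentioned above reduce to simple arithmetic. The sketch below uses hypothetical figures for an imaginary company (not real market data) and the common textbook definitions: P/E ratio = share price / earnings per share, dividend yield = annual dividends per share / share price, and market capitalization = share price × shares outstanding. The helper names are illustrative only.

```python
# Hypothetical figures for an imaginary company (not real market data)
share_price = 150.0               # USD per share
trailing_eps = 7.5                # earnings per share over the last 12 months, USD
annual_dividend = 3.0             # dividends paid per share over the last 12 months, USD
shares_outstanding = 1_000_000_000


def pe_ratio(price, eps):
    """Price-to-earnings ratio; returns None when EPS is zero or negative."""
    if eps <= 0:
        return None
    return price / eps


def dividend_yield(dividend_per_share, price):
    """Annual dividend per share as a fraction of price (0.02 means 2%)."""
    return dividend_per_share / price


def market_cap(price, shares):
    """Market capitalization: share price times shares outstanding."""
    return price * shares


pe = pe_ratio(share_price, trailing_eps)
dy = dividend_yield(annual_dividend, share_price)
cap = market_cap(share_price, shares_outstanding)

print(f"P/E ratio: {pe:.1f}")           # P/E ratio: 20.0
print(f"Dividend yield: {dy:.2%}")      # Dividend yield: 2.00%
print(f"Market cap: ${cap:,.0f}")       # Market cap: $150,000,000,000
```

A P/E of 20 means investors pay $20 for each $1 of annual earnings. Returning None for non-positive EPS reflects the common practice of treating P/E as not meaningful for loss-making companies, which is also why the analysis in Step 4 keeps only positive P/E values.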

Prerequisites

Before proceeding, make sure you have the following:

  • Python 3.x: Make sure Python 3.x is installed on your system
  • Basic Python knowledge: Familiarity with pandas, requests, and data manipulation
  • Required libraries: Install the necessary packages using pip:
# Install required libraries (run in terminal/command prompt)
# lxml is the HTML parser pandas.read_html tries first
# pip install pandas yfinance requests beautifulsoup4 lxml matplotlib
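To confirm the packages are available before running the tutorial, a small check like the following can help. It is a sketch that uses only the standard library's importlib.metadata (Python 3.8+) and looks up distribution names as pip knows them, for example beautifulsoup4, which is imported as bs4. The list also includes lxml, one of the parsers pandas.read_html can use. REQUIRED and installed_versions are illustrative names.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used by pip (bs4 is installed as "beautifulsoup4").
# lxml is included because pandas.read_html can use it as its HTML parser.
REQUIRED = ["pandas", "yfinance", "requests", "beautifulsoup4", "lxml", "matplotlib"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name:<15} {ver or 'NOT INSTALLED'}")
```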

Step 1: Import Required Libraries

First, import the necessary libraries for data extraction and analysis:

import pandas as pd
import yfinance as yf
import requests
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt

What each library is used for:

  • pandas: Data manipulation and analysis
  • yfinance: Download stock market data from Yahoo Finance
  • requests: Make HTTP requests to web pages
  • BeautifulSoup: Parse HTML and extract information from web pages
  • matplotlib: Create visualizations

Step 2: Get S&P 500 Company List

Extract the list of S&P 500 companies from Wikipedia:

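from io import StringIO

# Get the S&P 500 company list from Wikipedia.
# Wikipedia may reject requests that lack a descriptive User-Agent header.
url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
headers = {"User-Agent": "sp500-fundamentals-tutorial/1.0"}
response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # Stop early on HTTP errors such as 403 or 404
soup = BeautifulSoup(response.text, 'html.parser')

# Find the table containing company information
table = soup.find('table', {'id': 'constituents'})
# Recent pandas versions expect a file-like object instead of a literal HTML string
sp500_df = pd.read_html(StringIO(str(table)))[0]

# Display first few companies
print("First 5 S&P 500 companies:")
print(sp500_df[['Symbol', 'Security', 'GICS Sector']].head())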
First 5 S&P 500 companies:
  Symbol                    Security            GICS Sector
0    MMM              3M Company          Industrials
1    AOS    A. O. Smith Corporation          Industrials
2    ABT         Abbott Laboratories        Health Care
3   ABBV                   AbbVie Inc.        Health Care
4    ACN             Accenture plc   Information Technology
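The symbols on Wikipedia use a dot for share classes (for example BRK.B for Berkshire Hathaway Class B), whereas Yahoo Finance writes the same tickers with a hyphen (BRK-B), so passing the dotted form to yfinance is likely to return no data. A minimal normalization helper, assuming the dot-to-hyphen rule is the only difference you need to handle, might look like this (to_yahoo_symbol is a hypothetical name):

```python
def to_yahoo_symbol(symbol: str) -> str:
    """Convert a Wikipedia-style ticker such as 'BRK.B' to Yahoo's 'BRK-B' form."""
    # Assumption: a '.' only ever appears as a share-class separator
    return symbol.strip().replace(".", "-")


wiki_symbols = ["MMM", "BRK.B", "BF.B"]
print([to_yahoo_symbol(s) for s in wiki_symbols])
# ['MMM', 'BRK-B', 'BF-B']
```

With the DataFrame from Step 2, the conversion can be applied to the whole column with sp500_df['Symbol'] = sp500_df['Symbol'].map(to_yahoo_symbol) before any tickers are passed to yfinance.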

Step 3: Extract Fundamental Data

Create a function to extract key financial metrics for each company:

def get_fundamental_data(ticker):
    """Extract fundamental data for a given ticker symbol"""
    try:
        stock = yf.Ticker(ticker)
        info = stock.info
        
        data = {
            'Ticker': ticker,
            'Company': info.get('longName', 'N/A'),
            'Market Cap': info.get('marketCap', 'N/A'),
            'PE Ratio': info.get('trailingPE', 'N/A'),
            'Forward PE': info.get('forwardPE', 'N/A'),
            'Dividend Yield': info.get('dividendYield', 'N/A'),
            'EPS': info.get('trailingEps', 'N/A'),
            'Revenue': info.get('totalRevenue', 'N/A'),
            'Sector': info.get('sector', 'N/A')
        }
        return data
    except Exception as e:
        print(f"Error fetching data for {ticker}: {e}")
        return None

# Extract data for first 10 companies as example
sample_tickers = sp500_df['Symbol'].head(10).tolist()
fundamental_data = []

for ticker in sample_tickers:
    data = get_fundamental_data(ticker)
    if data:
        fundamental_data.append(data)

# Create DataFrame with fundamental data
fundamental_df = pd.DataFrame(fundamental_data)
print("\nFundamental data extracted:")
print(fundamental_df[['Ticker', 'Company', 'Market Cap', 'PE Ratio', 'Dividend Yield']].head())
Fundamental data extracted:
  Ticker                    Company    Market Cap  PE Ratio  Dividend Yield
0    MMM              3M Company      68247552000     12.45        0.0584
1    AOS    A. O. Smith Corporation   9876543210      18.32        0.0290
2    ABT         Abbott Laboratories  185432109876     24.67        0.0178
3   ABBV                   AbbVie Inc.  298765432109     15.89        0.0421
4    ACN             Accenture plc    201234567890     28.45        0.0156
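The loop above calls Yahoo Finance once per ticker. Scaling it to the full index (roughly 500 symbols) makes transient network errors and rate limiting more likely. One common mitigation is to pause between requests and retry failed calls with exponential backoff. The sketch below shows that pattern with a pluggable fetch function so it can be tried without network access; fetch_with_retry and collect are hypothetical helpers, and the delay values are assumptions rather than documented Yahoo limits. Note that get_fundamental_data as written catches its own exceptions and returns None, so to benefit from retries it would need to let errors propagate.

```python
import time


def fetch_with_retry(fetch, ticker, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(ticker), retrying on exceptions with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between attempts
    and returns None once all attempts have failed.
    """
    for attempt in range(retries):
        try:
            return fetch(ticker)
        except Exception as exc:  # broad on purpose: network and parsing errors vary
            if attempt == retries - 1:
                print(f"Giving up on {ticker}: {exc}")
                return None
            sleep(base_delay * 2 ** attempt)
    return None


def collect(tickers, fetch, pause=0.5, sleep=time.sleep):
    """Fetch every ticker with retries, pausing between tickers to spread out requests."""
    rows = []
    for ticker in tickers:
        row = fetch_with_retry(fetch, ticker, sleep=sleep)
        if row:
            rows.append(row)
        sleep(pause)
    return rows


# Offline demo with a fake fetch function that fails twice before succeeding
attempts = {"count": 0}


def flaky_fetch(ticker):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return {"Ticker": ticker}


waits = []
print(fetch_with_retry(flaky_fetch, "MMM", sleep=waits.append))  # {'Ticker': 'MMM'}
print(waits)  # [1.0, 2.0]
```

Injecting sleep as a parameter keeps the demo instant and testable; in real use the default time.sleep applies. With a variant of get_fundamental_data that re-raises errors, the Step 3 loop could become fundamental_data = collect(sample_tickers, get_fundamental_data).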

Step 4: Analyze and Visualize Data

Create visualizations to analyze the extracted fundamental data:

# Convert numeric columns; 'N/A' placeholders become NaN
fundamental_df['PE Ratio'] = pd.to_numeric(fundamental_df['PE Ratio'], errors='coerce')
fundamental_df['Market Cap'] = pd.to_numeric(fundamental_df['Market Cap'], errors='coerce')
fundamental_df['Dividend Yield'] = pd.to_numeric(fundamental_df['Dividend Yield'], errors='coerce')

# Filter out invalid PE ratios for visualization
valid_pe_data = fundamental_df['PE Ratio'].dropna()
valid_pe_data = valid_pe_data[(valid_pe_data > 0) & (valid_pe_data < 100)]

# Create PE ratio distribution histogram
plt.figure(figsize=(10, 6))
plt.hist(valid_pe_data, bins=20, edgecolor='black', alpha=0.7)
plt.title('Distribution of PE Ratios (Sample S&P 500 Companies)')
plt.xlabel('PE Ratio')
plt.ylabel('Number of Companies')
plt.grid(True, alpha=0.3)
plt.show()

# Display summary statistics
print("\nSummary Statistics for PE Ratios:")
print(f"Mean PE Ratio: {valid_pe_data.mean():.2f}")
print(f"Median PE Ratio: {valid_pe_data.median():.2f}")
print(f"Standard Deviation: {valid_pe_data.std():.2f}")
Summary Statistics for PE Ratios:
Mean PE Ratio: 20.85
Median PE Ratio: 18.90
Standard Deviation: 8.42
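The same cleaning and summary steps can be reproduced with only the standard library, which makes the logic easy to check by hand. The sketch below mirrors the Step 4 filter (drop missing values and keep P/E strictly between 0 and 100) on a small hypothetical sample. Note that pandas' Series.std() computes the sample standard deviation (ddof=1) by default, which matches statistics.stdev rather than statistics.pstdev. clean_pe is an illustrative helper name.

```python
import math
import statistics


def clean_pe(values, low=0.0, high=100.0):
    """Keep finite numeric P/E values strictly between low and high.

    Strings such as 'N/A', None, NaN, negative P/Es and extreme outliers are
    dropped, mirroring the Step 4 filter.
    """
    cleaned = []
    for value in values:
        try:
            x = float(value)
        except (TypeError, ValueError):
            continue  # 'N/A', None, or other non-numeric placeholders
        if math.isfinite(x) and low < x < high:
            cleaned.append(x)
    return cleaned


# Hypothetical P/E values, not real market data
raw = [12.0, "N/A", 18.0, None, 24.0, -5.0, 250.0, 30.0, float("nan")]
pe = clean_pe(raw)

print(pe)                               # [12.0, 18.0, 24.0, 30.0]
print(statistics.mean(pe))              # 21.0
print(statistics.median(pe))            # 21.0
print(round(statistics.stdev(pe), 2))   # 7.75 (sample standard deviation, ddof=1)
```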

Step 5: Save and Export Data

Save the extracted fundamental data for future analysis:

# Save fundamental data to CSV
fundamental_df.to_csv('sp500_fundamental_data.csv', index=False)
print("Data saved to 'sp500_fundamental_data.csv'")

# Create a summary report
summary_stats = {
    'Total Companies Analyzed': len(fundamental_df),
    'Average Market Cap': fundamental_df['Market Cap'].mean(),
    'Average PE Ratio': fundamental_df['PE Ratio'].mean(),
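    # Coerce 'N/A' placeholders to NaN so the comparison does not raise a TypeError
    'Companies with Dividends': int((pd.to_numeric(fundamental_df['Dividend Yield'], errors='coerce') > 0).sum())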
}

print("\nSummary Report:")
for key, value in summary_stats.items():
    if isinstance(value, float):
        print(f"{key}: {value:,.2f}")
    else:
        print(f"{key}: {value}")
Data saved to 'sp500_fundamental_data.csv'

Summary Report:
Total Companies Analyzed: 10
Average Market Cap: 156,789,234,567.80
Average PE Ratio: 20.85
Companies with Dividends: 8
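Fundamental values such as market cap and P/E change every trading day, so overwriting a single CSV file loses history. A simple option is to include the retrieval date in the file name; with pandas that just means passing a dated name to fundamental_df.to_csv(..., index=False). The sketch below shows the same idea with the standard library's csv module for a list of dictionaries shaped like the ones get_fundamental_data returns. save_snapshot and the file-name pattern are assumptions for illustration.

```python
import csv
from datetime import date
from pathlib import Path


def save_snapshot(rows, out_dir=".", as_of=None):
    """Write rows (a list of dicts) to sp500_fundamental_data_<YYYY-MM-DD>.csv.

    Returns the path of the written file. Column order follows the first row's keys.
    """
    as_of = as_of or date.today()
    path = Path(out_dir) / f"sp500_fundamental_data_{as_of.isoformat()}.csv"
    fieldnames = list(rows[0].keys()) if rows else []
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


# Hypothetical rows in the same shape as the Step 3 output
rows = [
    {"Ticker": "AAA", "PE Ratio": 15.2, "Dividend Yield": 0.021},
    {"Ticker": "BBB", "PE Ratio": "N/A", "Dividend Yield": "N/A"},
]
snapshot = save_snapshot(rows, as_of=date(2024, 1, 2))
print(snapshot.name)  # sp500_fundamental_data_2024-01-02.csv
```

Because the Step 3 function uses 'N/A' for missing fields, those cells are stored as the literal text N/A; pandas.read_csv recognizes that string as a missing value by default when the file is loaded back.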

Key Points

  • Use the yfinance library for convenient access to Yahoo Finance data; it is an unofficial wrapper, so fields and availability can change
  • Scrape the S&P 500 constituent list from Wikipedia with requests and BeautifulSoup
  • Handle missing data and errors gracefully in your extraction functions
  • Always validate and clean numerical data before analysis
  • Export data to CSV for sharing and further analysis in other tools

Conclusion

Combining the yfinance library with a constituent list scraped from Wikipedia gives a practical way to collect fundamental data for S&P 500 companies in Python. This approach automates the collection of financial metrics for investment analysis and research.

micahgreen

I Am A Software Engineer and Passionate Programmer

Updated on: 2026-03-27T16:44:59+05:30
