Olympics Data Analysis Using Python

The modern Olympic Games, or simply the Olympics, are major international sporting events featuring summer and winter sports in which thousands of athletes from around the world compete across a range of disciplines. With more than 200 nations taking part, the Olympic Games are regarded as the world's premier sporting event. In this article, we explore historical Olympic athlete data in Python using pandas, seaborn, and Matplotlib. Let's begin.

Importing necessary libraries

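```python
# In Jupyter or Google Colab, "!" runs a shell command; these libraries
# are usually preinstalled there, so the install line may be unnecessary
!pip install pandas numpy seaborn matplotlib

import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
```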

Importing and understanding the dataset

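We work with two CSV files. The first, athlete_events.csv, has one row per athlete per event for every Games, including each athlete's sex, age, height, weight, team, NOC code, sport, event, and medal (if any). The second is a lookup table that maps each three-letter National Olympic Committee (NOC) code to a region (country) name, plus an optional notes column.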

Download both CSV files before running the code; the paths below assume they are stored in /content/sample_data/, as in Google Colab.

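```python
data = pd.read_csv('/content/sample_data/athlete_events.csv')

# head() shows the first 5 rows
print(data.head())
# Summary statistics for the numeric columns
print(data.describe())
# info() prints column types and non-null counts itself (it returns None)
data.info()
```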
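Before analysing, it helps to check how much data is missing; the sample output later in this article already shows NaN heights and weights. The sketch below runs on a small hypothetical DataFrame that reuses a few of the dataset's column names, so it works on its own; with the real file you would call the same methods on `data`. Note that in this dataset a missing Medal generally means the athlete did not win a medal, so not every NaN is a data-quality problem.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows mimicking a few columns of athlete_events.csv
data = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [34.0, 28.0, np.nan, 20.0],
    "Height": [np.nan, 175.0, 180.0, 176.0],
    "Medal": [np.nan, "Gold", np.nan, "Gold"],
})

# Number of missing values per column, largest first
missing = data.isna().sum().sort_values(ascending=False)
print(missing)

# Percentage of rows with a missing value in each column
missing_pct = (data.isna().mean() * 100).round(1)
print(missing_pct)
```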

Merging both datasets

```python
# Lookup table: NOC code -> region (country) name
regions = pd.read_csv('/content/sample_data/datasets_31029_40943_noc_regions.csv')
print(regions.head())

# Left merge on NOC: every athlete row is kept and gains the region and notes columns
merged = pd.merge(data, regions, on='NOC', how='left')
print(merged.head())
```
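A left merge keeps every athlete row even when its NOC code has no entry in the regions table; those rows simply get NaN in `region` and drop out of any per-country counts. A minimal sketch, on small hypothetical tables (the code "ZZZ" is made up), of how pandas' `indicator` and `validate` options can surface such cases:

```python
import pandas as pd

# Hypothetical toy versions of the two files
data = pd.DataFrame({
    "Name": ["Edgar", "Paavo", "Kjetil", "X"],
    "NOC": ["DEN", "FIN", "NOR", "ZZZ"],  # "ZZZ" is a made-up code with no region entry
})
regions = pd.DataFrame({
    "NOC": ["DEN", "FIN", "NOR"],
    "region": ["Denmark", "Finland", "Norway"],
    "notes": [None, None, None],
})

# indicator=True adds a "_merge" column recording where each row was found;
# validate="many_to_one" raises MergeError if an NOC code repeats in regions
merged = pd.merge(data, regions, on="NOC", how="left",
                  indicator=True, validate="many_to_one")

# NOC codes that found no match in the regions table
unmatched = merged.loc[merged["_merge"] == "left_only", "NOC"].unique()
print(unmatched)
```

The `validate` check matters because a duplicated NOC code in the lookup table would otherwise silently duplicate athlete rows.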

From here the data analysis starts.

Analysis of gold medals


```python
# Keep only the rows for gold medals
goldMedals = merged[merged.Medal == 'Gold']
print(goldMedals.head())
```


 ID                     Name    Sex   Age  Height  Weight            Team  \
3    4     Edgar Lindenau Aabye   M  34.0     NaN     NaN  Denmark/Sweden   
42  17  Paavo Johannes Aaltonen   M  28.0   175.0    64.0         Finland   
44  17  Paavo Johannes Aaltonen   M  28.0   175.0    64.0         Finland   
48  17  Paavo Johannes Aaltonen   M  28.0   175.0    64.0         Finland   
60  20       Kjetil Andr Aamodt   M  20.0   176.0    85.0          Norway   

    NOC        Games  Year  Season         City          Sport  \
3   DEN  1900 Summer  1900  Summer        Paris     Tug-Of-War   
42  FIN  1948 Summer  1948  Summer       London     Gymnastics   
44  FIN  1948 Summer  1948  Summer       London     Gymnastics   
48  FIN  1948 Summer  1948  Summer       London     Gymnastics   
60  NOR  1992 Winter  1992  Winter  Albertville  Alpine Skiing   

                               Event Medal   region notes  
3        Tug-Of-War Men's Tug-Of-War  Gold  Denmark   NaN  
42  Gymnastics Men's Team All-Around  Gold  Finland   NaN  
44      Gymnastics Men's Horse Vault  Gold  Finland   NaN  
48  Gymnastics Men's Pommelled Horse  Gold  Finland   NaN  
60       Alpine Skiing Men's Super G  Gold   Norway   NaN  

Analysis of gold medalists according to age

Here, we'll plot the number of gold medals by age. For this, we use a seaborn countplot, with the athletes' ages on the x-axis and the number of gold medals on the y-axis.


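```python
plt.figure(figsize=(20, 10))
plt.title('Distribution of Gold Medals')
# Pass the ages as x explicitly: since seaborn 0.12 the plot variables are keyword-only
sns.countplot(x=goldMedals['Age'])
plt.show()
```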
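The countplot is easier to read alongside the exact numbers behind it. A minimal sketch using only the ages of the five sample gold-medal rows printed above (a stand-in for the real `goldMedals`); `value_counts()` skips NaN ages by default, just as they are left out of the plot.

```python
import pandas as pd

# Ages of the five sample gold-medal rows shown earlier (stand-in for goldMedals)
goldMedals = pd.DataFrame({"Age": [34.0, 28.0, 28.0, 28.0, 20.0]})

# Number of gold medals per age, ordered by age
gold_by_age = goldMedals["Age"].value_counts().sort_index()
print(gold_by_age)

# Age with the most gold medals
print(gold_by_age.idxmax())
```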


Some gold medalists were older than 50. Select the sports those athletes competed in, store them in a new Series named masterDisciplines, and use it to make a visualization.


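```python
# Sports of gold medalists older than 50 (one entry per gold-medal row)
masterDisciplines = goldMedals['Sport'][goldMedals['Age'] > 50]

plt.figure(figsize=(20, 10))
sns.countplot(x=masterDisciplines)
plt.title('Gold Medals for Athletes Over 50')
plt.tight_layout()  # adjust spacing once the plot has been drawn
plt.show()
```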


Analysis of women's participation in the Summer Olympics


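```python
# Rows for female athletes at the Summer Games (medal winners or not)
womenInOlympics = merged[(merged.Sex == 'F') & (merged.Season == 'Summer')]
print(womenInOlympics.head(10))

sns.set(style="darkgrid")
plt.figure(figsize=(20, 10))
# One bar per Games: the number of female athlete-event entries
sns.countplot(x='Year', data=womenInOlympics)
plt.title('Female entries per edition of the Summer Games')
plt.show()
```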
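Because each row is one athlete in one event (the sample output shows Paavo Johannes Aaltonen three times at the 1948 Games), the countplot counts entries rather than people. A minimal sketch, on hypothetical rows, contrasting row counts with distinct athletes per year via the `ID` column:

```python
import pandas as pd

# Hypothetical rows: one row per athlete per event, as in athlete_events.csv
womenInOlympics = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3],
    "Year": [1900, 1900, 1900, 1904, 1904, 1908],
})

# Rows per year: athlete-event entries (what the countplot shows)
entries = womenInOlympics.groupby("Year").size()

# Distinct athletes per year
athletes = womenInOlympics.groupby("Year")["ID"].nunique()

print(pd.DataFrame({"entries": entries, "athletes": athletes}))
```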


Analyzing the top 5 countries by gold medals


```python
# Gold medals per region; rename_axis keeps the column name 'region'
# stable across pandas versions (pandas 2 no longer produces an 'index' column)
totalGoldMedals = (
    goldMedals.region.value_counts()
    .rename_axis('region')
    .reset_index(name='Medal')
    .head(5)
)
print(totalGoldMedals)

g = sns.catplot(x='region', y='Medal', data=totalGoldMedals,
                height=6, kind='bar', palette='muted')
g.despine(left=True)
g.set_xlabels('Top 5 countries')
g.set_ylabels('Number of Medals')
plt.title('Gold Medals per Country')
plt.show()
```
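The bar chart above counts one medal per athlete row, so a team gold (such as the Tug-Of-War win in the sample output) is counted once for every team member. A minimal sketch, on hypothetical rows, of an approximate per-event count that keeps one row per (Games, Event, region); this deduplication rule is an assumption of the sketch, not part of the original analysis.

```python
import pandas as pd

# Hypothetical gold-medal rows: a 4-person team event plus two individual events
goldMedals = pd.DataFrame({
    "Games": ["1948 Summer"] * 4 + ["1948 Summer", "1992 Winter"],
    "Event": ["Relay"] * 4 + ["Vault", "Super G"],
    "region": ["Finland"] * 4 + ["Finland", "Norway"],
})

# Per-athlete count, as in the chart above: every team member is counted
per_athlete = goldMedals["region"].value_counts()

# Approximate per-event count: one row per (Games, Event, region)
per_event = (
    goldMedals.drop_duplicates(subset=["Games", "Event", "region"])["region"]
    .value_counts()
)

print(per_athlete)  # Finland 5, Norway 1
print(per_event)    # Finland 2, Norway 1
```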


Evolution of athletes over time


```python
# Summer Games rows, split by sex
MenOverTime = merged[(merged.Sex == 'M') & (merged.Season == 'Summer')]
WomenOverTime = merged[(merged.Sex == 'F') & (merged.Season == 'Summer')]

# Male entries per year (index levels: Year, Sex)
part = MenOverTime.groupby('Year')['Sex'].value_counts()
plt.figure(figsize=(20, 10))
part.loc[:, 'M'].plot()
plt.title('Variation of Male Athletes over time')
plt.show()
```



```python
# Female entries per year
part = WomenOverTime.groupby('Year')['Sex'].value_counts()
plt.figure(figsize=(20, 10))
part.loc[:, 'F'].plot()
plt.title('Variation of Female Athletes over time')
plt.show()
```
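The male and female trends above are drawn as separate charts. A minimal sketch, on hypothetical rows, of putting both in one table with `groupby` and `unstack`; with the real data you would start from `merged[merged.Season == 'Summer']`, and calling `counts.plot()` would then draw both lines on one chart.

```python
import pandas as pd

# Hypothetical Summer Games rows
summer = pd.DataFrame({
    "Year": [1896, 1896, 1900, 1900, 1900, 1904],
    "Sex": ["M", "M", "M", "F", "M", "F"],
})

# Rows per (Year, Sex), reshaped to one column per sex;
# Year/Sex combinations with no rows become 0
counts = summer.groupby(["Year", "Sex"]).size().unstack(fill_value=0)
print(counts)
```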



We have worked through several analyses of the data; you can go further and look for more insights on your own.

Updated on: 01-Dec-2022
