How to select subset of data with Index Labels in Python Pandas?

Pandas provides powerful selection capabilities to extract subsets of data using either index positions or index labels. This article demonstrates how to select data using index labels with the .loc accessor.

The .loc attribute works similar to Python dictionaries, selecting data by index labels rather than positions. This is different from .iloc which selects by integer position like Python lists.

Setting Up the Dataset

Let's start by importing a movies dataset with the title as the index ?

import pandas as pd

movies = pd.read_csv("https://raw.githubusercontent.com/sasankac/TestDataSet/master/movies_data.csv",
                     index_col="title",
                     usecols=["title", "budget", "vote_average", "vote_count"])

# Sort the index for better performance with string indices
movies.sort_index(inplace=True)
print(movies.head(3))
                       budget  vote_average  vote_count
title                                                  
#Horror               1500000           3.3          52
(500) Days of Summer  7500000           7.2        2904
10 Cloverfield Lane  15000000           6.8        2468

Selecting Single Rows by Label

Use square brackets [] with .loc to select a single row by its index label ?

# Extract data for "Spider-Man 3"
print(movies.loc["Spider-Man 3"])
budget          2.580000e+08
vote_average    5.900000e+00
vote_count      3.576000e+03
Name: Spider-Man 3, dtype: float64

Selecting Ranges with Slicing

Use slicing to extract multiple rows between two labels. Unlike Python lists, string label slicing is inclusive of both endpoints ?

# Select all movies from "Alien" to "Avatar" (inclusive)
result = movies.loc["Alien":"Avatar"]
print(f"Selected {len(result)} movies")
print(result.head())
Selected 167 movies
                     budget  vote_average  vote_count
title                                               
Alien              11000000           7.9        4470
Alien Zone                0           4.0           3
Alien: Resurrection 70000000           5.9        1365
Aliens             18500000           7.7        3220
Aliens in the Attic 45000000           5.3         244

Selecting Multiple Non-Consecutive Rows

Pass a list of labels to select specific movies that aren't adjacent ?

# Select specific movies by passing a list
selected_movies = movies.loc[["Avatar", "Avengers: Age of Ultron"]]
print(selected_movies)
                        budget  vote_average  vote_count
title                                                  
Avatar                237000000           7.2       11800
Avengers: Age of Ultron 280000000           7.3        6767

Handling Missing Labels

Modern pandas raises a KeyError when you try to select non-existent labels. You can check if a label exists first ?

# Check if a movie exists in the index
movie_exists = "When is Avengers next movie?" in movies.index
print(f"Movie exists: {movie_exists}")

# Alternative approach using query() to ignore missing labels
result = movies.query("index in ['Avatar', 'When is Avengers next Movie?']")
print(result)
Movie exists: False
        budget  vote_average  vote_count
title                                  
Avatar  237000000           7.2       11800

Key Points

Feature Behavior Example
Single selection Returns Series .loc["Avatar"]
List selection Returns DataFrame .loc[["Avatar", "Alien"]]
Slice selection Inclusive of endpoints .loc["Alien":"Avatar"]
Missing labels Raises KeyError Check with in movies.index

Conclusion

The .loc accessor provides flexible label-based selection in pandas. Always sort string indices for better performance, and remember that label slicing is inclusive of both endpoints unlike Python lists.

Updated on: 2026-03-25T11:49:42+05:30

2K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements