Article Categories

Selected Reading

How to select subset of data with Index Labels in Python Pandas?

Python Server Side Programming Programming Pandas

Pandas provides powerful selection capabilities to extract subsets of data using either index positions or index labels. This article demonstrates how to select data using index labels with the .loc accessor.

The .loc attribute works similar to Python dictionaries, selecting data by index labels rather than positions. This is different from .iloc which selects by integer position like Python lists.

Setting Up the Dataset

Let's start by importing a movies dataset with the title as the index ?

import pandas as pd

movies = pd.read_csv("https://raw.githubusercontent.com/sasankac/TestDataSet/master/movies_data.csv",
                     index_col="title",
                     usecols=["title", "budget", "vote_average", "vote_count"])

# Sort the index for better performance with string indices
movies.sort_index(inplace=True)
print(movies.head(3))

                       budget  vote_average  vote_count
title                                                  
#Horror               1500000           3.3          52
(500) Days of Summer  7500000           7.2        2904
10 Cloverfield Lane  15000000           6.8        2468

Selecting Single Rows by Label

Use square brackets [] with .loc to select a single row by its index label ?

# Extract data for "Spider-Man 3"
print(movies.loc["Spider-Man 3"])

budget          2.580000e+08
vote_average    5.900000e+00
vote_count      3.576000e+03
Name: Spider-Man 3, dtype: float64

Selecting Ranges with Slicing

Use slicing to extract multiple rows between two labels. Unlike Python lists, string label slicing is inclusive of both endpoints ?

# Select all movies from "Alien" to "Avatar" (inclusive)
result = movies.loc["Alien":"Avatar"]
print(f"Selected {len(result)} movies")
print(result.head())

Selected 167 movies
                     budget  vote_average  vote_count
title                                               
Alien              11000000           7.9        4470
Alien Zone                0           4.0           3
Alien: Resurrection 70000000           5.9        1365
Aliens             18500000           7.7        3220
Aliens in the Attic 45000000           5.3         244

Selecting Multiple Non-Consecutive Rows

Pass a list of labels to select specific movies that aren't adjacent ?

# Select specific movies by passing a list
selected_movies = movies.loc[["Avatar", "Avengers: Age of Ultron"]]
print(selected_movies)

                        budget  vote_average  vote_count
title                                                  
Avatar                237000000           7.2       11800
Avengers: Age of Ultron 280000000           7.3        6767

Handling Missing Labels

Modern pandas raises a KeyError when you try to select non-existent labels. You can check if a label exists first ?

# Check if a movie exists in the index
movie_exists = "When is Avengers next movie?" in movies.index
print(f"Movie exists: {movie_exists}")

# Alternative approach using query() to ignore missing labels
result = movies.query("index in ['Avatar', 'When is Avengers next Movie?']")
print(result)

Movie exists: False
        budget  vote_average  vote_count
title                                  
Avatar  237000000           7.2       11800

Key Points

Feature	Behavior	Example
Single selection	Returns Series	`.loc["Avatar"]`
List selection	Returns DataFrame	`.loc[["Avatar", "Alien"]]`
Slice selection	Inclusive of endpoints	`.loc["Alien":"Avatar"]`
Missing labels	Raises KeyError	Check with `in movies.index`

Conclusion

The .loc accessor provides flexible label-based selection in pandas. Always sort string indices for better performance, and remember that label slicing is inclusive of both endpoints unlike Python lists.

Kiran P

Updated on: 2026-03-25T11:49:42+05:30

2K+ Views

Previous Next