Filter Pandas DataFrame Based on Index


Pandas is a Python package built on top of NumPy that offers high-performance data manipulation and analysis capabilities. It introduces two data structures, the Series and the DataFrame. A Series is a one-dimensional labeled array that can hold any type of data, comparable to a single column in a database table or spreadsheet. Because a Series is labeled, each element has an associated index label, which makes data access and manipulation quick and simple.

Similar to a spreadsheet or a SQL table, a DataFrame is a two-dimensional tabular data structure made up of rows and columns. It can be thought of as a collection of Series objects that share an index, and it provides methods for data processing, filtering, grouping, joining, and many other operations. With Pandas you can load data from a variety of file formats, such as CSV or Excel, and then clean, explore, and analyze it.

The filter() method subsets the rows or columns of a DataFrame by their index labels. It looks only at the labels, not at the data stored in the DataFrame, and returns a new DataFrame containing the matching rows or columns. The sections below describe filter() and then show several other ways to select rows by index, including iloc and boolean masks.

Syntax

df.filter(items=None, like=None, regex=None, axis=None)

items (optional): A list-like object of labels. Only the rows or columns whose labels appear in this list are kept; labels in the list that do not exist in the DataFrame are ignored.

like (optional): A string. Only the rows or columns whose labels contain this string as a substring are kept.

regex (optional): A regular expression pattern. Only the rows or columns whose labels match the pattern are kept. Only one of items, like, and regex may be passed in a single call.

axis (optional): The axis to filter on: 0 (or 'index') for rows, 1 (or 'columns') for columns. The default is None, which for a DataFrame means the columns are filtered.
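The parameters above can be sketched on a small DataFrame. The column names, row labels, and variable names below are made up for illustration and do not come from the examples that follow.

```python
import pandas as pd

# Hypothetical sample data, used only for this sketch
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "AB": [7, 8, 9]},
    index=["apple", "banana", "kiwi"],
)

# items: keep only the columns with exactly these labels
by_items = df.filter(items=["A", "AB"])

# like: keep the columns whose label contains the substring "B"
by_like = df.filter(like="B")

# regex: keep the columns whose label matches the pattern (here: starts with "A")
by_regex = df.filter(regex=r"^A")

# axis=0: apply the same kind of matching to the row index instead
rows_with_an = df.filter(like="an", axis=0)

print(by_items.columns.tolist())
print(by_like.columns.tolist())
print(by_regex.columns.tolist())
print(rows_with_an.index.tolist())
```

In every call filter() inspects labels only; the cell values play no part in the selection.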

Example 1

An example that selects rows by their integer positions using the iloc indexer available in pandas

import pandas as pd

# Create a sample DataFrame
data = {'A': [1, 2, 3, 4, 5],
    'B': [6, 7, 8, 9, 10],
    'C': [11, 12, 13, 14, 15]}
df = pd.DataFrame(data)

# Select the rows at integer positions 1 and 3
filtered_df = df.iloc[[1, 3]]

print(filtered_df)

Output

   A  B   C
1  2  7  12
3  4  9  14

Here, the iloc indexer selects the rows at integer positions 1 and 3 of df. Because df uses the default integer index, those positions coincide with the index labels 1 and 3. The resulting DataFrame filtered_df contains only those two rows, and all three columns 'A', 'B', and 'C' are kept.
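For comparison, a label-based selection of the same rows with filter() might look like the following sketch. It assumes the default integer index, where labels and positions happen to coincide; on a DataFrame with a different index, the two approaches can return different rows. The names by_position and by_label are illustrative.

```python
import pandas as pd

data = {'A': [1, 2, 3, 4, 5],
    'B': [6, 7, 8, 9, 10],
    'C': [11, 12, 13, 14, 15]}
df = pd.DataFrame(data)

# Position-based: rows at integer positions 1 and 3
by_position = df.iloc[[1, 3]]

# Label-based: rows whose index label is 1 or 3 (axis=0 targets the row index)
by_label = df.filter(items=[1, 3], axis=0)

print(by_label)
```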

Example 2

An example that selects rows by position and then gives them non-numeric index labels

import pandas as pd

# Create a sample DataFrame
data = {'A': [1, 2, 3, 4, 5],
    'B': [6, 7, 8, 9, 10],
    'C': [11, 12, 13, 14, 15]}
df = pd.DataFrame(data)

# Create a mapping dictionary for index label conversion
index_mapping = {1: 'two', 3: 'four'}

# Select the rows at positions 1 and 3, then rename their index labels
filtered_df = df.iloc[[1, 3]].rename(index=index_mapping)

print(filtered_df)

Output

      A  B   C
two   2  7  12
four  4  9  14

Here, iloc first selects the rows at positions 1 and 3, and rename() then replaces their index labels 1 and 3 with "two" and "four" using the mapping dictionary. The resulting DataFrame keeps all columns and contains only the rows now labeled "two" and "four".
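When a DataFrame already carries string labels, rows can be picked by those labels directly. The sketch below assumes a hypothetical full mapping (full_mapping) that renames every row first, and then selects two of them with filter() and, equivalently, with loc.

```python
import pandas as pd

data = {'A': [1, 2, 3, 4, 5],
    'B': [6, 7, 8, 9, 10],
    'C': [11, 12, 13, 14, 15]}
df = pd.DataFrame(data)

# Hypothetical mapping that gives every row a string label
full_mapping = {0: 'one', 1: 'two', 2: 'three', 3: 'four', 4: 'five'}
labeled_df = df.rename(index=full_mapping)

# Keep only the rows labeled "two" and "four"
filtered_by_label = labeled_df.filter(items=['two', 'four'], axis=0)

# loc with a list of labels selects the same rows
selected_with_loc = labeled_df.loc[['two', 'four']]

print(filtered_by_label)
```

One difference worth knowing: filter() silently skips labels that are not present, whereas loc raises a KeyError for a missing label.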

Example 3

Here, we filter a DataFrame to keep only the rows whose index labels contain a specific character.

import pandas as pd

# Create a sample DataFrame
sample_data = {'A': [1, 2, 3, 4, 5],
    'B': [6, 7, 8, 9, 10],
    'C': [11, 12, 13, 14, 15]}
df = pd.DataFrame(sample_data, index=['apple', 'banana', 'orange', 'grape', 'kiwi'])

# Filter rows based on indexes containing a specific character
filtered_df = df[df.index.str.contains('a')]

print(filtered_df)

Output

        A  B   C
apple   1  6  11
banana  2  7  12
orange  3  8  13
grape   4  9  14

The DataFrame has the columns 'A', 'B', and 'C' and the index labels "apple", "banana", "orange", "grape", and "kiwi". The str.contains() method checks whether each index label contains the letter "a" and returns a boolean array. That array is then used as a mask to keep only the matching rows, so "kiwi" is dropped.
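The same result can be obtained with filter() applied to the row index. The sketch below reuses the sample data from this example; the variable names contains_a, mask_based, and ends_with_e are illustrative.

```python
import pandas as pd

sample_data = {'A': [1, 2, 3, 4, 5],
    'B': [6, 7, 8, 9, 10],
    'C': [11, 12, 13, 14, 15]}
df = pd.DataFrame(sample_data, index=['apple', 'banana', 'orange', 'grape', 'kiwi'])

# like + axis=0: keep the rows whose label contains the substring "a"
contains_a = df.filter(like='a', axis=0)

# Boolean-mask version from the example above, for comparison
mask_based = df[df.index.str.contains('a')]

# regex + axis=0: keep the rows whose label ends with "e"
ends_with_e = df.filter(regex=r'e$', axis=0)

print(contains_a.index.tolist())
print(ends_with_e.index.tolist())
```

Note that str.contains() interprets its pattern as a regular expression by default, while the like parameter performs a plain substring test.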

Conclusion

Filtering a DataFrame by its index is a common step in data analysis and manipulation. Whether you select rows by position with iloc, by label with filter(), or with a boolean mask built from the index, these techniques let you extract the relevant subset of a large dataset for further analysis or processing.

Updated on: 10-Aug-2023
