Selecting a Limited Number of Rows with Given Columns in Pandas


Pandas, a Python package, is widely used by data scientists and analysts. Selecting rows and columns from a DataFrame is one of its most common data manipulation tasks. This article shows, with worked examples, how to use Pandas to select a limited number of rows from particular columns.

While we focus on one particular feature of Pandas, keep in mind that the library's capabilities go well beyond this, making it a crucial tool for data processing.

Pandas DataFrame: A Brief Introduction

Pandas provides a fast, user-friendly data structure, the DataFrame, along with tools for data analysis in Python. The name "Pandas" derives from "panel data," an econometrics term for datasets that contain observations of the same individuals over a number of time periods.

Selecting Limited Rows with Given Columns in Pandas

In data analysis, it is frequently necessary to select particular rows and columns from a DataFrame. This is useful when you are only interested in analysing or modifying part of the full dataset. Here are some ways to use Pandas to select a limited number of rows from a set of columns:

Method 1: Using the iloc and loc indexers

The iloc indexer selects rows and columns by integer position, while the loc indexer selects them by label.

Example 1: Using iloc

import pandas as pd

# Create a simple dataframe
data = {
   'Name': ['John', 'Anna', 'Peter', 'Linda', 'Mike'],
   'Age': [28, 24, 35, 32, 30],
   'City': ['New York', 'Paris', 'Berlin', 'London', 'Sydney']
}

df = pd.DataFrame(data)

# Select the first three rows from the 'Name' and 'Age' columns
subset = df.iloc[:3, [0, 1]]
print(subset)

Output

    Name  Age
0   John   28
1   Anna   24
2  Peter   35

Example 2: Using loc

# Select the first three rows from the 'Name' and 'Age' columns.
# Unlike iloc, a loc slice includes its end label, so :2 covers rows 0, 1 and 2.
subset = df.loc[:2, ['Name', 'Age']]
print(subset)
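
Examples 1 and 2 select the same rows only because the DataFrame uses the default integer index, where position and label coincide. The following is a minimal sketch of how the two indexers differ once the index is not numeric. It assumes a hypothetical DataFrame named people with made-up string labels 'a' to 'e' as its index; neither the name nor the labels come from the examples above.

```python
import pandas as pd

# Hypothetical data: same columns as the examples, but indexed by string labels
people = pd.DataFrame(
    {
        'Name': ['John', 'Anna', 'Peter', 'Linda', 'Mike'],
        'Age': [28, 24, 35, 32, 30],
        'City': ['New York', 'Paris', 'Berlin', 'London', 'Sydney'],
    },
    index=['a', 'b', 'c', 'd', 'e'],
)

# iloc slices by position and excludes the stop position: positions 0, 1, 2
by_position = people.iloc[:3, [0, 1]]

# loc slices by label and includes the stop label: labels 'a', 'b', 'c'
by_label = people.loc[:'c', ['Name', 'Age']]

# Both selections contain the same three rows and two columns
print(by_position.equals(by_label))  # True
```

With a labelled index like this, loc tends to be the more readable choice when the labels are meaningful, while iloc is convenient when only the row count matters.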

Method 2: Using Boolean Indexing

Boolean indexing selects rows based on the actual values stored in the DataFrame: a condition is evaluated for every row, and only the rows where it is true are kept.

Example 3: Using Boolean Indexing

# Select rows where 'Age' is greater than 30 and only show 'Name' and 'City' columns
subset = df.loc[df['Age'] > 30, ['Name', 'City']]
print(subset)
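
A boolean filter returns every matching row, so on its own it does not limit the row count. One possible way to cap the result, sketched here on the assumption that the first few matches in index order are the ones wanted, is to chain head() onto the filter. The names max_rows and limited are illustrative and not part of the original examples.

```python
import pandas as pd

# Same sample data as in Example 1
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'Mike'],
    'Age': [28, 24, 35, 32, 30],
    'City': ['New York', 'Paris', 'Berlin', 'London', 'Sydney'],
})

max_rows = 1  # illustrative cap on the number of rows to keep

# Filter rows and pick columns in one loc call, then keep at most max_rows rows
limited = df.loc[df['Age'] > 30, ['Name', 'City']].head(max_rows)
print(limited)
```

Two rows satisfy the condition (Peter and Linda), and head(1) keeps only the first of them. If fewer rows match than max_rows, head() simply returns all of the matches.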

Conclusion

Pandas provides several ways to select a limited number of rows from specific columns, including iloc, loc, and boolean indexing. Knowing how to select data effectively is crucial whether you are doing exploratory data analysis or preparing data for machine learning.

Remember that there is much more you can do with Pandas than what is shown in these examples. The library's features go well beyond selection and support more complex data processing and analysis tasks.

Good data analysis depends on asking the right questions and knowing how to extract the right subset from a larger dataset. With Pandas, you are well equipped to do so!

Updated on: 18-Jul-2023
