Python Pandas – How to use Pandas DataFrame Property: shape
The shape property of a Pandas DataFrame returns a tuple containing the number of rows and columns. Checking it is a quick way to confirm your dataset's dimensions before performing data analysis operations.
DataFrame.shape Property
The shape property returns (rows, columns) as a tuple. You can access individual values using indexing:
# Basic syntax
df.shape      # Returns (rows, columns)
df.shape[0]   # Number of rows
df.shape[1]   # Number of columns
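As a minimal sketch of the syntax above, the snippet below unpacks the tuple into two variables. The two-column frame and the names `n_rows` and `n_cols` are illustrative assumptions, not part of the products dataset used later.

```python
import pandas as pd

# Hypothetical three-row frame, used only for illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# shape is an attribute, not a method, so it takes no parentheses
n_rows, n_cols = df.shape

print(n_rows, n_cols)            # 3 2
print(len(df) == df.shape[0])    # True: len() also gives the row count
print(df["a"].shape)             # (3,) because a Series has one dimension
```

Unpacking tends to read better than repeated indexing when both values are needed in the same place.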
Creating Sample Data
Let's create a sample products dataset to demonstrate the shape property:
import pandas as pd
# Create sample products data
data = {
'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'product': ['Bike', 'Car', 'Truck', 'Bus', 'Car', 'Car', 'Bike', 'Truck', 'Car', 'Bus'],
'engine': ['Gas', 'Diesel', 'Diesel', 'Diesel', 'Gas', 'Gas', 'Gas', 'Diesel', 'Diesel', 'Gas'],
'avgmileage': [45, 21, 12, 8, 18, 19, 42, 15, 23, 10],
'price': [8500, 16500, 35000, 45000, 17450, 15250, 9200, 38000, 16925, 42000],
'height_mm': [1200, 1530, 2800, 3200, 1530, 1530, 1150, 2750, 1530, 3100],
'width_mm': [800, 1735, 2400, 2500, 1780, 1790, 750, 2350, 1800, 2450],
'productionYear': [2019, 2020, 2021, 2018, 2018, 2019, 2020, 2019, 2018, 2017]
}
df = pd.DataFrame(data)
print("Dataset shape:", df.shape)
print("Rows:", df.shape[0], "Columns:", df.shape[1])
Dataset shape: (10, 8)
Rows: 10 Columns: 8
Method 1: Using iloc with Column Index
Select the first ten rows with iloc, then keep only the cars by referring to the product column by its index position:
import pandas as pd
# Create the same sample data
data = {
'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'product': ['Bike', 'Car', 'Truck', 'Bus', 'Car', 'Car', 'Bike', 'Truck', 'Car', 'Bus'],
'engine': ['Gas', 'Diesel', 'Diesel', 'Diesel', 'Gas', 'Gas', 'Gas', 'Diesel', 'Diesel', 'Gas'],
'avgmileage': [45, 21, 12, 8, 18, 19, 42, 15, 23, 10],
'price': [8500, 16500, 35000, 45000, 17450, 15250, 9200, 38000, 16925, 42000],
'height_mm': [1200, 1530, 2800, 3200, 1530, 1530, 1150, 2750, 1530, 3100],
'width_mm': [800, 1735, 2400, 2500, 1780, 1790, 750, 2350, 1800, 2450],
'productionYear': [2019, 2020, 2021, 2018, 2018, 2019, 2020, 2019, 2018, 2017]
}
df = pd.DataFrame(data)
print("Rows:", df.shape[0], "Columns:", df.shape[1])
# Get first 10 rows (all rows in this case)
df1 = df.iloc[0:10, :]
# Filter cars using column index (product column is at index 1)
cars_data = df1[df1.iloc[:, 1] == 'Car']
print(cars_data)
Rows: 10 Columns: 8
   id product  engine  avgmileage  price  height_mm  width_mm  productionYear
1   2     Car  Diesel          21  16500       1530      1735            2020
4   5     Car     Gas          18  17450       1530      1780            2018
5   6     Car     Gas          19  15250       1530      1790            2019
8   9     Car  Diesel          23  16925       1530      1800            2018
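Hard-coding the position 1 works only while the product column stays second. One possible variation, sketched here on a smaller hypothetical frame, looks the position up with `columns.get_loc` and then uses shape to count the matching rows. The variable names `product_pos` and `cars` are assumptions made for this example.

```python
import pandas as pd

# Smaller hypothetical frame with the same column layout idea
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "product": ["Bike", "Car", "Truck", "Bus", "Car"],
})

# Look up the column position instead of hard-coding it
product_pos = df.columns.get_loc("product")

# Same position-based filter as in Method 1
cars = df[df.iloc[:, product_pos] == "Car"]

print(product_pos)     # 1
print(cars.shape)      # (2, 2): two matching rows, two columns
print(cars.shape[0])   # 2
```

Filtering changes only the row count; the column count in the result's shape stays the same as in the original frame.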
Method 2: Using head() with Column Name
Use the head() method and filter by column name for more readable code:
import pandas as pd
# Create the same sample data
data = {
'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'product': ['Bike', 'Car', 'Truck', 'Bus', 'Car', 'Car', 'Bike', 'Truck', 'Car', 'Bus'],
'engine': ['Gas', 'Diesel', 'Diesel', 'Diesel', 'Gas', 'Gas', 'Gas', 'Diesel', 'Diesel', 'Gas'],
'avgmileage': [45, 21, 12, 8, 18, 19, 42, 15, 23, 10],
'price': [8500, 16500, 35000, 45000, 17450, 15250, 9200, 38000, 16925, 42000],
'height_mm': [1200, 1530, 2800, 3200, 1530, 1530, 1150, 2750, 1530, 3100],
'width_mm': [800, 1735, 2400, 2500, 1780, 1790, 750, 2350, 1800, 2450],
'productionYear': [2019, 2020, 2021, 2018, 2018, 2019, 2020, 2019, 2018, 2017]
}
df = pd.DataFrame(data)
print("Rows:", df.shape[0], "Columns:", df.shape[1])
# Get first 10 rows using head()
df1 = df.head(10)
# Filter cars using column name
cars_data = df1[df1['product'] == 'Car']
print(cars_data)
Rows: 10 Columns: 8
   id product  engine  avgmileage  price  height_mm  width_mm  productionYear
1   2     Car  Diesel          21  16500       1530      1735            2020
4   5     Car     Gas          18  17450       1530      1780            2018
5   6     Car     Gas          19  15250       1530      1790            2019
8   9     Car  Diesel          23  16925       1530      1800            2018
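Because the sample dataset has exactly ten rows, head(10) returns all of them. The sketch below, on a hypothetical three-row frame, shows how shape can confirm what head() actually returned when the requested count is larger or smaller than the data.

```python
import pandas as pd

# Hypothetical frame with fewer rows than requested from head()
df = pd.DataFrame({"product": ["Bike", "Car", "Truck"]})

# Asking for more rows than exist returns every row without an error
first_ten = df.head(10)
print(first_ten.shape)   # (3, 1)

# Asking for fewer rows truncates the result
first_two = df.head(2)
print(first_two.shape)   # (2, 1)
```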
Comparison
| Method | Row Selection | Column Access | Best For |
|---|---|---|---|
| iloc with index | df.iloc[0:10, :] | df.iloc[:, 1] | Position-based access |
| head() with name | df.head(10) | df['product'] | More readable code |
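To check that the two approaches select the same rows, a small sketch like the following can compare the results directly. The four-row frame and the names `by_position` and `by_name` are assumptions made for this illustration.

```python
import pandas as pd

# Hypothetical frame; 'product' sits at column position 1
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "product": ["Bike", "Car", "Truck", "Car"],
})

# Method 1: position-based selection
first_rows_iloc = df.iloc[0:10, :]
by_position = first_rows_iloc[first_rows_iloc.iloc[:, 1] == "Car"]

# Method 2: head() with the column name
first_rows_head = df.head(10)
by_name = first_rows_head[first_rows_head["product"] == "Car"]

print(by_position.shape, by_name.shape)   # (2, 2) (2, 2)
print(by_position.equals(by_name))        # True
```

Matching shapes are a quick first check; `equals` additionally confirms that the values and index labels agree.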
Conclusion
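The shape property is essential for understanding DataFrame dimensions. Use df.shape[0] for the number of rows and df.shape[1] for the number of columns. Both filtering methods return the same rows, but Method 2 is more readable and maintainable because filtering by column name keeps working even if the column order changes.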
