How to Count the Number of Lines in a CSV File in Python?
Counting lines in a CSV file is a common task in data analysis. Python offers several approaches, from Pandas DataFrame methods that load the file to a streaming count with the built-in csv module.
Prerequisites
First, ensure you have Pandas installed:
pip install pandas
Sample CSV File
Let's create a sample CSV file to work with:
import pandas as pd
# Create sample data
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [25, 30, 35, 28, 32],
    'City': ['New York', 'London', 'Tokyo', 'Paris', 'Sydney']
}
df = pd.DataFrame(data)
df.to_csv('sample.csv', index=False)
print("Sample CSV created:")
print(df)
Sample CSV created:
      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Tokyo
3    Diana   28     Paris
4      Eve   32    Sydney
Using DataFrame Shape Attribute
The shape attribute returns a tuple of (rows, columns). The first element is the number of data rows; the header line is used for column names and is not counted:
import pandas as pd
# Read the CSV file
df = pd.read_csv('sample.csv')
# Get number of lines using shape attribute
num_lines = df.shape[0]
total_columns = df.shape[1]
print(f"Number of lines: {num_lines}")
print(f"Number of columns: {total_columns}")
Number of lines: 5
Number of columns: 3
Using the len() Function
The len() function directly returns the number of rows in the DataFrame:
import pandas as pd
# Read the CSV file
df = pd.read_csv('sample.csv')
# Count lines using len() function
num_lines = len(df)
print(f"Number of lines: {num_lines}")
Number of lines: 5
Counting Lines Without Loading Entire File
For very large CSV files, you can count lines without loading all data into memory by streaming rows with the built-in csv module:
import csv
def count_csv_lines(filename):
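    # newline='' lets the csv module handle line endings itself
    with open(filename, 'r', newline='') as file:
        csv_reader = csv.reader(file)
        # Consume the reader one row at a time; nothing is kept in memory
        line_count = sum(1 for row in csv_reader)
    return line_count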
# Count lines in our sample file
line_count = count_csv_lines('sample.csv')
print(f"Total lines (including header): {line_count}")
print(f"Data lines (excluding header): {line_count - 1}")
Total lines (including header): 6
Data lines (excluding header): 5
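One subtlety of the csv.reader approach is that it counts CSV records rather than physical lines in the file. The two differ when a quoted field contains a line break. The sketch below illustrates this with a small file written only for the demonstration; the file name multiline_sample.csv and its contents are assumptions, not part of the sample data above.

```python
import csv

# Hypothetical file name, used only for this illustration.
path = "multiline_sample.csv"

# Write three records (header + 2 data rows); one field contains a line break.
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Note"])
    writer.writerow(["Alice", "first line\nsecond line"])
    writer.writerow(["Bob", "single line"])

# Physical lines: iterate over the file as plain text.
with open(path, "r", newline="") as f:
    physical_lines = sum(1 for _ in f)

# CSV records: csv.reader reassembles the quoted multi-line field.
with open(path, "r", newline="") as f:
    records = sum(1 for _ in csv.reader(f))

print(f"Physical lines: {physical_lines}")  # 4
print(f"CSV records: {records}")            # 3
```

If the goal is the number of rows of data, the record count is usually the one that matches what Pandas would report; if the goal is literally the number of text lines, plain file iteration is the closer fit.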
Comparison of Methods
| Method | Memory Usage | Speed | Best For |
|---|---|---|---|
| df.shape[0] | High | Fast | When you need the data anyway |
| len(df) | High | Fast | Simple syntax, data analysis |
| csv.reader | Low | Slow | Very large files, memory constraints |
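A possible middle ground between the approaches compared above is to keep using Pandas but read the file in chunks, so only a few rows are in memory at a time. This is a minimal sketch, not a benchmarked recommendation; the helper name count_rows_in_chunks, the file name chunk_sample.csv, and the tiny chunk size are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical file name, used only for this illustration.
path = "chunk_sample.csv"
pd.DataFrame({"value": range(10)}).to_csv(path, index=False)

def count_rows_in_chunks(filename, chunksize=4):
    """Count data rows by reading the file a few rows at a time."""
    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of a single DataFrame.
    return sum(len(chunk) for chunk in pd.read_csv(filename, chunksize=chunksize))

print(count_rows_in_chunks(path))  # 10
```

In practice a much larger chunksize (tens or hundreds of thousands of rows) would be typical; the right value depends on row width and available memory.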
Including Header Count
CSV files typically start with a header row, which read_csv() uses for column names rather than counting as data. Here's how to report both the data-row count and the total line count:
import pandas as pd
# Read CSV file
df = pd.read_csv('sample.csv')
# Data rows only (excluding header)
data_lines = len(df)
# Total lines including header
total_lines = data_lines + 1
print(f"Data lines: {data_lines}")
print(f"Total lines (with header): {total_lines}")
Data lines: 5
Total lines (with header): 6
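The reverse case is a file with no header row at all. By default read_csv() still treats the first line as column names, so the count comes out one short; passing header=None makes every line count as data. The sketch below assumes a small header-less file named no_header.csv, written here only for illustration.

```python
import pandas as pd

# Hypothetical header-less file, written only for this illustration.
path = "no_header.csv"
with open(path, "w", newline="") as f:
    f.write("Alice,25\nBob,30\nCharlie,35\n")

# Default: the first line is consumed as column names.
rows_default = len(pd.read_csv(path))

# header=None: every line is treated as data.
rows_no_header = len(pd.read_csv(path, header=None))

print(f"Default header handling: {rows_default}")  # 2
print(f"With header=None: {rows_no_header}")       # 3
```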
Conclusion
Use len(df) or df.shape[0] when you are already working with a DataFrame; both return the number of data rows, excluding the header. For very large files where memory is a concern, use the csv.reader approach to count lines without loading all data.
