Python Pandas - Grouping & Aggregation
Python Pandas - GroupBy

Python Pandas - Visualization
Python Pandas - Visualization

Python Pandas - Additional Concepts
Python Pandas - Caveats & Gotchas

Selected Reading

Python Pandas - IO Tools

Quiz

The Pandas library offers powerful I/O tools (API) for data import and export, enabling seamless handling of various file formats like CSV, Excel, JSON, and many more. This API includes top-level reader functions like, pd.read_csv(), read_clipboard() and corresponding writer methods like, to_csv(), to_clipboard() for easy data handling.

In this tutorial, we will learn about the overview of the Pandas I/O tools and learn how to use them effectively.

Overview of Pandas IO Tools

The Pandas I/O API supports a wide variety of data formats. Here is a summary of supported formats and their corresponding reader and writer functions −

Format	Reader Function	Writer Function
Tabular Data	read_table()	NA
CSV	read_csv()	to_csv()
Fixed-Width Text File	read_fwf()	NA
Clipboard	read_clipboard()	to_clipboard()
Pickling	read_pickle()	to_pickle()
Excel	read_excel()	to_excel()
JSON	read_json()	to_json()
HTML	read_html()	to_html()
XML	read_xml()	to_xml()
LaTeX	NA	to_latex()
HDF5 Format	read_hdf()	to_hdf()
Feather	read_feather()	to_feather()
Parquet	read_parquet()	to_parquet()
ORC	read_orc()	to_orc()
SQL	read_sql()	to_sql()
Stata	read_stata()	to_stata()

Among these, the most frequently used functions for handling text files are read_csv() and read_table(). Both convert flat files into DataFrame objects.

Example: Reading CSV Data

This example shows reading the CSV data using the pandas read_csv() function. In this example we are using the StringIO to load the CSV string into a Pandas DataFrame object.

import pandas as pd

# Import StringIO to load a file-like object for reading CSV
from io import StringIO

# Create string representing CSV data
data = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900"""

# Use StringIO to convert the string data into a file-like object
obj = StringIO(data)

# read CSV into a Pandas DataFrame
df = pd.read_csv(obj)

print(df)

Output

Its output is as follows −

   S.No    Name  Age       City  Salary
0     1     Tom   28    Toronto   20000
1     2     Lee   32   HongKong    3000
2     3  Steven   43   Bay Area    8300
3     4     Ram   38  Hyderabad    3900

Customizing Parsing Options

Pandas allows several customization options when parsing data. You can modify how the data is parsed using parameters like −

Index_col
dtype
names
skiprows

Below we will discuss about the common parsing options for customization.

Customizing the index

You can customize the row labels or index of the Pandas object by using index_col parameter. Setting index_col=False forces Pandas to not use the first column as the index, which can be helpful when handling malformed files with extra delimiters.

Example

This example uses the index_col parameter to customize the row labels while reading the CSV data.

import pandas as pd

# Import StringIO to load a file-like object for reading CSV
from io import StringIO

# Create string representing CSV data
data = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900"""

# Use StringIO to convert the string data into a file-like object
obj = StringIO(data)

# read CSV into a Pandas DataFrame
df = pd.read_csv(obj, index_col=['S.No'])

# Display the DataFrame
print(df)

Output

Its output is as follows −

        Name  Age       City  Salary
S.No                                
1        Tom   28    Toronto   20000
2        Lee   32   HongKong    3000
3     Steven   43   Bay Area    8300
4        Ram   38  Hyderabad    3900

Converters

Pandas also provides the ability to specify the data type for columns using the dtype parameter. You can convert columns to specific types like {'Col_1': np.float64, 'Col_2': np.int32, 'Col3': 'Int64'}.

Example

This example customizes the data type of a JSON data while parsing the data using the read_json() method with the dtype parameter.

import pandas as pd
from io import StringIO
import numpy as np

# Create a string representing JSON data
data = """[
    {"Name": "Braund", "Gender": "Male", "Age": 30},
    {"Name": "Cumings", "Gender": "Female", "Age": 25},
    {"Name": "Heikkinen", "Gender": "Female", "Age": 35}
]"""

# Use StringIO to convert the JSON-formatted string data into a file-like object
obj = StringIO(data)

# Read JSON into a Pandas DataFrame
df = pd.read_json(obj, dtype={'Age': np.float64})

# Display the DataFrame
print(df.dtypes)

Output

Its output is as follows −

Name       object
Gender     object
Age       float64
dtype: object

By default, the dtype of the 'Age' column is int, but the result shows it as float because we have explicitly casted the type.

Thus, the data looks like float −

        Name  Gender    Age 
S.No                                
1     Braund    Male    30.0
2    Cumings  Female    25.0
3  Heikkinen  Female    35.0

Customizing the Header Names

When reading data files, Pandas assumes the first row as the header. However, you can customize this using the names Parameter to provide custom column names.

Example

This example reads the XML data into a Pandas DataFrame object by customizing the header names using the names parameter of the read_xml() method.

import pandas as pd
from io import StringIO

# Create a String representing XML data 
xml = """<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""

# Parse the XML data with custom column names
df = pd.read_xml(StringIO(xml), names=['a', 'b', 'c','d','e'])

# Display the Output DataFrame
print('Output DataFrame from XML:')
print(df)

Output

Its output is as follows −

Output DataFrame from XML:
          a                 b                    c     d      e
0   cooking  Everyday Italian  Giada De Laurentiis  2005  30.00
1  children      Harry Potter         J K. Rowling  2005  29.99
2       web      Learning XML          Erik T. Ray  2003  39.95

Example: Reading with custom column names and header row

If the header is in a row other than the first, pass the row number to header. This will skip the preceding rows.

import pandas as pd

# Import StringIO to load a file-like object for reading CSV
from io import StringIO

# Create string representing CSV data
data = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900"""

# Use StringIO to convert the string data into a file-like object
obj = StringIO(data)

# read CSV into a Pandas DataFrame
df = pd.read_csv(obj, names=['a', 'b', 'c','d','e'], header=0)

# Display the DataFrame
print(df)

Output

Its output is as follows −

   a       b   c          d      e
0  1     Tom  28    Toronto  20000
1  2     Lee  32   HongKong   3000
2  3  Steven  43   Bay Area   8300
3  4     Ram  38  Hyderabad   3900

Skipping Rows

The skiprows parameter allows you to skip a specific number of rows or line numbers when reading a file. It can also accept a callable function to decide which rows to skip based on conditions.

Example

This example shows skipping the rows of a input data while parsing.

import pandas as pd

# Import StringIO to load a file-like object for reading CSV
from io import StringIO

# Create string representing CSV data
data = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900"""

# Use StringIO to convert the string data into a file-like object
obj = StringIO(data)
    
# read CSV into a Pandas DataFrame
df = pd.read_csv(obj, skiprows=2)

# Display the DataFrame
print(df)

Output

Its output is as follows −

   2     Lee  32   HongKong  3000
0  3  Steven  43   Bay Area  8300
1  4     Ram  38  Hyderabad  3900

Previous Quiz Next