Various approaches in Python to load CSV data for ML projects


Loading data correctly is one of the most important, and often one of the most challenging, steps in building a machine learning project. CSV (comma-separated values) is one of the most common formats for machine learning data. It is a simple plain-text format for storing tabular data.

The following are common approaches in Python for loading CSV data in machine learning projects:

Using Python Standard Library

To load CSV data files, the Python standard library provides a built-in module named csv.

Example

In this example, we load a CSV file containing the iris flower data set:

# Importing the csv module
import csv

# To convert the data into a NumPy array, import the numpy module:
import numpy as np

# Providing the full path of the CSV data file stored in a local directory:
datafile_path = r"c:/Users/ Desktop/iris.csv"

# Reading data using the csv.reader() function:
with open(datafile_path, 'r', newline='') as f:
    reader = csv.reader(f, delimiter=',')
    data_headers = next(reader)  # the first row holds the column names
    data = list(reader)          # the remaining rows are the data
    data = np.array(data).astype(float)

# Printing the names of the data headers and the first 5 lines of the data file:
print(data_headers)
print(data[:5])

Output

['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
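
To try the same steps without a file on disk, the following minimal sketch reads from an in-memory string through io.StringIO. The sample rows are copied from the output above; the name SAMPLE_CSV and the helper load_csv_rows are hypothetical, introduced only for illustration. It also keeps the values as plain lists of floats instead of a NumPy array, so that it depends only on the standard library.

```python
import csv
import io

# Hypothetical in-memory stand-in for a CSV file on disk.
SAMPLE_CSV = """sepal_length,sepal_width,petal_length,petal_width
5.1,3.5,1.4,0.2
4.9,3.0,1.4,0.2
4.7,3.2,1.3,0.2
"""

def load_csv_rows(text_stream):
    """Return (headers, rows), converting every data value to float."""
    reader = csv.reader(text_stream, delimiter=',')
    headers = next(reader)  # the first row holds the column names
    # csv.reader yields an empty list for a blank line, so skip those.
    rows = [[float(value) for value in row] for row in reader if row]
    return headers, rows

headers, rows = load_csv_rows(io.StringIO(SAMPLE_CSV))
print(headers)
print(rows[:2])
```

Because float() raises ValueError on non-numeric text, a file with a label column (such as a species name) would need that column handled separately before conversion.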

Using Pandas

Another approach is the pandas.read_csv() function. It returns a pandas.DataFrame, which can be used immediately for summarizing and plotting the data.

Example

In this example, we load a CSV file containing the Pima Indians Diabetes data set:

#Importing read_csv function from Pandas
from pandas import read_csv

#Providing the full path of the CSV data file which is stored on our local directory:
datafile_path = r"C:/Users/Leekha/Desktop/pima-indians-diabetes.csv"

#Providing header names and reading data using read_csv() function:
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(datafile_path, names=headernames)

#Printing the number of rows and columns in the file and first 5 lines of the data file:
print(data.shape)
print(data[:5])

Output

(768, 9)
   preg  plas  pres  skin  test  mass   pedi  age  class
0     6   148    72    35     0  33.6  0.627   50      1
1     1    85    66    29     0  26.6  0.351   31      0
2     8   183    64     0     0  23.3  0.672   32      1
3     1    89    66    23    94  28.1  0.167   21      0
4     0   137    40    35   168  43.1  2.288   33      1
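
For a machine learning workflow, the loaded DataFrame is usually split into input features and a target column. The sketch below is one possible way to do that, assuming the last column, class, is the target. It reads three rows copied from the output above out of an in-memory string; the names SAMPLE_CSV, X, and y are illustrative assumptions, not part of the original example.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for pima-indians-diabetes.csv (no header row).
SAMPLE_CSV = """6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
"""

headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']

# names= supplies the column labels because the data has no header line.
data = pd.read_csv(io.StringIO(SAMPLE_CSV), names=headernames)

# Assumption: 'class' is the target; every other column is an input feature.
X = data.drop(columns='class').to_numpy()
y = data['class'].to_numpy()

print(X.shape, y.shape)
```

If the file did contain a header row, the names argument could be omitted and read_csv() would take the column labels from the first line.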

Updated on: 24-Nov-2021
