How can data be cleaned to predict fuel efficiency with the Auto MPG dataset using TensorFlow?


TensorFlow is an open-source machine learning framework developed by Google. It is used with Python to implement machine learning algorithms, deep learning applications, and much more.

The ‘tensorflow’ package can be installed on Windows using the below line of code −

pip install tensorflow

A tensor is the data structure used in TensorFlow. Tensors flow along the edges of a computation graph, known as the ‘data flow graph’, and connect its operations. A tensor is simply a multidimensional array or a list.
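To make the idea concrete, here is a minimal sketch that builds tensors of rank 0, 1 and 2 with `tf.constant` and inspects their shapes. The variable names and values are illustrative assumptions and are not part of the original tutorial.

```python
import tensorflow as tf

# Illustrative values only; these names are not from the original tutorial.
scalar = tf.constant(3.0)                        # rank 0: a single number
vector = tf.constant([1.0, 2.0, 3.0])            # rank 1: a list of numbers
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])   # rank 2: a table of numbers

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    # len(t.shape) is the number of dimensions (the rank) of the tensor
    print(name, "rank:", len(t.shape), "shape:", t.shape.as_list(), "dtype:", t.dtype.name)
```

The shape lists the size of each dimension, so a rank 2 tensor with two rows and two columns reports a shape of [2, 2].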

The aim of a regression problem is to predict the value of a continuous variable, such as a price or a probability. Here, the value to predict is fuel efficiency in miles per gallon (MPG).

The dataset we use is called the ‘Auto MPG’ dataset. It contains fuel efficiency data for automobiles from the 1970s and 1980s, along with attributes like weight, horsepower, displacement, and so on. Using these attributes, we need to predict the fuel efficiency of specific vehicles.
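The cleaning code further down expects the data to be in a pandas DataFrame named `dataset`. The sketch below shows one way such a frame could be built with `pd.read_csv`. It rests on several assumptions that the original does not state: the raw file is space-separated, missing values are written as `?`, and the column names follow the header shown in the output. The rows are made up so that the example runs offline; in practice the path or URL of the raw Auto MPG file would be passed to `pd.read_csv` instead of the in-memory text.

```python
import io
import pandas as pd

# Made-up rows in a space-separated layout; '?' marks a missing value.
# These are illustrative values, not records from the real dataset.
raw_text = """\
20.0 6 200.0 95.0 3000.0 15.0 75 1
25.0 4 110.0 ? 2300.0 15.5 76 2
32.0 4 85.0 65.0 1975.0 19.4 81 3
"""

# Assumed column names, modelled on the header of the cleaned output.
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower',
                'Weight', 'Acceleration', 'Model Year', 'Origin']

dataset = pd.read_csv(io.StringIO(raw_text), names=column_names,
                      na_values='?', sep=' ', skipinitialspace=True)

# Count the missing values per column before any cleaning is done.
print(dataset.isna().sum())
```

Declaring `?` as a missing-value marker at load time is what lets `dropna` find and remove those rows later.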

We are using Google Colaboratory to run the below code. Google Colab or Colaboratory runs Python code in the browser, requires zero configuration, and provides free access to GPUs (Graphics Processing Units). Colaboratory has been built on top of Jupyter Notebook.

Following is the code snippet that shows how the data can be cleaned to predict fuel efficiency with the Auto MPG dataset. It assumes the raw data has already been loaded into a pandas DataFrame named ‘dataset’ −


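import pandas as pd

# 'dataset' is assumed to be the Auto MPG DataFrame loaded earlier
print("Data cleaning has begun")
# Drop the rows that contain missing values
dataset = dataset.dropna().copy()
# Replace the numeric origin codes with country names
dataset['Origin'] = dataset['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})
# One-hot encode 'Origin' into the Europe, Japan and USA columns
dataset = pd.get_dummies(dataset, prefix='', prefix_sep='')
print("Data cleaning complete!")

print("A sample of dataset after data cleaning :")
print(dataset.tail())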

Code credit −


Data cleaning has begun
Data cleaning complete!
A sample of dataset after data cleaning :

MPG  Cylinders  Displacement  horsepower  weight  Acceleration  Model Year  Europe  Japan  USA


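  • The data cleaning begins by dropping the rows that contain missing (‘NaN’) values.

  • The ‘map’ function replaces the numeric codes in the ‘Origin’ column with country names.

  • The ‘get_dummies’ function then one-hot encodes ‘Origin’ into the ‘Europe’, ‘Japan’ and ‘USA’ columns.

  • A sample of the dataset after data cleaning is displayed on the console.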
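The steps above can be traced on a miniature example. This is only a sketch: the DataFrame below is hypothetical, with made-up values and just three columns, so that the effect of each step is easy to see.

```python
import pandas as pd

# Hypothetical miniature dataset; the values are made up for illustration.
dataset = pd.DataFrame({
    'MPG': [20.0, 25.0, 32.0, 28.0],
    'Horsepower': [95.0, None, 65.0, 88.0],
    'Origin': [1, 2, 3, 2],
})

# Step 1: drop every row that contains a missing value.
dataset = dataset.dropna().copy()

# Step 2: replace the numeric origin codes with readable country names.
dataset['Origin'] = dataset['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})

# Step 3: one-hot encode 'Origin'; the empty prefix keeps the new
# column names as the bare country names.
dataset = pd.get_dummies(dataset, columns=['Origin'], prefix='', prefix_sep='')

print(dataset)
```

The row with the missing horsepower value is removed, and the single ‘Origin’ column is replaced by one indicator column per country. Encoding the origin this way avoids implying an order between the numeric codes 1, 2 and 3.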

Updated on 20-Jan-2021 12:36:53