How can data be cleaned to predict fuel efficiency with the Auto MPG dataset using TensorFlow?


TensorFlow is a machine learning framework provided by Google. It is an open-source framework used in conjunction with Python to implement algorithms, deep learning applications, and much more.

The ‘tensorflow’ package can be installed with pip (on Windows or any other platform) using the line below −

pip install tensorflow

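A tensor is the basic data structure used in TensorFlow. In a flow diagram known as the ‘data flow graph’, the nodes are operations and the edges that connect them carry tensors. A tensor is essentially a multidimensional array: a single number, a list, a matrix, or a higher-dimensional block of values.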
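As a rough illustration of the idea, the sketch below builds three tensors with `tf.constant` and prints their rank (number of dimensions) and shape. The variable names and values are made up for this example.

```python
import tensorflow as tf

# Names and values are illustrative only.
scalar = tf.constant(3.0)                     # rank 0: a single number
vector = tf.constant([1.0, 2.0, 3.0])         # rank 1: a list of numbers
matrix = tf.constant([[1, 2, 3], [4, 5, 6]])  # rank 2: rows and columns

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    # shape.rank is the number of dimensions; as_list() gives the size of each
    print(name, "rank:", t.shape.rank, "shape:", t.shape.as_list())
```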

The aim of a regression problem is to predict the value of a continuous variable, such as a price or a probability. Predicting a discrete outcome, such as whether it will rain or not, is a classification problem instead.

The dataset we use is called the ‘Auto MPG’ dataset. It contains fuel efficiency data for automobiles from the 1970s and early 1980s. It includes attributes such as cylinders, displacement, horsepower, and weight. Using these attributes, we need to predict the fuel efficiency (miles per gallon) of specific vehicles.
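The cleaning code in this article works on a pandas DataFrame named `dataset`. As a minimal sketch of how such a frame might be built, the block below parses two raw-format lines with `pd.read_csv`. The column names follow the tutorial credited in this article, and the file layout (space-separated values, `?` for missing entries, a tab before the car name) is an assumption. The two sample lines, including the car names, are placeholders rather than real records.

```python
import io
import pandas as pd

# Assumed column names; the raw file is taken to have no header row.
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']

# Stand-in for the downloaded file. The second line is a made-up row
# with a missing horsepower value marked by '?'.
raw_text = (
    '18.0 8 307.0 130.0 3504. 12.0 70 1\t"car name a"\n'
    '25.0 4 98.00 ? 2046. 19.0 71 1\t"car name b"\n'
)

raw_dataset = pd.read_csv(io.StringIO(raw_text), names=column_names,
                          na_values='?',          # treat '?' as missing
                          comment='\t',           # drop the car name after the tab
                          sep=' ', skipinitialspace=True)
dataset = raw_dataset.copy()
print(dataset)
```

Passing the path of the downloaded data file to `pd.read_csv` in place of the `io.StringIO` wrapper should load the full dataset, assuming the file uses the same layout.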

We are using Google Colaboratory to run the code below. Google Colab (Colaboratory) runs Python code in the browser, requires zero configuration, and provides free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook.

The following code snippet shows how the data can be cleaned before predicting fuel efficiency with the Auto MPG dataset using TensorFlow. It assumes the raw data has already been loaded into a pandas DataFrame named ‘dataset’ −

Example

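import pandas as pd

# Assumes the raw Auto MPG data is already loaded into the DataFrame `dataset`
print("Data cleaning has begun")
# Count missing values per column (a notebook displays this only if it is the last line of a cell)
dataset.isna().sum()
# Drop the rows that contain missing values
dataset = dataset.dropna()
# Replace the numeric origin codes with readable labels
dataset['Origin'] = dataset['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})
# One-hot encode the 'Origin' labels into 'Europe', 'Japan' and 'USA' columns
dataset = pd.get_dummies(dataset, prefix='', prefix_sep='')
print("Data cleaning complete!")

print("A sample of dataset after data cleaning :")
dataset.head(4)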

Code credit − https://www.tensorflow.org/tutorials/keras/regression

Output

Data cleaning has begun
Data cleaning complete!
A sample of dataset after data cleaning :

    MPG  Cylinders  Displacement  horsepower  weight  Acceleration  Model Year  Europe  Japan  USA
0  18.0          8         307.0       130.0  3504.0          12.0          70       0      0    1
1  15.0          8         350.0       165.0  3693.0          11.5          70       0      0    1
2  18.0          8         318.0       150.0  3436.0          11.0          70       0      0    1
3  16.0          8         304.0       150.0  3433.0          12.0          70       0      0    1

Explanation

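  • Data cleaning begins by counting the missing (‘NaN’) values in each column and dropping the rows that contain them.

  • The ‘map’ function converts the numeric codes in the ‘Origin’ column (1, 2, 3) to the labels ‘USA’, ‘Europe’ and ‘Japan’.

  • The ‘get_dummies’ function one-hot encodes these labels, replacing ‘Origin’ with separate ‘Europe’, ‘Japan’ and ‘USA’ indicator columns.

  • A sample of the dataset after data cleaning is displayed on the console.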
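
The cleaning steps can also be tried on a tiny hand-made frame. The sketch below is only an illustration: the four rows are hypothetical, and only three columns are included so the effect of each step is easy to follow.

```python
import pandas as pd

# Hypothetical four-row frame with one missing 'Horsepower' value.
dataset = pd.DataFrame({
    'MPG': [18.0, 25.0, 27.0, 26.0],
    'Horsepower': [130.0, None, 88.0, 46.0],
    'Origin': [1, 1, 3, 2],
})

# Step 1: count missing values per column, then drop incomplete rows.
missing_before = dataset.isna().sum()
dataset = dataset.dropna()

# Step 2: replace the numeric origin codes with readable labels.
dataset['Origin'] = dataset['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})

# Step 3: one-hot encode the labels into separate indicator columns.
dataset = pd.get_dummies(dataset, columns=['Origin'], prefix='', prefix_sep='')

print(missing_before)
print(dataset)
```

Depending on the pandas version, the indicator columns may hold True/False rather than 0/1; passing `dtype=int` to `pd.get_dummies` forces integer values.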

Updated on: 20-Jan-2021
