How can data be cleaned to predict fuel efficiency with the Auto MPG dataset using TensorFlow?

TensorFlow is an open-source machine learning framework developed by Google. It is commonly used with Python to implement machine learning algorithms, deep learning applications and much more. The Auto MPG dataset contains fuel efficiency data for automobiles from the 1970s and 1980s; in this article we clean it to prepare it for predicting a vehicle's fuel efficiency.

Installing TensorFlow

The 'tensorflow' package can be installed with pip (on Windows, macOS or Linux) using the following command:

pip install tensorflow
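To confirm that the installation succeeded, a quick check like the following sketch can be run. It uses importlib.util.find_spec from the Python standard library, so it reports a missing package instead of raising ImportError; the helper name tensorflow_status is hypothetical.

```python
import importlib.util


def tensorflow_status() -> str:
    """Report whether TensorFlow can be imported in the current environment."""
    # find_spec returns None when the package is not installed
    if importlib.util.find_spec("tensorflow") is None:
        return "TensorFlow not found; install it with: pip install tensorflow"
    import tensorflow as tf  # imported only when it is available
    return f"TensorFlow {tf.__version__} is installed"


print(tensorflow_status())
```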

About the Auto MPG Dataset

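Each row of the Auto MPG dataset describes one car model. Its attributes are:

  • MPG - Miles per gallon (target variable)

  • Cylinders - Number of cylinders

  • Displacement - Engine displacement

  • Horsepower - Engine horsepower (six records have this value missing)

  • Weight - Vehicle weight

  • Acceleration - Acceleration performance

  • Model Year - Model year, stored as two digits (70 to 82)

  • Origin - Country of origin (1=USA, 2=Europe, 3=Japan)

Each line of the raw data file also ends with the car name, which the loading code below discards.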
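The raw auto-mpg.data file separates values with runs of spaces, marks a missing Horsepower value with '?', and ends each line with a tab followed by the quoted car name. That layout is why the loading code uses na_values='?', comment='\t', sep=' ' and skipinitialspace=True. The sketch below parses three illustrative lines written in that layout from an in-memory string, so it runs without network access; the sample values and the names raw_lines and sample are for demonstration only.

```python
import io

import pandas as pd

column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']

# Illustrative lines in the raw file's layout: space-separated numbers,
# '?' for a missing value, then a tab and the quoted car name
raw_lines = (
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n'
    '25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"\n'
    '24.0   4   113.0      95.00      2372.      15.0   70  3\t"toyota corona mark ii"\n'
)

sample = pd.read_csv(io.StringIO(raw_lines), names=column_names,
                     na_values='?',                   # treat '?' as NaN
                     comment='\t',                    # ignore the tab and car name
                     sep=' ', skipinitialspace=True)  # collapse runs of spaces

print(sample)
print(sample.isna().sum())
```

Because the car name sits after the comment character, the parsed frame has exactly the eight named columns, and the '?' entry shows up as a missing Horsepower value.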

Data Cleaning Process

Data cleaning is an essential step before training any machine learning model. The following code loads the Auto MPG dataset, checks for missing values, removes incomplete rows and encodes the Origin column:

import pandas as pd
import tensorflow as tf  # TensorFlow is needed for modeling; the cleaning itself uses pandas

# Load the dataset from the UCI Machine Learning Repository
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']
# '?' marks missing values; the tab and car name at the end of each line are skipped
dataset = pd.read_csv(url, names=column_names, na_values='?', comment='\t',
                      sep=' ', skipinitialspace=True)

print("Data cleaning has begun")
print("Missing values per column:")
print(dataset.isna().sum())

# Remove rows with missing values
dataset = dataset.dropna()

# Map origin numbers to country names
dataset['Origin'] = dataset['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})

# Convert the categorical Origin column into one 0/1 column per country
# (dtype=int avoids the True/False columns that pandas 2.x produces by default)
dataset = pd.get_dummies(dataset, columns=['Origin'], prefix='', prefix_sep='', dtype=int)

print("Data cleaning complete!")

print("A sample of dataset after data cleaning:")
print(dataset.head(4))
Output

Data cleaning has begun
Missing values per column:
MPG             0
Cylinders       0
Displacement    0
Horsepower      6
Weight          0
Acceleration    0
Model Year      0
Origin          0
dtype: int64
Data cleaning complete!
A sample of dataset after data cleaning:
    MPG  Cylinders  Displacement  Horsepower  Weight  Acceleration  Model Year  Europe  Japan  USA
0  18.0          8         307.0       130.0  3504.0          12.0          70       0      0    1
1  15.0          8         350.0       165.0  3693.0          11.5          70       0      0    1
2  18.0          8         318.0       150.0  3436.0          11.0          70       0      0    1
3  16.0          8         304.0       150.0  3433.0          12.0          70       0      0    1

Key Data Cleaning Steps

  • Data Validation - Check for missing values using isna().sum()

  • Handle Missing Values - Use dropna() to remove rows with missing data (here, the six rows without a Horsepower value)

  • Map Categorical Data - Convert the Origin codes (1, 2, 3) to meaningful country names

  • One-Hot Encoding - Use pd.get_dummies() to convert the categorical Origin column into binary (0/1) columns
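The four steps above can be tried offline on a tiny hand-made frame. The values in this sketch are illustrative, not rows taken from the dataset, and the names df and missing are hypothetical. The extra assert after mapping guards against Origin codes that the dictionary does not cover, since Series.map turns unmatched values into NaN.

```python
import numpy as np
import pandas as pd

# Tiny hand-made frame standing in for the Auto MPG data (values are illustrative)
df = pd.DataFrame({
    'MPG': [18.0, 25.0, 24.0, 26.0],
    'Horsepower': [130.0, np.nan, 95.0, 46.0],
    'Origin': [1, 1, 3, 2],
})

# 1. Data validation: count missing values per column
missing = df.isna().sum()
print(missing)

# 2. Handle missing values: drop any row containing NaN
df = df.dropna()

# 3. Map categorical codes to readable country names
df['Origin'] = df['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})
# Codes missing from the mapping would become NaN, so check again
assert df['Origin'].notna().all()

# 4. One-hot encode Origin; dtype=int keeps 0/1 values on pandas 2.x
df = pd.get_dummies(df, columns=['Origin'], prefix='', prefix_sep='', dtype=int)
print(df)
```

After these steps, every remaining column is numeric and each row has exactly one country column set to 1.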

Conclusion

Data cleaning is crucial for accurate fuel efficiency prediction. The key steps are checking for and removing missing values, mapping categorical codes to readable labels, and creating dummy variables so that every column is numeric. Clean, fully numeric data is a prerequisite for training a reliable model and making dependable predictions.

Updated on: 2026-03-25T15:37:00+05:30
