How can data be cleaned to predict fuel efficiency with the Auto MPG dataset using TensorFlow?

TensorFlow is an open-source machine learning framework developed by Google. It is used with Python to implement machine learning algorithms and deep learning applications. The Auto MPG dataset contains fuel efficiency data for automobiles from the 1970s and 1980s; this article cleans that data with pandas so it is ready for a model that predicts fuel efficiency.

Installing TensorFlow

The 'tensorflow' package can be installed with pip. The cleaning code in this article also imports pandas, so both are installed here:

pip install tensorflow pandas
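
To confirm the installation worked, a quick check like the following can help. It is only a sketch: the helper name is_installed is made up for this example, and it only tests whether Python can locate each package, not whether the package runs correctly.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named package (hypothetical helper)."""
    return importlib.util.find_spec(package_name) is not None


# The cleaning code in this article relies on these two packages
for name in ("tensorflow", "pandas"):
    print(name, "found" if is_installed(name) else "not found")
```

If either package is reported as not found, the pip command was probably run for a different Python interpreter than the one executing the script.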

About the Auto MPG Dataset

Each row of the Auto MPG dataset describes one automobile from the 1970s and 1980s. The columns used in this article are:

MPG - Miles per gallon (target variable)
Cylinders - Number of cylinders
Displacement - Engine displacement
Horsepower - Engine horsepower
Weight - Vehicle weight
Acceleration - Vehicle acceleration
Model Year - Model year as two digits (for example, 70)
Origin - Country of origin (1=USA, 2=Europe, 3=Japan)
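
The raw file stores these attributes as space-padded numbers, followed by a tab and the car name, with '?' standing in for a missing value. The sketch below parses a few hand-typed rows in that layout, so it runs without a network connection. The three rows are illustrative samples, and the variable names raw_text and sample are assumptions made for this example.

```python
import io

import pandas as pd

column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']

# Hand-typed rows in the layout of auto-mpg.data (illustrative, not downloaded):
# eight space-padded numeric fields, then a tab and the quoted car name.
raw_text = (
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n'
    '25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"\n'
    '24.0   4   113.0      95.00      2372.      15.0   70  3\t"toyota corona mark ii"\n'
)

sample = pd.read_csv(
    io.StringIO(raw_text),
    names=column_names,
    na_values='?',          # treat '?' as a missing value
    comment='\t',           # ignore the tab and the car name after it
    sep=' ',
    skipinitialspace=True,  # collapse the padding spaces between fields
)

print(sample)
print(sample.isna().sum())
```

Only the Horsepower entry of the second row should come back as missing, which mirrors what happens with the full file.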

Data Cleaning Process

Data cleaning is essential before training any machine learning model. Here's how to clean the Auto MPG dataset:

import pandas as pd
import tensorflow as tf  # imported for the later modeling step; cleaning uses only pandas

# Load the dataset from the UCI Machine Learning Repository
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']
# '?' marks a missing value; the tab and everything after it on a line is ignored
dataset = pd.read_csv(url, names=column_names, na_values='?', comment='\t',
                      sep=' ', skipinitialspace=True)

print("Data cleaning has begun")
print("Missing values per column:")
print(dataset.isna().sum())

# Remove rows with missing values
dataset = dataset.dropna()

# Map origin numbers to country names
dataset['Origin'] = dataset['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})

# Convert the categorical Origin column to dummy (one-hot) columns.
# dtype=int keeps 0/1 values; newer pandas versions otherwise return booleans.
dataset = pd.get_dummies(dataset, prefix='', prefix_sep='', dtype=int)

print("Data cleaning complete!")
print("A sample of dataset after data cleaning:")
print(dataset.head(4))

Data cleaning has begun
Missing values per column:
MPG             0
Cylinders       0
Displacement    0
Horsepower      6
Weight          0
Acceleration    0
Model Year      0
Origin          0
dtype: int64
Data cleaning complete!
A sample of dataset after data cleaning:

    MPG  Cylinders  Displacement  Horsepower  Weight  Acceleration  Model Year  Europe  Japan  USA
0  18.0          8         307.0       130.0  3504.0          12.0          70       0      0    1
1  15.0          8         350.0       165.0  3693.0          11.5          70       0      0    1
2  18.0          8         318.0       150.0  3436.0          11.0          70       0      0    1
3  16.0          8         304.0       150.0  3433.0          12.0          70       0      0    1
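
The Origin mapping and the get_dummies() call can be tried in isolation on a tiny frame to see which columns they produce. This is a minimal sketch with made-up MPG values. Passing dtype=int is a choice made here so the result is 0/1 on any recent pandas version, since newer versions return booleans by default.

```python
import pandas as pd

# Small made-up frame with one car from each origin plus a second USA row
cars = pd.DataFrame({
    'MPG': [18.0, 26.0, 24.0, 15.0],
    'Origin': [1, 2, 3, 1],
})

# Replace the numeric codes with readable country names
cars['Origin'] = cars['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})

# One-hot encode: each country name becomes its own 0/1 column
encoded = pd.get_dummies(cars, prefix='', prefix_sep='', dtype=int)

print(encoded)
```

The single Origin column is replaced by three columns named Europe, Japan, and USA, and every row has a 1 in exactly one of them.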

Key Data Cleaning Steps

Check for Missing Values - Use isna().sum() to count the missing entries in each column
Handle Missing Values - Use dropna() to remove rows with missing data (here, the 6 rows with no Horsepower value)
Map Categorical Data - Convert origin codes (1,2,3) to meaningful country names
One-Hot Encoding - Use pd.get_dummies() to convert the Origin column into binary Europe, Japan, and USA columns
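
The steps above can be bundled into one reusable function. The following is a sketch, not the article's original code: the names clean_auto_mpg and ORIGIN_NAMES are hypothetical, the sample rows are made up for illustration, and dtype=int is an added choice to get 0/1 columns on recent pandas versions.

```python
import pandas as pd

# Origin codes as described in the dataset overview
ORIGIN_NAMES = {1: 'USA', 2: 'Europe', 3: 'Japan'}


def clean_auto_mpg(raw):
    """Return a cleaned copy of an Auto MPG style DataFrame (hypothetical helper)."""
    # Drop incomplete rows; copy so the caller's frame is left untouched
    cleaned = raw.dropna().copy()
    # Replace numeric origin codes with country names
    cleaned['Origin'] = cleaned['Origin'].map(ORIGIN_NAMES)
    # One-hot encode the country names into 0/1 columns
    return pd.get_dummies(cleaned, prefix='', prefix_sep='', dtype=int)


# Made-up sample with one missing Horsepower value
sample = pd.DataFrame({
    'MPG': [18.0, 25.0, 24.0, 26.0],
    'Horsepower': [130.0, None, 95.0, 46.0],
    'Origin': [1, 1, 3, 2],
})

result = clean_auto_mpg(sample)
print(result)
```

One caveat of this approach is that get_dummies() only creates columns for the countries present in the data, so a very small subset may lack one of the three origin columns.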

Conclusion

Data cleaning is an important step before predicting fuel efficiency. For the Auto MPG dataset, the key steps are checking for and removing rows with missing values, mapping the numeric origin codes to country names, and one-hot encoding those names. A model trained on clean, fully numeric data is less likely to fail on missing values or to treat the origin codes as ordered quantities.

AmitDiwan

Updated on: 2026-03-25T15:37:00+05:30
