How can data be imported to predict fuel efficiency with the Auto MPG dataset (basic regression) using TensorFlow?
TensorFlow is an open-source machine learning framework developed by Google. It is most often used from Python to build and train machine learning and deep learning models, in both research and production.
The Auto MPG dataset contains fuel efficiency data of 1970s and 1980s automobiles. It includes attributes like weight, horsepower, displacement, and acceleration. With this dataset, we can predict the fuel efficiency of specific vehicles using regression techniques.
Installing TensorFlow
The tensorflow package can be installed using the following command:
pip install tensorflow
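Before running the rest of the article, it can help to confirm that the packages imported later are available. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and is not part of TensorFlow or pip.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# Packages imported later in this article
for name in ["tensorflow", "pandas", "numpy", "matplotlib", "seaborn"]:
    if is_installed(name):
        print(f"{name}: found")
    else:
        print(f"{name}: missing (try: pip install {name})")
```

This only checks that a package can be located; it does not import it or verify its version.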
Understanding the Dataset
The Auto MPG dataset is a classic regression dataset containing the following features:
- MPG: Miles per gallon (target variable)
- Cylinders: Number of cylinders in the engine
- Displacement: Engine displacement (cubic inches)
- Horsepower: Engine power
- Weight: Vehicle weight (pounds)
- Acceleration: Time to accelerate from 0 to 60 mph (seconds)
- Model Year: Two-digit model year (for example, 70 for 1970)
- Origin: Country or region of origin, stored as an integer code
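Because Origin is a category code rather than a quantity, it is often converted to indicator columns before modelling. The sketch below assumes the mapping 1 = USA, 2 = Europe, 3 = Japan, which is the convention commonly used for this dataset but is not stated in this article; the small `sample` frame is made up for illustration.

```python
import pandas as pd

# Made-up sample rows; only the Origin codes matter here
sample = pd.DataFrame({"MPG": [18.0, 26.0, 31.0], "Origin": [1, 2, 3]})

# Assumed code-to-label mapping (not given in this article)
origin_labels = {1: "USA", 2: "Europe", 3: "Japan"}
sample["Origin"] = sample["Origin"].map(origin_labels)

# One indicator column per region; dtype=int gives 0/1 instead of booleans
encoded = pd.get_dummies(sample, columns=["Origin"],
                         prefix="", prefix_sep="", dtype=int)
print(encoded)
```

Each row ends up with a 1 in exactly one of the region columns, so a model no longer treats the code as an ordered number.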
Loading and Exploring the Dataset
Let's import the necessary libraries and load the Auto MPG dataset from the UCI repository:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Print NumPy arrays with 3 decimal places and without scientific notation
np.set_printoptions(precision=3, suppress=True)
print("TensorFlow version:", tf.__version__)

# Location of the raw data file and the names to assign to its columns
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']

print("Loading the Auto MPG dataset...")
# '?' marks a missing value, fields are separated by one or more spaces,
# and everything after a tab on each line is treated as a comment and ignored
raw_dataset = pd.read_csv(url, names=column_names, na_values='?',
                          comment='\t', sep=' ', skipinitialspace=True)
# Work on a copy so the raw data stays unchanged
dataset = raw_dataset.copy()
print("Dataset loaded successfully!")
print("\nFirst 5 rows of the dataset:")
print(dataset.head())
TensorFlow version: 2.4.0
Loading the Auto MPG dataset...
Dataset loaded successfully!
First 5 rows of the dataset:
MPG Cylinders Displacement Horsepower Weight Acceleration Model Year Origin
0 18.0 8 307.0 130.0 3504.0 12.0 70 1
1 15.0 8 350.0 165.0 3693.0 11.5 70 1
2 18.0 8 318.0 150.0 3436.0 11.0 70 1
3 16.0 8 304.0 150.0 3433.0 12.0 70 1
4 17.0 8 302.0 140.0 3449.0 10.5 70 1
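To see what the `read_csv` arguments above do without downloading anything, the same call can be run on a small in-memory sample. This is a sketch under an assumption: the two lines in `raw_text` are modelled on the raw file's layout (space-padded numbers, then a tab and a quoted car name), which this article does not show.

```python
import io

import pandas as pd

# Two lines in the assumed raw layout; the second has a missing horsepower ('?')
raw_text = (
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n'
    '25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"\n'
)

column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']

# Same arguments as the real load, but reading from a string buffer
sample = pd.read_csv(io.StringIO(raw_text), names=column_names, na_values='?',
                     comment='\t', sep=' ', skipinitialspace=True)

print(sample)
print("Missing Horsepower values:", sample['Horsepower'].isna().sum())
```

Here `comment='\t'` discards the trailing car name, `skipinitialspace=True` collapses the runs of spaces between fields, and `na_values='?'` turns the placeholder into NaN.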
Dataset Information
Let's examine the basic statistics and structure of our dataset:
# Check dataset info
print("Dataset shape:", dataset.shape)
print("\nDataset info:")
# info() prints its report directly and returns None, so it is not wrapped in print()
dataset.info()
print("\nBasic statistics:")
print(dataset.describe())
print("\nMissing values:")
print(dataset.isnull().sum())
Dataset shape: (398, 8)
Dataset info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 398 entries, 0 to 397
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 MPG 398 non-null float64
1 Cylinders 398 non-null int64
2 Displacement 398 non-null float64
3 Horsepower 392 non-null float64
4 Weight 398 non-null float64
5 Acceleration 398 non-null float64
6 Model Year 398 non-null int64
7 Origin 398 non-null int64
Basic statistics:
MPG Cylinders Displacement Horsepower Weight Acceleration Model Year Origin
count 398.000 398.000 398.000 392.000 398.000 398.000 398.000 398.000
mean 23.446 5.472 194.412 104.469 2977.584 15.541 75.979 1.577
std 7.805 1.705 104.644 38.491 849.403 2.758 3.684 0.775
Missing values:
MPG 0
Cylinders 0
Displacement 0
Horsepower 6
Weight 0
Acceleration 0
Model Year 0
Origin 0
Data Preprocessing
The Horsepower column has 6 missing values. The simplest way to handle them is to drop the affected rows:
# Remove rows with missing values
dataset_cleaned = dataset.dropna()
print("Dataset shape after removing missing values:", dataset_cleaned.shape)
print("Missing values after cleaning:", dataset_cleaned.isnull().sum().sum())
# Display correlation with target variable (MPG)
print("\nCorrelation with MPG:")
correlations = dataset_cleaned.corr()['MPG'].sort_values(ascending=False)
print(correlations)
Dataset shape after removing missing values: (392, 8)
Missing values after cleaning: 0
Correlation with MPG:
MPG 1.000000
Model Year 0.580541
Origin 0.565209
Acceleration 0.423329
Cylinders -0.777618
Horsepower -0.778427
Displacement -0.805127
Weight -0.832244
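Dropping rows is the approach used in this article. As an alternative that is not part of this article's workflow, missing values can be filled in instead, for example with the column median. A minimal sketch on made-up data comparing the two options:

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing Horsepower value
toy = pd.DataFrame({
    "MPG": [18.0, 25.0, 30.0, 22.0],
    "Horsepower": [130.0, np.nan, 70.0, 100.0],
})

# Option 1: drop every row that contains a missing value
dropped = toy.dropna()

# Option 2: keep all rows and fill the gap with the column median
median_hp = toy["Horsepower"].median()  # NaN is ignored when computing the median
filled = toy.fillna({"Horsepower": median_hp})

print("Rows after dropna:", len(dropped))
print("Rows after fillna:", len(filled))
```

With only 6 of 398 rows affected, dropping loses little data; imputation may be worth considering when a larger share of rows is incomplete.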
Key Insights
From the correlation analysis, we can observe:
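- Strong negative correlations: Weight (-0.83), Displacement (-0.81), Horsepower (-0.78), and Cylinders (-0.78) all have strong negative relationships with MPG
- Moderate positive correlations: Model Year (0.58) and Origin (0.57) show positive relationships with fuel efficiency; because Origin is a category code rather than a quantity, its coefficient should be read with caution
- Weaker positive correlation: Acceleration (0.42) has the weakest relationship with MPG among the features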
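The coefficients above come from `DataFrame.corr()`, which computes the Pearson correlation by default. The sketch below, on made-up numbers, computes the same quantity by hand to show what the value measures; the helper name `pearson` is hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up values: heavier cars paired with lower MPG
toy = pd.DataFrame({
    "MPG": [30.0, 25.0, 20.0, 15.0],
    "Weight": [2000.0, 2600.0, 3100.0, 3900.0],
})


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


manual = pearson(toy["MPG"], toy["Weight"])
from_pandas = toy.corr().loc["MPG", "Weight"]

print("Manual:", round(manual, 4))
print("pandas:", round(from_pandas, 4))
```

A value near -1 means that MPG falls almost linearly as Weight rises. The coefficient captures only linear association, so it says nothing about causation or about curved relationships.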
Conclusion
The Auto MPG dataset has been loaded and explored. After dropping the rows with missing Horsepower values, it contains 392 complete records with 8 columns: the MPG target and 7 features. The correlations show that heavier cars with larger engines tend to have lower fuel efficiency. The cleaned dataset is now ready for building regression models that predict fuel efficiency with TensorFlow.
