How can data be normalized to predict the fuel efficiency with Auto MPG dataset using TensorFlow?
TensorFlow is an open-source machine learning framework provided by Google. It is used with Python to implement algorithms, deep learning applications, and much more. The 'tensorflow' package can be installed with the following command:
pip install tensorflow
A Tensor is the core data structure in TensorFlow: a multidimensional array or list that stores numerical data. Tensors flow along the edges of a computation graph called the 'data flow graph'.
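As a quick illustration (the values here are arbitrary), a rank-2 tensor can be created with tf.constant and inspected for its shape and data type:

```python
import tensorflow as tf

# A rank-2 tensor (a matrix) holding numerical data
t = tf.constant([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

print(t.shape)  # (2, 3): two rows, three columns
print(t.dtype)  # float32: the default floating-point dtype
```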
The aim of a regression problem is to predict the output of a continuous variable, such as fuel efficiency, price, or probability. The Auto MPG dataset contains fuel efficiency data of 1970s and 1980s automobiles with attributes like weight, horsepower, and displacement to predict vehicle fuel efficiency.
Data Preparation and Normalization
Before training a model, we need to separate features from labels and normalize the data. Here's how to prepare the Auto MPG dataset:
import tensorflow as tf
from tensorflow.keras.utils import get_file
from tensorflow.keras.layers.experimental import preprocessing
import pandas as pd
import numpy as np
# Load the Auto MPG dataset
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG','Cylinders','Displacement','Horsepower','Weight',
'Acceleration', 'Model Year', 'Origin']
dataset = pd.read_csv(url, names=column_names, na_values = "?",
comment='\t', sep=" ", skipinitialspace=True)
# Clean and split the data
dataset = dataset.dropna()
# One-hot encode the categorical 'Origin' column into separate columns
dataset = pd.get_dummies(dataset, columns=['Origin'], prefix='', prefix_sep='')
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)
print("Separating the label from features")
train_features = train_dataset.copy()
test_features = test_dataset.copy()
train_labels = train_features.pop('MPG')
test_labels = test_features.pop('MPG')
print("The mean and standard deviation of the training dataset:")
print(train_dataset.describe().transpose()[['mean', 'std']])
Separating the label from features
The mean and standard deviation of the training dataset:
mean std
MPG 23.351077 7.728652
Cylinders 5.467949 1.701849
Displacement 193.847436 104.135079
Horsepower 104.135897 38.096214
Weight 2976.880769 847.904119
Acceleration 15.591026 2.789230
Model Year 75.934615 3.675642
1 0.168205 0.374611
2 0.197436 0.398374
3 0.634615 0.482204
Creating the Normalization Layer
Normalization is crucial because features use different scales. TensorFlow's preprocessing.Normalization layer standardizes the input features:
print("Normalize the features since they use different scales")
print("Creating the normalization layer")
normalizer = preprocessing.Normalization()
normalizer.adapt(np.array(train_features))
print("Normalization layer mean values:")
print(normalizer.mean.numpy())
# Test normalization on first example
first = np.array(train_features[:1])
print("Every feature has been individually normalized")
with np.printoptions(precision=2, suppress=True):
    print('First example is:', first)
    print('Normalized data:', normalizer(first).numpy())
Normalize the features since they use different scales
Creating the normalization layer
Normalization layer mean values:
[   5.467  193.847  104.135 2976.88    15.591   75.934    0.168    0.197    0.635]
Every feature has been individually normalized
First example is: [[   4.   105.    63.  2125.    14.7   82.     0.     0.     1. ]]
Normalized data: [[-0.87 -0.87 -1.11 -1.03 -0.33  1.65 -0.45 -0.5   0.76]]
How Normalization Works
The normalization process follows these steps:
1. The target value (MPG) is separated from the features, since it is what we want to predict.
2. The Normalization layer computes the mean and standard deviation of each feature.
3. Each feature is standardized as (value - mean) / std, so that all features have zero mean and unit variance, which improves training stability.
4. The layer stores these statistics and applies them consistently during training and inference.
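The standardization step can be verified by hand with plain NumPy. The sketch below uses a small hypothetical feature column (not values from the actual dataset) and shows that after applying (value - mean) / std the column has zero mean and unit variance:

```python
import numpy as np

# Hypothetical feature column (e.g. a few Horsepower readings)
feature = np.array([63.0, 105.0, 150.0, 90.0])

mean = feature.mean()
std = feature.std()

# Standardize: subtract the mean, divide by the standard deviation
normalized = (feature - mean) / std

# The result has (approximately) zero mean and unit variance
print(normalized.mean())  # ~0.0
print(normalized.std())   # ~1.0
```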
Conclusion
Data normalization is essential for neural network training as it ensures all features contribute equally to the learning process. The TensorFlow preprocessing.Normalization layer provides an efficient way to standardize features and maintain consistency across training and prediction phases.
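To show how the adapted layer maintains that consistency in practice, here is a minimal sketch of placing it as the first layer of a regression model. The feature values are hypothetical, and the layer is referenced as tf.keras.layers.Normalization, which newer TensorFlow versions expose directly (older versions used the experimental preprocessing namespace shown earlier):

```python
import numpy as np
import tensorflow as tf

# Hypothetical training features (rows = samples, columns = features)
features = np.array([[4.0, 105.0],
                     [8.0, 350.0],
                     [6.0, 225.0]], dtype=np.float32)

# Adapt the layer so it learns each column's mean and standard deviation
normalizer = tf.keras.layers.Normalization()
normalizer.adapt(features)

# Placed first in the model, the layer applies the same statistics
# automatically during both training and prediction
model = tf.keras.Sequential([
    normalizer,
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)   # single continuous output, e.g. MPG
])
model.compile(optimizer='adam', loss='mean_absolute_error')

print(model(features).shape)  # one prediction per sample
```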
