How can data be normalized to predict the fuel efficiency with Auto MPG dataset using TensorFlow?
TensorFlow is an open-source machine learning framework provided by Google. It is used with Python to implement algorithms, deep learning applications, and much more. The 'tensorflow' package can be installed with the following command:
pip install tensorflow
A Tensor is the core data structure in TensorFlow: a multidimensional array or list that stores numerical data. Tensors flow along the edges of a computation graph called the 'data flow graph'.
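As a quick illustration (the values here are arbitrary), a rank-2 tensor can be created with tf.constant and inspected for its shape and data type:

```python
import tensorflow as tf

# A rank-2 tensor (a matrix) holding numerical data
t = tf.constant([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

print(t.shape)  # (2, 3): two rows, three columns
print(t.dtype)  # float32: the default floating-point dtype
```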
The aim of a regression problem is to predict the output of a continuous variable, such as fuel efficiency, price, or probability. The Auto MPG dataset contains fuel efficiency data of 1970s and 1980s automobiles with attributes like weight, horsepower, and displacement to predict vehicle fuel efficiency.
Data Preparation and Normalization
Before training a model, we need to separate features from labels and normalize the data. Here's how to prepare the Auto MPG dataset:
import tensorflow as tf
from tensorflow.keras.utils import get_file
from tensorflow.keras.layers.experimental import preprocessing
import pandas as pd
import numpy as np
# Load the Auto MPG dataset
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG','Cylinders','Displacement','Horsepower','Weight',
'Acceleration', 'Model Year', 'Origin']
dataset = pd.read_csv(url, names=column_names, na_values = "?",
comment='\t', sep=" ", skipinitialspace=True)
# Clean and split the data
dataset = dataset.dropna()
# One-hot encode the categorical 'Origin' column into separate columns
dataset = pd.get_dummies(dataset, columns=['Origin'], prefix='', prefix_sep='')
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)
print("Separating the label from features")
train_features = train_dataset.copy()
test_features = test_dataset.copy()
train_labels = train_features.pop('MPG')
test_labels = test_features.pop('MPG')
print("The mean and standard deviation of the training dataset:")
print(train_dataset.describe().transpose()[['mean', 'std']])
Separating the label from features
The mean and standard deviation of the training dataset:
mean std
MPG 23.351077 7.728652
Cylinders 5.467949 1.701849
Displacement 193.847436 104.135079
Horsepower 104.135897 38.096214
Weight 2976.880769 847.904119
Acceleration 15.591026 2.789230
Model Year 75.934615 3.675642
1 0.168205 0.374611
2 0.197436 0.398374
3 0.634615 0.482204
Creating the Normalization Layer
Normalization is crucial because features use different scales. TensorFlow's preprocessing.Normalization layer standardizes the input features:
print("Normalize the features since they use different scales")
print("Creating the normalization layer")
normalizer = preprocessing.Normalization()
normalizer.adapt(np.array(train_features))
print("Normalization layer mean values:")
print(normalizer.mean.numpy())
# Test normalization on first example
first = np.array(train_features[:1])
print("Every feature has been individually normalized")
with np.printoptions(precision=2, suppress=True):
    print('First example is:', first)
    print('Normalized data:', normalizer(first).numpy())
Normalize the features since they use different scales
Creating the normalization layer
Normalization layer mean values:
[   5.467  193.847  104.135 2976.88    15.591   75.934    0.168    0.197    0.635]
Every feature has been individually normalized
First example is: [[   4.   105.    63.  2125.    14.7   82.     0.     0.     1. ]]
Normalized data: [[-0.87 -0.87 -1.11 -1.03 -0.33  1.65 -0.45 -0.5   0.76]]
How Normalization Works
The normalization process follows these steps:
1. The target value (MPG) is separated from the features, since it is what we want to predict.
2. The Normalization layer computes the mean and standard deviation of each feature.
3. Each feature is standardized as (value - mean) / std, so that all features have zero mean and unit variance, which improves training stability.
4. The layer stores these statistics and applies them consistently during training and inference.
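The standardization step can be verified by hand with plain NumPy. The sketch below uses a small hypothetical feature column (not values from the actual dataset) and shows that after applying (value - mean) / std the column has zero mean and unit variance:

```python
import numpy as np

# Hypothetical feature column (e.g. a few Horsepower readings)
feature = np.array([63.0, 105.0, 150.0, 90.0])

mean = feature.mean()
std = feature.std()

# Standardize: subtract the mean, divide by the standard deviation
normalized = (feature - mean) / std

# The result has (approximately) zero mean and unit variance
print(normalized.mean())  # ~0.0
print(normalized.std())   # ~1.0
```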
Conclusion
Data normalization is essential for neural network training as it ensures all features contribute equally to the learning process. The TensorFlow preprocessing.Normalization layer provides an efficient way to standardize features and maintain consistency across training and prediction phases.
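To show how the adapted layer maintains that consistency in practice, here is a minimal sketch of placing it as the first layer of a regression model. The feature values are hypothetical, and the layer is referenced as tf.keras.layers.Normalization, which newer TensorFlow versions expose directly (older versions used the experimental preprocessing namespace shown earlier):

```python
import numpy as np
import tensorflow as tf

# Hypothetical training features (rows = samples, columns = features)
features = np.array([[4.0, 105.0],
                     [8.0, 350.0],
                     [6.0, 225.0]], dtype=np.float32)

# Adapt the layer so it learns each column's mean and standard deviation
normalizer = tf.keras.layers.Normalization()
normalizer.adapt(features)

# Placed first in the model, the layer applies the same statistics
# automatically during both training and prediction
model = tf.keras.Sequential([
    normalizer,
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)   # single continuous output, e.g. MPG
])
model.compile(optimizer='adam', loss='mean_absolute_error')

print(model(features).shape)  # one prediction per sample
```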
