How can data be split and inspected to predict fuel efficiency with the Auto MPG dataset using TensorFlow?

TensorFlow is an open-source machine learning framework developed by Google for building and training machine learning models, including deep neural networks. It represents data as multi-dimensional arrays called tensors, which lets it run complex mathematical operations efficiently.

The Auto MPG dataset contains fuel efficiency data for automobiles from the 1970s and early 1980s. It includes attributes such as cylinders, displacement, horsepower, and weight. Our goal is to predict each vehicle's fuel efficiency in miles per gallon (MPG), which is a regression problem.
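A cleaned DataFrame for this dataset could be built roughly as sketched below. The sketch assumes the copy of the data hosted in the UCI Machine Learning Repository, where columns are separated by spaces, missing values are written as `?`, and the car name follows a tab. The URL, the column names, and the helper name `load_auto_mpg` are assumptions for illustration. The two sample lines only mimic the file layout so that the code runs offline.

```python
import io
import pandas as pd

# Assumed location of the raw file in the UCI Machine Learning Repository.
AUTO_MPG_URL = "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"

# Assumed column names; the raw file has no header row.
COLUMN_NAMES = ["MPG", "Cylinders", "Displacement", "Horsepower",
                "Weight", "Acceleration", "Model Year", "Origin"]

def load_auto_mpg(source):
    """Read the raw file from a URL, path, or file-like object and drop incomplete rows."""
    raw = pd.read_csv(
        source,
        names=COLUMN_NAMES,
        na_values="?",          # missing values are written as "?"
        comment="\t",           # the car name follows a tab; ignore it
        sep=" ",
        skipinitialspace=True,  # columns are padded with several spaces
    )
    return raw.dropna()

# Two lines in the raw file's layout; the second has a missing horsepower value.
sample = io.StringIO(
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n'
    '25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"\n'
)
sample_df = load_auto_mpg(sample)
print(sample_df)
```

With network access, `dataset = load_auto_mpg(AUTO_MPG_URL)` would produce a DataFrame that can be used as `dataset` in the steps that follow.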

We run the code in Google Colaboratory (Colab), which provides free access to GPUs and requires no setup to run Python code.

Dataset Preparation and Splitting

Before training a model, we need to split our data into training and testing sets. Here's how to split and inspect the Auto MPG dataset:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# This example assumes the Auto MPG data has already been loaded and cleaned.
# `dataset` is that cleaned pandas DataFrame, with rows containing missing values removed.

print("Splitting the training and testing dataset")
train_dataset = dataset.sample(frac=0.7, random_state=0)
test_dataset = dataset.drop(train_dataset.index)

print(f"Training set size: {len(train_dataset)}")
print(f"Test set size: {len(test_dataset)}")

Splitting the training and testing dataset
Training set size: 274
Test set size: 118
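A quick sanity check on a split like this is to confirm that the two sets share no rows and together cover the whole dataset. The sketch below uses a stand-in DataFrame of 392 rows (the combined size of the two sets above) with placeholder values, so only the row counts are meaningful.

```python
import numpy as np
import pandas as pd

# Stand-in data: 392 rows with a unique index; the values are placeholders, not real measurements.
dataset = pd.DataFrame({"MPG": np.linspace(10.0, 40.0, 392)})

# Same split as above: sample 70% of the rows, then keep the remainder as the test set.
train_dataset = dataset.sample(frac=0.7, random_state=0)
test_dataset = dataset.drop(train_dataset.index)

# The two index sets should be disjoint and together cover every row.
overlap = train_dataset.index.intersection(test_dataset.index)
covered = train_dataset.index.union(test_dataset.index)

print(len(train_dataset), len(test_dataset))
print("overlapping rows:", len(overlap))
print("all rows covered:", len(covered) == len(dataset))
```

Note that `drop(train_dataset.index)` removes rows by index label, so this approach relies on the DataFrame having a unique index.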

Data Visualization

Visualizing the training data helps us understand relationships between different features:

print("Plotting the training data as a visualization")
sns.pairplot(train_dataset[['MPG', 'Cylinders', 'Displacement', 'Weight']], diag_kind='kde')
plt.show()

Plotting the training data as a visualization

Statistical Analysis

Summary statistics show the scale and spread of each feature, which guides preprocessing steps such as normalization:

print("Understanding the statistics associated with the data")
stats_summary = train_dataset.describe().transpose()
print(stats_summary)

Understanding the statistics associated with the data
              count       mean        std        min        25%        50%        75%        max
MPG           274.0      23.51       7.83       9.00      17.50      23.00      29.00      46.60
Cylinders     274.0       5.48       1.70       3.00       4.00       4.00       8.00       8.00
Displacement  274.0     193.43     104.27      68.00     104.25     148.50     265.75     455.00
Weight        274.0    2990.25     843.90    1613.00    2256.50    2822.50    3608.00    5140.00
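The summary above shows features on very different scales (Cylinders in single digits, Weight in the thousands), which is one reason these statistics matter for preprocessing. A minimal sketch of z-score normalization driven by the `describe()` output follows. The helper name `normalize` and the toy values are assumptions for illustration, and this is one common option rather than a step performed in this article.

```python
import pandas as pd

# Toy feature table with made-up values; it stands in for the training features.
train_features = pd.DataFrame({
    "Cylinders": [4, 4, 6, 8, 8],
    "Displacement": [97.0, 120.0, 225.0, 318.0, 400.0],
    "Weight": [2130.0, 2500.0, 3200.0, 3900.0, 4400.0],
})

# Per-feature statistics, laid out like the summary table above.
train_stats = train_features.describe().transpose()

def normalize(features, stats):
    """Centre each column on its training mean and scale by its training std."""
    return (features - stats["mean"]) / stats["std"]

normed_train = normalize(train_features, train_stats)
print(normed_train.round(2))
```

The same training statistics would normally be reused to scale the test set, so that no information from the test rows leaks into preprocessing.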

Data Split Analysis

Dataset    Percentage    Purpose             Size (approx.)
Training   70%           Model training      274 samples
Testing    30%           Model evaluation    118 samples

Key Insights from Statistics

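The statistical summary and the pair plot reveal the following characteristics of the training set:

  • MPG range: 9.0 to 46.6 miles per gallon, with a mean of about 23.5
  • Cylinders: The median is 4 and the 75th percentile is 8, so both 4-cylinder and 8-cylinder cars are common
  • Weight: Ranges from 1613 to 5140 lb; the pair plot shows that heavier cars typically have lower fuel efficiency
  • Displacement: Wide range from 68 to 455 cubic inches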
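The direction of a relationship such as weight versus MPG can also be checked numerically with a correlation matrix. The sketch below uses a small synthetic DataFrame constructed so that MPG falls as weight rises. The values are made up for illustration and are not taken from the Auto MPG data.

```python
import pandas as pd

# Synthetic stand-in for train_dataset: values are illustrative only.
toy_train = pd.DataFrame({
    "MPG": [34.0, 30.0, 26.0, 21.0, 16.0, 13.0],
    "Cylinders": [4, 4, 4, 6, 8, 8],
    "Displacement": [90.0, 105.0, 140.0, 230.0, 320.0, 400.0],
    "Weight": [1900.0, 2200.0, 2600.0, 3200.0, 3900.0, 4500.0],
})

# Pearson correlation between every pair of columns.
corr = toy_train.corr()

# Correlation of each feature with the target, most negative first.
mpg_corr = corr["MPG"].drop("MPG").sort_values()
print(mpg_corr)
```

On the real data, `train_dataset[['MPG', 'Cylinders', 'Displacement', 'Weight']].corr()` would give a numeric counterpart to the pair plot.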

Conclusion

Splitting the data into 70% training and 30% testing sets keeps a held-out portion for evaluating the model on examples it has not seen during training. The statistical summary and the pair plot help identify feature relationships and data distributions, which are essential for building an effective regression model to predict fuel efficiency.

Updated on: 2026-03-25T15:37:22+05:30
