How can TensorFlow be used with Estimators to inspect the Titanic dataset using Python?
The Titanic dataset can be inspected using TensorFlow and Estimators by iterating through features and examining the data structure. This helps in understanding the dataset before building machine learning models.
We will use TensorFlow Estimators, which are high-level APIs designed for easy scaling and asynchronous training. The goal is to predict passenger survival based on characteristics like gender, age, class, and other features from the Titanic dataset.
Setting Up the Environment
We are using Google Colaboratory to run the code. Google Colab helps run Python code in the browser with zero configuration and free access to GPUs.
Understanding TensorFlow Estimators
An Estimator is TensorFlow's high-level representation of a complete model. Estimators use feature columns to describe how the model interprets raw input features. They expect numeric input vectors, making feature columns essential for converting dataset features into the proper format.
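The numeric conversion that feature columns perform can be sketched in plain pandas. The tiny DataFrame below is made up for illustration (its column names mirror the Titanic features, but the rows are not real data); one-hot encoding the categorical column mimics what a categorical feature column does for an Estimator:

```python
import pandas as pd

# A tiny made-up sample with one categorical and one numeric feature
df = pd.DataFrame({
    'sex': ['male', 'female', 'female'],
    'age': [22.0, 38.0, 26.0],
})

# One-hot encode the categorical column so every feature becomes numeric,
# which is the same idea a categorical feature column implements
numeric_df = pd.get_dummies(df, columns=['sex'])

print(list(numeric_df.columns))  # ['age', 'sex_female', 'sex_male']
```

After this conversion, every column is numeric and could be fed to a model that only accepts numeric input vectors.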
Complete Dataset Inspection Example
Here's a complete example showing how to load and inspect the Titanic dataset using TensorFlow Estimators:
import tensorflow as tf
import pandas as pd

# Load the titanic dataset
dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')
dfeval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')

# Separate the label column from the features
y_train = dftrain.pop('survived')
y_eval = dfeval.pop('survived')

# Create an input function that wraps the DataFrame in a tf.data.Dataset
def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
    def input_function():
        ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))
        if shuffle:
            ds = ds.shuffle(1000)
        ds = ds.batch(batch_size).repeat(num_epochs)
        return ds
    return input_function

# Inspect a single batch of the dataset
print("The dataset is being inspected")
ds = make_input_fn(dftrain, y_train, batch_size=10)()
for feature_batch, label_batch in ds.take(1):
    print('Some feature keys are:', list(feature_batch.keys()))
    print()
    print('A batch of class:', feature_batch['class'].numpy())
    print()
    print('A batch of Labels:', label_batch.numpy())
Output

The dataset is being inspected
Some feature keys are: ['sex', 'age', 'n_siblings_spouses', 'parch', 'fare', 'class', 'deck', 'embark_town', 'alone']

A batch of class: [b'First' b'First' b'First' b'Third' b'Third' b'Third' b'First' b'Third' b'Second' b'Third']

A batch of Labels: [0 1 1 0 0 0 1 0 0 0]
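Beyond the batch-level view above, the raw DataFrame itself can be inspected with ordinary pandas calls before it ever reaches an input function. The sketch below uses a small made-up frame standing in for dftrain (the real one is loaded from the CSV URL):

```python
import pandas as pd

# Made-up rows standing in for the real dftrain
dftrain = pd.DataFrame({
    'sex': ['male', 'female', 'male', 'female'],
    'age': [22.0, 38.0, 35.0, 27.0],
    'class': ['Third', 'First', 'Third', 'Second'],
})

# Column dtypes reveal which features are numeric and which are strings
print(dftrain.dtypes)

# value_counts() shows the distribution of a categorical feature
print(dftrain['class'].value_counts())

# describe() summarizes a numeric column
print(dftrain['age'].describe())
```

These checks make it easy to decide which columns need categorical feature columns and which can be used as numeric columns directly.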
Understanding the Features
The Titanic dataset contains the following key features:
| Feature | Description | Type |
|---|---|---|
| sex | Gender of passenger | Categorical |
| age | Age of passenger | Numerical |
| class | Ticket class (First, Second, Third) | Categorical |
| fare | Passenger fare | Numerical |
| alone | Whether passenger traveled alone | Boolean |
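A common follow-up to cataloguing the features is relating a categorical feature to the binary label, since the mean of a 0/1 label per group is the survival rate. A pandas sketch with made-up rows (not the real dataset):

```python
import pandas as pd

# Made-up sample mirroring the 'sex' and 'survived' columns
df = pd.DataFrame({
    'sex': ['male', 'female', 'male', 'female'],
    'survived': [0, 1, 0, 1],
})

# Mean of the binary label per group gives the survival rate per category
rates = df.groupby('sex')['survived'].mean()
print(rates)
```

On the real dataset, the same groupby reveals which feature values are most strongly associated with survival, which guides feature selection for the model.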
Key Points
- The dataset inspection reveals 9 feature keys including demographic and ticket information
- Labels are binary (0 = did not survive, 1 = survived)
- Features include both categorical (class, sex) and numerical (age, fare) data types
- The batch processing allows efficient data handling for training
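The batching behaviour mentioned above can be illustrated without TensorFlow at all; a minimal pure-Python sketch of splitting records into fixed-size batches, analogous to what ds.batch(batch_size) does inside make_input_fn:

```python
def batch(records, batch_size):
    """Yield successive fixed-size batches from a list of records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

# Twelve example labels split into batches of ten: one full batch plus a remainder
labels = [0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0]
batches = list(batch(labels, 10))
print(len(batches))  # 2
print(batches[0])    # [0, 1, 1, 0, 0, 0, 1, 0, 0, 0]
```

Like tf.data's batching, the final batch may be smaller than batch_size when the record count is not an exact multiple.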
Conclusion
Inspecting the Titanic dataset with TensorFlow Estimators reveals the structure and types of features available for survival prediction. This inspection step is crucial before building machine learning models, as it clarifies the data's characteristics and preprocessing requirements.
