How can TensorFlow be used with Estimators to inspect the Titanic dataset using Python?

The Titanic dataset can be inspected with TensorFlow and Estimators by iterating over batches of features and examining the data structure. This helps you understand the dataset before building machine learning models.


We will use TensorFlow Estimators, which are high-level APIs designed for easy scaling and asynchronous training. The goal is to predict passenger survival from characteristics such as gender, age, ticket class, and other features in the Titanic dataset.

Setting Up the Environment

We are using Google Colaboratory to run the code. Google Colab runs Python in the browser with zero configuration and free access to GPUs.

Understanding TensorFlow Estimators

An Estimator is TensorFlow's high-level representation of a complete model. Estimators use feature columns to describe how the model should interpret raw input features. Because Estimators expect numeric input vectors, feature columns are essential for converting dataset features into the proper format.
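As a sketch of how feature columns map the raw Titanic columns to numeric model inputs (the vocabularies below are assumptions based on the dataset inspected later; note that `tf.feature_column` is deprecated in recent TensorFlow releases in favor of Keras preprocessing layers):

```python
import tensorflow as tf

# Categorical features carry a vocabulary; the model one-hot encodes them.
CATEGORICAL = {'sex': ['male', 'female'],
               'class': ['First', 'Second', 'Third']}
# Numeric features pass through as floating-point values.
NUMERIC = ['age', 'fare']

feature_columns = []
for name, vocab in CATEGORICAL.items():
    feature_columns.append(
        tf.feature_column.categorical_column_with_vocabulary_list(name, vocab))
for name in NUMERIC:
    feature_columns.append(
        tf.feature_column.numeric_column(name, dtype=tf.float32))

# Each column object remembers which raw feature it describes.
print([c.name for c in feature_columns])
```

A list like this would then be passed to an Estimator (for example a linear classifier) so it knows how to turn each raw column into numbers.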

Complete Dataset Inspection Example

Here's a complete example showing how to load and inspect the titanic dataset using TensorFlow Estimators:

import tensorflow as tf
import pandas as pd

# Load the titanic dataset
dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')
dfeval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')

# Separate features and labels
y_train = dftrain.pop('survived')
y_eval = dfeval.pop('survived')

# Create input function
def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
    def input_function():
        ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))
        if shuffle:
            ds = ds.shuffle(1000)
        ds = ds.batch(batch_size).repeat(num_epochs)
        return ds
    return input_function

# Inspect the dataset
print("The dataset is being inspected")
ds = make_input_fn(dftrain, y_train, batch_size=10)()

for feature_batch, label_batch in ds.take(1):
    print('Some feature keys are:', list(feature_batch.keys()))
    print()
    print('A batch of class:', feature_batch['class'].numpy())
    print()
    print('A batch of Labels:', label_batch.numpy())
    break

Output:
The dataset is being inspected
Some feature keys are: ['sex', 'age', 'n_siblings_spouses', 'parch', 'fare', 'class', 'deck', 'embark_town', 'alone']

A batch of class: [b'First' b'First' b'First' b'Third' b'Third' b'Third' b'First' b'Third'
 b'Second' b'Third']

A batch of Labels: [0 1 1 0 0 0 1 0 0 0]

Understanding the Features

The Titanic dataset contains the following key features:

Feature   Description                           Type
sex       Gender of the passenger               Categorical
age       Age of the passenger                  Numerical
class     Ticket class (First, Second, Third)   Categorical
fare      Passenger fare                        Numerical
alone     Whether the passenger traveled alone  Boolean

Key Points

  • The dataset inspection reveals 9 feature keys including demographic and ticket information
  • Labels are binary (0 = did not survive, 1 = survived)
  • Features include both categorical (class, sex) and numerical (age, fare) data types
  • The batch processing allows efficient data handling for training
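These categorical/numerical types can also be confirmed directly in pandas before feeding anything to TensorFlow. This is a sketch using a small hand-made frame with the same column names so it runs offline; with the real CSVs you would call the same methods on dftrain and y_train:

```python
import pandas as pd

# Stand-in rows with the same columns as the Titanic CSV (values are illustrative).
dftrain = pd.DataFrame({
    'sex': ['male', 'female', 'female', 'male'],
    'age': [22.0, 38.0, 26.0, 35.0],
    'class': ['Third', 'First', 'Third', 'First'],
    'fare': [7.25, 71.28, 7.92, 53.10],
    'alone': ['y', 'n', 'y', 'n'],
    'survived': [0, 1, 1, 0],
})
y_train = dftrain.pop('survived')

print(dftrain.dtypes)          # categorical columns show as object, numeric as float64
print(y_train.value_counts())  # label balance: survivors vs. non-survivors
```

Checking dtypes and label balance this way tells you which columns need categorical encoding and whether the classes are skewed, before any Estimator code runs.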

Conclusion

Inspecting the Titanic dataset with TensorFlow Estimators reveals the structure and types of features available for survival prediction. This inspection step is crucial before building machine learning models, as it clarifies data characteristics and preprocessing requirements.

Updated on: 2026-03-25T16:38:12+05:30
