How can TensorFlow be used with Estimators to inspect the Titanic dataset using Python?
The Titanic dataset can be inspected using TensorFlow and Estimators by iterating through features and examining the data structure. This helps in understanding the dataset before building machine learning models.
We will use TensorFlow Estimators, which are high-level APIs designed for easy scaling and asynchronous training. The goal is to predict passenger survival based on characteristics like gender, age, class, and other features from the Titanic dataset.
Setting Up the Environment
We are using Google Colaboratory to run the code. Google Colab helps run Python code in the browser with zero configuration and free access to GPUs.
Understanding TensorFlow Estimators
An Estimator is TensorFlow's high-level representation of a complete model. Estimators use feature columns to describe how the model interprets raw input features. They expect numeric input vectors, making feature columns essential for converting dataset features into the proper format.
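The numeric conversion that feature columns perform can be sketched in plain pandas. The tiny DataFrame below is made up for illustration (its column names mirror the Titanic features, but the rows are not real data); one-hot encoding the categorical column mimics what a categorical feature column does for an Estimator:

```python
import pandas as pd

# A tiny made-up sample with one categorical and one numeric feature
df = pd.DataFrame({
    'sex': ['male', 'female', 'female'],
    'age': [22.0, 38.0, 26.0],
})

# One-hot encode the categorical column so every feature becomes numeric,
# which is the same idea a categorical feature column implements
numeric_df = pd.get_dummies(df, columns=['sex'])

print(list(numeric_df.columns))  # ['age', 'sex_female', 'sex_male']
```

After this conversion, every column is numeric and could be fed to a model that only accepts numeric input vectors.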
Complete Dataset Inspection Example
Here's a complete example showing how to load and inspect the Titanic dataset using TensorFlow Estimators:
import tensorflow as tf
import pandas as pd

# Load the titanic dataset
dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')
dfeval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')

# Separate the label column from the features
y_train = dftrain.pop('survived')
y_eval = dfeval.pop('survived')

# Create an input function that wraps the DataFrame in a tf.data.Dataset
def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
    def input_function():
        ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))
        if shuffle:
            ds = ds.shuffle(1000)
        ds = ds.batch(batch_size).repeat(num_epochs)
        return ds
    return input_function

# Inspect a single batch of the dataset
print("The dataset is being inspected")
ds = make_input_fn(dftrain, y_train, batch_size=10)()
for feature_batch, label_batch in ds.take(1):
    print('Some feature keys are:', list(feature_batch.keys()))
    print()
    print('A batch of class:', feature_batch['class'].numpy())
    print()
    print('A batch of Labels:', label_batch.numpy())
Output

The dataset is being inspected
Some feature keys are: ['sex', 'age', 'n_siblings_spouses', 'parch', 'fare', 'class', 'deck', 'embark_town', 'alone']

A batch of class: [b'First' b'First' b'First' b'Third' b'Third' b'Third' b'First' b'Third' b'Second' b'Third']

A batch of Labels: [0 1 1 0 0 0 1 0 0 0]
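Beyond the batch-level view above, the raw DataFrame itself can be inspected with ordinary pandas calls before it ever reaches an input function. The sketch below uses a small made-up frame standing in for dftrain (the real one is loaded from the CSV URL):

```python
import pandas as pd

# Made-up rows standing in for the real dftrain
dftrain = pd.DataFrame({
    'sex': ['male', 'female', 'male', 'female'],
    'age': [22.0, 38.0, 35.0, 27.0],
    'class': ['Third', 'First', 'Third', 'Second'],
})

# Column dtypes reveal which features are numeric and which are strings
print(dftrain.dtypes)

# value_counts() shows the distribution of a categorical feature
print(dftrain['class'].value_counts())

# describe() summarizes a numeric column
print(dftrain['age'].describe())
```

These checks make it easy to decide which columns need categorical feature columns and which can be used as numeric columns directly.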
Understanding the Features
The Titanic dataset contains the following key features:
| Feature | Description | Type |
|---|---|---|
| sex | Gender of passenger | Categorical |
| age | Age of passenger | Numerical |
| class | Ticket class (First, Second, Third) | Categorical |
| fare | Passenger fare | Numerical |
| alone | Whether passenger traveled alone | Boolean |
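A common follow-up to cataloguing the features is relating a categorical feature to the binary label, since the mean of a 0/1 label per group is the survival rate. A pandas sketch with made-up rows (not the real dataset):

```python
import pandas as pd

# Made-up sample mirroring the 'sex' and 'survived' columns
df = pd.DataFrame({
    'sex': ['male', 'female', 'male', 'female'],
    'survived': [0, 1, 0, 1],
})

# Mean of the binary label per group gives the survival rate per category
rates = df.groupby('sex')['survived'].mean()
print(rates)
```

On the real dataset, the same groupby reveals which feature values are most strongly associated with survival, which guides feature selection for the model.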
Key Points
- The dataset inspection reveals 9 feature keys including demographic and ticket information
- Labels are binary (0 = did not survive, 1 = survived)
- Features include both categorical (class, sex) and numerical (age, fare) data types
- The batch processing allows efficient data handling for training
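The batching behaviour mentioned above can be illustrated without TensorFlow at all; a minimal pure-Python sketch of splitting records into fixed-size batches, analogous to what ds.batch(batch_size) does inside make_input_fn:

```python
def batch(records, batch_size):
    """Yield successive fixed-size batches from a list of records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

# Twelve example labels split into batches of ten: one full batch plus a remainder
labels = [0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0]
batches = list(batch(labels, 10))
print(len(batches))  # 2
print(batches[0])    # [0, 1, 1, 0, 0, 0, 1, 0, 0, 0]
```

Like tf.data's batching, the final batch may be smaller than batch_size when the record count is not an exact multiple.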
Conclusion
Inspecting the Titanic dataset with TensorFlow Estimators reveals the structure and types of features available for survival prediction. This inspection step is crucial before building machine learning models, as it clarifies the data's characteristics and preprocessing requirements.
