How can TensorFlow be used with Estimators to add a column to the Titanic dataset?

A column can be added to the Titanic dataset with the ‘crossed_column’ function, which is part of the ‘feature_column’ module of TensorFlow. A new model that includes this column is then trained using the ‘train’ method and evaluated using the ‘evaluate’ method.


The Keras Sequential API is helpful for building a sequential model, which is a plain stack of layers where every layer has exactly one input tensor and one output tensor.

A neural network that contains at least one convolutional layer is known as a convolutional neural network (CNN). A convolutional neural network can be used to build a learning model.

We are using Google Colaboratory to run the code below. Google Colab, or Colaboratory, runs Python code in the browser, requires zero configuration, and provides free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook.

An Estimator is TensorFlow's high-level representation of a complete model. It is designed for easy scaling and asynchronous training. We will train a logistic regression model using the tf.estimator API; such a model is often used as a baseline for other algorithms. Estimators use feature columns to describe how the model should interpret the raw input features. An Estimator expects a vector of numeric inputs, and feature columns describe how the model should convert every feature in the dataset.
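
As a rough illustration of what "a vector of numeric inputs" means, the sketch below converts one passenger record into numbers in plain Python. It is not how TensorFlow implements feature columns; the vocabulary, the helper names (one_hot, to_numeric_vector) and the sample passenger are assumptions made for this example.

```python
# Illustrative only: a plain-Python picture of what feature columns describe.
# SEX_VOCAB and the sample passenger are made up for this sketch.
SEX_VOCAB = ["female", "male"]


def one_hot(value, vocab):
    """Return a list with 1.0 at the position of `value` in `vocab`, else 0.0."""
    return [1.0 if value == item else 0.0 for item in vocab]


def to_numeric_vector(passenger):
    """Concatenate a numeric feature with a one-hot encoded categorical feature."""
    return [float(passenger["age"])] + one_hot(passenger["sex"], SEX_VOCAB)


passenger = {"age": 22, "sex": "male"}
vector = to_numeric_vector(passenger)
print(vector)  # [22.0, 0.0, 1.0]
```

The numeric 'age' value passes through unchanged, while the categorical 'sex' value becomes one position per vocabulary entry.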

Selecting and using the right set of feature columns is essential to learning an effective model. A feature column can be one of the raw inputs in the original features dict, or a new column created using transformations that are defined on one or multiple base columns.
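
A crossed column is one such derived column. The idea can be sketched without TensorFlow: join the values of the base features into one key, hash it, and take the result modulo the number of buckets. The helper name crossed_bucket and the use of MD5 are assumptions for illustration; TensorFlow uses its own hash function, so the bucket ids it produces will differ.

```python
import hashlib


def crossed_bucket(values, hash_bucket_size):
    """Map a combination of feature values to a bucket id in [0, hash_bucket_size)."""
    # Join the individual values into a single key for the combination.
    key = "_X_".join(str(v) for v in values)
    # MD5 is only a stand-in for a stable hash; it is not what TensorFlow uses.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % hash_bucket_size


bucket_a = crossed_bucket([22, "male"], hash_bucket_size=100)
bucket_b = crossed_bucket([22, "female"], hash_bucket_size=100)
print(bucket_a, bucket_b)
```

Because the combinations are hashed, two different combinations can occasionally land in the same bucket; a larger hash_bucket_size makes such collisions less likely.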


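import tensorflow as tf
# Assumes feature_columns, train_input_fn and eval_input_fn were defined in earlier steps.
# Cross 'age' and 'sex' so the model can learn a weight for each hashed combination.
age_x_gender = tf.feature_column.crossed_column(['age', 'sex'], hash_bucket_size=100)
print("Crossed feature column is added to the data")
derived_feature_columns = [age_x_gender]
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns+derived_feature_columns)
print("The combination feature is added")
# Train the new estimator before evaluating it.
linear_est.train(train_input_fn)
print("The model is trained again")
result = linear_est.evaluate(eval_input_fn)
print(result)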

Code credit −


Crossed feature column is added to the data
The combination feature is added
The model is trained again
{'accuracy': 0.7613636, 'accuracy_baseline': 0.625, 'auc': 0.84352624, 'auc_precision_recall': 0.78346276, 'average_loss': 0.48114488, 'label/mean': 0.375, 'loss': 0.4756022, 'precision': 0.65789473, 'prediction/mean': 0.4285249, 'recall': 0.75757575, 'global_step': 200}


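  • A model that uses every base feature column separately reaches an accuracy of about 75%; with the crossed column added, the output above reports an accuracy of about 76%.

  • Base feature columns on their own may not be enough to explain the data, because a linear model learns a separate weight for each feature and cannot capture how features interact.

  • To learn the differences between feature combinations, crossed feature columns can be added to the model.

  • The ‘age’ column can also be bucketized before it is crossed with another column.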
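
Bucketizing before crossing can be pictured in plain Python as follows. This is a minimal sketch, not TensorFlow code: the boundaries and the helper names (age_bucket, cross_key) are hypothetical, and in TensorFlow the equivalent step would use tf.feature_column.bucketized_column on a numeric 'age' column.

```python
import bisect

# Hypothetical age boundaries chosen only for this sketch.
AGE_BOUNDARIES = [18, 30, 45, 60]


def age_bucket(age, boundaries=AGE_BOUNDARIES):
    """Return the index of the age range that `age` falls into.

    With the boundaries above the ranges are:
    0: below 18, 1: [18, 30), 2: [30, 45), 3: [45, 60), 4: 60 and above.
    """
    return bisect.bisect_right(boundaries, age)


def cross_key(age, sex):
    """Combine the bucketized age with the 'sex' value into one key."""
    return (age_bucket(age), sex)


print(cross_key(22, "male"))    # (1, 'male')
print(cross_key(25, "male"))    # (1, 'male')
print(cross_key(47, "female"))  # (3, 'female')
```

Passengers aged 22 and 25 now share one combination with 'male', so the model can learn a weight per age range and gender instead of one per exact age and gender.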