How can TensorFlow be used with Estimator to show a sample of the data using boosted trees?


When building a boosted trees model with TensorFlow, a sample of the Titanic dataset can be inspected with the pandas ‘head’ method, the ‘describe’ method and the ‘shape’ attribute. The head method returns the first few rows of the dataset (five by default). The describe method returns summary statistics for the numeric columns: count, mean, standard deviation, minimum, quartiles and maximum. The shape attribute gives the dimensions of the data as a (rows, columns) tuple.
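
As a minimal sketch of these three calls, the snippet below builds a small hand-made DataFrame and inspects it. The rows are made up for illustration and are not taken from the Titanic files; the snippet assumes pandas is installed.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic training set.
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "female", "male"],
    "age": [22.0, 38.0, 26.0, 35.0, 28.0, 54.0],
    "fare": [7.25, 71.28, 7.92, 53.10, 8.05, 51.86],
})

sample = df.head()        # first five rows by default
summary = df.describe()   # count, mean, std, min, quartiles, max (numeric columns only)
dimensions = df.shape     # an attribute, not a method: (rows, columns)

print(sample)
print(summary)
print(dimensions)
```

Note that describe skips the non-numeric ‘sex’ column here, which matches the output further down, where only the numeric Titanic columns are summarised.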

We will use the Keras Sequential API, which is helpful in building a sequential model that is used to work with a plain stack of layers, where every layer has exactly one input tensor and one output tensor.

A neural network that contains at least one convolutional layer is known as a convolutional neural network. We can use a convolutional neural network to build a learning model.

We are using Google Colaboratory to run the code below. Google Colab, or Colaboratory, runs Python code in the browser, requires zero configuration, and provides free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook.

We will see how a gradient boosting model can be trained using decision trees and the tf.estimator API.

An Estimator is TensorFlow's high-level representation of a complete model. It is designed for easy scaling and asynchronous training. Estimators use feature columns to describe how the model would interpret the raw input features. An Estimator expects a vector of numeric inputs, and feature columns will help describe how the model should convert every feature in the dataset.
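
To illustrate the idea of converting raw features into a vector of numeric inputs, here is a plain-Python sketch. It is not the tf.feature_column API: the helper names (build_vocab, encode_row) and the sample rows are hypothetical. Categorical values are one-hot encoded and numeric values are passed through unchanged, which is roughly the kind of conversion a feature column describes.

```python
def build_vocab(rows, column):
    """Collect the distinct values of a categorical column in sorted order."""
    return sorted({row[column] for row in rows})


def encode_row(row, categorical_vocabs, numeric_columns):
    """Turn one raw record into a flat list of floats."""
    vector = []
    for column, vocab in categorical_vocabs.items():
        # One-hot: 1.0 at the position of the matching value, 0.0 elsewhere.
        vector.extend(1.0 if row[column] == value else 0.0 for value in vocab)
    for column in numeric_columns:
        vector.append(float(row[column]))
    return vector


# Hypothetical records with two categorical features and one numeric feature.
rows = [
    {"sex": "male", "class": "Third", "age": 22.0},
    {"sex": "female", "class": "First", "age": 38.0},
]
vocabs = {name: build_vocab(rows, name) for name in ("sex", "class")}
vectors = [encode_row(row, vocabs, ["age"]) for row in rows]
print(vectors)
```

Each record becomes a list with one slot per categorical value plus one slot per numeric column, so every record has the same length regardless of which categories it contains.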

Boosted trees models are among the most popular and effective machine learning approaches for both regression and classification. Boosting is an ensemble technique that combines the predictions of many (tens, hundreds or thousands of) tree models.
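
The ensemble idea can be sketched in plain Python. The snippet below is an illustrative toy, not the TensorFlow implementation: it fits gradient boosting for squared error on one-dimensional data, using single-split trees (stumps). The function names, the data, and the settings (50 trees, learning rate 0.1) are assumptions chosen for the example.

```python
def fit_stump(xs, residuals):
    """Find the single-split tree that best fits the residuals.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_trees=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def predict_boosted(model, x):
    """Sum the base value and the scaled contribution of every stump."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Made-up data with a jump between x = 3 and x = 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
model = fit_boosted(xs, ys)
mse = sum((predict_boosted(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
baseline_mse = sum((sum(ys) / len(ys) - y) ** 2 for y in ys) / len(ys)
print(mse, baseline_mse)
```

Each stump on its own is a weak predictor; the final prediction is the sum of many small corrections, which is why the training error of the ensemble ends up below that of simply predicting the mean.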

Example

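import pandas as pd
# Load the Titanic training and evaluation sets, then split off the label column.
dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')
dfeval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')
y_train = dftrain.pop('survived')
y_eval = dfeval.pop('survived')
print("Some sample of the data")
print(dftrain.head())
print("Metadata about the dataset")
print(dftrain.describe())
print("Dimensions of the data")
print(dftrain.shape[0], dfeval.shape[0])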

Code credit: https://www.tensorflow.org/tutorials/estimator/boosted_trees

Output

Some sample of the data
      sex   age  n_siblings_spouses  parch  ...  class     deck  embark_town alone
0    male  22.0                   1      0  ...  Third  unknown  Southampton     n
1  female  38.0                   1      0  ...  First        C    Cherbourg     n
2  female  26.0                   0      0  ...  Third  unknown  Southampton     y
3  female  35.0                   1      0  ...  First        C  Southampton     n
4    male  28.0                   0      0  ...  Third  unknown   Queenstown     y
[5 rows x 9 columns]
Metadata about the dataset
              age  n_siblings_spouses       parch        fare
count  627.000000          627.000000  627.000000  627.000000
mean    29.631308            0.545455    0.379585   34.385399
std     12.511818            1.151090    0.792999   54.597730
min      0.750000            0.000000    0.000000    0.000000
25%     23.000000            0.000000    0.000000    7.895800
50%     28.000000            0.000000    0.000000   15.045800
75%     35.000000            1.000000    0.000000   31.387500
max     80.000000            8.000000    5.000000  512.329200
Dimensions of the data
627 264

Explanation

  • The dataset is split into a training set and an evaluation set.
  • dftrain and y_train make up the training set.
  • The model uses the training set to learn patterns in the features.
  • The model is tested on the evaluation set, dfeval and y_eval.
  • A sample of rows, summary statistics and the row counts of both sets are printed to the console.

Updated on: 25-Feb-2021