The lessons of this course help you master the use of decision trees and random forests in your data analysis projects. You will learn how to address classification and regression problems with both methods. The course focuses on decision tree classifiers and random forest classifiers because many successful machine learning applications are classification problems.
Focusing on classification problems, the course uses the DecisionTreeClassifier and RandomForestClassifier classes of Python’s Scikit-learn library to explain everything you need to understand decision trees and random forests. It also explains and demonstrates Scikit-learn's DecisionTreeRegressor and RandomForestRegressor classes to address regression problems. It prepares you to use decision trees and random forests to make predictions and to understand the predictive structure of data sets.
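To give a feel for the two classifier classes named above, here is a minimal sketch, not taken from the course material. It assumes the built-in iris data set, a 70/30 train/test split, and the parameter values `max_depth=3` and `n_estimators=100`; all of these are illustrative choices only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small built-in data set, chosen here only for illustration.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# A single, shallow decision tree (max_depth=3 is an assumed value).
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# A random forest: an ensemble of trees grown on bootstrap samples.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# score() returns the mean accuracy on the held-out test data.
tree_accuracy = tree.score(X_test, y_test)
forest_accuracy = forest.score(X_test, y_test)
print(f"decision tree accuracy: {tree_accuracy:.3f}")
print(f"random forest accuracy: {forest_accuracy:.3f}")
```

Both estimators share the same `fit`, `predict`, and `score` interface, so the regressor classes can be swapped in the same way for a numeric target.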
Learn how decision trees and random forests make their predictions.
Learn how to use Scikit-learn for prediction with decision trees and random forests and for understanding the predictive structure of data sets.
Learn how to do your own prediction project with decision trees and random forests using Scikit-learn.
Learn about each parameter of Scikit-learn’s DecisionTreeClassifier and RandomForestClassifier classes to define your decision tree or random forest.
Learn to use the output of Scikit-learn’s DecisionTreeClassifier and RandomForestClassifier classes to investigate and understand your predictions.
Learn how to work with imbalanced class values in the data and how noisy data can affect the prediction performance of random forests.
Growing decision trees: node splitting, node impurity, Gini diversity, entropy, mean squared and absolute error, Poisson deviance, feature thresholds.
Improving decision trees: cross-validation, grid/randomized search, tuning and minimal cost-complexity pruning, evaluating feature importance.
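Some of the topics listed above can be sketched in a few lines. The example below is an illustration under stated assumptions, not the course's own code: the helper functions `gini` and `entropy` are hypothetical names written for this sketch, the breast cancer data set is an arbitrary choice, and the candidate pruning values come from Scikit-learn's `cost_complexity_pruning_path`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier


def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class shares."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def entropy(labels):
    """Entropy of a node in bits: -sum(p * log2(p)) over the classes present."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))


# A perfectly mixed two-class node versus a pure node.
mixed_node = np.array([0, 0, 1, 1])
pure_node = np.array([1, 1, 1, 1])
print(gini(mixed_node), entropy(mixed_node))  # 0.5 and 1.0
print(gini(pure_node))                        # 0.0

# Minimal cost-complexity pruning, tuned by cross-validated grid search.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Candidate alphas from the pruning path of a fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train
)
# Drop the last alpha (it prunes the tree down to its root) and
# clip tiny negative values that floating-point error can produce.
alphas = np.unique(np.clip(path.ccp_alphas[:-1], 0.0, None))

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"ccp_alpha": alphas},
    cv=5,
)
search.fit(X_train, y_train)

best_tree = search.best_estimator_
test_accuracy = best_tree.score(X_test, y_test)
importances = best_tree.feature_importances_  # impurity-based importances
print("best ccp_alpha:", search.best_params_["ccp_alpha"])
print(f"test accuracy: {test_accuracy:.3f}")
print("most important feature index:", int(np.argmax(importances)))
```

Larger values of `ccp_alpha` prune more aggressively, so the grid search here trades tree size against cross-validated accuracy. `RandomizedSearchCV` could be used in place of `GridSearchCV` when the parameter grid is large.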
You should be comfortable reading and following Python code in Jupyter notebooks that covers data description, model fitting (estimation), and data analysis output, using the Python libraries pandas, NumPy, scikit-learn, and matplotlib.