Automated Machine Learning in Python with Auto-sklearn: Code Explained


Introduction

Machine learning is a rapidly developing field, and new techniques and algorithms appear all the time. Yet building and tuning machine learning models can be time-consuming and challenging, and it demands a high degree of expertise. Automated machine learning, commonly known as AutoML, aims to streamline this work by automating labor-intensive tasks such as feature engineering, hyperparameter tuning, and model selection.

Auto-sklearn is an open-source framework for automated machine learning built on top of scikit-learn, one of the best-known machine learning libraries in Python. It searches the space of possible machine learning pipelines and uses Bayesian optimization and meta-learning to identify a well-performing model and hyperparameters for a given dataset. This tutorial introduces Auto-sklearn in Python: loading a dataset, splitting it into training and testing sets, creating and training a model, and assessing its accuracy. With Auto-sklearn, even novices can build strong machine learning models quickly.

Auto-sklearn

Auto-sklearn is an open-source library that automates the creation and tuning of machine learning models. It is built on scikit-learn and uses Bayesian optimization and meta-learning to find a well-performing model and hyperparameters for a particular dataset.

Auto-sklearn provides estimators for classification and regression problems. Application areas include natural language processing, image classification, and time series prediction, to name only a few.

The library works by searching the set of possible machine learning pipelines, each of which combines data preparation, feature engineering, and model selection steps. It searches this space with Bayesian optimization, and it uses meta-learning, which draws on results from previously seen datasets, to start the search from promising configurations.
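To make "searching a space of pipelines" concrete, here is a minimal sketch that uses only scikit-learn. It deliberately swaps in plain random search for Auto-sklearn's Bayesian optimization, and the tiny search space (optional scaling, optional PCA, one of two models) and the helper name `sample_pipeline` are assumptions made for illustration, not Auto-sklearn's actual configuration space.

```python
import random

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
rng = random.Random(0)  # fixed seed so the sampled candidates are repeatable


def sample_pipeline(rng):
    """Draw one random pipeline from a small, hypothetical search space."""
    steps = []
    if rng.random() < 0.5:  # optional data preparation step
        steps.append(("scale", StandardScaler()))
    if rng.random() < 0.5:  # optional feature engineering step
        steps.append(("pca", PCA(n_components=rng.choice([16, 32]))))
    if rng.random() < 0.5:  # model selection plus one hyperparameter
        model = KNeighborsClassifier(n_neighbors=rng.choice([1, 3, 5, 7]))
    else:
        model = DecisionTreeClassifier(
            max_depth=rng.choice([5, 10, None]), random_state=0
        )
    steps.append(("model", model))
    return Pipeline(steps)


best_score, best_pipeline = -1.0, None
for _ in range(8):  # evaluate a handful of random candidates
    candidate = sample_pipeline(rng)
    score = cross_val_score(candidate, X, y, cv=3).mean()
    if score > best_score:
        best_score, best_pipeline = score, candidate

print("Best cross-validated accuracy:", round(best_score, 3))
print("Best pipeline steps:", [name for name, _ in best_pipeline.steps])
```

Random search treats every candidate independently. Bayesian optimization instead uses the scores of earlier candidates to decide which configuration to try next, which is why it tends to need fewer evaluations.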

Auto-sklearn also builds an ensemble automatically from the models it evaluates during the search, so the final predictor can combine several pipelines rather than relying on a single one. It exposes a simple scikit-learn-style API for training, testing, and scoring models.
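The idea behind automated ensembling can be sketched with plain scikit-learn models. The example below is a simplified greedy ensemble selection: at each step it adds (with replacement) whichever model most improves the averaged predictions on a validation split. The three candidate models and the five selection rounds are assumptions chosen for brevity, and this is an illustration of the general technique rather than Auto-sklearn's own implementation.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_fit, X_val, y_fit, y_val = train_test_split(X, y, random_state=1)

# Candidate models standing in for the pipelines an AutoML search would produce.
models = [
    KNeighborsClassifier(n_neighbors=3),
    GaussianNB(),
    DecisionTreeClassifier(random_state=0),
]
# Class-probability predictions of each fitted model on the validation split.
probas = [m.fit(X_fit, y_fit).predict_proba(X_val) for m in models]


def accuracy(proba):
    """Accuracy of the class with the highest (averaged) probability."""
    return float(np.mean(proba.argmax(axis=1) == y_val))


chosen = []                              # indices of selected models (repeats allowed)
running_sum = np.zeros_like(probas[0])   # sum of probabilities of selected models
for _ in range(5):
    # Score the ensemble that would result from adding each candidate.
    scores = [accuracy((running_sum + p) / (len(chosen) + 1)) for p in probas]
    best = int(np.argmax(scores))
    chosen.append(best)
    running_sum += probas[best]

ensemble_acc = accuracy(running_sum / len(chosen))
print("Selected model indices:", chosen)
print("Ensemble validation accuracy:", round(ensemble_acc, 3))
```

Because the same validation split is used both to select members and to report accuracy, the printed figure is optimistic; a separate test set would give a fairer estimate.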

AutoML Code

Let's now look at AutoML code that uses Auto-sklearn in more detail. We will use the digits dataset from scikit-learn, a dataset of handwritten digits. The objective is to predict the digit shown in each image. Here is the code:

Program

import autosklearn.classification
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Load the dataset
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Create and fit the AutoML model
automl = autosklearn.classification.AutoSklearnClassifier(time_left_for_this_task=180, per_run_time_limit=30)
automl.fit(X_train, y_train)

# Evaluate the model on the test set
print("Accuracy:", automl.score(X_test, y_test))

Output

Accuracy: 0.9866666666666667

Code Explanation

This program classifies handwritten digits from scikit-learn's digits dataset (a small set of 8×8 images, not MNIST) using automated machine learning (AutoML) through the Auto-sklearn library. Here's a brief rundown of the code:

  • import autosklearn.classification: imports the autosklearn.classification module, which contains the AutoSklearnClassifier class, the AutoML classification model used here.

  • from sklearn.datasets import load_digits: imports the load_digits function, which loads the digits dataset, from the sklearn.datasets package.

  • from sklearn.model_selection import train_test_split: imports the train_test_split function, which divides a dataset into training and testing sets, from the sklearn.model_selection module.

  • X, y = load_digits(return_X_y=True): loads the digits dataset, storing the input features in X and the corresponding labels in y.

  • X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1): splits the dataset into training and testing sets. Because no test size is given, the default 75:25 ratio applies, and the random seed is set to 1 for repeatability.

  • automl = autosklearn.classification.AutoSklearnClassifier(time_left_for_this_task=180, per_run_time_limit=30): creates the AutoML model as an instance of the AutoSklearnClassifier class. time_left_for_this_task is the maximum time (in seconds) that the whole AutoML search may run, while per_run_time_limit is the maximum time (in seconds) that each individual model may run.

  • automl.fit(X_train, y_train): trains the AutoSklearnClassifier model on the training set X_train and its labels y_train.

  • print("Accuracy:", automl.score(X_test, y_test)): evaluates the trained model on the test set X_test and its labels y_test and prints the result. The score method returns the model's accuracy on the given data.
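The data-loading and splitting steps above can be checked without running Auto-sklearn at all. The short sketch below uses only scikit-learn to print the dataset shape and the sizes produced by the default split; it also shows that the digits dataset consists of 64-feature (8×8 pixel) samples, which makes it much smaller than MNIST.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

n_classes = len(set(y.tolist()))

print("Samples and features:", X.shape)        # (1797, 64): 8x8 images flattened
print("Number of classes:", n_classes)         # 10 digits, 0 through 9
print("Training samples:", len(X_train))       # 1347
print("Testing samples:", len(X_test))         # 450
```

With 450 test samples, each misclassified digit lowers the reported accuracy by 1/450, or about 0.0022.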

The code above applies AutoML, a machine learning approach that automates much of the model-building process, including data preparation, feature selection, and hyperparameter tuning. AutoML reduces the manual work needed to create machine learning models, so even non-experts can build strong ones.

The same workflow can also be implemented with TPOT, another AutoML library, in place of Auto-sklearn. A TPOT-based regression program follows the steps below.

The required libraries, pandas, NumPy, scikit-learn, and TPOT, are imported first. Pandas is used for data manipulation, NumPy for numerical calculations, and scikit-learn for machine learning tasks such as data preprocessing, model selection, and evaluation. TPOT is the primary library that implements the AutoML search.

The dataset is then loaded with the pandas read_csv function, and the input features and output labels are separated into different variables. The 'X' variable holds the input features, and the 'y' variable holds the output labels.

After loading the dataset, the code creates an instance of the TPOTRegressor class, which will fit the data and produce a machine learning model. TPOTRegressor is a subclass of TPOTBase that uses a genetic algorithm to choose features and tune hyperparameters. TPOTRegressor handles regression problems, whereas TPOTClassifier handles classification problems.

The dataset is divided into training and testing sets with scikit-learn's train_test_split function. As is common practice in machine learning, the training set is used for fitting the model and the testing set for assessing its performance.

Once the data has been split, the TPOTRegressor instance's fit method is called to fit the model to the training data. During fitting, the genetic algorithm searches for the best feature subsets and hyperparameters for the given data, and the best pipeline found is retained.

The code then assesses the model's performance on the testing set with the score method. For a regressor this is a regression metric rather than classification accuracy, so how to read the value depends on the scoring function that was configured.

Finally, the best model is exported to a Python file with the export function, and its score on the testing set is reported.
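The TPOT-based steps described above can be sketched as follows. This is a hedged illustration, not the original program: it assumes the classic TPOT API (releases before 1.0, where TPOTRegressor accepts generations, population_size, and verbosity and provides an export method), and the function name run_tpot_regression, the CSV path, the target column name, and the small search settings are all hypothetical. pandas and TPOT are imported inside the function so the definition itself loads even where they are not installed.

```python
def run_tpot_regression(csv_path, target_column, export_path="best_pipeline.py"):
    """Fit a TPOT regressor on a CSV file and export the best pipeline.

    csv_path, target_column, and export_path are hypothetical inputs
    supplied by the caller; nothing here refers to a specific dataset.
    """
    # Lazy imports: these third-party packages are needed only when called.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from tpot import TPOTRegressor  # classic (pre-1.0) TPOT API assumed

    # Load the data and separate input features (X) from output labels (y).
    data = pd.read_csv(csv_path)
    X = data.drop(columns=[target_column])
    y = data[target_column]

    # Hold out a testing set for the final assessment.
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Small search settings chosen only to keep the illustration short.
    tpot = TPOTRegressor(
        generations=5, population_size=20, random_state=42, verbosity=2
    )
    tpot.fit(X_train, y_train)            # genetic search over pipelines
    test_score = tpot.score(X_test, y_test)

    # Write the best pipeline found as standalone Python code.
    tpot.export(export_path)
    return test_score
```

A call would look like run_tpot_regression("my_data.csv", "price"), where both arguments are placeholders for your own file and target column.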

Conclusion

To sum up, Auto-sklearn is a capable library that streamlines the creation and improvement of machine learning models. By automatically searching for a well-performing model and hyperparameters for a given dataset, it can save time and effort. This tutorial has introduced Auto-sklearn in Python, showing how to load and split data, create and train a model, and assess its performance. With Auto-sklearn, even novices can build strong machine learning models quickly.

Updated on: 13-Apr-2023
