The idea behind gradient descent is to minimize the loss in various machine learning algorithms. Mathematically speaking, it searches for a local minimum of a function by repeatedly stepping in the direction of the negative gradient.

To implement this, a set of parameters is defined, and the loss computed from them needs to be minimized. Once the parameters are assigned initial values, the error or loss is calculated. Next, the parameters (weights) are updated so that the error decreases. Instead of parameters, weak learners such as decision trees can be used, as in gradient boosting.
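The update step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `gradient_descent`, the learning rate, the step count, and the example loss f(w) = (w - 3)^2 are all assumptions chosen for the demonstration.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        # Move the parameter a small distance in the downhill direction.
        w = w - learning_rate * grad(w)
    return w


# Hypothetical loss f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
```

For this loss, `w_min` should end up very close to 3, the point where the loss is smallest. A learning rate that is too large can make the updates overshoot and diverge, so the value used here is only suitable for this particular function.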

Once the loss is calculated, gradient descent is performed, and a tree is added to the model stepwise so that the loss is reduced.

Some examples include fitting the coefficients in linear regression or finding optimal weights in a machine learning algorithm.

There are different types of gradient descent algorithms, and some of them are discussed below.

Batch gradient descent is a type of gradient descent algorithm that processes the entire training dataset in every iteration of the algorithm’s run.

If the number of training examples is huge, batch gradient descent is computationally expensive. Hence, batch gradient descent is not preferred when the dataset is large.

In such cases, stochastic gradient descent or mini-batch gradient descent is preferred.
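As a rough sketch of batch gradient descent, the following Python code fits a line y = slope * x + intercept by using every training example in each update. The function name, the mean squared error loss, the hyperparameters, and the tiny dataset are assumptions made for illustration only.

```python
def batch_gradient_descent(xs, ys, learning_rate=0.05, iterations=2000):
    """Fit y = slope * x + intercept using all examples in every update."""
    slope, intercept = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        # Prediction errors over the whole training set.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error for both parameters.
        grad_slope = (2 / m) * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = (2 / m) * sum(errors)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = batch_gradient_descent(xs, ys)
```

Every iteration loops over all `m` examples before making a single update, which is why the cost per iteration grows with the size of the dataset.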

Stochastic gradient descent processes one training sample in every iteration. The parameters are updated after every iteration, since only one sample is processed at a time.

Each iteration is quicker than an iteration of batch gradient descent. However, the overhead is high if the number of training samples in the dataset is large.

This is because the number of iterations would be high, and the total time taken would also be high.
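A minimal sketch of stochastic gradient descent for the same kind of line-fitting problem might look as follows. The function name, the squared error loss, the shuffling with a fixed seed, and the hyperparameters are assumptions for illustration; the data is hypothetical.

```python
import random


def stochastic_gradient_descent(xs, ys, learning_rate=0.01, epochs=2000, seed=0):
    """Fit y = slope * x + intercept, updating after every single example."""
    slope, intercept = 0.0, 0.0
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a new random order
        for i in indices:
            # Error and gradient of the squared error for one example only.
            error = slope * xs[i] + intercept - ys[i]
            slope -= learning_rate * 2 * error * xs[i]
            intercept -= learning_rate * 2 * error
    return slope, intercept


# Hypothetical data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = stochastic_gradient_descent(xs, ys)
```

Each update touches only one example, so a full pass over a dataset of `m` examples performs `m` separate updates. This is the source of the large iteration count mentioned above.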

Mini-batch gradient descent often works better in practice than batch gradient descent and stochastic gradient descent. Here, *‘b’* examples are processed in every iteration, where *b<m*.

The value ‘m’ refers to the total number of training examples in the dataset, and ‘b’ is the batch size. If the number of training examples is high, the data is processed in batches, where each batch contains ‘b’ training examples and is handled in one iteration.

Mini-batch gradient descent works well with large training sets and needs fewer iterations per pass over the data than stochastic gradient descent.
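The batching scheme can be sketched as below, where `batch_size` plays the role of ‘b’ and `len(xs)` plays the role of ‘m’. As before, the function name, the mean squared error loss, the hyperparameters, and the data are assumptions chosen only to illustrate the idea.

```python
import random


def mini_batch_gradient_descent(xs, ys, batch_size=2, learning_rate=0.01,
                                epochs=2000, seed=0):
    """Fit y = slope * x + intercept, updating once per batch of examples."""
    slope, intercept = 0.0, 0.0
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        # Walk through the shuffled data in chunks of batch_size examples.
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [slope * xs[i] + intercept - ys[i] for i in batch]
            # Average the gradient over the examples in this batch.
            grad_slope = (2 / len(batch)) * sum(e * xs[i] for e, i in zip(errors, batch))
            grad_intercept = (2 / len(batch)) * sum(errors)
            slope -= learning_rate * grad_slope
            intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical data generated from y = 2x + 1, with m = 6 and b = 2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
slope, intercept = mini_batch_gradient_descent(xs, ys)
```

With six examples and a batch size of two, each pass over the data makes three updates, compared with six for stochastic gradient descent and one for batch gradient descent.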
