How does Long Short-Term Memory in machine learning work?


Introduction

LSTM, which stands for Long Short-Term Memory, is an advanced form of recurrent neural network (RNN) specifically designed to analyze sequential data like text, speech, and time series. Unlike conventional RNNs, which struggle to capture long-term dependencies in data, LSTMs excel in understanding and predicting patterns within sequences.

Conventional RNNs face a significant challenge in retaining crucial information as they process sequences over time. This limitation hampers their ability to make accurate predictions based on long-term memory. LSTM was developed to overcome this hurdle by enabling the network to store and maintain information for extended periods.

Structure of an LSTM Network

An LSTM network's architecture is made up of memory cells and several gates that control the flow of information. The forget gate regulates whether information is preserved or discarded, while the input gate controls the addition of fresh data to the memory cells.

Forget Gate

The forget gate determines which information is kept in and which is removed from the cell state. It accepts two inputs: x_t, the input at the current time step, and h_{t-1}, the output of the previous cell. These inputs are multiplied by a weight matrix (W_f), added to a bias term (b_f), and then passed through a sigmoid activation function (σ), producing an output between 0 and 1 for each element of the cell state.

The equation for the forget gate is −

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)

In the above equation −

  • W_f is the forget gate's weight matrix, which weighs the importance of the inputs when selecting which information to discard.

  • [h_{t-1}, x_t] denotes the concatenation of the previous hidden state (h_{t-1}) and the current input (x_t), taking into account information from both time steps.

  • b_f is the bias term for the forget gate, which allows the network to shift the gate's activation.

  • σ represents the sigmoid activation function, which maps the weighted sum of inputs to a value between 0 and 1. This value specifies how much of each element of the cell state should be forgotten or retained. A code sketch of this computation appears below.
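To make the computation concrete, here is a minimal NumPy sketch of the forget gate. The dimensions, the randomly initialized W_f and b_f, and the sigmoid helper are illustrative assumptions, not values from a trained model −

import numpy as np

def sigmoid(z):
    # Squash values into the range (0, 1), as the gate equation requires.
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3
rng = np.random.default_rng(0)

# Illustrative, randomly initialized forget-gate parameters.
W_f = rng.standard_normal((hidden_size, hidden_size + input_size))
b_f = np.zeros(hidden_size)

h_prev = np.zeros(hidden_size)          # h_{t-1}: previous hidden state
x_t = rng.standard_normal(input_size)   # x_t: current input

# f_t = sigmoid(W_f · [h_{t-1}, x_t] + b_f)
concat = np.concatenate([h_prev, x_t])
f_t = sigmoid(W_f @ concat + b_f)
print(f_t)                              # one value in (0, 1) per cell-state element

An element of f_t near 1 keeps the corresponding cell-state entry; an element near 0 erases it.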

Input Gate

The input gate controls the flow of new information into the LSTM's cell state. It determines which input elements to save and which to ignore.

The input gate is computed with a sigmoid activation function over two components: the current input and the previous hidden state. These components are combined as follows to calculate the input gate activation.

  • Current Input − The new data arriving at the current time step. It might be a word embedding, a feature vector, or any other input relevant to the task. This input is multiplied by a weight matrix before being combined with the other components.

  • Previous Hidden State − The previous hidden state represents the information carried over from the previous time step. It captures the earlier context and helps determine how much of the current input should be preserved. The previous hidden state is multiplied by another weight matrix before being combined with the other components.

  • Bias Term − A bias term is added to the weighted sum of the current input and the previous hidden state. This bias helps adjust the input gate's decision-making process.

  • Activation Function − A sigmoid activation function is applied to the weighted sum of the current input and the previous hidden state, plus the bias term. The sigmoid compresses the result to a value between 0 and 1, indicating how open the gate is. A value close to 1 means the gate is open, allowing more information into the cell state, whereas a value close to 0 means the gate is closed, preventing information from entering the cell state.

  • Cell State Update − The input gate's output, which lies between 0 and 1, is multiplied element-wise with the cell state candidate, a vector of new information that could be added to the cell state. The result is then added to the forget-gated cell state, updating it with the selected information, as sketched in the code below.
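Here is a matching NumPy sketch of the input gate and the cell-state update. Again, the sigmoid helper, the dimensions, and the randomly initialized weights (W_i and b_i for the gate, W_c and b_c for the candidate) are illustrative assumptions, and a random vector stands in for the forget gate's output −

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3
rng = np.random.default_rng(1)

# Illustrative parameters for the input gate and the cell-state candidate.
W_i = rng.standard_normal((hidden_size, hidden_size + input_size))
b_i = np.zeros(hidden_size)
W_c = rng.standard_normal((hidden_size, hidden_size + input_size))
b_c = np.zeros(hidden_size)

h_prev = np.zeros(hidden_size)           # h_{t-1}: previous hidden state
x_t = rng.standard_normal(input_size)    # x_t: current input
c_prev = np.zeros(hidden_size)           # C_{t-1}: previous cell state
f_t = rng.uniform(size=hidden_size)      # stand-in for the forget gate's output

concat = np.concatenate([h_prev, x_t])
i_t = sigmoid(W_i @ concat + b_i)        # input gate: how much new information to admit
c_tilde = np.tanh(W_c @ concat + b_c)    # candidate values for the cell state

# Forget part of the old state, then add the gated candidate.
c_t = f_t * c_prev + i_t * c_tilde
print(c_t)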

By adjusting the weights and biases associated with the input gate, the LSTM learns which information from the input and the previous hidden state to keep or discard, allowing it to capture long-term dependencies and make informed predictions.

Output Gate

The output gate directs the extraction of important information from the memory cells to produce the final output. It follows the same pattern as the other gates: a sigmoid over the previous hidden state and the current input decides how much of each cell-state element to expose, and the result is multiplied element-wise with a tanh-squashed copy of the cell state to give the new hidden state −

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)

h_t = o_t * tanh(C_t)
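A final NumPy sketch shows the output gate and the resulting hidden state. As before, the dimensions and the randomly initialized W_o and b_o are illustrative assumptions, and a random vector stands in for the updated cell state −

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3
rng = np.random.default_rng(2)

# Illustrative output-gate parameters.
W_o = rng.standard_normal((hidden_size, hidden_size + input_size))
b_o = np.zeros(hidden_size)

h_prev = np.zeros(hidden_size)           # h_{t-1}: previous hidden state
x_t = rng.standard_normal(input_size)    # x_t: current input
c_t = rng.standard_normal(hidden_size)   # stand-in for the updated cell state C_t

concat = np.concatenate([h_prev, x_t])
o_t = sigmoid(W_o @ concat + b_o)        # output gate: how much of the cell state to expose
h_t = o_t * np.tanh(c_t)                 # new hidden state, the cell's output
print(h_t)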

These gating mechanisms make LSTMs especially useful for tasks that involve understanding context and capturing long-term dependencies, such as language translation, speech recognition, and time series forecasting. Because they store information in memory cells and gate its flow, LSTMs address the vanishing and exploding gradient problems that typical RNNs experience when trained on long sequences: by selectively remembering or forgetting information, they can tackle these challenges efficiently.

Limitations

LSTMs, however, have several drawbacks. They are computationally more expensive than simpler models, which makes them less scalable for large datasets or resource-constrained applications. Their complexity also means they typically require more data and longer training cycles to perform well. And because LSTM processing is inherently sequential, computation across the time steps of a sequence cannot easily be parallelized.

Conclusion

LSTMs are used in a variety of sectors. They are widely used in language modelling, machine translation, speech recognition, time series forecasting, anomaly detection, recommender systems, and video analysis tasks such as object detection and activity recognition. These applications benefit from the LSTM's capacity to capture complicated patterns and relationships in sequential data.
