Understanding the Vanishing and Exploding Gradients Problem in Recurrent Neural Networks & Some Techniques to Mitigate Them
This article is a comprehensive overview of the vanishing and exploding gradients problem in recurrent neural networks and some techniques to mitigate it for a better model.
Introduction
A Recurrent Neural Network is made up of memory cells unrolled through time, where the output of the previous time step is used as input to the next time step, just as in a regular feed-forward neural network the output of the previous layer is fed as input to the next layer. In a recurrent neural network, the number of layers corresponds to the number of time steps in the dataset, which means the network can be very deep. Unrolling a recurrent layer across as many time steps as there are in the data is what allows the network to learn from the past. So, how is it possible to train such a network and get good performance while mitigating the problems of exploding and vanishing gradients?
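Before answering that question, it helps to see what "unrolling through time" looks like concretely. Below is a minimal sketch of a vanilla RNN forward pass (my own illustration in NumPy; the variable names, dimensions, and random initialization are assumptions, not taken from this article). The key point is that the same hidden-to-hidden weight matrix is reused at every time step.

```python
import numpy as np

# Minimal unrolled vanilla RNN forward pass (illustrative sketch only).
# The same weight matrices W_xh and W_hh are reused at every time step,
# so backpropagation through time multiplies gradients by W_hh repeatedly.

T, input_dim, hidden_dim = 50, 8, 16          # sequence length and sizes (arbitrary)
rng = np.random.default_rng(0)

W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights
b_h  = np.zeros(hidden_dim)

x = rng.normal(size=(T, input_dim))           # a dummy input sequence
h = np.zeros(hidden_dim)                      # initial hidden state

for t in range(T):                            # "unrolling" through time
    h = np.tanh(W_xh @ x[t] + W_hh @ h + b_h) # output of step t feeds step t+1

print(h[:4])                                  # final hidden state (first few values)
```

Because the same W_hh is applied at every step, backpropagation through time multiplies gradients by this matrix over and over, which is exactly why they can shrink toward zero (vanish) or blow up (explode) over long sequences.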
In this article, we will cover the following topics:
- The vanishing and exploding gradients problem in RNNs.
- Some solutions to mitigate the vanishing and exploding gradients problem.