Backpropagation is a key algorithm for training neural networks. It adjusts the weights of the network to minimize the difference between the network's predicted output and the actual (target) output. The training process typically follows these steps (a minimal code sketch follows the list):
- Forward Pass: The input data is passed through the network, and an output is produced.
- Loss Calculation: The error or loss is computed using a loss function that measures how far the prediction is from the actual result.
- Backward Pass: Using the chain rule of calculus, the gradient of the loss function with respect to each weight is computed.
- Weight Update: The weights are updated in the direction that reduces the loss, using an optimization algorithm like stochastic gradient descent.
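The sketch below walks through these four steps for a single linear layer, using NumPy and mean squared error as the loss. The data, layer sizes, and learning rate are illustrative assumptions, not part of any particular model.

```python
# One training step for a single linear layer (a minimal sketch, assuming
# made-up data, mean squared error as the loss, and plain SGD updates).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # 4 samples, 3 input features (illustrative)
y = rng.normal(size=(4, 1))          # target values (illustrative)
W = rng.normal(size=(3, 1)) * 0.1    # weights
b = np.zeros((1, 1))                 # bias
lr = 0.1                             # learning rate

# 1. Forward Pass: input flows through the layer to produce a prediction.
y_pred = X @ W + b

# 2. Loss Calculation: mean squared error between prediction and target.
loss = np.mean((y_pred - y) ** 2)

# 3. Backward Pass: chain rule gives the gradient of the loss w.r.t. each parameter.
grad_y_pred = 2 * (y_pred - y) / len(X)          # dL/dy_pred
grad_W = X.T @ grad_y_pred                       # dL/dW
grad_b = grad_y_pred.sum(axis=0, keepdims=True)  # dL/db

# 4. Weight Update: step in the direction that reduces the loss (SGD).
W -= lr * grad_W
b -= lr * grad_b

print(f"loss before update: {loss:.4f}")
```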
Terms
Chain rule
Example: If a car travels twice as fast as a bicycle, and the bicycle travels four times as fast as a walking man, then the car travels 2 × 4 = 8 times as fast as the man.
The chain rule is applied across the network layers to propagate the error gradient from the output back through every layer to the input, allowing the network to update its weights to minimize the loss (gradient descent).
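The sketch below shows the chain rule at work on a tiny two-layer linear network with a squared-error loss on one sample; the weights, input, and target are hand-picked assumptions. Each backward-pass line multiplies one more link into the chain.

```python
# Chain rule across two layers (a minimal sketch with made-up weights and data).
import numpy as np

x = np.array([1.0, 2.0])      # input
W1 = np.array([[0.5, -0.3],
               [0.8,  0.2]])  # layer 1 weights
W2 = np.array([0.4, -0.6])    # layer 2 weights
target = 1.0

# Forward pass.
h = W1 @ x                    # hidden activations
y_pred = W2 @ h               # scalar output
loss = (y_pred - target) ** 2

# Backward pass: each factor below is one link in the chain.
dloss_dy = 2 * (y_pred - target)    # dL/dy_pred
dloss_dh = dloss_dy * W2            # chain: dL/dh = dL/dy_pred * dy_pred/dh
dloss_dW1 = np.outer(dloss_dh, x)   # chain continues: dL/dW1
dloss_dW2 = dloss_dy * h            # dL/dW2

print(dloss_dW1)
print(dloss_dW2)
```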
Derivative
In calculus: the rate at which a function changes at any given point.
In machine learning: it measures how the loss changes when a parameter is adjusted, which tells the model how to update that parameter.
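As a concrete (hypothetical) case, the sketch below takes the toy loss L(w) = (w·x − y)² and compares a finite-difference estimate of the derivative with the analytic derivative from calculus; the values of x, y, and w are made up for illustration.

```python
# Derivative of a toy loss with respect to one parameter (a minimal sketch).
x, y = 2.0, 6.0      # made-up data point

def loss(w):
    return (w * x - y) ** 2

w = 1.5
eps = 1e-6

# Numerical derivative: rate at which the loss changes at this point.
numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)

# Analytic derivative from calculus: dL/dw = 2 * (w*x - y) * x.
analytic = 2 * (w * x - y) * x

print(numeric, analytic)   # both ≈ -12.0: increasing w here would reduce the loss
```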
Gradient
Gradients guide the model’s learning process. A gradient is the derivative of the loss function with respect to the model’s weights and biases: it indicates how much the loss changes when a weight or bias is adjusted.
A high gradient means parameter updates are large, which can lead to faster convergence. If the gradient is too high, the updates can overshoot the minimum.
A low gradient results in smaller parameter updates. This means slower convergence, but more stable and controlled learning progress.
The goal is to find a balance between learning rate and gradient size.
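The sketch below illustrates that balance on the toy loss L(w) = w², whose minimum sits at w = 0; the learning rates are chosen only to show stable, fast, and overshooting behaviour, not recommended values.

```python
# Effect of step size on gradient descent (a minimal sketch on a toy loss).
def grad(w):
    return 2 * w   # dL/dw for L(w) = w**2

def descend(w, lr, steps=5):
    path = [w]
    for _ in range(steps):
        w -= lr * grad(w)          # weight update in the direction that reduces loss
        path.append(round(w, 3))
    return path

print(descend(w=4.0, lr=0.05))  # small steps: stable but slow convergence
print(descend(w=4.0, lr=0.45))  # larger steps: much faster convergence
print(descend(w=4.0, lr=1.1))   # too large: overshoots the minimum and diverges
```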
See also: Vector, How humans learn, Convergence and Divergence