Backpropagation is a key algorithm for training neural networks. It adjusts the weights of the network to minimize the difference between the network's predicted output and the actual (target) output. The training process typically follows these steps (a minimal code sketch follows the list):
- Forward Pass: The input data is passed through the network, and an output is produced.
- Loss Calculation: The error or loss is computed using a loss function that measures how far the prediction is from the actual result.
- Backward Pass: Using the chain rule of calculus, the gradient of the loss function with respect to each weight is computed.
- Weight Update: The weights are updated in the direction that reduces the loss, using an optimization algorithm like stochastic gradient descent.
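The sketch below walks through these four steps for a single linear layer, using NumPy and mean squared error as the loss. The data, layer sizes, and learning rate are illustrative assumptions, not part of any particular model.

```python
# One training step for a single linear layer (a minimal sketch, assuming
# made-up data, mean squared error as the loss, and plain SGD updates).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # 4 samples, 3 input features (illustrative)
y = rng.normal(size=(4, 1))          # target values (illustrative)
W = rng.normal(size=(3, 1)) * 0.1    # weights
b = np.zeros((1, 1))                 # bias
lr = 0.1                             # learning rate

# 1. Forward Pass: input flows through the layer to produce a prediction.
y_pred = X @ W + b

# 2. Loss Calculation: mean squared error between prediction and target.
loss = np.mean((y_pred - y) ** 2)

# 3. Backward Pass: chain rule gives the gradient of the loss w.r.t. each parameter.
grad_y_pred = 2 * (y_pred - y) / len(X)          # dL/dy_pred
grad_W = X.T @ grad_y_pred                       # dL/dW
grad_b = grad_y_pred.sum(axis=0, keepdims=True)  # dL/db

# 4. Weight Update: step in the direction that reduces the loss (SGD).
W -= lr * grad_W
b -= lr * grad_b

print(f"loss before update: {loss:.4f}")
```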
Terms
Chain rule
Example: If a car travels twice as fast as a bicycle, and the bicycle travels four times as fast as a walking man, then the car travels 2 × 4 = 8 times as fast as the man.
The chain rule is applied across the network layers to propagate the error gradient from the output back through every layer to the input, allowing the network to update its weights to minimize the loss (gradient descent).
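The sketch below shows the chain rule at work on a tiny two-layer linear network with a squared-error loss on one sample; the weights, input, and target are hand-picked assumptions. Each backward-pass line multiplies one more link into the chain.

```python
# Chain rule across two layers (a minimal sketch with made-up weights and data).
import numpy as np

x = np.array([1.0, 2.0])      # input
W1 = np.array([[0.5, -0.3],
               [0.8,  0.2]])  # layer 1 weights
W2 = np.array([0.4, -0.6])    # layer 2 weights
target = 1.0

# Forward pass.
h = W1 @ x                    # hidden activations
y_pred = W2 @ h               # scalar output
loss = (y_pred - target) ** 2

# Backward pass: each factor below is one link in the chain.
dloss_dy = 2 * (y_pred - target)    # dL/dy_pred
dloss_dh = dloss_dy * W2            # chain: dL/dh = dL/dy_pred * dy_pred/dh
dloss_dW1 = np.outer(dloss_dh, x)   # chain continues: dL/dW1
dloss_dW2 = dloss_dy * h            # dL/dW2

print(dloss_dW1)
print(dloss_dW2)
```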
Derivative
In calculus: the rate at which a function changes at any given point.
In machine learning: it measures how the loss changes when a parameter is adjusted, which tells the model how to update that parameter.
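As a concrete (hypothetical) case, the sketch below takes the toy loss L(w) = (w·x − y)² and compares a finite-difference estimate of the derivative with the analytic derivative from calculus; the values of x, y, and w are made up for illustration.

```python
# Derivative of a toy loss with respect to one parameter (a minimal sketch).
x, y = 2.0, 6.0      # made-up data point

def loss(w):
    return (w * x - y) ** 2

w = 1.5
eps = 1e-6

# Numerical derivative: rate at which the loss changes at this point.
numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)

# Analytic derivative from calculus: dL/dw = 2 * (w*x - y) * x.
analytic = 2 * (w * x - y) * x

print(numeric, analytic)   # both ≈ -12.0: increasing w here would reduce the loss
```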
Gradient
Gradients guide the model’s learning process. A gradient is the derivative of the loss function with respect to the model’s weights and biases: it indicates how much the loss changes when a weight or bias is adjusted.
A high gradient means parameter updates are large, which can lead to faster convergence. If the gradient is too high, the updates can overshoot the minimum.
A low gradient results in smaller parameter updates. This means slower convergence, but more stable and controlled learning progress.
The goal is to find a balance between learning rate and gradient size.
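The sketch below illustrates that balance on the toy loss L(w) = w², whose minimum sits at w = 0; the learning rates are chosen only to show stable, fast, and overshooting behaviour, not recommended values.

```python
# Effect of step size on gradient descent (a minimal sketch on a toy loss).
def grad(w):
    return 2 * w   # dL/dw for L(w) = w**2

def descend(w, lr, steps=5):
    path = [w]
    for _ in range(steps):
        w -= lr * grad(w)          # weight update in the direction that reduces loss
        path.append(round(w, 3))
    return path

print(descend(w=4.0, lr=0.05))  # small steps: stable but slow convergence
print(descend(w=4.0, lr=0.45))  # larger steps: much faster convergence
print(descend(w=4.0, lr=1.1))   # too large: overshoots the minimum and diverges
```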
See also: Vector, How humans learn, Convergence and Divergence