What is Gradient Descent?
Gradient Descent — An optimization algorithm used to minimize the error in a neural network by adjusting weights iteratively.
Gradient descent is how neural networks learn. It computes the slope (gradient) of the error surface with respect to the weights and takes small steps downhill toward lower error. The learning rate controls step size — too large and the updates overshoot the minimum; too small and training converges painfully slowly.
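The downhill-stepping idea can be sketched in a few lines. This is a minimal toy example, not a neural network: it minimizes f(w) = (w - 3)^2, whose gradient is 2(w - 3) and whose minimum sits at w = 3; the function name and parameters are illustrative.

```python
# Minimal gradient descent on f(w) = (w - 3)^2, minimum at w = 3.
# The gradient f'(w) = 2 * (w - 3) points uphill, so we subtract it.

def gradient_descent(lr=0.1, steps=100):
    w = 0.0                  # starting weight (toy scalar problem)
    for _ in range(steps):
        grad = 2 * (w - 3)   # slope of the error surface at the current w
        w -= lr * grad       # step downhill, scaled by the learning rate
    return w

print(round(gradient_descent(), 4))  # approaches 3.0
```

The same loop underlies neural network training; the only difference is that the gradient is computed over millions of weights via backpropagation.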
Frequently Asked Questions
What variants of gradient descent exist?
Batch (uses all data per step), stochastic (one sample per step), and mini-batch (small batches per step). Adam, AdaGrad, and RMSprop are adaptive variants that adjust per-parameter learning rates automatically during training.
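The mini-batch variant can be sketched on a toy regression problem. Everything here is an assumption for illustration — fitting y = 2x with a single weight, a batch size of 4, and a hand-picked learning rate:

```python
import random

# Mini-batch SGD sketch: fit y = 2x with one weight w (toy problem).
# Each update estimates the gradient from a small random batch
# instead of the full dataset (batch) or a single sample (stochastic).

def minibatch_sgd(data, lr=0.01, batch_size=4, epochs=200):
    w = 0.0
    for _ in range(epochs):
        batch = random.sample(data, batch_size)
        # gradient of mean squared error (w*x - y)^2, averaged over the batch
        grad = sum(2 * (w * x - y) * x for x, y in batch) / batch_size
        w -= lr * grad
    return w

data = [(x, 2 * x) for x in range(1, 11)]
random.seed(0)
print(round(minibatch_sgd(data), 2))  # close to the true slope, 2.0
```

Mini-batching trades a noisier gradient estimate for far cheaper updates, which is why it is the default in practice.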
What is a learning rate?
The step size in gradient descent. It controls how much the weights change with each update and is often considered the most important hyperparameter in neural network training.
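The overshoot-versus-converge behavior is easy to demonstrate on a quadratic. A hypothetical sketch on f(w) = w^2 (gradient 2w), where any learning rate above 1.0 makes each step flip and enlarge w:

```python
# Learning-rate effects on f(w) = w^2, minimum at w = 0.
# A moderate rate shrinks w each step; an oversized one makes |w| grow.

def run(lr, steps=20, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w   # one gradient-descent update
    return w

print(abs(run(0.1)))   # shrinks toward the minimum (w is multiplied by 0.8 each step)
print(abs(run(1.1)))   # diverges (w is multiplied by -1.2 each step)
```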
Can gradient descent get stuck?
Yes. It can get trapped in local minima or stall at saddle points, where the gradient is near zero but the error is not minimal. Modern optimizers like Adam and techniques like learning rate scheduling help escape these traps.
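One common form of learning rate scheduling is step decay. This is a minimal sketch, assuming a policy of halving the rate every fixed number of steps; the function name and the decay interval are illustrative choices, not a standard API:

```python
# Step-decay schedule sketch: halve the learning rate every `decay_every` steps.
# Large early steps explore quickly; smaller later steps settle into a minimum.

def step_decay(initial_lr, step, decay_every=10, factor=0.5):
    return initial_lr * factor ** (step // decay_every)

print(step_decay(0.1, 0))    # 0.1
print(step_decay(0.1, 10))   # 0.05
print(step_decay(0.1, 25))   # 0.025
```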