What is Weight Decay?

Weight Decay — A regularization technique used during training to prevent overfitting by penalizing large weights.

Weight decay adds a penalty proportional to the size of the weights (typically their squared L2 norm) to the training objective, encouraging the model to keep weights small. This acts as a regularizer: it discourages the model from leaning too heavily on any single feature and thereby helps prevent overfitting.
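As a minimal sketch of the idea, the penalized objective is the task loss plus a coefficient times the sum of squared weights. The function name `penalized_loss`, the coefficient `lam`, and the toy weight values below are illustrative, not from any particular library:

```python
import numpy as np

def penalized_loss(task_loss, weights, lam=0.01):
    # L2-style weight decay: penalty grows with the squared magnitude
    # of the weights, nudging training toward smaller values.
    l2_penalty = lam * sum(float(np.sum(w ** 2)) for w in weights)
    return task_loss + l2_penalty

weights = [np.array([0.5, -1.2]), np.array([2.0])]
print(penalized_loss(1.0, weights, lam=0.01))  # → ≈1.0569
```

Minimizing this combined objective trades off task accuracy against weight magnitude, which is exactly the tension the decay coefficient controls.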

Frequently Asked Questions

How does weight decay prevent overfitting?

By penalizing large weights, it forces the model to find simpler solutions that generalize better. Complex, overfitted models tend to have extreme weight values.

What weight decay value should I use?

Common values range from 0.01 to 0.1. Start with 0.01 for fine-tuning LLMs. Higher values provide stronger regularization but can underfit, preventing the model from learning complex patterns.
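To see why higher values pull weights harder toward zero, here is a toy gradient-descent sketch. The setup is an assumption for illustration: a quadratic task loss whose optimum is at w = 2.0, with `train_weight` as a hypothetical helper name:

```python
def train_weight(steps, lr, weight_decay):
    # Toy setup: the task gradient pulls w toward 2.0, while weight
    # decay pulls it toward 0. Training settles where they balance.
    w = 0.0
    for _ in range(steps):
        grad = w - 2.0              # gradient of 0.5 * (w - 2)^2
        w -= lr * (grad + weight_decay * w)
    return w

# Stronger decay settles further below the task optimum (w = 2.0):
print(train_weight(10000, 0.1, 0.01))  # → ≈1.98
print(train_weight(10000, 0.1, 0.1))   # → ≈1.82
```

In this toy problem the converged weight is 2 / (1 + weight_decay), so a ten-times-larger coefficient drags the solution noticeably further from the unregularized optimum.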

Is weight decay the same as L2 regularization?

They are closely related but not identical. In standard SGD they are equivalent. With adaptive optimizers like Adam, decoupled weight decay (AdamW) produces different and generally better results.
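The SGD equivalence can be checked directly: adding the L2 penalty's gradient to the loss gradient gives the same update as shrinking the weight outside the gradient. The function names below are illustrative; with Adam the picture changes because the L2 gradient would get divided by the adaptive denominator, while decoupled (AdamW-style) decay is applied to the weight directly:

```python
def sgd_l2(w, grad, lr, lam):
    # L2 regularization: the penalty gradient lam * w is folded
    # into the loss gradient before the step.
    return w - lr * (grad + lam * w)

def sgd_decoupled(w, grad, lr, lam):
    # Decoupled weight decay: the weight is shrunk directly,
    # outside the gradient computation.
    return w - lr * grad - lr * lam * w

# With plain SGD the two updates are algebraically identical:
a = sgd_l2(1.0, 0.5, 0.1, 0.01)
b = sgd_decoupled(1.0, 0.5, 0.1, 0.01)
print(abs(a - b) < 1e-12)  # → True
```

Under an adaptive optimizer the first form scales the decay by the per-parameter learning rate, so weights with large gradient history are decayed less; decoupling removes that interaction, which is the core change AdamW makes.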
