The vanishing gradient problem occurs in deep neural networks when gradients become too small during backpropagation, causing slow convergence or stagnation in training, particularly in deeper layers. NVIDIA’s documentation on deep learning fundamentals, such as in CUDA and cuDNN guides, explains that this issue is common in architectures like RNNs or deep feedforward networks with certain activation functions (e.g., sigmoid). Techniques like ReLU activation, batch normalization, or residual connections (used in transformers) mitigate this problem. Option A (overfitting) is unrelated to gradients. Option B describes the exploding gradient problem, not vanishing gradients. Option C (underfitting) is a performance issue, not a gradient-related problem.
[References:, NVIDIA CUDA Documentation:https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html, Goodfellow, I., et al. (2016). "Deep Learning." MIT Press., , ]
Submit