Update count: the index of the current update step.
Weight decay augments the objective function with a regularization term that penalizes large weights. The penalty scales with the square of the magnitude of each weight.
The exponential decay rate for the first-moment estimates.
The exponential decay rate for the second-moment estimates.
Whether to apply bias correction to the moment estimates.
Clip the gradient to the range [-clip_gradient, clip_gradient]: grad = max(min(grad, clip_gradient), -clip_gradient). If clip_gradient <= 0, gradient clipping is turned off.
A small constant for numerical stability.
Rescale the gradient: grad = rescale_grad * grad.
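
The parameters described above can be combined into a single scalar update, sketched below under several assumptions. The descriptions do not name the optimizer, so this assumes an Adam-style rule. The function names `clip` and `adam_style_step`, the parameter names `beta1`, `beta2`, `epsilon`, `wd`, `t`, and `lr`, the default values, the order of operations (rescale, then clip), and the way weight decay is added to the update are all illustrative choices, not confirmed by the descriptions.

```python
import math


def clip(grad, clip_gradient):
    """Clamp grad to [-clip_gradient, clip_gradient]; a value <= 0 disables clipping."""
    if clip_gradient <= 0:
        return grad
    return max(min(grad, clip_gradient), -clip_gradient)


def adam_style_step(weight, grad, mean, var, t,
                    lr=0.01, beta1=0.9, beta2=0.999, epsilon=1e-6,
                    wd=0.0, rescale_grad=1.0, clip_gradient=-1.0,
                    bias_correction=True):
    """One Adam-style update for a single scalar weight (illustrative sketch).

    t is the update count and is assumed to start at 1, because the bias
    correction divides by (1 - beta ** t).
    """
    # Assumed order: rescale first, then clip.
    g = clip(rescale_grad * grad, clip_gradient)

    # Exponential moving averages of the gradient and the squared gradient.
    mean = beta1 * mean + (1.0 - beta1) * g
    var = beta2 * var + (1.0 - beta2) * g * g

    # Optional bias correction of the moment estimates.
    if bias_correction:
        m_hat = mean / (1.0 - beta1 ** t)
        v_hat = var / (1.0 - beta2 ** t)
    else:
        m_hat, v_hat = mean, var

    # epsilon keeps the denominator away from zero. wd * weight is the
    # gradient of the squared penalty (wd / 2) * weight ** 2; adding it
    # here rather than to g is an assumption.
    update = m_hat / (math.sqrt(v_hat) + epsilon) + wd * weight
    return weight - lr * update, mean, var


new_weight, new_mean, new_var = adam_style_step(
    weight=1.0, grad=0.5, mean=0.0, var=0.0, t=1)
```

With bias correction enabled, the first step moves the weight by roughly `lr` regardless of the gradient's scale, because the corrected first moment and the square root of the corrected second moment have nearly equal magnitude.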