mx.opt.adagrad
Description
Create an AdaGrad optimizer with the given parameters. AdaGrad is the adaptive-gradient optimizer of Duchi et al., 2011; this code follows the version described in Eq. (5) of http://arxiv.org/pdf/1212.5701v1.pdf (Matthew D. Zeiler, 2012). AdaGrad can help the network converge faster in some cases.
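The gist of the method: AdaGrad keeps a running sum of squared gradients per weight and scales each step by the inverse square root of that sum, so frequently updated coordinates take smaller steps. The snippet below is a minimal illustrative sketch of one such step in plain R, not the package's implementation; the names `adagrad_step`, `weight`, `grad`, and `history` are placeholders, and the exact order in which `wd`, `rescale.grad`, and `clip_gradient` are applied follows the package source.

```r
# Illustrative sketch of a single AdaGrad update (not the mxnet implementation).
adagrad_step <- function(weight, grad, history,
                         learning.rate = 0.05, epsilon = 1e-8,
                         wd = 0, rescale.grad = 1, clip_gradient = -1) {
  grad <- grad * rescale.grad
  if (clip_gradient >= 0) {
    grad <- pmax(pmin(grad, clip_gradient), -clip_gradient)  # clip to range
  }
  history <- history + grad^2                   # accumulate squared gradients
  weight <- weight - learning.rate *
    (grad / sqrt(history + epsilon) + wd * weight)  # L2-regularized step
  list(weight = weight, history = history)
}
```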
Usage
```r
mx.opt.adagrad(
  learning.rate = 0.05,
  epsilon = 1e-08,
  wd = 0,
  rescale.grad = 1,
  clip_gradient = -1,
  lr_scheduler = NULL
)
```
Arguments
| Argument | Description |
|---|---|
| learning.rate | float, default=0.05. Step size. |
| epsilon | float, default=1e-8. |
| wd | float, default=0.0. L2 regularization coefficient added to all the weights. |
| rescale.grad | float, default=1.0. Rescaling factor of the gradient. |
| clip_gradient | float, default=-1.0 (no clipping if < 0). Clip the gradient to the range [-clip_gradient, clip_gradient]. |
| lr_scheduler | function, optional. The learning rate scheduler. |
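As a usage sketch, the optimizer is typically created once and then handed to a training routine. The call below uses only the signature documented above; passing the result on to a trainer (for example via the optimizer argument of mx.model.FeedForward.create) is an assumption about the surrounding MXNet R workflow, and the chosen values are arbitrary.

```r
library(mxnet)

# Build an AdaGrad optimizer with a smaller step size, weight decay,
# and gradient clipping enabled (illustrative values only).
optimizer <- mx.opt.adagrad(
  learning.rate = 0.01,
  epsilon = 1e-8,
  wd = 1e-4,
  rescale.grad = 1 / 128,   # e.g. 1 / batch size
  clip_gradient = 5
)
```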