mx.opt.rmsprop
Description
Create an RMSProp optimizer with the given parameters. Reference: Tieleman T., Hinton G. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 2012, 4(2). The code follows Eq. (38)-(45) of http://arxiv.org/pdf/1308.0850v5.pdf (Alex Graves, 2013).
Usage
mx.opt.rmsprop(
  learning.rate = 0.002,
  centered = TRUE,
  gamma1 = 0.95,
  gamma2 = 0.9,
  epsilon = 1e-04,
  wd = 0,
  rescale.grad = 1,
  clip_gradient = -1,
  lr_scheduler = NULL
)
Arguments
Argument | Description
---|---
`learning.rate` | float, default=0.002. The initial learning rate.
`centered` | logical, default=TRUE. Use the centered variant of RMSProp, which also tracks a running mean of the gradient (Graves, 2013).
`gamma1` | float, default=0.95. Decay factor of the moving averages of the gradient and the squared gradient.
`gamma2` | float, default=0.9. "Momentum" factor.
`epsilon` | float, default=1e-4. Small constant added for numerical stability.
`wd` | float, default=0.0. L2 regularization coefficient added to all the weights.
`rescale.grad` | float, default=1.0. Rescaling factor of the gradient.
`clip_gradient` | float, optional, default=-1 (no clipping if < 0). Clip the gradient to the range [-clip_gradient, clip_gradient].
`lr_scheduler` | function, optional. The learning rate scheduler.
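To illustrate how these parameters interact, here is a minimal NumPy sketch of a single centered-RMSProp step following Graves (2013), Eq. (38)-(45). It is an illustration of the update rule only, not MXNet's implementation; the function name `rmsprop_step` and the `state` tuple layout are assumptions of this sketch, while the parameter names and defaults mirror the table above (with R's dots replaced by underscores).

```python
import numpy as np

def rmsprop_step(w, grad, state, lr=0.002, gamma1=0.95, gamma2=0.9,
                 epsilon=1e-4, wd=0.0, rescale_grad=1.0, clip_gradient=-1):
    """One centered RMSProp update (sketch of Graves 2013, Eq. 38-45).

    `state` is an assumed (n, g, delta) tuple: running mean of grad^2,
    running mean of grad, and the previous update (momentum buffer).
    """
    n, g, delta = state
    # Rescale the raw gradient and add the L2 (weight decay) term.
    grad = grad * rescale_grad + wd * w
    # Optionally clip to [-clip_gradient, clip_gradient].
    if clip_gradient >= 0:
        grad = np.clip(grad, -clip_gradient, clip_gradient)
    # gamma1 decays both moving averages; the g term is what makes it "centered".
    n = gamma1 * n + (1 - gamma1) * grad ** 2
    g = gamma1 * g + (1 - gamma1) * grad
    # gamma2 is the momentum factor on the previous update delta.
    delta = gamma2 * delta - lr * grad / np.sqrt(n - g ** 2 + epsilon)
    return w + delta, (n, g, delta)
```

With `centered = FALSE` the `g` average would be dropped and the denominator would be `sqrt(n + epsilon)`; the variant above corresponds to the default `centered = TRUE`.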