mx.opt.rmsprop
Description
Create an RMSProp optimizer with the given parameters. Reference: Tieleman T., Hinton G. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 2012, 4(2). The code follows Eq. (38)-(45) of http://arxiv.org/pdf/1308.0850v5.pdf (Alex Graves, 2013).
Usage
mx.opt.rmsprop(
  learning.rate = 0.002,
  centered = TRUE,
  gamma1 = 0.95,
  gamma2 = 0.9,
  epsilon = 1e-04,
  wd = 0,
  rescale.grad = 1,
  clip_gradient = -1,
  lr_scheduler = NULL
)
Arguments
Argument | Description
---|---
`learning.rate` | float, default=0.002. The initial learning rate.
`centered` | logical, default=TRUE. Use the centered variant of RMSProp, which also tracks a running mean of the gradient (Graves, 2013).
`gamma1` | float, default=0.95. Decay factor of the moving averages of the gradient and the squared gradient.
`gamma2` | float, default=0.9. "Momentum" factor.
`epsilon` | float, default=1e-4. Small constant added for numerical stability.
`wd` | float, default=0.0. L2 regularization coefficient added to all the weights.
`rescale.grad` | float, default=1.0. Rescaling factor of the gradient.
`clip_gradient` | float, optional, default=-1 (no clipping if < 0). Clip the gradient to the range [-clip_gradient, clip_gradient].
`lr_scheduler` | function, optional. The learning rate scheduler.
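To illustrate how these parameters interact, here is a minimal NumPy sketch of a single centered-RMSProp step following Graves (2013), Eq. (38)-(45). It is an illustration of the update rule only, not MXNet's implementation; the function name `rmsprop_step` and the `state` tuple layout are assumptions of this sketch, while the parameter names and defaults mirror the table above (with R's dots replaced by underscores).

```python
import numpy as np

def rmsprop_step(w, grad, state, lr=0.002, gamma1=0.95, gamma2=0.9,
                 epsilon=1e-4, wd=0.0, rescale_grad=1.0, clip_gradient=-1):
    """One centered RMSProp update (sketch of Graves 2013, Eq. 38-45).

    `state` is an assumed (n, g, delta) tuple: running mean of grad^2,
    running mean of grad, and the previous update (momentum buffer).
    """
    n, g, delta = state
    # Rescale the raw gradient and add the L2 (weight decay) term.
    grad = grad * rescale_grad + wd * w
    # Optionally clip to [-clip_gradient, clip_gradient].
    if clip_gradient >= 0:
        grad = np.clip(grad, -clip_gradient, clip_gradient)
    # gamma1 decays both moving averages; the g term is what makes it "centered".
    n = gamma1 * n + (1 - gamma1) * grad ** 2
    g = gamma1 * g + (1 - gamma1) * grad
    # gamma2 is the momentum factor on the previous update delta.
    delta = gamma2 * delta - lr * grad / np.sqrt(n - g ** 2 + epsilon)
    return w + delta, (n, g, delta)
```

With `centered = FALSE` the `g` average would be dropped and the denominator would be `sqrt(n + epsilon)`; the variant above corresponds to the default `centered = TRUE`.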