mx.opt.adadelta

Description

Create an AdaDelta optimizer with the given parameters.

The AdaDelta method is described in Zeiler, M. D. (2012). ADADELTA: An adaptive learning rate method. http://arxiv.org/abs/1212.5701
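
To make the update rule concrete, here is a minimal base-R sketch of a single AdaDelta step as given in Algorithm 1 of the Zeiler paper. It does not use mxnet and is not the package's implementation; `adadelta_step` and the `state` list are hypothetical names introduced only for illustration.

```r
# One AdaDelta step following Algorithm 1 in Zeiler (2012).
# `adadelta_step` and `state` are hypothetical names, not part of mxnet.
adadelta_step <- function(x, grad, state, rho = 0.9, epsilon = 1e-5) {
  # Running average of squared gradients.
  acc_grad <- rho * state$acc_grad + (1 - rho) * grad^2
  # Step scaled by RMS(previous deltas) / RMS(gradients).
  delta <- -sqrt(state$acc_delta + epsilon) / sqrt(acc_grad + epsilon) * grad
  # Running average of squared deltas.
  acc_delta <- rho * state$acc_delta + (1 - rho) * delta^2
  list(x = x + delta,
       state = list(acc_grad = acc_grad, acc_delta = acc_delta))
}

# Toy problem (an assumption for this sketch): minimise f(x) = x^2,
# whose gradient is 2 * x, starting from x = 1.
x <- 1
state <- list(acc_grad = 0, acc_delta = 0)
for (i in seq_len(500)) {
  out <- adadelta_step(x, 2 * x, state)
  x <- out$x
  state <- out$state
}
```

Note that no learning rate appears: the step size comes from the ratio of the two running averages, which is why `mx.opt.adadelta` exposes only `rho` and `epsilon` for the core update.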

Usage

mx.opt.adadelta(
  rho = 0.9,
  epsilon = 1e-05,
  wd = 0,
  rescale.grad = 1,
  clip_gradient = -1
)

Arguments

Argument

Description

rho

float, default=0.90.

Decay rate for the running averages of both the squared gradients and the squared parameter updates (delta x).

epsilon

float, default=1e-5.

Small constant added inside the square roots for numerical stability, as described in the paper.

wd

float, default=0.0.

L2 regularization coefficient applied to all the weights.

rescale.grad

float, default=1.

Rescaling factor applied to the gradient.

clip_gradient

float, default=-1 (no clipping if < 0).

Clip the gradient to the range [-clip_gradient, clip_gradient].
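
The effect of `wd`, `rescale.grad`, and `clip_gradient` can be sketched in base R as follows. The helper `preprocess_grad` is hypothetical, and the order of operations (rescale, then clip, then add the L2 term `wd * weight`) is an assumption made for illustration; the package may combine these steps differently.

```r
# Hypothetical helper illustrating the documented arguments.
# The order of operations is an assumption, not taken from the mxnet source.
preprocess_grad <- function(grad, weight, wd = 0,
                            rescale.grad = 1, clip_gradient = -1) {
  # Rescale the raw gradient.
  grad <- grad * rescale.grad
  # Clip element-wise only when clip_gradient is non-negative.
  if (clip_gradient >= 0) {
    grad <- pmin(pmax(grad, -clip_gradient), clip_gradient)
  }
  # The gradient of the L2 penalty (wd / 2) * sum(weight^2) is wd * weight.
  grad + wd * weight
}

g <- c(-8, 0.5, 8)
w <- c(1, 1, 1)
clipped   <- preprocess_grad(g, w, rescale.grad = 0.5, clip_gradient = 2)
unclipped <- preprocess_grad(g, w, rescale.grad = 0.5)
decayed   <- preprocess_grad(g, w, wd = 0.1, rescale.grad = 0.5, clip_gradient = 2)
```

With the default `clip_gradient = -1` the clipping branch is skipped entirely, which matches the "no clipping if < 0" behaviour described above.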