# mx.nd.rmspropalex.update

## Description

Update function for the RMSPropAlex optimizer.

RMSPropAlex is the centered version of RMSProp: it subtracts the squared mean gradient from the mean squared gradient, as the equations below show.

Let $E[g^2]_t$ be the decaying average of past squared gradients and $E[g]_t$ the decaying average of past gradients.

$\begin{split}E[g^2]_t &= \gamma_1 E[g^2]_{t-1} + (1 - \gamma_1) g_t^2 \\ E[g]_t &= \gamma_1 E[g]_{t-1} + (1 - \gamma_1) g_t \\ \Delta_t &= \gamma_2 \Delta_{t-1} - \frac{\eta}{\sqrt{E[g^2]_t - E[g]_t^2 + \epsilon}}\, g_t\end{split}$

The update step is

$\theta_{t+1} = \theta_t + \Delta_t$

The RMSPropAlex code follows Eq. (38)–(45) of Alex Graves, *Generating Sequences with Recurrent Neural Networks* (2013), http://arxiv.org/pdf/1308.0850v5.pdf.

Graves suggests setting the decay term $\gamma_1 = 0.95$, the momentum term $\gamma_2 = 0.9$, and the learning rate $\eta = 0.0001$.
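As a minimal illustration of the equations above, one RMSPropAlex step for a scalar parameter can be sketched in plain Python. This is a sketch of the math only, not the library kernel; the names `n`, `g`, and `delta` mirror the state arguments documented below:

```python
import math

def rmspropalex_step(weight, grad, n, g, delta,
                     lr=0.0001, gamma1=0.95, gamma2=0.9, eps=1e-8):
    """One RMSPropAlex update for a scalar parameter (illustration only).

    n     -- E[g^2], decaying average of squared gradients
    g     -- E[g],   decaying average of gradients
    delta -- momentum term Delta
    """
    n = gamma1 * n + (1 - gamma1) * grad * grad          # E[g^2]_t
    g = gamma1 * g + (1 - gamma1) * grad                 # E[g]_t
    # centered second moment: E[g^2]_t - E[g]_t^2
    delta = gamma2 * delta - lr * grad / math.sqrt(n - g * g + eps)
    weight = weight + delta                              # theta_{t+1} = theta_t + Delta_t
    return weight, n, g, delta

# one step from zero state with Graves' suggested hyperparameters
w, n, g, d = rmspropalex_step(weight=1.0, grad=0.5, n=0.0, g=0.0, delta=0.0)
```

Starting from zero state, the first step gives $E[g^2]_1 = (1-\gamma_1)\,g_1^2$ and $E[g]_1 = (1-\gamma_1)\,g_1$, so the weight moves a small step against the gradient direction.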

## Arguments

| Argument | Description |
|----------|-------------|
| `weight` | NDArray-or-Symbol. Weight to be updated. |
| `grad` | NDArray-or-Symbol. Gradient. |
| `n` | NDArray-or-Symbol. State tracking $E[g^2]$. |
| `g` | NDArray-or-Symbol. State tracking $E[g]$. |
| `delta` | NDArray-or-Symbol. State tracking $\Delta$. |
| `lr` | float, required. Learning rate. |
| `gamma1` | float, optional, default=0.95. Decay rate. |
| `gamma2` | float, optional, default=0.9. Decay rate. |
| `epsilon` | float, optional, default=1e-08. A small constant for numerical stability. |
| `wd` | float, optional, default=0. Weight decay augments the objective function with a regularization term that penalizes large weights. The penalty scales with the square of the magnitude of each weight. |
| `rescale.grad` | float, optional, default=1. Rescale gradient to `grad = rescale_grad * grad`. |
| `clip.gradient` | float, optional, default=-1. Clip gradient to the range of [-clip_gradient, clip_gradient]. If clip_gradient <= 0, gradient clipping is turned off: `grad = max(min(grad, clip_gradient), -clip_gradient)`. |
| `clip.weights` | float, optional, default=-1. Clip weights to the range of [-clip_weights, clip_weights]. If clip_weights <= 0, weight clipping is turned off: `weights = max(min(weights, clip_weights), -clip_weights)`. |
| `out` | The result mx.ndarray. |
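The `rescale.grad`, `wd`, and `clip.gradient` arguments preprocess the gradient before the update equations are applied. A hedged sketch of that preprocessing in plain Python, assuming the common MXNet convention of rescaling, adding the weight-decay term, then clipping (the exact ordering inside the library kernel may differ):

```python
def preprocess_grad(grad, weight, rescale_grad=1.0, wd=0.0, clip_gradient=-1.0):
    """Sketch of MXNet-style gradient preprocessing for a scalar value.

    Assumed convention: grad = rescale_grad * grad + wd * weight,
    then clip to [-clip_gradient, clip_gradient] when clip_gradient > 0.
    """
    # weight decay enters as an extra gradient term (L2 regularization)
    grad = rescale_grad * grad + wd * weight
    if clip_gradient > 0:  # clipping is off for the default of -1
        grad = max(min(grad, clip_gradient), -clip_gradient)
    return grad
```

With the defaults (`rescale_grad=1`, `wd=0`, `clip_gradient=-1`) the gradient passes through unchanged; `clip.weights` is applied analogously to the updated weights after the step.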