Contrib Optimization API

Overview

This document summarizes the contrib APIs used to initialize and update model weights during training.

mxnet.optimizer.contrib – Contrib optimizers.

The Contrib Optimization API, defined in the optimizer.contrib package, provides experimental APIs for new features. It is a place for the community to try out these features so that their contributors can receive feedback.

Warning

This package contains experimental APIs and may change in the near future.

In the rest of this document, we list routines provided by the optimizer.contrib package.

Contrib

GroupAdaGrad – Adagrad optimizer with row-wise learning rates.

API Reference

Contrib optimizers.

class mxnet.optimizer.contrib.GroupAdaGrad(eps=1e-05, **kwargs)

Adagrad optimizer with row-wise learning rates.

This class implements the AdaGrad optimizer described in Adaptive Subgradient Methods for Online Learning and Stochastic Optimization (http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf), but uses only a single learning rate for every row of the parameter array.

This optimizer updates each weight by:

grad = clip(grad * rescale_grad, clip_gradient)
history += mean(square(grad), axis=1, keepdims=True)
div = grad / sqrt(history + float_stable_eps)
weight -= div * lr
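
As an illustration only, the following NumPy sketch mirrors the dense update rule above. The helper group_adagrad_step is hypothetical and not part of MXNet; rescale_grad and clip_gradient stand in for the corresponding Optimizer options.

import numpy as np

def group_adagrad_step(weight, grad, history, lr, eps=1e-5,
                       rescale_grad=1.0, clip_gradient=None):
    # Rescale and optionally clip the incoming gradient.
    grad = grad * rescale_grad
    if clip_gradient is not None:
        grad = np.clip(grad, -clip_gradient, clip_gradient)
    # Accumulate one squared-gradient statistic per row, so every row of
    # the parameter array gets its own effective learning rate.
    history += np.mean(np.square(grad), axis=1, keepdims=True)
    # Scale the gradient by the per-row accumulator and take the step.
    weight -= lr * grad / np.sqrt(history + eps)
    return weight, history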

Weights are updated lazily if the gradient is sparse.

For details of the update algorithm see group_adagrad_update.

This optimizer accepts the following parameters in addition to those accepted by Optimizer. Weight decay is not supported.

Parameters: eps (float, optional) – Initial value of the history accumulator. Avoids division by 0.
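
A minimal usage sketch, assuming a Gluon training loop: the toy network, random data, and hyperparameter values below are illustrative only; the GroupAdaGrad instance is simply passed to gluon.Trainer.

import mxnet as mx
from mxnet import autograd, gluon
from mxnet.optimizer.contrib import GroupAdaGrad

# Toy model; any Gluon block with parameters works the same way.
net = gluon.nn.Dense(2)
net.initialize()

# Hand a GroupAdaGrad instance to the Trainer (weight decay is not supported).
trainer = gluon.Trainer(net.collect_params(),
                        GroupAdaGrad(learning_rate=0.1, eps=1e-5))

# One training step on random dummy data.
data = mx.nd.random.uniform(shape=(4, 8))
label = mx.nd.random.uniform(shape=(4, 2))
loss_fn = gluon.loss.L2Loss()
with autograd.record():
    loss = loss_fn(net(data), label)
loss.backward()
trainer.step(batch_size=4)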