# Model API

The model API provides a simplified way to train neural networks using common best practices. It's a thin wrapper built on top of the ndarray and symbolic modules that makes neural network training easy.


## Train the Model

To train a model, perform two steps: configure the model using a symbol expression, then call model.FeedForward.create to create the model. The following example creates a two-layer neural network.

    # configure a two-layer neural network
    data = mx.symbol.Variable('data')
    fc1 = mx.symbol.FullyConnected(data, name='fc1', num_hidden=128)
    act1 = mx.symbol.Activation(fc1, name='relu1', act_type='relu')
    fc2 = mx.symbol.FullyConnected(act1, name='fc2', num_hidden=64)
    softmax = mx.symbol.SoftmaxOutput(fc2, name='sm')

    # create a model
    model = mx.model.FeedForward.create(
        softmax,
        X=data_set,
        num_epoch=num_epoch,
        learning_rate=0.01)


You can also use the scikit-learn-style construct-and-fit pattern to create a model.

    # create a model using the sklearn-style two-step way
    model = mx.model.FeedForward(
        softmax,
        num_epoch=num_epoch,
        learning_rate=0.01)

    model.fit(X=data_set)


## Save the Model

After the job is done, save your work. To save the model, you can pickle it directly with Python, or use the provided save and load functions.

    # save a model to mymodel-symbol.json and mymodel-0100.params
    prefix = 'mymodel'
    iteration = 100
    model.save(prefix, iteration)



The advantage of these save and load functions is that they are language agnostic. You should be able to save to and load from cloud storage directly, such as Amazon S3 and HDFS.

## Periodic Checkpointing

We recommend checkpointing your model after each iteration. To do this, pass the checkpoint callback do_checkpoint(path) to the create function. The training process automatically checkpoints to the specified location after each iteration.

    prefix = 'models/chkpt'
    model = mx.model.FeedForward.create(
        softmax,
        X=data_set,
        iter_end_callback=mx.callback.do_checkpoint(prefix),
        ...)


You can load the model checkpoint later using FeedForward.load.

## Use Multiple Devices

Set ctx to the list of devices that you want to train on.

    devices = [mx.gpu(i) for i in range(num_device)]
    model = mx.model.FeedForward.create(
        softmax,
        X=dataset,
        ctx=devices,
        ...)


Training occurs in parallel on the GPUs that you specify.

## Initializer API Reference

Weight initializer.

class mxnet.initializer.InitDesc[source]

Descriptor for the initialization pattern.

name : str
Name of variable.
attrs : dict of str to str
Attributes of this variable taken from Symbol.attr_dict.
global_init : Initializer
Global initializer to fallback to.
class mxnet.initializer.Initializer(**kwargs)[source]

The base class of an initializer.

set_verbosity(verbose=False, print_func=None)[source]

Switch on/off verbose mode

Parameters: verbose (bool) – Switch on/off verbose mode. print_func (function) – A function that computes statistics of initialized arrays. Takes an NDArray and returns a str. Defaults to mean absolute value str((|x|/size(x)).asscalar()).
dumps()[source]

Saves the initializer to a string.

Returns: str – A JSON-formatted string that describes the initializer.

Examples

>>> # Create initializer and retrieve its parameters
...
>>> init = mx.init.Normal(0.5)
>>> init.dumps()
'["normal", {"sigma": 0.5}]'
>>> init = mx.init.Xavier(factor_type="in", magnitude=2.34)
>>> init.dumps()
'["xavier", {"rnd_type": "uniform", "magnitude": 2.34, "factor_type": "in"}]'

mxnet.initializer.register(klass)[source]

Registers a custom initializer.

Custom initializers can be created by extending mx.init.Initializer and implementing the required functions like _init_weight and _init_bias. The created initializer must be registered using mx.init.register before it can be called by name.

Parameters: klass (class) – A subclass of mx.init.Initializer that needs to be registered as a custom initializer.

Example

>>> # Create and register a custom initializer that
... # initializes weights to 0.1 and biases to 1.
...
>>> @mx.init.register
... @alias('myinit')
... class CustomInit(mx.init.Initializer):
...   def __init__(self):
...     super(CustomInit, self).__init__()
...   def _init_weight(self, _, arr):
...     arr[:] = 0.1
...   def _init_bias(self, _, arr):
...     arr[:] = 1
...
>>> # Module is an instance of 'mxnet.module.Module'
...
>>> module.init_params("custominit")
>>> # module.init_params("myinit")
>>> # module.init_params(CustomInit())

class mxnet.initializer.Load(param, default_init=None, verbose=False)[source]

Note: Load drops the arg: or aux: prefix from names and initializes any variable whose name matches after the prefix is dropped.

Parameters: param (str or dict of str->NDArray) – Parameter file or dict mapping name to NDArray. default_init (Initializer) – Default initializer when name is not found in param. verbose (bool) – Flag for enabling logging of source when initializing.
class mxnet.initializer.Mixed(patterns, initializers)[source]

Initialize parameters using multiple initializers.

Parameters: patterns (list of str) – List of regular expressions matching parameter names. initializers (list of Initializer) – List of initializers corresponding to patterns.

Example

>>> # Given 'module', an instance of 'mxnet.module.Module', initialize biases to zero
... # and every other parameter to random values with uniform distribution.
...
>>> init = mx.initializer.Mixed(['bias', '.*'], [mx.init.Zero(), mx.init.Uniform(0.1)])
>>> module.init_params(init)
>>>
>>> for dictionary in module.get_params():
...     for key in dictionary:
...         print(key)
...         print(dictionary[key].asnumpy())
...
fullyconnected1_weight
[[ 0.0097627   0.01856892  0.04303787]]
fullyconnected1_bias
[ 0.]

class mxnet.initializer.Zero[source]

Initializes weights to zero.

Example

>>> # Given 'module', an instance of 'mxnet.module.Module', initialize weights to zero.
...
>>> init = mx.initializer.Zero()
>>> module.init_params(init)
>>> for dictionary in module.get_params():
...     for key in dictionary:
...         print(key)
...         print(dictionary[key].asnumpy())
...
fullyconnected0_weight
[[ 0.  0.  0.]]

class mxnet.initializer.One[source]

Initializes weights to one.

Example

>>> # Given 'module', an instance of 'mxnet.module.Module', initialize weights to one.
...
>>> init = mx.initializer.One()
>>> module.init_params(init)
>>> for dictionary in module.get_params():
...     for key in dictionary:
...         print(key)
...         print(dictionary[key].asnumpy())
...
fullyconnected0_weight
[[ 1.  1.  1.]]

class mxnet.initializer.Constant(value)[source]

Initializes the weights to a scalar value.

Parameters: value (float) – Fill value.
class mxnet.initializer.Uniform(scale=0.07)[source]

Initializes weights with random values uniformly sampled from a given range.

Parameters: scale (float, optional) – The bound on the range of the generated random values. Values are generated from the range [-scale, scale]. Default scale is 0.07.

Example

>>> # Given 'module', an instance of 'mxnet.module.Module', initialize weights
>>> # to random values uniformly sampled between -0.1 and 0.1.
...
>>> init = mx.init.Uniform(0.1)
>>> module.init_params(init)
>>> for dictionary in module.get_params():
...     for key in dictionary:
...         print(key)
...         print(dictionary[key].asnumpy())
...
fullyconnected0_weight
[[ 0.01360891 -0.02144304  0.08511933]]

class mxnet.initializer.Normal(sigma=0.01)[source]

Initializes weights with random values sampled from a normal distribution with a mean of zero and standard deviation of sigma.

Parameters: sigma (float, optional) – Standard deviation of the normal distribution. Default standard deviation is 0.01.

Example

>>> # Given 'module', an instance of 'mxnet.module.Module', initialize weights
>>> # to random values sampled from a normal distribution.
...
>>> init = mx.init.Normal(0.5)
>>> module.init_params(init)
>>> for dictionary in module.get_params():
...     for key in dictionary:
...         print(key)
...         print(dictionary[key].asnumpy())
...
fullyconnected0_weight
[[-0.3214761  -0.12660924  0.53789419]]

class mxnet.initializer.Orthogonal(scale=1.414, rand_type='uniform')[source]

Initializes weights as an orthogonal matrix.

This initializer implements Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, available at https://arxiv.org/abs/1312.6120.

Parameters: scale (float, optional) – Scaling factor of weight. rand_type (str, optional) – Use "uniform" or "normal" random numbers to initialize weights.
class mxnet.initializer.Xavier(rnd_type='uniform', factor_type='avg', magnitude=3)[source]

Returns an initializer performing “Xavier” initialization for weights.

This initializer is designed to keep the scale of gradients roughly the same in all layers.

By default, rnd_type is 'uniform' and factor_type is 'avg'; the initializer fills the weights with random numbers in the range $$[-c, c]$$, where $$c = \sqrt{\frac{3.}{0.5 * (n_{in} + n_{out})}}$$. $$n_{in}$$ is the number of neurons feeding into the weights, and $$n_{out}$$ is the number of neurons the result is fed to.

If rnd_type is 'uniform' and factor_type is 'in', then $$c = \sqrt{\frac{3.}{n_{in}}}$$. Similarly, when factor_type is 'out', $$c = \sqrt{\frac{3.}{n_{out}}}$$.

If rnd_type is 'gaussian' and factor_type is 'avg', the initializer fills the weights with numbers drawn from a normal distribution with a standard deviation of $$\sqrt{\frac{3.}{0.5 * (n_{in} + n_{out})}}$$.

Parameters: rnd_type (str, optional) – Random generator type, can be 'gaussian' or 'uniform'. factor_type (str, optional) – Can be 'avg', 'in', or 'out'. magnitude (float, optional) – Scale of random number.
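The bound $$c$$ described above is easy to compute directly. The sketch below is a plain-Python illustration, not MXNet's internal implementation; the helper name xavier_uniform is hypothetical.

```python
import math
import random

def xavier_uniform(n_in, n_out, magnitude=3.0, factor_type="avg", seed=0):
    # Pick the scaling factor based on factor_type, as described above.
    factor = {"avg": 0.5 * (n_in + n_out), "in": n_in, "out": n_out}[factor_type]
    c = math.sqrt(magnitude / factor)  # bound of the uniform range [-c, c]
    rng = random.Random(seed)
    weights = [[rng.uniform(-c, c) for _ in range(n_in)] for _ in range(n_out)]
    return weights, c
```

For a 128-in, 64-out layer with the default settings, the factor is 0.5 * (128 + 64) = 96, so every sampled weight lies within ±sqrt(3/96).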
class mxnet.initializer.MSRAPrelu(factor_type='avg', slope=0.25)[source]

Initializes weights according to an MSRA paper.

This initializer implements Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, available at https://arxiv.org/abs/1502.01852.

This initializer is proposed for initialization related to the ReLU activation; it makes some changes on top of the Xavier method.

Parameters: factor_type (str, optional) – Can be 'avg', 'in', or 'out'. slope (float, optional) – initial slope of any PReLU (or similar) nonlinearities.
class mxnet.initializer.Bilinear[source]

Initializes weights for upsampling layers.

class mxnet.initializer.LSTMBias(forget_bias=1.0)[source]

Initializes all biases of an LSTMCell to 0.0 except for the forget gate, whose bias is set to a custom value.

Parameters: forget_bias (float, default 1.0) – bias for the forget gate. Jozefowicz et al. 2015 recommends setting this to 1.0.
class mxnet.initializer.FusedRNN(init, num_hidden, num_layers, mode, bidirectional=False, forget_bias=1.0)[source]

Initializes parameters for fused RNN layers.

Parameters: init (Initializer) – initializer applied to unpacked weights. Fall back to global initializer if None. num_hidden (int) – should be the same with arguments passed to FusedRNNCell. num_layers (int) – should be the same with arguments passed to FusedRNNCell. mode (str) – should be the same with arguments passed to FusedRNNCell. bidirectional (bool) – should be the same with arguments passed to FusedRNNCell. forget_bias (float) – should be the same with arguments passed to FusedRNNCell.

## Evaluation Metric API Reference

Online evaluation metric module.

class mxnet.metric.EvalMetric(name, output_names=None, label_names=None, **kwargs)[source]

Base class for all evaluation metrics.

Note

This is a base class that provides common metric interfaces. One should not use this class directly, but instead create new metric classes that extend it.

Parameters: name (str) – Name of this metric instance for display. output_names (list of str, or None) – Name of predictions that should be used when updating with update_dict. By default include all predictions. label_names (list of str, or None) – Name of labels that should be used when updating with update_dict. By default include all labels.
get_config()[source]

Saves the configuration of the metric. The metric can be recreated from this config with metric.create(**config).

update_dict(label, pred)[source]

Update the internal evaluation with named label and pred

Parameters: labels (OrderedDict of str -> NDArray) – name to array mapping for labels. preds (OrderedDict of str -> NDArray) – name to array mapping of predicted outputs.
update(labels, preds)[source]

Parameters: labels (list of NDArray) – The labels of the data. preds (list of NDArray) – Predicted values.
reset()[source]

Resets the internal evaluation result to initial state.

get()[source]

Gets the current evaluation result.

Returns: names (list of str) – Name of the metrics. values (list of float) – Value of the evaluations.
get_name_value()[source]

Returns zipped name and value pairs.

Returns: list of tuples – A list of (name, value) tuples.
mxnet.metric.create(metric, *args, **kwargs)[source]

Creates evaluation metric from metric names or instances of EvalMetric or a custom metric function.

Parameters: metric (str or callable) – Specifies the metric to create. This argument must be one of the following: the name of a metric; an instance of EvalMetric; a list, each element of which is a metric or a metric name; or an evaluation function that computes a custom metric for a given batch of labels and predictions. *args (list) – Additional arguments to the metric constructor. Only used when metric is str. **kwargs (dict) – Additional arguments to the metric constructor. Only used when metric is str.

Examples

>>> def custom_metric(label, pred):
...     return np.mean(np.abs(label - pred))
...
>>> metric1 = mx.metric.create('acc')
>>> metric2 = mx.metric.create(custom_metric)
>>> metric3 = mx.metric.create([metric1, metric2, 'rmse'])

class mxnet.metric.CompositeEvalMetric(metrics=None, name='composite', output_names=None, label_names=None)[source]

Manages multiple evaluation metrics.

Parameters: metrics (list of EvalMetric) – List of child metrics. name (str) – Name of this metric instance for display. output_names (list of str, or None) – Name of predictions that should be used when updating with update_dict. By default include all predictions. label_names (list of str, or None) – Name of labels that should be used when updating with update_dict. By default include all labels.

Examples

>>> predicts = [mx.nd.array([[0.3, 0.7], [0, 1.], [0.4, 0.6]])]
>>> labels   = [mx.nd.array([0, 1, 1])]
>>> eval_metrics_1 = mx.metric.Accuracy()
>>> eval_metrics_2 = mx.metric.F1()
>>> eval_metrics = mx.metric.CompositeEvalMetric()
>>> for child_metric in [eval_metrics_1, eval_metrics_2]:
>>>     eval_metrics.add(child_metric)
>>> eval_metrics.update(labels = labels, preds = predicts)
>>> print eval_metrics.get()
(['accuracy', 'f1'], [0.6666666666666666, 0.8])

add(metric)[source]

Parameters: metric – A metric instance.
get_metric(index)[source]

Returns a child metric.

Parameters: index (int) – Index of child metric in the list of metrics.
update(labels, preds)[source]

Parameters: labels (list of NDArray) – The labels of the data. preds (list of NDArray) – Predicted values.
reset()[source]

Resets the internal evaluation result to initial state.

get()[source]

Returns the current evaluation result.

Returns: names (list of str) – Name of the metrics. values (list of float) – Value of the evaluations.
class mxnet.metric.Accuracy(axis=1, name='accuracy', output_names=None, label_names=None)[source]

Computes accuracy classification score.

The accuracy score is defined as

$\text{accuracy}(y, \hat{y}) = \frac{1}{n} \sum_{i=0}^{n-1} \mathbf{1}(\hat{y}_i = y_i)$
Parameters: axis (int, default=1) – The axis that represents classes name (str) – Name of this metric instance for display. output_names (list of str, or None) – Name of predictions that should be used when updating with update_dict. By default include all predictions. label_names (list of str, or None) – Name of labels that should be used when updating with update_dict. By default include all labels.

Examples

>>> predicts = [mx.nd.array([[0.3, 0.7], [0, 1.], [0.4, 0.6]])]
>>> labels   = [mx.nd.array([0, 1, 1])]
>>> acc = mx.metric.Accuracy()
>>> acc.update(preds = predicts, labels = labels)
>>> print acc.get()
('accuracy', 0.6666666666666666)
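The 0.666… above follows directly from the formula: two of the three argmax predictions match the labels. A plain-Python check (an illustrative helper, not mxnet code):

```python
def accuracy(labels, preds):
    # Take the argmax over class likelihoods, then count matches.
    hits = 0
    for y, p in zip(labels, preds):
        y_hat = p.index(max(p))  # predicted class index
        hits += int(y_hat == y)
    return hits / len(labels)

accuracy([0, 1, 1], [[0.3, 0.7], [0.0, 1.0], [0.4, 0.6]])  # 2 of 3 correct
```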

update(labels, preds)[source]

Parameters: labels (list of NDArray) – The labels of the data with class indices as values, one per sample. preds (list of NDArray) – Prediction values for samples. Each prediction value can either be the class index, or a vector of likelihoods for all classes.
class mxnet.metric.TopKAccuracy(top_k=1, name='top_k_accuracy', output_names=None, label_names=None)[source]

Computes top k predictions accuracy.

TopKAccuracy differs from Accuracy in that it considers the prediction to be correct as long as the ground truth label is in the top k predicted labels.

If top_k = 1, then TopKAccuracy is identical to Accuracy.

Parameters: top_k (int) – The number of top predictions to consider when checking whether targets match. name (str) – Name of this metric instance for display. output_names (list of str, or None) – Name of predictions that should be used when updating with update_dict. By default include all predictions. label_names (list of str, or None) – Name of labels that should be used when updating with update_dict. By default include all labels.

Examples

>>> np.random.seed(999)
>>> top_k = 3
>>> labels = [mx.nd.array([2, 6, 9, 2, 3, 4, 7, 8, 9, 6])]
>>> predicts = [mx.nd.array(np.random.rand(10, 10))]
>>> acc = mx.metric.TopKAccuracy(top_k=top_k)
>>> acc.update(labels, predicts)
>>> print acc.get()
('top_k_accuracy', 0.3)
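The top-k rule can be sketched in plain Python (an illustrative helper, not mxnet code): a prediction counts as correct when the true label appears among the k highest-scoring classes.

```python
def top_k_accuracy(labels, preds, top_k):
    hits = 0
    for y, scores in zip(labels, preds):
        # Indices of the top_k highest-scoring classes.
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
        hits += int(y in top)
    return hits / len(labels)
```

With top_k=1 this reduces to the plain argmax accuracy above.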

update(labels, preds)[source]

Parameters: labels (list of NDArray) – The labels of the data. preds (list of NDArray) – Predicted values.
class mxnet.metric.F1(name='f1', output_names=None, label_names=None)[source]

Computes the F1 score of a binary classification problem.

The F1 score is equivalent to weighted average of the precision and recall, where the best value is 1.0 and the worst value is 0.0. The formula for F1 score is:

F1 = 2 * (precision * recall) / (precision + recall)


The formula for precision and recall is:

precision = true_positives / (true_positives + false_positives)
recall    = true_positives / (true_positives + false_negatives)


Note

This F1 score only supports binary classification.

Parameters: name (str) – Name of this metric instance for display. output_names (list of str, or None) – Name of predictions that should be used when updating with update_dict. By default include all predictions. label_names (list of str, or None) – Name of labels that should be used when updating with update_dict. By default include all labels.

Examples

>>> predicts = [mx.nd.array([[0.3, 0.7], [0., 1.], [0.4, 0.6]])]
>>> labels   = [mx.nd.array([0., 1., 1.])]
>>> acc = mx.metric.F1()
>>> acc.update(preds = predicts, labels = labels)
>>> print acc.get()
('f1', 0.8)
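The 0.8 above can be reproduced from the precision/recall formulas in plain Python (an illustrative helper, not mxnet code): the argmax predictions are [1, 1, 1], giving 2 true positives, 1 false positive, and 0 false negatives.

```python
def f1_binary(labels, preds):
    # argmax over the two class likelihoods, then F1 for the positive class (1)
    y_hat = [p.index(max(p)) for p in preds]
    tp = sum(1 for y, yh in zip(labels, y_hat) if y == 1 and yh == 1)
    fp = sum(1 for y, yh in zip(labels, y_hat) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(labels, y_hat) if y == 1 and yh == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Here precision = 2/3 and recall = 1, so F1 = 2 * (2/3) / (5/3) = 0.8.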

update(labels, preds)[source]

Parameters: labels (list of NDArray) – The labels of the data. preds (list of NDArray) – Predicted values.
class mxnet.metric.Perplexity(ignore_label, axis=-1, name='perplexity', output_names=None, label_names=None)[source]

Computes perplexity.

Perplexity is a measurement of how well a probability distribution or model predicts a sample. A low perplexity indicates the model is good at predicting the sample.

The perplexity of a model q is defined as

$b^{\big(-\frac{1}{N} \sum_{i=1}^N \log_b q(x_i) \big)} = \exp \big(-\frac{1}{N} \sum_{i=1}^N \log q(x_i)\big)$

where we let b = e.

$$q(x_i)$$ is the predicted probability of the ground-truth label for sample $$x_i$$.

For example, we have three samples $$x_1, x_2, x_3$$ and their labels are $$[0, 1, 1]$$. Suppose our model predicts $$q(x_1) = p(y_1 = 0 | x_1) = 0.3$$ and $$q(x_2) = 1.0$$, $$q(x_3) = 0.6$$. The perplexity of model q is $$exp\big(-(\log 0.3 + \log 1.0 + \log 0.6) / 3\big) = 1.77109762852$$.
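The arithmetic of this example can be verified in a few lines of plain Python (an illustrative helper, not mxnet code):

```python
import math

def perplexity(true_label_probs):
    # exp of the negative mean log-probability assigned to the true labels
    n = len(true_label_probs)
    return math.exp(-sum(math.log(p) for p in true_label_probs) / n)

perplexity([0.3, 1.0, 0.6])  # ~1.7711, as in the example
```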

Parameters: ignore_label (int or None) – Index of invalid label to ignore when counting. By default, sets to -1. If set to None, it will include all entries. axis (int (default -1)) – The axis from prediction that was used to compute softmax. By default use the last axis. name (str) – Name of this metric instance for display. output_names (list of str, or None) – Name of predictions that should be used when updating with update_dict. By default include all predictions. label_names (list of str, or None) – Name of labels that should be used when updating with update_dict. By default include all labels.

Examples

>>> predicts = [mx.nd.array([[0.3, 0.7], [0, 1.], [0.4, 0.6]])]
>>> labels   = [mx.nd.array([0, 1, 1])]
>>> perp = mx.metric.Perplexity(ignore_label=None)
>>> perp.update(labels, predicts)
>>> print perp.get()
('Perplexity', 1.7710976285155853)

update(labels, preds)[source]

Parameters: labels (list of NDArray) – The labels of the data. preds (list of NDArray) – Predicted values.
get()[source]

Returns the current evaluation result.

Returns: tuple of (str, float) – The name of the metric and the evaluation result.
class mxnet.metric.MAE(name='mae', output_names=None, label_names=None)[source]

Computes Mean Absolute Error (MAE) loss.

The mean absolute error is given by

$\frac{\sum_i^n |y_i - \hat{y}_i|}{n}$
Parameters: name (str) – Name of this metric instance for display. output_names (list of str, or None) – Name of predictions that should be used when updating with update_dict. By default include all predictions. label_names (list of str, or None) – Name of labels that should be used when updating with update_dict. By default include all labels.

Examples

>>> predicts = [mx.nd.array(np.array([3, -0.5, 2, 7]).reshape(4,1))]
>>> labels = [mx.nd.array(np.array([2.5, 0.0, 2, 8]).reshape(4,1))]
>>> mean_absolute_error = mx.metric.MAE()
>>> mean_absolute_error.update(labels = labels, preds = predicts)
>>> print mean_absolute_error.get()
('mae', 0.5)

update(labels, preds)[source]

Parameters: labels (list of NDArray) – The labels of the data. preds (list of NDArray) – Predicted values.
class mxnet.metric.MSE(name='mse', output_names=None, label_names=None)[source]

Computes Mean Squared Error (MSE) loss.

The mean squared error is given by

$\frac{\sum_i^n (y_i - \hat{y}_i)^2}{n}$
Parameters: name (str) – Name of this metric instance for display. output_names (list of str, or None) – Name of predictions that should be used when updating with update_dict. By default include all predictions. label_names (list of str, or None) – Name of labels that should be used when updating with update_dict. By default include all labels.

Examples

>>> predicts = [mx.nd.array(np.array([3, -0.5, 2, 7]).reshape(4,1))]
>>> labels = [mx.nd.array(np.array([2.5, 0.0, 2, 8]).reshape(4,1))]
>>> mean_squared_error = mx.metric.MSE()
>>> mean_squared_error.update(labels = labels, preds = predicts)
>>> print mean_squared_error.get()
('mse', 0.375)

update(labels, preds)[source]

Parameters: labels (list of NDArray) – The labels of the data. preds (list of NDArray) – Predicted values.
class mxnet.metric.RMSE(name='rmse', output_names=None, label_names=None)[source]

Computes Root Mean Squared Error (RMSE) loss.

The root mean squared error is given by

$\sqrt{\frac{\sum_i^n (y_i - \hat{y}_i)^2}{n}}$
Parameters: name (str) – Name of this metric instance for display. output_names (list of str, or None) – Name of predictions that should be used when updating with update_dict. By default include all predictions. label_names (list of str, or None) – Name of labels that should be used when updating with update_dict. By default include all labels.

Examples

>>> predicts = [mx.nd.array(np.array([3, -0.5, 2, 7]).reshape(4,1))]
>>> labels = [mx.nd.array(np.array([2.5, 0.0, 2, 8]).reshape(4,1))]
>>> root_mean_squared_error = mx.metric.RMSE()
>>> root_mean_squared_error.update(labels = labels, preds = predicts)
>>> print root_mean_squared_error.get()
('rmse', 0.612372457981)
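The three regression metrics above (MAE, MSE, RMSE) share the same example inputs, so their formulas can be checked together in plain Python (illustrative helpers, not mxnet code):

```python
import math

def mae(labels, preds):
    # mean of absolute errors
    return sum(abs(y - p) for y, p in zip(labels, preds)) / len(labels)

def mse(labels, preds):
    # mean of squared errors
    return sum((y - p) ** 2 for y, p in zip(labels, preds)) / len(labels)

def rmse(labels, preds):
    # square root of the mean squared error
    return math.sqrt(mse(labels, preds))

labels, preds = [2.5, 0.0, 2.0, 8.0], [3.0, -0.5, 2.0, 7.0]
# mae -> 0.5, mse -> 0.375, rmse -> sqrt(0.375) ~ 0.6124
```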

update(labels, preds)[source]

Parameters: labels (list of NDArray) – The labels of the data. preds (list of NDArray) – Predicted values.
class mxnet.metric.CrossEntropy(eps=1e-12, name='cross-entropy', output_names=None, label_names=None)[source]

Computes Cross Entropy loss.

The cross entropy over a batch of sample size $$N$$ is given by

$-\sum_{n=1}^{N}\sum_{k=1}^{K}t_{nk}\log (y_{nk}),$

where $$t_{nk}=1$$ if and only if sample $$n$$ belongs to class $$k$$. $$y_{nk}$$ denotes the probability of sample $$n$$ belonging to class $$k$$.

Parameters: eps (float) – Cross-entropy loss is undefined when a predicted value is 0, so this small constant is added to predicted values. name (str) – Name of this metric instance for display. output_names (list of str, or None) – Name of predictions that should be used when updating with update_dict. By default include all predictions. label_names (list of str, or None) – Name of labels that should be used when updating with update_dict. By default include all labels.

Examples

>>> predicts = [mx.nd.array([[0.3, 0.7], [0, 1.], [0.4, 0.6]])]
>>> labels   = [mx.nd.array([0, 1, 1])]
>>> ce = mx.metric.CrossEntropy()
>>> ce.update(labels, predicts)
>>> print ce.get()
('cross-entropy', 0.57159948348999023)
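Note that the reported value is the mean over the batch, not the raw sum in the formula above. A plain-Python check (an illustrative helper, not mxnet code):

```python
import math

def cross_entropy(labels, preds, eps=1e-12):
    # Mean over samples of -log(predicted probability of the true class).
    total = 0.0
    for y, p in zip(labels, preds):
        total += -math.log(p[y] + eps)
    return total / len(labels)

cross_entropy([0, 1, 1], [[0.3, 0.7], [0.0, 1.0], [0.4, 0.6]])  # ~0.5716
```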

update(labels, preds)[source]

Parameters: labels (list of NDArray) – The labels of the data. preds (list of NDArray) – Predicted values.
class mxnet.metric.NegativeLogLikelihood(eps=1e-12, name='nll-loss', output_names=None, label_names=None)[source]

Computes the negative log-likelihood loss.

The negative log-likelihood loss over a batch of sample size $$N$$ is given by

$-\sum_{n=1}^{N}\sum_{k=1}^{K}t_{nk}\log (y_{nk}),$

where $$K$$ is the number of classes and $$y_{nk}$$ is the predicted probability for the $$k$$-th class of the $$n$$-th sample. $$t_{nk}=1$$ if and only if sample $$n$$ belongs to class $$k$$.

Parameters: eps (float) – Negative log-likelihood loss is undefined when a predicted value is 0, so this small constant is added to predicted values. name (str) – Name of this metric instance for display. output_names (list of str, or None) – Name of predictions that should be used when updating with update_dict. By default include all predictions. label_names (list of str, or None) – Name of labels that should be used when updating with update_dict. By default include all labels.

Examples

>>> predicts = [mx.nd.array([[0.3, 0.7], [0, 1.], [0.4, 0.6]])]
>>> labels   = [mx.nd.array([0, 1, 1])]
>>> nll_loss = mx.metric.NegativeLogLikelihood()
>>> nll_loss.update(labels, predicts)
>>> print nll_loss.get()
('nll-loss', 0.57159948348999023)

update(labels, preds)[source]

Parameters: labels (list of NDArray) – The labels of the data. preds (list of NDArray) – Predicted values.
class mxnet.metric.PearsonCorrelation(name='pearsonr', output_names=None, label_names=None)[source]

Computes Pearson correlation.

The pearson correlation is given by

$\frac{\operatorname{cov}(y, \hat{y})}{\sigma_y \sigma_{\hat{y}}}$
Parameters: name (str) – Name of this metric instance for display. output_names (list of str, or None) – Name of predictions that should be used when updating with update_dict. By default include all predictions. label_names (list of str, or None) – Name of labels that should be used when updating with update_dict. By default include all labels.

Examples

>>> predicts = [mx.nd.array([[0.3, 0.7], [0, 1.], [0.4, 0.6]])]
>>> labels   = [mx.nd.array([[1, 0], [0, 1], [0, 1]])]
>>> pr = mx.metric.PearsonCorrelation()
>>> pr.update(labels, predicts)
>>> print pr.get()
('pearson-correlation', 0.42163704544016178)
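The metric effectively flattens the label and prediction arrays before computing the correlation. The example's value can be reproduced (up to float32 rounding) in plain Python (an illustrative helper, not mxnet code):

```python
import math

def pearson(x, y):
    # Sample Pearson correlation: cov(x, y) / (sigma_x * sigma_y).
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Flattening the example's predictions and one-hot labels:
x = [0.3, 0.7, 0.0, 1.0, 0.4, 0.6]
y = [1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
```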

update(labels, preds)[source]

Parameters: labels (list of NDArray) – The labels of the data. preds (list of NDArray) – Predicted values.
class mxnet.metric.Loss(name='loss', output_names=None, label_names=None)[source]

Dummy metric for directly printing loss.

Parameters: name (str) – Name of this metric instance for display. output_names (list of str, or None) – Name of predictions that should be used when updating with update_dict. By default include all predictions. label_names (list of str, or None) – Name of labels that should be used when updating with update_dict. By default include all labels.
class mxnet.metric.Torch(name='torch', output_names=None, label_names=None)[source]

Dummy metric for torch criterions.

class mxnet.metric.Caffe(name='caffe', output_names=None, label_names=None)[source]

Dummy metric for caffe criterions.

class mxnet.metric.CustomMetric(feval, name=None, allow_extra_outputs=False, output_names=None, label_names=None)[source]

Computes a customized evaluation metric.

The feval function can return a tuple of (sum_metric, num_inst) or return an int sum_metric.

Parameters: feval (callable(label, pred)) – Customized evaluation function. name (str) – The name of the metric. (the default is None). allow_extra_outputs (bool, optional) – If true, the prediction outputs can have extra outputs. This is useful in RNN, where the states are also produced in outputs for forwarding. (the default is False). name – Name of this metric instance for display. output_names (list of str, or None) – Name of predictions that should be used when updating with update_dict. By default include all predictions. label_names (list of str, or None) – Name of labels that should be used when updating with update_dict. By default include all labels.

Examples

>>> predicts = [mx.nd.array(np.array([3, -0.5, 2, 7]).reshape(4,1))]
>>> labels = [mx.nd.array(np.array([2.5, 0.0, 2, 8]).reshape(4,1))]
>>> feval = lambda x, y : (x + y).mean()
>>> eval_metrics = mx.metric.CustomMetric(feval=feval)
>>> eval_metrics.update(labels, predicts)
>>> print eval_metrics.get()
('custom()', 6.0)

update(labels, preds)[source]

Parameters: labels (list of NDArray) – The labels of the data. preds (list of NDArray) – Predicted values.
mxnet.metric.np(numpy_feval, name=None, allow_extra_outputs=False)[source]

Creates a custom evaluation metric that receives its inputs as numpy arrays.

Parameters: numpy_feval (callable(label, pred)) – Custom evaluation function that receives labels and predictions for a minibatch as numpy arrays and returns the corresponding custom metric as a floating point number. name (str, optional) – Name of the custom metric. allow_extra_outputs (bool, optional) – Whether prediction output is allowed to have extra outputs. This is useful in cases like RNN where states are also part of output which can then be fed back to the RNN in the next step. By default, extra outputs are not allowed. Returns: float – Custom metric corresponding to the provided labels and predictions.

Example

>>> def custom_metric(label, pred):
...     return np.mean(np.abs(label-pred))
...
>>> metric = mx.metric.np(custom_metric)


## Optimizer API Reference

Weight updating functions.

class mxnet.optimizer.Optimizer(rescale_grad=1.0, param_idx2name=None, wd=0.0, clip_gradient=None, learning_rate=0.01, lr_scheduler=None, sym=None, begin_num_update=0, multi_precision=False, param_dict=None)[source]

The base class inherited by all optimizers.

Parameters: rescale_grad (float, optional) – Multiply the gradient with rescale_grad before updating. Often choose to be 1.0/batch_size. param_idx2name (dict from int to string, optional) – A dictionary that maps int index to string name. clip_gradient (float, optional) – Clip the gradient by projecting onto the box [-clip_gradient, clip_gradient]. learning_rate (float) – The initial learning rate. lr_scheduler (LRScheduler, optional) – The learning rate scheduler. wd (float, optional) – The weight decay (or L2 regularization) coefficient. Modifies objective by adding a penalty for having large weights. sym (Symbol, optional) – The Symbol this optimizer is applying to. begin_num_update (int, optional) – The initial number of updates. multi_precision (bool, optional) – Flag to control the internal precision of the optimizer. False results in using the same precision as the weights (default), True makes an internal 32-bit copy of the weights and applies gradients in 32-bit precision even if actual weights used in the model have lower precision. Turning this on can improve convergence and accuracy when training with float16.

Properties: learning_rate – The current learning rate of the optimizer. Given an Optimizer object optimizer, its learning rate can be accessed as optimizer.learning_rate.
static register(klass)[source]

Registers a new optimizer.

Once an optimizer is registered, we can create an instance of this optimizer with create_optimizer later.

Examples

>>> @mx.optimizer.Optimizer.register
... class MyOptimizer(mx.optimizer.Optimizer):
...     pass
>>> optim = mx.optimizer.Optimizer.create_optimizer('MyOptimizer')
>>> print(type(optim))


static create_optimizer(name, **kwargs)[source]

Instantiates an optimizer with a given name and kwargs.

Note

We can use the alias create for Optimizer.create_optimizer.

Parameters: name (str) – Name of the optimizer. Should be the name of a subclass of Optimizer. Case insensitive. kwargs (dict) – Parameters for the optimizer. An instantiated optimizer. Optimizer

Examples

>>> sgd = mx.optimizer.Optimizer.create_optimizer('sgd')
>>> type(sgd)


create_state(index, weight)[source]

Creates auxiliary state for a given weight.

Some optimizers require additional state, e.g. momentum, in addition to gradients in order to update weights. This function creates the state for a given weight, which will be used in update. It is called only once for each weight.

Parameters: index (int) – A unique index to identify the weight. weight (NDArray) – The weight. Returns: state (any obj) – The state associated with the weight.
create_state_multi_precision(index, weight)[source]

Creates auxiliary state for a given weight, including FP32 high precision copy if original weight is FP16.

This method is provided to perform automatic mixed precision training for optimizers that do not support it themselves.

Parameters: index (int) – A unique index to identify the weight. weight (NDArray) – The weight. Returns: state (any obj) – The state associated with the weight.
update(index, weight, grad, state)[source]

Updates the given parameter using the corresponding gradient and state.

Parameters: index (int) – The unique index of the parameter into the individual learning rates and weight decays. Learning rates and weight decay may be set via set_lr_mult() and set_wd_mult(), respectively. weight (NDArray) – The parameter to be updated. grad (NDArray) – The gradient of the objective with respect to this parameter. state (any obj) – The state returned by create_state().
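To make the create_state/update contract concrete, here is a minimal NumPy sketch of an optimizer-shaped class (TinySGD and all values are illustrative stand-ins, not MXNet code):

```python
import numpy as np

class TinySGD:
    """Illustrative stand-in following the Optimizer contract above."""

    def __init__(self, learning_rate=0.01, momentum=0.9):
        self.lr = learning_rate
        self.momentum = momentum

    def create_state(self, index, weight):
        # Called once per weight: one momentum buffer, same shape as weight.
        return np.zeros_like(weight)

    def update(self, index, weight, grad, state):
        # state is the buffer returned by create_state for this index.
        state *= self.momentum
        state += grad
        weight -= self.lr * state

opt = TinySGD(learning_rate=0.1, momentum=0.0)
w = np.array([1.0, 2.0])
s = opt.create_state(0, w)
opt.update(0, w, np.array([1.0, 1.0]), s)
# with zero momentum this reduces to plain SGD: w = w - 0.1 * grad
```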
update_multi_precision(index, weight, grad, state)[source]

Updates the given parameter using the corresponding gradient and state. Mixed precision version.

Parameters: index (int) – The unique index of the parameter into the individual learning rates and weight decays. Learning rates and weight decay may be set via set_lr_mult() and set_wd_mult(), respectively. weight (NDArray) – The parameter to be updated. grad (NDArray) – The gradient of the objective with respect to this parameter. state (any obj) – The state returned by create_state().
set_learning_rate(lr)[source]

Sets a new learning rate of the optimizer.

Parameters: lr (float) – The new learning rate of the optimizer.
set_lr_scale(args_lrscale)[source]

[DEPRECATED] Sets lr scale. Use set_lr_mult instead.

set_lr_mult(args_lr_mult)[source]

Sets an individual learning rate multiplier for each parameter.

If you specify a learning rate multiplier for a parameter, then the learning rate for the parameter will be set as the product of the global learning rate self.lr and its multiplier.

Note

The default learning rate multiplier of a Variable can be set with lr_mult argument in the constructor.

Parameters: args_lr_mult (dict of str/int to float) – For each of its key-value entries, the learning rate multiplier for the parameter specified in the key will be set to the given value. You can specify the parameter with either its name or its index. If you use the name, you should pass sym in the constructor, and the name you specify in the key of args_lr_mult should match the name of the parameter in sym. If you use the index, it should correspond to the index of the parameter used in the update method. Specifying a parameter by its index is only supported for backward compatibility, and we recommend using the name instead.
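The resulting arithmetic can be sketched in a couple of lines (base_lr, lr_mult, and effective_lr are illustrative names, not MXNet internals): a parameter's effective rate is the global rate times its multiplier, defaulting to 1.

```python
base_lr = 0.01
lr_mult = {'fc1_weight': 0.1}  # hypothetical parameter name

def effective_lr(param_name):
    # Parameters without an entry keep the global learning rate.
    return base_lr * lr_mult.get(param_name, 1.0)
```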
set_wd_mult(args_wd_mult)[source]

Sets an individual weight decay multiplier for each parameter.

By default, if param_idx2name was provided in the constructor, the weight decay multiplier is set to 0 for all parameters whose names don’t end with _weight or _gamma.

Note

The default weight decay multiplier for a Variable can be set with its wd_mult argument in the constructor.

Parameters: args_wd_mult (dict of string/int to float) – For each of its key-value entries, the weight decay multiplier for the parameter specified in the key will be set to the given value. You can specify the parameter with either its name or its index. If you use the name, you should pass sym in the constructor, and the name you specify in the key of args_wd_mult should match the name of the parameter in sym. If you use the index, it should correspond to the index of the parameter used in the update method. Specifying a parameter by its index is only supported for backward compatibility, and we recommend using the name instead.
mxnet.optimizer.register(klass)

Registers a new optimizer.

Once an optimizer is registered, we can create an instance of this optimizer with create_optimizer later.

Examples

>>> @mx.optimizer.Optimizer.register
... class MyOptimizer(mx.optimizer.Optimizer):
...     pass
>>> optim = mx.optimizer.Optimizer.create_optimizer('MyOptimizer')
>>> print(type(optim))


class mxnet.optimizer.SGD(momentum=0.0, lazy_update=True, **kwargs)[source]

The SGD optimizer with momentum and weight decay.

If the storage types of weight and grad are both row_sparse, and lazy_update is True, lazy updates are applied by:

for row in grad.indices:
    state[row] = momentum[row] * state[row] + rescaled_grad[row]
    weight[row] = weight[row] - state[row]


The sparse update only updates the momentum for the weights whose row_sparse gradient indices appear in the current batch, rather than updating it for all indices. Compared with the original update, it can provide large improvements in model training throughput for some applications. However, it provides slightly different semantics than the original update, and may lead to different empirical results.

Otherwise, standard updates are applied by:

rescaled_grad = lr * (rescale_grad * clip(grad, clip_gradient) + wd * weight)
state = momentum * state + rescaled_grad
weight = weight - state


For details of the update algorithm see sgd_update and sgd_mom_update.

This optimizer accepts the following parameters in addition to those accepted by Optimizer.

Parameters: momentum (float, optional) – The momentum value. lazy_update (bool, optional) – Default is True. If True, lazy updates are applied if the storage types of weight and grad are both row_sparse. multi_precision (bool, optional) – Flag to control the internal precision of the optimizer. False results in using the same precision as the weights (default), True makes internal 32-bit copy of the weights and applies gradients in 32-bit precision even if actual weights used in the model have lower precision. Turning this on can improve convergence and accuracy when training with float16.
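One step of the dense momentum update can be traced in NumPy (all values here are illustrative; clipping is omitted, as when clip_gradient is None):

```python
import numpy as np

lr, wd, momentum, rescale_grad = 0.1, 0.0, 0.9, 1.0

weight = np.array([1.0, -1.0])
grad = np.array([0.5, 0.5])
state = np.zeros_like(weight)

# One standard SGD momentum step; with wd = 0 the weight-decay term vanishes.
rescaled_grad = lr * (rescale_grad * grad + wd * weight)
state = momentum * state + rescaled_grad
weight = weight - state
```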
class mxnet.optimizer.Signum(learning_rate=0.01, momentum=0.9, wd_lh=0.0, **kwargs)[source]

The Signum optimizer that takes the sign of gradient or momentum.

The optimizer updates the weight by:

rescaled_grad = rescale_grad * clip(grad, clip_gradient) + wd * weight
state = momentum * state + (1 - momentum) * rescaled_grad
weight = (1 - lr * wd_lh) * weight - lr * sign(state)

See the original paper at: https://jeremybernste.in/projects/amazon/signum.pdf

For details of the update algorithm see signsgd_update and signum_update.

This optimizer accepts the following parameters in addition to those accepted by Optimizer.

Parameters: momentum (float, optional) – The momentum value. wd_lh (float, optional) – The amount of decoupled weight decay regularization, see details in the original paper at: https://arxiv.org/abs/1711.05101
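The Signum step above can likewise be traced in NumPy (values illustrative; clipping omitted). Note that the weight moves by a fixed lr against the sign of the momentum state, regardless of gradient magnitude:

```python
import numpy as np

lr, momentum, wd, wd_lh, rescale_grad = 0.01, 0.9, 0.0, 0.0, 1.0

weight = np.array([1.0, -1.0])
state = np.zeros_like(weight)
grad = np.array([0.5, -2.0])

rescaled_grad = rescale_grad * grad + wd * weight
state = momentum * state + (1 - momentum) * rescaled_grad
weight = (1 - lr * wd_lh) * weight - lr * np.sign(state)
# both coordinates move by exactly lr = 0.01, whatever the gradient scale
```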
class mxnet.optimizer.FTML(beta1=0.6, beta2=0.999, epsilon=1e-08, **kwargs)[source]

The FTML optimizer.

This class implements the optimizer described in FTML - Follow the Moving Leader in Deep Learning, available at http://proceedings.mlr.press/v70/zheng17a/zheng17a.pdf.

This optimizer accepts the following parameters in addition to those accepted by Optimizer.

Parameters: beta1 (float, optional) – 0 < beta1 < 1. Generally close to 0.5. beta2 (float, optional) – 0 < beta2 < 1. Generally close to 1. epsilon (float, optional) – Small value to avoid division by 0.
class mxnet.optimizer.DCASGD(momentum=0.0, lamda=0.04, **kwargs)[source]

The DCASGD optimizer.

This class implements the optimizer described in Asynchronous Stochastic Gradient Descent with Delay Compensation for Distributed Deep Learning, available at https://arxiv.org/abs/1609.08326.

This optimizer accepts the following parameters in addition to those accepted by Optimizer.

Parameters: momentum (float, optional) – The momentum value. lamda (float, optional) – Scale DC value.
class mxnet.optimizer.NAG(**kwargs)[source]

Nesterov accelerated SGD.

This optimizer updates each weight by:

state = momentum * state + grad + wd * weight
weight = weight - (lr * (grad + momentum * state))


This optimizer accepts the same arguments as SGD.

class mxnet.optimizer.SGLD(**kwargs)[source]

This class implements the optimizer described in the paper Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex, available at https://papers.nips.cc/paper/4883-stochastic-gradient-riemannian-langevin-dynamics-on-the-probability-simplex.pdf.

class mxnet.optimizer.ccSGD(*args, **kwargs)[source]

[DEPRECATED] Same as SGD. Left here for backward compatibility.

class mxnet.optimizer.Adam(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, lazy_update=True, **kwargs)[source]

This class implements the optimizer described in Adam: A Method for Stochastic Optimization, available at http://arxiv.org/abs/1412.6980.

If the storage types of weight and grad are both row_sparse, and lazy_update is True, lazy updates are applied by:

for row in grad.indices:
    m[row] = beta1 * m[row] + (1 - beta1) * rescaled_grad[row]
    v[row] = beta2 * v[row] + (1 - beta2) * (rescaled_grad[row]**2)
    w[row] = w[row] - learning_rate * m[row] / (sqrt(v[row]) + epsilon)


The lazy update only updates the mean and var for the weights whose row_sparse gradient indices appear in the current batch, rather than updating it for all indices. Compared with the original update, it can provide large improvements in model training throughput for some applications. However, it provides slightly different semantics than the original update, and may lead to different empirical results.

Otherwise, standard updates are applied by:

rescaled_grad = clip(grad * rescale_grad + wd * weight, clip_gradient)
m = beta1 * m + (1 - beta1) * rescaled_grad
v = beta2 * v + (1 - beta2) * (rescaled_grad**2)
w = w - learning_rate * m / (sqrt(v) + epsilon)


This optimizer accepts the following parameters in addition to those accepted by Optimizer.

For details of the update algorithm, see adam_update.

Parameters: beta1 (float, optional) – Exponential decay rate for the first moment estimates. beta2 (float, optional) – Exponential decay rate for the second moment estimates. epsilon (float, optional) – Small value to avoid division by 0. lazy_update (bool, optional) – Default is True. If True, lazy updates are applied if the storage types of weight and grad are both row_sparse.
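The standard (dense) Adam step can be checked in NumPy (hyperparameters are the illustrative defaults; clipping and weight decay are omitted):

```python
import numpy as np

learning_rate, beta1, beta2, epsilon = 0.001, 0.9, 0.999, 1e-08

w = np.array([1.0, 2.0])
grad = np.array([0.1, -0.2])
m = np.zeros_like(w)   # first-moment (mean) estimate
v = np.zeros_like(w)   # second-moment (uncentered variance) estimate

rescaled_grad = grad  # rescale_grad = 1.0, wd = 0.0
m = beta1 * m + (1 - beta1) * rescaled_grad
v = beta2 * v + (1 - beta2) * rescaled_grad ** 2
w = w - learning_rate * m / (np.sqrt(v) + epsilon)
# the step follows the gradient sign: w[0] shrinks, w[1] grows
```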
class mxnet.optimizer.AdaGrad(eps=1e-07, **kwargs)[source]

The AdaGrad optimizer.

This optimizer accepts the following parameters in addition to those accepted by Optimizer.

Parameters: eps (float, optional) – Small value to avoid division by 0.
class mxnet.optimizer.RMSProp(learning_rate=0.001, gamma1=0.9, gamma2=0.9, epsilon=1e-08, centered=False, clip_weights=None, **kwargs)[source]

The RMSProp optimizer.

Two versions of RMSProp are implemented:

If centered=False, we follow http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf by Tieleman & Hinton, 2012. For details of the update algorithm see rmsprop_update.

If centered=True, we follow http://arxiv.org/pdf/1308.0850v5.pdf (38)-(45) by Alex Graves, 2013. For details of the update algorithm see rmspropalex_update.

This optimizer accepts the following parameters in addition to those accepted by Optimizer.

Parameters: gamma1 (float, optional) – A decay factor of moving average over past squared gradient. gamma2 (float, optional) – A “momentum” factor. Only used if centered=True. epsilon (float, optional) – Small value to avoid division by 0. centered (bool, optional) – Flag to control which version of RMSProp to use. True will use Graves’s version of RMSProp, False will use Tieleman & Hinton’s version of RMSProp. clip_weights (float, optional) – Clips weights into range [-clip_weights, clip_weights].
class mxnet.optimizer.AdaDelta(rho=0.9, epsilon=1e-05, **kwargs)[source]

The AdaDelta optimizer.

This optimizer accepts the following parameters in addition to those accepted by Optimizer.

Parameters: rho (float) – Decay rate for both squared gradients and delta. epsilon (float) – Small value to avoid division by 0.
class mxnet.optimizer.Ftrl(lamda1=0.01, learning_rate=0.1, beta=1, **kwargs)[source]

The Ftrl optimizer.

Referenced from Ad Click Prediction: a View from the Trenches, available at http://dl.acm.org/citation.cfm?id=2488200.

eta:
$\eta_{t,i} = \frac{\mathit{learning\_rate}}{\beta + \sqrt{\sum_{s=1}^{t} g_{s,i}^2}}$

The optimizer updates the weight by:

rescaled_grad = clip(grad * rescale_grad, clip_gradient)
z += rescaled_grad - (sqrt(n + rescaled_grad**2) - sqrt(n)) * weight / learning_rate
w = (sign(z) * lamda1 - z) / ((beta + sqrt(n)) / learning_rate + wd) * (abs(z) > lamda1)


If the storage types of weight, state and grad are all row_sparse, sparse updates are applied by:

for row in grad.indices:
    z[row] += rescaled_grad[row] - (sqrt(n[row] + rescaled_grad[row]**2) - sqrt(n[row])) * weight[row] / learning_rate
    w[row] = (sign(z[row]) * lamda1 - z[row]) / ((beta + sqrt(n[row])) / learning_rate + wd) * (abs(z[row]) > lamda1)


The sparse update only updates the z and n for the weights whose row_sparse gradient indices appear in the current batch, rather than updating it for all indices. Compared with the original update, it can provide large improvements in model training throughput for some applications. However, it provides slightly different semantics than the original update, and may lead to different empirical results.

For details of the update algorithm, see ftrl_update.

This optimizer accepts the following parameters in addition to those accepted by Optimizer.

Parameters: lamda1 (float, optional) – L1 regularization coefficient. learning_rate (float, optional) – The initial learning rate. beta (float, optional) – Per-coordinate learning rate correlation parameter.
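A single Ftrl step can be traced in NumPy (values illustrative; clipping omitted). The accumulator n of squared gradients is implicit in the pseudocode above and is updated after z:

```python
import numpy as np

lamda1, beta, learning_rate, wd = 0.01, 1.0, 0.1, 0.0

weight = np.array([0.5])
z = np.zeros_like(weight)
n = np.zeros_like(weight)
grad = np.array([2.0])

rescaled_grad = grad  # rescale_grad = 1.0
z += rescaled_grad - (np.sqrt(n + rescaled_grad**2) - np.sqrt(n)) * weight / learning_rate
n += rescaled_grad**2
weight = (np.sign(z) * lamda1 - z) / ((beta + np.sqrt(n)) / learning_rate + wd) * (np.abs(z) > lamda1)
# weights whose |z| stays below lamda1 are clamped to exactly zero (sparsity)
```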
class mxnet.optimizer.Adamax(learning_rate=0.002, beta1=0.9, beta2=0.999, **kwargs)[source]

The Adamax optimizer.

It is a variant of Adam based on the infinity norm, available at http://arxiv.org/abs/1412.6980 Section 7.

This optimizer accepts the following parameters in addition to those accepted by Optimizer.

Parameters: beta1 (float, optional) – Exponential decay rate for the first moment estimates. beta2 (float, optional) – Exponential decay rate for the second moment estimates.
class mxnet.optimizer.Nadam(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, schedule_decay=0.004, **kwargs)[source]

Much like Adam is essentially RMSprop with momentum, Nadam is Adam with Nesterov momentum, available at http://cs229.stanford.edu/proj2015/054_report.pdf.

This optimizer accepts the following parameters in addition to those accepted by Optimizer.

Parameters: beta1 (float, optional) – Exponential decay rate for the first moment estimates. beta2 (float, optional) – Exponential decay rate for the second moment estimates. epsilon (float, optional) – Small value to avoid division by 0. schedule_decay (float, optional) – Exponential decay rate for the momentum schedule
class mxnet.optimizer.Test(**kwargs)[source]

The Test optimizer

create_state(index, weight)[source]

Creates a state to duplicate weight.

update(index, weight, grad, state)[source]

mxnet.optimizer.create(name, **kwargs)

Instantiates an optimizer with a given name and kwargs.

Note

We can use the alias create for Optimizer.create_optimizer.

Parameters: name (str) – Name of the optimizer. Should be the name of a subclass of Optimizer. Case insensitive. kwargs (dict) – Parameters for the optimizer. An instantiated optimizer. Optimizer

Examples

>>> sgd = mx.optimizer.Optimizer.create_optimizer('sgd')
>>> type(sgd)


class mxnet.optimizer.Updater(optimizer)[source]

Updater for kvstore.

set_states(states)[source]

Sets updater states.

get_states(dump_optimizer=False)[source]

Gets updater states.

Parameters: dump_optimizer (bool, default False) – Whether to also save the optimizer itself. This would also save optimizer information such as learning rate and weight decay schedules.
mxnet.optimizer.get_updater(optimizer)[source]

Returns a closure of the updater needed for kvstore.

Parameters: optimizer (Optimizer) – The optimizer. Returns: updater (function) – The closure of the updater.
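The closure it returns has the shape updater(index, grad, weight): it creates per-index state lazily and delegates to the optimizer. A pure-Python sketch of the pattern (TinyOpt and get_updater here are illustrative stand-ins, not MXNet code):

```python
import numpy as np

class TinyOpt:
    """Illustrative optimizer with the create_state/update contract."""
    def create_state(self, index, weight):
        return np.zeros_like(weight)
    def update(self, index, weight, grad, state):
        weight -= 0.1 * grad

def get_updater(optimizer):
    states = {}
    def updater(index, grad, weight):
        # Lazily create state the first time a key is seen.
        if index not in states:
            states[index] = optimizer.create_state(index, weight)
        optimizer.update(index, weight, grad, states[index])
    return updater

upd = get_updater(TinyOpt())
w = np.array([1.0])
upd(0, np.array([1.0]), w)   # kvstore calls the closure once per key
```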

## Model API Reference¶

MXNet model module

mxnet.model.BatchEndParam

alias of BatchEndParams

mxnet.model.save_checkpoint(prefix, epoch, symbol, arg_params, aux_params)[source]

Checkpoints the model data into a file.

Parameters: prefix (str) – Prefix of model name. epoch (int) – The epoch number of the model. symbol (Symbol) – The input Symbol. arg_params (dict of str to NDArray) – Model parameter, dict of name to NDArray of net’s weights. aux_params (dict of str to NDArray) – Model parameter, dict of name to NDArray of net’s auxiliary states.

Notes

• prefix-symbol.json will be saved for symbol.
• prefix-epoch.params will be saved for parameters.
mxnet.model.load_checkpoint(prefix, epoch)[source]

Loads a model checkpoint from file.

Parameters: prefix (str) – Prefix of model name. epoch (int) – Epoch number of the model we would like to load. Returns: symbol (Symbol) – The symbol configuration of computation network. arg_params (dict of str to NDArray) – Model parameter, dict of name to NDArray of net’s weights. aux_params (dict of str to NDArray) – Model parameter, dict of name to NDArray of net’s auxiliary states.

Notes

• Symbol will be loaded from prefix-symbol.json.
• Parameters will be loaded from prefix-epoch.params.
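As the notes for save_checkpoint and load_checkpoint describe, checkpoint files follow a fixed naming convention. A minimal sketch (the helper name is hypothetical), assuming the epoch number is zero-padded to four digits as in mymodel-0100.params:

```python
def checkpoint_filenames(prefix, epoch):
    # '<prefix>-symbol.json' holds the graph; '<prefix>-<epoch>.params' the weights.
    return '%s-symbol.json' % prefix, '%s-%04d.params' % (prefix, epoch)
```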
class mxnet.model.FeedForward(symbol, ctx=None, num_epoch=None, epoch_size=None, optimizer='sgd', initializer=, numpy_batch_size=128, arg_params=None, aux_params=None, allow_extra_params=False, begin_epoch=0, **kwargs)[source]

Model class of MXNet for training and predicting feedforward nets. This class is designed for a single-data, single-output supervised network.

Parameters: symbol (Symbol) – The symbol configuration of computation network. ctx (Context or list of Context, optional) – The device context of training and prediction. To use multi-GPU training, pass in a list of GPU contexts. num_epoch (int, optional) – Training parameter, number of training epochs. epoch_size (int, optional) – Number of batches in an epoch. By default, it is set to ceil(num_train_examples / batch_size). optimizer (str or Optimizer, optional) – Training parameter, name or optimizer object for training. initializer (initializer function, optional) – Training parameter, the initialization scheme used. numpy_batch_size (int, optional) – The batch size of training data. Only needed when input array is numpy. arg_params (dict of str to NDArray, optional) – Model parameter, dict of name to NDArray of net’s weights. aux_params (dict of str to NDArray, optional) – Model parameter, dict of name to NDArray of net’s auxiliary states. allow_extra_params (boolean, optional) – Whether to allow extra parameters that are not needed by symbol to be passed by aux_params and arg_params. If this is True, no error will be thrown when aux_params and arg_params contain more parameters than needed. begin_epoch (int, optional) – The beginning training epoch. kwargs (dict) – The additional keyword arguments passed to optimizer.
predict(X, num_batch=None, return_data=False, reset=True)[source]

Runs the prediction, using only one device.

Parameters: X (mxnet.DataIter) – The input data. num_batch (int or None) – The number of batches to run. Goes through all batches if None. Returns: y – The predicted value of the output; numpy.ndarray, or a list of numpy.ndarray if the network has multiple outputs.
score(X, eval_metric='acc', num_batch=None, batch_end_callback=None, reset=True)[source]

Runs the model on the given input and calculates the score as assessed by an evaluation metric.

Parameters: X (mxnet.DataIter) – The input data. eval_metric (metric.EvalMetric) – The metric for calculating the score. num_batch (int or None) – The number of batches to run. Goes through all batches if None. Returns: s (float) – The final score.
fit(X, y=None, eval_data=None, eval_metric='acc', epoch_end_callback=None, batch_end_callback=None, kvstore='local', logger=None, work_load_list=None, monitor=None, eval_end_callback=, eval_batch_end_callback=None)[source]

Fit the model.

Parameters: X (DataIter, or numpy.ndarray/NDArray) – Training data. If X is a DataIter, the name or (if name not available) the position of its outputs should match the corresponding variable names defined in the symbolic graph. y (numpy.ndarray/NDArray, optional) – Training set label. If X is numpy.ndarray or NDArray, y is required to be set. While y can be 1D or 2D (with 2nd dimension as 1), its first dimension must be the same as X, i.e. the number of data points and labels should be equal. eval_data (DataIter or numpy.ndarray/list/NDArray pair) – If eval_data is a numpy.ndarray/list/NDArray pair, it should be (valid_data, valid_label). eval_metric (metric.EvalMetric or str or callable) – The evaluation metric. This could be the name of an evaluation metric or a custom evaluation function that returns statistics based on a minibatch. epoch_end_callback (callable(epoch, symbol, arg_params, aux_states)) – A callback that is invoked at the end of each epoch. This can be used to checkpoint the model each epoch. batch_end_callback (callable(epoch)) – A callback that is invoked at the end of each batch for printing purposes. kvstore (KVStore or str, optional) – The KVStore or a string kvstore type: ‘local’, ‘dist_sync’, ‘dist_async’. Defaults to ‘local’; often no need to change for a single machine. logger (logging logger, optional) – When not specified, the default logger will be used. work_load_list (float or int, optional) – The list of work load for different devices, in the same order as ctx.

Note

KVStore behavior:
• ‘local’: multi-devices on a single machine; will automatically choose the best type.
• ‘dist_sync’: multiple machines communicating via BSP.
• ‘dist_async’: multiple machines with asynchronous communication.

save(prefix, epoch=None)[source]

Checkpoints the model into a file. You can also use pickle to do the job if you only work in Python. The advantage of load and save (as compared to pickle) is that the resulting file can be loaded from other MXNet language bindings. You can also directly load/save from/to cloud storage (e.g. S3, HDFS).

Parameters: prefix (str) – Prefix of model name.

Notes

• prefix-symbol.json will be saved for symbol.
• prefix-epoch.params will be saved for parameters.
static load(prefix, epoch, ctx=None, **kwargs)[source]

Loads a model from a checkpoint.

Parameters: prefix (str) – Prefix of model name. epoch (int) – Epoch number of the model we would like to load. ctx (Context or list of Context, optional) – The device context of training and prediction. kwargs (dict) – Other parameters for the model, including num_epoch, optimizer and numpy_batch_size. Returns: model (FeedForward) – The loaded model that can be used for prediction.

Notes

• The symbol will be loaded from prefix-symbol.json.
• Parameters will be loaded from prefix-epoch.params.
static create(symbol, X, y=None, ctx=None, num_epoch=None, epoch_size=None, optimizer='sgd', initializer=, eval_data=None, eval_metric='acc', epoch_end_callback=None, batch_end_callback=None, kvstore='local', logger=None, work_load_list=None, eval_end_callback=, eval_batch_end_callback=None, **kwargs)[source]

Functional style to create a model. This function is more consistent with functional languages such as R, where mutation is not allowed.

Parameters: symbol (Symbol) – The symbol configuration of a computation network. X (DataIter) – Training data. y (numpy.ndarray, optional) – If X is a numpy.ndarray, y must be set. ctx (Context or list of Context, optional) – The device context of training and prediction. To use multi-GPU training, pass in a list of GPU contexts. num_epoch (int, optional) – The number of training epochs. epoch_size (int, optional) – Number of batches in an epoch. By default, it is set to ceil(num_train_examples / batch_size). optimizer (str or Optimizer, optional) – The name of the chosen optimizer, or an optimizer object, used for training. initializer (initializer function, optional) – The initialization scheme used. eval_data (DataIter or numpy.ndarray pair) – If eval_data is a numpy.ndarray pair, it should be (valid_data, valid_label). eval_metric (metric.EvalMetric or str or callable) – The evaluation metric. Can be the name of an evaluation metric or a custom evaluation function that returns statistics based on a minibatch. epoch_end_callback (callable(epoch, symbol, arg_params, aux_states)) – A callback that is invoked at the end of each epoch. This can be used to checkpoint the model each epoch. batch_end_callback (callable(epoch)) – A callback that is invoked at the end of each batch for print purposes. kvstore (KVStore or str, optional) – The KVStore or a string kvstore type: ‘local’, ‘dist_sync’, ‘dist_async’. Defaults to ‘local’; often no need to change for a single machine. logger (logging logger, optional) – When not specified, the default logger will be used. work_load_list (list of float or int, optional) – The list of work load for different devices, in the same order as ctx.