mxnet.module

A module is like a FeedForward model, but it is designed to be easier to compose, similar to Torch modules.

Classes

BaseModule([logger])

The base class of a module.

BucketingModule(sym_gen[, …])

This module helps to deal efficiently with varying-length inputs.

Module(symbol[, data_names, label_names, …])

Module is a basic module that wraps a Symbol.

PythonLossModule([name, data_names, …])

A convenient module class that implements many of the module APIs as empty functions.

PythonModule(data_names, label_names, …[, …])

A convenient module class that implements many of the module APIs as empty functions.

SequentialModule([logger])

A SequentialModule is a container module that can chain multiple modules together.

class mxnet.module.BaseModule(logger=logging)[source]

Bases: object

The base class of a module.

A module represents a computation component. One can think of a module as a computation machine. A module can execute forward and backward passes and update parameters in a model. We aim to make the APIs easy to use, especially when the imperative API is needed to work with multiple modules (e.g. a stochastic depth network).

A module has several states:

  • Initial state: Memory is not allocated yet, so the module is not ready for computation yet.

  • Binded: Shapes for inputs, outputs, and parameters are all known, memory has been allocated, and the module is ready for computation.

  • Parameters are initialized: For modules with parameters, doing computation before initializing the parameters might result in undefined outputs.

  • Optimizer is installed: An optimizer can be installed to a module. After this, the parameters of the module can be updated according to the optimizer after gradients are computed (forward-backward).
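The state progression above can be pictured as a set of flags that each setup call flips in order. The following is a minimal pure-Python sketch, not MXNet code: ToyModule is a hypothetical stand-in, and only the flag names (binded, params_initialized, optimizer_initialized) mirror attributes described on this page. The ordering checks are an assumption about how a well-behaved module would guard its calls.

```python
class ToyModule:
    """Hypothetical stand-in that only tracks the lifecycle flags."""

    def __init__(self):
        # Initial state: nothing allocated, not ready for computation.
        self.binded = False
        self.params_initialized = False
        self.optimizer_initialized = False

    def bind(self):
        # Shapes are known and memory is allocated.
        self.binded = True

    def init_params(self):
        if not self.binded:
            raise RuntimeError("call bind() before init_params()")
        self.params_initialized = True

    def init_optimizer(self):
        if not (self.binded and self.params_initialized):
            raise RuntimeError("call bind() and init_params() first")
        self.optimizer_initialized = True

    def update(self):
        # Parameter updates require an installed optimizer.
        if not self.optimizer_initialized:
            raise RuntimeError("call init_optimizer() before update()")
        return "updated"


toy = ToyModule()
toy.bind()
toy.init_params()
toy.init_optimizer()
result = toy.update()
```

Calling update() on a fresh ToyModule raises RuntimeError, which reflects the idea that each state builds on the previous one.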

Methods

backward([out_grads])

Backward computation.

bind(data_shapes[, label_shapes, …])

Binds the symbols to construct executors.

fit(train_data[, eval_data, eval_metric, …])

Trains the module parameters.

forward(data_batch[, is_train])

Forward computation.

forward_backward(data_batch)

A convenient function that calls both forward and backward.

get_input_grads([merge_multi_context])

Gets the gradients to the inputs, computed in the previous backward computation.

get_outputs([merge_multi_context])

Gets outputs of the previous forward computation.

get_params()

Gets parameters, which are potentially copies of the actual parameters used to do computation on the device.

get_states([merge_multi_context])

Gets states from all devices

init_optimizer([kvstore, optimizer, …])

Installs and initializes optimizers, and initializes the kvstore for distributed training.

init_params([initializer, arg_params, …])

Initializes the parameters and auxiliary states.

install_monitor(mon)

Installs monitor on all executors.

iter_predict(eval_data[, num_batch, reset, …])

Iterates over predictions.

load_params(fname)

Loads model parameters from file.

predict(eval_data[, num_batch, …])

Runs prediction and collects the outputs.

prepare(data_batch[, sparse_row_id_fn])

Prepares the module for processing a data batch.

save_params(fname)

Saves model parameters to file.

score(eval_data, eval_metric[, num_batch, …])

Runs prediction on eval_data and evaluates the performance according to the given eval_metric.

set_params(arg_params, aux_params[, …])

Assigns parameter and aux state values.

set_states([states, value])

Sets value for states.

update()

Updates parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch.

update_metric(eval_metric, labels[, pre_sliced])

Evaluates and accumulates evaluation metric on outputs of the last forward computation.

Attributes

data_names

A list of names for data required by this module.

data_shapes

A list of (name, shape) pairs specifying the data inputs to this module.

label_shapes

A list of (name, shape) pairs specifying the label inputs to this module.

output_names

A list of names for the outputs of this module.

output_shapes

A list of (name, shape) pairs specifying the outputs of this module.

symbol

Gets the symbol associated with this module.

In order for a module to interact with others, it must be able to report the following information in its initial state (before binding):

  • data_names: list of type string indicating the names of the required input data.

  • output_names: list of type string indicating the names of the required outputs.

After binding, a module should be able to report the following richer information:

  • state information
    • binded: bool, indicates whether the memory buffers needed for computation have been allocated.

    • for_training: whether the module is bound for training.

    • params_initialized: bool, indicates whether the parameters of this module have been initialized.

    • optimizer_initialized: bool, indicates whether an optimizer is defined and initialized.

    • inputs_need_grad: bool, indicates whether gradients with respect to the input data are needed. Might be useful when implementing composition of modules.

  • input/output information
    • data_shapes: a list of (name, shape). In theory, since the memory is allocated, we could directly provide the data arrays. But in the case of data parallelism, the data arrays might not be of the same shape as viewed from the external world.

    • label_shapes: a list of (name, shape). This might be [] if the module does not need labels (e.g. it does not contain a loss function at the top), or if the module is not bound for training.

    • output_shapes: a list of (name, shape) for outputs of the module.

  • parameters (for modules with parameters)
    • get_params(): returns a tuple (arg_params, aux_params). Each of those is a dictionary mapping names to NDArray values. Those NDArray instances always live on the CPU. The actual parameters used for computing might live on other devices (GPUs); this function retrieves (a copy of) the latest parameters.

    • set_params(arg_params, aux_params): assign parameters to the devices doing the computation.

    • init_params(...): a more flexible interface to assign or initialize the parameters.

  • setup
    • bind(): prepare environment for computation.

    • init_optimizer(): install optimizer for parameter updating.

    • prepare(): prepare the module based on the current data batch.

  • computation
    • forward(data_batch): forward operation.

    • backward(out_grads=None): backward operation.

    • update(): update parameters according to installed optimizer.

    • get_outputs(): get outputs of the previous forward operation.

    • get_input_grads(): get the gradients with respect to the inputs computed in the previous backward operation.

    • update_metric(metric, labels, pre_sliced=False): update performance metric for the previous forward computed results.

  • other properties (mostly for backward compatibility)
    • symbol: the underlying symbolic graph for this module (if any). This property is not necessarily constant. For example, for BucketingModule, this property is simply the current symbol being used. For other modules, this value might not be well defined.

When these intermediate-level APIs are implemented properly, the following high-level APIs will be automatically available for a module:

  • fit: train the module parameters on a data set.

  • predict: run prediction on a data set and collect outputs.

  • score: run prediction on a data set and evaluate performance.
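To illustrate how a high-level method can be layered on the intermediate-level API, here is a simplified sketch of a training loop. It is an assumption about the general shape, not the actual fit implementation (which also handles binding, initialization, metrics, and callbacks). simple_fit and RecordingModule are hypothetical names, and the recorder only logs which calls it receives.

```python
def simple_fit(mod, batches, num_epoch):
    """Simplified training loop using only intermediate-level calls."""
    for _ in range(num_epoch):
        for batch in batches:
            mod.forward_backward(batch)  # forward pass, then backward pass
            mod.update()                 # apply one optimizer step


class RecordingModule:
    """Hypothetical stub that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def forward_backward(self, batch):
        self.calls.append(("forward_backward", batch))

    def update(self):
        self.calls.append(("update",))


recorder = RecordingModule()
simple_fit(recorder, batches=["batch0", "batch1"], num_epoch=2)
```

Any object that provides forward_backward and update could be passed to such a loop, which is the sense in which the high-level API becomes available once the intermediate-level methods exist.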

Examples

>>> # An example of creating a mxnet module.
>>> import mxnet as mx
>>> data = mx.symbol.Variable('data')
>>> fc1  = mx.symbol.FullyConnected(data, name='fc1', num_hidden=128)
>>> act1 = mx.symbol.Activation(fc1, name='relu1', act_type="relu")
>>> fc2  = mx.symbol.FullyConnected(act1, name = 'fc2', num_hidden = 64)
>>> act2 = mx.symbol.Activation(fc2, name='relu2', act_type="relu")
>>> fc3  = mx.symbol.FullyConnected(act2, name='fc3', num_hidden=10)
>>> out  = mx.symbol.SoftmaxOutput(fc3, name = 'softmax')
>>> mod = mx.mod.Module(out)
backward(out_grads=None)[source]

Backward computation.

Parameters

out_grads (NDArray or list of NDArray, optional) – Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.

Examples

>>> # An example of backward computation.
>>> mod.backward()
>>> print(mod.get_input_grads()[0].asnumpy())
[[[  1.10182791e-05   5.12257748e-06   4.01927764e-06   8.32566820e-06
    -1.59775993e-06   7.24269375e-06   7.28067835e-06  -1.65902311e-05
     5.46342608e-06   8.44196393e-07]
     ...]]
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')[source]

Binds the symbols to construct executors. This is necessary before one can perform computation with the module.

Parameters
  • data_shapes (list of (str, tuple) or DataDesc objects) – Typically is data_iter.provide_data. Can also be a list of (data name, data shape).

  • label_shapes (list of (str, tuple) or DataDesc objects) – Typically is data_iter.provide_label. Can also be a list of (label name, label shape).

  • for_training (bool) – Default is True. Whether the executors should be bound for training.

  • inputs_need_grad (bool) – Default is False. Whether the gradients to the input data need to be computed. Typically this is not needed. But this might be needed when implementing composition of modules.

  • force_rebind (bool) – Default is False. This function does nothing if the executors are already bound, but if this is True, the executors will be forced to rebind.

  • shared_module (Module) – Default is None. This is used in bucketing. When not None, the shared module essentially corresponds to a different bucket – a module with different symbol but with the same sets of parameters (e.g. unrolled RNNs with different lengths).

  • grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be ‘write’, ‘add’, or ‘null’ (default to ‘write’). Can be specified globally (str) or for each argument (list, dict).
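The three grad_req modes can be sketched with plain Python lists standing in for gradient buffers. This assumes the usual meaning of the modes (overwrite, accumulate, and skip); store_gradient is a hypothetical helper for illustration and is not part of MXNet.

```python
def store_gradient(buffer, new_grad, grad_req):
    """Return the gradient buffer contents after one backward pass."""
    if grad_req == "write":
        # Overwrite whatever was in the buffer.
        return list(new_grad)
    if grad_req == "add":
        # Accumulate into the existing buffer.
        return [b + g for b, g in zip(buffer, new_grad)]
    if grad_req == "null":
        # No gradient is stored; the buffer is left untouched.
        return buffer
    raise ValueError("unknown grad_req: %r" % grad_req)


buf = [1.0, 1.0]
written = store_gradient(buf, [0.5, 2.0], "write")
added = store_gradient(buf, [0.5, 2.0], "add")
skipped = store_gradient(buf, [0.5, 2.0], "null")
```

Under this reading, 'add' is the mode to pick when gradients from several backward passes should be summed before an update.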

Examples

>>> # An example of binding symbols.
>>> mod.bind(data_shapes=[('data', (1, 10, 10))])
>>> # Assume train_iter is already created.
>>> mod.bind(data_shapes=train_iter.provide_data, label_shapes=train_iter.provide_label)
property data_names

A list of names for data required by this module.

property data_shapes

A list of (name, shape) pairs specifying the data inputs to this module.

fit(train_data, eval_data=None, eval_metric='acc', epoch_end_callback=None, batch_end_callback=None, kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), eval_end_callback=None, eval_batch_end_callback=None, initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_rebind=False, force_init=False, begin_epoch=0, num_epoch=None, validation_metric=None, monitor=None, sparse_row_id_fn=None)[source]

Trains the module parameters.

Check out the Module Tutorial to see an end-to-end use case.

Parameters
  • train_data (DataIter) – Train DataIter.

  • eval_data (DataIter) – If not None, will be used as validation set and the performance after each epoch will be evaluated.

  • eval_metric (str or EvalMetric) – Defaults to ‘acc’ (accuracy). The performance measure displayed during training. Other possible predefined metrics are: ‘ce’ (CrossEntropy), ‘f1’, ‘mae’, ‘mse’, ‘rmse’, ‘top_k_accuracy’.

  • epoch_end_callback (function or list of functions) – Each callback will be called with the current epoch, symbol, arg_params and aux_params.

  • batch_end_callback (function or list of function) – Each callback will be called with a BatchEndParam.

  • kvstore (str or KVStore) – Defaults to ‘local’.

  • optimizer (str or Optimizer) – Defaults to ‘sgd’.

  • optimizer_params (dict) – Defaults to (('learning_rate', 0.01),). The parameters for the optimizer constructor. The default value is not a dict, just to avoid pylint warning on dangerous default values.

  • eval_end_callback (function or list of function) – These will be called at the end of each full evaluation, with the metrics over the entire evaluation set.

  • eval_batch_end_callback (function or list of function) – These will be called at the end of each mini-batch during evaluation.

  • initializer (Initializer) – The initializer is called to initialize the module parameters when they are not already initialized.

  • arg_params (dict) – Defaults to None. If not None, should be existing parameters from a trained model or loaded from a checkpoint (previously saved model). In this case, the values here will be used to initialize the module parameters, unless they are already initialized by the user via a call to init_params or fit. arg_params has a higher priority than initializer.

  • aux_params (dict) – Defaults to None. Similar to arg_params, except for auxiliary states.

  • allow_missing (bool) – Defaults to False. Indicates whether to allow missing parameters when arg_params and aux_params are not None. If this is True, then the missing parameters will be initialized via the initializer.

  • force_rebind (bool) – Defaults to False. Whether to force rebinding the executors if already bound.

  • force_init (bool) – Defaults to False. Indicates whether to force initialization even if the parameters are already initialized.

  • begin_epoch (int) – Defaults to 0. Indicates the starting epoch. Usually, if resumed from a checkpoint saved at a previous training phase at epoch N, then this value should be N+1.

  • num_epoch (int) – Number of epochs for training.

  • sparse_row_id_fn (A callback function) – The function takes data_batch as an input and returns a dict of str -> NDArray. The resulting dict is used for pulling row_sparse parameters from the kvstore, where the str key is the name of the param, and the value is the row id of the param to pull.

Examples

>>> # An example of using fit for training.
>>> # Assume training dataIter and validation dataIter are ready
>>> # Assume loading a previously checkpointed model
>>> sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, 3)
>>> mod.fit(train_data=train_dataiter, eval_data=val_dataiter, optimizer='sgd',
...     optimizer_params={'learning_rate':0.01, 'momentum': 0.9},
...     arg_params=arg_params, aux_params=aux_params,
...     eval_metric='acc', num_epoch=10, begin_epoch=3)
forward(data_batch, is_train=None)[source]

Forward computation. It supports data batches with different shapes, such as different batch sizes or different image sizes. If reshaping of data batch relates to modification of symbol or module, such as changing image layout ordering or switching from training to predicting, module rebinding is required.

Parameters
  • data_batch (DataBatch) – Could be anything with similar API implemented.

  • is_train (bool) – Default is None, which means is_train takes the value of self.for_training.

Examples

>>> import mxnet as mx
>>> from collections import namedtuple
>>> Batch = namedtuple('Batch', ['data'])
>>> data = mx.sym.Variable('data')
>>> out = data * 2
>>> mod = mx.mod.Module(symbol=out, label_names=None)
>>> mod.bind(data_shapes=[('data', (1, 10))])
>>> mod.init_params()
>>> data1 = [mx.nd.ones((1, 10))]
>>> mod.forward(Batch(data1))
>>> print(mod.get_outputs()[0].asnumpy())
[[ 2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]]
>>> # Forward with data batch of different shape
>>> data2 = [mx.nd.ones((3, 5))]
>>> mod.forward(Batch(data2))
>>> print(mod.get_outputs()[0].asnumpy())
[[ 2.  2.  2.  2.  2.]
 [ 2.  2.  2.  2.  2.]
 [ 2.  2.  2.  2.  2.]]
forward_backward(data_batch)[source]

A convenient function that calls both forward and backward.

get_input_grads(merge_multi_context=True)[source]

Gets the gradients to the inputs, computed in the previous backward computation.

If merge_multi_context is True, the return value has the form [grad1, grad2]. Otherwise, it has the form [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements have type NDArray. When merge_multi_context is False, those NDArray instances might live on different devices.
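The difference between the two layouts can be sketched with plain Python lists standing in for NDArray values. The helper merge_across_devices is hypothetical, and joining the per-device slices end to end is an assumption about what merging means under data parallelism.

```python
def merge_across_devices(per_device):
    """per_device[i][d] holds the slice of item i computed on device d.

    Returns one combined list per item, as if from a single executor.
    """
    return [
        [x for dev_slice in slices for x in dev_slice]
        for slices in per_device
    ]


# Unmerged layout: [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]
unmerged = [
    [[1, 2], [3, 4]],
    [[10, 20], [30, 40]],
]
# Merged layout: [grad1, grad2]
merged = merge_across_devices(unmerged)
```

The unmerged form keeps one entry per device, which is useful when the caller wants to avoid copying data between devices.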

Parameters

merge_multi_context (bool) – Defaults to True. In the case when data-parallelism is used, the gradients will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look as if they came from a single executor.

Returns

Input gradients.

Return type

list of NDArray or list of list of NDArray

Examples

>>> # An example of getting input gradients.
>>> print(mod.get_input_grads()[0].asnumpy())
[[[  1.10182791e-05   5.12257748e-06   4.01927764e-06   8.32566820e-06
    -1.59775993e-06   7.24269375e-06   7.28067835e-06  -1.65902311e-05
    5.46342608e-06   8.44196393e-07]
    ...]]
get_outputs(merge_multi_context=True)[source]

Gets outputs of the previous forward computation.

If merge_multi_context is True, the return value has the form [out1, out2]. Otherwise, it has the form [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements have type NDArray. When merge_multi_context is False, those NDArray instances might live on different devices.

Parameters

merge_multi_context (bool) – Defaults to True. In the case when data-parallelism is used, the outputs will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look as if they came from a single executor.

Returns

Output

Return type

list of NDArray or list of list of NDArray.

Examples

>>> # An example of getting forward output.
>>> print(mod.get_outputs()[0].asnumpy())
[[ 0.09999977  0.10000153  0.10000716  0.10000195  0.09999853  0.09999743
   0.10000272  0.10000113  0.09999088  0.09999888]]
get_params()[source]

Gets parameters, which are potentially copies of the actual parameters used to do computation on the device.

Returns

A pair of dictionaries each mapping parameter names to NDArray values.

Return type

(arg_params, aux_params)

Examples

>>> # An example of getting module parameters.
>>> print(mod.get_params())
({'fc2_weight': <NDArray 64x128 @cpu(0)>, 'fc1_weight': <NDArray 128x100 @cpu(0)>,
'fc3_bias': <NDArray 10 @cpu(0)>, 'fc3_weight': <NDArray 10x64 @cpu(0)>,
'fc2_bias': <NDArray 64 @cpu(0)>, 'fc1_bias': <NDArray 128 @cpu(0)>}, {})
get_states(merge_multi_context=True)[source]

Gets states from all devices

If merge_multi_context is True, returns output of form [out1, out2]. Otherwise, it returns output of the form [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All output elements are NDArray.

Parameters

merge_multi_context (bool) – Defaults to True. In the case when data-parallelism is used, the states will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look as if they came from a single executor.

Returns

States.

Return type

list of NDArray or list of list of NDArray

init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)[source]

Installs and initializes optimizers, and initializes the kvstore for distributed training.

Parameters
  • kvstore (str or KVStore) – Defaults to ‘local’.

  • optimizer (str or Optimizer) – Defaults to ‘sgd’.

  • optimizer_params (dict) – Defaults to (('learning_rate', 0.01),). The default value is not a dictionary, just to avoid pylint warning of dangerous default values.

  • force_init (bool) – Defaults to False, indicates whether to force re-initializing an optimizer if it is already installed.
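The note about optimizer_params refers to a general Python pitfall: a mutable default such as a dict is shared across calls. A tuple of pairs sidesteps this and can still be turned into a dict on entry. The sketch below uses a hypothetical function named configure to show the pattern; it is not the library's code.

```python
def configure(optimizer_params=(("learning_rate", 0.01),)):
    """Hypothetical function mirroring the tuple-of-pairs default."""
    # dict() accepts either a dict or an iterable of (key, value) pairs,
    # and always returns a fresh dictionary.
    return dict(optimizer_params)


default_params = configure()
custom_params = configure({"learning_rate": 0.005, "momentum": 0.9})

# Mutating a returned dict does not leak into later calls.
default_params["momentum"] = 0.9
again = configure()
```

Callers can therefore pass an ordinary dict, as the fit example on this page does, while the signature itself keeps an immutable default.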

Examples

>>> # An example of initializing optimizer.
>>> mod.init_optimizer(optimizer='sgd', optimizer_params=(('learning_rate', 0.005),))
init_params(initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_init=False, allow_extra=False)[source]

Initializes the parameters and auxiliary states.

Parameters
  • initializer (Initializer) – Called to initialize parameters if needed.

  • arg_params (dict) – If not None, should be a dictionary of existing arg_params. Initialization will be copied from that.

  • aux_params (dict) – If not None, should be a dictionary of existing aux_params. Initialization will be copied from that.

  • allow_missing (bool) – If True, params could contain missing values, and the initializer will be called to fill those missing params.

  • force_init (bool) – If True, forces re-initialization even if the parameters are already initialized.

  • allow_extra (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If this is True, no error will be thrown when arg_params or aux_params contain extra parameters that are not needed by the executor.

Examples

>>> # An example of initializing module parameters.
>>> mod.init_params()
install_monitor(mon)[source]

Installs monitor on all executors.

iter_predict(eval_data, num_batch=None, reset=True, sparse_row_id_fn=None)[source]

Iterates over predictions.

Examples

>>> for pred, i_batch, batch in module.iter_predict(eval_data):
...     # pred is a list of outputs from the module
...     # i_batch is an integer
...     # batch is the data batch from the data iterator
...     pass
Parameters
  • eval_data (DataIter) – Evaluation data to run prediction on.

  • num_batch (int) – Default is None, indicating running all the batches in the data iterator.

  • reset (bool) – Default is True, indicating whether we should reset the data iter before starting prediction.

  • sparse_row_id_fn (A callback function) – The function takes data_batch as an input and returns a dict of str -> NDArray. The resulting dict is used for pulling row_sparse parameters from the kvstore, where the str key is the name of the param, and the value is the row id of the param to pull.

property label_shapes

A list of (name, shape) pairs specifying the label inputs to this module. If this module does not accept labels – either it is a module without loss function, or it is not bound for training, then this should return an empty list [].

load_params(fname)[source]

Loads model parameters from file.

Parameters

fname (str) – Path to input param file.

Examples

>>> # An example of loading module parameters.
>>> mod.load_params('myfile')
property output_names

A list of names for the outputs of this module.

property output_shapes

A list of (name, shape) pairs specifying the outputs of this module.

predict(eval_data, num_batch=None, merge_batches=True, reset=True, always_output_list=False, sparse_row_id_fn=None)[source]

Runs prediction and collects the outputs.

When merge_batches is True (the default), the return value will be a list [out1, out2, out3], where each element is formed by concatenating the outputs for all the mini-batches. When always_output_list is False (the default), then in the case of a single output, out1 is returned instead of [out1].

When merge_batches is False, the return value will be a nested list like [[out1_batch1, out2_batch1], [out1_batch2], ...]. This mode is useful because in some cases (e.g. bucketing), the module does not necessarily produce the same number of outputs.

The objects in the results have type NDArray. If you need to work with a numpy array, just call .asnumpy() on each NDArray.
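The return shapes described above can be sketched with plain Python lists standing in for NDArray values. The collect function is a hypothetical illustration of the described behavior, not the library's implementation.

```python
def collect(per_batch_outputs, merge_batches=True, always_output_list=False):
    """per_batch_outputs[b][i] is output i for mini-batch b (plain lists)."""
    if not merge_batches:
        # Nested list, one entry per mini-batch.
        return per_batch_outputs
    num_outputs = len(per_batch_outputs[0])
    # Concatenate each output across all mini-batches.
    merged = [
        [x for batch in per_batch_outputs for x in batch[i]]
        for i in range(num_outputs)
    ]
    if num_outputs == 1 and not always_output_list:
        return merged[0]
    return merged


# Two mini-batches, each producing a single output with two rows.
batches = [[[1, 2]], [[3, 4]]]

single = collect(batches)
as_list = collect(batches, always_output_list=True)
nested = collect(batches, merge_batches=False)
```

With a single output, always_output_list=True keeps the outer list so that calling code can index outputs uniformly.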

Parameters
  • eval_data (DataIter or NDArray or numpy array) – Evaluation data to run prediction on.

  • num_batch (int) – Defaults to None, indicates running all the batches in the data iterator.

  • merge_batches (bool) – Defaults to True, see above for return values.

  • reset (bool) – Defaults to True, indicates whether we should reset the data iter before doing prediction.

  • always_output_list (bool) – Defaults to False, see above for return values.

  • sparse_row_id_fn (A callback function) – The function takes data_batch as an input and returns a dict of str -> NDArray. The resulting dict is used for pulling row_sparse parameters from the kvstore, where the str key is the name of the param, and the value is the row id of the param to pull.

Returns

Prediction results.

Return type

list of NDArray or list of list of NDArray

Examples

>>> # An example of using `predict` for prediction.
>>> # Predict on the first 10 batches of val_dataiter
>>> mod.predict(eval_data=val_dataiter, num_batch=10)
prepare(data_batch, sparse_row_id_fn=None)[source]

Prepares the module for processing a data batch.

Usually involves switching bucket and reshaping. For modules that contain row_sparse parameters in KVStore, it prepares the row_sparse parameters based on the sparse_row_id_fn.

When KVStore is used to update parameters for multi-device or multi-machine training, a copy of the parameters is stored in KVStore. Note that for row_sparse parameters, update() updates the copy of the parameters in KVStore, but does not broadcast the updated parameters to all devices / machines. The prepare function is used to broadcast row_sparse parameters with the next batch of data.

Parameters
  • data_batch (DataBatch) – The current batch of data for forward computation.

  • sparse_row_id_fn (A callback function) – The function takes data_batch as an input and returns a dict of str -> NDArray. The resulting dict is used for pulling row_sparse parameters from the kvstore, where the str key is the name of the param, and the value is the row id of the param to pull.

save_params(fname)[source]

Saves model parameters to file.

Parameters

fname (str) – Path to output param file.

Examples

>>> # An example of saving module parameters.
>>> mod.save_params('myfile')
score(eval_data, eval_metric, num_batch=None, batch_end_callback=None, score_end_callback=None, reset=True, epoch=0, sparse_row_id_fn=None)[source]

Runs prediction on eval_data and evaluates the performance according to the given eval_metric.

Check out the Module Tutorial to see an end-to-end use case.

Parameters
  • eval_data (DataIter) – Evaluation data to run prediction on.

  • eval_metric (EvalMetric or list of EvalMetrics) – Evaluation metric to use.

  • num_batch (int) – Number of batches to run. Defaults to None, indicating run until the DataIter finishes.

  • batch_end_callback (function) – Could also be a list of functions.

  • reset (bool) – Defaults to True. Indicates whether we should reset eval_data before starting evaluation.

  • epoch (int) – Defaults to 0. For compatibility, this will be passed to callbacks (if any). During training, this will correspond to the training epoch number.

  • sparse_row_id_fn (A callback function) – The function takes data_batch as an input and returns a dict of str -> NDArray. The resulting dict is used for pulling row_sparse parameters from the kvstore, where the str key is the name of the param, and the value is the row id of the param to pull.

Examples

>>> # An example of using score for prediction.
>>> # Evaluate accuracy on val_dataiter
>>> metric = mx.metric.Accuracy()
>>> mod.score(val_dataiter, metric)
>>> mod.score(val_dataiter, ['mse', 'acc'])
set_params(arg_params, aux_params, allow_missing=False, force_init=True, allow_extra=False)[source]

Assigns parameter and aux state values.

Parameters
  • arg_params (dict) – Dictionary of name to value (NDArray) mapping.

  • aux_params (dict) – Dictionary of name to value (NDArray) mapping.

  • allow_missing (bool) – If True, params could contain missing values, and the initializer will be called to fill those missing params.

  • force_init (bool) – If True, forces re-initialization even if the parameters are already initialized.

  • allow_extra (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If this is True, no error will be thrown when arg_params or aux_params contain extra parameters that are not needed by the executor.

Examples

>>> # An example of setting module parameters.
>>> sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, n_epoch_load)
>>> mod.set_params(arg_params=arg_params, aux_params=aux_params)
set_states(states=None, value=None)[source]

Sets value for states. Only one of states and value can be specified.

Parameters
  • states (list of list of NDArray) – Source states arrays formatted like [[state1_dev1, state1_dev2], [state2_dev1, state2_dev2]].

  • value (number) – A single scalar value for all state arrays.

property symbol

Gets the symbol associated with this module.

For module types other than Module (e.g. BucketingModule), this property might not be constant throughout the module's lifetime. Some modules might not even be associated with any symbols.

update()[source]

Updates parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch.

When KVStore is used to update parameters for multi-device or multi-machine training, a copy of the parameters is stored in KVStore. Note that for row_sparse parameters, this function does update the copy of the parameters in KVStore, but does not broadcast the updated parameters to all devices / machines. Please call prepare to broadcast row_sparse parameters with the next batch of data.

Examples

>>> # An example of updating module parameters.
>>> mod.init_optimizer(kvstore='local', optimizer='sgd',
...     optimizer_params=(('learning_rate', 0.01), ))
>>> mod.backward()
>>> mod.update()
>>> print(mod.get_params()[0]['fc3_weight'].asnumpy())
[[  5.86930104e-03   5.28078526e-03  -8.88729654e-03  -1.08308345e-03
    6.13054074e-03   4.27560415e-03   1.53817423e-03   4.62131854e-03
    4.69872449e-03  -2.42400169e-03   9.94111411e-04   1.12386420e-03
    ...]]
update_metric(eval_metric, labels, pre_sliced=False)[source]

Evaluates and accumulates evaluation metric on outputs of the last forward computation.

Parameters
  • eval_metric (EvalMetric) – Evaluation metric to use.

  • labels (list of NDArray if pre_sliced is False, list of lists of NDArray otherwise) – Typically data_batch.label.

  • pre_sliced (bool) – Whether the labels are already sliced per device (default: False).

Examples

>>> # An example of updating evaluation metric.
>>> mod.forward(data_batch)
>>> mod.update_metric(metric, data_batch.label)
class mxnet.module.BucketingModule(sym_gen, default_bucket_key=None, logger=logging, context=cpu(0), work_load_list=None, fixed_param_names=None, state_names=None, group2ctxs=None, compression_params=None)[source]

Bases: mxnet.module.base_module.BaseModule

This module helps to deal efficiently with varying-length inputs.

Parameters
  • sym_gen (function) – A function that, when called with a bucket key, returns a triple (symbol, data_names, label_names).

  • default_bucket_key (str (or any python object)) – The key for the default bucket.

  • logger (Logger) –

  • context (Context or list of Context) – Defaults to mx.cpu()

  • work_load_list (list of number) – Defaults to None, indicating uniform workload.

  • fixed_param_names (list of str) – Defaults to None, indicating no network parameters are fixed.

  • state_names (list of str) – States are similar to data and label, but are not provided by the data iterator. Instead they are initialized to 0 and can be set by set_states().

  • group2ctxs (dict of str to context or list of context, or list of dict of str to context) – Default is None. Maps the ctx_group attribute to the context assignment.

  • compression_params (dict) – Specifies type of gradient compression and additional arguments depending on the type of compression being used. For example, 2bit compression requires a threshold. Arguments would then be {'type': '2bit', 'threshold': 0.5}. See mxnet.KVStore.set_gradient_compression method for more details on gradient compression.

Methods

backward([out_grads])

Backward computation.

bind(data_shapes[, label_shapes, …])

Binding for a BucketingModule means setting up the buckets and binding the executor for the default bucket key.

forward(data_batch[, is_train])

Forward computation.

get_input_grads([merge_multi_context])

Gets the gradients with respect to the inputs of the module.

get_outputs([merge_multi_context])

Gets outputs from a previous forward computation.

get_params()

Gets current parameters.

get_states([merge_multi_context])

Gets states from all devices.

init_optimizer([kvstore, optimizer, …])

Installs and initializes optimizers.

init_params([initializer, arg_params, …])

Initializes parameters.

install_monitor(mon)

Installs monitor on all executors.

load(prefix, epoch[, sym_gen, …])

Creates a model from previously saved checkpoint.

load_dict([sym_dict, sym_gen, …])

Creates a model from a dict mapping bucket_key to symbols and shared arg_params and aux_params.

prepare(data_batch[, sparse_row_id_fn])

Prepares the module for processing a data batch.

save_checkpoint(prefix, epoch[, remove_amp_cast])

Saves current progress to checkpoint for all buckets in BucketingModule. Use mx.callback.module_checkpoint as epoch_end_callback to save during training.

set_params(arg_params, aux_params[, …])

Assigns parameters and aux state values.

set_states([states, value])

Sets value for states.

switch_bucket(bucket_key, data_shapes[, …])

Switches to a different bucket.

update()

Updates parameters according to installed optimizer and the gradient computed in the previous forward-backward cycle.

update_metric(eval_metric, labels[, pre_sliced])

Evaluates and accumulates evaluation metric on outputs of the last forward computation.

Attributes

data_names

A list of names for data required by this module.

data_shapes

Gets data shapes.

label_shapes

Gets label shapes.

output_names

A list of names for the outputs of this module.

output_shapes

Gets output shapes.

symbol

The symbol of the current bucket being used.

backward(out_grads=None)[source]

Backward computation.

bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')[source]

Binding for a BucketingModule means setting up the buckets and binding the executor for the default bucket key. Executors corresponding to other keys are bound afterwards with switch_bucket.

Parameters
  • data_shapes (list of (str, tuple)) – This should correspond to the symbol for the default bucket.

  • label_shapes (list of (str, tuple)) – This should correspond to the symbol for the default bucket.

  • for_training (bool) – Default is True.

  • inputs_need_grad (bool) – Default is False.

  • force_rebind (bool) – Default is False.

  • shared_module (BucketingModule) – Default is None. This value is currently not used.

  • grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be ‘write’, ‘add’, or ‘null’ (default to ‘write’). Can be specified globally (str) or for each argument (list, dict).

  • bucket_key (str (or any python object)) – Bucket key for binding. Defaults to default_bucket_key.
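The three grad_req modes can be pictured with a small pure-Python sketch. It is only an illustration of the usual meaning of these modes (overwrite, accumulate, skip), not MXNet's executor code; the function name apply_grad and the argument names in the dict are hypothetical.

```python
def apply_grad(buffer, new_grad, grad_req="write"):
    """Return the gradient buffer after one backward pass (toy model)."""
    if grad_req == "write":
        # Overwrite whatever was stored before.
        return list(new_grad)
    if grad_req == "add":
        # Accumulate into the existing buffer.
        return [b + g for b, g in zip(buffer, new_grad)]
    if grad_req == "null":
        # No gradient is requested; the buffer is left untouched.
        return list(buffer)
    raise ValueError("grad_req must be 'write', 'add', or 'null'")


# A per-argument specification is a dict from argument name to mode.
# The argument names here are made up for illustration.
per_argument_req = {"fc_weight": "write", "data": "null"}
```

With 'add', the caller is responsible for resetting the buffer between accumulation windows, since nothing overwrites it.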

property data_names

A list of names for data required by this module.

property data_shapes

Gets data shapes.

Returns

A list of (name, shape) pairs.

forward(data_batch, is_train=None)[source]

Forward computation.

Parameters
  • data_batch (DataBatch) –

  • is_train (bool) – Defaults to None, in which case is_train is taken as self.for_training.

get_input_grads(merge_multi_context=True)[source]

Gets the gradients with respect to the inputs of the module.

Parameters

merge_multi_context (bool) – Defaults to True. When data parallelism is used, the outputs are collected from multiple devices. A True value indicates that the collected results should be merged so that they look as if they came from a single executor.

Returns

If merge_multi_context is True, it is like [grad1, grad2]. Otherwise, it is like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements are NDArray.

Return type

list of NDArrays or list of list of NDArrays
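The two return layouts can be sketched in plain Python, with lists standing in for NDArray. This assumes that merging means concatenating the per-device slices of each output along the batch axis; the helper name collect is hypothetical.

```python
def collect(per_device, merge_multi_context=True):
    """per_device holds one entry per output; each entry is a list with
    one slice per device, e.g. [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]].
    """
    if not merge_multi_context:
        # Keep the nested per-device layout unchanged.
        return per_device
    # Concatenate the device slices of each output along the batch axis.
    return [[row for dev_slice in slices for row in dev_slice]
            for slices in per_device]


# Two outputs, each split across two devices.
grads = [[[1, 2], [3, 4]], [[5], [6]]]
merged = collect(grads)                       # [grad1, grad2]
unmerged = collect(grads, merge_multi_context=False)
```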

get_outputs(merge_multi_context=True)[source]

Gets outputs from a previous forward computation.

Parameters

merge_multi_context (bool) – Defaults to True. When data parallelism is used, the outputs are collected from multiple devices. A True value indicates that the collected results should be merged so that they look as if they came from a single executor.

Returns

If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray.

Return type

list of NDArray or list of list of NDArray

get_params()[source]

Gets current parameters.

Returns

A pair of dictionaries each mapping parameter names to NDArray values.

Return type

(arg_params, aux_params)

get_states(merge_multi_context=True)[source]

Gets states from all devices.

Parameters

merge_multi_context (bool) – Default is True. When data parallelism is used, the states are collected from multiple devices. A True value indicates that the collected results should be merged so that they look as if they came from a single executor.

Returns

If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray.

Return type

list of NDArrays or list of list of NDArrays

init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)[source]

Installs and initializes optimizers.

Parameters
  • kvstore (str or KVStore) – Defaults to ‘local’.

  • optimizer (str or Optimizer) – Defaults to ‘sgd’

  • optimizer_params (dict) – Defaults to ((‘learning_rate’, 0.01),). The default value is a tuple of pairs rather than a dictionary only to avoid the pylint warning about dangerous (mutable) default values.

  • force_init (bool) – Defaults to False, indicating whether we should force re-initializing the optimizer in the case an optimizer is already installed.
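The tuple-of-pairs default for optimizer_params is a common Python pattern: an immutable default that is converted to a dict on entry. The sketch below shows only that pattern; init_optimizer_sketch is a hypothetical stand-in, not the module's actual method.

```python
def init_optimizer_sketch(optimizer="sgd",
                          optimizer_params=(("learning_rate", 0.01),)):
    # dict() accepts both a dict and an iterable of (key, value) pairs,
    # so callers may pass either form.
    params = dict(optimizer_params)
    return optimizer, params


default_call = init_optimizer_sketch()
custom_call = init_optimizer_sketch(
    optimizer_params={"learning_rate": 0.1, "momentum": 0.9})
```

Because dict() always builds a fresh dictionary, no state is shared between calls, which is the hazard the pylint warning is about.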

init_params(initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_init=False, allow_extra=False)[source]

Initializes parameters.

Parameters
  • initializer (Initializer) – Called to initialize parameters if needed.

  • arg_params (dict) – Defaults to None. Existing parameters. This has higher priority than initializer.

  • aux_params (dict) – Defaults to None. Existing auxiliary states. This has higher priority than initializer.

  • allow_missing (bool) – Allow missing values in arg_params and aux_params (if not None). In this case, missing values will be filled with initializer.

  • force_init (bool) – Defaults to False.

  • allow_extra (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If True, no error is thrown when arg_params or aux_params contain extra parameters that are not needed by the executor.
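The interaction between arg_params, initializer, allow_missing, and allow_extra described above can be sketched as a small resolution function. This is a simplified model of the documented rules, not the library's implementation; resolve_params, the float values, and the exception types are all assumptions made for illustration.

```python
def resolve_params(required, arg_params=None, initializer=None,
                   allow_missing=False, allow_extra=False):
    """Return a value for every required parameter name (toy model)."""
    if initializer is None:
        initializer = lambda name: 0.0  # hypothetical default initializer
    if arg_params is not None and not allow_extra:
        extra = sorted(set(arg_params) - set(required))
        if extra:
            raise ValueError("unexpected parameters: %s" % extra)
    resolved = {}
    for name in required:
        if arg_params is not None and name in arg_params:
            # Existing values take priority over the initializer.
            resolved[name] = arg_params[name]
        elif arg_params is not None and not allow_missing:
            raise KeyError("%s is missing from arg_params" % name)
        else:
            # Either no arg_params were given or missing values are allowed.
            resolved[name] = initializer(name)
    return resolved
```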

install_monitor(mon)[source]

Installs monitor on all executors.

property label_shapes

Gets label shapes.

Returns

The return value could be None if the module does not need labels, or if the module is not bound for training (in this case, label information is not available).

Return type

A list of (name, shape) pairs.

static load(prefix, epoch, sym_gen=None, default_bucket_key=None, **kwargs)[source]

Creates a model from previously saved checkpoint.

Parameters
  • prefix (str) – Path prefix of the saved model files. You should have “prefix-symbol.json”, “prefix-xxxx.params”, and optionally “prefix-xxxx.states”, where xxxx is the epoch number.

  • epoch (int) – Epoch to load.

  • sym_gen (function) – A function that, when called with a bucket key, returns a triple (symbol, data_names, label_names). Provide the same sym_gen that was used when saving the bucketing module.

  • logger (Logger) – Default is logging.

  • context (Context or list of Context) – Default is cpu().

  • work_load_list (list of number) – Default None, indicating uniform workload.

  • fixed_param_names (list of str) – Default None, indicating no network parameters are fixed.

  • state_names (list of str) – States are similar to data and label, but are not provided by the data iterator. Instead, they are initialized to 0 and can be set by set_states().

  • group2ctxs (dict of str to context or list of context, or list of dict of str to context) – Default is None. Maps the ctx_group attribute to the context assignment.

  • compression_params (dict) – Specifies the type of gradient compression and any additional arguments that depend on the type used. For example, 2bit compression requires a threshold, so the arguments would be {'type': '2bit', 'threshold': 0.5}. See the mxnet.KVStore.set_gradient_compression method for more details on gradient compression.
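The file naming described for prefix and epoch can be sketched as follows. The helper checkpoint_files is hypothetical, and the four-digit zero padding of the epoch is an assumption read off the "xxxx" placeholder above.

```python
def checkpoint_files(prefix, epoch):
    """Build the file names a checkpoint is expected to consist of."""
    return {
        "symbol": "%s-symbol.json" % prefix,
        # Assumption: the epoch is zero-padded to four digits ("xxxx").
        "params": "%s-%04d.params" % (prefix, epoch),
        "states": "%s-%04d.states" % (prefix, epoch),  # optional file
    }


files = checkpoint_files("model/rnn", 3)
```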

static load_dict(sym_dict=None, sym_gen=None, default_bucket_key=None, arg_params=None, aux_params=None, **kwargs)[source]

Creates a model from a dict mapping bucket_key to symbols and shared arg_params and aux_params.

Parameters
  • sym_dict (dict mapping bucket_key to symbol) – Dict mapping bucket key to symbol

  • sym_gen (function) – A function that, when called with a bucket key, returns a triple (symbol, data_names, label_names). Provide the same sym_gen that was used when saving the bucketing module.

  • default_bucket_key (str (or any python object)) – The key for the default bucket.

  • arg_params (dict) – Required for loading the BucketingModule. Dict of name to parameter ndarrays.

  • aux_params (dict) – Required for loading the BucketingModule. Dict of name to auxiliary state ndarrays.

  • logger (Logger) – Default is logging.

  • context (Context or list of Context) – Default is cpu().

  • work_load_list (list of number) – Default None, indicating uniform workload.

  • fixed_param_names (list of str) – Default None, indicating no network parameters are fixed.

  • state_names (list of str) – States are similar to data and label, but are not provided by the data iterator. Instead, they are initialized to 0 and can be set by set_states().

  • group2ctxs (dict of str to context or list of context, or list of dict of str to context) – Default is None. Maps the ctx_group attribute to the context assignment.

  • compression_params (dict) – Specifies the type of gradient compression and any additional arguments that depend on the type used. For example, 2bit compression requires a threshold, so the arguments would be {'type': '2bit', 'threshold': 0.5}. See the mxnet.KVStore.set_gradient_compression method for more details on gradient compression.

property output_names

A list of names for the outputs of this module.

property output_shapes

Gets output shapes.

Returns

A list of (name, shape) pairs.

prepare(data_batch, sparse_row_id_fn=None)[source]

Prepares the module for processing a data batch.

Usually involves switching bucket and reshaping. For modules that contain row_sparse parameters in KVStore, it prepares the row_sparse parameters based on the sparse_row_id_fn.

Parameters
  • data_batch (DataBatch) – The current batch of data for forward computation.

  • sparse_row_id_fn (A callback function) – The function takes data_batch as an input and returns a dict of str -> NDArray. The resulting dict is used for pulling row_sparse parameters from the kvstore, where the str key is the name of the param, and the value is the row id of the param to pull.
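A sparse_row_id_fn callback might look like the sketch below. It uses a namedtuple as a hypothetical stand-in for DataBatch and plain lists where real code would return NDArray values; the parameter name embed_weight is made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for DataBatch with the two fields used here.
Batch = namedtuple("Batch", ["data", "label"])


def embedding_row_ids(data_batch):
    """Return {param_name: row ids} for the rows this batch touches."""
    # Treat the first data array as a matrix of token ids and collect
    # the distinct ids, since only those embedding rows are needed.
    ids = sorted({int(i) for row in data_batch.data[0] for i in row})
    return {"embed_weight": ids}


batch = Batch(data=[[[3, 1], [1, 7]]], label=None)
row_ids = embedding_row_ids(batch)
```

Pulling only the listed rows is the point of the callback: a row_sparse parameter can be far larger than the set of rows a single batch uses.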

save_checkpoint(prefix, epoch, remove_amp_cast=False)[source]

Saves current progress to checkpoint for all buckets in BucketingModule. Use mx.callback.module_checkpoint as epoch_end_callback to save during training.

Parameters
  • prefix (str) – The file prefix to checkpoint to.

  • epoch (int) – The current epoch number.

set_params(arg_params, aux_params, allow_missing=False, force_init=True, allow_extra=False)[source]

Assigns parameters and aux state values.

Parameters
  • arg_params (dict) – Dictionary of name to value (NDArray) mapping.

  • aux_params (dict) – Dictionary of name to value (NDArray) mapping.

  • allow_missing (bool) – If true, params could contain missing values, and the initializer will be called to fill those missing params.

  • force_init (bool) – If true, will force re-initialize even if already initialized.

  • allow_extra (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If True, no error is thrown when arg_params or aux_params contain extra parameters that are not needed by the executor.

Examples

>>> # An example of setting module parameters.
>>> sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, n_epoch_load)
>>> mod.set_params(arg_params=arg_params, aux_params=aux_params)
set_states(states=None, value=None)[source]

Sets value for states. Only one of states and value can be specified.

Parameters
  • states (list of list of NDArrays) – Source states arrays formatted like [[state1_dev1, state1_dev2], [state2_dev1, state2_dev2]].

  • value (number) – A single scalar value for all state arrays.
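The rule that only one of states and value may be given can be sketched with nested lists standing in for per-device NDArrays. The function set_states_sketch and the choice of ValueError are assumptions for illustration, not the library's behavior.

```python
def set_states_sketch(current, states=None, value=None):
    """Return new state arrays laid out as
    [[state1_dev1, state1_dev2], [state2_dev1, state2_dev2]].
    """
    if (states is None) == (value is None):
        raise ValueError("specify exactly one of states or value")
    if states is not None:
        # Copy the provided source arrays device by device.
        return [[list(dev) for dev in per_state] for per_state in states]
    # Fill every existing state array with the single scalar value.
    return [[[value] * len(dev) for dev in per_state]
            for per_state in current]


current = [[[0.0, 0.0], [0.0, 0.0]]]   # one state, two devices
filled = set_states_sketch(current, value=1.0)
copied = set_states_sketch(current, states=[[[2.0, 3.0], [4.0, 5.0]]])
```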

switch_bucket(bucket_key, data_shapes, label_shapes=None)[source]

Switches to a different bucket. This will change self.curr_module.

Parameters
  • bucket_key (str (or any python object)) – The key of the target bucket.

  • data_shapes (list of (str, tuple)) – Typically data_batch.provide_data.

  • label_shapes (list of (str, tuple)) – Typically data_batch.provide_label.
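The relationship between bind, switch_bucket, and sym_gen can be sketched as a lazily filled cache keyed by bucket key. This is a toy model under stated assumptions: BucketCache and toy_sym_gen are hypothetical, and a string stands in for the Symbol that a real sym_gen would return.

```python
class BucketCache:
    """Toy model of per-bucket module management (not MXNet code)."""

    def __init__(self, sym_gen, default_bucket_key):
        self._sym_gen = sym_gen
        self._default_bucket_key = default_bucket_key
        self._buckets = {}
        self.curr_bucket_key = None

    def bind(self, data_shapes):
        # Binding only sets up the default bucket.
        return self.switch_bucket(self._default_bucket_key, data_shapes)

    def switch_bucket(self, bucket_key, data_shapes):
        if bucket_key not in self._buckets:
            # First use of this key: generate and remember its symbol.
            symbol, data_names, label_names = self._sym_gen(bucket_key)
            self._buckets[bucket_key] = {
                "symbol": symbol,
                "data_names": data_names,
                "label_names": label_names,
                "data_shapes": data_shapes,
            }
        self.curr_bucket_key = bucket_key
        return self._buckets[bucket_key]


def toy_sym_gen(seq_len):
    # A string stands in for an unrolled network of the given length.
    return ("unrolled_rnn_len_%d" % seq_len, ("data",), ("softmax_label",))


cache = BucketCache(toy_sym_gen, default_bucket_key=20)
cache.bind([("data", (32, 20))])
cache.switch_bucket(10, [("data", (32, 10))])
```

Switching back to a key that was seen before reuses the cached entry, so sym_gen runs once per distinct bucket.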

property symbol

The symbol of the current bucket being used.

update()[source]

Updates parameters according to installed optimizer and the gradient computed in the previous forward-backward cycle.

When KVStore is used to update parameters for multi-device or multi-machine training, a copy of the parameters is stored in KVStore. Note that for row_sparse parameters, this function updates the copy of the parameters in KVStore but does not broadcast the updated parameters to all devices / machines. Call prepare to broadcast row_sparse parameters with the next batch of data.

update_metric(eval_metric, labels, pre_sliced=False)[source]

Evaluates and accumulates evaluation metric on outputs of the last forward computation.

Parameters
  • eval_metric (EvalMetric) –

  • labels (list of NDArray) – Typically data_batch.label.

class mxnet.module.Module(symbol, data_names=('data', ), label_names=('softmax_label', ), logger=<module 'logging' from '/work/conda_env/lib/python3.8/logging/__init__.py'>, context=cpu(0), work_load_list=None, fixed_param_names=None, state_names=None, group2ctxs=None, compression_params=None)[source]

Bases: mxnet.module.base_module.BaseModule

Module is a basic module that wraps a Symbol. It is functionally the same as the FeedForward model, except under the module API.

Parameters
  • symbol (Symbol) –

  • data_names (list of str) – Defaults to (‘data’,) for a typical model used in image classification.

  • label_names (list of str) – Defaults to (‘softmax_label’,) for a typical model used in image classification.

  • logger (Logger) – Defaults to logging.

  • context (Context or list of Context) – Defaults to mx.cpu().

  • work_load_list (list of number) – Default None, indicating uniform workload.

  • fixed_param_names (list of str) – Default None, indicating no network parameters are fixed.

  • state_names (list of str) – States are similar to data and label, but are not provided by the data iterator. Instead, they are initialized to 0 and can be set by set_states().

  • group2ctxs (dict of str to context or list of context, or list of dict of str to context) – Default is None. Maps the ctx_group attribute to the context assignment.

  • compression_params (dict) – Specifies the type of gradient compression and any additional arguments that depend on the type used. For example, 2bit compression requires a threshold, so the arguments would be {'type': '2bit', 'threshold': 0.5}. See the mxnet.KVStore.set_gradient_compression method for more details on gradient compression.

Methods

backward([out_grads])

Backward computation.

bind(data_shapes[, label_shapes, …])

Binds the symbols to construct executors.

borrow_optimizer(shared_module)

Borrows optimizer from a shared module.

forward(data_batch[, is_train])

Forward computation.

get_input_grads([merge_multi_context])

Gets the gradients with respect to the inputs of the module.

get_outputs([merge_multi_context])

Gets outputs of the previous forward computation.

get_params()

Gets current parameters.

get_states([merge_multi_context])

Gets states from all devices.

init_optimizer([kvstore, optimizer, …])

Installs and initializes optimizers.

init_params([initializer, arg_params, …])

Initializes the parameters and auxiliary states.

install_monitor(mon)

Installs monitor on all executors.

load(prefix, epoch[, load_optimizer_states])

Creates a model from previously saved checkpoint.

load_optimizer_states(fname)

Loads optimizer (updater) state from a file.

prepare(data_batch[, sparse_row_id_fn])

Prepares the module for processing a data batch.

reshape(data_shapes[, label_shapes])

Reshapes the module for new input shapes.

save_checkpoint(prefix, epoch[, …])

Saves current progress to checkpoint.

save_optimizer_states(fname)

Saves optimizer (updater) state to a file.

set_params(arg_params, aux_params[, …])

Assigns parameter and aux state values.

set_states([states, value])

Sets value for states.

update()

Updates parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch.

update_metric(eval_metric, labels[, pre_sliced])

Evaluates and accumulates evaluation metric on outputs of the last forward computation.

Attributes

data_names

A list of names for data required by this module.

data_shapes

Gets data shapes.

label_names

A list of names for labels required by this module.

label_shapes

Gets label shapes.

output_names

A list of names for the outputs of this module.

output_shapes

Gets output shapes.

backward(out_grads=None)[source]

Backward computation.

Parameters

out_grads (NDArray or list of NDArray, optional) – Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.

bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')[source]

Binds the symbols to construct executors. This is necessary before one can perform computation with the module.

Parameters
  • data_shapes (list of (str, tuple)) – Typically is data_iter.provide_data.

  • label_shapes (list of (str, tuple)) – Typically is data_iter.provide_label.

  • for_training (bool) – Default is True. Whether the executors should be bound for training.

  • inputs_need_grad (bool) – Default is False. Whether the gradients to the input data need to be computed. Typically this is not needed. But this might be needed when implementing composition of modules.

  • force_rebind (bool) – Default is False. This function does nothing if the executors are already bound, unless force_rebind is True, in which case the executors are rebound.

  • shared_module (Module) – Default is None. This is used in bucketing. When not None, the shared module essentially corresponds to a different bucket – a module with different symbol but with the same sets of parameters (e.g. unrolled RNNs with different lengths).

borrow_optimizer(shared_module)[source]

Borrows optimizer from a shared module. Used in bucketing, where exactly the same optimizer (esp. kvstore) is used.

Parameters

shared_module (Module) –

property data_names

A list of names for data required by this module.

property data_shapes

Gets data shapes.

Returns

A list of (name, shape) pairs.

forward(data_batch, is_train=None)[source]

Forward computation. It supports data batches with different shapes, such as different batch sizes or different image sizes. If reshaping of data batch relates to modification of symbol or module, such as changing image layout ordering or switching from training to predicting, module rebinding is required.

Parameters
  • data_batch (DataBatch) – Could be anything with similar API implemented.

  • is_train (bool) – Default is None, which means is_train takes the value of self.for_training.

get_input_grads(merge_multi_context=True)[source]

Gets the gradients with respect to the inputs of the module.

If merge_multi_context is True, it is like [grad1, grad2]. Otherwise, it is like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements are NDArray.

Parameters

merge_multi_context (bool) – Default is True. When data parallelism is used, the outputs are collected from multiple devices. A True value indicates that the collected results should be merged so that they look as if they came from a single executor.

Returns

Input gradients

Return type

list of NDArray or list of list of NDArray

get_outputs(merge_multi_context=True)[source]

Gets outputs of the previous forward computation.

If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray. When merge_multi_context is False, those NDArray might live on different devices.

Parameters

merge_multi_context (bool) – Default is True. When data parallelism is used, the outputs are collected from multiple devices. A True value indicates that the collected results should be merged so that they look as if they came from a single executor.

Returns

Output.

Return type

list of NDArray or list of list of NDArray

get_params()[source]

Gets current parameters.

Returns

A pair of dictionaries each mapping parameter names to NDArray values.

Return type

(arg_params, aux_params)

get_states(merge_multi_context=True)[source]

Gets states from all devices.

If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray.

Parameters

merge_multi_context (bool) – Default is True. When data parallelism is used, the states are collected from multiple devices. A True value indicates that the collected results should be merged so that they look as if they came from a single executor.

Returns

States

Return type

list of NDArray or list of list of NDArray

init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)[source]

Installs and initializes optimizers.

Parameters
  • kvstore (str or KVStore) – Default ‘local’.

  • optimizer (str or Optimizer) – Default ‘sgd’

  • optimizer_params (dict) – Default ((‘learning_rate’, 0.01),). The default value is a tuple of pairs rather than a dictionary only to avoid the pylint warning about dangerous (mutable) default values.

  • force_init (bool) – Default False, indicating whether we should force re-initializing the optimizer in the case an optimizer is already installed.

init_params(initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_init=False, allow_extra=False)[source]

Initializes the parameters and auxiliary states.

Parameters
  • initializer (Initializer) – Called to initialize parameters if needed.

  • arg_params (dict) – If not None, should be a dictionary of existing arg_params. Initialization will be copied from that.

  • aux_params (dict) – If not None, should be a dictionary of existing aux_params. Initialization will be copied from that.

  • allow_missing (bool) – If True, params could contain missing values, and the initializer will be called to fill those missing params.

  • force_init (bool) – If True, will force re-initialize even if already initialized.

  • allow_extra (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If True, no error is thrown when arg_params or aux_params contain extra parameters that are not needed by the executor.

install_monitor(mon)[source]

Installs monitor on all executors.

property label_names

A list of names for labels required by this module.

property label_shapes

Gets label shapes.

Returns

The return value could be None if the module does not need labels, or if the module is not bound for training (in this case, label information is not available).

Return type

A list of (name, shape) pairs.

static load(prefix, epoch, load_optimizer_states=False, **kwargs)[source]

Creates a model from previously saved checkpoint.

Parameters
  • prefix (str) – Path prefix of the saved model files. You should have “prefix-symbol.json”, “prefix-xxxx.params”, and optionally “prefix-xxxx.states”, where xxxx is the epoch number.

  • epoch (int) – Epoch to load.

  • load_optimizer_states (bool) – Whether to load optimizer states. The checkpoint needs to have been made with save_optimizer_states=True.

  • data_names (list of str) – Default is (‘data’,) for a typical model used in image classification.

  • label_names (list of str) – Default is (‘softmax_label’,) for a typical model used in image classification.

  • logger (Logger) – Default is logging.

  • context (Context or list of Context) – Default is cpu().

  • work_load_list (list of number) – Default None, indicating uniform workload.

  • fixed_param_names (list of str) – Default None, indicating no network parameters are fixed.

load_optimizer_states(fname)[source]

Loads optimizer (updater) state from a file.

Parameters

fname (str) – Path to input states file.

property output_names

A list of names for the outputs of this module.

property output_shapes

Gets output shapes.

Returns

A list of (name, shape) pairs.

prepare(data_batch, sparse_row_id_fn=None)[source]

Prepares the module for processing a data batch.

Usually involves switching bucket and reshaping. For modules that contain row_sparse parameters in KVStore, it prepares the row_sparse parameters based on the sparse_row_id_fn.

When KVStore is used to update parameters for multi-device or multi-machine training, a copy of the parameters is stored in KVStore. Note that for row_sparse parameters, update() updates the copy of the parameters in KVStore but does not broadcast the updated parameters to all devices / machines. The prepare function is used to broadcast row_sparse parameters with the next batch of data.

Parameters
  • data_batch (DataBatch) – The current batch of data for forward computation.

  • sparse_row_id_fn (A callback function) – The function takes data_batch as an input and returns a dict of str -> NDArray. The resulting dict is used for pulling row_sparse parameters from the kvstore, where the str key is the name of the param, and the value is the row id of the param to pull.

reshape(data_shapes, label_shapes=None)[source]

Reshapes the module for new input shapes.

Parameters
  • data_shapes (list of (str, tuple)) – Typically is data_iter.provide_data.

  • label_shapes (list of (str, tuple)) – Typically is data_iter.provide_label.

save_checkpoint(prefix, epoch, save_optimizer_states=False, remove_amp_cast=True)[source]

Saves current progress to checkpoint. Use mx.callback.module_checkpoint as epoch_end_callback to save during training.

Parameters
  • prefix (str) – The file prefix to checkpoint to.

  • epoch (int) – The current epoch number.

  • save_optimizer_states (bool) – Whether to save optimizer states to continue training.

save_optimizer_states(fname)[source]

Saves optimizer (updater) state to a file.

Parameters

fname (str) – Path to output states file.

set_params(arg_params, aux_params, allow_missing=False, force_init=True, allow_extra=False)[source]

Assigns parameter and aux state values.

Parameters
  • arg_params (dict) – Dictionary of name to NDArray.

  • aux_params (dict) – Dictionary of name to NDArray.

  • allow_missing (bool) – If True, params could contain missing values, and the initializer will be called to fill those missing params.

  • force_init (bool) – If True, will force re-initialize even if already initialized.

  • allow_extra (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If True, no error is thrown when arg_params or aux_params contain extra parameters that are not needed by the executor.

Examples

>>> # An example of setting module parameters.
>>> sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, n_epoch_load)
>>> mod.set_params(arg_params=arg_params, aux_params=aux_params)
set_states(states=None, value=None)[source]

Sets value for states. Only one of states and value can be specified.

Parameters
  • states (list of list of NDArrays) – Source states arrays formatted like [[state1_dev1, state1_dev2], [state2_dev1, state2_dev2]].

  • value (number) – A single scalar value for all state arrays.

update()[source]

Updates parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch.

When KVStore is used to update parameters for multi-device or multi-machine training, a copy of the parameters is stored in KVStore. Note that for row_sparse parameters, this function updates the copy of the parameters in KVStore but does not broadcast the updated parameters to all devices / machines. Call prepare to broadcast row_sparse parameters with the next batch of data.

update_metric(eval_metric, labels, pre_sliced=False)[source]

Evaluates and accumulates evaluation metric on outputs of the last forward computation.

Parameters
  • eval_metric (EvalMetric) – Evaluation metric to use.

  • labels (list of NDArray if pre_sliced is False, list of lists of NDArray otherwise) – Typically data_batch.label.

  • pre_sliced (bool) – Whether the labels are already sliced per device (default: False).
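The difference between the two label layouts can be sketched with plain lists. The sketch assumes that in the pre-sliced layout the outer list indexes devices and that the batch divides evenly among them; slice_labels_per_device is a hypothetical helper, not part of the module API.

```python
def slice_labels_per_device(labels, num_devices):
    """Turn [label1, label2] into one list of label slices per device."""
    batch_size = len(labels[0])
    step = batch_size // num_devices  # assumes an even split
    return [
        [label[d * step:(d + 1) * step] for label in labels]
        for d in range(num_devices)
    ]


labels = [[0, 1, 2, 3]]                       # pre_sliced=False layout
pre_sliced_labels = slice_labels_per_device(labels, 2)
```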

class mxnet.module.PythonLossModule(name='pyloss', data_names=('data', ), label_names=('softmax_label', ), logger=<module 'logging' from '/work/conda_env/lib/python3.8/logging/__init__.py'>, grad_func=None)[source]

Bases: mxnet.module.python_module.PythonModule

A convenient module class that implements many of the module APIs as empty functions.

Parameters
  • name (str) – Name of the module. The outputs will be named [name + ‘_output’].

  • data_names (list of str) – Defaults to ['data']. Names of the data expected by this module. Should be a list of only one name.

  • label_names (list of str) – Default ['softmax_label']. Names of the labels expected by the module. Should be a list of only one name.

  • grad_func (function) – Optional. If not None, should be a function that takes scores and labels, both of type NDArray, and returns the gradients with respect to the scores according to this loss function. The return value can be a numpy array or an NDArray.

Methods

backward([out_grads])

Backward computation.

forward(data_batch[, is_train])

Forward computation.

get_input_grads([merge_multi_context])

Gets the gradients to the inputs, computed in the previous backward computation.

get_outputs([merge_multi_context])

Gets outputs of the previous forward computation.

install_monitor(mon)

Installs monitor on all executors.

backward(out_grads=None)[source]

Backward computation.

Parameters

out_grads (NDArray or list of NDArray, optional) – Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.

forward(data_batch, is_train=None)[source]

Forward computation. This does nothing except keep a reference to the scores and the labels so that backward computation can be performed later.

Parameters
  • data_batch (DataBatch) – Could be anything with similar API implemented.

  • is_train (bool) – Default is None, which means is_train takes the value of self.for_training.

get_input_grads(merge_multi_context=True)[source]

Gets the gradients to the inputs, computed in the previous backward computation.

Parameters

merge_multi_context (bool) – Should always be True, because we do not use multiple contexts for computation.

get_outputs(merge_multi_context=True)[source]

Gets outputs of the previous forward computation. As an output loss module, we treat the inputs to this module as scores and simply return them.

Parameters

merge_multi_context (bool) – Should always be True, because we do not use multiple contexts for computing.
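Taken together, forward, get_outputs, backward and get_input_grads describe a simple flow: store the scores and labels, hand the scores back as outputs, and compute input gradients on request. The class below is a toy stand-in written in plain Python to sketch that flow. ToyLossModule and sign_grad are hypothetical names, and nothing here is MXNet's actual implementation.

```python
class ToyLossModule:
    """Hypothetical stand-in that mirrors the flow described above."""

    def __init__(self, grad_func=None):
        self._grad_func = grad_func
        self._scores = None
        self._labels = None
        self._scores_grad = None

    def forward(self, scores, labels):
        # Nothing is computed; only references are kept for backward().
        self._scores = scores
        self._labels = labels

    def get_outputs(self):
        # The inputs to a loss module are treated as scores and returned as-is.
        return [self._scores]

    def backward(self):
        if self._grad_func is None:
            raise NotImplementedError("no grad_func was supplied")
        self._scores_grad = self._grad_func(self._scores, self._labels)

    def get_input_grads(self):
        return [self._scores_grad]


def sign_grad(scores, labels):
    # Illustrative gradient of an absolute-error loss: sign(score - label).
    return [float((s > y) - (s < y)) for s, y in zip(scores, labels)]


mod = ToyLossModule(grad_func=sign_grad)
mod.forward([3.0, 1.0, 0.0], [1.0, 1.0, 2.0])
mod.backward()
print(mod.get_outputs(), mod.get_input_grads())
```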

install_monitor(mon)[source]

Installs monitor on all executors.

class mxnet.module.PythonModule(data_names, label_names, output_names, logger=logging)[source]

Bases: mxnet.module.base_module.BaseModule

A convenient module class that implements many of the module APIs as empty functions.

Parameters
  • data_names (list of str) – Names of the data expected by the module.

  • label_names (list of str) – Names of the labels expected by the module. Could be None if the module does not need labels.

  • output_names (list of str) – Names of the outputs.

Methods

bind(data_shapes[, label_shapes, …])

Binds the symbols to construct executors.

get_params()

Gets parameters, which are potentially copies of the actual parameters used to do computation on the device.

init_optimizer([kvstore, optimizer, …])

Installs and initializes optimizers.

init_params([initializer, arg_params, …])

Initializes the parameters and auxiliary states.

update()

Updates parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch.

update_metric(eval_metric, labels[, pre_sliced])

Evaluates and accumulates evaluation metric on outputs of the last forward computation.

Attributes

data_names

A list of names for data required by this module.

data_shapes

A list of (name, shape) pairs specifying the data inputs to this module.

label_shapes

A list of (name, shape) pairs specifying the label inputs to this module.

output_names

A list of names for the outputs of this module.

output_shapes

A list of (name, shape) pairs specifying the outputs of this module.

bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')[source]

Binds the symbols to construct executors. This is necessary before one can perform computation with the module.

Parameters
  • data_shapes (list of (str, tuple)) – Typically is data_iter.provide_data.

  • label_shapes (list of (str, tuple)) – Typically is data_iter.provide_label.

  • for_training (bool) – Default is True. Whether the executors should be bound for training.

  • inputs_need_grad (bool) – Default is False. Whether the gradients to the input data need to be computed. Typically this is not needed, but it might be needed when implementing composition of modules.

  • force_rebind (bool) – Default is False. This function does nothing if the executors are already bound. If True, the executors are forced to rebind.

  • shared_module (Module) – Default is None. This is used in bucketing. When not None, the shared module essentially corresponds to a different bucket – a module with a different symbol but the same set of parameters (e.g. unrolled RNNs with different lengths).

  • grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be ‘write’, ‘add’, or ‘null’ (defaults to ‘write’). Can be specified globally (str) or for each argument (list, dict).
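A rough way to picture the three grad_req values is as different rules for combining a newly computed gradient with the existing gradient buffer. The helper below, apply_grad_req, is hypothetical and written in plain Python. It only illustrates the intended semantics and is not how MXNet implements them.

```python
def apply_grad_req(buffer, new_grad, grad_req="write"):
    """Return the gradient buffer after one backward pass (illustrative only)."""
    if grad_req == "write":
        # Overwrite whatever was stored before.
        return list(new_grad)
    if grad_req == "add":
        # Accumulate onto the existing buffer.
        return [b + g for b, g in zip(buffer, new_grad)]
    if grad_req == "null":
        # No gradient is requested; the buffer is left untouched.
        return list(buffer)
    raise ValueError("unknown grad_req: %r" % (grad_req,))


buf = [1.0, 1.0]
written = apply_grad_req(buf, [0.5, 2.0], "write")
added = apply_grad_req(buf, [0.5, 2.0], "add")
skipped = apply_grad_req(buf, [0.5, 2.0], "null")
print(written, added, skipped)
```

With ‘add’, the buffer keeps growing across backward passes until it is reset, which is why ‘write’ is the usual choice when each batch should start from a clean gradient.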

property data_names

A list of names for data required by this module.

property data_shapes

A list of (name, shape) pairs specifying the data inputs to this module.

get_params()[source]

Gets parameters, which are potentially copies of the actual parameters used to do computation on the device. Subclasses should override this method if they contain parameters.

Returns

A pair of empty dicts.

Return type

({}, {})

init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)[source]

Installs and initializes optimizers. By default we do nothing. Subclasses should override this method if needed.

Parameters
  • kvstore (str or KVStore) – Default ‘local’.

  • optimizer (str or Optimizer) – Default ‘sgd’.

  • optimizer_params (dict) – Default (('learning_rate', 0.01),). The default value is a tuple of pairs rather than a dictionary, only to avoid the pylint warning about dangerous (mutable) default values.

  • force_init (bool) – Default False, indicating whether we should force re-initializing the optimizer in the case an optimizer is already installed.
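The tuple-of-pairs default for optimizer_params converts directly into a dictionary, so callers may pass either form. The function below is a hypothetical sketch of that pattern in plain Python. normalize_optimizer_params is not an MXNet function, and whether the library performs exactly this conversion is an assumption.

```python
def normalize_optimizer_params(optimizer_params=(("learning_rate", 0.01),)):
    """Return a fresh dict built from a dict or a tuple of (key, value) pairs.

    Hypothetical helper: an immutable tuple as the default avoids sharing one
    mutable dict between calls.
    """
    return dict(optimizer_params)


defaults = normalize_optimizer_params()
custom = normalize_optimizer_params({"learning_rate": 0.1, "momentum": 0.9})
defaults["learning_rate"] = 1.0  # mutating the result does not affect later calls
print(normalize_optimizer_params(), custom)
```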

init_params(initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_init=False, allow_extra=False)[source]

Initializes the parameters and auxiliary states. By default this function does nothing. Subclasses should override this method if they contain parameters.

Parameters
  • initializer (Initializer) – Called to initialize parameters if needed.

  • arg_params (dict) – If not None, should be a dictionary of existing arg_params. Initialization will be copied from that.

  • aux_params (dict) – If not None, should be a dictionary of existing aux_params. Initialization will be copied from that.

  • allow_missing (bool) – If True, params could contain missing values, and the initializer will be called to fill those missing params.

  • force_init (bool) – If True, will force re-initialize even if already initialized.

  • allow_extra (bool, optional) – Whether to allow extra parameters that are not needed by the symbol. If True, no error is thrown when arg_params or aux_params contain extra parameters that are not needed by the executor.

property label_shapes

A list of (name, shape) pairs specifying the label inputs to this module. If this module does not accept labels, either because it has no loss function or because it is not bound for training, this should return an empty list [].

property output_names

A list of names for the outputs of this module.

property output_shapes

A list of (name, shape) pairs specifying the outputs of this module.

update()[source]

Updates parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch. Currently we do nothing here. Subclasses should override this method if they contain parameters.

update_metric(eval_metric, labels, pre_sliced=False)[source]

Evaluates and accumulates evaluation metric on outputs of the last forward computation. Subclasses should override this method if needed.

Parameters
  • eval_metric (EvalMetric) –

  • labels (list of NDArray) – Typically data_batch.label.

class mxnet.module.SequentialModule(logger=logging)[source]

Bases: mxnet.module.base_module.BaseModule

A SequentialModule is a container module that can chain multiple modules together.

Note

Building a computation graph with this kind of imperative container is less flexible and less efficient than using the symbolic graph, so it should be used only as a handy utility.

Methods

add(module, **kwargs)

Add a module to the chain.

backward([out_grads])

Backward computation.

bind(data_shapes[, label_shapes, …])

Binds the symbols to construct executors.

forward(data_batch[, is_train])

Forward computation.

get_input_grads([merge_multi_context])

Gets the gradients with respect to the inputs of the module.

get_outputs([merge_multi_context])

Gets outputs from a previous forward computation.

get_params()

Gets current parameters.

init_optimizer([kvstore, optimizer, …])

Installs and initializes optimizers.

init_params([initializer, arg_params, …])

Initializes parameters.

install_monitor(mon)

Installs monitor on all executors.

update()

Updates parameters according to the installed optimizer and the gradients computed in the previous forward-backward cycle.

update_metric(eval_metric, labels[, pre_sliced])

Evaluates and accumulates evaluation metric on outputs of the last forward computation.

Attributes

data_names

A list of names for data required by this module.

data_shapes

Gets data shapes.

label_shapes

Gets label shapes.

output_names

A list of names for the outputs of this module.

output_shapes

Gets output shapes.

add(module, **kwargs)[source]

Add a module to the chain.

Parameters
  • module (BaseModule) – The new module to add.

  • kwargs (**keywords) –

    All the keyword arguments are saved as meta information for the added module. The currently known meta includes:

    • take_labels: indicates whether the module expects to take labels when doing computation. Note that any module in the chain can take labels (not necessarily only the topmost one), and they all take the same labels passed from the original data batch for the SequentialModule.

Returns

This function returns self to allow us to easily chain a series of add calls.

Return type

self
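Returning self is what allows add calls to be written back to back. The toy container below is a hypothetical stand-in (ToyChain is not an MXNet class) that sketches this contract in plain Python, including how keyword arguments such as take_labels could be kept as per-module meta information.

```python
class ToyChain:
    """Hypothetical container illustrating the add() contract described above."""

    def __init__(self):
        self._modules = []
        self._metas = []

    def add(self, module, **kwargs):
        self._modules.append(module)
        # Keyword arguments are stored as meta information for this module.
        self._metas.append(dict(kwargs))
        return self  # returning self is what makes chained calls possible


# Strings stand in for real modules in this sketch.
chain = ToyChain().add("mod1").add("mod2", take_labels=True)
print(chain._modules, chain._metas)
```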

Examples

>>> # An example of adding two modules to a chain.
>>> seq_mod = mx.mod.SequentialModule()
>>> seq_mod.add(mod1)
>>> seq_mod.add(mod2)

backward(out_grads=None)[source]

Backward computation.

bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')[source]

Binds the symbols to construct executors. This is necessary before one can perform computation with the module.

Parameters
  • data_shapes (list of (str, tuple)) – Typically is data_iter.provide_data.

  • label_shapes (list of (str, tuple)) – Typically is data_iter.provide_label.

  • for_training (bool) – Default is True. Whether the executors should be bound for training.

  • inputs_need_grad (bool) – Default is False. Whether the gradients to the input data need to be computed. Typically this is not needed, but it might be needed when implementing composition of modules.

  • force_rebind (bool) – Default is False. This function does nothing if the executors are already bound. If True, the executors are forced to rebind.

  • shared_module (Module) – Default is None. Currently shared module is not supported for SequentialModule.

  • grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be ‘write’, ‘add’, or ‘null’ (defaults to ‘write’). Can be specified globally (str) or for each argument (list, dict).

property data_names

A list of names for data required by this module.

property data_shapes

Gets data shapes.

Returns

A list of (name, shape) pairs. The data shapes of the first module are the data shapes of the SequentialModule.

Return type

list

forward(data_batch, is_train=None)[source]

Forward computation.

Parameters
  • data_batch (DataBatch) –

  • is_train (bool) – Default is None, in which case is_train is taken as self.for_training.

get_input_grads(merge_multi_context=True)[source]

Gets the gradients with respect to the inputs of the module.

Parameters

merge_multi_context (bool) – Default is True. In the case when data-parallelism is used, the outputs will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look as if they came from a single executor.

Returns

If merge_multi_context is True, it is like [grad1, grad2]. Otherwise, it is like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements are NDArray.

Return type

list of NDArray or list of list of NDArray

get_outputs(merge_multi_context=True)[source]

Gets outputs from a previous forward computation.

Parameters

merge_multi_context (bool) – Default is True. In the case when data-parallelism is used, the outputs will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look as if they came from a single executor.

Returns

If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray.

Return type

list of NDArray or list of list of NDArray
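To make the two return layouts concrete, the sketch below merges per-device results for each output. It assumes that merging means concatenating the per-device slices along the batch axis, and it uses nested Python lists instead of NDArray. merge_device_slices is a hypothetical helper, not an MXNet function.

```python
def merge_device_slices(per_output):
    """Concatenate each output's per-device slices along the batch axis.

    per_output looks like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]],
    where every slice is a list of rows (one row per example).
    """
    merged = []
    for device_slices in per_output:
        rows = []
        for device_slice in device_slices:
            rows.extend(device_slice)
        merged.append(rows)
    return merged


unmerged = [
    [[[0.1, 0.9]], [[0.8, 0.2]]],  # out1: one example on each of two devices
    [[[1.0]], [[2.0]]],            # out2: same split across devices
]
merged = merge_device_slices(unmerged)
print(merged)
```

After merging there is one entry per output, each covering the whole batch, which corresponds to the [out1, out2] layout.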

get_params()[source]

Gets current parameters.

Returns

A pair of dictionaries, each mapping parameter names to NDArray values. Each dictionary merges the corresponding parameters of all the modules in the chain.

Return type

(arg_params, aux_params)
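The merged pair can be pictured as the union of each module's own (arg_params, aux_params) pair. The sketch below uses plain dictionaries with placeholder values. merge_params and the parameter names are hypothetical, and the sketch assumes that parameter names are unique across modules.

```python
def merge_params(modules_params):
    """Merge (arg_params, aux_params) pairs from several modules into one pair."""
    arg_params, aux_params = {}, {}
    for mod_args, mod_auxs in modules_params:
        arg_params.update(mod_args)
        aux_params.update(mod_auxs)
    return arg_params, aux_params


# Integers stand in for NDArray values in this sketch.
args, auxs = merge_params([
    ({"fc1_weight": 1}, {}),
    ({"fc2_weight": 2}, {"bn_moving_mean": 0}),
])
print(args, auxs)
```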

init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)[source]

Installs and initializes optimizers.

Parameters
  • kvstore (str or KVStore) – Default ‘local’.

  • optimizer (str or Optimizer) – Default ‘sgd’.

  • optimizer_params (dict) – Default (('learning_rate', 0.01),). The default value is a tuple of pairs rather than a dictionary, only to avoid the pylint warning about dangerous (mutable) default values.

  • force_init (bool) – Default False, indicating whether we should force re-initializing the optimizer in the case an optimizer is already installed.

init_params(initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_init=False, allow_extra=False)[source]

Initializes parameters.

Parameters
  • initializer (Initializer) –

  • arg_params (dict) – Default None. Existing parameters. This has higher priority than initializer.

  • aux_params (dict) – Default None. Existing auxiliary states. This has higher priority than initializer.

  • allow_missing (bool) – Allow missing values in arg_params and aux_params (if not None). In this case, missing values will be filled with initializer.

  • force_init (bool) – Default False.

  • allow_extra (bool, optional) – Whether to allow extra parameters that are not needed by the symbol. If True, no error is thrown when arg_params or aux_params contain extra parameters that are not needed by the executor.
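The interaction between arg_params, allow_missing and allow_extra can be sketched as a small resolution routine. The code below is a toy in plain Python. resolve_params, the parameter names and the error types are assumptions for illustration and do not reproduce MXNet's implementation.

```python
def resolve_params(required, initializer, arg_params=None,
                   allow_missing=False, allow_extra=False):
    """Toy resolution of parameter values (not MXNet's implementation).

    required: iterable of parameter names the model needs.
    initializer: callable mapping a name to an initial value.
    """
    required = list(required)
    if arg_params is None:
        # Nothing supplied: every parameter comes from the initializer.
        return {name: initializer(name) for name in required}

    extra = sorted(set(arg_params) - set(required))
    if extra and not allow_extra:
        raise ValueError("unexpected parameters: %s" % extra)

    resolved = {}
    for name in required:
        if name in arg_params:
            # Supplied values take priority over the initializer.
            resolved[name] = arg_params[name]
        elif allow_missing:
            resolved[name] = initializer(name)
        else:
            raise KeyError("%s is missing" % name)
    return resolved


params = resolve_params(
    ["fc_weight", "fc_bias"],
    initializer=lambda name: 0.0,
    arg_params={"fc_weight": 0.5, "unused": 1.0},
    allow_missing=True,
    allow_extra=True,
)
print(params)
```

In this sketch, dropping allow_extra=True makes the call fail on the "unused" entry, and dropping allow_missing=True makes it fail on "fc_bias".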

install_monitor(mon)[source]

Installs monitor on all executors.

property label_shapes

Gets label shapes.

Returns

A list of (name, shape) pairs. The return value could be None if the module does not need labels, or if the module is not bound for training (in this case, label information is not available).

Return type

list

property output_names

A list of names for the outputs of this module.

property output_shapes

Gets output shapes.

Returns

A list of (name, shape) pairs. The output shapes of the last module are the output shapes of the SequentialModule.

Return type

list

update()[source]

Updates parameters according to the installed optimizer and the gradients computed in the previous forward-backward cycle.

update_metric(eval_metric, labels, pre_sliced=False)[source]

Evaluates and accumulates evaluation metric on outputs of the last forward computation.

Parameters
  • eval_metric (EvalMetric) –

  • labels (list of NDArray) – Typically data_batch.label.
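The word "accumulates" matters here: each call adds the current batch to running totals rather than replacing them. The class below is a toy metric written for illustration. ToyAccuracy is hypothetical and is not one of MXNet's EvalMetric classes; labels are plain integers and predictions are lists of per-class scores.

```python
class ToyAccuracy:
    """Hypothetical metric that accumulates across batches."""

    def __init__(self):
        self.num_correct = 0
        self.num_seen = 0

    def update(self, labels, preds):
        for label, scores in zip(labels, preds):
            # The predicted class is the index of the highest score.
            predicted = max(range(len(scores)), key=scores.__getitem__)
            self.num_correct += int(predicted == label)
            self.num_seen += 1

    def get(self):
        return self.num_correct / self.num_seen


metric = ToyAccuracy()
metric.update([1, 0], [[0.2, 0.8], [0.3, 0.7]])  # first batch: 1 of 2 correct
metric.update([1], [[0.1, 0.9]])                  # second batch: 1 of 1 correct
print(metric.get())
```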