Executor and Executor Manager

The executor and executor manager are internal classes for managing symbolic graph execution. This document is intended as a reference for advanced users only.

Executor

  • Executor – Executor is the object providing efficient symbolic graph execution and optimization.

Executor Manager

  • DataParallelExecutorGroup – A group of executors living on different devices, for data parallelization.
  • DataParallelExecutorManager – Helper class to manage multiple executors for data parallelism.

API Reference

Symbolic Executor component of MXNet.

class mxnet.executor.Executor(handle, symbol, ctx, grad_req, group2ctx)[source]

Executor is the object providing efficient symbolic graph execution and optimization.

Examples

>>> # typical approach to create an executor is to bind symbol
>>> a = mx.sym.Variable('a')
>>> b = mx.sym.Variable('b')
>>> c = 2 * a + b
>>> texec = c.bind(mx.cpu(), {'a': mx.nd.array([1,2]), 'b':mx.nd.array([2,3])})
forward(is_train=False, **kwargs)[source]

Calculate the outputs specified by the bound symbol.

Parameters:
  • is_train (bool, optional) – Whether this forward pass is for training. If True, a backward call is expected to follow.
  • **kwargs – Additional specification of input arguments.

Examples

>>> # doing forward by specifying data
>>> texec.forward(is_train=True, data=mydata)
>>> # doing forward by not specifying things, but copy to the executor before hand
>>> mydata.copyto(texec.arg_dict['data'])
>>> texec.forward(is_train=True)
>>> # doing forward by specifying data and get outputs
>>> outputs = texec.forward(is_train=True, data=mydata)
>>> print(outputs[0].asnumpy())
backward(out_grads=None, is_train=True)[source]

Do backward pass to get the gradient of arguments.

Parameters:
  • out_grads (NDArray or list of NDArray or dict of str to NDArray, optional) – Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.
  • is_train (bool, default True) – Whether this backward is for training or inference. Note that in rare cases you may want to call backward with is_train=False to get gradients during inference.

Examples

>>> # Example for binding on loss function symbol, which gives the loss value of the model.
>>> # Equivalently it gives the head gradient for backward pass.
>>> # In this example the built-in SoftmaxOutput is used as loss function.
>>> # MakeLoss can be used to define customized loss function symbol.
>>> net = mx.sym.Variable('data')
>>> net = mx.sym.FullyConnected(net, name='fc', num_hidden=6)
>>> net = mx.sym.Activation(net, name='relu', act_type="relu")
>>> net = mx.sym.SoftmaxOutput(net, name='softmax')
>>> args =  {'data': mx.nd.ones((1, 4)), 'fc_weight': mx.nd.ones((6, 4)),
>>>          'fc_bias': mx.nd.array((1, 4, 4, 4, 5, 6)), 'softmax_label': mx.nd.ones((1))}
>>> args_grad = {'fc_weight': mx.nd.zeros((6, 4)), 'fc_bias': mx.nd.zeros((6))}
>>> texec = net.bind(ctx=mx.cpu(), args=args, args_grad=args_grad)
>>> out = texec.forward(is_train=True)[0].copy()
>>> print(out.asnumpy())
[[ 0.00378404  0.07600445  0.07600445  0.07600445  0.20660152  0.5616011 ]]
>>> texec.backward()
>>> print(texec.grad_arrays[1].asnumpy())
[[ 0.00378404  0.00378404  0.00378404  0.00378404]
 [-0.92399555 -0.92399555 -0.92399555 -0.92399555]
 [ 0.07600445  0.07600445  0.07600445  0.07600445]
 [ 0.07600445  0.07600445  0.07600445  0.07600445]
 [ 0.20660152  0.20660152  0.20660152  0.20660152]
 [ 0.5616011   0.5616011   0.5616011   0.5616011 ]]
>>>
>>> # Example for binding on non-loss function symbol.
>>> # Here the binding symbol is neither built-in loss function
>>> # nor customized loss created by MakeLoss.
>>> # As a result the head gradient is not automatically provided.
>>> a = mx.sym.Variable('a')
>>> b = mx.sym.Variable('b')
>>> # c is not a loss function symbol
>>> c = 2 * a + b
>>> args = {'a': mx.nd.array([1,2]), 'b':mx.nd.array([2,3])}
>>> args_grad = {'a': mx.nd.zeros((2)), 'b': mx.nd.zeros((2))}
>>> texec = c.bind(ctx=mx.cpu(), args=args, args_grad=args_grad)
>>> out = texec.forward(is_train=True)[0].copy()
>>> print(out.asnumpy())
[ 4.  7.]
>>> # out_grads is the head gradient in backward pass.
>>> # Here we define 'c' as loss function.
>>> # Then 'out' is passed as head gradient of backward pass.
>>> texec.backward(out)
>>> print(texec.grad_arrays[0].asnumpy())
[ 8.  14.]
>>> print(texec.grad_arrays[1].asnumpy())
[ 4.  7.]
set_monitor_callback(callback)[source]

Install callback for monitor.

Parameters:callback (function) – Takes a string and an NDArrayHandle.

Examples

>>> def mon_callback(*args, **kwargs):
>>>     print("Do your stuff here.")
>>>
>>> texec.set_monitor_callback(mon_callback)
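
For reference, the callback receives the name of the array being monitored together with a raw NDArrayHandle. A minimal sketch with an explicit two-argument signature (hypothetical; it only logs the name and ignores the handle):

>>> def named_callback(name, array_handle):
>>>     print('monitoring %s' % name)
>>>
>>> texec.set_monitor_callback(named_callback)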
arg_dict

Get dictionary representation of argument arrays.

Returns:arg_dict – The dictionary that maps the names of arguments to NDArrays.
Return type:dict of str to NDArray
Raises:ValueError – If there are duplicated names in the arguments.
grad_dict

Get dictionary representation of gradient arrays.

Returns:grad_dict – The dictionary that maps names of arguments to gradient arrays.
Return type:dict of str to NDArray
aux_dict

Get dictionary representation of auxiliary states arrays.

Returns:aux_dict – The dictionary that maps names of auxiliary states to NDArrays.
Return type:dict of str to NDArray
Raises:ValueError – If there are duplicated names in the auxiliary states.
output_dict

Get dictionary representation of output arrays.

Returns:output_dict – The dictionary that maps names of outputs to NDArrays.
Return type:dict of str to NDArray
Raises:ValueError – If there are duplicated names in the outputs.
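
The dictionaries above make the executor's arrays reachable by name. A minimal sketch, reusing the executor texec bound in the first example on this page (so 'a' and 'b' are its only arguments):

>>> print(sorted(texec.arg_dict.keys()))
['a', 'b']
>>> # overwrite the input named 'a' in place, then recompute the outputs
>>> texec.arg_dict['a'][:] = mx.nd.array([2, 3])
>>> print(texec.forward()[0].asnumpy())
[ 6.  9.]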
copy_params_from(arg_params, aux_params=None, allow_extra_params=False)[source]

Copy parameters from arg_params, aux_params into the executor's internal arrays.

Parameters:
  • arg_params (dict of str to NDArray) – Parameters, dict of name to NDArray of arguments.
  • aux_params (dict of str to NDArray, optional) – Parameters, dict of name to NDArray of auxiliary states.
  • allow_extra_params (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If this is True, no error will be thrown when arg_params or aux_params contain extra parameters that are not needed by the executor.
Raises:

ValueError – If there are additional parameters in the dict but allow_extra_params=False.

Examples

>>> # set parameters with existing model checkpoint
>>> model_prefix = 'mx_mlp'
>>> sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, 0)
>>> texec.copy_params_from(arg_params, aux_params)
reshape(partial_shaping=False, allow_up_sizing=False, **kwargs)[source]

Return a new executor with the same symbol and shared memory, but different input/output shapes. This is useful for runtime reshaping, e.g. for variable-length sequences. The returned executor shares state with the current one, and cannot be used in parallel with it.

Parameters:
  • partial_shaping (bool) – Whether to allow changing the shape of unspecified arguments.
  • allow_up_sizing (bool) – Whether to allow allocating new NDArrays that are larger than the original.
  • kwargs (dict of string to tuple of int) – New shape for arguments.
Returns:

exec – A new executor that shares memory with self.

Return type:

Executor

Examples

>>> a = mx.sym.Variable('a')
>>> b = mx.sym.Variable('b')
>>> c = 2 * a + b
>>> texec = c.bind(mx.cpu(), {'a': mx.nd.zeros((2, 1)), 'b': mx.nd.ones((2,1))})
>>> new_shape = {'a': (4, 2), 'b': (4, 2)}
>>> texec.reshape(allow_up_sizing=True, **new_shape)
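
Note that reshape returns a new executor rather than modifying texec in place. A minimal usage sketch continuing the example above:

>>> new_exec = texec.reshape(allow_up_sizing=True, **new_shape)
>>> print(new_exec.arg_dict['a'].shape)
(4, 2)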
debug_str()[source]

Get a debug string about internal execution plan.

Returns:debug_str – Debug string of the executor.
Return type:string

Examples

>>> a = mx.sym.Variable('a')
>>> b = mx.sym.sin(a)
>>> c = 2 * a + b
>>> texec = c.bind(mx.cpu(), {'a': mx.nd.array([1,2]), 'b':mx.nd.array([2,3])})
>>> print(texec.debug_str())
Symbol Outputs:
            output[0]=_plus0(0)
Variable:a
--------------------
Op:_mul_scalar, Name=_mulscalar0
Inputs:
        arg[0]=a(0) version=0
Attrs:
        scalar=2
--------------------
Op:sin, Name=sin0
Inputs:
        arg[0]=a(0) version=0
--------------------
Op:elemwise_add, Name=_plus0
Inputs:
        arg[0]=_mulscalar0(0)
        arg[1]=sin0(0)
Total 0 MB allocated
Total 11 TempSpace resource requested

Executor manager.

class mxnet.executor_manager.DataParallelExecutorGroup(sym, arg_names, param_names, ctx, slices, train_data, shared_group=None)[source]

A group of executors living on different devices, for data parallelization.

Parameters:
  • sym (Symbol) – The network configuration.
  • arg_names (list of str) – Equals sym.list_arguments().
  • param_names (list of str) – List of names of all trainable parameters.
  • ctx (list of Context) – List of devices for training (data parallelization).
  • slices (list of int) – Describes how the data parallelization splits data into different devices.
  • train_data (DataIter (or DataBatch)) – The dataset for training. It could be any object with provide_data and provide_label properties. Loading of actual data is not necessarily needed at this stage.
  • shared_group (DataParallelExecutorGroup, optional) – An existing executor group with which to share parameters, if any.
load_data_batch(data_batch)[source]

Load data and labels into arrays.

forward(is_train=False)[source]

Perform a forward pass on each executor.

backward()[source]

Perform a backward pass on each executor.

update_metric(metric, labels)[source]

Update evaluation metric with label and current outputs.
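
Taken together, the per-batch cycle looks as follows. This is a minimal sketch on an already constructed group; exec_group, train_data and metric are assumed to exist, and the group itself is normally created and driven by DataParallelExecutorManager below:

>>> for data_batch in train_data:
>>>     exec_group.load_data_batch(data_batch)   # split data/labels across the devices
>>>     exec_group.forward(is_train=True)        # forward on every executor
>>>     exec_group.backward()                    # backward on every executor
>>>     exec_group.update_metric(metric, data_batch.label)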

class mxnet.executor_manager.DataParallelExecutorManager(symbol, ctx, train_data, arg_names, param_names, aux_names, work_load_list=None, logger=None, sym_gen=None)[source]

Helper class to manage multiple executors for data parallelism.

Parameters:
  • symbol (Symbol) – Output symbol.
  • ctx (list of Context) – Devices to run on.
  • param_names (list of str) – Names of all trainable parameters of the network.
  • arg_names (list of str) – Names of all arguments of the network.
  • aux_names (list of str) – Names of all auxiliary states of the network.
  • train_data (DataIter) – Training data iterator.
  • work_load_list (list of float or int, optional) – The list of work load for different devices, in the same order as ctx.
  • logger (logging logger, optional) – When not specified, the default logger will be used.
  • sym_gen (function, optional) – A function that generates new Symbols depending on different input shapes. Used only for bucketing.
install_monitor(monitor)[source]

Install monitor on all executors.

set_params(arg_params, aux_params)[source]

Set parameter and aux values.

Parameters:
  • arg_params (list of NDArray) – Source parameter arrays.
  • aux_params (list of NDArray) – Source aux arrays.
copy_to(arg_params, aux_params)[source]

Copy data from each executor to arg_params and aux_params.

Parameters:
  • arg_params (list of NDArray) – Target parameter arrays.
  • aux_params (list of NDArray) – Target aux arrays.

Notes

  • This function will update the NDArrays in arg_params and aux_params in place.
param_arrays

Shared parameter arrays.

grad_arrays

Shared gradient arrays.

aux_arrays

Shared aux states.

load_data_batch(data_batch)[source]

Load data and labels into arrays.

forward(is_train=False)[source]

Run forward on the current executor.

backward()[source]

Run backward on the current executor.

update_metric(metric, labels)[source]

Update metric with the current executor.
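
Putting the pieces together, a hedged end-to-end sketch. The symbol net (ending in a loss such as SoftmaxOutput), the iterator train_iter, the metric, the initial arg_params/aux_params, and the input/label names are assumptions; applying an optimizer to the gradients is left to the caller:

>>> from mxnet.executor_manager import DataParallelExecutorManager
>>> arg_names = net.list_arguments()
>>> param_names = [n for n in arg_names
>>>                if n not in ('data', 'softmax_label')]   # assumed input/label names
>>> mgr = DataParallelExecutorManager(symbol=net, ctx=[mx.cpu(0), mx.cpu(1)],
>>>                                   train_data=train_iter,
>>>                                   arg_names=arg_names, param_names=param_names,
>>>                                   aux_names=net.list_auxiliary_states())
>>> mgr.set_params(arg_params, aux_params)       # load initial weights
>>> for batch in train_iter:
>>>     mgr.load_data_batch(batch)
>>>     mgr.forward(is_train=True)
>>>     mgr.backward()
>>>     # update the weights from mgr.param_arrays / mgr.grad_arrays with your optimizer here
>>>     mgr.update_metric(metric, batch.label)
>>> mgr.copy_to(arg_params, aux_params)          # copy the trained weights back out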