mxnet.kvstore.Horovod

class Horovod[source]

Bases: mxnet.kvstore.base.KVStoreBase

A communication backend using Horovod.

__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__()

Initialize self.

broadcast(key, value, out[, priority])

Broadcasts the value NDArray at rank 0 to all ranks.

is_capable(capability)

Queries whether the KVStore type supports a certain capability, such as optimizer algorithm, gradient compression, or sparsity.

load_optimizer_states(fname)

Loads the optimizer (updater) state from the file.

pushpull(key, value[, out, priority])

Performs allreduce on a single tensor or a list of tensor objects.

register(klass)

Registers a new KVStore.

save_optimizer_states(fname[, dump_optimizer])

Saves the optimizer (updater) state to a file.

set_optimizer(optimizer)

Registers an optimizer with the kvstore.

Attributes

OPTIMIZER

kv_registry

local_rank

num_workers

Returns the number of worker nodes.

rank

Returns the rank of this worker node.

type

Returns the type of this kvstore backend.

broadcast(key, value, out, priority=0)[source]

Broadcasts the value NDArray at rank 0 to all ranks.

Parameters
  • key (str or int) – The key names the tensor for the broadcast operation. Its usage differs from that of keys in parameter servers.

  • value (NDArray) – The tensor to be broadcast.

  • out (NDArray, list of NDArray) – Output tensor that receives the value broadcast from the root process (rank 0).

  • priority (int, optional) – The priority of the operation. Higher priority operations are likely to be executed before other actions.

Examples

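>>> import mxnet as mx
>>> # kv is assumed to be an existing Horovod kvstore instance
>>> shape = (2, 3)
>>> a = mx.nd.ones(shape)
>>> b = mx.nd.zeros(shape)
>>> kv.broadcast('2', value=a, out=b)
>>> print(b.asnumpy())
[[ 1.  1.  1.]
 [ 1.  1.  1.]]
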
static is_capable(capability)[source]

Queries whether the KVStore type supports a certain capability, such as optimizer algorithm, gradient compression, or sparsity.

Parameters

capability (str) – The capability to query

Returns

result – Whether the capability is supported or not.

Return type

bool
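
A caller can use the boolean result to decide where the optimizer step should run. The following is a minimal sketch, not MXNet code: `StubAllreduceStore` and `choose_update_location` are hypothetical names, the capability string "optimizer" is an assumed value for the OPTIMIZER attribute, and the stub is assumed to report no support for it.

```python
class StubAllreduceStore:
    """Hypothetical stand-in for an allreduce-only backend (not the real Horovod class)."""

    OPTIMIZER = "optimizer"  # assumed capability name, for illustration only

    @staticmethod
    def is_capable(capability):
        # Assumption: an allreduce-only backend offers no store-side features.
        return False


def choose_update_location(store):
    """Return "kvstore" if the store can run the optimizer, else "local"."""
    if store.is_capable(store.OPTIMIZER):
        return "kvstore"
    return "local"


print(choose_update_location(StubAllreduceStore))
```

Because is_capable is static, the query can be made on the class itself, without creating an instance.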

load_optimizer_states(fname)[source]

Loads the optimizer (updater) state from the file.

Parameters

fname (str) – Path to input states file.

property num_workers

Returns the number of worker nodes.

Returns

size – The number of worker nodes.

Return type

int

pushpull(key, value, out=None, priority=0)[source]

Performs allreduce on a single tensor or a list of tensor objects.

This function performs in-place summation of the input tensor over all the processes.

The name pushpull is a generic term. In Horovod, its action is implemented via ring allreduce. Each operation is identified by the ‘key’; if key is not provided, an incremented auto-generated name is used. The tensor type and shape must be the same on all processes for a given name. The reduction will not start until all processes are ready to send and receive the tensor.
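
To make the ring allreduce idea concrete, the following pure-Python simulation sums one flat tensor per worker in two phases: a reduce-scatter, then an allgather. It is an illustrative sketch only and does not reflect Horovod's actual implementation; `ring_allreduce` is a hypothetical helper, and plain Python lists stand in for NDArray objects.

```python
def ring_allreduce(tensors):
    """Simulate a sum allreduce over a ring; tensors[r] is worker r's flat tensor."""
    n = len(tensors)
    length = len(tensors[0])
    if any(len(t) != length for t in tensors):
        raise ValueError("all workers must hold tensors of the same shape")
    data = [list(t) for t in tensors]  # each worker's private buffer

    # Split the index range into n contiguous chunks (some may be empty).
    bounds = [(i * length) // n for i in range(n + 1)]

    def chunk(c):
        return range(bounds[c], bounds[c + 1])

    def exchange(first_chunk_offset, combine):
        # n - 1 rounds; in each round every worker sends one chunk to its
        # right-hand neighbour. Payloads are snapshotted first so that all
        # sends within a round behave as if they were simultaneous.
        for step in range(n - 1):
            sends = []
            for r in range(n):
                c = (r + first_chunk_offset - step) % n
                sends.append((c, [data[r][i] for i in chunk(c)]))
            for r, (c, payload) in enumerate(sends):
                dst = (r + 1) % n
                for i, v in zip(chunk(c), payload):
                    data[dst][i] = combine(data[dst][i], v)

    # Phase 1 (reduce-scatter): partial sums travel around the ring, so that
    # worker r ends up holding the complete sum for chunk (r + 1) % n.
    exchange(0, lambda mine, received: mine + received)
    # Phase 2 (allgather): completed chunks travel around the ring and
    # overwrite the stale partial values on every other worker.
    exchange(1, lambda mine, received: received)
    return data


# Eight simulated workers, each holding six ones (a flattened 2x3 tensor).
result = ring_allreduce([[1.0] * 6 for _ in range(8)])
print(result[0])
```

With eight workers that each contribute ones, every worker's buffer ends up holding 8.0 in each position, matching the summation behaviour described above.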

Parameters
  • key (str, int, or sequence of str or int) – Keys used to uniquely tag an operation.

  • value (NDArray) – Tensor value on one process to be summed. If out is not specified, the value will be modified in place.

  • out (NDArray) – Output tensor after allreduce. If not specified, the input tensor value will be modified in-place.

  • priority (int, optional) – The priority of the operation. Higher priority operations are likely to be executed before other actions.

Examples

>>> import mxnet as mx
>>> # kv is assumed to be an existing Horovod kvstore instance
>>> # perform in-place allreduce on tensor a
>>> shape = (2, 3)
>>> nworker = kv.num_workers # assume there are 8 processes
>>> a = mx.nd.ones(shape)
>>> kv.pushpull('1', a)
>>> print(a.asnumpy())
[[ 8.  8.  8.]
 [ 8.  8.  8.]]
>>> # perform allreduce on tensor a and output to b
>>> a = mx.nd.ones(shape)
>>> b = mx.nd.zeros(shape)
>>> kv.pushpull('2', a, out=b)
>>> print(b.asnumpy())
[[ 8.  8.  8.]
 [ 8.  8.  8.]]

property rank

Returns the rank of this worker node.

Returns

rank – The rank of this node, which is in the range [0, num_workers).

Return type

int
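
Because rank lies in [0, num_workers), it is commonly used to give each worker a disjoint slice of the training data. The sketch below shows one way to do that with strided slicing; `shard_for_worker` is a hypothetical helper, not part of the kvstore API, and the rank and num_workers arguments stand in for the values a kvstore instance would report.

```python
def shard_for_worker(samples, rank, num_workers):
    """Return the samples assigned to one worker, using a strided split."""
    if not 0 <= rank < num_workers:
        raise ValueError("rank must be in the range [0, num_workers)")
    # Worker `rank` takes samples rank, rank + num_workers, rank + 2 * num_workers, ...
    return samples[rank::num_workers]


samples = list(range(10))
shards = [shard_for_worker(samples, r, 4) for r in range(4)]
print(shards)
```

Each sample lands in exactly one shard, so the workers together cover the whole dataset without overlap.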

save_optimizer_states(fname, dump_optimizer=False)[source]

Saves the optimizer (updater) state to a file. This is often used when checkpointing the model during training.

Parameters
  • fname (str) – Path to the output states file.

  • dump_optimizer (bool, default False) – Whether to also save the optimizer itself. If True, optimizer information such as learning rate and weight decay schedules is saved as well.

set_optimizer(optimizer)[source]

Registers an optimizer with the kvstore.

When using a single machine, this function updates the local optimizer. If using multiple machines and this operation is invoked from a worker node, it serializes the optimizer with pickle and sends it to all servers. The function returns after all servers have been updated.

Parameters

optimizer (Optimizer) – The new optimizer for the store.

property type

Returns the type of this kvstore backend.

Returns

type – The type of this kvstore backend, as a string.

Return type

str