mxnet.kvstore.Horovod¶

class Horovod[source]¶

Bases: mxnet.kvstore.base.KVStoreBase

A communication backend using Horovod.

__init__()[source]¶: Initialize self. See help(type(self)) for accurate signature.

Methods

`__init__`()	Initialize self.
`broadcast`(key, value, out[, priority])	Broadcasts the value NDArray at rank 0 to all ranks.
`is_capable`(capability)	Queries if the KVStore type supports certain capability, such as optimizer algorithm, gradient compression, sparsity, etc.
`load_optimizer_states`(fname)	Loads the optimizer (updater) state from the file.
`pushpull`(key, value[, out, priority])	Performs allreduce on a single tensor or a list of tensor objects.
`register`(klass)	Registers a new KVStore.
`save_optimizer_states`(fname[, dump_optimizer])	Saves the optimizer (updater) state to a file.
`set_optimizer`(optimizer)	Registers an optimizer with the kvstore.

Attributes

`OPTIMIZER`
`kv_registry`
`local_rank`
`num_workers`	Returns the number of worker nodes.
`rank`	Returns the rank of this worker node.
`type`	Returns the type of this kvstore backend.

broadcast(key, value, out, priority=0)[source]¶

Broadcasts the value NDArray at rank 0 to all ranks.

Parameters

key (str or int) – The key is used to name the tensor for allreduce. Its usage is different from that of parameter servers.
value (NDArray) – The tensor to be broadcast.
out (NDArray, list of NDArray) – Output tensor that receives the value broadcast from the root process.
priority (int, optional) – The priority of the operation. Higher priority operations are likely to be executed before other actions.

Examples

>>> shape = (2, 3)
>>> a = mx.nd.ones(shape)
>>> b = mx.nd.zeros(shape)
>>> kv.broadcast('2', value=a, out=b)
>>> print(b.asnumpy())
[[ 1.  1.  1.]
[ 1.  1.  1.]]
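
The broadcast semantics can also be illustrated without a cluster. The following is a minimal pure-Python simulation, not MXNet or Horovod code: `simulate_broadcast` and the list-of-buffers layout are hypothetical names used only for illustration, with each list entry standing in for the tensor held by one rank.

```python
def simulate_broadcast(values_per_rank, root=0):
    """Return the per-rank outputs after broadcasting the root rank's value.

    values_per_rank[i] stands in for the tensor held by rank i
    (hypothetical layout, nested lists instead of NDArrays).
    """
    root_value = values_per_rank[root]
    # Every rank receives its own independent copy of the root's data.
    return [[row[:] for row in root_value] for _ in values_per_rank]


# Rank 0 holds ones; the other ranks hold zeros before the call.
ones = [[1.0] * 3 for _ in range(2)]
zeros = [[0.0] * 3 for _ in range(2)]
outputs = simulate_broadcast([ones, zeros, zeros])
for rank, out in enumerate(outputs):
    print(rank, out)
```

After the call, every simulated rank holds the same 2x3 block of ones that rank 0 started with.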

static is_capable(capability)[source]¶

Queries if the KVStore type supports certain capability, such as optimizer algorithm, gradient compression, sparsity, etc.

Parameters: capability (str) – The capability to query
Returns: result – Whether the capability is supported or not.
Return type: bool

load_optimizer_states(fname)[source]¶

Loads the optimizer (updater) state from the file.

Parameters: fname (str) – Path to input states file.

property num_workers¶

Returns the number of worker nodes.

Returns: size – The number of worker nodes.
Return type: int

pushpull(key, value, out=None, priority=0)[source]¶

Performs allreduce on a single tensor or a list of tensor objects.

This function performs in-place summation of the input tensor over all the processes.

The name pushpull is a generic term. In Horovod, its action is implemented via ring allreduce. Each operation is identified by the ‘key’; if key is not provided, an incremented auto-generated name is used. The tensor type and shape must be the same on all processes for a given name. The reduction will not start until all processes are ready to send and receive the tensor.

Parameters

key (str, int, or sequence of str or int) – Keys used to uniquely tag an operation.
value (NDArray) – Tensor value on one process to be summed. If out is not specified, the value will be modified in-place.
out (NDArray) – Output tensor after allreduce. If not specified, the input tensor value will be modified in-place.
priority (int, optional) – The priority of the operation. Higher priority operations are likely to be executed before other actions.

Examples

>>> # perform in-place allreduce on tensor a
>>> shape = (2, 3)
>>> nworker = kv.num_workers # assume there are 8 processes
>>> a = mx.nd.ones(shape)
>>> kv.pushpull('1', a)
>>> print(a.asnumpy())
[[ 8.  8.  8.]
[ 8.  8.  8.]]

>>> # perform allreduce on tensor a and output to b
>>> a = mx.nd.ones(shape)
>>> b = mx.nd.zeros(shape)
>>> kv.pushpull('2', a, out=b)
>>> print(b.asnumpy())
[[ 8.  8.  8.]
[ 8.  8.  8.]]
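
The description above states that pushpull sums a tensor over all processes via ring allreduce. As a rough illustration of that idea, here is a single-process simulation of the textbook ring schedule (reduce-scatter followed by allgather) on plain Python lists. It is a sketch under stated assumptions, not Horovod's implementation: `ring_allreduce` is a hypothetical helper, and each inner list stands in for one worker's flattened tensor.

```python
def ring_allreduce(buffers):
    """Sum equal-length lists across simulated workers using a ring schedule.

    buffers[r] is worker r's data; each buffer is overwritten in place
    with the elementwise sum over all workers.
    """
    n = len(buffers)
    length = len(buffers[0])
    # Split the index range into n contiguous chunks (some may be empty).
    bounds = [(i * length) // n for i in range(n + 1)]

    def chunk(c):
        return range(bounds[c], bounds[c + 1])

    # Phase 1: reduce-scatter. After n - 1 steps, worker r holds the
    # complete sum for chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot outgoing data first so all sends happen "simultaneously".
        sends = []
        for r in range(n):
            c = (r - step) % n
            sends.append((c, [buffers[r][i] for i in chunk(c)]))
        for r in range(n):
            c, data = sends[r]
            dst = (r + 1) % n
            for i, v in zip(chunk(c), data):
                buffers[dst][i] += v

    # Phase 2: allgather. Completed chunks travel once around the ring.
    for step in range(n - 1):
        sends = []
        for r in range(n):
            c = (r + 1 - step) % n
            sends.append((c, [buffers[r][i] for i in chunk(c)]))
        for r in range(n):
            c, data = sends[r]
            dst = (r + 1) % n
            for i, v in zip(chunk(c), data):
                buffers[dst][i] = v
    return buffers


# Eight simulated workers, each holding a flattened 2x3 tensor of ones.
workers = [[1.0] * 6 for _ in range(8)]
ring_allreduce(workers)
print(workers[0])
```

With eight workers each contributing ones, every buffer ends up holding 8.0 in all six positions, which mirrors the docstring example. Each worker only ever exchanges one chunk per step with its ring neighbour, which is the property that makes the ring schedule bandwidth-friendly.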

property rank¶

Returns the rank of this worker node.

Returns: rank – The rank of this node, which is in the range [0, num_workers).
Return type: int
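
Because the rank is a zero-based index smaller than the number of workers, one common use is to give each worker a disjoint slice of the training data. The sketch below is plain Python and assumes `rank` and `num_workers` have already been read from the kvstore; `shard_for_worker` is a hypothetical helper, not part of MXNet.

```python
def shard_for_worker(samples, rank, num_workers):
    """Return the strided slice of `samples` assigned to one worker."""
    if not 0 <= rank < num_workers:
        raise ValueError("rank must satisfy 0 <= rank < num_workers")
    # Worker `rank` takes every num_workers-th sample, starting at `rank`.
    return samples[rank::num_workers]


samples = list(range(10))
shards = [shard_for_worker(samples, r, 4) for r in range(4)]
print(shards)
```

The shards are disjoint and together cover every sample, so no example is processed twice within an epoch.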

save_optimizer_states(fname, dump_optimizer=False)[source]¶

Saves the optimizer (updater) state to a file. This is often used when checkpointing the model during training.

Parameters

fname (str) – Path to the output states file.
dump_optimizer (bool, default False) – Whether to also save the optimizer itself. This would also save optimizer information such as learning rate and weight decay schedules.
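
To make the effect of dump_optimizer concrete, here is a stand-in checkpoint round trip using only the standard library. It assumes pickle-based serialization and uses hypothetical helpers (`save_states`, `load_states`) and dictionary stand-ins for the states and the optimizer; the actual file format written by MXNet may differ.

```python
import os
import pickle
import tempfile


def save_states(fname, states, optimizer=None, dump_optimizer=False):
    """Write updater states, optionally bundled with the optimizer itself."""
    payload = (states, optimizer) if dump_optimizer else states
    with open(fname, "wb") as f:
        pickle.dump(payload, f)


def load_states(fname):
    """Read back whatever save_states wrote."""
    with open(fname, "rb") as f:
        return pickle.load(f)


# Stand-ins: per-parameter momentum buffers and optimizer hyperparameters.
states = {0: [0.1, 0.2], 1: [0.3]}
optimizer = {"learning_rate": 0.01, "wd": 1e-4}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "trainer.states")
    save_states(path, states, optimizer, dump_optimizer=True)
    restored_states, restored_optimizer = load_states(path)

print(restored_states == states, restored_optimizer == optimizer)
```

In this sketch, saving with dump_optimizer=False would store only the states, so hyperparameters such as the learning rate would have to be reconstructed separately when resuming.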

set_optimizer(optimizer)[source]¶

Registers an optimizer with the kvstore.

When using a single machine, this function updates the local optimizer. If using multiple machines and this operation is invoked from a worker node, it will serialize the optimizer with pickle and send it to all servers. The function returns after all servers have been updated.

Parameters: optimizer (Optimizer) – The new optimizer for the store.

property type¶

Returns the type of this kvstore backend.

Returns: type – The type of this kvstore backend, as a string.
Return type: str
