mxnet.kvstore.Horovod
-
class Horovod
Bases: mxnet.kvstore.base.KVStoreBase
A communication backend using Horovod.
Methods
__init__() – Initialize self.
broadcast(key, value, out[, priority]) – Broadcasts the value NDArray at rank 0 to all ranks.
is_capable(capability) – Queries whether the KVStore type supports a certain capability, such as an optimizer algorithm, gradient compression, sparsity, etc.
load_optimizer_states(fname) – Loads the optimizer (updater) state from a file.
pushpull(key, value[, out, priority]) – Performs allreduce on a single tensor or a list of tensor objects.
register(klass) – Registers a new KVStore.
save_optimizer_states(fname[, dump_optimizer]) – Saves the optimizer (updater) state to a file.
set_optimizer(optimizer) – Registers an optimizer with the kvstore.
Attributes
OPTIMIZER
kv_registry
local_rank
num_workers – Returns the number of worker nodes.
rank – Returns the rank of this worker node.
type – Returns the type of this kvstore backend.
-
broadcast(key, value, out, priority=0)
Broadcasts the value NDArray at rank 0 to all ranks.
- Parameters
key (str, or int) – The key is used to name the tensor for the collective operation. Its usage is different from that of parameter servers.
value (NDArray) – The tensor to be broadcast.
out (NDArray, list of NDArray) – Output tensor(s) that receive the value broadcast from the root process (rank 0).
priority (int, optional) – The priority of the operation. Higher priority operations are likely to be executed before other actions.
Examples
>>> shape = (2, 3)
>>> a = mx.nd.ones(shape)
>>> b = mx.nd.zeros(shape)
>>> kv.broadcast('2', value=a, out=b)
>>> print(b.asnumpy())
[[ 1. 1. 1.]
 [ 1. 1. 1.]]
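The broadcast semantics can be checked without a cluster. The sketch below is a pure-Python simulation, assuming each process's tensor is a flat list of floats and rank 0 is the root; `simulate_broadcast` is a hypothetical helper, not part of MXNet or Horovod.

```python
from typing import List


def simulate_broadcast(values: List[List[float]], root: int = 0) -> List[List[float]]:
    """Return the `out` buffer each simulated rank would hold after a broadcast.

    Hypothetical helper: values[r] stands in for the tensor on rank r.
    """
    root_value = values[root]
    # Every rank, including the root, receives its own copy of the root's data;
    # the input lists themselves are left untouched.
    return [list(root_value) for _ in values]


# Rank 0 holds ones (as in the example above); the other ranks hold other data.
inputs = [[1.0] * 6, [5.0] * 6, [7.0] * 6]
outs = simulate_broadcast(inputs)
print(outs[2])  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```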
-
static is_capable(capability)
Queries whether the KVStore type supports a certain capability, such as an optimizer algorithm, gradient compression, sparsity, etc.
- Parameters
capability (str) – The capability to query.
- Returns
result – Whether the capability is supported or not.
- Return type
bool
-
load_optimizer_states(fname)
Loads the optimizer (updater) state from a file.
- Parameters
fname (str) – Path to input states file.
-
property num_workers
Returns the number of worker nodes.
- Returns
size – The number of worker nodes.
- Return type
int
-
pushpull(key, value, out=None, priority=0)
Performs allreduce on a single tensor or a list of tensor objects.
This function sums the input tensor over all processes. The result is written back into value in place unless out is given.
The name pushpull is a generic term. In Horovod, its action is implemented via ring allreduce. Each operation is identified by the ‘key’; if key is not provided, an incremented auto-generated name is used. The tensor type and shape must be the same on all processes for a given name. The reduction will not start until all processes are ready to send and receive the tensor.
- Parameters
key (str, int, or sequence of str or int) – Keys used to uniquely tag an operation.
value (NDArray) – Tensor value on one process to be summed. If out is not specified, value is modified in place.
out (NDArray) – Output tensor after allreduce. If not specified, the input tensor value is modified in place.
priority (int, optional) – The priority of the operation. Higher priority operations are likely to be executed before other actions.
Examples
>>> # perform in-place allreduce on tensor a
>>> shape = (2, 3)
>>> nworker = kv.num_workers  # assume there are 8 processes
>>> a = mx.nd.ones(shape)
>>> kv.pushpull('1', a)
>>> print(a.asnumpy())
[[ 8. 8. 8.]
 [ 8. 8. 8.]]
>>> # perform allreduce on tensor a and output to b
>>> a = mx.nd.ones(shape)
>>> b = mx.nd.zeros(shape)
>>> kv.pushpull('2', a, out=b)
>>> print(b.asnumpy())
[[ 8. 8. 8.]
 [ 8. 8. 8.]]
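The ring allreduce that pushpull is described as using can be sketched in pure Python. This is the textbook two-phase algorithm (reduce-scatter, then allgather) run over simulated ranks in a single process; `ring_allreduce` is a hypothetical helper, and Horovod's real implementation (C++ over communication libraries such as MPI or NCCL, with tensor fusion) differs in many details.

```python
from typing import List


def ring_allreduce(data: List[List[float]]) -> List[List[float]]:
    """Simulate ring allreduce (sum); data[r] is the flat tensor on rank r."""
    n = len(data)
    length = len(data[0])
    bufs = [list(v) for v in data]  # copy so the inputs are not mutated
    # Split the index range into n contiguous, nearly equal chunks.
    bounds = [(c * length // n, (c + 1) * length // n) for c in range(n)]

    # Phase 1, reduce-scatter: at each step, rank r sends chunk (r - step) % n
    # to its right neighbour, which adds it into its own copy of that chunk.
    for step in range(n - 1):
        msgs = []
        for rank in range(n):
            c = (rank - step) % n
            lo, hi = bounds[c]
            msgs.append(((rank + 1) % n, c, bufs[rank][lo:hi]))
        for dst, c, payload in msgs:  # all sends of a step happen "at once"
            lo, _ = bounds[c]
            for k, x in enumerate(payload):
                bufs[dst][lo + k] += x
    # Now rank r holds the fully reduced chunk (r + 1) % n.

    # Phase 2, allgather: pass the reduced chunks around the ring, overwriting.
    for step in range(n - 1):
        msgs = []
        for rank in range(n):
            c = (rank + 1 - step) % n
            lo, hi = bounds[c]
            msgs.append(((rank + 1) % n, c, bufs[rank][lo:hi]))
        for dst, c, payload in msgs:
            lo, hi = bounds[c]
            bufs[dst][lo:hi] = payload
    return bufs


# Eight simulated processes, each holding a 2x3 tensor of ones (flattened).
ones = [[1.0] * 6 for _ in range(8)]
result = ring_allreduce(ones)
print(result[0])  # [8.0, 8.0, 8.0, 8.0, 8.0, 8.0]
```

Each rank sends about 2(n-1)/n of the tensor in total, a volume that barely grows with the number of processes, which is why the ring pattern scales well in bandwidth.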
-
property rank
Returns the rank of this worker node.
- Returns
rank – The rank of this node, which is in the range [0, num_workers).
- Return type
int
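Because rank always lies in [0, num_workers), the two properties are commonly combined to give each process a disjoint slice of the work. A minimal sketch, assuming a strided split; `shard_for_rank` is hypothetical and not part of this class, and in real training you would pass kv.rank and kv.num_workers instead of literals.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_for_rank(items: Sequence[T], rank: int, num_workers: int) -> List[T]:
    """Hypothetical helper: return the items assigned to one worker."""
    # Valid ranks lie in the half-open range [0, num_workers).
    if not 0 <= rank < num_workers:
        raise ValueError(f"rank {rank} is outside [0, {num_workers})")
    # Strided split: rank r takes items r, r + num_workers, r + 2*num_workers, ...
    return list(items[rank::num_workers])


samples = list(range(10))
shards = [shard_for_rank(samples, r, 4) for r in range(4)]
print(shards)  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

The strided split keeps shard sizes within one item of each other and guarantees every item is assigned to exactly one rank.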
-
save_optimizer_states(fname, dump_optimizer=False)
Saves the optimizer (updater) state to a file. This is often used when checkpointing the model during training.
- Parameters
fname (str) – Path to the output states file.
dump_optimizer (bool, default False) – Whether to also save the optimizer itself. This would also save optimizer information such as learning rate and weight decay schedules.
-
set_optimizer(optimizer)
Registers an optimizer with the kvstore.
When using a single machine, this function updates the local optimizer. If using multiple machines and this operation is invoked from a worker node, it serializes the optimizer with pickle and sends it to all servers. The function returns after all servers have been updated.
- Parameters
optimizer (Optimizer) – The new optimizer for the store.
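The pickle round trip described for set_optimizer can be illustrated in pure Python. This sketch assumes a hypothetical `SGDConfig` dataclass standing in for an optimizer and simulates the servers as independently unpickled copies; `send_to_servers` and `num_servers` are likewise hypothetical and do not reflect MXNet's actual transport.

```python
import pickle
from dataclasses import dataclass


@dataclass
class SGDConfig:
    """Hypothetical stand-in for an optimizer object."""
    learning_rate: float
    momentum: float
    wd: float


def send_to_servers(optimizer, num_servers: int):
    """Hypothetical: serialize once on the worker, rebuild once per server."""
    # The worker turns the optimizer into bytes with pickle...
    payload = pickle.dumps(optimizer)
    # ...and each simulated server reconstructs its own copy from those bytes.
    return [pickle.loads(payload) for _ in range(num_servers)]


opt = SGDConfig(learning_rate=0.1, momentum=0.9, wd=1e-4)
replicas = send_to_servers(opt, num_servers=3)
print(replicas[0])  # SGDConfig(learning_rate=0.1, momentum=0.9, wd=0.0001)
```

Serializing once and deserializing per server means every server starts from an identical copy of the optimizer's settings, while later changes on one copy do not affect the others.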
-
property type
Returns the type of this kvstore backend.
- Returns
type – The type of this kvstore backend, as a string.
- Return type
str
-