Performance

The following tutorials explain how to tune MXNet and how to use tools that improve training and inference performance.

Essential

Improving Performance (https://mxnet.apache.org/api/faq/perf)

How to get the best performance from MXNet.

Profiler (backend/profiler.html)

How to profile MXNet models.
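
As a quick illustration of what that tutorial covers, here is a minimal sketch of the built-in profiler workflow using the MXNet 1.x mx.profiler module; the output filename and the workload are placeholders.

```python
import mxnet as mx
from mxnet import nd

# Write traces to profile.json (viewable in chrome://tracing) and keep aggregate stats.
mx.profiler.set_config(profile_all=True, filename='profile.json', aggregate_stats=True)

mx.profiler.set_state('run')          # start recording
a = nd.random.uniform(shape=(1024, 1024))
b = nd.dot(a, a)
b.wait_to_read()                      # force the queued operators to execute
mx.profiler.set_state('stop')         # stop recording

print(mx.profiler.dumps())            # print the aggregate operator statistics
```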

Tuning NumPy Operations (https://mxnet.apache.org/versions/master/tutorials/gluon/gotchas_numpy_in_mxnet.html)

Gotchas using NumPy in MXNet.
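
The central gotcha in that tutorial is that converting to NumPy blocks MXNet's asynchronous engine. A minimal sketch (shapes and values are arbitrary):

```python
import mxnet as mx
from mxnet import nd

x = nd.random.uniform(shape=(1000, 1000))

# MXNet queues operators asynchronously, so this call returns immediately.
y = nd.dot(x, x)

# .asnumpy() copies to a NumPy array and blocks until y is fully computed;
# calling it inside a training loop serializes the whole pipeline.
print(y.asnumpy()[0, 0])

# Prefer staying in NDArray (e.g. nd.sum) and converting once at the end.
total = nd.sum(y)
```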

Compression

Compression: float16 (compression/float16.html)

How to use float16 in your model to boost training speed.
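
A minimal Gluon sketch of the idea, assuming a float16-capable GPU; the network, batch size, and learning rate are placeholders:

```python
import mxnet as mx
from mxnet import autograd, gluon, nd

net = gluon.nn.Dense(10)
net.initialize(ctx=mx.gpu())   # assumes a GPU with float16 support
net.cast('float16')            # cast parameters and outputs to float16

# multi_precision keeps a float32 master copy of the weights for the SGD update.
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1, 'multi_precision': True})

data = nd.random.uniform(shape=(32, 64), ctx=mx.gpu()).astype('float16')
with autograd.record():
    loss = net(data).sum()
loss.backward()
trainer.step(batch_size=32)
```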

Gradient Compression (compression/gradient_compression.html)

How to use gradient compression to reduce communication bandwidth and increase speed.
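
As a taste of the API covered there, the sketch below enables 2-bit gradient compression on a distributed KVStore; it assumes the workers are started with MXNet's distributed launcher, and the threshold value is only illustrative:

```python
import mxnet as mx
from mxnet import gluon

# Create a distributed KVStore and turn on 2-bit gradient compression.
kv = mx.kvstore.create('dist_sync')
kv.set_gradient_compression({'type': '2bit', 'threshold': 0.5})

# Hand the KVStore to a Gluon Trainer so compressed gradients are exchanged.
net = gluon.nn.Dense(10)
net.initialize()
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1}, kvstore=kv)
```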

Accelerated Backend

TensorRT (backend/tensorRt.html)

How to use NVIDIA’s TensorRT to boost inference performance.

Distributed Training

Distributed Training Using the KVStore API (https://mxnet.apache.org/versions/master/faq/distributed_training.html)

How to use the KVStore API to train a model across multiple GPUs and machines.
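
A minimal sketch of the pattern described there, assuming the script is run on several workers via MXNet's distributed launcher; 'dist_sync' can be swapped for 'device' on a single multi-GPU machine:

```python
import mxnet as mx
from mxnet import gluon

# 'dist_sync' aggregates gradients synchronously across all workers.
kv = mx.kvstore.create('dist_sync')

net = gluon.nn.Dense(10)
net.initialize()
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1}, kvstore=kv)

# Each worker reads its own shard of the data, e.g. sliced by rank.
print('worker', kv.rank, 'of', kv.num_workers)
```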

Training with Multiple GPUs Using Model Parallelism (https://mxnet.apache.org/versions/master/faq/model_parallel_lstm.html)

An overview of using multiple GPUs when training an LSTM.

Data Parallelism in MXNet (https://mxnet.apache.org/versions/master/faq/multi_devices.html)

An overview of distributed training strategies.

MXNet with Horovod (https://github.com/apache/incubator-mxnet/tree/master/example/distributed_training-horovod)

A set of example scripts demonstrating MNIST and ImageNet training with Horovod as the distributed training backend.
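
For orientation, here is a minimal sketch of the Horovod-MXNet pattern those scripts follow, using the horovod.mxnet module; the network and hyperparameters are placeholders:

```python
import mxnet as mx
import horovod.mxnet as hvd
from mxnet import gluon

hvd.init()                                      # one process per GPU
ctx = mx.gpu(hvd.local_rank())                  # pin each process to its own GPU

net = gluon.nn.Dense(10)
net.initialize(ctx=ctx)
params = net.collect_params()

# Scale the learning rate by the number of workers and average gradients across them.
opt = mx.optimizer.create('sgd', learning_rate=0.1 * hvd.size())
opt = hvd.DistributedOptimizer(opt)

hvd.broadcast_parameters(params, root_rank=0)   # start all workers from the same weights

trainer = gluon.Trainer(params, opt, kvstore=None)
```

A script like this is typically launched with horovodrun, for example `horovodrun -np 4 python train.py`.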