Run on Amazon SageMaker

This chapter will give a high level overview about running MXNet on Amazon SageMaker, in-depth tutorials can be found on the Sagemaker website.

SageMaker offers Jupyter notebooks and supports MXNet out-of-the box. You can run your notebooks on CPU instances and as such profit from the free tier. However, more powerful CPU instances or GPU instances are charged by time. Within this notebook you can fetch, explore and prepare training data.

With your own data on the notebook instance, you can easily launch training via the SageMaker SDK. So there is no need to manually configure and log into EC2 instances. You can either bring your own model or use SageMaker’s built-in algorithms that are tailored to specific use cases such as computer vision, NLP etc. SageMaker encapsulates the process of training into the class Estimator and we can now start the training on the local notebook instance:

from sagemaker.mxnet import MXNet as MXNetEstimator
estimator = MXNetEstimator(entry_point='train.py',
                           role=sagemaker.get_execution_role(),
                           train_instance_count=1,
                           train_instance_type='local',
                           hyperparameters={'batch_size': 1024,
                                            'epochs': 30})
estimator.fit(inputs)

If you require a more powerful platform for training, then you only need to change the train_instance_type. Once you call fit, SageMaker will automatically create the required EC2 instances, train your model within a Docker container and then immediately shutdown these instances. Fit() requires an entry point (here train.py) that describes the model and training loop. This script needs to provides certain functions, that will be automatically called by SageMaker once you train and deploy the model. More information about the entry point script can be found here. When the model is ready for deployment you can use SageMaker’s hosting services that create an HTTPS endpoint where model inference is provided.

predictor = estimator.deploy(initial_instance_count=1,
                             instance_type='ml.m4.xlarge')

The following links show more advanced uses cases in SageMaker: - Distributed training on multiple machines - Hyperparameter Tuning Jobs - Optimize a model with SageMaker Neo - Build Groundtruth Datasets - Getting started with SageMaker