Use data from S3 for training
AWS S3 is a cloud-based object storage service that allows storage and retrieval of large amounts of data at a very low cost. This makes it an attractive option to store large training datasets. MXNet is deeply integrated with S3 for this purpose.
An S3 protocol URL (like s3://bucket-name/training-data) can be provided as a parameter for any data iterator that takes a file path as input. For example,
data_iter = mx.io.ImageRecordIter(
    path_imgrec="s3://bucket-name/training-data/caltech_train.rec",
    data_shape=(3, 227, 227),
    batch_size=4,
    resize=256)
The following steps describe in detail how to use data from S3 for training.
Step 1: Build MXNet with S3 integration enabled
Follow the instructions for installing MXNet from source, with the following additional steps to enable S3 integration.
Install the required packages, including libssl-dev, before building MXNet. These packages are required to read/write from AWS S3.
Append USE_S3=1 to config.mk before building MXNet.
echo "USE_S3=1" >> config.mk
Step 2: Configure S3 authentication tokens
MXNet requires the S3 environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to be set. The access keys can be obtained from the AWS console.
export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
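Before constructing an iterator, it can help to fail early when the credentials are missing. The following is a minimal sketch, not part of MXNet: `missing_s3_credentials` is a hypothetical helper that only inspects the two environment variables named in this step and does not contact AWS.

```python
import os

# The two environment variables named in Step 2.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_s3_credentials(environ=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; it is not an MXNet API.
    Pass a dict to check something other than the real environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_s3_credentials()
if missing:
    print("Missing S3 credentials: " + ", ".join(missing))
else:
    print("S3 credentials are set.")
```

A check like this only confirms that the variables exist; whether the keys are valid is discovered when the iterator first reads from S3.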
Step 3: Upload data to S3
There are several ways to upload data to S3. One easy way is to use the AWS command line utility. For example, the following sync command will recursively copy contents from a local directory to a directory in S3.
aws s3 sync ./training-data s3://bucket-name/training-data
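To know which S3 URL to pass to an iterator after the upload, it helps to see how local paths map to object keys. The sketch below assumes the recursive-copy layout described above (each file keeps its path relative to the local directory); `planned_s3_urls` is a hypothetical helper that performs no upload and makes no network calls.

```python
import os
import tempfile


def planned_s3_urls(local_dir, s3_prefix):
    """Map each file under local_dir to the S3 URL it would have after a
    recursive copy into s3_prefix. Hypothetical helper; nothing is uploaded."""
    prefix = s3_prefix.rstrip("/")
    urls = {}
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            # S3 keys use forward slashes regardless of the local OS.
            rel = os.path.relpath(path, local_dir).replace(os.sep, "/")
            urls[path] = prefix + "/" + rel
    return urls


# Usage: build a throwaway directory and list where each file would land.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "mnist"))
    for rel in ("caltech_train.rec",
                os.path.join("mnist", "train-images-idx3-ubyte")):
        with open(os.path.join(tmp, rel), "w") as handle:
            handle.write("placeholder")
    prefix = "s3://bucket-name/training-data"
    for url in sorted(planned_s3_urls(tmp, prefix).values()):
        print(url)
```

The printed URLs are the values that would go into parameters such as path_imgrec once the files are actually in the bucket.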
Step 4: Train with data from S3
Once the data is in S3, it is very straightforward to use it from MXNet. Any data iterator that can read/write data from a local drive can also read/write data from S3.
Let's modify an existing example in the MXNet repository to read data from S3 instead of local disk.
mxnet/tests/python/train/test_conv.py trains a convolutional network using MNIST data from local disk. We'll make the following change to read the data from S3 instead.
~/mxnet$ sed -i -- 's/data\//s3:\/\/bucket-name\/training-data\//g' ./tests/python/train/test_conv.py
~/mxnet$ git diff ./tests/python/train/test_conv.py
diff --git a/tests/python/train/test_conv.py b/tests/python/train/test_conv.py
index 039790e..66a60ce 100644
--- a/tests/python/train/test_conv.py
+++ b/tests/python/train/test_conv.py
@@ -39,14 +39,14 @@ def get_iters():
     batch_size = 100
     train_dataiter = mx.io.MNISTIter(
-            image="data/train-images-idx3-ubyte",
-            label="data/train-labels-idx1-ubyte",
+            image="s3://bucket-name/training-data/train-images-idx3-ubyte",
+            label="s3://bucket-name/training-data/train-labels-idx1-ubyte",
             data_shape=(1, 28, 28),
             label_name='sm_label',
             batch_size=batch_size, shuffle=True, flat=False, silent=False, seed=10)
     val_dataiter = mx.io.MNISTIter(
-            image="data/t10k-images-idx3-ubyte",
-            label="data/t10k-labels-idx1-ubyte",
+            image="s3://bucket-name/training-data/t10k-images-idx3-ubyte",
+            label="s3://bucket-name/training-data/t10k-labels-idx1-ubyte",
             data_shape=(1, 28, 28),
             label_name='sm_label',
             batch_size=batch_size, shuffle=True, flat=False, silent=False)
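The sed substitution above is a plain global text replacement of `data/` with the S3 prefix. As a rough illustration, the same rewrite can be sketched in Python; `rewrite_data_paths` is a hypothetical helper, and the bucket name is the placeholder used throughout this page.

```python
OLD_PREFIX = "data/"
NEW_PREFIX = "s3://bucket-name/training-data/"


def rewrite_data_paths(source_text):
    """Replace every occurrence of OLD_PREFIX with NEW_PREFIX,
    mirroring the global (g) flag of the sed expression."""
    return source_text.replace(OLD_PREFIX, NEW_PREFIX)


line = 'image="data/train-images-idx3-ubyte",'
print(rewrite_data_paths(line))
# prints: image="s3://bucket-name/training-data/train-images-idx3-ubyte",
```

Because the new prefix itself ends in `training-data/`, which contains `data/`, applying the replacement a second time would rewrite the already-rewritten paths. The same holds for the sed command, so it should be run only once on an unmodified file.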
After the above change, test_conv.py will fetch data from S3 instead of the local disk.
python ./tests/python/train/test_conv.py
[21:59:19] src/io/s3_filesys.cc:878: No AWS Region set, using default region us-east-1
[21:59:21] src/io/iter_mnist.cc:94: MNISTIter: load 60000 images, shuffle=1, shape=(100,1,28,28)
[21:59:21] src/io/iter_mnist.cc:94: MNISTIter: load 10000 images, shuffle=1, shape=(100,1,28,28)
INFO:root:Start training with [cpu(0)]
Start training with [cpu(0)]
INFO:root:Epoch Resetting Data Iterator
Epoch Resetting Data Iterator
INFO:root:Epoch Time cost=11.277
Epoch Time cost=11.277
INFO:root:Epoch Validation-accuracy=0.955100
Epoch Validation-accuracy=0.955100
INFO:root:Finish fit...
Finish fit...
INFO:root:Finish predict...
Finish predict...
INFO:root:final accuracy = 0.955100
final accuracy = 0.955100