Gluon Data API

Overview

This document lists the data APIs in Gluon:

mxnet.gluon.data          Dataset utilities.
mxnet.gluon.data.vision   Vision utilities.

The Gluon Data API, defined in the gluon.data package, provides useful dataset loading and processing tools, as well as common public datasets.

In the rest of this document, we list routines provided by the gluon.data package.

Data

Dataset             Abstract dataset class.
ArrayDataset        A dataset that combines multiple dataset-like objects, e.g. Datasets, lists, arrays, etc.
RecordFileDataset   A dataset wrapping over a RecordIO (.rec) file.
Sampler             Base class for samplers.
SequentialSampler   Samples elements from [0, length) sequentially.
RandomSampler       Samples elements from [0, length) randomly without replacement.
BatchSampler        Wraps over another Sampler and returns mini-batches of samples.
DataLoader          Loads data from a dataset and returns mini-batches of data.

Vision

Vision Datasets

MNIST                MNIST handwritten digits dataset from http://yann.lecun.com/exdb/mnist
FashionMNIST         A dataset of Zalando’s article images consisting of fashion products, a drop-in replacement for the original MNIST dataset.
CIFAR10              CIFAR10 image classification dataset from https://www.cs.toronto.edu/~kriz/cifar.html
CIFAR100             CIFAR100 image classification dataset from https://www.cs.toronto.edu/~kriz/cifar.html
ImageRecordDataset   A dataset wrapping over a RecordIO file containing images.
ImageFolderDataset   A dataset for loading image files stored in a folder structure.

Vision Transforms

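Transforms can be used to augment input data during training. You can compose multiple transforms sequentially. Transforms that operate on images in (H x W x C) layout, such as Resize, RandomResizedCrop, and the random color jitters, should be applied before ToTensor. Normalize, which expects a (C x H x W) tensor, should be applied after it.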

from mxnet import gluon
from mxnet.gluon.data.vision import MNIST, transforms

# Image-layout (H x W x C) transforms come first, then ToTensor, then Normalize.
transform = transforms.Compose([
    transforms.Resize(300),
    transforms.RandomResizedCrop(224),
    transforms.RandomBrightness(0.1),
    transforms.ToTensor(),
    transforms.Normalize(0, 1)])  # mean 0, std 1 leaves values unchanged; substitute dataset statistics
# Transform only the image; the label of each (image, label) sample is kept as is.
dataset = MNIST(train=True).transform_first(transform)
data_loader = gluon.data.DataLoader(dataset, batch_size=32, num_workers=1)
for data, label in data_loader:
    # do something with data and label
    pass

Compose              Sequentially composes multiple transforms.
Cast                 Cast input to a specific data type.
ToTensor             Converts an image NDArray to a tensor NDArray.
Normalize            Normalize a tensor of shape (C x H x W) with mean and standard deviation.
RandomResizedCrop    Crop the input image with random scale and aspect ratio.
CenterCrop           Crops the image src to the given size by trimming on all four sides and preserving the center of the image.
Resize               Resize an image to the given size.
RandomFlipLeftRight  Randomly flip the input image left to right with a probability of 0.5.
RandomFlipTopBottom  Randomly flip the input image top to bottom with a probability of 0.5.
RandomBrightness     Randomly jitters image brightness with a factor chosen from [max(0, 1 - brightness), 1 + brightness].
RandomContrast       Randomly jitters image contrast with a factor chosen from [max(0, 1 - contrast), 1 + contrast].
RandomSaturation     Randomly jitters image saturation with a factor chosen from [max(0, 1 - saturation), 1 + saturation].
RandomHue            Randomly jitters image hue with a factor chosen from [max(0, 1 - hue), 1 + hue].
RandomColorJitter    Randomly jitters the brightness, contrast, saturation, and hue of an image.
RandomLighting       Add AlexNet-style PCA-based noise to an image.

API Reference

Dataset utilities.

class mxnet.gluon.data.ArrayDataset(*args)

A dataset that combines multiple dataset-like objects, e.g. Datasets, lists, arrays, etc.

The i-th sample is defined as (x1[i], x2[i], ...).

Parameters: *args (one or more dataset-like objects) – The data arrays.
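
The indexing rule above can be sketched in plain Python. ArrayDatasetSketch is a hypothetical stand-in written for illustration, not MXNet's class. It assumes every input has the same length and returns the i-th element of each input as a tuple.

```python
class ArrayDatasetSketch:
    """Hypothetical stand-in illustrating ArrayDataset's indexing rule."""

    def __init__(self, *args):
        assert len(args) > 0, "Needs at least one dataset-like object"
        self._length = len(args[0])
        for arr in args:
            # Assumption: all inputs must have the same length.
            assert len(arr) == self._length, "All inputs must have the same length"
        self._data = args

    def __getitem__(self, idx):
        # The i-th sample is (x1[i], x2[i], ...).
        return tuple(arr[idx] for arr in self._data)

    def __len__(self):
        return self._length


features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
labels = [0, 1, 0]
ds = ArrayDatasetSketch(features, labels)
print(len(ds), ds[1])  # 3 ([0.3, 0.4], 1)
```

With MXNet installed, the same lists could be passed directly to gluon.data.ArrayDataset(features, labels).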

class mxnet.gluon.data.BatchSampler(sampler, batch_size, last_batch='keep')

Wraps over another Sampler and returns mini-batches of samples.

Parameters:
  • sampler (Sampler) – The source Sampler.
  • batch_size (int) – Size of mini-batch.
  • last_batch ({'keep', 'discard', 'rollover'}) –

    Specifies how the last batch is handled if batch_size does not evenly divide sequence length.

    If ‘keep’, the last batch will be returned directly, but will contain fewer elements than batch_size.

    If ‘discard’, the last batch will be discarded.

    If ‘rollover’, the remaining elements will be rolled over to the next iteration.

Examples

>>> sampler = gluon.data.SequentialSampler(10)
>>> batch_sampler = gluon.data.BatchSampler(sampler, 3, 'keep')
>>> list(batch_sampler)
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
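
A framework-free model of the three last_batch modes may help. BatchSamplerSketch is a hypothetical name, and the logic sketches the behaviour described above rather than reproducing MXNet's source. A plain range stands in for SequentialSampler.

```python
class BatchSamplerSketch:
    """Hypothetical pure-Python model of BatchSampler's last_batch handling."""

    def __init__(self, sampler, batch_size, last_batch="keep"):
        if last_batch not in ("keep", "discard", "rollover"):
            raise ValueError("last_batch must be 'keep', 'discard' or 'rollover'")
        self._sampler = sampler
        self._batch_size = batch_size
        self._last_batch = last_batch
        self._prev = []  # leftover indices carried over by 'rollover'

    def __iter__(self):
        # Start from any indices rolled over from the previous pass.
        batch, self._prev = self._prev, []
        for idx in self._sampler:
            batch.append(idx)
            if len(batch) == self._batch_size:
                yield batch
                batch = []
        if batch:
            if self._last_batch == "keep":
                yield batch  # short final batch
            elif self._last_batch == "rollover":
                self._prev = batch  # prepended to the next pass
            # 'discard': the incomplete batch is dropped


sampler = range(10)  # stands in for SequentialSampler(10)
print(list(BatchSamplerSketch(sampler, 3, "keep")))     # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
print(list(BatchSamplerSketch(sampler, 3, "discard")))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
rollover = BatchSamplerSketch(sampler, 3, "rollover")
print(list(rollover))  # first pass: [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(list(rollover))  # second pass: [[9, 0, 1], [2, 3, 4], [5, 6, 7]]
```

With 'rollover', no index is lost. The leftover 9 simply opens the next pass.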

class mxnet.gluon.data.DataLoader(dataset, batch_size=None, shuffle=False, sampler=None, last_batch=None, batch_sampler=None, batchify_fn=None, num_workers=0, pin_memory=False, pin_device_id=0, prefetch=None, thread_pool=False)

Loads data from a dataset and returns mini-batches of data.

Parameters:
  • dataset (Dataset) – Source dataset. Note that numpy and mxnet arrays can be directly used as a Dataset.
  • batch_size (int) – Size of mini-batch.
  • shuffle (bool) – Whether to shuffle the samples.
  • sampler (Sampler) – The sampler to use. Either specify sampler or shuffle, not both.
  • last_batch ({'keep', 'discard', 'rollover'}) –

    How to handle the last batch if batch_size does not evenly divide len(dataset).

    keep - A batch with fewer samples than previous batches is returned.
    discard - The last batch is discarded if it is incomplete.
    rollover - The remaining samples are rolled over to the next epoch.

  • batch_sampler (Sampler) – A sampler that returns mini-batches. Do not specify batch_size, shuffle, sampler, and last_batch if batch_sampler is specified.
  • batchify_fn (callable) –

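    Callback function to allow users to specify how to merge samples into a batch. Defaults to default_batchify_fn:

    import numpy as np
    from mxnet import nd

    def default_batchify_fn(data):
        """Merge a list of samples into a batch."""
        if isinstance(data[0], nd.NDArray):
            # Stack NDArray samples along a new leading batch axis.
            return nd.stack(*data)
        elif isinstance(data[0], tuple):
            # Tuple samples, e.g. (data, label): batchify each field separately.
            data = zip(*data)
            return [default_batchify_fn(i) for i in data]
        else:
            # Other samples (scalars, numpy arrays): convert via numpy, keeping the dtype.
            data = np.asarray(data)
            return nd.array(data, dtype=data.dtype)
    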
  • num_workers (int, default 0) – The number of multiprocessing workers to use for data preprocessing.
  • pin_memory (boolean, default False) – If True, the dataloader will copy NDArrays into pinned memory before returning them. Copying from CPU pinned memory to GPU is faster than from normal CPU memory.
  • pin_device_id (int, default 0) – The device id to use for allocating pinned memory if pin_memory is True.
  • prefetch (int, default is num_workers * 2) – The number of batches to prefetch. It only takes effect if num_workers > 0. If prefetch > 0, worker processes may prefetch that many batches before data is requested from the iterator. A larger prefetch gives smoother start-up performance but consumes more shared memory. A smaller value may defeat the purpose of using multiple worker processes; in that case, try reducing num_workers.
  • thread_pool (bool, default False) – If True, use a thread pool instead of a multiprocessing pool. A thread pool avoids shared memory usage. If data loading is mostly I/O-bound, or the GIL is not a bottleneck, the thread pool version may perform better than multiprocessing.
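
A custom batchify_fn receives the list of samples that make up one mini-batch and returns the merged batch. The sketch below shows one common use, padding variable-length sequences. pad_batchify and its pad_value argument are hypothetical, and the function returns plain Python lists so it runs without MXNet.

```python
def pad_batchify(samples, pad_value=0):
    """Hypothetical batchify_fn: pad variable-length sequences to a common length.

    `samples` is a list of (sequence, label) tuples. Returns
    (padded_sequences, lengths, labels) as plain lists.
    """
    sequences, labels = zip(*samples)
    max_len = max(len(seq) for seq in sequences)
    # Right-pad every sequence to the longest one in this batch.
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    lengths = [len(seq) for seq in sequences]  # original lengths, e.g. for masking
    return padded, lengths, list(labels)


batch = [([1, 2, 3], 0), ([4], 1), ([5, 6], 1)]
padded, lengths, labels = pad_batchify(batch)
print(padded)   # [[1, 2, 3], [4, 0, 0], [5, 6, 0]]
print(lengths)  # [3, 1, 2]
print(labels)   # [0, 1, 1]
```

With MXNet, such a function would typically convert each output with nd.array before returning it. It would then be passed as DataLoader(dataset, batch_size=..., batchify_fn=pad_batchify). Returning plain lists here is a simplification.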

class mxnet.gluon.data.Dataset

Abstract dataset class. All datasets should have this interface.

Subclasses need to override __getitem__, which returns the i-th element, and __len__, which returns the total number of elements.

Note

An mxnet or numpy array can be directly used as a dataset.
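
A minimal subclass only needs the two methods named above. SquaresDataset is a hypothetical example. The fallback base class is an assumption added so that the sketch also runs where MXNet is not installed.

```python
try:
    from mxnet.gluon.data import Dataset
except ImportError:
    # Fallback stand-in so the sketch runs without MXNet; only the
    # __getitem__/__len__ interface matters for this illustration.
    class Dataset:
        pass


class SquaresDataset(Dataset):
    """Hypothetical dataset whose i-th sample is (i, i * i)."""

    def __init__(self, n):
        self._n = n

    def __getitem__(self, idx):
        if not 0 <= idx < self._n:
            raise IndexError(idx)
        return idx, idx * idx

    def __len__(self):
        return self._n


ds = SquaresDataset(5)
print(len(ds), ds[3])  # 5 (3, 9)
```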

transform(fn, lazy=True)

Returns a new dataset with each sample transformed by the transformer function fn.

Parameters:
  • fn (callable) – A transformer function that takes a sample as input and returns the transformed sample.
  • lazy (bool, default True) – If False, transforms all samples at once. Otherwise, transforms each sample on demand. Note that if fn is stochastic, you must set lazy to True or you will get the same result on all epochs.
Returns:

The transformed dataset.

Return type:

Dataset
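
The warning about a stochastic fn can be illustrated without MXNet. The two hypothetical helpers below model lazy=True (fn applied at every access) and lazy=False (fn applied once, results stored). A call counter stands in for randomness so that the outcome is deterministic.

```python
import itertools


class LazyTransformed:
    """Models lazy=True: fn runs every time a sample is read."""

    def __init__(self, data, fn):
        self._data, self._fn = data, fn

    def __getitem__(self, idx):
        return self._fn(self._data[idx])

    def __len__(self):
        return len(self._data)


def eager_transformed(data, fn):
    """Models lazy=False: fn runs once per sample, up front."""
    return [fn(x) for x in data]


counter = itertools.count()


def noisy(x):
    # Stand-in for a stochastic augmentation: a different offset on every call.
    return x + next(counter)


base = [10, 20]
eager = eager_transformed(base, noisy)
lazy = LazyTransformed(base, noisy)
epoch1 = [eager[i] for i in range(2)], [lazy[i] for i in range(2)]
epoch2 = [eager[i] for i in range(2)], [lazy[i] for i in range(2)]
print(epoch1[0] == epoch2[0])  # True: eager results repeat every epoch
print(epoch1[1] == epoch2[1])  # False: lazy results change every epoch
```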

transform_first(fn, lazy=True)

Returns a new dataset with the first element of each sample transformed by the transformer function fn.

This is useful, for example, when you only want to transform data while keeping label as is.

Parameters:
  • fn (callable) – A transformer function that takes the first element of a sample as input and returns the transformed element.
  • lazy (bool, default True) – If False, transforms all samples at once. Otherwise, transforms each sample on demand. Note that if fn is stochastic, you must set lazy to True or you will get the same result on all epochs.
Returns:

The transformed dataset.

Return type:

Dataset
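
The effect of transform_first on an (image, label) sample can be sketched as a wrapper function. transform_first_sketch and to_unit_range are hypothetical helpers, not MXNet functions. Treating a single-element sample as a whole is an assumption made for illustration.

```python
def transform_first_sketch(fn):
    """Hypothetical helper: wrap fn so that only the first element of a sample changes."""

    def apply(*sample):
        first, rest = sample[0], sample[1:]
        if rest:
            return (fn(first),) + rest  # e.g. (transformed_image, label)
        return fn(first)  # single-element samples are transformed as a whole

    return apply


def to_unit_range(pixels):
    return [p / 255 for p in pixels]


samples = [([0, 255], 3), ([255, 0], 7)]
scale_first = transform_first_sketch(to_unit_range)
transformed = [scale_first(*s) for s in samples]
print(transformed)  # [([0.0, 1.0], 3), ([1.0, 0.0], 7)]
```

The labels 3 and 7 pass through untouched. That is why transform_first is the usual choice for image augmentations.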

class mxnet.gluon.data.RandomSampler(length)

Samples elements from [0, length) randomly without replacement.

Parameters: length (int) – Length of the sequence.
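
Sampling "randomly without replacement" means each pass yields a fresh permutation of range(length). RandomSamplerSketch is a hypothetical stand-in that uses Python's random.shuffle. MXNet's own implementation may differ.

```python
import random


class RandomSamplerSketch:
    """Hypothetical model of RandomSampler: a new permutation of range(length) per pass."""

    def __init__(self, length):
        self._length = length

    def __iter__(self):
        indices = list(range(self._length))
        random.shuffle(indices)  # without replacement: each index appears exactly once
        return iter(indices)

    def __len__(self):
        return self._length


order = list(RandomSamplerSketch(6))
print(order)                            # order varies between runs
print(sorted(order) == list(range(6)))  # True
```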

class mxnet.gluon.data.RecordFileDataset(filename)

A dataset wrapping over a RecordIO (.rec) file.

Each sample is a string representing the raw content of a record.

Parameters: filename (str) – Path to rec file.

class mxnet.gluon.data.Sampler

Base class for samplers.

All samplers should subclass Sampler and define __iter__ and __len__ methods.
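
A custom sampler only has to define __iter__ and __len__. EveryOtherSampler is a hypothetical example that yields even indices first, then odd ones. The fallback base class is an assumption that lets the sketch run without MXNet.

```python
try:
    from mxnet.gluon.data import Sampler
except ImportError:
    # Fallback stand-in so the sketch runs without MXNet installed.
    class Sampler:
        pass


class EveryOtherSampler(Sampler):
    """Hypothetical sampler: even indices first, then odd indices."""

    def __init__(self, length):
        self._length = length

    def __iter__(self):
        evens = range(0, self._length, 2)
        odds = range(1, self._length, 2)
        return iter(list(evens) + list(odds))

    def __len__(self):
        return self._length


print(list(EveryOtherSampler(7)))  # [0, 2, 4, 6, 1, 3, 5]
```

An instance could be passed to DataLoader through its sampler argument, instead of setting shuffle.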

class mxnet.gluon.data.SequentialSampler(length)

Samples elements from [0, length) sequentially.

Parameters: length (int) – Length of the sequence.

class mxnet.gluon.data.SimpleDataset(data)

Simple Dataset wrapper for lists and arrays.

Parameters: data (dataset-like object) – Any object that implements len() and [].

Vision utilities.

Dataset container.

class mxnet.gluon.data.vision.datasets.MNIST(root='$MXNET_HOME/datasets/mnist', train=True, transform=None)

MNIST handwritten digits dataset from http://yann.lecun.com/exdb/mnist

Each sample is an image (in 3D NDArray) with shape (28, 28, 1).

Parameters:
  • root (str, default $MXNET_HOME/datasets/mnist) – Path to temp folder for storing data.
  • train (bool, default True) – Whether to load the training or testing set.
  • transform (function, default None) –

    A user defined callback that transforms each sample. For example:

    transform=lambda data, label: (data.astype(np.float32)/255, label)
    
class mxnet.gluon.data.vision.datasets.FashionMNIST(root='$MXNET_HOME/datasets/fashion-mnist', train=True, transform=None)

A dataset of Zalando’s article images consisting of fashion products, a drop-in replacement for the original MNIST dataset, from https://github.com/zalandoresearch/fashion-mnist

Each sample is an image (in 3D NDArray) with shape (28, 28, 1).

Parameters:
  • root (str, default $MXNET_HOME/datasets/fashion-mnist) – Path to temp folder for storing data.
  • train (bool, default True) – Whether to load the training or testing set.
  • transform (function, default None) –

    A user defined callback that transforms each sample. For example:

    transform=lambda data, label: (data.astype(np.float32)/255, label)
    
class mxnet.gluon.data.vision.datasets.CIFAR10(root='$MXNET_HOME/datasets/cifar10', train=True, transform=None)

CIFAR10 image classification dataset from https://www.cs.toronto.edu/~kriz/cifar.html

Each sample is an image (in 3D NDArray) with shape (32, 32, 3).

Parameters:
  • root (str, default $MXNET_HOME/datasets/cifar10) – Path to temp folder for storing data.
  • train (bool, default True) – Whether to load the training or testing set.
  • transform (function, default None) –

    A user defined callback that transforms each sample. For example:

    transform=lambda data, label: (data.astype(np.float32)/255, label)
    
class mxnet.gluon.data.vision.datasets.CIFAR100(root='$MXNET_HOME/datasets/cifar100', fine_label=False, train=True, transform=None)

CIFAR100 image classification dataset from https://www.cs.toronto.edu/~kriz/cifar.html

Each sample is an image (in 3D NDArray) with shape (32, 32, 3).

Parameters:
  • root (str, default $MXNET_HOME/datasets/cifar100) – Path to temp folder for storing data.
  • fine_label (bool, default False) – Whether to load the fine-grained (100 classes) or coarse-grained (20 super-classes) labels.
  • train (bool, default True) – Whether to load the training or testing set.
  • transform (function, default None) –

    A user defined callback that transforms each sample. For example:

    transform=lambda data, label: (data.astype(np.float32)/255, label)
    
class mxnet.gluon.data.vision.datasets.ImageRecordDataset(filename, flag=1, transform=None)

A dataset wrapping over a RecordIO file containing images.

Each sample is an image and its corresponding label.

Parameters:
  • filename (str) – Path to rec file.
  • flag ({0, 1}, default 1) – If 0, always convert images to greyscale. If 1, always convert images to colored (RGB).
  • transform (function, default None) –

    A user defined callback that transforms each sample. For example:

    transform=lambda data, label: (data.astype(np.float32)/255, label)
    
class mxnet.gluon.data.vision.datasets.ImageFolderDataset(root, flag=1, transform=None)

A dataset for loading image files stored in a folder structure like:

root/car/0001.jpg
root/car/xxxa.jpg
root/car/yyyb.jpg
root/bus/123.jpg
root/bus/023.jpg
root/bus/wwww.jpg

Parameters:
  • root (str) – Path to root directory.
  • flag ({0, 1}, default 1) – If 0, always convert loaded images to greyscale (1 channel). If 1, always convert loaded images to colored (3 channels).
  • transform (callable, default None) –

    A function that takes data and label and transforms them:

    transform = lambda data, label: (data.astype(np.float32)/255, label)
    
synsets

list – List of class names. synsets[i] is the name for the integer label i.

items

list of tuples – List of all images in (filename, label) pairs.
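
The folder-to-label convention, and the resulting synsets and items attributes, can be sketched with the standard library. list_image_folder is a hypothetical helper. Two details are assumptions made for illustration: that class indices follow sorted folder-name order, and that only .jpg, .jpeg and .png files count as images.

```python
import os
import tempfile

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumption: extensions treated as images


def list_image_folder(root):
    """Hypothetical helper mirroring the synsets/items layout of ImageFolderDataset."""
    synsets, items = [], []
    for folder in sorted(os.listdir(root)):
        path = os.path.join(root, folder)
        if not os.path.isdir(path):
            continue  # skip stray files at the top level
        label = len(synsets)  # class index follows sorted folder order
        synsets.append(folder)
        for filename in sorted(os.listdir(path)):
            if os.path.splitext(filename)[1].lower() in IMAGE_EXTS:
                items.append((os.path.join(path, filename), label))
    return synsets, items


with tempfile.TemporaryDirectory() as root:
    for rel in ["car/0001.jpg", "car/xxxa.jpg", "bus/123.jpg", "bus/notes.txt"]:
        full = os.path.join(root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "wb").close()  # empty placeholder files
    synsets, items = list_image_folder(root)
    print(synsets)  # ['bus', 'car']
    print([(os.path.basename(f), label) for f, label in items])
    # [('123.jpg', 0), ('0001.jpg', 1), ('xxxa.jpg', 1)]
```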

Image transforms.

class mxnet.gluon.data.vision.transforms.Compose(transforms)

Sequentially composes multiple transforms.

Parameters: transforms (list of transform Blocks) – The list of transforms to be composed.
Inputs:
  • data: input tensor with the shape the first transform Block requires.
Outputs:
  • out: output tensor with the shape the last transform Block produces.

Examples

>>> transformer = transforms.Compose([transforms.Resize(300),
...                                   transforms.CenterCrop(256),
...                                   transforms.ToTensor()])
>>> image = mx.nd.random.uniform(0, 255, (224, 224, 3)).astype(dtype=np.uint8)
>>> transformer(image)
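
Sequential composition means each transform's output is the next one's input, applied left to right. compose, clip, to_unit and invert below are hypothetical toy stand-ins operating on flat lists of pixel values, not MXNet blocks.

```python
from functools import reduce


def compose(transforms):
    """Hypothetical helper: apply callables left to right, like Compose."""
    def pipeline(x):
        return reduce(lambda acc, t: t(acc), transforms, x)
    return pipeline


# Toy stand-ins for transforms, operating on a flat list of pixel values.
def clip(pixels):
    return [min(max(p, 0), 255) for p in pixels]


def to_unit(pixels):
    return [p / 255 for p in pixels]


def invert(pixels):
    return [1 - p for p in pixels]


pipeline = compose([clip, to_unit, invert])
print(pipeline([-10, 0, 255, 300]))  # [1.0, 1.0, 0.0, 0.0]
```

Order matters. Clipping after scaling to [0, 1] would no longer bound a value of 300 to 1.0. This mirrors why image-layout transforms must come before ToTensor.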

class mxnet.gluon.data.vision.transforms.Cast(dtype='float32')

Cast input to a specific data type.

Parameters: dtype (str, default 'float32') – The target data type, in string or numpy.dtype.
Inputs:
  • data: input tensor with arbitrary shape.
Outputs:
  • out: output tensor with the same shape as data.

class mxnet.gluon.data.vision.transforms.ToTensor

Converts an image NDArray to a tensor NDArray.

Converts an image NDArray of shape (H x W x C) in the range [0, 255] to a float32 tensor NDArray of shape (C x H x W) in the range [0, 1].

Inputs:
  • data: input tensor with (H x W x C) shape and uint8 type.
Outputs:
  • out: output tensor with (C x H x W) shape and float32 type.

Examples

>>> transformer = vision.transforms.ToTensor()
>>> image = mx.nd.random.uniform(0, 255, (4, 2, 3)).astype(dtype=np.uint8)
>>> transformer(image)
[[[ 0.85490197  0.72156864]
  [ 0.09019608  0.74117649]
  [ 0.61960787  0.92941177]
  [ 0.96470588  0.1882353 ]]
 [[ 0.6156863   0.73725492]
  [ 0.46666667  0.98039216]
  [ 0.44705883  0.45490196]
  [ 0.01960784  0.8509804 ]]
 [[ 0.39607844  0.03137255]
  [ 0.72156864  0.52941179]
  [ 0.16470589  0.7647059 ]
  [ 0.05490196  0.70588237]]]
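
The two things ToTensor does can be sketched on nested lists: move the channel axis first (H x W x C to C x H x W) and divide by 255. to_tensor_sketch is a hypothetical helper, not the MXNet operator.

```python
def to_tensor_sketch(image):
    """Hypothetical model of ToTensor on nested lists.

    image: H x W x C nested list of ints in [0, 255].
    returns: C x H x W nested list of floats in [0, 1].
    """
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [[[image[h][w][c] / 255.0 for w in range(width)]
             for h in range(height)]
            for c in range(channels)]


image = [[[0, 255, 51], [102, 153, 204]]]  # H=1, W=2, C=3
tensor = to_tensor_sketch(image)
print(len(tensor), len(tensor[0]), len(tensor[0][0]))  # 3 1 2
print(tensor[1])  # [[1.0, 0.6]]
```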

class mxnet.gluon.data.vision.transforms.Normalize(mean, std)

Normalize a tensor of shape (C x H x W) with mean and standard deviation.

Given mean (m_1, ..., m_n) and std (s_1, ..., s_n) for n channels, this transform normalizes each channel i of the input tensor with:

output[i] = (input[i] - m_i) / s_i

If mean or std is scalar, the same value will be applied to all channels.

Parameters:
  • mean (float or tuple of floats) – The mean values.
  • std (float or tuple of floats) – The standard deviation values.
Inputs:
  • data: input tensor with (C x H x W) shape.
Outputs:
  • out: output tensor with the same shape as data.
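
The per-channel formula output[i] = (input[i] - m_i) / s_i, including the rule that a scalar mean or std applies to every channel, can be sketched on nested lists. normalize_sketch is a hypothetical helper, not the MXNet block.

```python
def normalize_sketch(tensor, mean, std):
    """Hypothetical model of Normalize on a C x H x W nested list."""
    channels = len(tensor)
    # A scalar mean/std is applied to every channel.
    if isinstance(mean, (int, float)):
        mean = [mean] * channels
    if isinstance(std, (int, float)):
        std = [std] * channels
    return [[[(v - mean[c]) / std[c] for v in row] for row in tensor[c]]
            for c in range(channels)]


tensor = [[[0.5, 1.0]], [[0.2, 0.4]]]  # C=2, H=1, W=2
print(normalize_sketch(tensor, mean=(0.5, 0.2), std=(0.5, 0.2)))
# [[[0.0, 1.0]], [[0.0, 1.0]]]
```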

class mxnet.gluon.data.vision.transforms.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=1)

Crop the input image with random scale and aspect ratio.

Makes a crop of the original image with random size (default: 0.08 to 1.0 of the original image size) and random aspect ratio (default: 3/4 to 4/3), then resize it to the specified size.

Parameters:
  • size (int or tuple of (W, H)) – Size of the final output.
  • scale (tuple of two floats) – If scale is (min_area, max_area), the cropped image’s area will be between min_area and max_area times the original image’s area.
  • ratio (tuple of two floats) – Range of aspect ratio of the cropped image before resizing.
  • interpolation (int) – Interpolation method for resizing. By default uses bilinear interpolation. See OpenCV’s resize function for available choices.
Inputs:
  • data: input tensor with (Hi x Wi x C) shape.
Outputs:
  • out: output tensor with (H x W x C) shape.
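
Choosing the crop box can be sketched with rejection sampling. This is a scheme widely used for this kind of crop. Whether MXNet samples the aspect ratio in log space, how many attempts it makes, and what it falls back to are all assumptions here. sample_resized_crop is a hypothetical helper that only picks the box. The final resize to size is omitted.

```python
import math
import random


def sample_resized_crop(width, height, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3),
                        attempts=10, rng=random):
    """Hypothetical sketch: pick a crop box (x0, y0, w, h) by rejection sampling."""
    area = width * height
    for _ in range(attempts):
        target_area = rng.uniform(*scale) * area
        # Sample the aspect ratio uniformly in log space (an assumed, common choice).
        aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            x0 = rng.randint(0, width - w)
            y0 = rng.randint(0, height - h)
            return x0, y0, w, h
    # Fallback (assumption): use the whole image if no sample fit.
    return 0, 0, width, height


x0, y0, w, h = sample_resized_crop(640, 480)
print((x0, y0, w, h))  # random; the box always lies inside the 640 x 480 image
```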

class mxnet.gluon.data.vision.transforms.CenterCrop(size, interpolation=1)

Crops the image src to the given size by trimming on all four sides and preserving the center of the image. Upsamples if src is smaller than size.

Parameters:
  • size (int or tuple of (W, H)) – Size of output image.
  • interpolation (int) – Interpolation method for resizing. By default uses bilinear interpolation. See OpenCV’s resize function for available choices.
Inputs:
  • data: input tensor with (Hi x Wi x C) shape.
Outputs:
  • out: output tensor with (H x W x C) shape.

Examples

>>> transformer = vision.transforms.CenterCrop(size=(1000, 500))
>>> image = mx.nd.random.uniform(0, 255, (2321, 3482, 3)).astype(dtype=np.uint8)
>>> transformer(image)
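
The crop offsets for the example above follow from trimming equally on both sides. center_crop_box is a hypothetical helper covering only the case where the image is at least as large as size. The upsampling case described above is not modelled.

```python
def center_crop_box(src_width, src_height, size):
    """Hypothetical helper: (x0, y0, w, h) of a centered crop; size is (W, H)."""
    w, h = size
    if w > src_width or h > src_height:
        # Not modelled here: CenterCrop upsamples when src is smaller than size.
        raise ValueError("crop larger than image")
    x0 = (src_width - w) // 2
    y0 = (src_height - h) // 2
    return x0, y0, w, h


# Same sizes as the example above: image shape (H, W, C) = (2321, 3482, 3).
print(center_crop_box(3482, 2321, (1000, 500)))  # (1241, 910, 1000, 500)
```

Because size is given as (W, H), the cropped NDArray in the example has shape (500, 1000, 3).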

class mxnet.gluon.data.vision.transforms.Resize(size, keep_ratio=False, interpolation=1)

Resize an image to the given size. Should be applied before mxnet.gluon.data.vision.transforms.ToTensor.

Parameters:
  • size (int or tuple of (W, H)) – Size of output image.
  • keep_ratio (bool) – If size is given as an integer, whether to resize only the short edge to size while keeping the aspect ratio (True), or to resize both edges to size (False).
  • interpolation (int) – Interpolation method for resizing. By default uses bilinear interpolation. See OpenCV’s resize function for available choices.
Inputs:
  • data: input tensor with (Hi x Wi x C) shape.
Outputs:
  • out: output tensor with (H x W x C) shape.

Examples

>>> transformer = vision.transforms.Resize(size=(1000, 500))
>>> image = mx.nd.random.uniform(0, 255, (224, 224, 3)).astype(dtype=np.uint8)
>>> transformer(image)
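
The output size that Resize aims for can be computed directly. resize_output_size is a hypothetical helper. Truncating the scaled long edge with int() is an assumption about rounding.

```python
def resize_output_size(src_width, src_height, size, keep_ratio=False):
    """Hypothetical helper: (W, H) that Resize would produce (truncating rounding assumed)."""
    if isinstance(size, (tuple, list)):
        return tuple(size)  # explicit (W, H)
    if not keep_ratio:
        return size, size  # both edges resized to size
    # keep_ratio: the short edge becomes `size`, the long edge scales proportionally.
    if src_height > src_width:
        return size, int(src_height * size / src_width)
    return int(src_width * size / src_height), size


print(resize_output_size(640, 480, 224))                   # (224, 224)
print(resize_output_size(640, 480, 224, keep_ratio=True))  # (298, 224)
print(resize_output_size(224, 224, (1000, 500)))           # (1000, 500)
```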

class mxnet.gluon.data.vision.transforms.RandomFlipLeftRight

Randomly flip the input image left to right with a probability of 0.5.

Inputs:
  • data: input tensor with (H x W x C) shape.
Outputs:
  • out: output tensor with same shape as data.

class mxnet.gluon.data.vision.transforms.RandomFlipTopBottom

Randomly flip the input image top to bottom with a probability of 0.5.

Inputs:
  • data: input tensor with (H x W x C) shape.
Outputs:
  • out: output tensor with same shape as data.
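
Both flips reverse one spatial axis of an (H x W x C) image with probability 0.5. random_flip is a hypothetical helper. Its p and rng parameters are illustration-only additions that make the behaviour testable.

```python
import random


def random_flip(image, direction, p=0.5, rng=random):
    """Hypothetical model of RandomFlipLeftRight / RandomFlipTopBottom.

    image: H x W x C nested list; direction: 'left_right' or 'top_bottom'.
    """
    if rng.random() >= p:
        return image  # no flip (happens with probability 1 - p)
    if direction == "left_right":
        return [row[::-1] for row in image]  # reverse the W axis
    if direction == "top_bottom":
        return image[::-1]  # reverse the H axis
    raise ValueError(direction)


image = [[[1], [2]], [[3], [4]]]  # H=2, W=2, C=1
print(random_flip(image, "left_right", p=1.0))  # [[[2], [1]], [[4], [3]]]
print(random_flip(image, "top_bottom", p=1.0))  # [[[3], [4]], [[1], [2]]]
```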

class mxnet.gluon.data.vision.transforms.RandomBrightness(brightness)

Randomly jitters image brightness with a factor chosen from [max(0, 1 - brightness), 1 + brightness].

Parameters: brightness (float) – How much to jitter brightness. brightness factor is randomly chosen from [max(0, 1 - brightness), 1 + brightness].
Inputs:
  • data: input tensor with (H x W x C) shape.
Outputs:
  • out: output tensor with same shape as data.
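
The factor range above can be sketched on a flat list of pixel values: draw one factor and multiply every value by it. random_brightness is a hypothetical helper, and clipping to [0, 255] is an added assumption for uint8-range data.

```python
import random


def random_brightness(pixels, brightness, rng=random):
    """Hypothetical model of RandomBrightness on a flat list of pixel values.

    Returns (factor, adjusted_pixels). Clipping to [0, 255] is an assumption.
    """
    low, high = max(0.0, 1.0 - brightness), 1.0 + brightness
    factor = rng.uniform(low, high)
    return factor, [min(255.0, max(0.0, p * factor)) for p in pixels]


factor, out = random_brightness([10, 100, 250], brightness=0.1)
print(factor)  # somewhere in [0.9, 1.1]
print(out)     # each value scaled by the same factor
```

With brightness >= 1 the lower bound is clamped to 0, so an image can be darkened all the way to black.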

class mxnet.gluon.data.vision.transforms.RandomContrast(contrast)

Randomly jitters image contrast with a factor chosen from [max(0, 1 - contrast), 1 + contrast].

Parameters: contrast (float) – How much to jitter contrast. contrast factor is randomly chosen from [max(0, 1 - contrast), 1 + contrast].
Inputs:
  • data: input tensor with (H x W x C) shape.
Outputs:
  • out: output tensor with same shape as data.
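
One common formulation of a contrast factor blends each pixel with the image's mean grey level: out = factor * x + (1 - factor) * mean_grey. The helpers below are hypothetical. Both the blend formula and the BT.601 luma weights used for the grey level are assumptions, not confirmed details of MXNet's operator.

```python
import random

LUMA = (0.299, 0.587, 0.114)  # assumed ITU-R BT.601 luma weights


def adjust_contrast(image, factor):
    """Blend an H x W x 3 image (nested lists) with its mean grey level."""
    pixels = [px for row in image for px in row]
    mean_grey = sum(sum(w * c for w, c in zip(LUMA, px)) for px in pixels) / len(pixels)
    return [[[factor * c + (1 - factor) * mean_grey for c in px] for px in row]
            for row in image]


def random_contrast(image, contrast, rng=random):
    factor = rng.uniform(max(0.0, 1.0 - contrast), 1.0 + contrast)
    return adjust_contrast(image, factor)


image = [[[0, 0, 0], [200, 200, 200]]]  # one black and one light-grey pixel
print(adjust_contrast(image, 0.0))  # every channel collapses to the mean grey level
print(adjust_contrast(image, 1.0) == image)  # True: factor 1 leaves the image unchanged
```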

class mxnet.gluon.data.vision.transforms.RandomSaturation(saturation)

Randomly jitters image saturation with a factor chosen from [max(0, 1 - saturation), 1 + saturation].

Parameters: saturation (float) – How much to jitter saturation. saturation factor is randomly chosen from [max(0, 1 - saturation), 1 + saturation].
Inputs:
  • data: input tensor with (H x W x C) shape.
Outputs:
  • out: output tensor with same shape as data.
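
A saturation factor is commonly applied by blending each pixel with its own grey value. A factor of 0 gives greyscale, 1 leaves the pixel unchanged, and values above 1 exaggerate colour. The helpers are hypothetical, and the BT.601 weights and the absence of clipping are assumptions.

```python
import random

LUMA = (0.299, 0.587, 0.114)  # assumed ITU-R BT.601 luma weights


def adjust_saturation(image, factor):
    """Blend each RGB pixel of an H x W x 3 image with its own grey value."""
    out = []
    for row in image:
        new_row = []
        for px in row:
            grey = sum(w * c for w, c in zip(LUMA, px))
            new_row.append([factor * c + (1 - factor) * grey for c in px])
        out.append(new_row)
    return out


def random_saturation(image, saturation, rng=random):
    factor = rng.uniform(max(0.0, 1.0 - saturation), 1.0 + saturation)
    return adjust_saturation(image, factor)


red = [[[255, 0, 0]]]
print(adjust_saturation(red, 0.0))  # all three channels equal the pixel's grey value
print(adjust_saturation(red, 1.0) == red)  # True
```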

class mxnet.gluon.data.vision.transforms.RandomHue(hue)

Randomly jitters image hue with a factor chosen from [max(0, 1 - hue), 1 + hue].

Parameters: hue (float) – How much to jitter hue. hue factor is randomly chosen from [max(0, 1 - hue), 1 + hue].
Inputs:
  • data: input tensor with (H x W x C) shape.
Outputs:
  • out: output tensor with same shape as data.

class mxnet.gluon.data.vision.transforms.RandomColorJitter(brightness=0, contrast=0, saturation=0, hue=0)

Randomly jitters the brightness, contrast, saturation, and hue of an image.

Parameters:
  • brightness (float) – How much to jitter brightness. brightness factor is randomly chosen from [max(0, 1 - brightness), 1 + brightness].
  • contrast (float) – How much to jitter contrast. contrast factor is randomly chosen from [max(0, 1 - contrast), 1 + contrast].
  • saturation (float) – How much to jitter saturation. saturation factor is randomly chosen from [max(0, 1 - saturation), 1 + saturation].
  • hue (float) – How much to jitter hue. hue factor is randomly chosen from [max(0, 1 - hue), 1 + hue].
Inputs:
  • data: input tensor with (H x W x C) shape.
Outputs:
  • out: output tensor with same shape as data.

class mxnet.gluon.data.vision.transforms.RandomLighting(alpha)

Add AlexNet-style PCA-based noise to an image.

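Parameters: alpha (float) – Intensity of the noise: the standard deviation of the normal distribution from which the PCA noise coefficients are drawn.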
Inputs:
  • data: input tensor with (H x W x C) shape.
Outputs:
  • out: output tensor with same shape as data.
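
AlexNet-style lighting noise draws three coefficients from a normal distribution with standard deviation alpha. These coefficients weight the principal components of the RGB pixel covariance, and the resulting RGB offset is added to every pixel. The eigenvalues and eigenvectors below are ImageNet statistics on a 0-255 scale that are commonly used for this purpose. Whether RandomLighting uses exactly these constants is an assumption, and random_lighting is a hypothetical helper.

```python
import random

# ImageNet RGB PCA statistics (0-255 scale) commonly used for lighting noise;
# assumed here, not read from MXNet.
EIGVAL = (55.46, 4.794, 1.148)
EIGVEC = ((-0.5675, 0.7192, 0.4009),
          (-0.5808, -0.0045, -0.8140),
          (-0.5836, -0.6948, 0.4203))


def random_lighting(image, alpha, rng=random):
    """Add one PCA-based RGB offset to every pixel of an H x W x 3 image."""
    coeffs = [rng.gauss(0.0, alpha) for _ in range(3)]  # one coefficient per component
    # offset[c] = sum_j EIGVEC[c][j] * coeffs[j] * EIGVAL[j]
    offset = [sum(EIGVEC[c][j] * coeffs[j] * EIGVAL[j] for j in range(3))
              for c in range(3)]
    return [[[v + offset[c] for c, v in enumerate(px)] for px in row] for row in image]


image = [[[10, 20, 30], [40, 50, 60]]]
print(random_lighting(image, 0.0) == image)  # True: alpha = 0 adds no noise
print(random_lighting(image, 0.1))           # both pixels shifted by the same RGB offset
```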