vision.transforms

Gluon provides pre-defined vision transformation and data augmentation functions in the mxnet.gluon.data.vision.transforms module.

transforms.Compose

Sequentially composes multiple transforms.

transforms.Cast

Cast input to a specific data type.

transforms.ToTensor

Converts an image NDArray or a batch of image NDArrays to a tensor NDArray.

transforms.Normalize

Normalize a tensor of shape (C x H x W) or (N x C x H x W) with mean and standard deviation.

transforms.RandomResizedCrop

Crop the input image with random scale and aspect ratio.

transforms.CenterCrop

Crops the image src to the given size by trimming on all four sides and preserving the center of the image.

transforms.Resize

Resize an image NDArray or a batch of image NDArrays to the given size.

transforms.RandomFlipLeftRight

Randomly flip the input image left to right with a probability of 0.5.

transforms.RandomFlipTopBottom

Randomly flip the input image top to bottom with a probability of 0.5.

transforms.RandomBrightness

Randomly jitters image brightness with a factor chosen from [max(0, 1 - brightness), 1 + brightness].

transforms.RandomContrast

Randomly jitters image contrast with a factor chosen from [max(0, 1 - contrast), 1 + contrast].

transforms.RandomSaturation

Randomly jitters image saturation with a factor chosen from [max(0, 1 - saturation), 1 + saturation].

transforms.RandomHue

Randomly jitters image hue with a factor chosen from [max(0, 1 - hue), 1 + hue].

transforms.RandomColorJitter

Randomly jitters the brightness, contrast, saturation, and hue of an image.

transforms.RandomLighting

Add AlexNet-style PCA-based noise to an image.

API Reference

Image transforms.

class mxnet.gluon.data.vision.transforms.Cast(dtype='float32')[source]

Bases: mxnet.gluon.block.HybridBlock

Cast input to a specific data type.

Parameters

dtype (str, default 'float32') – The target data type, given as a string or a numpy.dtype.

Inputs:
  • data: input tensor with arbitrary shape and dtype.

Outputs:
  • out: output tensor with the same shape as data and data type as dtype.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class mxnet.gluon.data.vision.transforms.CenterCrop(size, interpolation=1)[source]

Bases: mxnet.gluon.block.Block

Crops the image src to the given size by trimming on all four sides and preserving the center of the image. Upsamples if src is smaller than size.

Parameters
  • size (int or tuple of (W, H)) – Size of output image.

  • interpolation (int) – Interpolation method for resizing. By default uses bilinear interpolation. See OpenCV’s resize function for available choices.

Inputs:
  • data: input tensor with (Hi x Wi x C) shape.

Outputs:
  • out: output tensor with (H x W x C) shape.

Examples

>>> transformer = vision.transforms.CenterCrop(size=(1000, 500))
>>> image = mx.nd.random.uniform(0, 255, (2321, 3482, 3)).astype(dtype=np.uint8)
>>> transformer(image)
<NDArray 500x1000x3 @cpu(0)>

forward(x)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters

*args (list of NDArray) – Input tensors.
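The cropping geometry described for CenterCrop can be sketched in NumPy. This is an illustration only, not the library's implementation: the helper name center_crop is hypothetical, and the sketch assumes the input is at least as large as the requested size, so the upsampling path is left out.

```python
import numpy as np

def center_crop(image, size):
    """Crop an (H x W x C) array to size=(W, H) around its center.

    Assumes the image is at least as large as the requested size;
    the upsampling path for smaller images is not covered here.
    """
    out_w, out_h = size
    in_h, in_w = image.shape[:2]
    if in_h < out_h or in_w < out_w:
        raise ValueError("sketch only handles images at least as large as size")
    # Trim the same amount from opposite sides so the center is preserved.
    top = (in_h - out_h) // 2
    left = (in_w - out_w) // 2
    return image[top:top + out_h, left:left + out_w]

image = np.arange(6 * 8 * 3).reshape(6, 8, 3)
cropped = center_crop(image, size=(4, 2))
print(cropped.shape)  # (2, 4, 3)
```

Note that size is given as (W, H) while the array is indexed as (H, W, C), which is why the output shape lists the height first.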

class mxnet.gluon.data.vision.transforms.Compose(transforms)[source]

Bases: mxnet.gluon.nn.basic_layers.Sequential

Sequentially composes multiple transforms.

Parameters

transforms (list of transform Blocks) – The list of transforms to be composed.

Inputs:
  • data: input tensor with the shape that the first transform Block requires.

Outputs:
  • out: output tensor with the shape that the last transform Block produces.

Examples

>>> transformer = transforms.Compose([transforms.Resize(300),
...                                   transforms.CenterCrop(256),
...                                   transforms.ToTensor()])
>>> image = mx.nd.random.uniform(0, 255, (224, 224, 3)).astype(dtype=np.uint8)
>>> transformer(image)
<NDArray 3x256x256 @cpu(0)>
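The idea behind sequential composition can be shown without the library. The following is a minimal sketch in plain Python, where compose, add_one, and double are hypothetical names standing in for Compose and its transform Blocks; any callable plays the role of a transform.

```python
from functools import reduce

def compose(transforms):
    """Return a callable that applies each transform in list order."""
    def apply_all(data):
        # Feed the output of each transform into the next one.
        return reduce(lambda value, transform: transform(value), transforms, data)
    return apply_all

# Hypothetical stand-ins for transform Blocks.
add_one = lambda v: v + 1
double = lambda v: v * 2

pipeline = compose([add_one, double])
result = pipeline(3)  # (3 + 1) * 2
print(result)
```

Order matters: composing the same two callables in the opposite order computes 3 * 2 + 1 instead.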
class mxnet.gluon.data.vision.transforms.CropResize(x, y, width, height, size=None, interpolation=None)[source]

Bases: mxnet.gluon.block.HybridBlock

Crop the input image and optionally resize it.

Makes a crop of the original image, then optionally resizes it to the specified size.

Parameters
  • x (int) – Left boundary of the cropping area.

  • y (int) – Top boundary of the cropping area.

  • width (int) – Width of the cropping area.

  • height (int) – Height of the cropping area.

  • size (int or tuple of (w, h)) – Optional. Size to resize to after cropping.

  • interpolation (int, optional) – Interpolation method for resizing. By default uses bilinear interpolation. See OpenCV’s resize function for available choices: https://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html?highlight=resize#resize Note that on GPU, resizing uses the contrib.bilinearResize2D operator, which only supports bilinear interpolation (1).

Inputs:
  • data: input tensor with (H x W x C) or (N x H x W x C) shape.

Outputs:
  • out: output tensor with (H x W x C) or (N x H x W x C) shape.

Examples

>>> transformer = vision.transforms.CropResize(x=0, y=0, width=100, height=100)
>>> image = mx.nd.random.uniform(0, 255, (224, 224, 3)).astype(dtype=np.uint8)
>>> transformer(image)
<NDArray 100x100x3 @cpu(0)>
>>> image = mx.nd.random.uniform(0, 255, (3, 224, 224, 3)).astype(dtype=np.uint8)
>>> transformer(image)
<NDArray 3x100x100x3 @cpu(0)>
>>> transformer = vision.transforms.CropResize(x=0, y=0, width=100, height=100, size=(50, 50), interpolation=1)
>>> transformer(image)
<NDArray 3x50x50x3 @cpu(0)>

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.
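How the x, y, width, and height parameters of CropResize map onto array axes can be sketched with NumPy slicing. This covers only the cropping step (the optional resize is omitted), and the helper name crop is hypothetical rather than part of the library.

```python
import numpy as np

def crop(image, x, y, width, height):
    """Crop an (H x W x C) or (N x H x W x C) array without resizing."""
    # x indexes columns (the W axis) and y indexes rows (the H axis).
    if image.ndim == 3:
        return image[y:y + height, x:x + width, :]
    if image.ndim == 4:
        return image[:, y:y + height, x:x + width, :]
    raise ValueError("expected a 3-D or 4-D array")

single = np.zeros((224, 224, 3), dtype=np.uint8)
batch = np.zeros((3, 224, 224, 3), dtype=np.uint8)
single_crop = crop(single, x=0, y=0, width=100, height=100)
batch_crop = crop(batch, x=10, y=20, width=100, height=50)
print(single_crop.shape)  # (100, 100, 3)
print(batch_crop.shape)   # (3, 50, 100, 3)
```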

class mxnet.gluon.data.vision.transforms.Normalize(mean=0.0, std=1.0)[source]

Bases: mxnet.gluon.block.HybridBlock

Normalize a tensor of shape (C x H x W) or (N x C x H x W) with mean and standard deviation.

Given mean (m1, …, mn) and std (s1, …, sn) for n channels, this transform normalizes each channel of the input tensor with:

output[i] = (input[i] - mi) / si

If mean or std is scalar, the same value will be applied to all channels.

Parameters
  • mean (float or tuple of floats) – The mean values.

  • std (float or tuple of floats) – The standard deviation values.

Inputs:
  • data: input tensor with (C x H x W) or (N x C x H x W) shape.

Outputs:
  • out: output tensor with the same shape as data.

Examples

>>> transformer = transforms.Normalize(mean=(0, 1, 2), std=(3, 2, 1))
>>> image = mx.nd.random.uniform(0, 1, (3, 4, 2))
>>> transformer(image)
[[[ 0.18293785  0.19761486]
  [ 0.23839645  0.28142193]
  [ 0.20092112  0.28598186]
  [ 0.18162774  0.28241724]]
 [[-0.2881726  -0.18821815]
  [-0.17705294 -0.30780914]
  [-0.2812064  -0.3512327 ]
  [-0.05411351 -0.4716435 ]]
 [[-1.0363373  -1.7273437 ]
  [-1.6165586  -1.5223348 ]
  [-1.208275   -1.1878313 ]
  [-1.4711051  -1.5200229 ]]]
<NDArray 3x4x2 @cpu(0)>

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.
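The per-channel formula given for Normalize, output[i] = (input[i] - mi) / si, can be checked with a small NumPy sketch. The helper name normalize is hypothetical, and the reshape-based broadcasting is one possible way to apply the formula, not necessarily how the library does it.

```python
import numpy as np

def normalize(tensor, mean, std):
    """Apply output[i] = (input[i] - mean[i]) / std[i] per channel."""
    tensor = np.asarray(tensor, dtype=np.float32)
    # Reshape to (C, 1, 1) so the values broadcast over H and W; the same
    # shape also broadcasts over a leading batch axis for (N x C x H x W).
    # A scalar becomes shape (1, 1, 1) and applies to every channel.
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    return (tensor - mean) / std

image = np.ones((3, 4, 2), dtype=np.float32)
out = normalize(image, mean=(0, 1, 2), std=(3, 2, 1))
print(out[:, 0, 0])  # one value per channel: (1-0)/3, (1-1)/2, (1-2)/1
```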

class mxnet.gluon.data.vision.transforms.RandomApply(transforms, p=0.5)[source]

Bases: mxnet.gluon.nn.basic_layers.Sequential

Randomly apply a list of transformations with a given probability.

Parameters
  • transforms – List of transformations.

  • p (float) – Probability of applying the transformations.

Inputs:
  • data: input tensor.

Outputs:
  • out: transformed image.

forward(x)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters

*args (list of NDArray) – Input tensors.
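The behaviour described for RandomApply, running the whole list with probability p and otherwise returning the input unchanged, can be sketched in plain Python. The names random_apply and negate are hypothetical, and the rng argument is added here only to make the sketch easy to test.

```python
import random

def random_apply(transforms, p=0.5, rng=random):
    """Return a callable that runs all transforms with probability p."""
    def apply(data):
        # rng.random() is in [0, 1), so p=1.0 always applies and p=0.0 never does.
        if rng.random() < p:
            for transform in transforms:
                data = transform(data)
        return data
    return apply

negate = lambda v: -v  # hypothetical stand-in for a transform Block
always = random_apply([negate], p=1.0)
never = random_apply([negate], p=0.0)
print(always(5), never(5))  # -5 5
```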

class mxnet.gluon.data.vision.transforms.RandomBrightness(brightness)[source]

Bases: mxnet.gluon.block.HybridBlock

Randomly jitters image brightness with a factor chosen from [max(0, 1 - brightness), 1 + brightness].

Parameters

brightness (float) – How much to jitter brightness. The brightness factor is randomly chosen from [max(0, 1 - brightness), 1 + brightness].

Inputs:
  • data: input tensor with (H x W x C) shape.

Outputs:
  • out: output tensor with same shape as data.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.
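The interval [max(0, 1 - brightness), 1 + brightness] is easiest to see with concrete numbers. The sketch below uses NumPy and two hypothetical helpers, jitter_range and random_brightness. It assumes the sampled factor multiplies the pixel values and that the result is clipped to [0, 255]; the library's exact arithmetic may differ. The same interval form appears for the contrast, saturation, and hue parameters in this reference.

```python
import numpy as np

def jitter_range(amount):
    """Return the (low, high) interval the jitter factor is drawn from."""
    return max(0.0, 1.0 - amount), 1.0 + amount

def random_brightness(image, brightness, rng):
    """Scale pixel values by a factor drawn uniformly from the jitter range.

    Assumption: the factor multiplies the pixels and the result is clipped
    to [0, 255] before casting back to the input dtype.
    """
    low, high = jitter_range(brightness)
    factor = rng.uniform(low, high)
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(image.dtype), factor

rng = np.random.default_rng(0)
image = np.full((2, 2, 3), 100, dtype=np.uint8)
out, factor = random_brightness(image, brightness=0.5, rng=rng)
print(jitter_range(0.5))  # (0.5, 1.5)
print(jitter_range(1.5))  # (0.0, 2.5): the lower bound is floored at 0
```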

class mxnet.gluon.data.vision.transforms.RandomColorJitter(brightness=0, contrast=0, saturation=0, hue=0)[source]

Bases: mxnet.gluon.block.HybridBlock

Randomly jitters the brightness, contrast, saturation, and hue of an image.

Parameters
  • brightness (float) – How much to jitter brightness. The brightness factor is randomly chosen from [max(0, 1 - brightness), 1 + brightness].

  • contrast (float) – How much to jitter contrast. The contrast factor is randomly chosen from [max(0, 1 - contrast), 1 + contrast].

  • saturation (float) – How much to jitter saturation. The saturation factor is randomly chosen from [max(0, 1 - saturation), 1 + saturation].

  • hue (float) – How much to jitter hue. The hue factor is randomly chosen from [max(0, 1 - hue), 1 + hue].

Inputs:
  • data: input tensor with (H x W x C) shape.

Outputs:
  • out: output tensor with same shape as data.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class mxnet.gluon.data.vision.transforms.RandomContrast(contrast)[source]

Bases: mxnet.gluon.block.HybridBlock

Randomly jitters image contrast with a factor chosen from [max(0, 1 - contrast), 1 + contrast].

Parameters

contrast (float) – How much to jitter contrast. The contrast factor is randomly chosen from [max(0, 1 - contrast), 1 + contrast].

Inputs:
  • data: input tensor with (H x W x C) shape.

Outputs:
  • out: output tensor with same shape as data.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class mxnet.gluon.data.vision.transforms.RandomFlipLeftRight[source]

Bases: mxnet.gluon.block.HybridBlock

Randomly flip the input image left to right with a probability of 0.5.

Inputs:
  • data: input tensor with (H x W x C) shape.

Outputs:
  • out: output tensor with same shape as data.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class mxnet.gluon.data.vision.transforms.RandomFlipTopBottom[source]

Bases: mxnet.gluon.block.HybridBlock

Randomly flip the input image top to bottom with a probability of 0.5.

Inputs:
  • data: input tensor with (H x W x C) shape.

Outputs:
  • out: output tensor with same shape as data.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.
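The two flip transforms above differ only in the axis being reversed. A minimal NumPy sketch, with a hypothetical helper random_flip and an explicit probability argument that the library classes do not expose, might look like this:

```python
import numpy as np

def random_flip(image, axis, rng, p=0.5):
    """Flip an (H x W x C) array along the given axis with probability p.

    axis=1 reverses the columns (left to right); axis=0 reverses the rows
    (top to bottom).
    """
    if rng.random() < p:
        return np.flip(image, axis=axis)
    return image

rng = np.random.default_rng(0)
image = np.arange(2 * 3 * 1).reshape(2, 3, 1)  # rows [0, 1, 2] and [3, 4, 5]
flipped_lr = random_flip(image, axis=1, rng=rng, p=1.0)
flipped_tb = random_flip(image, axis=0, rng=rng, p=1.0)
print(flipped_lr[..., 0])  # rows become [2, 1, 0] and [5, 4, 3]
print(flipped_tb[..., 0])  # rows become [3, 4, 5] and [0, 1, 2]
```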

class mxnet.gluon.data.vision.transforms.RandomHue(hue)[source]

Bases: mxnet.gluon.block.HybridBlock

Randomly jitters image hue with a factor chosen from [max(0, 1 - hue), 1 + hue].

Parameters

hue (float) – How much to jitter hue. The hue factor is randomly chosen from [max(0, 1 - hue), 1 + hue].

Inputs:
  • data: input tensor with (H x W x C) shape.

Outputs:
  • out: output tensor with same shape as data.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class mxnet.gluon.data.vision.transforms.RandomLighting(alpha)[source]

Bases: mxnet.gluon.block.HybridBlock

Add AlexNet-style PCA-based noise to an image.

Parameters

alpha (float) – Intensity of the lighting noise added to the image.

Inputs:
  • data: input tensor with (H x W x C) shape.

Outputs:
  • out: output tensor with same shape as data.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.
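AlexNet-style PCA noise adds the same RGB offset to every pixel, built from the principal components of the pixel colors and scaled by random coefficients. The NumPy sketch below illustrates the technique under a loud assumption: it computes the components from the single input image, whereas AlexNet derived them from the whole training set, and the library may use fixed precomputed values. The helper name pca_lighting and the parameter name alpha_std are hypothetical.

```python
import numpy as np

def pca_lighting(image, alpha_std, rng):
    """Add PCA-based color noise to an (H x W x C) float image.

    Assumption: the principal components come from this image's own pixels.
    """
    pixels = image.reshape(-1, image.shape[-1]).astype(np.float64)
    cov = np.cov(pixels, rowvar=False)       # (C, C) channel covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # columns of eigvecs are components
    alphas = rng.normal(0.0, alpha_std, size=eigvals.shape)
    # One offset per channel: sum over k of eigvec_k * (alpha_k * eigval_k).
    offset = eigvecs @ (alphas * eigvals)
    return image + offset

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(4, 4, 3))
noisy = pca_lighting(image, alpha_std=0.1, rng=rng)
print(noisy.shape)  # (4, 4, 3)
```

Because the offset is shared by all pixels, the difference between the output and the input is constant across the image, and a standard deviation of zero leaves the image unchanged.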

class mxnet.gluon.data.vision.transforms.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=1)[source]

Bases: mxnet.gluon.block.Block

Crop the input image with random scale and aspect ratio.

Makes a crop of the original image with random size (default: 0.08 to 1.0 of the original image size) and random aspect ratio (default: 3/4 to 4/3), then resizes it to the specified size.

Parameters
  • size (int or tuple of (W, H)) – Size of the final output.

  • scale (tuple of two floats) – If scale is (min_area, max_area), the cropped image’s area will range from min_area to max_area of the original image’s area.

  • ratio (tuple of two floats) – Range of aspect ratio of the cropped image before resizing.

  • interpolation (int) – Interpolation method for resizing. By default uses bilinear interpolation. See OpenCV’s resize function for available choices.

Inputs:
  • data: input tensor with (Hi x Wi x C) shape.

Outputs:
  • out: output tensor with (H x W x C) shape.

forward(x)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters

*args (list of NDArray) – Input tensors.
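The way scale and ratio jointly determine a crop box for RandomResizedCrop can be sketched in plain Python. This is not the library's sampling code: the helper name sample_crop_box is hypothetical, and the sketch assumes the area fraction and the aspect ratio are both drawn uniformly, with a full-image crop as the fallback after a fixed number of failed attempts. The final resize to size is omitted.

```python
import math
import random

def sample_crop_box(in_w, in_h, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3),
                    rng=random, max_attempts=10):
    """Sample (x, y, w, h) of a crop with random area and aspect ratio.

    Assumptions: uniform sampling of area fraction and aspect ratio, and a
    full-image crop as the fallback after max_attempts failures.
    """
    area = in_w * in_h
    for _ in range(max_attempts):
        target_area = rng.uniform(*scale) * area
        aspect = rng.uniform(*ratio)  # width / height
        # Solve w * h = target_area and w / h = aspect for w and h.
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= in_w and 0 < h <= in_h:
            x = rng.randint(0, in_w - w)
            y = rng.randint(0, in_h - h)
            return x, y, w, h
    return 0, 0, in_w, in_h

rng = random.Random(0)
x, y, w, h = sample_crop_box(640, 480, rng=rng)
print(x, y, w, h)
```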

class mxnet.gluon.data.vision.transforms.RandomRotation(angle_limits, zoom_in=False, zoom_out=False, rotate_with_proba=1.0)[source]

Bases: mxnet.gluon.block.Block

Randomly rotates the input image by a random angle.

Keeps the original image shape and aspect ratio.

Parameters
  • angle_limits (tuple) – Tuple of 2 elements containing the upper and lower limits for the rotation angle, in degrees.

  • zoom_in (bool) – Zoom in image so that no padding is present in final output.

  • zoom_out (bool) – Zoom out image so that the entire original image is present in final output.

  • rotate_with_proba (float32) – Probability of applying the rotation.

Inputs:
  • data: input tensor with (C x H x W) or (N x C x H x W) shape.

Outputs:
  • out: output tensor with (C x H x W) or (N x C x H x W) shape.

forward(x)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters

*args (list of NDArray) – Input tensors.
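The random part of RandomRotation can be isolated from the image rotation itself. The plain-Python sketch below only samples the angle; the helper name sample_rotation is hypothetical, and it assumes that rotate_with_proba is the probability that any rotation is applied and that the angle is drawn uniformly between the two limits.

```python
import random

def sample_rotation(angle_limits, rotate_with_proba=1.0, rng=random):
    """Return the angle in degrees to rotate by, or 0.0 to skip rotation.

    Assumption: rotate_with_proba is the probability that a rotation
    is applied at all.
    """
    if rng.random() >= rotate_with_proba:
        return 0.0
    # Accept the limits in either order.
    low, high = min(angle_limits), max(angle_limits)
    return rng.uniform(low, high)

rng = random.Random(0)
angle = sample_rotation((-30, 30), rng=rng)
skipped = sample_rotation((-30, 30), rotate_with_proba=0.0, rng=rng)
print(angle, skipped)
```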

class mxnet.gluon.data.vision.transforms.RandomSaturation(saturation)[source]

Bases: mxnet.gluon.block.HybridBlock

Randomly jitters image saturation with a factor chosen from [max(0, 1 - saturation), 1 + saturation].

Parameters

saturation (float) – How much to jitter saturation. The saturation factor is randomly chosen from [max(0, 1 - saturation), 1 + saturation].

Inputs:
  • data: input tensor with (H x W x C) shape.

Outputs:
  • out: output tensor with same shape as data.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class mxnet.gluon.data.vision.transforms.Resize(size, keep_ratio=False, interpolation=1)[source]

Bases: mxnet.gluon.block.HybridBlock

Resize an image NDArray or a batch of image NDArrays to the given size. Should be applied before mxnet.gluon.data.vision.transforms.ToTensor.

Parameters
  • size (int or tuple of (W, H)) – Size of output image.

  • keep_ratio (bool) – Whether to resize the short edge or both edges to size, if size is given as an integer.

  • interpolation (int) – Interpolation method for resizing. By default uses bilinear interpolation. See OpenCV’s resize function for available choices. Note that on GPU, Resize uses the contrib.bilinearResize2D operator, which only supports bilinear interpolation (1).

Inputs:
  • data: input tensor with (H x W x C) or (N x H x W x C) shape.

Outputs:
  • out: output tensor with (H x W x C) or (N x H x W x C) shape.

Examples

>>> transformer = vision.transforms.Resize(size=(1000, 500))
>>> image = mx.nd.random.uniform(0, 255, (224, 224, 3)).astype(dtype=np.uint8)
>>> transformer(image)
<NDArray 500x1000x3 @cpu(0)>
>>> image = mx.nd.random.uniform(0, 255, (3, 224, 224, 3)).astype(dtype=np.uint8)
>>> transformer(image)
<NDArray 3x500x1000x3 @cpu(0)>

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.
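The interaction between an integer size and keep_ratio in Resize can be made concrete by computing the target dimensions. The plain-Python sketch below uses a hypothetical helper, resize_target. It assumes that with keep_ratio=True the short edge becomes size and the long edge is scaled by the same factor and truncated to an integer; the library's rounding may differ.

```python
def resize_target(in_w, in_h, size, keep_ratio=False):
    """Return the (W, H) an image would be resized to.

    Assumption: with keep_ratio=True and an integer size, the short edge
    becomes size and the long edge is scaled by the same factor, truncated
    to an integer.
    """
    if isinstance(size, int):
        if not keep_ratio:
            return size, size  # both edges are resized to size
        if in_w <= in_h:
            return size, int(in_h * size / in_w)
        return int(in_w * size / in_h), size
    out_w, out_h = size  # a tuple is taken as (W, H) directly
    return out_w, out_h

print(resize_target(640, 480, 240))                   # (240, 240)
print(resize_target(640, 480, 240, keep_ratio=True))  # (320, 240)
print(resize_target(640, 480, (1000, 500)))           # (1000, 500)
```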

class mxnet.gluon.data.vision.transforms.Rotate(rotation_degrees, zoom_in=False, zoom_out=False)[source]

Bases: mxnet.gluon.block.Block

Rotate the input image by a given angle. Keeps the original image shape.

Parameters
  • rotation_degrees (float32) – Desired rotation angle in degrees.

  • zoom_in (bool) – Zoom in image so that no padding is present in final output.

  • zoom_out (bool) – Zoom out image so that the entire original image is present in final output.

Inputs:
  • data: input tensor with (C x H x W) or (N x C x H x W) shape.

Outputs:
  • out: output tensor with (C x H x W) or (N x C x H x W) shape.

forward(x)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters

*args (list of NDArray) – Input tensors.

class mxnet.gluon.data.vision.transforms.ToTensor[source]

Bases: mxnet.gluon.block.HybridBlock

Converts an image NDArray or a batch of image NDArrays to a tensor NDArray.

Converts an image NDArray of shape (H x W x C) in the range [0, 255] to a float32 tensor NDArray of shape (C x H x W) in the range [0, 1].

For batch input, converts a batch of image NDArrays of shape (N x H x W x C) in the range [0, 255] to a float32 tensor NDArray of shape (N x C x H x W) in the range [0, 1].

Inputs:
  • data: input tensor with (H x W x C) or (N x H x W x C) shape and uint8 type.

Outputs:
  • out: output tensor with (C x H x W) or (N x C x H x W) shape and float32 type.

Examples

>>> transformer = vision.transforms.ToTensor()
>>> image = mx.nd.random.uniform(0, 255, (4, 2, 3)).astype(dtype=np.uint8)
>>> transformer(image)
[[[ 0.85490197  0.72156864]
  [ 0.09019608  0.74117649]
  [ 0.61960787  0.92941177]
  [ 0.96470588  0.1882353 ]]
 [[ 0.6156863   0.73725492]
  [ 0.46666667  0.98039216]
  [ 0.44705883  0.45490196]
  [ 0.01960784  0.8509804 ]]
 [[ 0.39607844  0.03137255]
  [ 0.72156864  0.52941179]
  [ 0.16470589  0.7647059 ]
  [ 0.05490196  0.70588237]]]
<NDArray 3x4x2 @cpu(0)>

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.
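The layout and range conversion performed by ToTensor can be reproduced in NumPy for illustration. The helper name to_tensor is hypothetical; the sketch follows the description above (channels moved to the front, values divided by 255, result in float32) and is not the library's implementation.

```python
import numpy as np

def to_tensor(image):
    """Convert uint8 (H x W x C) or (N x H x W x C) to float32 channels-first."""
    image = np.asarray(image)
    if image.ndim == 3:
        axes = (2, 0, 1)      # HWC -> CHW
    elif image.ndim == 4:
        axes = (0, 3, 1, 2)   # NHWC -> NCHW
    else:
        raise ValueError("expected a 3-D or 4-D array")
    # Scale [0, 255] to [0, 1] after moving the channel axis.
    return image.transpose(axes).astype(np.float32) / 255.0

image = np.full((4, 2, 3), 255, dtype=np.uint8)
tensor = to_tensor(image)
print(tensor.shape, tensor.dtype, tensor.max())  # (3, 4, 2) float32 1.0
```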