vision.transforms
Gluon provides pre-defined vision transformation and data augmentation functions in the mxnet.gluon.data.vision.transforms module.
Compose – Sequentially composes multiple transforms.
Cast – Cast input to a specific data type.
ToTensor – Converts an image NDArray or batch of image NDArray to a tensor NDArray.
Normalize – Normalize a tensor of shape (C x H x W) or (N x C x H x W) with mean and standard deviation.
RandomResizedCrop – Crop the input image with random scale and aspect ratio.
CenterCrop – Crops the image src to the given size by trimming on all four sides and preserving the center of the image.
Resize – Resize an image or a batch of image NDArray to the given size.
RandomFlipLeftRight – Randomly flip the input image left to right with a probability of 0.5.
RandomFlipTopBottom – Randomly flip the input image top to bottom with a probability of 0.5.
RandomBrightness – Randomly jitters image brightness with a factor chosen from [max(0, 1 - brightness), 1 + brightness].
RandomContrast – Randomly jitters image contrast with a factor chosen from [max(0, 1 - contrast), 1 + contrast].
RandomSaturation – Randomly jitters image saturation with a factor chosen from [max(0, 1 - saturation), 1 + saturation].
RandomHue – Randomly jitters image hue with a factor chosen from [max(0, 1 - hue), 1 + hue].
RandomColorJitter – Randomly jitters the brightness, contrast, saturation, and hue of an image.
RandomLighting – Add AlexNet-style PCA-based noise to an image.
API Reference
Image transforms.
class mxnet.gluon.data.vision.transforms.Cast(dtype='float32')
Bases: mxnet.gluon.block.HybridBlock
Cast input to a specific data type.
- Parameters
dtype (str, default 'float32') – The target data type, in string or numpy.dtype.
- Inputs:
data: input tensor with arbitrary shape and dtype.
- Outputs:
out: output tensor with the same shape as data and data type as dtype.
class mxnet.gluon.data.vision.transforms.CenterCrop(size, interpolation=1)
Bases: mxnet.gluon.block.Block
Crops the image src to the given size by trimming on all four sides and preserving the center of the image. Upsamples if src is smaller than size.
- Parameters
size (int or tuple of (W, H)) – Size of output image.
interpolation (int) – Interpolation method for resizing. By default uses bilinear interpolation. See OpenCV’s resize function for available choices.
- Inputs:
data: input tensor with (Hi x Wi x C) shape.
- Outputs:
out: output tensor with (H x W x C) shape.
Examples
>>> transformer = vision.transforms.CenterCrop(size=(1000, 500))
>>> image = mx.nd.random.uniform(0, 255, (2321, 3482, 3)).astype(dtype=np.uint8)
>>> transformer(image)
<NDArray 500x1000x3 @cpu(0)>
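The centering arithmetic described above can be sketched in NumPy. This is only an illustration, not the MXNet implementation: `center_crop_box` and `center_crop` are hypothetical helpers, the sketch assumes the source image is at least as large as the requested size, and it leaves out the upsampling step.

```python
import numpy as np


def center_crop_box(src_w, src_h, out_w, out_h):
    """Return (x0, y0, w, h) of a crop window centered in the source image."""
    if out_w > src_w or out_h > src_h:
        # Assumption of this sketch: no upsampling, so larger targets are rejected.
        raise ValueError("requested size exceeds the source image")
    x0 = (src_w - out_w) // 2
    y0 = (src_h - out_h) // 2
    return x0, y0, out_w, out_h


def center_crop(image, size):
    """Crop an (H, W, C) array to size, given as an int or a (W, H) tuple."""
    out_w, out_h = (size, size) if isinstance(size, int) else size
    src_h, src_w = image.shape[:2]
    x0, y0, w, h = center_crop_box(src_w, src_h, out_w, out_h)
    return image[y0:y0 + h, x0:x0 + w]


image = np.zeros((2321, 3482, 3), dtype=np.uint8)
print(center_crop(image, (1000, 500)).shape)  # (500, 1000, 3)
```

Note that size is given as (W, H) while the array is laid out as (H, W, C), which is why the output shape lists the height first.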
class mxnet.gluon.data.vision.transforms.Compose(transforms)
Bases: mxnet.gluon.nn.basic_layers.Sequential
Sequentially composes multiple transforms.
- Parameters
transforms (list of transform Blocks.) – The list of transforms to be composed.
- Inputs:
data: input tensor with the shape that the first transform Block requires.
- Outputs:
out: output tensor with the shape that the last transform Block produces.
Examples
>>> transformer = transforms.Compose([transforms.Resize(300),
...                                   transforms.CenterCrop(256),
...                                   transforms.ToTensor()])
>>> image = mx.nd.random.uniform(0, 255, (224, 224, 3)).astype(dtype=np.uint8)
>>> transformer(image)
<NDArray 3x256x256 @cpu(0)>
class mxnet.gluon.data.vision.transforms.CropResize(x, y, width, height, size=None, interpolation=None)
Bases: mxnet.gluon.block.HybridBlock
Crop the input image and optionally resize it.
Makes a crop of the original image, then optionally resizes it to the specified size.
- Parameters
x (int) – Left boundary of the cropping area.
y (int) – Top boundary of the cropping area.
width (int) – Width of the cropping area.
height (int) – Height of the cropping area.
size (int or tuple of (w, h)) – Optional, resize to new size after cropping.
interpolation (int, optional) – Interpolation method for resizing. By default uses bilinear interpolation. See OpenCV’s resize function for available choices: https://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html?highlight=resize#resize Note that Resize on GPU uses the contrib.bilinearResize2D operator, which only supports bilinear interpolation (1).
- Inputs:
data: input tensor with (H x W x C) or (N x H x W x C) shape.
- Outputs:
out: output tensor with (H x W x C) or (N x H x W x C) shape.
Examples
>>> transformer = vision.transforms.CropResize(x=0, y=0, width=100, height=100)
>>> image = mx.nd.random.uniform(0, 255, (224, 224, 3)).astype(dtype=np.uint8)
>>> transformer(image)
<NDArray 100x100x3 @cpu(0)>
>>> image = mx.nd.random.uniform(0, 255, (3, 224, 224, 3)).astype(dtype=np.uint8)
>>> transformer(image)
<NDArray 3x100x100x3 @cpu(0)>
>>> transformer = vision.transforms.CropResize(x=0, y=0, width=100, height=100, size=(50, 50), interpolation=1)
>>> transformer(image)
<NDArray 3x50x50x3 @cpu(0)>
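The cropping step alone can be illustrated with NumPy slicing. This is a sketch rather than the MXNet operator: `crop_hwc` is a hypothetical helper, and the optional resize is deliberately left out.

```python
import numpy as np


def crop_hwc(image, x, y, width, height):
    """Crop an (H, W, C) image or an (N, H, W, C) batch (hypothetical helper)."""
    # The ellipsis leaves any leading batch axis untouched, so the same
    # slice works for a single image and for a batch.
    return image[..., y:y + height, x:x + width, :]


single = np.zeros((224, 224, 3), dtype=np.uint8)
batch = np.zeros((3, 224, 224, 3), dtype=np.uint8)
print(crop_hwc(single, 0, 0, 100, 100).shape)  # (100, 100, 3)
print(crop_hwc(batch, 0, 0, 100, 100).shape)   # (3, 100, 100, 3)
```

The batch and channel axes pass through unchanged; only the two spatial axes are cut.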
class mxnet.gluon.data.vision.transforms.Normalize(mean=0.0, std=1.0)
Bases: mxnet.gluon.block.HybridBlock
Normalize a tensor of shape (C x H x W) or (N x C x H x W) with mean and standard deviation.
Given mean (m1, …, mn) and std (s1, …, sn) for n channels, this transform normalizes each channel of the input tensor with:
output[i] = (input[i] - mi) / si
If mean or std is scalar, the same value will be applied to all channels.
- Parameters
mean (float or tuple of floats) – The mean values.
std (float or tuple of floats) – The standard deviation values.
- Inputs:
data: input tensor with (C x H x W) or (N x C x H x W) shape.
- Outputs:
out: output tensor with the same shape as data.
Examples
>>> transformer = transforms.Normalize(mean=(0, 1, 2), std=(3, 2, 1))
>>> image = mx.nd.random.uniform(0, 1, (3, 4, 2))
>>> transformer(image)
[[[ 0.18293785 0.19761486]
  [ 0.23839645 0.28142193]
  [ 0.20092112 0.28598186]
  [ 0.18162774 0.28241724]]
 [[-0.2881726 -0.18821815]
  [-0.17705294 -0.30780914]
  [-0.2812064 -0.3512327 ]
  [-0.05411351 -0.4716435 ]]
 [[-1.0363373 -1.7273437 ]
  [-1.6165586 -1.5223348 ]
  [-1.208275 -1.1878313 ]
  [-1.4711051 -1.5200229 ]]]
<NDArray 3x4x2 @cpu(0)>
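The per-channel formula output[i] = (input[i] - mi) / si can be reproduced in NumPy to check the arithmetic. This is a sketch under the assumption that the channel axis is the third from the end, as in the (C x H x W) and (N x C x H x W) layouts; `normalize_chw` is a hypothetical helper, not part of MXNet.

```python
import numpy as np


def normalize_chw(data, mean, std):
    """Normalize a (C, H, W) or (N, C, H, W) array channel by channel."""
    data = np.asarray(data, dtype=np.float32)
    channels = data.shape[-3]
    # A scalar mean or std is expanded so every channel gets the same value.
    mean = np.broadcast_to(np.asarray(mean, dtype=np.float32), (channels,))
    std = np.broadcast_to(np.asarray(std, dtype=np.float32), (channels,))
    # Reshape to (C, 1, 1) so the statistics broadcast over H and W.
    return (data - mean.reshape(-1, 1, 1)) / std.reshape(-1, 1, 1)


image = np.ones((3, 4, 2), dtype=np.float32)
out = normalize_chw(image, mean=(0, 1, 2), std=(3, 2, 1))
print(out[:, 0, 0])  # roughly 1/3, 0 and -1 for the three channels
```

With an all-ones input, channel 0 becomes (1 - 0) / 3, channel 1 becomes (1 - 1) / 2, and channel 2 becomes (1 - 2) / 1.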
class mxnet.gluon.data.vision.transforms.RandomApply(transforms, p=0.5)
Bases: mxnet.gluon.nn.basic_layers.Sequential
Randomly apply a list of transformations with a given probability.
- Parameters
transforms – List of transformations.
p (float) – Probability of applying the transformations.
- Inputs:
data: input tensor.
- Outputs:
out: transformed image.
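The behavior can be sketched in plain Python: with probability p the whole list runs in order, otherwise the input is returned unchanged. `random_apply` and the two toy transforms are hypothetical names for illustration, and the exact comparison against p is an assumption of this sketch.

```python
import random


def random_apply(transforms, data, p=0.5, rng=random):
    """Run every transform in order with probability p, else return data as is."""
    if rng.random() < p:
        for transform in transforms:
            data = transform(data)
    return data


def double(value):
    return value * 2


def add_one(value):
    return value + 1


print(random_apply([double, add_one], 10, p=1.0))  # 21
print(random_apply([double, add_one], 10, p=0.0))  # 10
```

The decision is made once for the whole list, so either all of the transforms run or none of them do.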
class mxnet.gluon.data.vision.transforms.RandomBrightness(brightness)
Bases: mxnet.gluon.block.HybridBlock
Randomly jitters image brightness with a factor chosen from [max(0, 1 - brightness), 1 + brightness].
- Parameters
brightness (float) – How much to jitter brightness. brightness factor is randomly chosen from [max(0, 1 - brightness), 1 + brightness].
- Inputs:
data: input tensor with (H x W x C) shape.
- Outputs:
out: output tensor with same shape as data.
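The interval [max(0, 1 - brightness), 1 + brightness] can be made concrete with a short sketch. It assumes the factor is drawn uniformly from that interval, which the text above does not state explicitly; `sample_jitter_factor` is a hypothetical helper.

```python
import random


def sample_jitter_factor(strength, rng=random):
    """Draw a factor from [max(0, 1 - strength), 1 + strength] (uniform is assumed)."""
    low = max(0.0, 1.0 - strength)
    high = 1.0 + strength
    return rng.uniform(low, high)


rng = random.Random(0)
strength = 0.3
factors = [sample_jitter_factor(strength, rng) for _ in range(5)]
low, high = max(0.0, 1.0 - strength), 1.0 + strength
print(all(low <= f <= high for f in factors))  # True
```

The max(0, ...) clamp matters once the strength exceeds 1: a value of 1.5 yields the interval [0, 2.5] instead of a negative lower bound.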
class mxnet.gluon.data.vision.transforms.RandomColorJitter(brightness=0, contrast=0, saturation=0, hue=0)
Bases: mxnet.gluon.block.HybridBlock
Randomly jitters the brightness, contrast, saturation, and hue of an image.
- Parameters
brightness (float) – How much to jitter brightness. brightness factor is randomly chosen from [max(0, 1 - brightness), 1 + brightness].
contrast (float) – How much to jitter contrast. contrast factor is randomly chosen from [max(0, 1 - contrast), 1 + contrast].
saturation (float) – How much to jitter saturation. saturation factor is randomly chosen from [max(0, 1 - saturation), 1 + saturation].
hue (float) – How much to jitter hue. hue factor is randomly chosen from [max(0, 1 - hue), 1 + hue].
- Inputs:
data: input tensor with (H x W x C) shape.
- Outputs:
out: output tensor with same shape as data.
class mxnet.gluon.data.vision.transforms.RandomContrast(contrast)
Bases: mxnet.gluon.block.HybridBlock
Randomly jitters image contrast with a factor chosen from [max(0, 1 - contrast), 1 + contrast].
- Parameters
contrast (float) – How much to jitter contrast. contrast factor is randomly chosen from [max(0, 1 - contrast), 1 + contrast].
- Inputs:
data: input tensor with (H x W x C) shape.
- Outputs:
out: output tensor with same shape as data.
class mxnet.gluon.data.vision.transforms.RandomFlipLeftRight
Bases: mxnet.gluon.block.HybridBlock
Randomly flip the input image left to right with a probability of 0.5.
- Inputs:
data: input tensor with (H x W x C) shape.
- Outputs:
out: output tensor with same shape as data.
class mxnet.gluon.data.vision.transforms.RandomFlipTopBottom
Bases: mxnet.gluon.block.HybridBlock
Randomly flip the input image top to bottom with a probability of 0.5.
- Inputs:
data: input tensor with (H x W x C) shape.
- Outputs:
out: output tensor with same shape as data.
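Both flips act on an (H x W x C) array and differ only in the axis they reverse. The NumPy sketch below illustrates this; `random_flip` is a hypothetical helper, and p is exposed as a parameter only so the example is deterministic.

```python
import random

import numpy as np


def random_flip(image, axis, p=0.5, rng=random):
    """Reverse an (H, W, C) array along axis with probability p."""
    if rng.random() < p:
        return np.flip(image, axis=axis)
    return image


image = np.arange(6).reshape(2, 3, 1)  # H=2, W=3, C=1

# axis=1 is the width axis (left-right); axis=0 is the height axis (top-bottom).
left_right = random_flip(image, axis=1, p=1.0)
top_bottom = random_flip(image, axis=0, p=1.0)
print(left_right[..., 0].tolist())  # [[2, 1, 0], [5, 4, 3]]
print(top_bottom[..., 0].tolist())  # [[3, 4, 5], [0, 1, 2]]
```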
class mxnet.gluon.data.vision.transforms.RandomHue(hue)
Bases: mxnet.gluon.block.HybridBlock
Randomly jitters image hue with a factor chosen from [max(0, 1 - hue), 1 + hue].
- Parameters
hue (float) – How much to jitter hue. hue factor is randomly chosen from [max(0, 1 - hue), 1 + hue].
- Inputs:
data: input tensor with (H x W x C) shape.
- Outputs:
out: output tensor with same shape as data.
class mxnet.gluon.data.vision.transforms.RandomLighting(alpha)
Bases: mxnet.gluon.block.HybridBlock
Add AlexNet-style PCA-based noise to an image.
- Parameters
alpha (float) – Intensity of the image.
- Inputs:
data: input tensor with (H x W x C) shape.
- Outputs:
out: output tensor with same shape as data.
class mxnet.gluon.data.vision.transforms.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=1)
Bases: mxnet.gluon.block.Block
Crop the input image with random scale and aspect ratio.
Makes a crop of the original image with random size (default: 0.08 to 1.0 of the original image size) and random aspect ratio (default: 3/4 to 4/3), then resizes it to the specified size.
- Parameters
size (int or tuple of (W, H)) – Size of the final output.
scale (tuple of two floats) – If scale is (min_area, max_area), the cropped image’s area will range from min_area to max_area of the original image’s area.
ratio (tuple of two floats) – Range of aspect ratio of the cropped image before resizing.
interpolation (int) – Interpolation method for resizing. By default uses bilinear interpolation. See OpenCV’s resize function for available choices.
- Inputs:
data: input tensor with (Hi x Wi x C) shape.
- Outputs:
out: output tensor with (H x W x C) shape.
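How scale and ratio jointly determine the crop window can be sketched as follows. This is an illustration under stated assumptions, not the MXNet algorithm: the area fraction and aspect ratio are drawn uniformly, the retry count and the whole-image fallback are choices of this sketch, and `sample_crop_box` is a hypothetical helper. The final resize to size is omitted.

```python
import math
import random


def sample_crop_box(src_w, src_h, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3),
                    rng=random, max_tries=10):
    """Sample (x0, y0, w, h) of a crop window that fits inside the image."""
    src_area = src_w * src_h
    for _ in range(max_tries):
        target_area = rng.uniform(*scale) * src_area
        aspect = rng.uniform(*ratio)  # width / height of the crop
        # Solve w * h = target_area and w / h = aspect for w and h.
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= src_w and 0 < h <= src_h:
            x0 = rng.randint(0, src_w - w)
            y0 = rng.randint(0, src_h - h)
            return x0, y0, w, h
    # Fallback assumed by this sketch: use the whole image.
    return 0, 0, src_w, src_h


x0, y0, w, h = sample_crop_box(640, 480, rng=random.Random(0))
print(x0 + w <= 640 and y0 + h <= 480)  # True
```

Retrying is needed because a large area combined with an extreme aspect ratio can produce a window wider or taller than the source image.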
class mxnet.gluon.data.vision.transforms.RandomRotation(angle_limits, zoom_in=False, zoom_out=False, rotate_with_proba=1.0)
Bases: mxnet.gluon.block.Block
Randomly rotate the input image by a random angle. Keeps the original image shape and aspect ratio.
- Parameters
angle_limits (tuple) – Tuple of 2 elements containing the upper and lower limits for rotation angles in degrees.
zoom_in (bool) – Zoom in image so that no padding is present in final output.
zoom_out (bool) – Zoom out image so that the entire original image is present in final output.
rotate_with_proba (float32) – Probability of rotating the image.
- Inputs:
data: input tensor with (C x H x W) or (N x C x H x W) shape.
- Outputs:
out: output tensor with (C x H x W) or (N x C x H x W) shape.
class mxnet.gluon.data.vision.transforms.RandomSaturation(saturation)
Bases: mxnet.gluon.block.HybridBlock
Randomly jitters image saturation with a factor chosen from [max(0, 1 - saturation), 1 + saturation].
- Parameters
saturation (float) – How much to jitter saturation. saturation factor is randomly chosen from [max(0, 1 - saturation), 1 + saturation].
- Inputs:
data: input tensor with (H x W x C) shape.
- Outputs:
out: output tensor with same shape as data.
class mxnet.gluon.data.vision.transforms.Resize(size, keep_ratio=False, interpolation=1)
Bases: mxnet.gluon.block.HybridBlock
Resize an image or a batch of image NDArray to the given size. Should be applied before mxnet.gluon.data.vision.transforms.ToTensor.
- Parameters
size (int or tuple of (W, H)) – Size of output image.
keep_ratio (bool) – Whether to resize the short edge or both edges to size, if size is given as an integer.
interpolation (int) – Interpolation method for resizing. By default uses bilinear interpolation. See OpenCV’s resize function for available choices. Note that Resize on GPU uses the contrib.bilinearResize2D operator, which only supports bilinear interpolation (1).
- Inputs:
data: input tensor with (H x W x C) or (N x H x W x C) shape.
- Outputs:
out: output tensor with (H x W x C) or (N x H x W x C) shape.
Examples
>>> transformer = vision.transforms.Resize(size=(1000, 500))
>>> image = mx.nd.random.uniform(0, 255, (224, 224, 3)).astype(dtype=np.uint8)
>>> transformer(image)
<NDArray 500x1000x3 @cpu(0)>
>>> image = mx.nd.random.uniform(0, 255, (3, 224, 224, 3)).astype(dtype=np.uint8)
>>> transformer(image)
<NDArray 3x500x1000x3 @cpu(0)>
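The effect of keep_ratio on an integer size can be sketched as a pure size computation. This is an assumption-laden illustration: `resize_output_size` is a hypothetical helper, and it assumes the long edge is scaled proportionally and truncated to an integer when keep_ratio is true.

```python
def resize_output_size(src_w, src_h, size, keep_ratio=False):
    """Return the (W, H) a resize would target for a given source size."""
    if not isinstance(size, int):
        out_w, out_h = size  # an explicit (W, H) tuple is used as is
        return out_w, out_h
    if not keep_ratio:
        return size, size  # both edges are resized to size
    if src_h > src_w:
        # Width is the short edge; scale the height by the same factor.
        return size, int(src_h * size / src_w)
    # Height is the short edge (or the image is square).
    return int(src_w * size / src_h), size


print(resize_output_size(640, 480, 240, keep_ratio=True))   # (320, 240)
print(resize_output_size(640, 480, 240, keep_ratio=False))  # (240, 240)
print(resize_output_size(640, 480, (1000, 500)))            # (1000, 500)
```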
class mxnet.gluon.data.vision.transforms.Rotate(rotation_degrees, zoom_in=False, zoom_out=False)
Bases: mxnet.gluon.block.Block
Rotate the input image by a given angle. Keeps the original image shape.
- Parameters
rotation_degrees (float32) – Desired rotation angle in degrees.
zoom_in (bool) – Zoom in image so that no padding is present in final output.
zoom_out (bool) – Zoom out image so that the entire original image is present in final output.
- Inputs:
data: input tensor with (C x H x W) or (N x C x H x W) shape.
- Outputs:
out: output tensor with (C x H x W) or (N x C x H x W) shape.
class mxnet.gluon.data.vision.transforms.ToTensor
Bases: mxnet.gluon.block.HybridBlock
Converts an image NDArray or batch of image NDArray to a tensor NDArray.
Converts an image NDArray of shape (H x W x C) in the range [0, 255] to a float32 tensor NDArray of shape (C x H x W) in the range [0, 1].
If batch input, converts a batch image NDArray of shape (N x H x W x C) in the range [0, 255] to a float32 tensor NDArray of shape (N x C x H x W).
- Inputs:
data: input tensor with (H x W x C) or (N x H x W x C) shape and uint8 type.
- Outputs:
out: output tensor with (C x H x W) or (N x C x H x W) shape and float32 type.
Examples
>>> transformer = vision.transforms.ToTensor()
>>> image = mx.nd.random.uniform(0, 255, (4, 2, 3)).astype(dtype=np.uint8)
>>> transformer(image)
[[[ 0.85490197 0.72156864]
  [ 0.09019608 0.74117649]
  [ 0.61960787 0.92941177]
  [ 0.96470588 0.1882353 ]]
 [[ 0.6156863 0.73725492]
  [ 0.46666667 0.98039216]
  [ 0.44705883 0.45490196]
  [ 0.01960784 0.8509804 ]]
 [[ 0.39607844 0.03137255]
  [ 0.72156864 0.52941179]
  [ 0.16470589 0.7647059 ]
  [ 0.05490196 0.70588237]]]
<NDArray 3x4x2 @cpu(0)>
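The layout and range change can be mirrored in NumPy: move the channel axis in front of the spatial axes and divide by 255. This is a sketch of the described conversion, not the MXNet operator; `to_tensor` is a hypothetical helper.

```python
import numpy as np


def to_tensor(image):
    """Convert (H, W, C) or (N, H, W, C) uint8 data to float32 channel-first data."""
    image = np.asarray(image)
    if image.ndim == 3:
        axes = (2, 0, 1)        # HWC -> CHW
    elif image.ndim == 4:
        axes = (0, 3, 1, 2)     # NHWC -> NCHW
    else:
        raise ValueError("expected a 3-D image or a 4-D batch")
    # Scale [0, 255] to [0, 1] after casting to float32.
    return image.transpose(axes).astype(np.float32) / 255.0


image = np.full((4, 2, 3), 255, dtype=np.uint8)
tensor = to_tensor(image)
print(tensor.shape, tensor.dtype)  # (3, 4, 2) float32
```

Because the output is channel-first, transforms that expect (H x W x C) input, such as Resize, belong before this step in a pipeline.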