vision.transforms
Gluon provides pre-defined vision transformation and data augmentation functions in the mxnet.gluon.data.vision.transforms module.
Compose – Sequentially composes multiple transforms.
Cast – Cast input to a specific data type.
ToTensor – Converts an image NDArray or batch of image NDArray to a tensor NDArray.
Normalize – Normalize a tensor of shape (C x H x W) or (N x C x H x W) with mean and standard deviation.
RandomResizedCrop – Crop the input image with random scale and aspect ratio.
CenterCrop – Crops the image src to the given size by trimming on all four sides and preserving the center of the image.
Resize – Resize an image or a batch of image NDArray to the given size.
RandomFlipLeftRight – Randomly flip the input image left to right with a probability of 0.5.
RandomFlipTopBottom – Randomly flip the input image top to bottom with a probability of 0.5.
RandomBrightness – Randomly jitters image brightness with a factor chosen from [max(0, 1 - brightness), 1 + brightness].
RandomContrast – Randomly jitters image contrast with a factor chosen from [max(0, 1 - contrast), 1 + contrast].
RandomSaturation – Randomly jitters image saturation with a factor chosen from [max(0, 1 - saturation), 1 + saturation].
RandomHue – Randomly jitters image hue with a factor chosen from [max(0, 1 - hue), 1 + hue].
RandomColorJitter – Randomly jitters the brightness, contrast, saturation, and hue of an image.
RandomLighting – Add AlexNet-style PCA-based noise to an image.
API Reference
Image transforms.
class mxnet.gluon.data.vision.transforms.Cast(dtype='float32')
Bases: mxnet.gluon.block.HybridBlock
Cast input to a specific data type.
- Parameters
dtype (str, default 'float32') – The target data type, in string or numpy.dtype.
- Inputs:
data: input tensor with arbitrary shape and dtype.
- Outputs:
out: output tensor with the same shape as data and data type as dtype.
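In terms of plain arrays, Cast amounts to a dtype conversion; a NumPy sketch of the equivalent operation (an illustration only, not the mxnet implementation):

```python
import numpy as np

# A uint8 image in (H x W x C) layout, as produced by most image decoders.
image = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

# Cast(dtype='float32') amounts to a dtype conversion: values are
# preserved numerically, only the storage type changes.
out = image.astype(np.float32)
```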
class mxnet.gluon.data.vision.transforms.CenterCrop(size, interpolation=1)
Bases: mxnet.gluon.block.Block
Crops the image src to the given size by trimming on all four sides and preserving the center of the image. Upsamples if src is smaller than size.
- Parameters
size (int or tuple of (W, H)) – Size of output image.
interpolation (int) – Interpolation method for resizing. By default uses bilinear interpolation. See OpenCV’s resize function for available choices.
- Inputs:
data: input tensor with (Hi x Wi x C) shape.
- Outputs:
out: output tensor with (H x W x C) shape.
Examples
>>> transformer = vision.transforms.CenterCrop(size=(1000, 500))
>>> image = mx.nd.random.uniform(0, 255, (2321, 3482, 3)).astype(dtype=np.uint8)
>>> transformer(image)
<NDArray 500x1000x3 @cpu(0)>
class mxnet.gluon.data.vision.transforms.Compose(transforms)
Bases: mxnet.gluon.nn.basic_layers.Sequential
Sequentially composes multiple transforms.
- Parameters
transforms (list of transform Blocks.) – The list of transforms to be composed.
- Inputs:
data: input tensor with the shape that the first transform Block requires.
- Outputs:
out: output tensor with the shape that the last transform Block produces.
Examples
>>> transformer = transforms.Compose([transforms.Resize(300),
...                                   transforms.CenterCrop(256),
...                                   transforms.ToTensor()])
>>> image = mx.nd.random.uniform(0, 255, (224, 224, 3)).astype(dtype=np.uint8)
>>> transformer(image)
<NDArray 3x256x256 @cpu(0)>
class mxnet.gluon.data.vision.transforms.CropResize(x, y, width, height, size=None, interpolation=None)
Bases: mxnet.gluon.block.HybridBlock
Crops the input image and optionally resizes it.
Makes a crop of the original image, then optionally resizes it to the specified size.
- Parameters
x (int) – Left boundary of the cropping area
y (int) – Top boundary of the cropping area
width (int) – Width of the cropping area
height (int) – Height of the cropping area
size (int or tuple of (w, h)) – Optional, resize to new size after cropping
interpolation (int, optional) – Interpolation method for resizing. By default uses bilinear interpolation. See OpenCV's resize function for available choices: https://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html?highlight=resize#resize. Note that resizing on GPU uses the contrib.BilinearResize2D operator, which only supports bilinear interpolation (1), so results on GPU may differ slightly from CPU: OpenCV tends to align centers, while BilinearResize2D aligns corners.
- Inputs:
data: input tensor with (H x W x C) or (N x H x W x C) shape.
- Outputs:
out: output tensor with (H x W x C) or (N x H x W x C) shape.
Examples
>>> transformer = vision.transforms.CropResize(x=0, y=0, width=100, height=100)
>>> image = mx.nd.random.uniform(0, 255, (224, 224, 3)).astype(dtype=np.uint8)
>>> transformer(image)
<NDArray 100x100x3 @cpu(0)>
>>> image = mx.nd.random.uniform(0, 255, (3, 224, 224, 3)).astype(dtype=np.uint8)
>>> transformer(image)
<NDArray 3x100x100x3 @cpu(0)>
>>> transformer = vision.transforms.CropResize(x=0, y=0, width=100, height=100, size=(50, 50), interpolation=1)
>>> transformer(image)
<NDArray 3x50x50x3 @cpu(0)>
class mxnet.gluon.data.vision.transforms.Normalize(mean=0.0, std=1.0)
Bases: mxnet.gluon.block.HybridBlock
Normalize a tensor of shape (C x H x W) or (N x C x H x W) with mean and standard deviation.
Given mean (m1, …, mn) and std (s1, …, sn) for n channels, this transform normalizes each channel of the input tensor with:
output[i] = (input[i] - mi) / si
If mean or std is a scalar, the same value is applied to all channels.
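The per-channel formula above can be sketched in NumPy with broadcasting (an illustration only; Normalize itself operates on NDArrays):

```python
import numpy as np

# One mean/std per channel, reshaped to broadcast over a (C x H x W) tensor.
mean = np.array([0.0, 1.0, 2.0]).reshape(3, 1, 1)
std = np.array([3.0, 2.0, 1.0]).reshape(3, 1, 1)

data = np.random.uniform(0, 1, size=(3, 4, 2)).astype(np.float32)

# output[i] = (input[i] - mean[i]) / std[i] for each channel i
out = (data - mean) / std
```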
- Parameters
mean (float or tuple of floats) – The mean values.
std (float or tuple of floats) – The standard deviation values.
- Inputs:
data: input tensor with (C x H x W) or (N x C x H x W) shape.
- Outputs:
out: output tensor with the same shape as data.
Examples
>>> transformer = transforms.Normalize(mean=(0, 1, 2), std=(3, 2, 1))
>>> image = mx.nd.random.uniform(0, 1, (3, 4, 2))
>>> transformer(image)
[[[ 0.18293785  0.19761486]
  [ 0.23839645  0.28142193]
  [ 0.20092112  0.28598186]
  [ 0.18162774  0.28241724]]
 [[-0.2881726  -0.18821815]
  [-0.17705294 -0.30780914]
  [-0.2812064  -0.3512327 ]
  [-0.05411351 -0.4716435 ]]
 [[-1.0363373  -1.7273437 ]
  [-1.6165586  -1.5223348 ]
  [-1.208275   -1.1878313 ]
  [-1.4711051  -1.5200229 ]]]
<NDArray 3x4x2 @cpu(0)>
class mxnet.gluon.data.vision.transforms.RandomBrightness(brightness)
Bases: mxnet.gluon.block.HybridBlock
Randomly jitters image brightness with a factor chosen from [max(0, 1 - brightness), 1 + brightness].
- Parameters
brightness (float) – How much to jitter brightness. brightness factor is randomly chosen from [max(0, 1 - brightness), 1 + brightness].
- Inputs:
data: input tensor with (H x W x C) shape.
- Outputs:
out: output tensor with same shape as data.
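Brightness jitter can be sketched in NumPy as sampling one scale factor and multiplying every pixel by it (an illustration only, not the mxnet implementation):

```python
import numpy as np

rng = np.random.default_rng()
brightness = 0.5
image = rng.uniform(0, 255, size=(4, 4, 3)).astype(np.float32)

# Sample the jitter factor from [max(0, 1 - brightness), 1 + brightness] ...
low, high = max(0.0, 1.0 - brightness), 1.0 + brightness
factor = rng.uniform(low, high)

# ... and scale every pixel by it.
out = image * factor
```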
class mxnet.gluon.data.vision.transforms.RandomColorJitter(brightness=0, contrast=0, saturation=0, hue=0)
Bases: mxnet.gluon.block.HybridBlock
Randomly jitters the brightness, contrast, saturation, and hue of an image.
- Parameters
brightness (float) – How much to jitter brightness. brightness factor is randomly chosen from [max(0, 1 - brightness), 1 + brightness].
contrast (float) – How much to jitter contrast. contrast factor is randomly chosen from [max(0, 1 - contrast), 1 + contrast].
saturation (float) – How much to jitter saturation. saturation factor is randomly chosen from [max(0, 1 - saturation), 1 + saturation].
hue (float) – How much to jitter hue. hue factor is randomly chosen from [max(0, 1 - hue), 1 + hue].
- Inputs:
data: input tensor with (H x W x C) shape.
- Outputs:
out: output tensor with same shape as data.
class mxnet.gluon.data.vision.transforms.RandomContrast(contrast)
Bases: mxnet.gluon.block.HybridBlock
Randomly jitters image contrast with a factor chosen from [max(0, 1 - contrast), 1 + contrast].
- Parameters
contrast (float) – How much to jitter contrast. contrast factor is randomly chosen from [max(0, 1 - contrast), 1 + contrast].
- Inputs:
data: input tensor with (H x W x C) shape.
- Outputs:
out: output tensor with same shape as data.
class mxnet.gluon.data.vision.transforms.RandomFlipLeftRight
Bases: mxnet.gluon.block.HybridBlock
Randomly flip the input image left to right with a probability of 0.5.
- Inputs:
data: input tensor with (H x W x C) shape.
- Outputs:
out: output tensor with same shape as data.
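Conceptually, the flip reverses the width axis with probability 0.5; a NumPy sketch of the equivalent operation (not the mxnet implementation):

```python
import numpy as np

rng = np.random.default_rng()
image = np.arange(2 * 3 * 1).reshape(2, 3, 1)  # tiny (H x W x C) image

# With probability 0.5 reverse the W axis (axis 1 in HWC layout),
# otherwise return the image unchanged.
if rng.random() < 0.5:
    out = image[:, ::-1, :]
else:
    out = image
```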
class mxnet.gluon.data.vision.transforms.RandomFlipTopBottom
Bases: mxnet.gluon.block.HybridBlock
Randomly flip the input image top to bottom with a probability of 0.5.
- Inputs:
data: input tensor with (H x W x C) shape.
- Outputs:
out: output tensor with same shape as data.
class mxnet.gluon.data.vision.transforms.RandomHue(hue)
Bases: mxnet.gluon.block.HybridBlock
Randomly jitters image hue with a factor chosen from [max(0, 1 - hue), 1 + hue].
- Parameters
hue (float) – How much to jitter hue. hue factor is randomly chosen from [max(0, 1 - hue), 1 + hue].
- Inputs:
data: input tensor with (H x W x C) shape.
- Outputs:
out: output tensor with same shape as data.
class mxnet.gluon.data.vision.transforms.RandomLighting(alpha)
Bases: mxnet.gluon.block.HybridBlock
Add AlexNet-style PCA-based noise to an image.
- Parameters
alpha (float) – Intensity (standard deviation) of the PCA-based noise added to the image.
- Inputs:
data: input tensor with (H x W x C) shape.
- Outputs:
out: output tensor with same shape as data.
class mxnet.gluon.data.vision.transforms.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=1)
Bases: mxnet.gluon.block.Block
Crop the input image with random scale and aspect ratio.
Makes a crop of the original image with random size (default: 0.08 to 1.0 of the original image size) and random aspect ratio (default: 3/4 to 4/3), then resizes it to the specified size.
- Parameters
size (int or tuple of (W, H)) – Size of the final output.
scale (tuple of two floats) – If scale is (min_area, max_area), the cropped image’s area will range from min_area to max_area of the original image’s area
ratio (tuple of two floats) – Range of aspect ratio of the cropped image before resizing.
interpolation (int) – Interpolation method for resizing. By default uses bilinear interpolation. See OpenCV’s resize function for available choices.
- Inputs:
data: input tensor with (Hi x Wi x C) shape.
- Outputs:
out: output tensor with (H x W x C) shape.
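How the scale and ratio parameters interact can be sketched in NumPy. This illustrates one plausible way to sample a crop box under the default parameters; it is not mxnet's exact algorithm, and the variable names are made up for the sketch:

```python
import math
import numpy as np

rng = np.random.default_rng()
H, W = 480, 640                 # source image height and width
scale = (0.08, 1.0)             # crop area as a fraction of the source area
ratio = (3 / 4, 4 / 3)          # aspect ratio (width / height) of the crop

# Sample a target area and aspect ratio, then derive the crop box size.
area = rng.uniform(*scale) * (H * W)
aspect = rng.uniform(*ratio)
crop_w = min(int(round(math.sqrt(area * aspect))), W)
crop_h = min(int(round(math.sqrt(area / aspect))), H)

# Place the box uniformly at random inside the image.
x = rng.integers(0, W - crop_w + 1)
y = rng.integers(0, H - crop_h + 1)
# The (x, y, crop_w, crop_h) box is cut out and resized to `size`.
```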
class mxnet.gluon.data.vision.transforms.RandomSaturation(saturation)
Bases: mxnet.gluon.block.HybridBlock
Randomly jitters image saturation with a factor chosen from [max(0, 1 - saturation), 1 + saturation].
- Parameters
saturation (float) – How much to jitter saturation. saturation factor is randomly chosen from [max(0, 1 - saturation), 1 + saturation].
- Inputs:
data: input tensor with (H x W x C) shape.
- Outputs:
out: output tensor with same shape as data.
class mxnet.gluon.data.vision.transforms.Resize(size, keep_ratio=False, interpolation=1)
Bases: mxnet.gluon.block.HybridBlock
Resize an image or a batch of image NDArray to the given size. Should be applied before mxnet.gluon.data.vision.transforms.ToTensor.
- Parameters
size (int or tuple of (W, H)) – Size of output image.
keep_ratio (bool) – Whether to resize the short edge or both edges to size, if size is given as an integer.
interpolation (int) – Interpolation method for resizing. By default uses bilinear interpolation. See OpenCV's resize function for available choices. Note that Resize on GPU uses the contrib.BilinearResize2D operator, which only supports bilinear interpolation (1), so results on GPU may differ slightly from CPU: OpenCV tends to align centers, while BilinearResize2D aligns corners.
- Inputs:
data: input tensor with (H x W x C) or (N x H x W x C) shape.
- Outputs:
out: output tensor with (H x W x C) or (N x H x W x C) shape.
Examples
>>> transformer = vision.transforms.Resize(size=(1000, 500))
>>> image = mx.nd.random.uniform(0, 255, (224, 224, 3)).astype(dtype=np.uint8)
>>> transformer(image)
<NDArray 500x1000x3 @cpu(0)>
>>> image = mx.nd.random.uniform(0, 255, (3, 224, 224, 3)).astype(dtype=np.uint8)
>>> transformer(image)
<NDArray 3x500x1000x3 @cpu(0)>
class mxnet.gluon.data.vision.transforms.ToTensor
Bases: mxnet.gluon.block.HybridBlock
Converts an image NDArray or batch of image NDArray to a tensor NDArray.
Converts an image NDArray of shape (H x W x C) in the range [0, 255] to a float32 tensor NDArray of shape (C x H x W) in the range [0, 1].
If batch input, converts a batch image NDArray of shape (N x H x W x C) in the range [0, 255] to a float32 tensor NDArray of shape (N x C x H x W).
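The conversion can be sketched in NumPy as a transpose plus a rescale (an illustration only, not the mxnet implementation):

```python
import numpy as np

# An (H x W x C) uint8 image in [0, 255], as ToTensor expects.
image = np.random.randint(0, 256, size=(4, 2, 3), dtype=np.uint8)

# (H x W x C) uint8 [0, 255]  ->  (C x H x W) float32 [0, 1]
out = image.transpose(2, 0, 1).astype(np.float32) / 255.0
```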
- Inputs:
data: input tensor with (H x W x C) or (N x H x W x C) shape and uint8 type.
- Outputs:
out: output tensor with (C x H x W) or (N x C x H x W) shape and float32 type.
Examples
>>> transformer = vision.transforms.ToTensor()
>>> image = mx.nd.random.uniform(0, 255, (4, 2, 3)).astype(dtype=np.uint8)
>>> transformer(image)
[[[ 0.85490197  0.72156864]
  [ 0.09019608  0.74117649]
  [ 0.61960787  0.92941177]
  [ 0.96470588  0.1882353 ]]
 [[ 0.6156863   0.73725492]
  [ 0.46666667  0.98039216]
  [ 0.44705883  0.45490196]
  [ 0.01960784  0.8509804 ]]
 [[ 0.39607844  0.03137255]
  [ 0.72156864  0.52941179]
  [ 0.16470589  0.7647059 ]
  [ 0.05490196  0.70588237]]]
<NDArray 3x4x2 @cpu(0)>