mxnet.image

Note

This API is best used in conjunction with mxnet.io data iterators. For augmentation and transforms in Gluon with Datasets and DataLoaders, see mxnet.gluon.data.

Image Iterators and image augmentation functions

Classes

Augmenter(**kwargs)

Image Augmenter base class

BrightnessJitterAug(brightness)

Random brightness jitter augmentation.

CastAug([typ])

Cast to the given type (float32 by default).

CenterCropAug(size[, interp])

Make center crop augmenter.

ColorJitterAug(brightness, contrast, saturation)

Apply random brightness, contrast and saturation jitter in random order.

ColorNormalizeAug(mean, std)

Mean and std normalization.

ContrastJitterAug(contrast)

Random contrast jitter augmentation.

DetAugmenter(**kwargs)

Detection base augmenter

DetBorrowAug(augmenter)

Borrow standard augmenter from image classification.

DetHorizontalFlipAug(p)

Random horizontal flipping.

DetRandomCropAug([min_object_covered, …])

Random cropping with constraints

DetRandomPadAug([aspect_ratio_range, …])

Random padding augmenter.

DetRandomSelectAug(aug_list[, skip_prob])

Randomly select one augmenter to apply, with chance to skip all.

ForceResizeAug(size[, interp])

Force resize to size regardless of aspect ratio

HorizontalFlipAug(p)

Random horizontal flip.

HueJitterAug(hue)

Random hue jitter augmentation.

ImageDetIter(batch_size, data_shape[, …])

Image iterator with a large number of augmentation choices for detection.

ImageIter(batch_size, data_shape[, …])

Image data iterator with a large number of augmentation choices.

LightingAug(alphastd, eigval, eigvec)

Add PCA based noise.

Number

All numbers inherit from this class.

RandomCropAug(size[, interp])

Make random crop augmenter

RandomGrayAug(p)

Randomly convert to gray image.

RandomOrderAug(ts)

Apply list of augmenters in random order

RandomSizedCropAug(size, area, ratio[, interp])

Make random crop with random resizing and random aspect ratio jitter augmenter.

ResizeAug(size[, interp])

Make resize shorter edge to size augmenter.

SaturationJitterAug(saturation)

Random saturation jitter augmentation.

SequentialAug(ts)

Composing a sequential augmenter list.

Functions

CreateAugmenter(data_shape[, resize, …])

Creates an augmenter list.

CreateDetAugmenter(data_shape[, resize, …])

Create augmenters for detection.

CreateMultiRandCropAugmenter([…])

Helper function to create multiple random crop augmenters.

center_crop(src, size[, interp])

Crops the image src to the given size by trimming on all four sides and preserving the center of the image.

color_normalize(src, mean[, std])

Normalize src with mean and std.

copyMakeBorder([src, top, bot, left, right, …])

Pad image border with OpenCV.

fixed_crop(src, x0, y0, w, h[, size, interp])

Crop src at fixed location, and (optionally) resize it to size.

imdecode(buf, *args, **kwargs)

Decode an image to an NDArray.

imread(filename, *args, **kwargs)

Read and decode an image to an NDArray.

imresize(src, w, h, *args, **kwargs)

Resize image with OpenCV.

imrotate(src, rotation_degrees[, zoom_in, …])

Rotates the input image(s) by a specific rotation angle in degrees.

is_np_array()

Checks whether the NumPy-array semantics is currently turned on.

random_crop(src, size[, interp])

Randomly crop src with size (width, height).

random_rotate(src, angle_limits[, zoom_in, …])

Randomly rotates src by an angle within angle_limits.

random_size_crop(src, size, area, ratio[, …])

Randomly crop src with size.

resize_short(src, size[, interp])

Resizes shorter edge to size.

scale_down(src_size, size)

Scales down crop size if it’s larger than image size.

class mxnet.image.Augmenter(**kwargs)[source]

Bases: object

Image Augmenter base class

Methods

dumps()

Saves the Augmenter to string

dumps()[source]

Saves the Augmenter to string

Returns

JSON formatted string that describes the Augmenter.

Return type

str

class mxnet.image.BrightnessJitterAug(brightness)[source]

Bases: mxnet.image.image.Augmenter

Random brightness jitter augmentation.

Parameters

brightness (float) – The brightness jitter ratio range, [0, 1]
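
One way to read the "jitter ratio range" is as the half-width of a uniform scaling factor centred on 1. The sketch below assumes that reading: every pixel is multiplied by a single factor alpha drawn from [1 - brightness, 1 + brightness]. The helper name jitter_brightness is hypothetical, and it works on a flat list of values rather than an NDArray.

```python
import random

def jitter_brightness(pixels, brightness, rng=random):
    """Scale all pixel values by one random factor in [1 - brightness, 1 + brightness].

    Assumption: this follows the common brightness-jitter recipe; it is an
    illustration, not a copy of the library code.
    """
    alpha = 1.0 + rng.uniform(-brightness, brightness)
    return [p * alpha for p in pixels], alpha

rng = random.Random(0)  # seeded so repeated runs draw the same factor
out, alpha = jitter_brightness([10.0, 20.0, 30.0], brightness=0.125, rng=rng)
print(alpha, out)
```

With brightness=0 the factor is always 1, so the input comes back unchanged.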

class mxnet.image.CastAug(typ='float32')[source]

Bases: mxnet.image.image.Augmenter

Cast to the given type (float32 by default).

class mxnet.image.CenterCropAug(size, interp=2)[source]

Bases: mxnet.image.image.Augmenter

Make center crop augmenter.

Parameters
  • size (list or tuple of int) – The desired output image size.

  • interp (int, optional, default=2) – Interpolation method. See resize_short for details.

class mxnet.image.ColorJitterAug(brightness, contrast, saturation)[source]

Bases: mxnet.image.image.RandomOrderAug

Apply random brightness, contrast and saturation jitter in random order.

Parameters
  • brightness (float) – The brightness jitter ratio range, [0, 1]

  • contrast (float) – The contrast jitter ratio range, [0, 1]

  • saturation (float) – The saturation jitter ratio range, [0, 1]

class mxnet.image.ColorNormalizeAug(mean, std)[source]

Bases: mxnet.image.image.Augmenter

Mean and std normalization.

Parameters
  • mean (NDArray) – RGB mean to be subtracted

  • std (NDArray) – RGB standard deviation to be divided

class mxnet.image.ContrastJitterAug(contrast)[source]

Bases: mxnet.image.image.Augmenter

Random contrast jitter augmentation.

Parameters

contrast (float) – The contrast jitter ratio range, [0, 1]

mxnet.image.CreateAugmenter(data_shape, resize=0, rand_crop=False, rand_resize=False, rand_mirror=False, mean=None, std=None, brightness=0, contrast=0, saturation=0, hue=0, pca_noise=0, rand_gray=0, inter_method=2)[source]

Creates an augmenter list.

Parameters
  • data_shape (tuple of int) – Shape for output data

  • resize (int) – Resize shorter edge if larger than 0 at the beginning

  • rand_crop (bool) – Whether to enable random cropping instead of center crop

  • rand_resize (bool) – Whether to enable random sized cropping; requires rand_crop to be enabled

  • rand_gray (float) – [0, 1], probability to convert to grayscale for all channels, the number of channels will not be reduced to 1

  • rand_mirror (bool) – Whether to apply horizontal flip to image with probability 0.5

  • mean (np.ndarray or None) – Mean pixel values for [r, g, b]

  • std (np.ndarray or None) – Standard deviations for [r, g, b]

  • brightness (float) – Brightness jittering range (percent)

  • contrast (float) – Contrast jittering range (percent)

  • saturation (float) – Saturation jittering range (percent)

  • hue (float) – Hue jittering range (percent)

  • pca_noise (float) – PCA noise level (percent)

  • inter_method (int, default=2) –

    Interpolation method for all resizing operations.

    Possible values:

    0: Nearest Neighbors interpolation.
    1: Bilinear interpolation.
    2: Bicubic interpolation over 4x4 pixel neighborhood (the default value).
    3: Area-based (resampling using pixel area relation). It may be a preferred method for image decimation, as it gives moire-free results. But when the image is zoomed, it is similar to the Nearest Neighbors method.
    4: Lanczos interpolation over 8x8 pixel neighborhood.
    9: Cubic for enlarge, area for shrink, bilinear for others.
    10: Random selection from the interpolation methods mentioned above.

    Note: When shrinking an image, it will generally look best with AREA-based interpolation, whereas, when enlarging an image, it will generally look best with Bicubic (slow) or Bilinear (faster but still looks OK).
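
The two special codes, 9 and 10, are easiest to follow as a small dispatch. The sketch below is written from the list of possible values above and is not taken from the library source: pick_interp is a hypothetical helper, sizes are (height, width) pairs, and falling back to bicubic when no sizes are supplied is an assumption.

```python
import random

# Interpolation codes as numbered in the list above.
NEAREST, BILINEAR, BICUBIC, AREA, LANCZOS = 0, 1, 2, 3, 4

def pick_interp(inter_method, old_size=None, new_size=None, rng=random):
    """Resolve inter_method to a concrete code in 0..4.

    old_size and new_size are (height, width) pairs and only matter for
    method 9.
    """
    if inter_method == 9:
        if old_size is None or new_size is None:
            return BICUBIC  # assumption: default to cubic when sizes are unknown
        (oh, ow), (nh, nw) = old_size, new_size
        if nh > oh and nw > ow:
            return BICUBIC   # enlarging in both dimensions
        if nh < oh and nw < ow:
            return AREA      # shrinking in both dimensions
        return BILINEAR      # mixed or unchanged
    if inter_method == 10:
        return rng.randint(NEAREST, LANCZOS)  # uniform choice among 0..4
    if inter_method not in (NEAREST, BILINEAR, BICUBIC, AREA, LANCZOS):
        raise ValueError("unknown interpolation method: %r" % (inter_method,))
    return inter_method

print(pick_interp(9, old_size=(480, 640), new_size=(240, 320)))
```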

Examples

>>> import mxnet as mx
>>> # An example of creating multiple augmenters
>>> augs = mx.image.CreateAugmenter(data_shape=(3, 300, 300), rand_mirror=True,
...    mean=True, brightness=0.125, contrast=0.125, rand_gray=0.05,
...    saturation=0.125, pca_noise=0.05, inter_method=10)
>>> # dump the details
>>> for aug in augs:
...    aug.dumps()

mxnet.image.CreateDetAugmenter(data_shape, resize=0, rand_crop=0, rand_pad=0, rand_gray=0, rand_mirror=False, mean=None, std=None, brightness=0, contrast=0, saturation=0, pca_noise=0, hue=0, inter_method=2, min_object_covered=0.1, aspect_ratio_range=(0.75, 1.33), area_range=(0.05, 3.0), min_eject_coverage=0.3, max_attempts=50, pad_val=(127, 127, 127))[source]

Create augmenters for detection.

Parameters
  • data_shape (tuple of int) – Shape for output data

  • resize (int) – Resize shorter edge if larger than 0 at the beginning

  • rand_crop (float) – [0, 1], probability to apply random cropping

  • rand_pad (float) – [0, 1], probability to apply random padding

  • rand_gray (float) – [0, 1], probability to convert to grayscale for all channels

  • rand_mirror (bool) – Whether to apply horizontal flip to image with probability 0.5

  • mean (np.ndarray or None) – Mean pixel values for [r, g, b]

  • std (np.ndarray or None) – Standard deviations for [r, g, b]

  • brightness (float) – Brightness jittering range (percent)

  • contrast (float) – Contrast jittering range (percent)

  • saturation (float) – Saturation jittering range (percent)

  • hue (float) – Hue jittering range (percent)

  • pca_noise (float) – PCA noise level (percent)

  • inter_method (int, default=2) –

    Interpolation method for all resizing operations.

    Possible values:

    0: Nearest Neighbors interpolation.
    1: Bilinear interpolation.
    2: Bicubic interpolation over 4x4 pixel neighborhood (the default value).
    3: Area-based (resampling using pixel area relation). It may be a preferred method for image decimation, as it gives moire-free results. But when the image is zoomed, it is similar to the Nearest Neighbors method.
    4: Lanczos interpolation over 8x8 pixel neighborhood.
    9: Cubic for enlarge, area for shrink, bilinear for others.
    10: Random selection from the interpolation methods mentioned above.

    Note: When shrinking an image, it will generally look best with AREA-based interpolation, whereas, when enlarging an image, it will generally look best with Bicubic (slow) or Bilinear (faster but still looks OK).

  • min_object_covered (float) – The cropped area of the image must contain at least this fraction of any bounding box supplied. The value of this parameter should be non-negative. In the case of 0, the cropped area does not need to overlap any of the bounding boxes supplied.

  • min_eject_coverage (float) – The minimum coverage of cropped sample w.r.t its original size. With this constraint, objects that have marginal area after crop will be discarded.

  • aspect_ratio_range (tuple of floats) – The cropped area of the image must have an aspect ratio = width / height within this range.

  • area_range (tuple of floats) – The cropped area of the image must contain a fraction of the supplied image within this range.

  • max_attempts (int) – Number of attempts at generating a cropped/padded region of the image of the specified constraints. After max_attempts failures, return the original image.

  • pad_val (float or tuple of float) – Pixel value to be filled when padding is enabled. The mean is automatically subtracted from pad_val and the result divided by std, if applicable.

Examples

>>> import mxnet as mx
>>> # An example of creating multiple augmenters
>>> augs = mx.image.CreateDetAugmenter(data_shape=(3, 300, 300), rand_crop=0.5,
...    rand_pad=0.5, rand_mirror=True, mean=True, brightness=0.125, contrast=0.125,
...    saturation=0.125, pca_noise=0.05, inter_method=10, min_object_covered=[0.3, 0.5, 0.9],
...    area_range=(0.3, 3.0))
>>> # dump the details
>>> for aug in augs:
...    aug.dumps()

mxnet.image.CreateMultiRandCropAugmenter(min_object_covered=0.1, aspect_ratio_range=(0.75, 1.33), area_range=(0.05, 1.0), min_eject_coverage=0.3, max_attempts=50, skip_prob=0)[source]

Helper function to create multiple random crop augmenters.

Parameters
  • min_object_covered (float or list of float, default=0.1) – The cropped area of the image must contain at least this fraction of any bounding box supplied. The value of this parameter should be non-negative. In the case of 0, the cropped area does not need to overlap any of the bounding boxes supplied.

  • min_eject_coverage (float or list of float, default=0.3) – The minimum coverage of cropped sample w.r.t its original size. With this constraint, objects that have marginal area after crop will be discarded.

  • aspect_ratio_range (tuple of floats or list of tuple of floats, default=(0.75, 1.33)) – The cropped area of the image must have an aspect ratio = width / height within this range.

  • area_range (tuple of floats or list of tuple of floats, default=(0.05, 1.0)) – The cropped area of the image must contain a fraction of the supplied image within this range.

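  • max_attempts (int or list of int, default=50) – Number of attempts at generating a cropped/padded region of the image of the specified constraints. After max_attempts failures, return the original image.

  • skip_prob (float, default=0) – The probability to skip all augmenters and return the input directly.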

Examples

>>> # An example of creating multiple random crop augmenters
>>> min_object_covered = [0.1, 0.3, 0.5, 0.7, 0.9]  # use 5 augmenters
>>> aspect_ratio_range = (0.75, 1.33)  # use same range for all augmenters
>>> area_range = [(0.1, 1.0), (0.2, 1.0), (0.2, 1.0), (0.3, 0.9), (0.5, 1.0)]
>>> min_eject_coverage = 0.3
>>> max_attempts = 50
>>> aug = mx.image.det.CreateMultiRandCropAugmenter(min_object_covered=min_object_covered,
...     aspect_ratio_range=aspect_ratio_range, area_range=area_range,
...     min_eject_coverage=min_eject_coverage, max_attempts=max_attempts,
...     skip_prob=0)
>>> aug.dumps()  # show some details

class mxnet.image.DetAugmenter(**kwargs)[source]

Bases: object

Detection base augmenter

Methods

dumps()

Saves the Augmenter to string

dumps()[source]

Saves the Augmenter to string

Returns

JSON formatted string that describes the Augmenter.

Return type

str

class mxnet.image.DetBorrowAug(augmenter)[source]

Bases: mxnet.image.detection.DetAugmenter

Borrow standard augmenter from image classification. This is appropriate when you know the label will not be affected by the augmenter.

Parameters

augmenter (mx.image.Augmenter) – The borrowed standard augmenter which has no effect on label

Methods

dumps()

Override the default one to avoid duplicate dump.

dumps()[source]

Override the default one to avoid duplicate dump.

class mxnet.image.DetHorizontalFlipAug(p)[source]

Bases: mxnet.image.detection.DetAugmenter

Random horizontal flipping.

Parameters

p (float) – chance [0, 1] to flip
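
Flipping a detection sample means the bounding boxes have to be mirrored along with the pixels. The sketch below shows only the label side of that. It assumes each label row is [class_id, xmin, ymin, xmax, ymax] with coordinates normalized to [0, 1]; that layout and the helper name flip_boxes_horizontally are assumptions of the example, not something this page specifies.

```python
def flip_boxes_horizontally(labels):
    """Mirror normalized boxes left-to-right.

    Each row is assumed to be [class_id, xmin, ymin, xmax, ymax] with
    coordinates in [0, 1].
    """
    flipped = []
    for cls, xmin, ymin, xmax, ymax in labels:
        # The old right edge becomes the new left edge, measured from the other side.
        flipped.append([cls, 1.0 - xmax, ymin, 1.0 - xmin, ymax])
    return flipped

boxes = [[0, 0.25, 0.5, 0.5, 0.75]]
print(flip_boxes_horizontally(boxes))  # [[0, 0.5, 0.5, 0.75, 0.75]]
```

The vertical coordinates are untouched, and applying the flip twice returns the original boxes.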

class mxnet.image.DetRandomCropAug(min_object_covered=0.1, aspect_ratio_range=(0.75, 1.33), area_range=(0.05, 1.0), min_eject_coverage=0.3, max_attempts=50)[source]

Bases: mxnet.image.detection.DetAugmenter

Random cropping with constraints

Parameters
  • min_object_covered (float, default=0.1) – The cropped area of the image must contain at least this fraction of any bounding box supplied. The value of this parameter should be non-negative. In the case of 0, the cropped area does not need to overlap any of the bounding boxes supplied.

  • min_eject_coverage (float, default=0.3) – The minimum coverage of cropped sample w.r.t its original size. With this constraint, objects that have marginal area after crop will be discarded.

  • aspect_ratio_range (tuple of floats, default=(0.75, 1.33)) – The cropped area of the image must have an aspect ratio = width / height within this range.

  • area_range (tuple of floats, default=(0.05, 1.0)) – The cropped area of the image must contain a fraction of the supplied image within this range.

  • max_attempts (int, default=50) – Number of attempts at generating a cropped/padded region of the image of the specified constraints. After max_attempts failures, return the original image.
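
The quantity that min_object_covered is compared against can be pictured as the fraction of a bounding box that survives inside the candidate crop. The sketch below computes that fraction for one box. The helper name box_coverage and the (xmin, ymin, xmax, ymax) tuple layout are assumptions, and how the per-box values are combined across several boxes is deliberately left to the caller because this page does not spell it out.

```python
def box_coverage(box, crop):
    """Fraction of `box` that lies inside `crop`.

    Both arguments are (xmin, ymin, xmax, ymax) tuples in the same
    coordinate system.
    """
    bx0, by0, bx1, by1 = box
    cx0, cy0, cx1, cy1 = crop
    inter_w = max(0.0, min(bx1, cx1) - max(bx0, cx0))
    inter_h = max(0.0, min(by1, cy1) - max(by0, cy0))
    area = (bx1 - bx0) * (by1 - by0)
    return (inter_w * inter_h) / area if area > 0 else 0.0

crop = (0.5, 0.0, 1.0, 1.0)          # right half of the image
box = (0.25, 0.25, 0.75, 0.75)       # box straddling the crop's left edge
print(box_coverage(box, crop))       # 0.5
print(box_coverage(box, crop) >= 0.1)
```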

class mxnet.image.DetRandomPadAug(aspect_ratio_range=(0.75, 1.33), area_range=(1.0, 3.0), max_attempts=50, pad_val=(128, 128, 128))[source]

Bases: mxnet.image.detection.DetAugmenter

Random padding augmenter.

Parameters
  • aspect_ratio_range (tuple of floats, default=(0.75, 1.33)) – The padded area of the image must have an aspect ratio = width / height within this range.

  • area_range (tuple of floats, default=(1.0, 3.0)) – The padded area of the image must be larger than the original area

  • max_attempts (int, default=50) – Number of attempts at generating a padded region of the image of the specified constraints. After max_attempts failures, return the original image.

  • pad_val (float or tuple of float, default=(128, 128, 128)) – pixel value to be filled when padding is enabled.

class mxnet.image.DetRandomSelectAug(aug_list, skip_prob=0)[source]

Bases: mxnet.image.detection.DetAugmenter

Randomly select one augmenter to apply, with chance to skip all.

Parameters
  • aug_list (list of DetAugmenter) – The random selection will be applied to one of the augmenters

  • skip_prob (float) – The probability to skip all augmenters and return input directly

Methods

dumps()

Override default.

dumps()[source]

Override default.

class mxnet.image.ForceResizeAug(size, interp=2)[source]

Bases: mxnet.image.image.Augmenter

Force resize to size regardless of aspect ratio

Parameters
  • size (tuple of (int, int)) – The desired size as in (width, height)

  • interp (int, optional, default=2) – Interpolation method. See resize_short for details.

class mxnet.image.HorizontalFlipAug(p)[source]

Bases: mxnet.image.image.Augmenter

Random horizontal flip.

Parameters

p (float) – Probability to flip image horizontally

class mxnet.image.HueJitterAug(hue)[source]

Bases: mxnet.image.image.Augmenter

Random hue jitter augmentation.

Parameters

hue (float) – The hue jitter ratio range, [0, 1]

class mxnet.image.ImageDetIter(batch_size, data_shape, path_imgrec=None, path_imglist=None, path_root=None, path_imgidx=None, shuffle=False, part_index=0, num_parts=1, aug_list=None, imglist=None, data_name='data', label_name='label', last_batch_handle='pad', **kwargs)[source]

Bases: mxnet.image.image.ImageIter

Image iterator with a large number of augmentation choices for detection.

Parameters
  • aug_list (list or None) – Augmenter list for generating distorted images

  • batch_size (int) – Number of examples per batch.

  • data_shape (tuple) – Data shape in (channels, height, width) format. For now, only RGB image with 3 channels is supported.

  • path_imgrec (str) – Path to image record file (.rec). Created with tools/im2rec.py or bin/im2rec.

  • path_imglist (str) – Path to image list (.lst). Created with tools/im2rec.py or with custom script. Format: Tab separated record of index, one or more labels and relative_path_from_root.

  • imglist (list) – A list of images with the label(s). Each item is a list [imagelabel: float or list of float, imgpath].

  • path_root (str) – Root folder of image files.

  • path_imgidx (str) – Path to image index file. Needed for partition and shuffling when using .rec source.

  • shuffle (bool) – Whether to shuffle all images at the start of each iteration or not. Can be slow for HDD.

  • part_index (int) – Partition index.

  • num_parts (int) – Total number of partitions.

  • data_name (str) – Data name for provided symbols.

  • label_name (str) – Name for detection labels

  • last_batch_handle (str, optional) – How to handle the last batch. This parameter can be ‘pad’ (default), ‘discard’ or ‘roll_over’. If ‘pad’, the last batch will be padded with data starting from the beginning. If ‘discard’, the last batch will be discarded. If ‘roll_over’, the remaining elements will be rolled over to the next iteration.

  • kwargs – More arguments for creating augmenter. See mx.image.CreateDetAugmenter.
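
The .lst format described under path_imglist can be read with a few lines of plain Python. The sketch below assumes exactly the layout stated there (index first, relative path last, everything in between treated as a label); parse_lst_line is a hypothetical helper, and the sample record is made up for illustration.

```python
def parse_lst_line(line):
    """Split one .lst record into (index, labels, relative_path).

    Assumes tab-separated fields: index, one or more labels, then the
    path relative to path_root.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected an index, at least one label, and a path")
    index = int(fields[0])
    labels = [float(v) for v in fields[1:-1]]
    return index, labels, fields[-1]

print(parse_lst_line("7\t1.0\t0.25\tcats/007.jpg\n"))
```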

Methods

augmentation_transform(data, label)

Override. Transforms input data with the specified augmentations.

check_label_shape(label_shape)

Checks if the new label shape is valid

draw_next([color, thickness, mean, std, …])

Display next image with bounding boxes drawn.

next()

Override the function for returning next batch.

reshape([data_shape, label_shape])

Reshape iterator for data_shape or label_shape.

sync_label_shape(it[, verbose])

Synchronize label shape with the input iterator.

augmentation_transform(data, label)[source]

Override. Transforms input data with the specified augmentations.

check_label_shape(label_shape)[source]

Checks if the new label shape is valid

draw_next(color=None, thickness=2, mean=None, std=None, clip=True, waitKey=None, window_name='draw_next', id2labels=None)[source]

Display next image with bounding boxes drawn.

Parameters
  • color (tuple) – Bounding box color in RGB, use None for random color

  • thickness (int) – Bounding box border thickness

  • mean (True or numpy.ndarray) – Compensate for the mean to have better visual effect

  • std (True or numpy.ndarray) – Revert standard deviations

  • clip (bool) – If true, clip to [0, 255] for better visual effect

  • waitKey (None or int) – Hold the window for waitKey milliseconds if set, skip plotting if None

  • window_name (str) – Plot window name if waitKey is set.

  • id2labels (dict) – Mapping of labels id to labels name.

Returns

Return type

numpy.ndarray

Examples

>>> import mxnet as mx
>>> # use draw_next to get images with bounding boxes drawn
>>> iterator = mx.image.ImageDetIter(1, (3, 600, 600), path_imgrec='train.rec')
>>> for image in iterator.draw_next(waitKey=None):
...     pass  # display image
>>> # or let draw_next display using cv2 module
>>> for image in iterator.draw_next(waitKey=0, window_name='disp'):
...     pass

next()[source]

Override the function for returning next batch.

reshape(data_shape=None, label_shape=None)[source]

Reshape iterator for data_shape or label_shape.

Parameters
  • data_shape (tuple or None) – Reshape the data_shape to the new shape if not None

  • label_shape (tuple or None) – Reshape label shape to new shape if not None

sync_label_shape(it, verbose=False)[source]

Synchronize label shape with the input iterator. This is useful when train/validation iterators have different label padding.

Parameters
  • it (ImageDetIter) – The other iterator to synchronize

  • verbose (bool) – Print verbose log if true

Returns

The synchronized other iterator, the internal label shape is updated as well.

Return type

ImageDetIter

Examples

>>> train_iter = mx.image.ImageDetIter(32, (3, 300, 300), path_imgrec='train.rec')
>>> val_iter = mx.image.ImageDetIter(32, (3, 300, 300), path_imgrec='val.rec')
>>> train_iter.label_shape
(30, 6)
>>> val_iter.label_shape
(25, 6)
>>> val_iter = train_iter.sync_label_shape(val_iter, verbose=False)
>>> train_iter.label_shape
(30, 6)
>>> val_iter.label_shape
(30, 6)

class mxnet.image.ImageIter(batch_size, data_shape, label_width=1, path_imgrec=None, path_imglist=None, path_root=None, path_imgidx=None, shuffle=False, part_index=0, num_parts=1, aug_list=None, imglist=None, data_name='data', label_name='softmax_label', dtype='float32', last_batch_handle='pad', **kwargs)[source]

Bases: mxnet.io.io.DataIter

Image data iterator with a large number of augmentation choices. This iterator supports reading from both .rec files and raw image files.

To load input images from .rec files, use path_imgrec parameter and to load from raw image files, use path_imglist and path_root parameters.

To use data partition (for distributed training) or shuffling, specify path_imgidx parameter.
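
For intuition about part_index and num_parts, the sketch below shows one simple way a list of record keys could be divided among workers. It assumes equal-sized contiguous chunks with any remainder dropped so that every worker sees the same number of samples; how the iterator actually divides records is not stated on this page, and the helper name partition is hypothetical.

```python
def partition(seq, part_index, num_parts):
    """Return the slice of `seq` owned by worker `part_index`.

    Assumption: contiguous chunks of equal size; leftover records at the
    end are not assigned to any worker.
    """
    if not 0 <= part_index < num_parts:
        raise ValueError("part_index must be in [0, num_parts)")
    chunk = len(seq) // num_parts
    return seq[part_index * chunk:(part_index + 1) * chunk]

keys = list(range(10))
for rank in range(3):
    print(rank, partition(keys, rank, 3))
```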

Parameters
  • batch_size (int) – Number of examples per batch.

  • data_shape (tuple) – Data shape in (channels, height, width) format. For now, only RGB image with 3 channels is supported.

  • label_width (int, optional) – Number of labels per example. The default label width is 1.

  • path_imgrec (str) – Path to image record file (.rec). Created with tools/im2rec.py or bin/im2rec.

  • path_imglist (str) – Path to image list (.lst). Created with tools/im2rec.py or with custom script. Format: Tab separated record of index, one or more labels and relative_path_from_root.

  • imglist (list) – A list of images with the label(s). Each item is a list [imagelabel: float or list of float, imgpath].

  • path_root (str) – Root folder of image files.

  • path_imgidx (str) – Path to image index file. Needed for partition and shuffling when using .rec source.

  • shuffle (bool) – Whether to shuffle all images at the start of each iteration or not. Can be slow for HDD.

  • part_index (int) – Partition index.

  • num_parts (int) – Total number of partitions.

  • data_name (str) – Data name for provided symbols.

  • label_name (str) – Label name for provided symbols.

  • dtype (str) – Label data type. Default: float32. Other options: int32, int64, float64

  • last_batch_handle (str, optional) – How to handle the last batch. This parameter can be ‘pad’ (default), ‘discard’ or ‘roll_over’. If ‘pad’, the last batch will be padded with data starting from the beginning. If ‘discard’, the last batch will be discarded. If ‘roll_over’, the remaining elements will be rolled over to the next iteration.

  • kwargs – More arguments for creating augmenter. See mx.image.CreateAugmenter.
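
The three last_batch_handle modes differ only in what happens to a final, incomplete batch. The sketch below models that on a plain list of sample ids. It is an illustration of the semantics described above, not the iterator's implementation, and epoch_batches with its carry argument is a hypothetical helper.

```python
def epoch_batches(samples, batch_size, last_batch_handle="pad", carry=()):
    """Split one pass over `samples` into batches.

    Returns (batches, leftover). `leftover` is non-empty only for
    'roll_over' and is meant to be passed as `carry` to the next pass.
    """
    items = list(carry) + list(samples)
    n_full = len(items) // batch_size * batch_size
    batches = [items[i:i + batch_size] for i in range(0, n_full, batch_size)]
    rest = items[n_full:]
    if not rest:
        return batches, []
    if last_batch_handle == "pad":
        # Top up the short batch with samples taken from the beginning.
        missing = batch_size - len(rest)
        batches.append(rest + [items[i % len(items)] for i in range(missing)])
        return batches, []
    if last_batch_handle == "discard":
        return batches, []
    if last_batch_handle == "roll_over":
        return batches, rest
    raise ValueError("unknown last_batch_handle: %r" % (last_batch_handle,))

ids = list(range(10))
print(epoch_batches(ids, 4, "pad"))
print(epoch_batches(ids, 4, "discard"))
print(epoch_batches(ids, 4, "roll_over"))
```

With ten samples and a batch size of four, 'pad' yields three batches, 'discard' yields two, and 'roll_over' yields two plus a two-sample leftover that opens the next pass.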

Methods

augmentation_transform(data)

Transforms input data with specified augmentation.

check_data_shape(data_shape)

Checks if the input data shape is valid

check_valid_image(data)

Checks if the input data is valid

hard_reset()

Resets the iterator and ignores roll over data

imdecode(s)

Decodes a string or byte string to an NDArray.

next()

Returns the next batch of data.

next_sample()

Helper function for reading in next sample.

postprocess_data(datum)

Final postprocessing step before image is loaded into the batch.

read_image(fname)

Reads an input image fname and returns the decoded raw bytes.

reset()

Resets the iterator to the beginning of the data.

augmentation_transform(data)[source]

Transforms input data with specified augmentation.

check_data_shape(data_shape)[source]

Checks if the input data shape is valid

check_valid_image(data)[source]

Checks if the input data is valid

hard_reset()[source]

Resets the iterator and ignores roll over data

imdecode(s)[source]

Decodes a string or byte string to an NDArray. See mx.img.imdecode for more details.

next()[source]

Returns the next batch of data.

next_sample()[source]

Helper function for reading in next sample.

postprocess_data(datum)[source]

Final postprocessing step before image is loaded into the batch.

read_image(fname)[source]

Reads an input image fname and returns the decoded raw bytes.

Examples

>>> dataIter.read_image('Face.jpg') # returns decoded raw bytes.

reset()[source]

Resets the iterator to the beginning of the data.

class mxnet.image.LightingAug(alphastd, eigval, eigvec)[source]

Bases: mxnet.image.image.Augmenter

Add PCA based noise.

Parameters
  • alphastd (float) – Noise level

  • eigval (3x1 np.array) – Eigenvalues

  • eigvec (3x3 np.array) – Eigenvectors

class mxnet.image.Number[source]

Bases: object

All numbers inherit from this class.

If you just want to check if an argument x is a number, without caring what kind, use isinstance(x, Number).

class mxnet.image.RandomCropAug(size, interp=2)[source]

Bases: mxnet.image.image.Augmenter

Make random crop augmenter

Parameters
  • size (tuple of (int, int)) – Size of the crop formatted as (width, height).

  • interp (int, optional, default=2) – Interpolation method. See resize_short for details.

class mxnet.image.RandomGrayAug(p)[source]

Bases: mxnet.image.image.Augmenter

Randomly convert to gray image.

Parameters

p (float) – Probability to convert to grayscale

class mxnet.image.RandomOrderAug(ts)[source]

Bases: mxnet.image.image.Augmenter

Apply list of augmenters in random order

Parameters

ts (list of augmenters) – A series of augmenters to be applied in random order

Methods

dumps()

Override the default to avoid duplicate dump.

dumps()[source]

Override the default to avoid duplicate dump.

class mxnet.image.RandomSizedCropAug(size, area, ratio, interp=2, **kwargs)[source]

Bases: mxnet.image.image.Augmenter

Make random crop with random resizing and random aspect ratio jitter augmenter.

Parameters
  • size (tuple of (int, int)) – Size of the crop formatted as (width, height).

  • area (float in (0, 1] or tuple of (float, float)) – If tuple, minimum area and maximum area to be maintained after cropping. If float, minimum area to be maintained after cropping; maximum area is set to 1.0.

  • ratio (tuple of (float, float)) – Aspect ratio range as (min_aspect_ratio, max_aspect_ratio)

  • interp (int, optional, default=2) – Interpolation method. See resize_short for details.
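
How area and ratio interact is easiest to see by sampling a crop window. The sketch below draws an area fraction and an aspect ratio (width / height), converts them into a width and height, and retries until the window fits inside the source. The log-uniform ratio sampling, the retry count, and the helper name sample_crop_box are all assumptions of this example; resizing the window to size is left out.

```python
import math
import random

def sample_crop_box(src_w, src_h, area, ratio, rng=random, attempts=10):
    """Sample (x0, y0, w, h) for a random-area, random-aspect crop.

    area  : float (minimum fraction, maximum taken as 1.0) or (min, max)
    ratio : (min_aspect_ratio, max_aspect_ratio), aspect = width / height
    Returns None if no candidate fits; a caller could then fall back to a
    center crop.
    """
    if isinstance(area, (int, float)):
        area = (float(area), 1.0)
    for _ in range(attempts):
        target_area = rng.uniform(area[0], area[1]) * src_w * src_h
        # Log-uniform sampling treats a ratio r and its inverse 1/r symmetrically.
        aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= src_w and 0 < h <= src_h:
            x0 = rng.randint(0, src_w - w)
            y0 = rng.randint(0, src_h - h)
            return x0, y0, w, h
    return None

print(sample_crop_box(640, 480, (0.08, 0.5), (0.75, 4.0 / 3.0), random.Random(0)))
```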

class mxnet.image.ResizeAug(size, interp=2)[source]

Bases: mxnet.image.image.Augmenter

Make resize shorter edge to size augmenter.

Parameters
  • size (int) – The length to be set for the shorter edge.

  • interp (int, optional, default=2) – Interpolation method. See resize_short for details.
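
Resizing the shorter edge to size keeps the aspect ratio, so the longer edge is scaled by the same factor. The sketch below computes the resulting dimensions only. The integer truncation of the longer edge and the helper name short_edge_size are assumptions of this example.

```python
def short_edge_size(src_w, src_h, size):
    """Return (new_w, new_h) so that the shorter edge equals `size`.

    The longer edge is scaled by the same factor and truncated to an int.
    """
    if src_h > src_w:
        return size, size * src_h // src_w
    return size * src_w // src_h, size

# A 3482 x 2321 (width x height) image resized so its shorter edge is 640.
print(short_edge_size(3482, 2321, 640))  # (960, 640)
```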

class mxnet.image.SaturationJitterAug(saturation)[source]

Bases: mxnet.image.image.Augmenter

Random saturation jitter augmentation.

Parameters

saturation (float) – The saturation jitter ratio range, [0, 1]

class mxnet.image.SequentialAug(ts)[source]

Bases: mxnet.image.image.Augmenter

Composing a sequential augmenter list.

Parameters

ts (list of augmenters) – A series of augmenters to be applied in sequential order.
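
The difference between sequential and random-order composition can be shown with plain callables standing in for augmenters. The sketch below is an illustration of the two composition rules only; apply_sequential, apply_random_order, and the two toy "augmenters" are hypothetical and operate on a number rather than an image.

```python
import random

def apply_sequential(augmenters, src):
    """Apply each augmenter in list order, feeding each output to the next."""
    for aug in augmenters:
        src = aug(src)
    return src

def apply_random_order(augmenters, src, rng=random):
    """Apply every augmenter exactly once, in a shuffled order."""
    order = list(augmenters)
    rng.shuffle(order)
    return apply_sequential(order, src)

def add_one(x):
    return x + 1

def double(x):
    return x * 2

print(apply_sequential([add_one, double], 3))    # (3 + 1) * 2 = 8
print(apply_random_order([add_one, double], 3))  # 8 or 7, depending on the shuffle
```

Because the two toy operations do not commute, the random-order result depends on the shuffle, which is the property that distinguishes the two composition styles.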

Methods

dumps()

Override the default to avoid duplicate dump.

dumps()[source]

Override the default to avoid duplicate dump.

mxnet.image.center_crop(src, size, interp=2)[source]

Crops the image src to the given size by trimming on all four sides and preserving the center of the image. Upsamples if src is smaller than size.

Note

This requires MXNet to be compiled with USE_OPENCV.

Parameters
  • src (NDArray) – Binary source image data.

  • size (list or tuple of int) – The desired output image size.

  • interp (int, optional, default=2) – Interpolation method. See resize_short for details.

Returns

  • NDArray – The cropped image.

  • Tuple – (x, y, width, height) where x, y are the positions of the crop in the original image and width, height the dimensions of the crop.

Example

>>> with open("flower.jpg", 'rb') as fp:
...     str_image = fp.read()
...
>>> image = mx.image.imdecode(str_image)
>>> image
<NDArray 2321x3482x3 @cpu(0)>
>>> cropped_image, (x, y, width, height) = mx.image.center_crop(image, (1000, 500))
>>> cropped_image
<NDArray 500x1000x3 @cpu(0)>
>>> x, y, width, height
(1241, 910, 1000, 500)
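
The crop offsets in the example above can be reproduced with integer arithmetic. The sketch below first shrinks the requested size so it fits inside the source while keeping its aspect ratio (the role scale_down is described as playing), then centers the window. Both helpers are stand-ins written for this example, with sizes given as (width, height); they are not the library functions themselves.

```python
def fit_crop_size(src_size, size):
    """Shrink the requested (w, h) to fit inside src_size, keeping its aspect ratio."""
    w, h = size
    sw, sh = src_size
    if sh < h:
        w, h = float(w * sh) / h, sh
    if sw < w:
        w, h = sw, float(h * sw) / w
    return int(w), int(h)

def center_crop_box(src_size, size):
    """Return (x0, y0, w, h) of the centered crop window."""
    sw, sh = src_size
    w, h = fit_crop_size(src_size, size)
    return (sw - w) // 2, (sh - h) // 2, w, h

# The 2321x3482x3 image above is 3482 wide and 2321 high.
print(center_crop_box((3482, 2321), (1000, 500)))  # (1241, 910, 1000, 500)
```

When the source is smaller than the requested size, the window is shrunk first; the "upsamples" remark above refers to resizing that smaller window back up to size.
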
mxnet.image.color_normalize(src, mean, std=None)[source]

Normalize src with mean and std.

Parameters
  • src (NDArray) – Input image

  • mean (NDArray) – RGB mean to be subtracted

  • std (NDArray) – RGB standard deviation to be divided

Returns

An NDArray containing the normalized image.

Return type

NDArray
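
Per-channel normalization amounts to a subtraction followed by an optional division. The sketch below spells that out on a list of [r, g, b] pixels instead of an NDArray; normalize_pixels is a hypothetical stand-in, and the sample values are chosen so the arithmetic is exact.

```python
def normalize_pixels(pixels, mean, std=None):
    """Subtract the per-channel mean and, if given, divide by the per-channel std.

    pixels is a list of [r, g, b] values; mean and std are length-3 lists.
    """
    out = []
    for px in pixels:
        centered = [v - m for v, m in zip(px, mean)]
        if std is not None:
            centered = [v / s for v, s in zip(centered, std)]
        out.append(centered)
    return out

print(normalize_pixels([[130.0, 120.0, 100.0]],
                       mean=[128.0, 128.0, 128.0],
                       std=[2.0, 4.0, 8.0]))  # [[1.0, -2.0, -3.5]]
```
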

mxnet.image.copyMakeBorder(src=None, top=_Null, bot=_Null, left=_Null, right=_Null, type=_Null, value=_Null, values=_Null, out=None, name=None, **kwargs)

Pad image border with OpenCV.

Parameters
  • src (NDArray) – source image

  • top (int, required) – Top margin.

  • bot (int, required) – Bottom margin.

  • left (int, required) – Left margin.

  • right (int, required) – Right margin.

  • type (int, optional, default='0') – Filling type (default=cv2.BORDER_CONSTANT).

  • value (double, optional, default=0) – (Deprecated! Use values instead.) Fill with single value.

  • values (tuple of <double>, optional, default=[]) – Fill with value(RGB[A] or gray), up to 4 channels.

  • out (NDArray, optional) – The output NDArray to hold the result.

Returns

out – The output of this function.

Return type

NDArray or list of NDArrays

mxnet.image.fixed_crop(src, x0, y0, w, h, size=None, interp=2)[source]

Crop src at fixed location, and (optionally) resize it to size.

Parameters
  • src (NDArray) – Input image

  • x0 (int) – Left boundary of the cropping area

  • y0 (int) – Top boundary of the cropping area

  • w (int) – Width of the cropping area

  • h (int) – Height of the cropping area

  • size (tuple of (w, h)) – Optional, resize to new size after cropping

  • interp (int, optional, default=2) – Interpolation method. See resize_short for details.

Returns

An NDArray containing the cropped image.

Return type

NDArray
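
The crop window defined by x0, y0, w and h maps directly onto slicing when the image is stored row by row. The sketch below uses a nested list in place of an NDArray and leaves out the optional resize; crop_region is a hypothetical helper, and raising on an out-of-bounds window is a choice made for this example.

```python
def crop_region(image, x0, y0, w, h):
    """Return the h-by-w window whose top-left corner is (x0, y0).

    image is a list of rows (height-major), matching the HxWxC layout of
    the decoded images shown on this page.
    """
    if x0 < 0 or y0 < 0 or y0 + h > len(image) or x0 + w > len(image[0]):
        raise ValueError("crop window falls outside the image")
    return [row[x0:x0 + w] for row in image[y0:y0 + h]]

image = [[10 * r + c for c in range(4)] for r in range(3)]  # 3 rows, 4 columns
print(crop_region(image, x0=1, y0=0, w=2, h=2))  # [[1, 2], [11, 12]]
```
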

mxnet.image.imdecode(buf, *args, **kwargs)[source]

Decode an image to an NDArray.

Note

imdecode uses OpenCV (not the CV2 Python library). MXNet must have been built with USE_OPENCV=1 for imdecode to work.

Parameters
  • buf (str/bytes/bytearray or numpy.ndarray) – Binary image data as string or numpy ndarray.

  • flag (int, optional, default=1) – 1 for three channel color output. 0 for grayscale output.

  • to_rgb (int, optional, default=1) – 1 for RGB formatted output (MXNet default). 0 for BGR formatted output (OpenCV default).

  • out (NDArray, optional) – Output buffer. Use None for automatic allocation.

Returns

An NDArray containing the image.

Return type

NDArray

Example

>>> with open("flower.jpg", 'rb') as fp:
...     str_image = fp.read()
...
>>> image = mx.img.imdecode(str_image)
>>> image
<NDArray 224x224x3 @cpu(0)>

Set flag parameter to 0 to get grayscale output

>>> with open("flower.jpg", 'rb') as fp:
...     str_image = fp.read()
...
>>> image = mx.img.imdecode(str_image, flag=0)
>>> image
<NDArray 224x224x1 @cpu(0)>

Set to_rgb parameter to 0 to get output in OpenCV format (BGR)

>>> with open("flower.jpg", 'rb') as fp:
...     str_image = fp.read()
...
>>> image = mx.img.imdecode(str_image, to_rgb=0)
>>> image
<NDArray 224x224x3 @cpu(0)>
mxnet.image.imread(filename, *args, **kwargs)[source]

Read and decode an image to an NDArray.

Note

imread uses OpenCV (not the CV2 Python library). MXNet must have been built with USE_OPENCV=1 for imread to work.

Parameters
  • filename (str) – Name of the image file to be loaded.

  • flag ({0, 1}, default 1) – 1 for three channel color output. 0 for grayscale output.

  • to_rgb (bool, default True) – True for RGB formatted output (MXNet default). False for BGR formatted output (OpenCV default).

  • out (NDArray, optional) – Output buffer. Use None for automatic allocation.

Returns

An NDArray containing the image.

Return type

NDArray

Example

>>> mx.img.imread("flower.jpg")
<NDArray 224x224x3 @cpu(0)>

Set flag parameter to 0 to get grayscale output

>>> mx.img.imread("flower.jpg", flag=0)
<NDArray 224x224x1 @cpu(0)>

Set to_rgb parameter to 0 to get output in OpenCV format (BGR)

>>> mx.img.imread("flower.jpg", to_rgb=0)
<NDArray 224x224x3 @cpu(0)>
mxnet.image.imresize(src, w, h, *args, **kwargs)[source]

Resize image with OpenCV.

Note

imresize uses OpenCV (not the CV2 Python library). MXNet must have been built with USE_OPENCV=1 for imresize to work.

Parameters
  • src (NDArray) – source image

  • w (int, required) – Width of resized image.

  • h (int, required) – Height of resized image.

  • interp (int, optional, default=1) – Interpolation method (default=cv2.INTER_LINEAR). Possible values:

      0: Nearest-neighbor interpolation.
      1: Bilinear interpolation.
      2: Bicubic interpolation over a 4x4 pixel neighborhood.
      3: Area-based interpolation (resampling using the pixel area relation). It may be preferred for image decimation because it gives moire-free results, but when the image is enlarged it behaves like nearest-neighbor interpolation.
      4: Lanczos interpolation over an 8x8 pixel neighborhood.
      9: Bicubic for enlarging, area-based for shrinking, bilinear otherwise.
      10: Randomly select one of the interpolation methods listed above.

    Note: Shrinking an image generally looks best with area-based interpolation, whereas enlarging generally looks best with bicubic (slow) or bilinear (faster, still acceptable) interpolation. For more details, see the OpenCV documentation at http://docs.opencv.org/master/da/d54/group__imgproc__transform.html.

  • out (NDArray, optional) – The output NDArray to hold the result.

Returns

out – The output of this function.

Return type

NDArray or list of NDArrays

Example

>>> with open("flower.jpeg", 'rb') as fp:
...     str_image = fp.read()
...
>>> image = mx.img.imdecode(str_image)
>>> image
<NDArray 2321x3482x3 @cpu(0)>
>>> new_image = mx.img.imresize(image, 240, 360)
>>> new_image
<NDArray 360x240x3 @cpu(0)>
mxnet.image.imrotate(src, rotation_degrees, zoom_in=False, zoom_out=False)[source]

Rotates the input image(s) by the specified angle in degrees.

Parameters
  • src (NDArray) – Input image (format CHW) or batch of images (format NCHW). In both cases a float32 data type is required.

  • rotation_degrees (scalar or NDArray) – Desired rotation in degrees. If src is a single image, a scalar is required; if src is a batch, either a one-dimensional vector of angles or a scalar may be given.

  • zoom_in (bool) – If True, the input image(s) will be zoomed in so that no padding is visible in the output.

  • zoom_out (bool) – If True, the input image(s) will be zoomed out so that the whole original image is contained in the output.

Returns

An NDArray containing the rotated image(s).

Return type

NDArray

mxnet.image.is_np_array()[source]

Checks whether the NumPy-array semantics is currently turned on. This is currently used in Gluon for checking whether an array of type mxnet.numpy.ndarray or mx.nd.NDArray should be created. For example, at the time when a parameter is created in a Block, an mxnet.numpy.ndarray is created if this returns true; else an mx.nd.NDArray is created.

Users are generally advised not to call this API directly unless they know exactly what is going on under the hood.

Please note that this is designed as an infrastructure for the incoming MXNet-NumPy operators. Legacy operators registered in the modules mx.nd and mx.sym are not guaranteed to behave like their counterparts in NumPy within this semantics.

Returns

A bool value indicating whether the NumPy-array semantics is currently on.

Return type

bool

mxnet.image.random_crop(src, size, interp=2)[source]

Randomly crop src with size (width, height). Upsample result if src is smaller than size.

Parameters
  • src (NDArray) – Source image.

  • size (tuple of (int, int)) – Size of the crop formatted as (width, height). If size is larger than the image, the source image is upsampled to size and returned.

  • interp (int, optional, default=2) – Interpolation method. See resize_short for details.

Returns

  • NDArray – An NDArray containing the cropped image.

  • Tuple – A tuple (x, y, width, height) where (x, y) is top-left position of the crop in the original image and (width, height) are the dimensions of the cropped image.

Example

>>> im = mx.nd.array(cv2.imread("flower.jpg"))
>>> cropped_im, rect = mx.image.random_crop(im, (100, 100))
>>> print(cropped_im)
<NDArray 100x100x3 @cpu(0)>
>>> print(rect)
(20, 21, 100, 100)
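
The returned rectangle can be understood through a small pure-Python sketch of how a crop position might be drawn. This is an illustration under stated assumptions, not the library's implementation: the helper name sample_crop_rect is hypothetical, positions are assumed to be drawn uniformly, and the case where the crop is larger than the image is left out.

```python
import random


def sample_crop_rect(src_size, crop_size, rng=random):
    """Pick a random top-left corner so that the crop lies inside the image.

    Hypothetical helper for illustration only; sizes are (width, height).
    """
    src_w, src_h = src_size
    crop_w, crop_h = crop_size
    if crop_w > src_w or crop_h > src_h:
        raise ValueError("this sketch only handles crops that fit inside the image")
    x0 = rng.randint(0, src_w - crop_w)  # randint includes both endpoints
    y0 = rng.randint(0, src_h - crop_h)
    return x0, y0, crop_w, crop_h


rect = sample_crop_rect((640, 480), (100, 100), random.Random(0))
```

The tuple has the same (x, y, width, height) layout as the second return value documented above.
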
mxnet.image.random_rotate(src, angle_limits, zoom_in=False, zoom_out=False)[source]

Randomly rotates src by an angle within angle_limits.

Parameters
  • src (NDArray) – Input image (format CHW) or batch of images (format NCHW). In both cases a float32 data type is required.

  • angle_limits (tuple) – Tuple of 2 elements containing the upper and lower limits for the rotation angle, in degrees.

  • zoom_in (bool) – If True, the input image(s) will be zoomed in so that no padding is visible in the output.

  • zoom_out (bool) – If True, the input image(s) will be zoomed out so that the whole original image is contained in the output.

Returns

An NDArray containing the rotated image(s).

Return type

NDArray

mxnet.image.random_size_crop(src, size, area, ratio, interp=2, **kwargs)[source]

Randomly crop src with size. Randomize area and aspect ratio.

Parameters
  • src (NDArray) – Input image

  • size (tuple of (int, int)) – Size of the crop formatted as (width, height).

  • area (float in (0, 1] or tuple of (float, float)) – If a tuple, the minimum and maximum area to be maintained after cropping. If a float, the minimum area to be maintained after cropping; the maximum area is set to 1.0.

  • ratio (tuple of (float, float)) – Aspect ratio range as (min_aspect_ratio, max_aspect_ratio)

  • interp (int, optional, default=2) – Interpolation method. See resize_short for details.

Returns

  • NDArray – An NDArray containing the cropped image.

  • Tuple – A tuple (x, y, width, height) where (x, y) is top-left position of the crop in the original image and (width, height) are the dimensions of the cropped image.
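
One plausible way to combine the area and ratio parameters is sketched below in pure Python. Everything beyond the parameter meanings documented above is an assumption made for illustration: the helper name sample_sized_crop_rect, the interpretation of area as a fraction of the source area, sampling the aspect ratio in log space, the retry count, and returning None when no candidate fits.

```python
import math
import random


def sample_sized_crop_rect(src_size, area, ratio, rng=random, attempts=10):
    """Sample a crop rectangle with a random area fraction and aspect ratio.

    Hypothetical helper for illustration only; sizes are (width, height).
    """
    src_w, src_h = src_size
    if isinstance(area, (int, float)):
        area = (area, 1.0)  # a single float is the minimum; the maximum is 1.0
    for _ in range(attempts):
        target_area = rng.uniform(area[0], area[1]) * src_w * src_h
        # Assumption: draw the width/height ratio uniformly in log space.
        aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
        new_w = int(round(math.sqrt(target_area * aspect)))
        new_h = int(round(math.sqrt(target_area / aspect)))
        if 0 < new_w <= src_w and 0 < new_h <= src_h:
            x0 = rng.randint(0, src_w - new_w)
            y0 = rng.randint(0, src_h - new_h)
            return x0, y0, new_w, new_h
    return None  # no candidate fit; the caller chooses a fallback


rect = sample_sized_crop_rect((640, 480), 0.08, (3 / 4, 4 / 3), random.Random(0))
```

The sampled rectangle would then be cropped and resized to size, for example with fixed_crop.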

mxnet.image.resize_short(src, size, interp=2)[source]

Resizes shorter edge to size.

Note

resize_short uses OpenCV (not the CV2 Python library). MXNet must have been built with OpenCV for resize_short to work.

Resizes the original image so that its shorter edge equals size, scaling the longer edge to preserve the aspect ratio. The resizing itself is performed by OpenCV.

Parameters
  • src (NDArray) – The original image.

  • size (int) – The length to be set for the shorter edge.

  • interp (int, optional, default=2) – Interpolation method used for resizing the image. Possible values:

      0: Nearest-neighbor interpolation.
      1: Bilinear interpolation.
      2: Bicubic interpolation over a 4x4 pixel neighborhood.
      3: Area-based interpolation (resampling using the pixel area relation). It may be preferred for image decimation because it gives moire-free results, but when the image is enlarged it behaves like nearest-neighbor interpolation.
      4: Lanczos interpolation over an 8x8 pixel neighborhood.
      9: Bicubic for enlarging, area-based for shrinking, bilinear otherwise.
      10: Randomly select one of the interpolation methods listed above.

    Note: Shrinking an image generally looks best with area-based interpolation, whereas enlarging generally looks best with bicubic (slow) or bilinear (faster, still acceptable) interpolation. For more details, see the OpenCV documentation at http://docs.opencv.org/master/da/d54/group__imgproc__transform.html.

Returns

An NDArray containing the resized image.

Return type

NDArray

Example

>>> with open("flower.jpeg", 'rb') as fp:
...     str_image = fp.read()
...
>>> image = mx.img.imdecode(str_image)
>>> image
<NDArray 2321x3482x3 @cpu(0)>
>>> size = 640
>>> new_image = mx.img.resize_short(image, size)
>>> new_image
<NDArray 640x960x3 @cpu(0)>
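
The output dimensions follow from the aspect ratio alone, which a short pure-Python sketch can make concrete. The helper name resize_short_dims is hypothetical, and the use of integer floor division for the longer edge is an assumption. No image data is touched.

```python
def resize_short_dims(height, width, size):
    """Return (new_height, new_width) after setting the shorter edge to size.

    Hypothetical helper for illustration only; it is not part of mxnet.image.
    """
    if height > width:
        # Portrait: width is the shorter edge.
        return size * height // width, size
    # Landscape or square: height is the shorter edge.
    return size, size * width // height


dims = resize_short_dims(2321, 3482, 640)
```

For the 2321x3482 image used in the example, the shorter edge (height) becomes 640 and the width scales to 960.
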
mxnet.image.scale_down(src_size, size)[source]

Scales down crop size if it’s larger than image size.

If the width or height of the crop is larger than the corresponding dimension of the image, that dimension is set to the image's width or height, and the other dimension is scaled down to preserve the crop's aspect ratio.

Parameters
  • src_size (tuple of int) – Size of the image in (width, height) format.

  • size (tuple of int) – Size of the crop in (width, height) format.

Returns

A tuple containing the scaled crop size in (width, height) format.

Return type

tuple of int

Example

>>> src_size = (640, 480)
>>> size = (720, 120)
>>> new_size = mx.img.scale_down(src_size, size)
>>> new_size
(640, 106)
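
The result in the example can be reproduced with a small pure-Python sketch of the shrink-to-fit arithmetic. The helper name scale_down_sketch is hypothetical, and truncating fractional sizes toward zero is an assumption inferred from the 106 in the example (120 * 640 / 720 is about 106.67).

```python
def scale_down_sketch(src_size, size):
    """Shrink a (width, height) crop to fit inside the image, keeping its aspect ratio.

    Hypothetical helper for illustration only; it is not part of mxnet.image.
    """
    w, h = size
    src_w, src_h = src_size
    if src_h < h:
        # Too tall: clamp the height and scale the width by the same factor.
        w, h = w * src_h / h, src_h
    if src_w < w:
        # Too wide: clamp the width and scale the height by the same factor.
        w, h = src_w, h * src_w / w
    return int(w), int(h)


new_size = scale_down_sketch((640, 480), (720, 120))
```

A crop that already fits inside the image is returned unchanged.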