Image API

Overview

This document summarizes the supporting functions and iterators for reading and processing images provided in

mxnet.image

Image processing functions

image.imdecode
image.scale_down
image.resize_short
image.fixed_crop
image.random_crop
image.center_crop
image.color_normalize
image.random_size_crop
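
The cropping and resizing helpers above largely come down to a little size arithmetic. The following is a minimal pure-Python sketch of that arithmetic, written without MXNet so it can be run anywhere. The function names `scale_down_size`, `resize_short_size`, and `center_crop_box` are hypothetical, sizes are assumed to be (width, height) tuples, and the rounding choices are assumptions rather than a statement of the library's exact behavior.

```python
def scale_down_size(src_size, size):
    """Shrink a requested (width, height) crop so it fits inside src_size,
    keeping the crop's aspect ratio."""
    w, h = size
    src_w, src_h = src_size
    if src_h < h:
        w, h = w * src_h / h, src_h
    if src_w < w:
        w, h = src_w, h * src_w / w
    return int(w), int(h)


def resize_short_size(src_size, short):
    """Return the (width, height) after resizing so the shorter edge equals `short`."""
    w, h = src_size
    if h > w:
        return short, short * h // w
    return short * w // h, short


def center_crop_box(src_size, size):
    """Return (x0, y0, width, height) of a centered crop, shrunk first if needed."""
    src_w, src_h = src_size
    w, h = scale_down_size(src_size, size)
    x0 = (src_w - w) // 2
    y0 = (src_h - h) // 2
    return x0, y0, w, h


print(scale_down_size((640, 480), (720, 120)))  # requested crop is too wide, so it shrinks
print(resize_short_size((640, 480), 256))       # shorter edge (height) becomes 256
print(center_crop_box((640, 480), (224, 224)))  # crop box centered in the image
```

In practice the corresponding `mx.image` functions operate on decoded image arrays; this sketch only shows how the output sizes and crop offsets could be derived.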

Image iterators

The iterators support loading images from both binary RecordIO files and raw image files.

image.ImageIter
>>> import mxnet as mx
>>> data_iter = mx.image.ImageIter(batch_size=4, data_shape=(3, 224, 224), label_width=1,
...                                path_imglist='data/custom.lst')
>>> data_iter.reset()
>>> for data in data_iter:
...     d = data.data[0]
...     print(d.shape)
>>> # many augmentations can be enabled through keyword arguments
>>> data_iter = mx.image.ImageIter(4, (3, 224, 224), path_imglist='data/custom.lst',
...                                rand_crop=True, rand_resize=True, rand_mirror=True, mean=True,
...                                brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1,
...                                pca_noise=0.1, rand_gray=0.05)
>>> data = data_iter.next()
>>> # specifying augmenters manually is also supported
>>> data_iter = mx.image.ImageIter(32, (3, 224, 224), path_rec='data/caltech.rec',
...                                path_imgidx='data/caltech.idx', shuffle=True,
...                                aug_list=[mx.image.HorizontalFlipAug(0.5),
...                                          mx.image.ColorJitterAug(0.1, 0.1, 0.1)])
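
The `path_imglist` argument in the examples above points at an image list file. As a rough illustration, the sketch below writes a small classification list with Python's standard library, one `index`, `label`, `image_path` entry per line. It assumes tab-separated columns and float-formatted labels, which is how `tools/im2rec.py` is commonly understood to write them; the helper name `write_lst` and the image paths are hypothetical.

```python
import os
import tempfile


def write_lst(path, entries):
    """Write (label, image_path) pairs as 'index<TAB>label<TAB>image_path' lines."""
    with open(path, "w") as f:
        for index, (label, image_path) in enumerate(entries):
            f.write("%d\t%f\t%s\n" % (index, label, image_path))


# Hypothetical labels and paths, for illustration only.
entries = [(0, "cat/0001.jpg"), (1, "dog/0001.jpg")]
lst_path = os.path.join(tempfile.mkdtemp(), "custom.lst")
write_lst(lst_path, entries)

with open(lst_path) as f:
    lines = f.read().splitlines()
print(lines)
```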

A helper function is provided to initialize augmenters:

image.CreateAugmenter

A list of supported augmenters:

image.Augmenter
image.ResizeAug
image.ForceResizeAug
image.RandomCropAug
image.RandomSizedCropAug
image.CenterCropAug
image.RandomOrderAug
image.BrightnessJitterAug
image.ContrastJitterAug
image.SaturationJitterAug
image.HueJitterAug
image.ColorJitterAug
image.LightingAug
image.ColorNormalizeAug
image.RandomGrayAug
image.HorizontalFlipAug
image.CastAug
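
Judging from how `aug_list` is used in the `ImageIter` example, each augmenter can be thought of as a callable that takes an image and returns an image, with several of them applied in order. The sketch below illustrates that pattern in plain Python on a nested-list "image" so that it runs without MXNet. `FlipSketch` and `apply_augmenters` are hypothetical names used only for illustration; they are not part of `mxnet.image`, and the real augmenters operate on NDArrays.

```python
import random


class FlipSketch:
    """Flip an image left-to-right with probability p (illustrative only)."""

    def __init__(self, p, rng=None):
        self.p = p
        self.rng = rng or random.Random()

    def __call__(self, image):
        # image is a list of rows; each row is a list of pixel values
        if self.rng.random() < self.p:
            return [row[::-1] for row in image]
        return image


def apply_augmenters(image, augmenters):
    """Apply each augmenter in order, feeding the output of one into the next."""
    for aug in augmenters:
        image = aug(image)
    return image


image = [[1, 2, 3],
         [4, 5, 6]]
always_flip = FlipSketch(p=1.0)
never_flip = FlipSketch(p=0.0)
print(apply_augmenters(image, [always_flip]))               # each row reversed
print(apply_augmenters(image, [always_flip, always_flip]))  # flipped twice, back to the original
print(apply_augmenters(image, [never_flip]))                # unchanged
```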

ImageDetIter is similar to ImageIter, but is designed for object detection tasks.

image.ImageDetIter
>>> data_iter = mx.image.ImageDetIter(batch_size=4, data_shape=(3, 224, 224),
...                                   path_imglist='data/train.lst')
>>> data_iter.reset()
>>> for data in data_iter:
...     d = data.data[0]
...     l = data.label[0]
...     print(d.shape)
...     print(l.shape)

Unlike object classification, which uses a fixed label_width, the number of objects in object detection may vary from image to image. Object detection labels therefore use a special format. Usually, the lst file generated by tools/im2rec.py is a list of

index_0  label_0  image_path_0
index_1  label_1  image_path_1

where label_N is a number or a fixed-width vector. The label format used in object detection is instead a variable-length vector

A  B  [header]  [(object0), (object1), ... (objectN)]

where A is the width of the header and B is the width of each object. The header is optional and is used for inserting helper information such as (width, height). Each object is usually described by 5 or 6 numbers giving its properties, for example [id, xmin, ymin, xmax, ymax, difficulty].

Putting it all together, we have a lst file for object detection:

0  2  5  640  480  1  0.1  0.2  0.8  0.9  2  0.5  0.3  0.6  0.8  data/xxx.jpg
1  2  5  480  640  3  0.05  0.16  0.75  0.9  data/xxx.jpg
2  2  5  500  600  2  0.6  0.1  0.7  0.5  0  0.1  0.3  0.2  0.4  3  0.25  0.25  0.3  0.3  data/xxx.jpg
...
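
To make the layout concrete, the sketch below builds and parses one such line in plain Python. It follows the description above literally, treating A as the number of header fields that come after A and B; that reading is an assumption, and the parser inside your MXNet version may count the header differently, so check it before relying on this. `format_det_line` and `parse_det_line` are hypothetical helpers, not part of `mxnet.image`, and the parser assumes image paths contain no whitespace.

```python
def format_det_line(index, header, objects, image_path, sep="\t"):
    """Build one detection list line: index, A, B, header, objects, image path."""
    obj_width = len(objects[0])
    if any(len(obj) != obj_width for obj in objects):
        raise ValueError("all objects must have the same width")
    fields = [index, len(header), obj_width] + list(header)
    for obj in objects:
        fields.extend(obj)
    return sep.join(str(v) for v in fields) + sep + image_path


def parse_det_line(line):
    """Split a detection list line into (index, header, objects, image_path)."""
    parts = line.split()
    index, image_path = int(parts[0]), parts[-1]
    values = [float(v) for v in parts[1:-1]]
    header_width, obj_width = int(values[0]), int(values[1])
    header = values[2:2 + header_width]
    flat = values[2 + header_width:]
    if obj_width <= 0 or len(flat) % obj_width != 0:
        raise ValueError("label length does not match the object width")
    # Cut the flat list of numbers into one fixed-width record per object.
    objects = [flat[i:i + obj_width] for i in range(0, len(flat), obj_width)]
    return index, header, objects, image_path


line = "0  2  5  640  480  1  0.1  0.2  0.8  0.9  2  0.5  0.3  0.6  0.8  data/xxx.jpg"
index, header, objects, path = parse_det_line(line)
print(header)        # header fields, here (width, height)
print(len(objects))  # number of objects in this image
```

Because the object count is recovered from the length of the label rather than stored explicitly, every object in a line has to use the same width B.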

A helper function is provided to initialize augmenters for object detection tasks:

image.CreateDetAugmenter

Since detection tasks are sensitive to object localization, any modification to the image that introduces a localization shift requires a corresponding correction to the label. A list of augmenters specific to object detection is provided:

image.DetBorrowAug
image.DetRandomSelectAug
image.DetHorizontalFlipAug
image.DetRandomCropAug
image.DetRandomPadAug
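
As an illustration of the label correction mentioned above, the sketch below shows the adjustment a horizontal flip requires. It assumes each object is `[id, xmin, ymin, xmax, ymax, ...]` with coordinates normalized to the range [0, 1], as in the list file example. `flip_boxes_horizontally` is a hypothetical helper and is not the implementation of `DetHorizontalFlipAug`.

```python
def flip_boxes_horizontally(objects):
    """Mirror normalized boxes left-to-right; ids and y coordinates are unchanged."""
    flipped = []
    for obj in objects:
        cls_id, xmin, ymin, xmax, ymax = obj[:5]
        # The old right edge becomes the new left edge, and vice versa.
        flipped.append([cls_id, 1.0 - xmax, ymin, 1.0 - xmin, ymax] + list(obj[5:]))
    return flipped


objects = [[1, 0.125, 0.2, 0.5, 0.9]]
flipped = flip_boxes_horizontally(objects)
print(flipped)
```

Swapping the two x coordinates while mirroring keeps `xmin` smaller than `xmax`, so the corrected label remains a valid box.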

API Reference