mxnet.io

Functions

CSVIter(*args, **kwargs)

Returns the CSV file iterator.

In this function, the data_shape parameter is used to set the shape of each line of the input data. If a row in an input file is 1,2,3,4,5,6 and data_shape is (3,2), that row will be reshaped, yielding the array [[1,2],[3,4],[5,6]] of shape (3,2).

By default, the CSVIter has the round_batch parameter set to True. So, if batch_size is 3 and there are 4 total rows in the CSV file, 2 more examples are consumed at the first round. If the reset function is called after the first round, the call is ignored and the remaining examples are returned in the second round.

If one wants all the instances in the second round after calling reset, make sure to set round_batch to False.

If data_csv = 'data/' is set, then all the files in this directory will be read.

``reset()`` is expected to be called only after a complete pass of data.

By default, the CSVIter parses all entries in the data file as the float32 data type. If the dtype argument is set to 'int32' or 'int64', then CSVIter will parse all entries in the file as int32 or int64 accordingly.

Examples::

  # Contents of CSV file data/data.csv.
  1,2,3
  2,3,4
  3,4,5
  4,5,6

  # Creates a CSVIter with batch_size=2 and default round_batch=True.
  CSVIter = mx.io.CSVIter(data_csv = 'data/data.csv', data_shape = (3,),
  batch_size = 2)

  # Two batches read from the above iterator are as follows:
  [[ 1. 2. 3.]
   [ 2. 3. 4.]]
  [[ 3. 4. 5.]
   [ 4. 5. 6.]]

  # Creates a CSVIter with batch_size=3 and default round_batch=True.
  CSVIter = mx.io.CSVIter(data_csv = 'data/data.csv', data_shape = (3,),
  batch_size = 3)

  # Two batches read from the above iterator in the first pass are as follows:
  [[1. 2. 3.]
   [2. 3. 4.]
   [3. 4. 5.]]

  [[4. 5. 6.]
   [1. 2. 3.]
   [2. 3. 4.]]

  # Now, the reset method is called.
  CSVIter.reset()

  # Batch read from the above iterator in the second pass is as follows:
  [[ 3. 4. 5.]
   [ 4. 5. 6.]
   [ 1. 2. 3.]]

  # Creates a CSVIter with round_batch=False.
  CSVIter = mx.io.CSVIter(data_csv = 'data/data.csv', data_shape = (3,),
  batch_size = 3, round_batch=False)

  # Contents of two batches read from the above iterator in both passes, after calling
  # the reset method before the second pass, are as follows:
  [[1. 2. 3.]
   [2. 3. 4.]
   [3. 4. 5.]]

  [[4. 5. 6.]
   [2. 3. 4.]
   [3. 4. 5.]]

  # Creates a CSVIter with dtype='int32'.
  CSVIter = mx.io.CSVIter(data_csv = 'data/data.csv', data_shape = (3,),
  batch_size = 3, round_batch=False, dtype='int32')

  # Contents of two batches read from the above iterator in both passes, after calling
  # the reset method before the second pass, are as follows:
  [[1 2 3]
   [2 3 4]
   [3 4 5]]

  [[4 5 6]
   [2 3 4]
   [3 4 5]]

Defined in src/io/iter_csv.cc:L307
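The reshaping by data_shape and the wrap-around of the first pass with round_batch=True can be illustrated without MXNet. The following is a plain-Python sketch of the behaviour described above, not the library's implementation; the helper names `reshape_row` and `first_pass_batches` are hypothetical and exist only for this illustration.

```python
def reshape_row(row, shape):
    """Reshape a flat row into a nested list with the given (rows, cols) shape."""
    n_rows, n_cols = shape
    if len(row) != n_rows * n_cols:
        raise ValueError("row length does not match data_shape")
    return [row[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]


def first_pass_batches(rows, batch_size):
    """Emulate one pass with round_batch=True.

    A short final batch is filled with rows borrowed from the start of
    the data; `pad` records how many rows were borrowed.
    """
    batches = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        pad = batch_size - len(batch)
        batch = batch + rows[:pad]  # wrap around to the beginning
        batches.append((batch, pad))
    return batches


# A row 1,2,3,4,5,6 with data_shape (3,2), as in the description above.
reshaped = reshape_row([1, 2, 3, 4, 5, 6], (3, 2))

# The four rows of data/data.csv with batch_size 3.
rows = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
batches = first_pass_batches(rows, batch_size=3)

for batch, pad in batches:
    print(batch, "borrowed rows:", pad)
```

With 4 rows and a batch size of 3, the second batch borrows 2 rows from the start of the file, which matches the "2 more examples are consumed at the first round" statement.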

ImageDetRecordIter(*args, **kwargs)

Creates an iterator for an image detection dataset packed in RecordIO.

ImageRecordInt8Iter(*args, **kwargs)

Iterates on image RecordIO files.

.. note:: ImageRecordInt8Iter is deprecated. Use ImageRecordIter(dtype='int8') instead.

This iterator is identical to ImageRecordIter except for using int8 as the data type instead of float.

Defined in src/io/iter_image_recordio_2.cc:L940

ImageRecordIter(*args, **kwargs)

Iterates on image RecordIO files.

Reads batches of images from .rec RecordIO files. One can use the im2rec.py tool (in tools/) to pack raw image files into RecordIO files. This iterator is less flexible to customize but is fast and has a lot of language bindings. To iterate over raw images directly, use ImageIter instead (in Python).

Example::

  data_iter = mx.io.ImageRecordIter(
    path_imgrec="./sample.rec", # The target record file.
    data_shape=(3, 227, 227), # Output data shape; 227x227 region will be cropped from the original image.
    batch_size=4, # Number of items per batch.
    resize=256 # Resize the shorter edge to 256 before cropping.
    # You can specify more augmentation options. Use help(mx.io.ImageRecordIter) to see all the options.
  )
  # You can now use the data_iter to access batches of images.
  batch = data_iter.next() # first batch.
  images = batch.data[0] # This will contain 4 (=batch_size) images each of 3x227x227.
  # process the images
  ...
  data_iter.reset() # To restart the iterator from the beginning.

Defined in src/io/iter_image_recordio_2.cc:L903

ImageRecordIter_v1(*args, **kwargs)

Iterates on image RecordIO files.

.. note::

  ImageRecordIter_v1 is deprecated. Use ImageRecordIter instead.

Reads image batches from RecordIO files with a rich set of data augmentation options.

One can use tools/im2rec.py to pack individual image files into RecordIO files.

Defined in src/io/iter_image_recordio.cc:L351

ImageRecordUInt8Iter(*args, **kwargs)

Iterates on image RecordIO files.

.. note:: ImageRecordUInt8Iter is deprecated. Use ImageRecordIter(dtype='uint8') instead.

This iterator is identical to ImageRecordIter except for using uint8 as the data type instead of float.

Defined in src/io/iter_image_recordio_2.cc:L922

ImageRecordUInt8Iter_v1(*args, **kwargs)

Iterates on image RecordIO files.

.. note::

  ImageRecordUInt8Iter_v1 is deprecated. Use ImageRecordUInt8Iter instead.

This iterator is identical to ImageRecordIter except for using uint8 as the data type instead of float.

Defined in src/io/iter_image_recordio.cc:L376

LibSVMIter(*args, **kwargs)

Returns the LibSVM iterator, which returns data with csr storage type. This iterator is experimental and should be used with care.

The input data is stored in a format similar to the LibSVM file format, except that the indices are expected to be zero-based instead of one-based, and the column indices for each row are expected to be sorted in ascending order. Details of the LibSVM format are available `here <https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/>`_.

The data_shape parameter is used to set the shape of each line of the data. The dimensions of both data_shape and label_shape are expected to be 1.

The data_libsvm parameter is used to set the path to the input LibSVM file. When it is set to a directory, all the files in the directory will be read.

When label_libsvm is set to NULL, both data and label are read from the file specified by data_libsvm. In this case, the data is stored in csr storage type, while the label is a 1D dense array.

The LibSVMIter only supports the round_batch parameter set to True. Therefore, if batch_size is 3 and there are 4 total rows in the libsvm file, 2 more examples are consumed at the first round.

When num_parts and part_index are provided, the data is split into num_parts partitions, and the iterator only reads the part_index-th partition. However, the partitions are not guaranteed to be even.

``reset()`` is expected to be called only after a complete pass of data.

Example::

  # Contents of libsvm file data.t.
  1.0 0:0.5 2:1.2
  -2.0
  -3.0 0:0.6 1:2.4 2:1.2
  4 2:-1.2

  # Creates a LibSVMIter with batch_size=3.
  >>> data_iter = mx.io.LibSVMIter(data_libsvm = 'data.t', data_shape = (3,), batch_size = 3)
  # The data of the first batch is stored in csr storage type
  >>> batch = data_iter.next()
  >>> csr = batch.data[0]
  >>> csr
  <CSRNDArray 3x3 @cpu(0)>
  >>> csr.asnumpy()
  [[ 0.5 0. 1.2 ]
   [ 0. 0. 0. ]
   [ 0.6 2.4 1.2 ]]
  # The label of the first batch
  >>> label = batch.label[0]
  >>> label
  [ 1. -2. -3.]
  <NDArray 3 @cpu(0)>

  >>> second_batch = data_iter.next()
  # The data of the second batch
  >>> second_batch.data[0].asnumpy()
  [[ 0. 0. -1.2 ]
   [ 0.5 0. 1.2 ]
   [ 0. 0. 0. ]]
  # The label of the second batch
  >>> second_batch.label[0].asnumpy()
  [ 4. 1. -2.]

  >>> data_iter.reset()
  # To restart the iterator for the second pass of the data

When label_libsvm is set to the path to another LibSVM file, data is read from data_libsvm and label from label_libsvm. In this case, both data and label are stored in the csr format. The label column in the data_libsvm file is ignored.

Example::

  # Contents of libsvm file label.t
  1.0
  -2.0 0:0.125
  -3.0 2:1.2
  4 1:1.0 2:-1.2

  # Creates a LibSVMIter with specified label file
  >>> data_iter = mx.io.LibSVMIter(data_libsvm = 'data.t', data_shape = (3,),
  label_libsvm = 'label.t', label_shape = (3,), batch_size = 3)

  # Both data and label are in csr storage type
  >>> batch = data_iter.next()
  >>> csr_data = batch.data[0]
  >>> csr_data
  <CSRNDArray 3x3 @cpu(0)>
  >>> csr_data.asnumpy()
  [[ 0.5 0. 1.2 ]
   [ 0. 0. 0. ]
   [ 0.6 2.4 1.2 ]]
  >>> csr_label = batch.label[0]
  >>> csr_label
  <CSRNDArray 3x3 @cpu(0)>
  >>> csr_label.asnumpy()
  [[ 0. 0. 0. ]
   [ 0.125 0. 0. ]
   [ 0. 0. 1.2 ]]

Defined in src/io/iter_libsvm.cc:L298
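To make the zero-based, ascending-index input format concrete, here is a plain-Python sketch that parses the data.t lines from the example into labels and dense rows. It is an illustration only and does not reflect how LibSVMIter is implemented; the function name `parse_libsvm_line` and its `num_features` argument are hypothetical.

```python
def parse_libsvm_line(line, num_features):
    """Parse 'label idx:val ...' with zero-based indices into (label, dense_row)."""
    fields = line.split()
    label = float(fields[0])
    dense = [0.0] * num_features
    previous = -1
    for field in fields[1:]:
        index_text, value_text = field.split(":")
        index = int(index_text)
        # The format described above requires ascending column indices.
        if index <= previous:
            raise ValueError("column indices must be sorted in ascending order")
        if index >= num_features:
            raise ValueError("column index does not fit in data_shape")
        dense[index] = float(value_text)
        previous = index
    return label, dense


# Contents of data.t from the example above.
lines = [
    "1.0 0:0.5 2:1.2",
    "-2.0",
    "-3.0 0:0.6 1:2.4 2:1.2",
    "4 2:-1.2",
]
parsed = [parse_libsvm_line(line, num_features=3) for line in lines]
labels = [label for label, _ in parsed]
dense_rows = [row for _, row in parsed]

print(labels)
print(dense_rows)
```

A line that carries only a label, such as `-2.0`, becomes an all-zero row, which is why the second row of the first batch in the example is `[0. 0. 0.]`.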

MNISTIter(*args, **kwargs)

Iterates on the MNIST dataset.

One can download the dataset from http://yann.lecun.com/exdb/mnist/

Defined in src/io/iter_mnist.cc:L264

Classes

DataBatch(data[, label, pad, index, …])

A data batch.

DataDesc

DataDesc is used to store name, shape, type and layout information of the data or the label.

DataIter([batch_size])

The base class for an MXNet data iterator.

MXDataIter(handle[, data_name, label_name])

A Python wrapper for a C++ data iterator.

NDArrayIter(data[, label, batch_size, …])

Returns an iterator for mx.nd.NDArray, numpy.ndarray, h5py.Dataset, mx.nd.sparse.CSRNDArray or scipy.sparse.csr_matrix.

PrefetchingIter(iters[, rename_data, …])

Performs pre-fetch for other data iterators.

ResizeIter(data_iter, size[, reset_internal])

Resize a data iterator to a given number of batches.
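The idea of resizing an iterator to a fixed number of batches can be sketched in plain Python. This is a hedged illustration rather than MXNet's ResizeIter: `ListIter` is a hypothetical stand-in for a data iterator with `next()` and `reset()`, and the assumption that the wrapped iterator is reset when it runs out before `size` batches are produced is ours, not something stated on this page.

```python
class ListIter:
    """Hypothetical stand-in for a data iterator exposing next() and reset()."""

    def __init__(self, batches):
        self.batches = list(batches)
        self.cursor = 0

    def reset(self):
        self.cursor = 0

    def next(self):
        if self.cursor >= len(self.batches):
            raise StopIteration
        batch = self.batches[self.cursor]
        self.cursor += 1
        return batch


def resized(data_iter, size):
    """Collect exactly `size` batches, resetting the iterator when it runs out."""
    out = []
    while len(out) < size:
        try:
            out.append(data_iter.next())
        except StopIteration:
            # Assumed behaviour: start over from the beginning and continue.
            data_iter.reset()
            out.append(data_iter.next())
    return out


longer = resized(ListIter(["a", "b", "c"]), size=5)   # more batches than the source has
shorter = resized(ListIter(["a", "b", "c"]), size=2)  # fewer batches than the source has

print(longer)
print(shorter)
```

Under these assumptions, an epoch can be made longer or shorter than one pass over the underlying data, which is useful when a training loop expects a fixed number of batches per epoch.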