mx.io.LibSVMIter¶
Description¶
Returns a LibSVM iterator that yields data in csr storage type. This iterator is experimental and should be used with care.
The input data is stored in a format similar to LibSVM file format, except that the indices are expected to be zero-based instead of one-based, and the column indices for each row are expected to be sorted in ascending order. Details of the LibSVM format are available here.
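As an illustration of the difference from the standard format, the following sketch rewrites one line with one-based, unsorted indices into the zero-based, ascending form described above. The helper name `to_zero_based` is hypothetical and not part of MXNet; the sketch assumes whitespace-separated `index:value` tokens after the label.

```python
def to_zero_based(line):
    """Convert one LibSVM line from one-based indices to zero-based, sorted indices."""
    tokens = line.split()
    if not tokens:
        return ""
    label, features = tokens[0], tokens[1:]
    pairs = []
    for token in features:
        index, value = token.split(":", 1)
        index = int(index)
        if index < 1:
            raise ValueError("expected a one-based index, got %d" % index)
        # Shift to zero-based; keep the value text untouched.
        pairs.append((index - 1, value))
    # The iterator expects column indices in ascending order within each row.
    pairs.sort(key=lambda pair: pair[0])
    return " ".join([label] + ["%d:%s" % pair for pair in pairs])


print(to_zero_based("1.0 3:1.2 1:0.5"))  # prints "1.0 0:0.5 2:1.2"
```

A line that carries only a label, such as `-2.0`, passes through unchanged.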
The data_shape parameter sets the shape of each line of the data. Both data_shape and label_shape are expected to be one-dimensional.
The data_libsvm parameter sets the path to the input LibSVM file. When it is set to a directory, all the files in the directory will be read.
When label_libsvm is set to NULL, both data and label are read from the file specified by data_libsvm. In this case, the data is stored in csr storage type, while the label is a 1D dense array.
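To make the storage split concrete, the sketch below parses zero-based LibSVM lines into a dense label list plus the three arrays of the usual CSR layout (values, column indices, row pointers). It is a plain-Python illustration under that assumption, not MXNet's parser; `parse_libsvm` is a hypothetical name.

```python
def parse_libsvm(lines, num_cols):
    """Parse zero-based LibSVM lines into dense labels and CSR arrays."""
    labels, data, indices, indptr = [], [], [], [0]
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        labels.append(float(tokens[0]))
        for token in tokens[1:]:
            index, value = token.split(":", 1)
            index = int(index)
            if not 0 <= index < num_cols:
                raise ValueError("column index %d out of range" % index)
            indices.append(index)
            data.append(float(value))
        # indptr[i + 1] marks where row i ends in `data` and `indices`.
        indptr.append(len(data))
    return labels, data, indices, indptr


rows = ["1.0 0:0.5 2:1.2", "-2.0", "-3.0 0:0.6 1:2.4 2:1.2", "4 2:-1.2"]
labels, data, indices, indptr = parse_libsvm(rows, num_cols=3)
print(labels)  # prints [1.0, -2.0, -3.0, 4.0]
print(indptr)  # prints [0, 2, 2, 5, 6]
```

The repeated `2` in the row pointers corresponds to the second row, which has a label but no stored features.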
LibSVMIter only supports the round_batch parameter set to True. Therefore, if batch_size is 3 and the libsvm file has 4 rows in total, the second batch wraps around and consumes 2 more examples from the start of the file during the first round.
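The wrap-around can be modelled with a few lines of plain Python. This is only a sketch of the behaviour described here for a single pass; `round_batches` is a hypothetical helper, and it makes no claim about how MXNet implements round batching internally.

```python
def round_batches(num_rows, batch_size):
    """Return the row indices of each batch in one pass, wrapping at the end."""
    batches = []
    start = 0
    while start < num_rows:
        # Indices past the last row wrap around to the beginning of the file.
        batches.append([(start + i) % num_rows for i in range(batch_size)])
        start += batch_size
    return batches


print(round_batches(4, 3))  # prints [[0, 1, 2], [3, 0, 1]]
```

With 4 rows and a batch size of 3, the second batch holds the last row followed by the first two rows again.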
When num_parts and part_index are provided, the data is split into num_parts partitions, and the iterator only reads the part_index-th partition. However, the partitions are not guaranteed to be even.
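In a multi-worker setup, each worker would typically pass the same num_parts and its own part_index. The sketch below only builds those keyword arguments; `partition_kwargs` and `make_partition_iter` are hypothetical helpers, and the keyword names are assumed to match the parameter names used in this description.

```python
def partition_kwargs(num_parts, **common):
    """Build one keyword-argument dict per partition (hypothetical helper)."""
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    return [dict(common, num_parts=num_parts, part_index=i)
            for i in range(num_parts)]


def make_partition_iter(kwargs):
    """Create the iterator for one partition; requires MXNet to be installed."""
    import mxnet as mx  # imported lazily so the helper above works without MXNet
    return mx.io.LibSVMIter(**kwargs)


parts = partition_kwargs(2, data_libsvm='data.t', data_shape=(3,), batch_size=3)
print(parts[1]['part_index'])  # prints 1
```

Because the partitions are not guaranteed to be even, workers may see different numbers of rows per pass.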
reset() is expected to be called only after a complete pass of data.
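A multi-epoch loop that respects this rule might look like the sketch below. It assumes the iterator supports Python iteration (stopping at the end of a pass) and exposes reset(); `run_epochs` and `handle_batch` are hypothetical names used only for illustration.

```python
def run_epochs(data_iter, num_epochs, handle_batch):
    """Run full passes over data_iter, resetting only after each pass completes."""
    for _ in range(num_epochs):
        # Exhaust the iterator first, so reset() never interrupts a pass.
        for batch in data_iter:
            handle_batch(batch)
        data_iter.reset()
```

Calling reset() inside the inner loop would restart the iterator in the middle of a pass, which is the case this rule warns against.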
Example:
# Contents of libsvm file ``data.t``.
1.0 0:0.5 2:1.2
-2.0
-3.0 0:0.6 1:2.4 2:1.2
4 2:-1.2
# Creates a `LibSVMIter` with `batch_size`=3.
>>> data_iter = mx.io.LibSVMIter(data_libsvm = 'data.t', data_shape = (3,), batch_size = 3)
# The data of the first batch is stored in csr storage type
>>> batch = data_iter.next()
>>> csr = batch.data[0]
>>> csr
<CSRNDArray 3x3 @cpu(0)>
>>> csr.asnumpy()
[[ 0.5 0. 1.2 ]
[ 0. 0. 0. ]
[ 0.6 2.4 1.2]]
# The label of the first batch
>>> label = batch.label[0]
>>> label
[ 1. -2. -3.]
<NDArray 3 @cpu(0)>
>>> second_batch = data_iter.next()
# The data of the second batch
>>> second_batch.data[0].asnumpy()
[[ 0. 0. -1.2 ]
[ 0.5 0. 1.2 ]
[ 0. 0. 0. ]]
# The label of the second batch
>>> second_batch.label[0].asnumpy()
[ 4. 1. -2.]
# Restart the iterator for the second pass of the data
>>> data_iter.reset()
When `label_libsvm` is set to the path to another LibSVM file,
data is read from `data_libsvm` and label from `label_libsvm`.
In this case, both data and label are stored in the csr format.
The label column in the `data_libsvm` file is ignored.
Example:
# Contents of libsvm file ``label.t``
1.0
-2.0 0:0.125
-3.0 2:1.2
4 1:1.0 2:-1.2
# Creates a `LibSVMIter` with specified label file
>>> data_iter = mx.io.LibSVMIter(data_libsvm = 'data.t', data_shape = (3,),
...                              label_libsvm = 'label.t', label_shape = (3,), batch_size = 3)
# Both data and label are in csr storage type
>>> batch = data_iter.next()
>>> csr_data = batch.data[0]
>>> csr_data
<CSRNDArray 3x3 @cpu(0)>
>>> csr_data.asnumpy()
[[ 0.5 0. 1.2 ]
[ 0. 0. 0. ]
[ 0.6 2.4 1.2 ]]
>>> csr_label = batch.label[0]
>>> csr_label
<CSRNDArray 3x3 @cpu(0)>
>>> csr_label.asnumpy()
[[ 0. 0. 0. ]
[ 0.125 0. 0. ]
[ 0. 0. 1.2 ]]
Usage¶
mx.io.LibSVMIter(...)
Arguments¶
| Argument | Description |
|---|---|
| | string, required. The input zero-based indexed LibSVM data file or a directory path. |
| | Shape(tuple), required. The shape of one example. |
| | string, optional, default=’NULL’. The input LibSVM label file or a directory path. If NULL, all labels will be read from the data file. |
| | Shape(tuple), optional, default=[1]. The shape of one label. |
| | int, optional, default=’1’. Partition the data into multiple parts. |
| | int, optional, default=’0’. The index of the part to read. |
| | int (non-negative), required. Batch size. |
| | boolean, optional, default=1. Whether to use round robin to handle an overflow batch. |
| | long (non-negative), optional, default=4. Maximum number of batches to prefetch. |
| | {‘cpu’, ‘gpu’}, optional, default=’gpu’. The context the data loader is optimized for. |
| | {None, ‘float16’, ‘float32’, ‘float64’, ‘int32’, ‘int64’, ‘int8’, ‘uint8’}, optional, default=’None’. Output data type. |
Value¶
iter
The result mx.dataiter
Link to Source Code: http://github.com/apache/incubator-mxnet/blob/1.6.0/src/io/iter_libsvm.cc#L298