`mx.io.LibSVMIter`

Description

Returns a LibSVM iterator that yields data in csr storage type. This iterator is experimental and should be used with care.

The input data is stored in a format similar to the LibSVM file format, except that the indices are expected to be zero-based instead of one-based, and the column indices for each row are expected to be sorted in ascending order. Details of the LibSVM format are available in the LIBSVM documentation.
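To make the zero-based, sorted-index convention concrete, here is a minimal pure-Python sketch that parses lines in this format into the three arrays of a CSR matrix. It is an illustration only, not the iterator's implementation: the helper names `parse_libsvm_line` and `to_csr` are hypothetical, and the sample rows are the contents of `data.t` from the example in this section.

```python
def parse_libsvm_line(line):
    """Split one line into (label, column indices, values)."""
    tokens = line.split()
    label = float(tokens[0])
    indices, values = [], []
    for token in tokens[1:]:
        idx_text, val_text = token.split(":")
        idx = int(idx_text)
        # Indices are zero-based here, so 0 is a valid column.
        if idx < 0:
            raise ValueError("column index must be non-negative")
        # Column indices within a row must be sorted in ascending order.
        if indices and idx < indices[-1]:
            raise ValueError("column indices must be in ascending order")
        indices.append(idx)
        values.append(float(val_text))
    return label, indices, values


def to_csr(lines):
    """Build CSR arrays (data, indices, indptr) plus a dense label list."""
    labels, data, indices, indptr = [], [], [], [0]
    for line in lines:
        label, idx, val = parse_libsvm_line(line)
        labels.append(label)
        indices.extend(idx)
        data.extend(val)
        indptr.append(len(data))  # end offset of this row
    return labels, data, indices, indptr


rows = ["1.0 0:0.5 2:1.2", "-2.0", "-3.0 0:0.6 1:2.4 2:1.2", "4 2:-1.2"]
labels, data, indices, indptr = to_csr(rows)
print(labels)   # [1.0, -2.0, -3.0, 4.0]
print(indices)  # [0, 2, 0, 1, 2, 2]
print(indptr)   # [0, 2, 2, 5, 6]
```

A row with no features, such as `-2.0`, contributes no entries: its start and end offsets in `indptr` are equal.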

The data_shape parameter sets the shape of each line of the data. Both data_shape and label_shape are expected to be one-dimensional.

The data_libsvm parameter sets the path to the input LibSVM file. When it is set to a directory, all the files in the directory are read.

When label_libsvm is set to NULL, both data and label are read from the file specified by data_libsvm. In this case, the data is stored in csr storage type, while the label is a 1D dense array.

The LibSVMIter only supports the round_batch parameter set to True. Therefore, if batch_size is 3 and the LibSVM file has 4 rows in total, the second batch wraps around and consumes 2 more examples from the start of the file during the first pass.
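The wrap-around arithmetic can be sketched in a few lines of pure Python. This is a simplified model of the first pass only, under the assumption that an incomplete final batch is padded with rows taken from the start of the file. The function name `round_batch_indices` is hypothetical, and the sketch does not model what happens after `reset()`.

```python
def round_batch_indices(num_rows, batch_size):
    """Return the row indices that fill each batch of one pass."""
    num_batches = -(-num_rows // batch_size)  # ceiling division
    batches = []
    for b in range(num_batches):
        start = b * batch_size
        # Indices past the last row wrap around to the start of the file.
        batches.append([(start + i) % num_rows for i in range(batch_size)])
    return batches


batches = round_batch_indices(num_rows=4, batch_size=3)
print(batches)  # [[0, 1, 2], [3, 0, 1]]
```

With 4 rows and a batch size of 3, the second batch holds row 3 followed by rows 0 and 1, which matches the second batch shown in the example in this section.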

When num_parts and part_index are provided, the data is split into num_parts partitions, and the iterator only reads the part_index-th partition. However, the partitions are not guaranteed to be even.

reset() is expected to be called only after a complete pass of data.

Example:

# Contents of libsvm file ``data.t``.
1.0 0:0.5 2:1.2
-2.0
-3.0 0:0.6 1:2.4 2:1.2
4 2:-1.2

# Creates a `LibSVMIter` with `batch_size`=3.
>>> data_iter = mx.io.LibSVMIter(data_libsvm = 'data.t', data_shape = (3,), batch_size = 3)
# The data of the first batch is stored in csr storage type
>>> batch = data_iter.next()
>>> csr = batch.data[0]
>>> csr
<CSRNDArray 3x3 @cpu(0)>
>>> csr.asnumpy()
[[ 0.5        0.          1.2 ]
[ 0.          0.          0.  ]
[ 0.6         2.4         1.2]]
# The label of the first batch
>>> label = batch.label[0]
>>> label
[ 1. -2. -3.]
<NDArray 3 @cpu(0)>

>>> second_batch = data_iter.next()
# The data of the second batch
>>> second_batch.data[0].asnumpy()
[[ 0.          0.         -1.2 ]
[ 0.5         0.          1.2 ]
[ 0.          0.          0. ]]
# The label of the second batch
>>> second_batch.label[0].asnumpy()
[ 4.  1. -2.]

>>> data_iter.reset()
# To restart the iterator for the second pass of the data

When `label_libsvm` is set to the path to another LibSVM file,
data is read from `data_libsvm` and label from `label_libsvm`.
In this case, both data and label are stored in the csr format.
The label column in the `data_libsvm` file is ignored.

Example:

# Contents of libsvm file ``label.t``
1.0
-2.0 0:0.125
-3.0 2:1.2
4 1:1.0 2:-1.2

# Creates a `LibSVMIter` with specified label file
>>> data_iter = mx.io.LibSVMIter(data_libsvm = 'data.t', data_shape = (3,),
...                              label_libsvm = 'label.t', label_shape = (3,), batch_size = 3)

# Both data and label are in csr storage type
>>> batch = data_iter.next()
>>> csr_data = batch.data[0]
>>> csr_data
<CSRNDArray 3x3 @cpu(0)>
>>> csr_data.asnumpy()
[[ 0.5         0.          1.2  ]
[ 0.          0.          0.   ]
[ 0.6         2.4         1.2 ]]
>>> csr_label = batch.label[0]
>>> csr_label
<CSRNDArray 3x3 @cpu(0)>
>>> csr_label.asnumpy()
[[ 0.          0.          0.   ]
[ 0.125       0.          0.   ]
[ 0.          0.          1.2 ]]

Usage

mx.io.LibSVMIter(...)

Arguments

Argument	Description
`data.libsvm`	string, required. The input zero-based indexed LibSVM data file or a directory path.
`data.shape`	Shape(tuple), required. The shape of one example.
`label.libsvm`	string, optional, default=’NULL’. The input LibSVM label file or a directory path. If NULL, all labels will be read from `data_libsvm`.
`label.shape`	Shape(tuple), optional, default=[1]. The shape of one label.
`num.parts`	int, optional, default=’1’. The number of parts to partition the data into.
`part.index`	int, optional, default=’0’. The index of the part to read.
`batch.size`	int (non-negative), required. Batch size.
`round.batch`	boolean, optional, default=1. Whether to use round robin to handle overflow batch or not.
`prefetch.buffer`	long (non-negative), optional, default=4. Maximum number of batches to prefetch.
`ctx`	{‘cpu’, ‘gpu’}, optional, default=’gpu’. Context data loader optimized for.
`dtype`	{None, ‘float16’, ‘float32’, ‘float64’, ‘int32’, ‘int64’, ‘int8’, ‘uint8’}, optional, default=’None’. Output data type. `None` means no change.

Value

iter The result mx.dataiter

Link to Source Code: http://github.com/apache/incubator-mxnet/blob/1.6.0/src/io/iter_libsvm.cc#L298
