input split that allows records to be read from one part of the data; the parts are independent and together cover the whole dataset
#include <dmlc/io.h>
|
virtual void | HintChunkSize (size_t chunk_size) |
| hint to the InputSplit how large a chunk NextChunk should return. This is only a hint and may not be enforced, but InputSplit will try to adjust its internal buffer size to the hinted value.
|
|
virtual size_t | GetTotalSize (void)=0 |
| get the total size of the InputSplit
|
|
virtual void | BeforeFirst (void)=0 |
| reset the position of InputSplit to the beginning
|
|
virtual bool | NextRecord (Blob *out_rec)=0 |
| get the next record. The returned value is valid until the next call to NextRecord, NextChunk or NextBatch. The caller can modify the memory content of out_rec.
|
|
virtual bool | NextChunk (Blob *out_chunk)=0 |
| get a chunk of memory that can contain multiple records; the caller needs to parse the content of the resulting chunk. For a text file, out_chunk can contain multiple lines; for recordio, out_chunk can contain multiple records (including headers).
|
|
virtual bool | NextBatch (Blob *out_chunk, size_t n_records) |
| get a chunk of memory that can contain multiple records, with a hint for how many records are needed; the caller needs to parse the content of the resulting chunk. For a text file, out_chunk can contain multiple lines; for recordio, out_chunk can contain multiple records (including headers).
|
|
virtual | ~InputSplit (void) DMLC_THROW_EXCEPTION |
| destructor
|
|
virtual void | ResetPartition (unsigned part_index, unsigned num_parts)=0 |
| reset the InputSplit to a certain part id. The InputSplit will then point to the head of the newly specified segment. This feature may not be supported by every implementation of InputSplit.
|
|
|
static InputSplit * | Create (const char *uri, unsigned part_index, unsigned num_parts, const char *type) |
| factory function: create input split given a uri
|
|
static InputSplit * | Create (const char *uri, const char *index_uri, unsigned part_index, unsigned num_parts, const char *type, const bool shuffle=false, const int seed=0, const size_t batch_size=256, const bool recurse_directories=false) |
| factory function: create input split given a uri for input and index
|
|
input split that allows records to be read from one part of the data; the parts are independent and together cover the whole dataset
See InputSplit::Create for the definition of a record.
virtual dmlc::InputSplit::~InputSplit |
( |
void |
| ) |
|
|
inline virtual |
virtual void dmlc::InputSplit::BeforeFirst |
( |
void |
| ) |
|
|
pure virtual |
static InputSplit* dmlc::InputSplit::Create |
( |
const char * |
uri, |
|
|
unsigned |
part_index, |
|
|
unsigned |
num_parts, |
|
|
const char * |
type |
|
) |
| |
|
static |
factory function: create input split given a uri
- Parameters
-
uri | the uri of the input, can contain hdfs prefix |
part_index | the part id of current input |
num_parts | total number of splits |
type | type of record. Possible types: "text", "recordio", "indexed_recordio"
- "text": text file; each line is treated as a record, and the input split splits on '\n' or '\r'
- "recordio": binary recordio file, see recordio.h
- "indexed_recordio": binary recordio file with index, see recordio.h
|
- Returns
- a new input split
- See also
- InputSplit::Type
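The record definition for the "text" type can be illustrated with a small standalone sketch. SplitTextRecords is a hypothetical helper, not part of dmlc; it assumes that runs of '\n' and '\r' are collapsed, so "\r\n" and blank lines produce no empty records. The library's actual handling of blank lines may differ.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of dmlc): split a text buffer into records,
// treating both '\n' and '\r' as line terminators.
// Assumption: consecutive terminators are collapsed, so "\r\n" and blank
// lines do not yield empty records.
std::vector<std::string> SplitTextRecords(const std::string& data) {
  std::vector<std::string> records;
  std::string::size_type pos = 0;
  while (pos < data.size()) {
    // Skip any terminators in front of the next record.
    pos = data.find_first_not_of("\r\n", pos);
    if (pos == std::string::npos) break;
    // The record runs up to the next terminator or the end of the buffer.
    std::string::size_type end = data.find_first_of("\r\n", pos);
    if (end == std::string::npos) end = data.size();
    records.emplace_back(data, pos, end - pos);
    pos = end;
  }
  return records;
}
```

Under these assumptions, a buffer mixing Unix and Windows line endings yields the same records as one with a single terminator style.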
static InputSplit* dmlc::InputSplit::Create |
( |
const char * |
uri, |
|
|
const char * |
index_uri, |
|
|
unsigned |
part_index, |
|
|
unsigned |
num_parts, |
|
|
const char * |
type, |
|
|
const bool |
shuffle = false , |
|
|
const int |
seed = 0 , |
|
|
const size_t |
batch_size = 256 , |
|
|
const bool |
recurse_directories = false |
|
) |
| |
|
static |
factory function: create input split given a uri for input and index
- Parameters
-
uri | the uri of the input, can contain hdfs prefix |
index_uri | the uri of the index, can contain hdfs prefix |
part_index | the part id of current input |
num_parts | total number of splits |
type | type of record. Possible types: "text", "recordio", "indexed_recordio"
- "text": text file; each line is treated as a record, and the input split splits on '\n' or '\r'
- "recordio": binary recordio file, see recordio.h
- "indexed_recordio": binary recordio file with index, see recordio.h
|
shuffle | whether to shuffle the output from the InputSplit; supported only by the "indexed_recordio" type. Defaults to false |
seed | random seed to use in conjunction with the shuffle option. Defaults to 0 |
batch_size | a hint to InputSplit for the intended number of examples returned per batch; used only by the "indexed_recordio" type. Defaults to 256 |
recurse_directories | whether to recursively traverse directories. Defaults to false |
- Returns
- a new input split
- See also
- InputSplit::Type
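How shuffle, seed and batch_size could interact for indexed records can be sketched as follows. This is an illustration under stated assumptions, not dmlc's implementation: RecordOrder and MakeBatches are hypothetical helpers, and the choice of std::mt19937 with std::shuffle is an assumption made only for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not dmlc's implementation): visiting order for
// `num_records` indexed records. Without shuffling the order is sequential;
// with shuffling it is a permutation fully determined by `seed`.
std::vector<std::size_t> RecordOrder(std::size_t num_records, bool shuffle,
                                     int seed) {
  std::vector<std::size_t> order(num_records);
  std::iota(order.begin(), order.end(), std::size_t{0});
  if (shuffle) {
    std::mt19937 rng(static_cast<std::mt19937::result_type>(seed));
    std::shuffle(order.begin(), order.end(), rng);
  }
  return order;
}

// Hypothetical helper: group the visiting order into batches of at most
// `batch_size` record indices; the last batch may be smaller.
std::vector<std::vector<std::size_t>> MakeBatches(
    const std::vector<std::size_t>& order, std::size_t batch_size) {
  std::vector<std::vector<std::size_t>> batches;
  if (batch_size == 0) return batches;
  for (std::size_t i = 0; i < order.size(); i += batch_size) {
    const std::size_t end = std::min(order.size(), i + batch_size);
    batches.emplace_back(order.begin() + i, order.begin() + end);
  }
  return batches;
}
```

Because the permutation depends only on the seed, two readers built with the same seed visit records in the same order, which is the usual reason for exposing a seed alongside a shuffle flag.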
virtual size_t dmlc::InputSplit::GetTotalSize |
( |
void |
| ) |
|
|
pure virtual |
virtual void dmlc::InputSplit::HintChunkSize |
( |
size_t |
chunk_size | ) |
|
|
inline virtual |
hint to the InputSplit how large a chunk NextChunk should return. This is only a hint and may not be enforced, but InputSplit will try to adjust its internal buffer size to the hinted value.
- Parameters
-
chunk_size | the hinted chunk size |
Reimplemented in dmlc::InputSplitShuffle.
virtual bool dmlc::InputSplit::NextBatch |
( |
Blob * |
out_chunk, |
|
|
size_t |
n_records |
|
) |
| |
|
inline virtual |
get a chunk of memory that can contain multiple records, with a hint for how many records are needed; the caller needs to parse the content of the resulting chunk. For a text file, out_chunk can contain multiple lines; for recordio, out_chunk can contain multiple records (including headers).
This function ensures that the chunk contains no partial record. The caller can modify the memory content of out_chunk; the memory is valid until the next call to NextRecord, NextChunk or NextBatch.
- Parameters
-
out_chunk | used to store the result |
n_records | used as a hint for how many records should be returned, may be ignored |
- Returns
- true if the next chunk was retrieved successfully; false if the end of the split was reached
- See also
- InputSplit::Create for definition of record
-
RecordIOChunkReader to parse recordio content from out_chunk
virtual bool dmlc::InputSplit::NextChunk |
( |
Blob * |
out_chunk | ) |
|
|
pure virtual |
get a chunk of memory that can contain multiple records; the caller needs to parse the content of the resulting chunk. For a text file, out_chunk can contain multiple lines; for recordio, out_chunk can contain multiple records (including headers).
This function ensures that the chunk contains no partial record. The caller can modify the memory content of out_chunk; the memory is valid until the next call to NextRecord, NextChunk or NextBatch.
Usually NextRecord is sufficient; NextChunk can be used by multi-threaded parsers to parse the input content in parallel.
- Parameters
-
out_chunk | used to store the result |
- Returns
- true if the next chunk was retrieved successfully; false if the end of the split was reached
- See also
- InputSplit::Create for definition of record
-
RecordIOChunkReader to parse recordio content from out_chunk
Implemented in dmlc::InputSplitShuffle.
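Since a chunk never contains a partial record, a multi-threaded text parser can divide it among workers as long as every cut falls on a line boundary. The sketch below shows one way to compute such cuts. SplitChunkByLines, ByteRange and IsLineEnd are hypothetical names introduced for illustration, and the chunk is modelled as a std::string rather than a Blob.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Half-open byte range [first, second) inside a chunk.
typedef std::pair<std::size_t, std::size_t> ByteRange;

inline bool IsLineEnd(char c) { return c == '\n' || c == '\r'; }

// Hypothetical helper: divide a text chunk into at most `num_workers`
// contiguous ranges. Each range ends just after a run of line terminators
// (or at the end of the chunk), so no line is split across two workers.
std::vector<ByteRange> SplitChunkByLines(const std::string& chunk,
                                         std::size_t num_workers) {
  std::vector<ByteRange> ranges;
  if (chunk.empty() || num_workers == 0) return ranges;
  // Target size per worker, rounded up.
  const std::size_t step = (chunk.size() + num_workers - 1) / num_workers;
  std::size_t begin = 0;
  while (begin < chunk.size()) {
    std::size_t end = std::min(chunk.size(), begin + step);
    // Move the cut forward until it sits right after a terminator.
    while (end < chunk.size() && !IsLineEnd(chunk[end - 1])) ++end;
    // Absorb the rest of a terminator run such as "\r\n".
    while (end < chunk.size() && IsLineEnd(chunk[end])) ++end;
    ranges.push_back(ByteRange(begin, end));
    begin = end;
  }
  return ranges;
}
```

Each returned range can then be handed to a separate worker thread. Fewer ranges than workers may come back when lines are long relative to the chunk.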
virtual bool dmlc::InputSplit::NextRecord |
( |
Blob * |
out_rec | ) |
|
|
pure virtual |
get the next record. The returned value is valid until the next call to NextRecord, NextChunk or NextBatch. The caller can modify the memory content of out_rec.
For text, out_rec contains a single line. For recordio, out_rec contains the content of one record (with the header stripped).
- Parameters
-
out_rec | used to store the result |
- Returns
- true if the next record was retrieved successfully; false if the end of the split was reached
- See also
- InputSplit::Create for definition of record
Implemented in dmlc::InputSplitShuffle.
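A typical read loop over NextRecord can be sketched without linking against the library by using a stand-in class. FakeSplit and ReadAllRecords below are hypothetical names. The sketch assumes that Blob exposes a data pointer dptr and a byte count size, which this page does not show. Because the record memory is valid only until the next read call, the loop copies each record before advancing.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in for dmlc::InputSplit, for illustration only. It mirrors two of the
// documented members (BeforeFirst and NextRecord).
// Assumption: Blob carries a data pointer and a size in bytes.
class FakeSplit {
 public:
  struct Blob {
    void* dptr;
    std::size_t size;
  };
  explicit FakeSplit(std::vector<std::string> records)
      : records_(std::move(records)), pos_(0) {}
  void BeforeFirst() { pos_ = 0; }
  bool NextRecord(Blob* out_rec) {
    if (pos_ >= records_.size()) return false;  // end of split
    std::string& rec = records_[pos_++];
    out_rec->dptr = &rec[0];
    out_rec->size = rec.size();
    return true;
  }

 private:
  std::vector<std::string> records_;
  std::size_t pos_;
};

// Read every record of a split into owned strings. Written as a template so
// the same loop works for any type with the documented member functions.
template <typename Split>
std::vector<std::string> ReadAllRecords(Split* split) {
  std::vector<std::string> out;
  typename Split::Blob rec;
  split->BeforeFirst();  // rewind so repeated calls start from the beginning
  while (split->NextRecord(&rec)) {
    // Copy now: the blob is only valid until the next read call.
    out.emplace_back(static_cast<const char*>(rec.dptr), rec.size);
  }
  return out;
}
```

With the real library, the pointer passed to ReadAllRecords would come from InputSplit::Create, and the caller would be responsible for deleting it afterwards.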
virtual void dmlc::InputSplit::ResetPartition |
( |
unsigned |
part_index, |
|
|
unsigned |
num_parts |
|
) |
| |
|
pure virtual |
reset the InputSplit to a certain part id. The InputSplit will then point to the head of the newly specified segment. This feature may not be supported by every implementation of InputSplit.
- Parameters
-
part_index | The part id of the new input. |
num_parts | The total number of parts. |
Implemented in dmlc::InputSplitShuffle.
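The relationship between part_index and num_parts can be pictured as cutting the total size into equal byte ranges. The sketch below is a simplification under stated assumptions, not dmlc's implementation: PartitionRange and Range are hypothetical names, and a real implementation would also have to move each boundary to a record boundary so that no part begins or ends mid-record.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Half-open byte range [first, second).
typedef std::pair<std::size_t, std::size_t> Range;

// Hypothetical helper (not dmlc's implementation): the byte range that part
// `part_index` of `num_parts` would cover in a dataset of `total_size` bytes,
// before any adjustment to record boundaries.
Range PartitionRange(std::size_t total_size, unsigned part_index,
                     unsigned num_parts) {
  if (num_parts == 0 || part_index >= num_parts) return Range(0, 0);
  // Bytes per part, rounded up so the last part absorbs the shortfall.
  const std::size_t step = (total_size + num_parts - 1) / num_parts;
  const std::size_t begin = std::min(total_size, step * part_index);
  const std::size_t end = std::min(total_size, step * (part_index + 1));
  return Range(begin, end);
}
```

The ranges for part_index 0 through num_parts - 1 are disjoint and adjacent, so together they cover the dataset exactly once.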
The documentation for this class was generated from the following file:
- /work/mxnet/3rdparty/dmlc-core/include/dmlc/io.h