mxnet
dmlc::InputSplitShuffle Class Reference

Class to construct an input split with global shuffling.

#include <input_split_shuffle.h>


Public Member Functions

virtual ~InputSplitShuffle (void)

virtual void BeforeFirst (void)
 Reset the position of the InputSplit to the beginning.

virtual void HintChunkSize (size_t chunk_size)
 Hint how large a chunk NextChunk should return. This is only a hint and may not be enforced, but the InputSplit will try to adjust its internal buffer size to the hinted value.

virtual size_t GetTotalSize (void)
 Get the total size of the InputSplit.

virtual bool NextRecord (Blob *out_rec)
 Get the next record. The returned value is valid until the next call to NextRecord, NextChunk or NextBatch. The caller can modify the memory content of out_rec.

virtual bool NextChunk (Blob *out_chunk)
 Get a chunk of memory that can contain multiple records; the caller needs to parse the content of the resulting chunk. For a text file, out_chunk can contain multiple lines; for recordio, out_chunk can contain multiple records (including headers).

virtual void ResetPartition (unsigned rank, unsigned nsplit)
 Reset the InputSplit to a certain part id. The InputSplit will point to the head of the newly specified segment. This feature may not be supported by every implementation of InputSplit.

 InputSplitShuffle (const char *uri, unsigned part_index, unsigned num_parts, const char *type, unsigned num_shuffle_parts, int shuffle_seed)
 Constructor.

- Public Member Functions inherited from dmlc::InputSplit
virtual bool NextBatch (Blob *out_chunk, size_t n_records)
 Get a chunk of memory that can contain multiple records, with a hint for how many records are needed; the caller needs to parse the content of the resulting chunk. For a text file, out_chunk can contain multiple lines; for recordio, out_chunk can contain multiple records (including headers).

virtual ~InputSplit (void) DMLC_THROW_EXCEPTION
 Destructor.
 

Static Public Member Functions

static InputSplit * Create (const char *uri, unsigned part_index, unsigned num_parts, const char *type, unsigned num_shuffle_parts, int shuffle_seed)
 Factory function: create an input split with chunk shuffling, given a URI.

- Static Public Member Functions inherited from dmlc::InputSplit
static InputSplit * Create (const char *uri, unsigned part_index, unsigned num_parts, const char *type)
 Factory function: create an input split given a URI.

static InputSplit * Create (const char *uri, const char *index_uri, unsigned part_index, unsigned num_parts, const char *type, const bool shuffle=false, const int seed=0, const size_t batch_size=256, const bool recurse_directories=false)
 Factory function: create an input split given a URI for the input and a URI for the index.
 

Detailed Description

Class to construct an input split with global shuffling.

Constructor & Destructor Documentation

virtual dmlc::InputSplitShuffle::~InputSplitShuffle ( void  )
inline virtual

dmlc::InputSplitShuffle::InputSplitShuffle ( const char *  uri,
unsigned  part_index,
unsigned  num_parts,
const char *  type,
unsigned  num_shuffle_parts,
int  shuffle_seed 
)
inline

Constructor.

Parameters
uri: the URI of the input; can contain an hdfs prefix
part_index: the part id of the current input
num_parts: total number of splits
type: type of record. Possible types: "text", "recordio"
  • "text": text file; each line is treated as a record, and the input split will split on '\n' or '\r'
  • "recordio": binary recordio file, see recordio.h
num_shuffle_parts: number of shuffle chunks for each split
shuffle_seed: shuffle seed for chunk shuffling

Member Function Documentation

virtual void dmlc::InputSplitShuffle::BeforeFirst ( void  )
inline virtual

Reset the position of the InputSplit to the beginning.

Implements dmlc::InputSplit.

static InputSplit* dmlc::InputSplitShuffle::Create ( const char *  uri,
unsigned  part_index,
unsigned  num_parts,
const char *  type,
unsigned  num_shuffle_parts,
int  shuffle_seed 
)
inline static

Factory function: create an input split with chunk shuffling, given a URI.

Parameters
uri: the URI of the input; can contain an hdfs prefix
part_index: the part id of the current input
num_parts: total number of splits
type: type of record. Possible types: "text", "recordio"
  • "text": text file; each line is treated as a record, and the input split will split on '\n' or '\r'
  • "recordio": binary recordio file, see recordio.h
num_shuffle_parts: number of shuffle chunks for each split
shuffle_seed: shuffle seed for chunk shuffling
Returns
a new input split
See also
InputSplit::Type
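
As a rough illustration of how num_shuffle_parts and shuffle_seed might interact, the sketch below assumes each of the num_parts splits is subdivided into num_shuffle_parts sub-parts that one worker visits in a seeded random order. ShuffledSubParts is a hypothetical helper, not a dmlc function, and the seed derivation is an assumption made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not part of dmlc): returns the order in which one
// worker would visit its sub-parts, expressed as indices into a finer
// partitioning of num_parts * num_shuffle_parts pieces.
std::vector<unsigned> ShuffledSubParts(unsigned part_index, unsigned num_parts,
                                       unsigned num_shuffle_parts,
                                       int shuffle_seed) {
  assert(num_parts > 0 && part_index < num_parts && num_shuffle_parts > 0);
  std::vector<unsigned> order(num_shuffle_parts);
  std::iota(order.begin(), order.end(), 0u);  // 0, 1, ..., num_shuffle_parts-1
  // Assumed seed derivation: any deterministic function of the seed would do.
  std::mt19937 rng(static_cast<unsigned>(shuffle_seed) + part_index);
  std::shuffle(order.begin(), order.end(), rng);
  // Offset the local order into this worker's range of sub-part ids.
  for (unsigned &idx : order) idx += part_index * num_shuffle_parts;
  return order;
}
```

With part_index = 1, num_parts = 4 and num_shuffle_parts = 3, the helper returns some permutation of the sub-part ids 3, 4 and 5, and the same seed always gives the same order.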

virtual size_t dmlc::InputSplitShuffle::GetTotalSize ( void  )
inline virtual

Get the total size of the InputSplit.

Implements dmlc::InputSplit.

virtual void dmlc::InputSplitShuffle::HintChunkSize ( size_t  chunk_size)
inline virtual

Hint how large a chunk NextChunk should return. This is only a hint and may not be enforced, but the InputSplit will try to adjust its internal buffer size to the hinted value.

Parameters
chunk_size: the chunk size

Reimplemented from dmlc::InputSplit.

virtual bool dmlc::InputSplitShuffle::NextChunk ( Blob * out_chunk)
inline virtual

Get a chunk of memory that can contain multiple records; the caller needs to parse the content of the resulting chunk. For a text file, out_chunk can contain multiple lines; for recordio, out_chunk can contain multiple records (including headers).

This function ensures that the chunk contains no partial record. The caller can modify the memory content of out_chunk; the memory is valid until the next call to NextRecord, NextChunk or NextBatch.

Usually NextRecord is sufficient; NextChunk can be used by multi-threaded parsers to parse the input content.

Parameters
out_chunk: used to store the result
Returns
true if the next chunk was retrieved successfully, false if the end of the split was reached
See also
InputSplit::Create for definition of record
RecordIOChunkReader to parse recordio content from out_chunk

Implements dmlc::InputSplit.
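
The no-partial-record guarantee can be pictured with a small standalone sketch for text input. ChunkEnd and SplitIntoChunks are hypothetical helpers written for this example and do not use the dmlc classes; they assume records end with '\n' or '\r' and treat the requested size as a hint only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

static bool IsLineEnd(char c) { return c == '\n' || c == '\r'; }

// Hypothetical helper: returns the end offset of a chunk that starts at
// `begin`, is about `hint` bytes long, and never ends inside a record.
std::size_t ChunkEnd(const std::string &buf, std::size_t begin,
                     std::size_t hint) {
  const std::size_t n = buf.size();
  assert(begin < n);
  std::size_t end = std::min(begin + hint, n);
  if (end == n) return n;  // the rest of the buffer fits in one chunk
  // Back up to just past the last line end inside the hinted window.
  while (end > begin && !IsLineEnd(buf[end - 1])) --end;
  if (end > begin) return end;
  // No boundary in the window: the record is longer than the hint, so the
  // hint is not enforced and the chunk grows to cover the whole record.
  end = begin + hint;
  while (end < n && !IsLineEnd(buf[end])) ++end;
  return end < n ? end + 1 : n;
}

// Cuts the whole buffer into consecutive chunks of complete records.
std::vector<std::string> SplitIntoChunks(const std::string &buf,
                                         std::size_t hint) {
  std::vector<std::string> chunks;
  std::size_t begin = 0;
  while (begin < buf.size()) {
    const std::size_t end = ChunkEnd(buf, begin, hint);
    chunks.push_back(buf.substr(begin, end - begin));
    begin = end;
  }
  return chunks;
}
```

For the input "aa\nbbbb\ncc\n" and a hint of 4 bytes, the second chunk holds the 5-byte record "bbbb\n" in full, which shows why the chunk size can only be a hint.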

virtual bool dmlc::InputSplitShuffle::NextRecord ( Blob * out_rec)
inline virtual

Get the next record. The returned value is valid until the next call to NextRecord, NextChunk or NextBatch. The caller can modify the memory content of out_rec.

For text, out_rec contains a single line. For recordio, out_rec contains the content of one record (with the header stripped).

Parameters
out_rec: used to store the result
Returns
true if the next record was retrieved successfully, false if the end of the split was reached
See also
InputSplit::Create for definition of record

Implements dmlc::InputSplit.
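
Because the returned blob is only valid until the next read call, a caller that wants to keep records typically copies them out first. The sketch below shows that pattern with toy stand-ins: this Blob struct and the ToyTextSplit class are hypothetical, mirror only the shape of the documented interface, and assume that blank lines are skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the blob type used by the interface.
struct Blob {
  void *dptr;
  std::size_t size;
};

// Toy in-memory text reader, not the dmlc implementation.
class ToyTextSplit {
 public:
  explicit ToyTextSplit(std::string data) : data_(std::move(data)), pos_(0) {}

  // Points out_rec at the next non-empty line inside the internal buffer.
  bool NextRecord(Blob *out_rec) {
    while (pos_ < data_.size() && IsLineEnd(data_[pos_])) ++pos_;
    if (pos_ == data_.size()) return false;  // end of split
    const std::size_t begin = pos_;
    while (pos_ < data_.size() && !IsLineEnd(data_[pos_])) ++pos_;
    out_rec->dptr = &data_[begin];
    out_rec->size = pos_ - begin;
    return true;
  }

 private:
  static bool IsLineEnd(char c) { return c == '\n' || c == '\r'; }
  std::string data_;
  std::size_t pos_;
};

// Copies each record into its own string before requesting the next one.
std::vector<std::string> ReadAll(ToyTextSplit *split) {
  std::vector<std::string> records;
  Blob rec;
  while (split->NextRecord(&rec)) {
    records.emplace_back(static_cast<const char *>(rec.dptr), rec.size);
  }
  return records;
}
```

The loop in ReadAll has the same shape a caller would use against the documented NextRecord: read until it returns false, and copy anything that must outlive the next call.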

virtual void dmlc::InputSplitShuffle::ResetPartition ( unsigned  part_index,
unsigned  num_parts 
)
inline virtual

Reset the InputSplit to a certain part id. The InputSplit will point to the head of the newly specified segment. This feature may not be supported by every implementation of InputSplit.

Parameters
part_index: the part id of the new input
num_parts: the total number of parts

Implements dmlc::InputSplit.
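
One plausible way to map (part_index, num_parts) onto a segment is an even split of the total byte size, sketched below. PartitionRange and ByteRange are hypothetical names for this example. It is an assumption that a real reader would additionally move both ends to record boundaries, which the sketch leaves out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical half-open byte range [begin, end).
struct ByteRange {
  std::size_t begin;
  std::size_t end;
};

// Hypothetical helper: evenly divides total_size bytes into num_parts
// segments and returns the segment that belongs to part_index.
ByteRange PartitionRange(std::size_t total_size, unsigned part_index,
                         unsigned num_parts) {
  assert(num_parts > 0 && part_index < num_parts);
  // Segment length, rounded up so that all bytes are covered.
  const std::size_t step = (total_size + num_parts - 1) / num_parts;
  const std::size_t begin = std::min(step * part_index, total_size);
  const std::size_t end = std::min(step * (part_index + 1), total_size);
  return {begin, end};
}
```

For 10 bytes and 3 parts this gives the segments [0, 4), [4, 8) and [8, 10). When there are more parts than bytes, trailing parts receive an empty range.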


The documentation for this class was generated from the following file: