Gluon Contrib API

Overview

This document lists the contrib APIs in Gluon:

mxnet.gluon.contrib – Contrib neural network module.

The Gluon Contrib API, defined in the gluon.contrib package, provides many useful experimental APIs for new features. This is a place for the community to try out the new features, so that feature contributors can receive feedback.

Warning

This package contains experimental APIs and may change in the near future.

In the rest of this document, we list routines provided by the gluon.contrib package.

Contrib

Neural network

Concurrent – Lays Blocks concurrently.
HybridConcurrent – Lays HybridBlocks concurrently.
Identity – Block that passes through the input directly.
SparseEmbedding – Turns non-negative integers (indexes/tokens) into dense vectors of fixed size.
SyncBatchNorm – Cross-GPU Synchronized Batch normalization (SyncBN).

Recurrent neural network

VariationalDropoutCell – Applies Variational Dropout on base cell.
Conv1DRNNCell – 1D Convolutional RNN cell.
Conv2DRNNCell – 2D Convolutional RNN cell.
Conv3DRNNCell – 3D Convolutional RNN cell.
Conv1DLSTMCell – 1D Convolutional LSTM network cell.
Conv2DLSTMCell – 2D Convolutional LSTM network cell.
Conv3DLSTMCell – 3D Convolutional LSTM network cell.
Conv1DGRUCell – 1D Convolutional Gated Recurrent Unit (GRU) network cell.
Conv2DGRUCell – 2D Convolutional Gated Recurrent Unit (GRU) network cell.
Conv3DGRUCell – 3D Convolutional Gated Recurrent Unit (GRU) network cell.
LSTMPCell – Long-Short Term Memory Projected (LSTMP) network cell.

Data

IntervalSampler – Samples elements from [0, length) at fixed intervals.
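
To make "fixed intervals" concrete, here is a minimal plain-Python sketch of one way such a sampling order could be produced. The function name interval_indices and the rollover behaviour (restarting from 1, 2, ... once the end is reached) are assumptions made for illustration, not a statement of how the library class is implemented.

```python
def interval_indices(length, interval, rollover=True):
    """Indices in [0, length) visited with a fixed stride.

    With rollover, the walk restarts from 1, 2, ..., interval - 1 so that
    every index is eventually visited exactly once (assumed behaviour).
    """
    starts = range(interval) if rollover else range(1)
    order = []
    for start in starts:
        order.extend(range(start, length, interval))
    return order


print(interval_indices(13, interval=3))
# [0, 3, 6, 9, 12, 1, 4, 7, 10, 2, 5, 8, 11]
print(interval_indices(13, interval=3, rollover=False))
# [0, 3, 6, 9, 12]
```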

Text dataset

WikiText2 – WikiText-2 word-level dataset for language modeling, from Salesforce research.
WikiText103 – WikiText-103 word-level dataset for language modeling, from Salesforce research.

API Reference

Contrib neural network module.

class mxnet.gluon.contrib.nn.Concurrent(axis=-1, prefix=None, params=None)[source]

Lays Blocks concurrently.

This block feeds its input to all child blocks and produces its output by concatenating the child blocks’ outputs along the specified axis.

Example:

from mxnet.gluon import nn
from mxnet.gluon.contrib.nn import Concurrent, Identity

net = Concurrent()
# use net's name_scope to give children blocks appropriate names.
with net.name_scope():
    net.add(nn.Dense(10, activation='relu'))
    net.add(nn.Dense(20))
    net.add(Identity())
Parameters: axis (int, default -1) – The axis on which to concatenate the outputs.
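
The fan-out-and-concatenate behaviour can be pictured with a plain-Python sketch. This is not MXNet code: concurrent and the toy child functions are hypothetical names, rows are ordinary lists, and only the default axis=-1 case is modelled.

```python
def concurrent(children, batch):
    """Feed every row of `batch` to each child and join the results on the last axis."""
    out = []
    for row in batch:
        joined = []
        for child in children:
            joined.extend(child(row))  # concatenation along axis=-1
        out.append(joined)
    return out


def double(row):
    # Stand-in for a layer with as many outputs as inputs.
    return [2 * v for v in row]


def total(row):
    # Stand-in for a layer with a single output unit.
    return [sum(row)]


def identity(row):
    # Passes the input through unchanged, like Identity.
    return list(row)


batch = [[1, 2], [3, 4]]  # shape (2, 2)
result = concurrent([double, total, identity], batch)
print(result)  # [[2, 4, 3, 1, 2], [6, 8, 7, 3, 4]]
```

Each output row has 2 + 1 + 2 = 5 entries: the widths of the children add up along the concatenation axis while the batch axis is untouched.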
class mxnet.gluon.contrib.nn.HybridConcurrent(axis=-1, prefix=None, params=None)[source]

Lays HybridBlocks concurrently.

This block feeds its input to all child blocks and produces its output by concatenating the child blocks’ outputs along the specified axis.

Example:

from mxnet.gluon import nn
from mxnet.gluon.contrib.nn import HybridConcurrent, Identity

net = HybridConcurrent()
# use net's name_scope to give children blocks appropriate names.
with net.name_scope():
    net.add(nn.Dense(10, activation='relu'))
    net.add(nn.Dense(20))
    net.add(Identity())
Parameters: axis (int, default -1) – The axis on which to concatenate the outputs.
class mxnet.gluon.contrib.nn.Identity(prefix=None, params=None)[source]

Block that passes through the input directly.

This block can be used together with a HybridConcurrent block to build residual connections.

Example:

from mxnet.gluon import nn
from mxnet.gluon.contrib.nn import HybridConcurrent, Identity

net = HybridConcurrent()
# use net's name_scope to give child Blocks appropriate names.
with net.name_scope():
    net.add(nn.Dense(10, activation='relu'))
    net.add(nn.Dense(20))
    net.add(Identity())
class mxnet.gluon.contrib.nn.SparseEmbedding(input_dim, output_dim, dtype='float32', weight_initializer=None, **kwargs)[source]

Turns non-negative integers (indexes/tokens) into dense vectors of fixed size, e.g. [4, 20] -> [[0.25, 0.1], [0.6, -0.2]]

This SparseBlock is designed for distributed training with extremely large input dimension. Both weight and gradient w.r.t. weight are RowSparseNDArray.

Note: if sparse_grad is set to True, the gradient w.r.t. weight will be sparse. Only a subset of optimizers support sparse gradients, including SGD, AdaGrad and Adam. By default, lazy updates are turned on, which may perform differently from standard updates. For more details, please check the Optimization API at: /api/python/optimization/optimization.html

Parameters:
  • input_dim (int) – Size of the vocabulary, i.e. maximum integer index + 1.
  • output_dim (int) – Dimension of the dense embedding.
  • dtype (str or np.dtype, default 'float32') – Data type of output embeddings.
  • weight_initializer (Initializer) – Initializer for the embeddings matrix.
Inputs:
  • data: (N-1)-D tensor with shape: (x1, x2, ..., xN-1).
Outputs:
  • out: N-D tensor with shape: (x1, x2, ..., xN-1, output_dim).
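
The shape rule and the reason the gradient is row-sparse can be illustrated with a small plain-Python sketch. It uses nested lists instead of NDArrays, and the helper names embed and touched_rows are hypothetical.

```python
def embed(indices, weight):
    """Replace every integer index by the matching row of `weight`."""
    return [[list(weight[i]) for i in row] for row in indices]


def touched_rows(indices):
    """Rows of the weight matrix that receive a non-zero gradient."""
    return sorted({i for row in indices for i in row})


input_dim, output_dim = 6, 2  # vocabulary size, embedding width
weight = [[float(r), r / 10.0] for r in range(input_dim)]  # shape (6, 2)

data = [[4, 1, 4], [0, 1, 1]]  # shape (2, 3)
out = embed(data, weight)      # shape (2, 3, 2)

print(out[0])              # [[4.0, 0.4], [1.0, 0.1], [4.0, 0.4]]
print(touched_rows(data))  # [0, 1, 4]
```

Only rows 0, 1 and 4 of the weight are read in this batch, so only those rows can receive gradient. With a very large input_dim this is what makes a row-sparse representation worthwhile.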
class mxnet.gluon.contrib.nn.SyncBatchNorm(in_channels=0, num_devices=None, momentum=0.9, epsilon=1e-05, center=True, scale=True, use_global_stats=False, beta_initializer='zeros', gamma_initializer='ones', running_mean_initializer='zeros', running_variance_initializer='ones', **kwargs)[source]

Cross-GPU Synchronized Batch normalization (SyncBN)

Standard BN [1] implementations only normalize the data within each device. SyncBN normalizes the input within the whole mini-batch. We follow the sync-once implementation described in the paper [2].

Parameters:
  • in_channels (int, default 0) – Number of channels (feature maps) in input data. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
  • num_devices (int, default number of visible GPUs) – Number of devices over which the batch statistics are synchronized.
  • momentum (float, default 0.9) – Momentum for the moving average.
  • epsilon (float, default 1e-5) – Small float added to variance to avoid dividing by zero.
  • center (bool, default True) – If True, add offset of beta to normalized tensor. If False, beta is ignored.
  • scale (bool, default True) – If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (also e.g. nn.relu), this can be disabled since the scaling will be done by the next layer.
  • use_global_stats (bool, default False) – If True, use global moving statistics instead of local batch-norm. This forces batch-norm to act as a scale-and-shift operator. If False, use local batch-norm.
  • beta_initializer (str or Initializer, default ‘zeros’) – Initializer for the beta weight.
  • gamma_initializer (str or Initializer, default ‘ones’) – Initializer for the gamma weight.
  • running_mean_initializer (str or Initializer, default ‘zeros’) – Initializer for the running mean.
  • running_variance_initializer (str or Initializer, default ‘ones’) – Initializer for the running variance.
Inputs:
  • data: input tensor with arbitrary shape.
Outputs:
  • out: output tensor with the same shape as data.
References:
[1] Ioffe, Sergey, and Christian Szegedy. “Batch normalization: Accelerating deep network training by reducing internal covariate shift.” ICML 2015
[2] Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, and Amit Agrawal. “Context Encoding for Semantic Segmentation.” CVPR 2018
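
The difference between per-device and synchronized statistics can be shown with a plain-Python sketch for a single channel. It assumes that "sync-once" means each device contributes its sum and sum of squares in a single reduction; the function names are hypothetical and no GPU communication is modelled.

```python
def local_stats(values):
    """Mean and (biased) variance of the samples held by one device."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


def synced_stats(shards):
    """Mean and variance of the whole mini-batch from one reduction.

    Each device contributes its count, sum and sum of squares; the three
    totals are enough to recover the global statistics.
    """
    n = sum(len(s) for s in shards)
    total = sum(sum(s) for s in shards)
    total_sq = sum(sum(v * v for v in s) for s in shards)
    mean = total / n
    var = total_sq / n - mean ** 2
    return mean, var


# One channel of a mini-batch split across two hypothetical devices.
shards = [[1.0, 3.0], [5.0, 7.0]]

print([local_stats(s) for s in shards])  # [(2.0, 1.0), (6.0, 1.0)]
print(synced_stats(shards))              # (4.0, 5.0)
```

Each device on its own sees a variance of 1.0, while the full mini-batch has a variance of 5.0. That gap is what synchronization removes when the per-device batch is small.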

Contrib recurrent neural network module.

class mxnet.gluon.contrib.rnn.Conv1DRNNCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, ), i2h_dilate=(1, ), h2h_dilate=(1, ), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCW', activation='tanh', prefix=None, params=None)[source]

1D Convolutional RNN cell.

\[h_t = \tanh(W_i \ast x_t + R_i \ast h_{t-1} + b_i)\]
Parameters:
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCW’ the shape should be (C, W).
  • hidden_channels (int) – Number of output channels.
  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.
  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.
  • i2h_pad (int or tuple of int, default (0,)) – Pad for input convolution.
  • i2h_dilate (int or tuple of int, default (1,)) – Input convolution dilate.
  • h2h_dilate (int or tuple of int, default (1,)) – Recurrent convolution dilate.
  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.
  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.
  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.
  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.
  • conv_layout (str, default 'NCW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCW’ and ‘NWC’.
  • activation (str or Block, default 'tanh') – Type of activation function. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.
  • prefix (str, default 'conv_rnn_') – Prefix for name of layers (and name of weight if params is None).
  • params (RNNParams, default None) – Container for weight sharing between cells. Created if None.
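
The restriction to odd h2h_kernel sizes is easier to see with a little arithmetic. The sketch below assumes stride 1 and that the recurrent convolution is padded by dilate * (kernel - 1) / 2 on each side so that the hidden state keeps its size. Both points are assumptions about the implementation, and the helper names are hypothetical.

```python
def conv_out_size(size, kernel, pad=0, dilate=1):
    """Output length of a stride-1 convolution along one spatial axis."""
    return size + 2 * pad - dilate * (kernel - 1)


def same_pad(kernel, dilate=1):
    """Padding that keeps the length unchanged; only exact for odd kernels."""
    if kernel % 2 == 0:
        raise ValueError("a size-preserving pad needs an odd kernel")
    return dilate * (kernel - 1) // 2


# input_shape=(C, W)=(3, 20) with layout 'NCW', i2h_kernel=5, i2h_pad=0
hidden_channels = 8
state_width = conv_out_size(20, kernel=5, pad=0)
print((hidden_channels, state_width))  # (8, 16)

# h2h_kernel=3: the recurrent convolution maps width 16 back to width 16
pad = same_pad(3)
print(conv_out_size(state_width, kernel=3, pad=pad))  # 16
```

With an even kernel, dilate * (kernel - 1) can be odd, so no symmetric integer padding preserves the width and the two terms inside the activation would no longer have matching shapes.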
class mxnet.gluon.contrib.rnn.Conv2DRNNCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0), i2h_dilate=(1, 1), h2h_dilate=(1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCHW', activation='tanh', prefix=None, params=None)[source]

2D Convolutional RNN cell.

\[h_t = \tanh(W_i \ast x_t + R_i \ast h_{t-1} + b_i)\]
Parameters:
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCHW’ the shape should be (C, H, W).
  • hidden_channels (int) – Number of output channels.
  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.
  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.
  • i2h_pad (int or tuple of int, default (0, 0)) – Pad for input convolution.
  • i2h_dilate (int or tuple of int, default (1, 1)) – Input convolution dilate.
  • h2h_dilate (int or tuple of int, default (1, 1)) – Recurrent convolution dilate.
  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.
  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.
  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.
  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.
  • conv_layout (str, default 'NCHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCHW’ and ‘NHWC’.
  • activation (str or Block, default 'tanh') – Type of activation function. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.
  • prefix (str, default 'conv_rnn_') – Prefix for name of layers (and name of weight if params is None).
  • params (RNNParams, default None) – Container for weight sharing between cells. Created if None.
class mxnet.gluon.contrib.rnn.Conv3DRNNCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0, 0), i2h_dilate=(1, 1, 1), h2h_dilate=(1, 1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCDHW', activation='tanh', prefix=None, params=None)[source]

3D Convolutional RNN cell.

\[h_t = \tanh(W_i \ast x_t + R_i \ast h_{t-1} + b_i)\]
Parameters:
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCDHW’ the shape should be (C, D, H, W).
  • hidden_channels (int) – Number of output channels.
  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.
  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.
  • i2h_pad (int or tuple of int, default (0, 0, 0)) – Pad for input convolution.
  • i2h_dilate (int or tuple of int, default (1, 1, 1)) – Input convolution dilate.
  • h2h_dilate (int or tuple of int, default (1, 1, 1)) – Recurrent convolution dilate.
  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.
  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.
  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.
  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.
  • conv_layout (str, default 'NCDHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCDHW’ and ‘NDHWC’.
  • activation (str or Block, default 'tanh') – Type of activation function. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.
  • prefix (str, default 'conv_rnn_') – Prefix for name of layers (and name of weight if params is None).
  • params (RNNParams, default None) – Container for weight sharing between cells. Created if None.
class mxnet.gluon.contrib.rnn.Conv1DLSTMCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, ), i2h_dilate=(1, ), h2h_dilate=(1, ), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCW', activation='tanh', prefix=None, params=None)[source]

1D Convolutional LSTM network cell.

Reference: Xingjian et al., “Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting”, NIPS 2015.

\[\begin{split}\begin{array}{ll} i_t = \sigma(W_i \ast x_t + R_i \ast h_{t-1} + b_i) \\ f_t = \sigma(W_f \ast x_t + R_f \ast h_{t-1} + b_f) \\ o_t = \sigma(W_o \ast x_t + R_o \ast h_{t-1} + b_o) \\ c^\prime_t = \tanh(W_c \ast x_t + R_c \ast h_{t-1} + b_c) \\ c_t = f_t \circ c_{t-1} + i_t \circ c^\prime_t \\ h_t = o_t \circ \tanh(c_t) \\ \end{array}\end{split}\]
Parameters:
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCW’ the shape should be (C, W).
  • hidden_channels (int) – Number of output channels.
  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.
  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.
  • i2h_pad (int or tuple of int, default (0,)) – Pad for input convolution.
  • i2h_dilate (int or tuple of int, default (1,)) – Input convolution dilate.
  • h2h_dilate (int or tuple of int, default (1,)) – Recurrent convolution dilate.
  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.
  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.
  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.
  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.
  • conv_layout (str, default 'NCW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCW’ and ‘NWC’.
  • activation (str or Block, default 'tanh') – Type of activation function used in c^prime_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.
  • prefix (str, default 'conv_lstm_') – Prefix for name of layers (and name of weight if params is None).
  • params (RNNParams, default None) – Container for weight sharing between cells. Created if None.
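
The element-wise part of the LSTM update can be sketched in plain Python. The sketch assumes the four gate pre-activations for a single position of the feature map have already been produced by the input and recurrent convolutions; lstm_gates and its argument names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_gates(pre_i, pre_f, pre_o, pre_c, c_prev):
    """Apply the gate equations to one element of the feature map.

    `pre_*` are the values of W * x_t + R * h_{t-1} + b for each gate.
    """
    i = sigmoid(pre_i)
    f = sigmoid(pre_f)
    o = sigmoid(pre_o)
    c_cand = math.tanh(pre_c)    # c'_t, uses the `activation` argument
    c = f * c_prev + i * c_cand  # c_t
    h = o * math.tanh(c)         # h_t
    return h, c


# Zero pre-activations: every gate is 0.5 and the candidate is 0,
# so the cell state is simply halved.
h, c = lstm_gates(0.0, 0.0, 0.0, 0.0, c_prev=2.0)
print(c)  # 1.0
print(h)  # 0.5 * tanh(1.0)
```

In the convolutional cells the same arithmetic is applied independently at every channel and spatial position of the state.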
class mxnet.gluon.contrib.rnn.Conv2DLSTMCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0), i2h_dilate=(1, 1), h2h_dilate=(1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCHW', activation='tanh', prefix=None, params=None)[source]

2D Convolutional LSTM network cell.

Reference: Xingjian et al., “Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting”, NIPS 2015.

\[\begin{split}\begin{array}{ll} i_t = \sigma(W_i \ast x_t + R_i \ast h_{t-1} + b_i) \\ f_t = \sigma(W_f \ast x_t + R_f \ast h_{t-1} + b_f) \\ o_t = \sigma(W_o \ast x_t + R_o \ast h_{t-1} + b_o) \\ c^\prime_t = \tanh(W_c \ast x_t + R_c \ast h_{t-1} + b_c) \\ c_t = f_t \circ c_{t-1} + i_t \circ c^\prime_t \\ h_t = o_t \circ \tanh(c_t) \\ \end{array}\end{split}\]
Parameters:
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCHW’ the shape should be (C, H, W).
  • hidden_channels (int) – Number of output channels.
  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.
  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.
  • i2h_pad (int or tuple of int, default (0, 0)) – Pad for input convolution.
  • i2h_dilate (int or tuple of int, default (1, 1)) – Input convolution dilate.
  • h2h_dilate (int or tuple of int, default (1, 1)) – Recurrent convolution dilate.
  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.
  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.
  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.
  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.
  • conv_layout (str, default 'NCHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCHW’ and ‘NHWC’.
  • activation (str or Block, default 'tanh') – Type of activation function used in c^prime_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.
  • prefix (str, default 'conv_lstm_') – Prefix for name of layers (and name of weight if params is None).
  • params (RNNParams, default None) – Container for weight sharing between cells. Created if None.
class mxnet.gluon.contrib.rnn.Conv3DLSTMCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0, 0), i2h_dilate=(1, 1, 1), h2h_dilate=(1, 1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCDHW', activation='tanh', prefix=None, params=None)[source]

3D Convolutional LSTM network cell.

Reference: Xingjian et al., “Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting”, NIPS 2015.

\[\begin{split}\begin{array}{ll} i_t = \sigma(W_i \ast x_t + R_i \ast h_{t-1} + b_i) \\ f_t = \sigma(W_f \ast x_t + R_f \ast h_{t-1} + b_f) \\ o_t = \sigma(W_o \ast x_t + R_o \ast h_{t-1} + b_o) \\ c^\prime_t = \tanh(W_c \ast x_t + R_c \ast h_{t-1} + b_c) \\ c_t = f_t \circ c_{t-1} + i_t \circ c^\prime_t \\ h_t = o_t \circ \tanh(c_t) \\ \end{array}\end{split}\]
Parameters:
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCDHW’ the shape should be (C, D, H, W).
  • hidden_channels (int) – Number of output channels.
  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.
  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.
  • i2h_pad (int or tuple of int, default (0, 0, 0)) – Pad for input convolution.
  • i2h_dilate (int or tuple of int, default (1, 1, 1)) – Input convolution dilate.
  • h2h_dilate (int or tuple of int, default (1, 1, 1)) – Recurrent convolution dilate.
  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.
  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.
  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.
  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.
  • conv_layout (str, default 'NCDHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCDHW’ and ‘NDHWC’.
  • activation (str or Block, default 'tanh') – Type of activation function used in c^prime_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.
  • prefix (str, default 'conv_lstm_') – Prefix for name of layers (and name of weight if params is None).
  • params (RNNParams, default None) – Container for weight sharing between cells. Created if None.
class mxnet.gluon.contrib.rnn.Conv1DGRUCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, ), i2h_dilate=(1, ), h2h_dilate=(1, ), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCW', activation='tanh', prefix=None, params=None)[source]

1D Convolutional Gated Recurrent Unit (GRU) network cell.

\[\begin{split}\begin{array}{ll} r_t = \sigma(W_r \ast x_t + R_r \ast h_{t-1} + b_r) \\ z_t = \sigma(W_z \ast x_t + R_z \ast h_{t-1} + b_z) \\ n_t = \tanh(W_i \ast x_t + b_i + r_t \circ (R_n \ast h_{t-1} + b_n)) \\ h^\prime_t = (1 - z_t) \circ n_t + z_t \circ h_{t-1} \\ \end{array}\end{split}\]
Parameters:
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCW’ the shape should be (C, W).
  • hidden_channels (int) – Number of output channels.
  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.
  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.
  • i2h_pad (int or tuple of int, default (0,)) – Pad for input convolution.
  • i2h_dilate (int or tuple of int, default (1,)) – Input convolution dilate.
  • h2h_dilate (int or tuple of int, default (1,)) – Recurrent convolution dilate.
  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.
  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.
  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.
  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.
  • conv_layout (str, default 'NCW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCW’ and ‘NWC’.
  • activation (str or Block, default 'tanh') – Type of activation function used in n_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.
  • prefix (str, default 'conv_gru_') – Prefix for name of layers (and name of weight if params is None).
  • params (RNNParams, default None) – Container for weight sharing between cells. Created if None.
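
The role of the update gate z_t is easiest to see on a single element. The plain-Python sketch below assumes the convolution results are already available as scalars, with h_prev standing for the previous hidden state; gru_update and its argument names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_update(pre_r, pre_z, in_n, rec_n, h_prev):
    """Apply the GRU equations to one element of the feature map.

    pre_r, pre_z: W * x_t + R * h_{t-1} + b for the reset and update gates.
    in_n:  input part of the candidate, W_i * x_t + b_i.
    rec_n: recurrent part of the candidate, R_n * h_{t-1} + b_n.
    """
    r = sigmoid(pre_r)
    z = sigmoid(pre_z)
    n = math.tanh(in_n + r * rec_n)  # uses the `activation` argument
    return (1.0 - z) * n + z * h_prev


# A strongly positive update gate keeps the previous state ...
print(gru_update(0.0, 50.0, 1.0, 1.0, h_prev=0.25))
# ... and a strongly negative one replaces it with the candidate n_t.
print(gru_update(0.0, -50.0, 1.0, 1.0, h_prev=0.25))
```

The first call returns a value indistinguishable from h_prev, the second a value indistinguishable from tanh(1.5), since the reset gate is 0.5 in both cases.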
class mxnet.gluon.contrib.rnn.Conv2DGRUCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0), i2h_dilate=(1, 1), h2h_dilate=(1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCHW', activation='tanh', prefix=None, params=None)[source]

2D Convolutional Gated Recurrent Unit (GRU) network cell.

\[\begin{split}\begin{array}{ll} r_t = \sigma(W_r \ast x_t + R_r \ast h_{t-1} + b_r) \\ z_t = \sigma(W_z \ast x_t + R_z \ast h_{t-1} + b_z) \\ n_t = \tanh(W_i \ast x_t + b_i + r_t \circ (R_n \ast h_{t-1} + b_n)) \\ h^\prime_t = (1 - z_t) \circ n_t + z_t \circ h_{t-1} \\ \end{array}\end{split}\]
Parameters:
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCHW’ the shape should be (C, H, W).
  • hidden_channels (int) – Number of output channels.
  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.
  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.
  • i2h_pad (int or tuple of int, default (0, 0)) – Pad for input convolution.
  • i2h_dilate (int or tuple of int, default (1, 1)) – Input convolution dilate.
  • h2h_dilate (int or tuple of int, default (1, 1)) – Recurrent convolution dilate.
  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.
  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.
  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.
  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.
  • conv_layout (str, default 'NCHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCHW’ and ‘NHWC’.
  • activation (str or Block, default 'tanh') – Type of activation function used in n_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.
  • prefix (str, default 'conv_gru_') – Prefix for name of layers (and name of weight if params is None).
  • params (RNNParams, default None) – Container for weight sharing between cells. Created if None.
class mxnet.gluon.contrib.rnn.Conv3DGRUCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0, 0), i2h_dilate=(1, 1, 1), h2h_dilate=(1, 1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCDHW', activation='tanh', prefix=None, params=None)[source]

3D Convolutional Gated Recurrent Unit (GRU) network cell.

\[\begin{split}\begin{array}{ll} r_t = \sigma(W_r \ast x_t + R_r \ast h_{t-1} + b_r) \\ z_t = \sigma(W_z \ast x_t + R_z \ast h_{t-1} + b_z) \\ n_t = \tanh(W_i \ast x_t + b_i + r_t \circ (R_n \ast h_{t-1} + b_n)) \\ h^\prime_t = (1 - z_t) \circ n_t + z_t \circ h_{t-1} \\ \end{array}\end{split}\]
Parameters:
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCDHW’ the shape should be (C, D, H, W).
  • hidden_channels (int) – Number of output channels.
  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.
  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.
  • i2h_pad (int or tuple of int, default (0, 0, 0)) – Pad for input convolution.
  • i2h_dilate (int or tuple of int, default (1, 1, 1)) – Input convolution dilate.
  • h2h_dilate (int or tuple of int, default (1, 1, 1)) – Recurrent convolution dilate.
  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.
  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.
  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.
  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.
  • conv_layout (str, default 'NCDHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCDHW’ and ‘NDHWC’.
  • activation (str or Block, default 'tanh') – Type of activation function used in n_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.
  • prefix (str, default 'conv_gru_') – Prefix for name of layers (and name of weight if params is None).
  • params (RNNParams, default None) – Container for weight sharing between cells. Created if None.
class mxnet.gluon.contrib.rnn.VariationalDropoutCell(base_cell, drop_inputs=0.0, drop_states=0.0, drop_outputs=0.0)[source]

Applies Variational Dropout on base cell. (https://arxiv.org/pdf/1512.05287.pdf, https://www.stat.berkeley.edu/~tsmoon/files/Conference/asru2015.pdf).

Variational dropout uses the same dropout mask across time-steps. It can be applied to RNN inputs, outputs, and states. The masks for them are not shared.

The dropout mask is initialized when stepping forward for the first time and remains the same until .reset() is called. Thus, when stepping the cell manually instead of calling .unroll(), .reset() should be called after each sequence.

Parameters:
  • base_cell (RecurrentCell) – The cell on which to perform variational dropout.
  • drop_inputs (float, default 0.) – The dropout rate for inputs. Won’t apply dropout if it equals 0.
  • drop_states (float, default 0.) – The dropout rate for state inputs on the first state channel. Won’t apply dropout if it equals 0.
  • drop_outputs (float, default 0.) – The dropout rate for outputs. Won’t apply dropout if it equals 0.
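
The "same mask across time-steps" idea and the need for reset() can be sketched without any framework. SharedMaskDropout is a hypothetical stand-in, and the 1 / keep scaling of kept units (inverted dropout) is an assumption of this sketch rather than something stated above.

```python
import random


class SharedMaskDropout:
    """Draws one dropout mask and reuses it until reset() is called."""

    def __init__(self, rate, seed=0):
        self.rate = rate
        self._rng = random.Random(seed)
        self._mask = None

    def reset(self):
        # Forget the mask so the next sequence gets a fresh one.
        self._mask = None

    def __call__(self, values):
        if self._mask is None:
            keep = 1.0 - self.rate
            # Inverted dropout (assumed): kept units are scaled by 1 / keep.
            self._mask = [
                (1.0 / keep if self._rng.random() < keep else 0.0)
                for _ in values
            ]
        return [v * m for v, m in zip(values, self._mask)]


drop = SharedMaskDropout(rate=0.5)
sequence = [[1.0, 1.0, 1.0, 1.0]] * 3  # three time steps of all-ones input

step_outputs = [drop(x) for x in sequence]
# Every time step of the sequence sees the same mask.
print(all(out == step_outputs[0] for out in step_outputs))  # True

drop.reset()  # call between sequences when stepping manually
```

Standard dropout would instead draw a new mask inside every call, so the set of zeroed units would change from one time step to the next.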
unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters:
  • length (int) – Number of steps to unroll.
  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, ...) if layout is ‘NTC’, or (length, batch_size, ...) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, ...).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, ...) if layout is ‘NTC’, or (length, batch_size, ...) if layout is ‘TNC’. If None, output whatever is faster.
  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be less than or equal to length.
Returns:

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is the same as the output of begin_state().
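
The effect of valid_length can be illustrated with a minimal pure-Python sketch. It does not use MXNet, and unroll_sequence and running_sum are hypothetical names introduced only for this illustration. The sketch handles one padded sequence and assumes that masking is equivalent to skipping the padded steps; the library may compute every step and mask afterwards.

```python
def unroll_sequence(step, inputs, state, valid_length=None):
    """Unroll `step` over one padded sequence (hypothetical helper, not MXNet API).

    step(x, state) must return (output, new_state).
    Returns the per-step outputs and the state after the last valid step.
    """
    length = len(inputs)
    if valid_length is None:
        valid_length = length  # no padding: every step is valid
    if not 0 <= valid_length <= length:
        raise ValueError("valid_length must be between 0 and length")

    outputs = []
    for t, x in enumerate(inputs):
        if t < valid_length:
            out, state = step(x, state)
        else:
            out = 0.0  # padded position: output is masked, state is left unchanged
        outputs.append(out)
    return outputs, state


def running_sum(x, state):
    """Toy cell: the state and the output are the sum of the inputs seen so far."""
    new_state = state + x
    return new_state, new_state


# Sequence of length 5 whose last two entries are padding.
outs, last = unroll_sequence(running_sum, [1.0, 2.0, 3.0, 0.0, 0.0], 0.0, valid_length=3)
print(outs)  # [1.0, 3.0, 6.0, 0.0, 0.0]
print(last)  # 6.0
```

In a batched setting the same rule would apply independently to each row, using the matching element of valid_length.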

class mxnet.gluon.contrib.rnn.LSTMPCell(hidden_size, projection_size, i2h_weight_initializer=None, h2h_weight_initializer=None, h2r_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, prefix=None, params=None)[source]

Long-Short Term Memory Projected (LSTMP) network cell. (https://arxiv.org/abs/1402.1128)

Each call computes the following function:

\[\begin{split}\begin{array}{ll} i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{ri} r_{(t-1)} + b_{ri}) \\ f_t = sigmoid(W_{if} x_t + b_{if} + W_{rf} r_{(t-1)} + b_{rf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{rg} r_{(t-1)} + b_{rg}) \\ o_t = sigmoid(W_{io} x_t + b_{io} + W_{ro} r_{(t-1)} + b_{ro}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \\ r_t = W_{hr} h_t \end{array}\end{split}\]

where \(r_t\) is the projected recurrent activation at time t, \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the input at time t, and \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and output gates, respectively.

Parameters:
  • hidden_size (int) – Number of units in cell state symbol.
  • projection_size (int) – Number of units in output symbol.
  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the hidden state.
  • h2r_weight_initializer (str or Initializer) – Initializer for the projection weights matrix, used for the linear transformation of the recurrent state.
  • i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the input-to-hidden bias vector.
  • h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the recurrent bias vector.
  • prefix (str, default 'lstmp_') – Prefix for names of Blocks (and names of weights if params is None).
  • params (Parameter or None) – Container for weight sharing between cells. Created if None.
  • Inputs
    • data: input tensor with shape (batch_size, input_size).
    • states: a list of two initial recurrent state tensors, with shape (batch_size, projection_size) and (batch_size, hidden_size) respectively.
  • Outputs
    • out: output tensor with shape (batch_size, projection_size).
    • next_states: a list of two output recurrent state tensors. Each has the same shape as states.
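
The equations above can be traced with a small pure-Python sketch of one LSTMP step for a single example (no batch dimension). It is an illustration under stated assumptions, not the library implementation: lstmp_step, gate, matvec, zeros and the layout of the params dictionary are hypothetical names chosen here, and the weights are toy constants.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gate(gate_params, x, r_prev, activation):
    """Compute activation(W_x x + b_x + W_r r_prev + b_r) for one gate."""
    W_x, b_x, W_r, b_r = gate_params
    pre = [a + b + c + d
           for a, b, c, d in zip(matvec(W_x, x), b_x, matvec(W_r, r_prev), b_r)]
    return [activation(z) for z in pre]


def lstmp_step(x, r_prev, c_prev, p):
    """One LSTMP step; returns the projected output r_t and the cell state c_t."""
    i = gate(p["i"], x, r_prev, sigmoid)    # input gate
    f = gate(p["f"], x, r_prev, sigmoid)    # forget gate
    g = gate(p["g"], x, r_prev, math.tanh)  # cell gate
    o = gate(p["o"], x, r_prev, sigmoid)    # output gate
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    r = matvec(p["W_hr"], h)                # projection to projection_size units
    return r, c


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


# Toy sizes and weights, chosen only to make the shapes visible.
input_size, hidden_size, projection_size = 3, 4, 2
params = {
    name: (zeros(hidden_size, input_size), [0.0] * hidden_size,
           zeros(hidden_size, projection_size), [0.0] * hidden_size)
    for name in "ifgo"
}
params["W_hr"] = [[1.0] * hidden_size for _ in range(projection_size)]

r, c = lstmp_step([1.0, 2.0, 3.0], [0.0] * projection_size, [1.0] * hidden_size, params)
print(len(r), len(c))  # 2 4
```

The output r has projection_size entries and the cell state c has hidden_size entries, which matches the two state shapes listed under Inputs.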

Contrib datasets.

class mxnet.gluon.contrib.data.IntervalSampler(length, interval, rollover=True)[source]

Samples elements from [0, length) at fixed intervals.

Parameters:
  • length (int) – Length of the sequence.
  • interval (int) – The step between two consecutive samples. For example, interval=3 yields indices 0, 3, 6, and so on.
  • rollover (bool, default True) – Whether to start again from the first skipped item after reaching the end. If True, this sampler starts again from the first skipped item until all items are visited. Otherwise, iteration stops when the end is reached and skipped items are ignored.

Examples

>>> sampler = contrib.data.IntervalSampler(13, interval=3)
>>> list(sampler)
[0, 3, 6, 9, 12, 1, 4, 7, 10, 2, 5, 8, 11]
>>> sampler = contrib.data.IntervalSampler(13, interval=3, rollover=False)
>>> list(sampler)
[0, 3, 6, 9, 12]
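
The visiting order shown in these examples can be reproduced with a few lines of plain Python. This is a sketch of the sampling rule only; interval_order is a hypothetical helper introduced for illustration and is not part of the contrib package.

```python
def interval_order(length, interval, rollover=True):
    """Return the indices of [0, length) in the order an interval sampler visits them."""
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    # With rollover, make one pass per starting offset 0 .. interval - 1.
    # Without it, make only the pass that starts at offset 0.
    passes = interval if rollover else 1
    order = []
    for start in range(passes):
        order.extend(range(start, length, interval))
    return order


print(interval_order(13, 3))                  # [0, 3, 6, 9, 12, 1, 4, 7, 10, 2, 5, 8, 11]
print(interval_order(13, 3, rollover=False))  # [0, 3, 6, 9, 12]
```

With rollover enabled every index is visited exactly once, because each pass covers one residue class modulo interval.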

Text datasets.

class mxnet.gluon.contrib.data.text.WikiText2(root='/work/mxnet/datasets/wikitext-2', segment='train', vocab=None, seq_len=35)[source]

WikiText-2 word-level dataset for language modeling, from Salesforce research.

From https://einstein.ai/research/the-wikitext-long-term-dependency-language-modeling-dataset

License: Creative Commons Attribution-ShareAlike

Each sample is a vector of length equal to the specified sequence length. At the end of each sentence, an end-of-sentence token ‘<eos>’ is added.

Parameters:
  • root (str, default $MXNET_HOME/datasets/wikitext-2) – Path to temp folder for storing data.
  • segment (str, default 'train') – Dataset segment. Options are ‘train’, ‘validation’, ‘test’.
  • vocab (Vocabulary, default None) – The vocabulary to use for indexing the text dataset. If None, a default vocabulary is created.
  • seq_len (int, default 35) – The sequence length of each sample, regardless of the sentence boundary.
class mxnet.gluon.contrib.data.text.WikiText103(root='/work/mxnet/datasets/wikitext-103', segment='train', vocab=None, seq_len=35)[source]

WikiText-103 word-level dataset for language modeling, from Salesforce research.

From https://einstein.ai/research/the-wikitext-long-term-dependency-language-modeling-dataset

License: Creative Commons Attribution-ShareAlike

Each sample is a vector of length equal to the specified sequence length. At the end of each sentence, an end-of-sentence token ‘<eos>’ is added.

Parameters:
  • root (str, default $MXNET_HOME/datasets/wikitext-103) – Path to temp folder for storing data.
  • segment (str, default 'train') – Dataset segment. Options are ‘train’, ‘validation’, ‘test’.
  • vocab (Vocabulary, default None) – The vocabulary to use for indexing the text dataset. If None, a default vocabulary is created.
  • seq_len (int, default 35) – The sequence length of each sample, regardless of the sentence boundary.
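
How seq_len cuts across sentence boundaries in both datasets can be sketched in plain Python. The helper make_samples is hypothetical, it works on string tokens rather than vocabulary indices, and it assumes that the end-of-sentence token is '<eos>' and that trailing tokens which do not fill a complete sample are dropped. The library's exact handling may differ.

```python
def make_samples(lines, seq_len, eos="<eos>"):
    """Flatten sentences into one token stream and cut it into fixed-length samples."""
    tokens = []
    for line in lines:
        words = line.split()
        if words:                # skip blank lines
            tokens.extend(words)
            tokens.append(eos)   # mark the end of each sentence
    # Assumption: an incomplete final chunk is discarded.
    n_samples = len(tokens) // seq_len
    return [tokens[k * seq_len:(k + 1) * seq_len] for k in range(n_samples)]


samples = make_samples(["a b c", "d e"], seq_len=3)
print(samples)  # [['a', 'b', 'c'], ['<eos>', 'd', 'e']]
```

The second sample starts with the end-of-sentence token of the first sentence, which shows that sample boundaries ignore sentence boundaries.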