gluon.contrib

This document lists the contrib APIs in Gluon:

mxnet.gluon.contrib

Contrib neural network module.

The Gluon Contrib API, defined in the gluon.contrib package, provides experimental APIs for new features. It gives the community a place to try out new features so that feature contributors can receive feedback.

Warning

This package contains experimental APIs that may change in future releases.

In the rest of this document, we list routines provided by the gluon.contrib package.

Neural Network

Concurrent

Lays Blocks concurrently.

HybridConcurrent

Lays HybridBlocks concurrently.

Identity

Block that passes through the input directly.

SparseEmbedding

Turns non-negative integers (indexes/tokens) into dense vectors of fixed size.

SyncBatchNorm

Cross-GPU Synchronized Batch normalization (SyncBN)

PixelShuffle1D

Pixel-shuffle layer for upsampling in 1 dimension.

PixelShuffle2D

Pixel-shuffle layer for upsampling in 2 dimensions.

PixelShuffle3D

Pixel-shuffle layer for upsampling in 3 dimensions.

Convolutional Neural Network

DeformableConvolution

2-D Deformable Convolution v1 (Dai, 2017).

Recurrent Neural Network

VariationalDropoutCell

Applies Variational Dropout on base cell.

Conv1DRNNCell

1D Convolutional RNN cell.

Conv2DRNNCell

2D Convolutional RNN cell.

Conv3DRNNCell

3D Convolutional RNN cell.

Conv1DLSTMCell

1D Convolutional LSTM network cell.

Conv2DLSTMCell

2D Convolutional LSTM network cell.

Conv3DLSTMCell

3D Convolutional LSTM network cell.

Conv1DGRUCell

1D Convolutional Gated Recurrent Unit (GRU) network cell.

Conv2DGRUCell

2D Convolutional Gated Recurrent Unit (GRU) network cell.

Conv3DGRUCell

3D Convolutional Gated Recurrent Unit (GRU) network cell.

LSTMPCell

Long Short-Term Memory Projected (LSTMP) network cell.

Data

IntervalSampler

Samples elements from [0, length) at fixed intervals.
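As a rough illustration of what "fixed intervals" means, the pure-Python generator below mimics the iteration order. It assumes the `rollover` option described in MXNet's full `IntervalSampler` docstring: once the end of [0, length) is reached, sampling restarts at offset 1, then 2, and so on, until every index has been visited. `interval_sample` is a hypothetical name, not the library class.

```python
def interval_sample(length, interval, rollover=True):
    """Hypothetical sketch of IntervalSampler's iteration order (rollover behaviour assumed)."""
    if interval >= length:
        raise ValueError("interval must be smaller than length")
    # Without rollover only offset 0 is used; with rollover, offsets 0 .. interval - 1.
    offsets = range(interval) if rollover else range(1)
    for start in offsets:
        # Visit start, start + interval, start + 2 * interval, ... while below length.
        for idx in range(start, length, interval):
            yield idx


print(list(interval_sample(13, 3)))
# [0, 3, 6, 9, 12, 1, 4, 7, 10, 2, 5, 8, 11]
print(list(interval_sample(13, 3, rollover=False)))
# [0, 3, 6, 9, 12]
```

With rollover enabled, every index is produced exactly once, so one pass still covers the whole dataset.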

Text Dataset

WikiText2

WikiText-2 word-level dataset for language modeling, from Salesforce research.

WikiText103

WikiText-103 word-level dataset for language modeling, from Salesforce research.

Estimator

Estimator

Estimator class for easy model training.

Event Handler

StoppingHandler

Stops training when the maximum number of batches or epochs is reached.

MetricHandler

Metric handler that updates metric values at the end of each batch.

ValidationHandler

Validation handler that evaluates the model on the validation dataset.

LoggingHandler

Basic logging handler that is applied to every Gluon estimator by default.

CheckpointHandler

Saves the model after a user-defined period.

EarlyStoppingHandler

Stops training early if the monitored value is not improving.

API Reference

Contrib neural network module.

Contributed neural network modules.

class mxnet.gluon.contrib.nn.Concurrent(axis=-1, prefix=None, params=None)

Bases: mxnet.gluon.nn.basic_layers.Sequential

Lays Blocks concurrently.

This block feeds its input to all child blocks and produces the output by concatenating all the child blocks’ outputs along the specified axis.

Example:

from mxnet.gluon import nn
from mxnet.gluon.contrib.nn import Concurrent, Identity

net = Concurrent()
# use net's name_scope to give child blocks appropriate names.
with net.name_scope():
    net.add(nn.Dense(10, activation='relu'))
    net.add(nn.Dense(20))
    net.add(Identity())

Parameters

axis (int, default -1) – The axis on which to concatenate the outputs.
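The concatenation semantics can be sketched without MXNet. In the pure-Python example below, each branch is a hypothetical stand-in function (not a Gluon block) mapping a batch of rows to a batch of rows, and the outputs are joined along the last axis, as with the default axis=-1. Branches of widths 10, 20 and 5 (the identity on a width-5 input) give width 35, mirroring the Dense(10)/Dense(20)/Identity() example above for a (batch, 5) input.

```python
import itertools


def concurrent(branches, batch):
    """Feed `batch` (a list of rows) to every branch and concatenate along the last axis."""
    outputs = [branch(batch) for branch in branches]
    # Row i of the result joins row i of every branch output, in branch order.
    return [list(itertools.chain.from_iterable(out[i] for out in outputs))
            for i in range(len(batch))]


# Hypothetical stand-ins for nn.Dense(10), nn.Dense(20) and Identity().
def dense10(rows):
    return [[0.0] * 10 for _ in rows]


def dense20(rows):
    return [[1.0] * 20 for _ in rows]


def identity(rows):
    return [list(r) for r in rows]


batch = [[float(i)] * 5 for i in range(2)]      # shape (2, 5)
out = concurrent([dense10, dense20, identity], batch)
print(len(out), len(out[0]))                    # 2 35
```

Because concatenation (not addition) is used, the branches may have different output widths; only the non-concatenated axes must agree.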

forward(x)

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters

*args (list of NDArray) – Input tensors.

class mxnet.gluon.contrib.nn.HybridConcurrent(axis=-1, prefix=None, params=None)

Bases: mxnet.gluon.nn.basic_layers.HybridSequential

Lays HybridBlocks concurrently.

This block feeds its input to all child blocks and produces the output by concatenating all the child blocks’ outputs along the specified axis.

Example:

from mxnet.gluon import nn
from mxnet.gluon.contrib.nn import HybridConcurrent, Identity

net = HybridConcurrent()
# use net's name_scope to give child blocks appropriate names.
with net.name_scope():
    net.add(nn.Dense(10, activation='relu'))
    net.add(nn.Dense(20))
    net.add(Identity())

Parameters

axis (int, default -1) – The axis on which to concatenate the outputs.

hybrid_forward(F, x)

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class mxnet.gluon.contrib.nn.Identity(prefix=None, params=None)

Bases: mxnet.gluon.block.HybridBlock

Block that passes through the input directly.

This block can be used together with a HybridConcurrent block to build residual-style skip connections, in which the unchanged input is concatenated with the outputs of the other branches.

Example:

from mxnet.gluon import nn
from mxnet.gluon.contrib.nn import HybridConcurrent, Identity

net = HybridConcurrent()
# use net's name_scope to give child Blocks appropriate names.
with net.name_scope():
    net.add(nn.Dense(10, activation='relu'))
    net.add(nn.Dense(20))
    net.add(Identity())

hybrid_forward(F, x)

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class mxnet.gluon.contrib.nn.SparseEmbedding(input_dim, output_dim, dtype='float32', weight_initializer=None, **kwargs)

Bases: mxnet.gluon.block.Block

Turns non-negative integers (indexes/tokens) into dense vectors of fixed size. For example, [4, 20] -> [[0.25, 0.1], [0.6, -0.2]].

This block is designed for distributed training with an extremely large input dimension. Both the weight and the gradient with respect to the weight are RowSparseNDArray.

Note: because the gradient with respect to the weight is sparse, only a subset of optimizers is supported, including SGD, AdaGrad and Adam. Lazy updates are turned on by default, which may behave differently from standard updates. For more details, please check the Optimization API at: https://mxnet.incubator.apache.org/api/python/optimization/optimization.html

Parameters
  • input_dim (int) – Size of the vocabulary, i.e. maximum integer index + 1.

  • output_dim (int) – Dimension of the dense embedding.

  • dtype (str or np.dtype, default 'float32') – Data type of output embeddings.

  • weight_initializer (Initializer) – Initializer for the embeddings matrix.

  • Inputs

    • data: (N-1)-D tensor with shape: (x1, x2, …, xN-1).

  • Output

    • out: N-D tensor with shape: (x1, x2, …, xN-1, output_dim).
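A pure-Python sketch of why the weight gradient is naturally row-sparse: the forward pass only gathers the rows named by the input indices, so only those rows receive a gradient. Both helper functions are hypothetical, and the table values other than rows 4 and 20 are arbitrary, chosen only to reproduce the [4, 20] example above.

```python
def embedding_forward(weight, indices):
    """Gather one row of the embedding table per input index."""
    return [list(weight[i]) for i in indices]


def embedding_weight_grad(num_rows, indices, out_grad):
    """Gradient w.r.t. the weight, stored as {row_index: row_gradient}.

    Only rows named in `indices` get an entry, which is what a row-sparse
    layout exploits; repeated indices accumulate into the same row.
    """
    grad = {}
    for idx, g in zip(indices, out_grad):
        if not 0 <= idx < num_rows:
            raise IndexError(f"index {idx} out of range for input_dim={num_rows}")
        row = grad.setdefault(idx, [0.0] * len(g))
        for j, v in enumerate(g):
            row[j] += v
    return grad


# A 25 x 2 table (input_dim=25, output_dim=2); only rows 4 and 20 matter here.
weight = [[0.0, 0.0] for _ in range(25)]
weight[4] = [0.25, 0.1]
weight[20] = [0.6, -0.2]

print(embedding_forward(weight, [4, 20]))   # [[0.25, 0.1], [0.6, -0.2]]
grad = embedding_weight_grad(25, [4, 20, 4], [[1.0, 1.0]] * 3)
print(sorted(grad.items()))                 # [(4, [2.0, 2.0]), (20, [1.0, 1.0])]
```

With a vocabulary of millions of rows, storing only the touched rows is what keeps gradient exchange cheap in distributed training.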

forward(x)

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters

*args (list of NDArray) – Input tensors.

class mxnet.gluon.contrib.nn.SyncBatchNorm(in_channels=0, num_devices=None, momentum=0.9, epsilon=1e-05, center=True, scale=True, use_global_stats=False, beta_initializer='zeros', gamma_initializer='ones', running_mean_initializer='zeros', running_variance_initializer='ones', **kwargs)

Bases: mxnet.gluon.nn.basic_layers.BatchNorm

Cross-GPU Synchronized Batch normalization (SyncBN)

Standard BN [1] implementations only normalize the data within each device. SyncBN normalizes the input over the whole mini-batch, across all devices. We follow the implementation described in [2].

Note: the current implementation of SyncBN does not support FP16 training. For FP16 inference, use the standard nn.BatchNorm instead of SyncBN.
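The difference between per-device and synchronized statistics can be sketched in pure Python. Here each device is assumed to contribute its element count, sum and sum of squares, which are then added up across devices; this is one common way to synchronize, the helper names are hypothetical, and it is not the MXNet kernel. The synchronized result equals the statistics of the whole mini-batch, while each device on its own sees a different mean.

```python
def local_stats(shard):
    """Mean and (biased) variance of one device's shard, as standard BN would use."""
    n = len(shard)
    mean = sum(shard) / n
    var = sum((x - mean) ** 2 for x in shard) / n
    return mean, var


def synced_stats(shards):
    """Global mean and variance from per-device partial sums (an all-reduce in practice)."""
    count = sum(len(s) for s in shards)
    total = sum(sum(s) for s in shards)
    total_sq = sum(sum(x * x for x in s) for s in shards)
    mean = total / count
    var = total_sq / count - mean ** 2   # E[x^2] - E[x]^2
    return mean, var


# One feature channel; a mini-batch of 8 values split across 2 devices.
shards = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
print([local_stats(s) for s in shards])  # [(2.5, 1.25), (6.5, 1.25)]
print(synced_stats(shards))              # (4.5, 5.25)
```

The per-device variance (1.25) badly underestimates the mini-batch variance (5.25) here, which is the kind of mismatch SyncBN removes when per-device batches are small.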

Parameters
  • in_channels (int, default 0) – Number of channels (feature maps) in input data. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.

  • num_devices (int, default number of visible GPUs) – Number of devices across which batch statistics are synchronized.

  • momentum (float, default 0.9) – Momentum for the moving average.

  • epsilon (float, default 1e-5) – Small float added to variance to avoid dividing by zero.

  • center (bool, default True) – If True, add offset of beta to normalized tensor. If False, beta is ignored.

  • scale (bool, default True) – If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (or, e.g., nn.relu), this can be disabled since the scaling will be done by the next layer.

  • use_global_stats (bool, default False) – If True, use global moving statistics instead of local batch statistics, which effectively turns batch normalization into a scale-and-shift operator. If False, use local batch statistics.

  • beta_initializer (str or Initializer, default ‘zeros’) – Initializer for the beta weight.

  • gamma_initializer (str or Initializer, default ‘ones’) – Initializer for the gamma weight.

  • running_mean_initializer (str or Initializer, default ‘zeros’) – Initializer for the running mean.

  • running_variance_initializer (str or Initializer, default ‘ones’) – Initializer for the running variance.

Inputs:
  • data: input tensor with arbitrary shape.

Outputs:
  • out: output tensor with the same shape as data.

References:

[1] Sergey Ioffe and Christian Szegedy. “Batch normalization: Accelerating deep network training by reducing internal covariate shift.” ICML 2015.

[2] Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, and Amit Agrawal. “Context Encoding for Semantic Segmentation.” CVPR 2018.

hybrid_forward(F, x, gamma, beta, running_mean, running_var)

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class mxnet.gluon.contrib.nn.PixelShuffle1D(factor)

Bases: mxnet.gluon.block.HybridBlock

Pixel-shuffle layer for upsampling in 1 dimension.

Pixel-shuffling is the operation of taking groups of values along the channel dimension and regrouping them into blocks of pixels along the W dimension, thereby effectively multiplying that dimension by a constant factor in size.

For example, a feature map of shape \((fC, W)\) is reshaped into \((C, fW)\) by forming groups of \(f\) values and arranging \(W\) such groups side by side.

Parameters
  • factor (int or 1-tuple of int) – Upsampling factor, applied to the W dimension.

  • Inputs

    • data: Tensor of shape (N, f*C, W).

  • Outputs

    • out: Tensor of shape (N, C, W*f).

Examples

>>> import mxnet as mx
>>> from mxnet.gluon.contrib.nn import PixelShuffle1D
>>> pxshuf = PixelShuffle1D(2)
>>> x = mx.nd.zeros((1, 8, 3))
>>> pxshuf(x).shape
(1, 4, 6)

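The regrouping can be sketched for a single sample in pure Python. The channel ordering is an assumption: channel c*f + i is taken to land at output position w*f + i of channel c, i.e. the reshape/transpose/reshape convention also used by PyTorch's pixel_shuffle; check it against your MXNet version before relying on the exact order. `pixel_shuffle_1d` is a hypothetical helper.

```python
def pixel_shuffle_1d(x, f):
    """Sketch of 1-D pixel shuffle for one sample: x has shape (f*C, W) -> (C, W*f).

    Assumed mapping: out[c][w * f + i] = x[c * f + i][w].
    """
    fc, w = len(x), len(x[0])
    if fc % f:
        raise ValueError("channel count must be divisible by the factor")
    c = fc // f
    out = [[None] * (w * f) for _ in range(c)]
    for ch in range(c):
        for i in range(f):
            for pos in range(w):
                out[ch][pos * f + i] = x[ch * f + i][pos]
    return out


# Two input channels, factor 2 -> one output channel twice as wide.
x = [[1, 2, 3],
     [10, 20, 30]]
print(pixel_shuffle_1d(x, 2))  # [[1, 10, 2, 20, 3, 30]]
```

Each group of f consecutive channels is interleaved into one channel, which is why the channel count must be a multiple of the factor.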
hybrid_forward(F, x)

Perform pixel-shuffling on the input.

class mxnet.gluon.contrib.nn.PixelShuffle2D(factor)

Bases: mxnet.gluon.block.HybridBlock

Pixel-shuffle layer for upsampling in 2 dimensions.

Pixel-shuffling is the operation of taking groups of values along the channel dimension and regrouping them into blocks of pixels along the H and W dimensions, thereby effectively multiplying those dimensions by a constant factor in size.

For example, a feature map of shape \((f^2 C, H, W)\) is reshaped into \((C, fH, fW)\) by forming little \(f \times f\) blocks of pixels and arranging them in an \(H \times W\) grid.

Pixel-shuffling together with regular convolution is an alternative, learnable way of upsampling an image by arbitrary factors. It is reported to help overcome checkerboard artifacts that are common in upsampling with transposed convolutions (also called deconvolutions). See the paper Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network for further details.

Parameters
  • factor (int or 2-tuple of int) – Upsampling factors, applied to the H and W dimensions, in that order.

  • Inputs

    • data: Tensor of shape (N, f1*f2*C, H, W).

  • Outputs

    • out: Tensor of shape (N, C, H*f1, W*f2).

Examples

>>> import mxnet as mx
>>> from mxnet.gluon.contrib.nn import PixelShuffle2D
>>> pxshuf = PixelShuffle2D((2, 3))
>>> x = mx.nd.zeros((1, 12, 3, 5))
>>> pxshuf(x).shape
(1, 2, 6, 15)

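The same idea in two dimensions, again for one sample in pure Python. As in the 1-D case, the channel ordering is an assumption (channel c*f1*f2 + i*f2 + j is taken to fill sub-pixel (i, j) of every f1 x f2 block of output channel c), and `pixel_shuffle_2d` is a hypothetical helper rather than the layer itself.

```python
def pixel_shuffle_2d(x, f1, f2):
    """Sketch of 2-D pixel shuffle for one sample: (f1*f2*C, H, W) -> (C, H*f1, W*f2).

    Assumed mapping: out[c][h*f1 + i][w*f2 + j] = x[c*f1*f2 + i*f2 + j][h][w].
    """
    channels, h, w = len(x), len(x[0]), len(x[0][0])
    if channels % (f1 * f2):
        raise ValueError("channel count must be divisible by f1 * f2")
    c = channels // (f1 * f2)
    out = [[[None] * (w * f2) for _ in range(h * f1)] for _ in range(c)]
    for ch in range(c):
        for i in range(f1):
            for j in range(f2):
                src = x[ch * f1 * f2 + i * f2 + j]
                for row in range(h):
                    for col in range(w):
                        out[ch][row * f1 + i][col * f2 + j] = src[row][col]
    return out


# Four 1x1 channels with factor (2, 2) become a single 2x2 block.
x = [[[1]], [[2]], [[3]], [[4]]]
print(pixel_shuffle_2d(x, 2, 2))  # [[[1, 2], [3, 4]]]
```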
hybrid_forward(F, x)

Perform pixel-shuffling on the input.

class mxnet.gluon.contrib.nn.PixelShuffle3D(factor)

Bases: mxnet.gluon.block.HybridBlock

Pixel-shuffle layer for upsampling in 3 dimensions.

Pixel-shuffling (or voxel-shuffling in 3D) is the operation of taking groups of values along the channel dimension and regrouping them into blocks of voxels along the D, H and W dimensions, thereby effectively multiplying those dimensions by a constant factor in size.

For example, a feature map of shape \((f^3 C, D, H, W)\) is reshaped into \((C, fD, fH, fW)\) by forming little \(f \times f \times f\) blocks of voxels and arranging them in a \(D \times H \times W\) grid.

Pixel-shuffling together with regular convolution is an alternative, learnable way of upsampling an image by arbitrary factors. It is reported to help overcome checkerboard artifacts that are common in upsampling with transposed convolutions (also called deconvolutions). See the paper Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network for further details.

Parameters
  • factor (int or 3-tuple of int) – Upsampling factors, applied to the D, H and W dimensions, in that order.

  • Inputs

    • data: Tensor of shape (N, f1*f2*f3*C, D, H, W).

  • Outputs

    • out: Tensor of shape (N, C, D*f1, H*f2, W*f3).

Examples

>>> import mxnet as mx
>>> from mxnet.gluon.contrib.nn import PixelShuffle3D
>>> pxshuf = PixelShuffle3D((2, 3, 4))
>>> x = mx.nd.zeros((1, 48, 3, 5, 7))
>>> pxshuf(x).shape
(1, 2, 6, 15, 28)

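The shape rule shared by PixelShuffle1D, PixelShuffle2D and PixelShuffle3D fits in a small calculator. The hypothetical helper below only computes shapes (it does not move any data); it reproduces the three Examples given for these layers and assumes that an integer factor applies to every spatial dimension.

```python
def pixel_shuffle_shape(in_shape, factors):
    """Output shape of PixelShuffle{1,2,3}D for an (N, C_in, *spatial) input."""
    n, c_in, *spatial = in_shape
    if isinstance(factors, int):
        factors = (factors,) * len(spatial)
    if len(factors) != len(spatial):
        raise ValueError("need one factor per spatial dimension")
    prod = 1
    for f in factors:
        prod *= f
    if c_in % prod:
        raise ValueError("input channels must be divisible by the product of the factors")
    # Channels shrink by the product of the factors; each spatial dim grows by its factor.
    return (n, c_in // prod) + tuple(d * f for d, f in zip(spatial, factors))


print(pixel_shuffle_shape((1, 8, 3), 2))                  # (1, 4, 6)
print(pixel_shuffle_shape((1, 12, 3, 5), (2, 3)))         # (1, 2, 6, 15)
print(pixel_shuffle_shape((1, 48, 3, 5, 7), (2, 3, 4)))   # (1, 2, 6, 15, 28)
```

The total number of elements is unchanged, which is a quick sanity check: 12 * 3 * 5 = 2 * 6 * 15 = 180.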
hybrid_forward(F, x)

Perform pixel-shuffling on the input.

Contrib convolutional neural network module.

class mxnet.gluon.contrib.cnn.DeformableConvolution(channels, kernel_size=(1, 1), strides=(1, 1), padding=(0, 0), dilation=(1, 1), groups=1, num_deformable_group=1, layout='NCHW', use_bias=True, in_channels=0, activation=None, weight_initializer=None, bias_initializer='zeros', offset_weight_initializer='zeros', offset_bias_initializer='zeros', offset_use_bias=True, op_name='DeformableConvolution', adj=None, prefix=None, params=None)

Bases: mxnet.gluon.block.HybridBlock

2-D Deformable Convolution v1 (Dai, 2017). A standard convolution samples its input on a regular grid, whereas the sampling points of a deformable convolution can be offset. The offsets are learned by a separate convolution layer during training. This Gluon layer contains both the convolution layer that generates the output features and the convolution layer that generates the offsets.
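To illustrate what offset sampling points mean, here is a single-channel, single-output-position sketch in pure Python. It follows the formulation of Dai et al. (2017): each kernel tap samples the input at its regular grid position plus a learned (dy, dx) offset, and fractional positions are read with bilinear interpolation (zero outside the image). The offsets are passed in directly here, whereas the layer predicts them with its offset convolution; all names are hypothetical and this is not how the MXNet operator is implemented.

```python
import math


def bilinear(img, y, x):
    """Bilinearly interpolate img (a list of rows) at fractional (y, x); pixels outside count as 0."""
    h, w = len(img), len(img[0])

    def pixel(r, c):
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * pixel(y0, x0)
            + (1 - dy) * dx * pixel(y0, x0 + 1)
            + dy * (1 - dx) * pixel(y0 + 1, x0)
            + dy * dx * pixel(y0 + 1, x0 + 1))


def deformable_conv_at(img, kernel, p0, offsets):
    """One output value of a single-channel deformable convolution (stride 1, no dilation).

    p0 is the (row, col) of the kernel's top-left tap on the regular grid, and
    offsets[i][j] = (dy, dx) is the learned shift of tap (i, j).
    """
    total = 0.0
    for i, kernel_row in enumerate(kernel):
        for j, weight in enumerate(kernel_row):
            dy, dx = offsets[i][j]
            total += weight * bilinear(img, p0[0] + i + dy, p0[1] + j + dx)
    return total


img = [[float(4 * r + c) for c in range(4)] for r in range(4)]   # 4 x 4 ramp
kernel = [[1.0, 0.0], [0.0, 1.0]]
zero = [[(0.0, 0.0)] * 2 for _ in range(2)]
half = [[(0.0, 0.5)] * 2 for _ in range(2)]
print(deformable_conv_at(img, kernel, (0, 0), zero))   # 5.0: identical to a regular convolution
print(deformable_conv_at(img, kernel, (0, 0), half))   # 6.0: every tap sampled half a pixel to the right
```

With all offsets at zero the layer reduces to an ordinary convolution, which is consistent with the zero defaults of offset_weight_initializer and offset_bias_initializer.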

Parameters
  • channels (int) – The dimensionality of the output space i.e. the number of output channels in the convolution.

  • kernel_size (int or tuple/list of 2 ints, (Default value = (1,1))) – Specifies the dimensions of the convolution window.

  • strides (int or tuple/list of 2 ints, (Default value = (1,1))) – Specifies the strides of the convolution.

  • padding (int or tuple/list of 2 ints, (Default value = (0,0))) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.

  • dilation (int or tuple/list of 2 ints, (Default value = (1,1))) – Specifies the dilation rate to use for dilated convolution.

  • groups (int, (Default value = 1)) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two convolution layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.

  • num_deformable_group (int, (Default value = 1)) – Number of deformable group partitions.

  • layout (str, (Default value = NCHW)) – Dimension ordering of data and weight. Can be ‘NCW’, ‘NWC’, ‘NCHW’, ‘NHWC’, ‘NCDHW’, ‘NDHWC’, etc. ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stand for batch, channel, height, width and depth dimensions respectively. Convolution is performed over ‘D’, ‘H’, and ‘W’ dimensions.

  • use_bias (bool, (Default value = True)) – Whether the layer for generating the output features uses a bias vector.

  • in_channels (int, (Default value = 0)) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and input channels will be inferred from the shape of input data.

  • activation (str, (Default value = None)) – Activation function to use. See Activation(). If you don’t specify anything, no activation is applied (i.e., “linear” activation: a(x) = x).

  • weight_initializer (str or Initializer, (Default value = None)) – Initializer for the weights matrix of the convolution layer that generates the output features.

  • bias_initializer (str or Initializer, (Default value = zeros)) – Initializer for the bias vector of the convolution layer that generates the output features.

  • offset_weight_initializer (str or Initializer, (Default value = zeros)) – Initializer for the weights matrix of the convolution layer that generates the offsets.

  • offset_bias_initializer (str or Initializer, (Default value = zeros)) – Initializer for the bias vector of the convolution layer that generates the offsets.

  • offset_use_bias (bool, (Default value = True)) – Whether the layer for generating the offset uses a bias vector.

  • Inputs

    • data: 4D input tensor with shape (batch_size, in_channels, height, width) when layout is NCHW. For other layouts shape is permuted accordingly.

  • Outputs

    • out: 4D output tensor with shape (batch_size, channels, out_height, out_width) when layout is NCHW. out_height and out_width are calculated as:

      out_height = floor((height+2*padding[0]-dilation[0]*(kernel_size[0]-1)-1)/strides[0])+1
      out_width = floor((width+2*padding[1]-dilation[1]*(kernel_size[1]-1)-1)/strides[1])+1
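The output-size formula can be evaluated with a small helper. The function name is hypothetical; it assumes NCHW layout with 2-tuples for every argument, and it relies on Python's // operator, which performs floor division like floor(...) in the formula.

```python
def deformable_conv_out_size(height, width, kernel_size,
                             strides=(1, 1), padding=(0, 0), dilation=(1, 1)):
    """(out_height, out_width) of a 2-D (deformable) convolution, per the formula above."""
    def one(size, k, s, p, d):
        # Effective kernel extent is d*(k-1)+1; // floors like floor(...).
        return (size + 2 * p - d * (k - 1) - 1) // s + 1

    return (one(height, kernel_size[0], strides[0], padding[0], dilation[0]),
            one(width, kernel_size[1], strides[1], padding[1], dilation[1]))


print(deformable_conv_out_size(32, 32, (3, 3), padding=(1, 1)))   # (32, 32)
print(deformable_conv_out_size(32, 32, (3, 3), strides=(2, 2)))   # (15, 15)
```

A 3x3 kernel with padding 1 and stride 1 preserves the spatial size, the usual setting when a deformable layer replaces a regular 3x3 convolution.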
      

hybrid_forward(F, x, offset_weight, deformable_conv_weight, offset_bias=None, deformable_conv_bias=None)

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

class mxnet.gluon.contrib.cnn.ModulatedDeformableConvolution(channels, kernel_size=(1, 1), strides=(1, 1), padding=(0, 0), dilation=(1, 1), groups=1, num_deformable_group=1, layout='NCHW', use_bias=True, in_channels=0, activation=None, weight_initializer=None, bias_initializer='zeros', offset_weight_initializer='zeros', offset_bias_initializer='zeros', offset_use_bias=True, op_name='ModulatedDeformableConvolution', adj=None, prefix=None, params=None)

Bases: mxnet.gluon.block.HybridBlock

2-D Deformable Convolution v2 (Zhu et al., 2018).

The modulated deformable convolution operation is described in https://arxiv.org/abs/1811.11168

Parameters
  • channels (int) – The dimensionality of the output space i.e. the number of output channels in the convolution.

  • kernel_size (int or tuple/list of 2 ints, (Default value = (1,1))) – Specifies the dimensions of the convolution window.

  • strides (int or tuple/list of 2 ints, (Default value = (1,1))) – Specifies the strides of the convolution.

  • padding (int or tuple/list of 2 ints, (Default value = (0,0))) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.

  • dilation (int or tuple/list of 2 ints, (Default value = (1,1))) – Specifies the dilation rate to use for dilated convolution.

  • groups (int, (Default value = 1)) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two convolution layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.

  • num_deformable_group (int, (Default value = 1)) – Number of deformable group partitions.

  • layout (str, (Default value = NCHW)) – Dimension ordering of data and weight. Can be ‘NCW’, ‘NWC’, ‘NCHW’, ‘NHWC’, ‘NCDHW’, ‘NDHWC’, etc. ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stand for batch, channel, height, width and depth dimensions respectively. Convolution is performed over ‘D’, ‘H’, and ‘W’ dimensions.

  • use_bias (bool, (Default value = True)) – Whether the layer for generating the output features uses a bias vector.

  • in_channels (int, (Default value = 0)) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and input channels will be inferred from the shape of input data.

  • activation (str, (Default value = None)) – Activation function to use. See Activation(). If you don’t specify anything, no activation is applied (i.e., “linear” activation: a(x) = x).

  • weight_initializer (str or Initializer, (Default value = None)) – Initializer for the weights matrix of the convolution layer that generates the output features.

  • bias_initializer (str or Initializer, (Default value = zeros)) – Initializer for the bias vector of the convolution layer that generates the output features.

  • offset_weight_initializer (str or Initializer, (Default value = zeros)) – Initializer for the weights matrix of the convolution layer that generates the offsets.

  • offset_bias_initializer (str or Initializer, (Default value = zeros)) – Initializer for the bias vector of the convolution layer that generates the offsets.

  • offset_use_bias (bool, (Default value = True)) – Whether the layer for generating the offset uses a bias vector.

  • Inputs

    • data: 4D input tensor with shape (batch_size, in_channels, height, width) when layout is NCHW. For other layouts shape is permuted accordingly.

  • Outputs

    • out: 4D output tensor with shape (batch_size, channels, out_height, out_width) when layout is NCHW. out_height and out_width are calculated as:

      out_height = floor((height+2*padding[0]-dilation[0]*(kernel_size[0]-1)-1)/strides[0])+1
      out_width = floor((width+2*padding[1]-dilation[1]*(kernel_size[1]-1)-1)/strides[1])+1
      

hybrid_forward(F, x, offset_weight, deformable_conv_weight, offset_bias=None, deformable_conv_bias=None)

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

Contrib recurrent neural network module.

class mxnet.gluon.contrib.rnn.Conv1DRNNCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, ), i2h_dilate=(1, ), h2h_dilate=(1, ), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCW', activation='tanh', prefix=None, params=None)

Bases: mxnet.gluon.contrib.rnn.conv_rnn_cell._ConvRNNCell

1D Convolutional RNN cell.

\[h_t = \tanh(W_i \ast x_t + R_i \ast h_{t-1} + b_i)\]
Parameters
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCW’ the shape should be (C, W).

  • hidden_channels (int) – Number of output channels.

  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.

  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.

  • i2h_pad (int or tuple of int, default (0,)) – Pad for input convolution.

  • i2h_dilate (int or tuple of int, default (1,)) – Dilation of the input convolution.

  • h2h_dilate (int or tuple of int, default (1,)) – Dilation of the recurrent convolution.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.

  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.

  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.

  • conv_layout (str, default 'NCW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCW’ and ‘NWC’.

  • activation (str or gluon.Block, default 'tanh') – Type of activation function. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.

  • prefix (str, default 'conv_rnn_') – Prefix for name of layers (and name of weight if params is None).

  • params (RNNParams, default None) – Container for weight sharing between cells. Created if None.
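One time step of the update above can be sketched for a single channel in pure Python. Assumptions, which are our reading rather than documented behaviour: the recurrent convolution uses "same" zero padding of (k - 1) // 2 per side (no dilation) so that h keeps its width, which only works exactly for odd kernels and matches the odd-size restriction on h2h_kernel; the input and recurrent biases are folded into a single b. The helper names are hypothetical.

```python
import math


def conv1d(x, kernel, pad=0):
    """Single-channel 1-D cross-correlation (stride 1, no dilation) with zero padding."""
    xp = [0.0] * pad + list(x) + [0.0] * pad
    k = len(kernel)
    return [sum(w * xp[i + j] for j, w in enumerate(kernel))
            for i in range(len(xp) - k + 1)]


def conv_rnn_step(x_t, h_prev, w_i, r_i, b, i2h_pad=0):
    """One step of h_t = tanh(W_i * x_t + R_i * h_{t-1} + b) for a single channel.

    `b` stands for the sum of the input and recurrent biases.
    """
    if len(r_i) % 2 == 0:
        raise ValueError("recurrent kernel size must be odd")
    i2h = conv1d(x_t, w_i, pad=i2h_pad)
    # Assumed 'same' padding: (k - 1) // 2 zeros per side keeps h at a fixed width.
    h2h = conv1d(h_prev, r_i, pad=(len(r_i) - 1) // 2)
    if len(i2h) != len(h2h):
        raise ValueError("state width must equal the input-convolution output width")
    return [math.tanh(a + c + b) for a, c in zip(i2h, h2h)]


x_t = [1.0, 2.0, 3.0, 4.0, 5.0]
h0 = [0.0, 0.0, 0.0]                  # width 5 - 3 + 1 = 3 for a size-3 i2h kernel, no padding
h1 = conv_rnn_step(x_t, h0, w_i=[0.1, 0.0, -0.1], r_i=[0.5, 0.5, 0.5], b=0.0)
h2 = conv_rnn_step(x_t, h1, w_i=[0.1, 0.0, -0.1], r_i=[0.5, 0.5, 0.5], b=0.0)
print([round(v, 4) for v in h1])      # [-0.1974, -0.1974, -0.1974]
```

Unlike a fully connected RNN cell, each state position only mixes with its neighbours through R_i, so the state keeps a spatial layout across time steps.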

class mxnet.gluon.contrib.rnn.Conv2DRNNCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0), i2h_dilate=(1, 1), h2h_dilate=(1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCHW', activation='tanh', prefix=None, params=None)

Bases: mxnet.gluon.contrib.rnn.conv_rnn_cell._ConvRNNCell

2D Convolutional RNN cell.

\[h_t = \tanh(W_i \ast x_t + R_i \ast h_{t-1} + b_i)\]
Parameters
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCHW’ the shape should be (C, H, W).

  • hidden_channels (int) – Number of output channels.

  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.

  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.

  • i2h_pad (int or tuple of int, default (0, 0)) – Pad for input convolution.

  • i2h_dilate (int or tuple of int, default (1, 1)) – Dilation of the input convolution.

  • h2h_dilate (int or tuple of int, default (1, 1)) – Dilation of the recurrent convolution.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.

  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.

  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.

  • conv_layout (str, default 'NCHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCHW’ and ‘NHWC’.

  • activation (str or gluon.Block, default 'tanh') – Type of activation function. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.

  • prefix (str, default 'conv_rnn_') – Prefix for name of layers (and name of weight if params is None).

  • params (RNNParams, default None) – Container for weight sharing between cells. Created if None.
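The per-sample shape of the hidden state follows from input_shape and the i2h_* arguments. The helper below is a sketch under our assumption that the state has hidden_channels channels and the spatial size of a stride-1 input convolution, while the recurrent convolution is "same"-padded and leaves that size unchanged; the function name is hypothetical, and the cell's state_info() in your MXNet version is the authoritative source.

```python
def conv_rnn_state_shape(input_shape, hidden_channels, i2h_kernel,
                         i2h_pad, i2h_dilate, conv_layout="NCHW"):
    """Per-sample hidden-state shape of a convolutional RNN cell (sketch).

    input_shape excludes the batch axis, e.g. (C, H, W) for 'NCHW' or (H, W, C) for 'NHWC'.
    """
    channels_first = conv_layout.index("C") == 1
    spatial = input_shape[1:] if channels_first else input_shape[:-1]
    # Stride-1 convolution output size: size + 2*pad - dilate*(kernel - 1).
    out = tuple(size + 2 * p - d * (k - 1)
                for size, k, p, d in zip(spatial, i2h_kernel, i2h_pad, i2h_dilate))
    return (hidden_channels,) + out if channels_first else out + (hidden_channels,)


print(conv_rnn_state_shape((3, 16, 16), 10, (3, 3), (1, 1), (1, 1)))          # (10, 16, 16)
print(conv_rnn_state_shape((3, 16, 16), 10, (3, 3), (0, 0), (1, 1)))          # (10, 14, 14)
print(conv_rnn_state_shape((16, 16, 3), 10, (3, 3), (0, 0), (1, 1), "NHWC"))  # (14, 14, 10)
```

Choosing i2h_pad = (k - 1) // 2 for an odd i2h_kernel keeps the state the same spatial size as the input, which is often the most convenient setting.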

class mxnet.gluon.contrib.rnn.Conv3DRNNCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0, 0), i2h_dilate=(1, 1, 1), h2h_dilate=(1, 1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCDHW', activation='tanh', prefix=None, params=None)

Bases: mxnet.gluon.contrib.rnn.conv_rnn_cell._ConvRNNCell

3D Convolutional RNN cell.

\[h_t = \tanh(W_i \ast x_t + R_i \ast h_{t-1} + b_i)\]
Parameters
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCDHW’ the shape should be (C, D, H, W).

  • hidden_channels (int) – Number of output channels.

  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.

  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.

  • i2h_pad (int or tuple of int, default (0, 0, 0)) – Pad for input convolution.

  • i2h_dilate (int or tuple of int, default (1, 1, 1)) – Dilation of the input convolution.

  • h2h_dilate (int or tuple of int, default (1, 1, 1)) – Dilation of the recurrent convolution.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.

  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.

  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.

  • conv_layout (str, default 'NCDHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCDHW’ and ‘NDHWC’.

  • activation (str or gluon.Block, default 'tanh') – Type of activation function. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.

  • prefix (str, default 'conv_rnn_') – Prefix for name of layers (and name of weight if params is None).

  • params (RNNParams, default None) – Container for weight sharing between cells. Created if None.

class mxnet.gluon.contrib.rnn.Conv1DLSTMCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, ), i2h_dilate=(1, ), h2h_dilate=(1, ), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCW', activation='tanh', prefix=None, params=None)

Bases: mxnet.gluon.contrib.rnn.conv_rnn_cell._ConvLSTMCell

1D Convolutional LSTM network cell.

Based on the paper “Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting”, Shi et al., NIPS 2015.

\[\begin{array}{l} i_t = \sigma(W_i \ast x_t + R_i \ast h_{t-1} + b_i) \\ f_t = \sigma(W_f \ast x_t + R_f \ast h_{t-1} + b_f) \\ o_t = \sigma(W_o \ast x_t + R_o \ast h_{t-1} + b_o) \\ c^\prime_t = \tanh(W_c \ast x_t + R_c \ast h_{t-1} + b_c) \\ c_t = f_t \circ c_{t-1} + i_t \circ c^\prime_t \\ h_t = o_t \circ \tanh(c_t) \end{array}\]
Parameters
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCW’ the shape should be (C, W).

  • hidden_channels (int) – Number of output channels.

  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.

  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.

  • i2h_pad (int or tuple of int, default (0,)) – Pad for input convolution.

  • i2h_dilate (int or tuple of int, default (1,)) – Dilation of the input convolution.

  • h2h_dilate (int or tuple of int, default (1,)) – Dilation of the recurrent convolution.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.

  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.

  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.

  • conv_layout (str, default 'NCW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCW’ and ‘NWC’.

  • activation (str or gluon.Block, default 'tanh') – Type of activation function used to compute \(c^\prime_t\). If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.

  • prefix (str, default 'conv_lstm_') – Prefix for name of layers (and name of weight if params is None).

  • params (RNNParams, default None) – Container for weight sharing between cells. Created if None.
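One time step of the ConvLSTM equations can be sketched for a single channel in pure Python. For brevity the sketch assumes "same" zero padding for both the input and the recurrent convolutions (odd kernels, no dilation) and folds the i2h and h2h biases into one b per gate; the dict-based parameter layout is hypothetical and does not mirror how the Gluon cell stores its weights. With all weights and biases at zero every gate outputs sigmoid(0) = 0.5 and the candidate is tanh(0) = 0, so the step simply halves the cell state.

```python
import math


def conv1d_same(x, kernel):
    """Single-channel 1-D cross-correlation, stride 1, zero 'same' padding (odd kernel)."""
    pad = (len(kernel) - 1) // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(w * xp[i + j] for j, w in enumerate(kernel)) for i in range(len(x))]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv_lstm_step(x_t, h_prev, c_prev, W, R, b):
    """One ConvLSTM step for one channel; W, R, b are dicts keyed by gate 'i', 'f', 'o', 'c'."""
    def pre(g):
        # W_g * x_t + R_g * h_{t-1} + b_g, elementwise over positions
        return [a + r + b[g] for a, r in zip(conv1d_same(x_t, W[g]), conv1d_same(h_prev, R[g]))]

    i = [sigmoid(v) for v in pre("i")]            # input gate
    f = [sigmoid(v) for v in pre("f")]            # forget gate
    o = [sigmoid(v) for v in pre("o")]            # output gate
    c_cand = [math.tanh(v) for v in pre("c")]     # candidate c'_t
    c = [fv * cp + iv * cc for fv, cp, iv, cc in zip(f, c_prev, i, c_cand)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c


zeros3 = [0.0, 0.0, 0.0]
W = {g: list(zeros3) for g in "ifoc"}
R = {g: list(zeros3) for g in "ifoc"}
b = {g: 0.0 for g in "ifoc"}
h, c = conv_lstm_step([1.0, 2.0, 3.0], zeros3, [2.0, -2.0, 0.0], W, R, b)
print(c)                          # [1.0, -1.0, 0.0]
print([round(v, 4) for v in h])   # [0.3808, -0.3808, 0.0]
```

Compared with the plain convolutional RNN cell, the gates add three extra pairs of convolutions, so a ConvLSTM cell has four times the i2h and h2h parameters for the same hidden_channels.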

class mxnet.gluon.contrib.rnn.Conv2DLSTMCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0), i2h_dilate=(1, 1), h2h_dilate=(1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCHW', activation='tanh', prefix=None, params=None)

Bases: mxnet.gluon.contrib.rnn.conv_rnn_cell._ConvLSTMCell

2D Convolutional LSTM network cell.

Based on the paper “Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting”, Shi et al., NIPS 2015.

\[\begin{split}\begin{array}{ll} i_t = \sigma(W_i \ast x_t + R_i \ast h_{t-1} + b_i) \\ f_t = \sigma(W_f \ast x_t + R_f \ast h_{t-1} + b_f) \\ o_t = \sigma(W_o \ast x_t + R_o \ast h_{t-1} + b_o) \\ c^\prime_t = tanh(W_c \ast x_t + R_c \ast h_{t-1} + b_c) \\ c_t = f_t \circ c_{t-1} + i_t \circ c^\prime_t \\ h_t = o_t \circ tanh(c_t) \\ \end{array}\end{split}\]
Parameters
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCHW’ the shape should be (C, H, W).

  • hidden_channels (int) – Number of output channels.

  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.

  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.

  • i2h_pad (int or tuple of int, default (0, 0)) – Padding for the input convolution.

  • i2h_dilate (int or tuple of int, default (1, 1)) – Dilation for the input convolution.

  • h2h_dilate (int or tuple of int, default (1, 1)) – Dilation for the recurrent convolution.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.

  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.

  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.

  • conv_layout (str, default 'NCHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCHW’ and ‘NHWC’.

  • activation (str or gluon.Block, default 'tanh') – Type of activation function used in c^prime_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.

  • prefix (str, default 'conv_lstm_') – Prefix for the names of layers (and of weights if params is None).

  • params (RNNParams, default None) – Container for weight sharing between cells. Created if None.
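To make the ConvLSTM equations concrete, here is a minimal NumPy sketch of a single 2D step for one sample in layout NCHW. It is an illustration, not MXNet's implementation: the helper names are hypothetical, the gates are stacked in the order (i, f, c', o) purely for convenience, both convolutions assume stride 1 and "same" padding so that x_t and h_{t-1} share a spatial shape, and the activation is fixed to tanh.

```python
import numpy as np


def conv2d_same(x, w, b):
    """Naive 2D cross-correlation with stride 1 and 'same' padding.

    x: (C_in, H, W); w: (C_out, C_in, kH, kW) with odd kH, kW; b: (C_out,)
    """
    c_out, _, kh, kw = w.shape
    _, height, width = x.shape
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.empty((c_out, height, width))
    for i in range(height):
        for j in range(width):
            patch = xp[:, i:i + kh, j:j + kw]  # (C_in, kH, kW)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out + b[:, None, None]


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def conv_lstm_step(x, h_prev, c_prev, W, R, b_i2h, b_h2h):
    """One ConvLSTM step for a single sample.

    W: (4 * hidden, C_in, k, k) and R: (4 * hidden, hidden, k, k) stack the
    gates in the order (i, f, c', o) along the output-channel axis.
    """
    gates = conv2d_same(x, W, b_i2h) + conv2d_same(h_prev, R, b_h2h)
    i, f, g, o = np.split(gates, 4, axis=0)   # each (hidden, H, W)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c = f * c_prev + i * np.tanh(g)           # c_t = f_t * c_{t-1} + i_t * c'_t
    h = o * np.tanh(c)                        # h_t = o_t * tanh(c_t)
    return h, c


rng = np.random.default_rng(0)
channels, hidden, height, width = 3, 4, 8, 8
x = rng.standard_normal((channels, height, width))
h0 = np.zeros((hidden, height, width))
c0 = np.zeros_like(h0)
W = 0.1 * rng.standard_normal((4 * hidden, channels, 3, 3))
R = 0.1 * rng.standard_normal((4 * hidden, hidden, 3, 3))
bias = np.zeros(4 * hidden)
h1, c1 = conv_lstm_step(x, h0, c0, W, R, bias, bias)
print(h1.shape, c1.shape)  # (4, 8, 8) (4, 8, 8)
```

The hidden_channels argument of the cell corresponds to hidden here; the state keeps the input's spatial size only because of the "same"-padding assumption.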

class mxnet.gluon.contrib.rnn.Conv3DLSTMCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0, 0), i2h_dilate=(1, 1, 1), h2h_dilate=(1, 1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCDHW', activation='tanh', prefix=None, params=None)

Bases: mxnet.gluon.contrib.rnn.conv_rnn_cell._ConvLSTMCell

3D Convolutional LSTM network cell.

Based on the paper “Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting”, Shi et al., NIPS 2015.

\[\begin{split}\begin{array}{ll} i_t = \sigma(W_i \ast x_t + R_i \ast h_{t-1} + b_i) \\ f_t = \sigma(W_f \ast x_t + R_f \ast h_{t-1} + b_f) \\ o_t = \sigma(W_o \ast x_t + R_o \ast h_{t-1} + b_o) \\ c^\prime_t = tanh(W_c \ast x_t + R_c \ast h_{t-1} + b_c) \\ c_t = f_t \circ c_{t-1} + i_t \circ c^\prime_t \\ h_t = o_t \circ tanh(c_t) \\ \end{array}\end{split}\]
Parameters
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCDHW’ the shape should be (C, D, H, W).

  • hidden_channels (int) – Number of output channels.

  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.

  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.

  • i2h_pad (int or tuple of int, default (0, 0, 0)) – Padding for the input convolution.

  • i2h_dilate (int or tuple of int, default (1, 1, 1)) – Dilation for the input convolution.

  • h2h_dilate (int or tuple of int, default (1, 1, 1)) – Dilation for the recurrent convolution.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.

  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.

  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.

  • conv_layout (str, default 'NCDHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCDHW’ and ‘NDHWC’.

  • activation (str or gluon.Block, default 'tanh') – Type of activation function used in c^prime_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.

  • prefix (str, default 'conv_lstm_') – Prefix for the names of layers (and of weights if params is None).

  • params (RNNParams, default None) – Container for weight sharing between cells. Created if None.

class mxnet.gluon.contrib.rnn.Conv1DGRUCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, ), i2h_dilate=(1, ), h2h_dilate=(1, ), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCW', activation='tanh', prefix=None, params=None)

Bases: mxnet.gluon.contrib.rnn.conv_rnn_cell._ConvGRUCell

1D Convolutional Gated Recurrent Unit (GRU) network cell.

\[\begin{split}\begin{array}{ll} r_t = \sigma(W_r \ast x_t + R_r \ast h_{t-1} + b_r) \\ z_t = \sigma(W_z \ast x_t + R_z \ast h_{t-1} + b_z) \\ n_t = tanh(W_i \ast x_t + b_i + r_t \circ (R_n \ast h_{t-1} + b_n)) \\ h_t = (1 - z_t) \circ n_t + z_t \circ h_{t-1} \\ \end{array}\end{split}\]
Parameters
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCW’ the shape should be (C, W).

  • hidden_channels (int) – Number of output channels.

  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.

  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.

  • i2h_pad (int or tuple of int, default (0,)) – Padding for the input convolution.

  • i2h_dilate (int or tuple of int, default (1,)) – Dilation for the input convolution.

  • h2h_dilate (int or tuple of int, default (1,)) – Dilation for the recurrent convolution.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.

  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.

  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.

  • conv_layout (str, default 'NCW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCW’ and ‘NWC’.

  • activation (str or gluon.Block, default 'tanh') – Type of activation function used in n_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.

  • prefix (str, default 'conv_gru_') – Prefix for the names of layers (and of weights if params is None).

  • params (RNNParams, default None) – Container for weight sharing between cells. Created if None.
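A minimal NumPy sketch of one 1D ConvGRU step for a single sample in layout NCW, following the GRU equations with h_{t-1} as the previous state. The helper names, the (r, z, n) stacking order, stride 1 with "same" padding on both convolutions, and the fixed tanh activation are assumptions of this sketch, not details taken from MXNet. Note that the reset gate r_t scales only the recurrent term inside n_t.

```python
import numpy as np


def conv1d_same(x, w, b):
    """Naive 1D cross-correlation with stride 1 and 'same' padding.

    x: (C_in, W); w: (C_out, C_in, k) with odd k; b: (C_out,)
    """
    k = w.shape[2]
    xp = np.pad(x, ((0, 0), (k // 2, k // 2)))
    width = x.shape[1]
    # windows[:, j, :] is the (C_in, k) patch centred on position j
    windows = np.stack([xp[:, j:j + k] for j in range(width)], axis=1)
    return np.einsum('oik,ijk->oj', w, windows) + b[:, None]


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def conv_gru_step(x, h_prev, W, R, b_i2h, b_h2h):
    """One ConvGRU step; W and R stack the (r, z, n) gates on the output axis."""
    xr, xz, xn = np.split(conv1d_same(x, W, b_i2h), 3, axis=0)
    hr, hz, hn = np.split(conv1d_same(h_prev, R, b_h2h), 3, axis=0)
    r = sigmoid(xr + hr)
    z = sigmoid(xz + hz)
    n = np.tanh(xn + r * hn)              # reset gate scales only the recurrent term
    return (1.0 - z) * n + z * h_prev     # h_t


rng = np.random.default_rng(0)
channels, hidden, width = 2, 3, 10
x = rng.standard_normal((channels, width))
h0 = np.zeros((hidden, width))
W = 0.1 * rng.standard_normal((3 * hidden, channels, 3))
R = 0.1 * rng.standard_normal((3 * hidden, hidden, 3))
b = np.zeros(3 * hidden)
h1 = conv_gru_step(x, h0, W, R, b, b)
print(h1.shape)  # (3, 10)
```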

class mxnet.gluon.contrib.rnn.Conv2DGRUCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0), i2h_dilate=(1, 1), h2h_dilate=(1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCHW', activation='tanh', prefix=None, params=None)

Bases: mxnet.gluon.contrib.rnn.conv_rnn_cell._ConvGRUCell

2D Convolutional Gated Recurrent Unit (GRU) network cell.

\[\begin{split}\begin{array}{ll} r_t = \sigma(W_r \ast x_t + R_r \ast h_{t-1} + b_r) \\ z_t = \sigma(W_z \ast x_t + R_z \ast h_{t-1} + b_z) \\ n_t = tanh(W_i \ast x_t + b_i + r_t \circ (R_n \ast h_{t-1} + b_n)) \\ h_t = (1 - z_t) \circ n_t + z_t \circ h_{t-1} \\ \end{array}\end{split}\]
Parameters
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCHW’ the shape should be (C, H, W).

  • hidden_channels (int) – Number of output channels.

  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.

  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.

  • i2h_pad (int or tuple of int, default (0, 0)) – Padding for the input convolution.

  • i2h_dilate (int or tuple of int, default (1, 1)) – Dilation for the input convolution.

  • h2h_dilate (int or tuple of int, default (1, 1)) – Dilation for the recurrent convolution.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.

  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.

  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.

  • conv_layout (str, default 'NCHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCHW’ and ‘NHWC’.

  • activation (str or gluon.Block, default 'tanh') – Type of activation function used in n_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.

  • prefix (str, default 'conv_gru_') – Prefix for the names of layers (and of weights if params is None).

  • params (RNNParams, default None) – Container for weight sharing between cells. Created if None.

class mxnet.gluon.contrib.rnn.Conv3DGRUCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0, 0), i2h_dilate=(1, 1, 1), h2h_dilate=(1, 1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCDHW', activation='tanh', prefix=None, params=None)

Bases: mxnet.gluon.contrib.rnn.conv_rnn_cell._ConvGRUCell

3D Convolutional Gated Recurrent Unit (GRU) network cell.

\[\begin{split}\begin{array}{ll} r_t = \sigma(W_r \ast x_t + R_r \ast h_{t-1} + b_r) \\ z_t = \sigma(W_z \ast x_t + R_z \ast h_{t-1} + b_z) \\ n_t = tanh(W_i \ast x_t + b_i + r_t \circ (R_n \ast h_{t-1} + b_n)) \\ h_t = (1 - z_t) \circ n_t + z_t \circ h_{t-1} \\ \end{array}\end{split}\]
Parameters
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCDHW’ the shape should be (C, D, H, W).

  • hidden_channels (int) – Number of output channels.

  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.

  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.

  • i2h_pad (int or tuple of int, default (0, 0, 0)) – Padding for the input convolution.

  • i2h_dilate (int or tuple of int, default (1, 1, 1)) – Dilation for the input convolution.

  • h2h_dilate (int or tuple of int, default (1, 1, 1)) – Dilation for the recurrent convolution.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.

  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.

  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.

  • conv_layout (str, default 'NCDHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCDHW’ and ‘NDHWC’.

  • activation (str or gluon.Block, default 'tanh') – Type of activation function used in n_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.

  • prefix (str, default 'conv_gru_') – Prefix for the names of layers (and of weights if params is None).

  • params (RNNParams, default None) – Container for weight sharing between cells. Created if None.

class mxnet.gluon.contrib.rnn.VariationalDropoutCell(base_cell, drop_inputs=0.0, drop_states=0.0, drop_outputs=0.0)

Bases: mxnet.gluon.rnn.rnn_cell.ModifierCell

Applies Variational Dropout on base cell. https://arxiv.org/pdf/1512.05287.pdf

Variational dropout uses the same dropout mask across time-steps. It can be applied to RNN inputs, outputs, and states. The masks for them are not shared.

The dropout mask is initialized when stepping forward for the first time and remains the same until .reset() is called. Thus, if you step the cell manually instead of calling .unroll(), call .reset() after each sequence.

Parameters
  • base_cell (RecurrentCell) – The cell on which to perform variational dropout.

  • drop_inputs (float, default 0.) – The dropout rate for inputs. Won’t apply dropout if it equals 0.

  • drop_states (float, default 0.) – The dropout rate for state inputs on the first state channel. Won’t apply dropout if it equals 0.

  • drop_outputs (float, default 0.) – The dropout rate for outputs. Won’t apply dropout if it equals 0.
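The key difference from ordinary dropout is that one mask is drawn per sequence and reused at every time step. The NumPy sketch below illustrates this for the inputs only. The class name is hypothetical, inverted-dropout scaling by 1 / (1 - rate) is an assumption, and it does not reproduce how MXNet samples its masks.

```python
import numpy as np


class VariationalInputDropout:
    """Hypothetical helper: reuse one dropout mask until reset() is called."""

    def __init__(self, rate, seed=0):
        self.rate = rate
        self.rng = np.random.default_rng(seed)
        self.mask = None

    def reset(self):
        self.mask = None  # the next call draws a fresh mask (new sequence)

    def __call__(self, x_t):
        if self.rate == 0:
            return x_t  # no dropout when the rate is 0
        if self.mask is None:
            keep = self.rng.random(x_t.shape) >= self.rate
            self.mask = keep / (1.0 - self.rate)  # inverted-dropout scaling
        return x_t * self.mask


drop = VariationalInputDropout(rate=0.5)
x = np.ones((2, 6))                    # (batch_size, features) at each step
steps = [drop(x) for _ in range(3)]    # the same mask is applied at all three steps
print(all(np.array_equal(steps[0], s) for s in steps))  # True
drop.reset()                           # a new sequence gets a new mask
```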

hybrid_forward(F, inputs, states)

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

reset()

Reset before re-using the cell for another graph.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)

Unrolls an RNN cell across time steps.

Parameters
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, returns whichever form is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences may be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element is the length of the ith sequence in the batch. The last valid state is returned and the padded outputs are masked with 0. Note that valid_length must be less than or equal to length.

Returns

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is the same as the output of begin_state().
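The valid_length behaviour can be illustrated with plain NumPy: outputs past each sequence's valid length are zeroed, and the state from step valid_length - 1 is returned. The function below is a hypothetical sketch that operates on already-computed per-step outputs and states in layout 'NTC'; it is not how unroll() is implemented, and it assumes every valid_length is at least 1.

```python
import numpy as np


def apply_valid_length(outputs, states, valid_length):
    """outputs, states: (batch_size, length, C); valid_length: (batch_size,) ints >= 1."""
    batch_size, length, _ = outputs.shape
    assert np.all(valid_length <= length), "valid_length must be <= length"
    steps = np.arange(length)[None, :]                    # (1, length)
    mask = (steps < valid_length[:, None])[:, :, None]    # (batch, length, 1)
    masked_outputs = np.where(mask, outputs, 0.0)         # padded steps -> 0
    last_state = states[np.arange(batch_size), valid_length - 1]  # (batch, C)
    return masked_outputs, last_state


outputs = np.arange(1, 13, dtype=float).reshape(2, 3, 2)
states = outputs * 10
out, last = apply_valid_length(outputs, states, np.array([3, 1]))
# Sequence 0 keeps all three steps; sequence 1 keeps only its first step,
# and its returned state comes from step 0 rather than the last step.
```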

class mxnet.gluon.contrib.rnn.LSTMPCell(hidden_size, projection_size, i2h_weight_initializer=None, h2h_weight_initializer=None, h2r_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, prefix=None, params=None)

Bases: mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell

Long Short-Term Memory Projected (LSTMP) network cell. (https://arxiv.org/abs/1402.1128)

Each call computes the following function:

\[\begin{split}\begin{array}{ll} i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{ri} r_{(t-1)} + b_{ri}) \\ f_t = sigmoid(W_{if} x_t + b_{if} + W_{rf} r_{(t-1)} + b_{rf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{rg} r_{(t-1)} + b_{rg}) \\ o_t = sigmoid(W_{io} x_t + b_{io} + W_{ro} r_{(t-1)} + b_{ro}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \\ r_t = W_{hr} h_t \end{array}\end{split}\]

where \(r_t\) is the projected recurrent activation at time t, \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the input at time t, and \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and output gates, respectively.

Parameters
  • hidden_size (int) – Number of units in cell state symbol.

  • projection_size (int) – Number of units in output symbol.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the hidden state.

  • h2r_weight_initializer (str or Initializer) – Initializer for the projection weights matrix, used for the linear transformation of the recurrent state.

  • i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the input-to-hidden bias vector.

  • h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the hidden-to-hidden (recurrent) bias vector.

  • prefix (str, default 'lstmp_') – Prefix for the names of Blocks (and of weights if params is None).

  • params (Parameter or None) – Container for weight sharing between cells. Created if None.

Inputs
  • data: input tensor with shape (batch_size, input_size).

  • states: a list of two initial recurrent state tensors, with shapes (batch_size, projection_size) and (batch_size, hidden_size), respectively.

Outputs
  • out: output tensor with shape (batch_size, projection_size).

  • next_states: a list of two output recurrent state tensors, with the same shapes as states.
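A minimal NumPy sketch of one LSTMP step, following the LSTMP equations, mainly to show the shapes: the recurrent input r_{t-1} and the output r_t have projection_size units, while the cell state has hidden_size units. The gate stacking order (i, f, g, o), the separate i2h/h2h biases, and the function name are assumptions of this sketch rather than details of MXNet's implementation.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def lstmp_step(x, r_prev, c_prev, i2h_w, h2h_w, h2r_w, i2h_b, h2h_b):
    """x: (N, input_size); r_prev: (N, P); c_prev: (N, H).

    i2h_w: (4H, input_size); h2h_w: (4H, P); h2r_w: (P, H); biases: (4H,)
    """
    gates = x @ i2h_w.T + i2h_b + r_prev @ h2h_w.T + h2h_b   # (N, 4H)
    i, f, g, o = np.split(gates, 4, axis=1)
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    r = h @ h2r_w.T                       # projection from H to P units
    return r, [r, c]                      # output and next states


rng = np.random.default_rng(0)
N, D, H, P = 2, 5, 8, 3     # batch_size, input_size, hidden_size, projection_size
x = rng.standard_normal((N, D))
states = [np.zeros((N, P)), np.zeros((N, H))]
params = dict(
    i2h_w=0.1 * rng.standard_normal((4 * H, D)),
    h2h_w=0.1 * rng.standard_normal((4 * H, P)),
    h2r_w=0.1 * rng.standard_normal((P, H)),
    i2h_b=np.zeros(4 * H),
    h2h_b=np.zeros(4 * H),
)
out, next_states = lstmp_step(x, *states, **params)
print(out.shape, next_states[0].shape, next_states[1].shape)  # (2, 3) (2, 3) (2, 8)
```

Because the recurrent weights act on the projected state, h2h_w has shape (4H, P) instead of (4H, H), which is where LSTMP saves parameters.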

hybrid_forward(F, inputs, states, i2h_weight, h2h_weight, h2r_weight, i2h_bias, h2h_bias)

Overrides to construct symbolic graph for this Block.

Parameters
  • x (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.

state_info(batch_size=0)

Shape and layout information of the states.

Dataset sampler.

class mxnet.gluon.contrib.data.sampler.IntervalSampler(length, interval, rollover=True)

Bases: mxnet.gluon.data.sampler.Sampler

Samples elements from [0, length) at fixed intervals.

Parameters
  • length (int) – Length of the sequence.

  • interval (int) – The sampling interval, i.e. the distance between two consecutive samples.

  • rollover (bool, default True) – Whether to start again from the first skipped item after reaching the end. If true, this sampler would start again from the first skipped item until all items are visited. Otherwise, iteration stops when end is reached and skipped items are ignored.

Examples

>>> from mxnet.gluon import contrib
>>> sampler = contrib.data.IntervalSampler(13, interval=3)
>>> list(sampler)
[0, 3, 6, 9, 12, 1, 4, 7, 10, 2, 5, 8, 11]
>>> sampler = contrib.data.IntervalSampler(13, interval=3, rollover=False)
>>> list(sampler)
[0, 3, 6, 9, 12]
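The sampling order in these examples can be reproduced with a few lines of plain Python. This is a hypothetical re-implementation for illustration only, not MXNet's code; it treats interval as the stride between consecutive samples, and each rollover pass starts one item later than the previous one.

```python
def interval_order(length, interval, rollover=True):
    """Indices from [0, length) visited at fixed intervals."""
    if not rollover:
        return list(range(0, length, interval))
    order = []
    for start in range(min(interval, length)):  # each pass begins one item later
        order.extend(range(start, length, interval))
    return order


print(interval_order(13, 3))                  # [0, 3, 6, 9, 12, 1, 4, 7, 10, 2, 5, 8, 11]
print(interval_order(13, 3, rollover=False))  # [0, 3, 6, 9, 12]
```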

Text datasets.

class mxnet.gluon.contrib.data.text.WikiText2(root=os.path.join(base.data_dir(), 'datasets', 'wikitext-2'), segment='train', vocab=None, seq_len=35)

Bases: mxnet.gluon.contrib.data.text._WikiText

WikiText-2 word-level dataset for language modeling, from Salesforce research.

From https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset/

License: Creative Commons Attribution-ShareAlike

Each sample is a vector of length equal to the specified sequence length. At the end of each sentence, an end-of-sentence token ‘<eos>’ is added.

Parameters
  • root (str, default $MXNET_HOME/datasets/wikitext-2) – Path to temp folder for storing data.

  • segment (str, default 'train') – Dataset segment. Options are ‘train’, ‘validation’, ‘test’.

  • vocab (Vocabulary, default None) – The vocabulary to use for indexing the text dataset. If None, a default vocabulary is created.

  • seq_len (int, default 35) – The sequence length of each sample, regardless of the sentence boundary.
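To illustrate how samples are formed, the sketch below appends '<eos>' to each sentence, concatenates the tokens into one stream, and cuts it into fixed-length pieces without regard to sentence boundaries. It is a simplified, hypothetical illustration: the function name, whitespace tokenization, and dropping the incomplete tail are assumptions, and the real dataset additionally maps tokens to indices with vocab.

```python
def make_samples(sentences, seq_len=35):
    """Split a list of sentences into token samples of exactly seq_len tokens."""
    tokens = []
    for sentence in sentences:
        tokens.extend(sentence.split())
        tokens.append('<eos>')               # end-of-sentence marker
    n_full = len(tokens) // seq_len          # drop the incomplete tail (assumption)
    return [tokens[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]


samples = make_samples(["the cat sat", "it rained"], seq_len=3)
print(samples)  # [['the', 'cat', 'sat'], ['<eos>', 'it', 'rained']]
```

Note how the second sample starts with the '<eos>' of the first sentence: samples do not align with sentence boundaries.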

class mxnet.gluon.contrib.data.text.WikiText103(root=os.path.join(base.data_dir(), 'datasets', 'wikitext-103'), segment='train', vocab=None, seq_len=35)

Bases: mxnet.gluon.contrib.data.text._WikiText

WikiText-103 word-level dataset for language modeling, from Salesforce research.

From https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset/

License: Creative Commons Attribution-ShareAlike

Each sample is a vector of length equal to the specified sequence length. At the end of each sentence, an end-of-sentence token ‘<eos>’ is added.

Parameters
  • root (str, default $MXNET_HOME/datasets/wikitext-103) – Path to temp folder for storing data.

  • segment (str, default 'train') – Dataset segment. Options are ‘train’, ‘validation’, ‘test’.

  • vocab (Vocabulary, default None) – The vocabulary to use for indexing the text dataset. If None, a default vocabulary is created.

  • seq_len (int, default 35) – The sequence length of each sample, regardless of the sentence boundary.

Gluon Estimator Module

class mxnet.gluon.contrib.estimator.BatchProcessor

Bases: object

BatchProcessor class for plug-and-play fit_batch and evaluate_batch.

During training or validation, data are divided into minibatches for processing. This class provides hooks for training or validating on a minibatch of data. Users may provide customized fit_batch() and evaluate_batch() methods by inheriting from this class and overriding these methods.

BatchProcessor can be used to replace fit_batch() and evaluate_batch() in the base Estimator class.

evaluate_batch(estimator, val_batch, batch_axis=0)

Evaluate the estimator model on a batch of validation data.

Parameters
  • estimator (Estimator) – Reference to the estimator

  • val_batch (tuple) – Data and label of a batch from the validation data loader.

  • batch_axis (int, default 0) – Batch axis to split the validation data into devices.

fit_batch(estimator, train_batch, batch_axis=0)

Trains the estimator model on a batch of training data.

Parameters
  • estimator (Estimator) – Reference to the estimator

  • train_batch (tuple) – Data and label of a batch from the training data loader.

  • batch_axis (int, default 0) – Batch axis to split the training data into devices.

Returns

  • data (List of NDArray) – Sharded data from the batch. Data is sharded with gluon.split_and_load.

  • label (List of NDArray) – Sharded label from the batch. Labels are sharded with gluon.split_and_load.

  • pred (List of NDArray) – Prediction on each of the sharded inputs.

  • loss (List of NDArray) – Loss on each of the sharded inputs.

class mxnet.gluon.contrib.estimator.CheckpointHandler(model_dir, model_prefix='model', monitor=None, verbose=0, save_best=False, mode='auto', epoch_period=1, batch_period=None, max_checkpoints=5, resume_from_checkpoint=False)

Bases: mxnet.gluon.contrib.estimator.event_handler.TrainBegin, mxnet.gluon.contrib.estimator.event_handler.BatchEnd, mxnet.gluon.contrib.estimator.event_handler.EpochEnd

Save the model after a user-defined period.

CheckpointHandler saves the network architecture after the first batch if the model can be fully hybridized, and saves model parameters and trainer states after each user-defined period (by default, every epoch).

Parameters
  • model_dir (str) – File directory to save all the model related files including model architecture, model parameters, and trainer states.

  • model_prefix (str, default 'model') – Prefix to add to all checkpoint file names.

  • monitor (EvalMetric, default None) – The metric to monitor to determine whether the model has improved.

  • verbose (int, default 0) – Verbosity mode; 1 means the user is informed every time a checkpoint is saved.

  • save_best (bool, default False) – If True, monitor must not be None, and CheckpointHandler will save the model parameters and trainer states with the best monitored value.

  • mode (str, default 'auto') – One of {'auto', 'min', 'max'}. If save_best=True, this is the comparison used to determine whether the monitored value has improved. In 'auto' mode, CheckpointHandler tries to choose min or max based on the monitored metric name.

  • epoch_period (int, default 1) – Epoch intervals between saving the network. By default, checkpoints are saved every epoch.

  • batch_period (int, default None) – Batch intervals between saving the network. By default, checkpoints are not saved based on the number of batches.

  • max_checkpoints (int, default 5) – Maximum number of checkpoint files to keep in model_dir; older checkpoints are removed. The best checkpoint file is not counted.

  • resume_from_checkpoint (bool, default False) – Whether to resume training from a checkpoint in model_dir. If True and checkpoints are found, CheckpointHandler loads the net parameters and trainer states and trains the remaining epochs and batches.
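The max_checkpoints rule amounts to a bounded queue of file names. The sketch below is hypothetical and does not touch the file system: the rotate() helper and the 'model-epoch{N}.params' naming are assumptions made for illustration, not the file names CheckpointHandler writes, and the separately kept best checkpoint is not modelled.

```python
from collections import deque


def rotate(saved, new_file, max_checkpoints=5):
    """Record new_file and return the files that would be deleted."""
    saved.append(new_file)
    removed = []
    while len(saved) > max_checkpoints:
        removed.append(saved.popleft())   # the oldest checkpoint goes first
    return removed


saved = deque()
deleted = []
for epoch in range(7):                    # one checkpoint per epoch (epoch_period=1)
    deleted += rotate(saved, f"model-epoch{epoch}.params", max_checkpoints=5)
print(list(saved))   # the five most recent checkpoints, epochs 2 to 6
print(deleted)       # ['model-epoch0.params', 'model-epoch1.params']
```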

class mxnet.gluon.contrib.estimator.EarlyStoppingHandler(monitor, min_delta=0, patience=0, mode='auto', baseline=None)

Bases: mxnet.gluon.contrib.estimator.event_handler.TrainBegin, mxnet.gluon.contrib.estimator.event_handler.EpochEnd, mxnet.gluon.contrib.estimator.event_handler.TrainEnd

Stop training early if the monitored value is not improving.

Parameters
  • monitor (EvalMetric) – The metric to monitor, and stop training if this metric does not improve.

  • min_delta (float, default 0) – Minimal change in monitored value to be considered as an improvement.

  • patience (int, default 0) – Number of epochs to wait for improvement before terminating training.

  • mode (str, default 'auto') – One of {'auto', 'min', 'max'}: the comparison used to determine whether the monitored value has improved. In 'auto' mode, EarlyStoppingHandler tries to choose min or max based on the monitored metric name.

  • baseline (float, default None) – Baseline value to compare the monitored value with.
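One way to read min_delta, patience, mode, and baseline together is the loop below. It is a simplified, hypothetical sketch of early-stopping logic, not EarlyStoppingHandler's actual code: 'min' means lower is better, 'max' means higher is better, 'auto' is omitted, and using baseline as the initial best value is an assumption.

```python
import math


def stop_epoch(values, mode='min', min_delta=0.0, patience=0, baseline=None):
    """Return the index of the epoch after which training stops, or None."""
    sign = 1.0 if mode == 'max' else -1.0       # 'max': higher is better
    best = baseline if baseline is not None else -sign * math.inf
    wait = 0
    for epoch, value in enumerate(values):
        if sign * (value - best) > min_delta:   # better by more than min_delta
            best, wait = value, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Validation loss improves twice, then stalls.
losses = [1.0, 0.8, 0.79, 0.80, 0.81]
print(stop_epoch(losses, mode='min', min_delta=0.05, patience=2))  # 3
```

With min_delta=0.05, the drop from 0.8 to 0.79 does not count as an improvement, so epochs 2 and 3 are both non-improving and training stops after epoch 3.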

class mxnet.gluon.contrib.estimator.Estimator(net, loss, train_metrics=None, val_metrics=None, initializer=None, trainer=None, context=None, val_net=None, val_loss=None, batch_processor=None)

Bases: object

Estimator class for easy model training.

Estimator can be used to facilitate the training and validation process.

Parameters
  • net (gluon.Block) – The model used for training.

  • loss (gluon.loss.Loss) – Loss (objective) function to calculate during training.

  • train_metrics (EvalMetric or list of EvalMetric) – Training metrics for evaluating models on the training dataset.

  • val_metrics (EvalMetric or list of EvalMetric) – Validation metrics for evaluating models on the validation dataset.

  • initializer (Initializer) – Initializer to initialize the network.

  • trainer (Trainer) – Trainer to apply optimizer on network parameters.

  • context (Context or list of Context) – Device(s) to run the training on.

  • val_net (gluon.Block) –

    The model used for validation. The validation model does not necessarily belong to the same model class as the training model. But the two models typically share the same architecture. Therefore the validation model can reuse parameters of the training model.

    The following example constructs a val_net that shares the network parameters of the training net:

    >>> net = _get_train_network()
    >>> val_net = _get_test_network(params=net.collect_params())
    >>> net.initialize(ctx=ctx)
    >>> est = Estimator(net, loss, val_net=val_net)
    

    Weight sharing between two networks requires matching parameter namespaces. Most networks inheriting from Block share their parameters correctly. An exception is Sequential networks, for which the Block scope must be specified for correct weight sharing. For details on naming in the MXNet Gluon API, see https://mxnet.apache.org/api/python/docs/tutorials/packages/gluon/blocks/naming.html.

  • val_loss (gluon.loss.Loss) – Loss (objective) function to calculate during validation. If None, the same loss function as self.loss is used.

  • batch_processor (BatchProcessor) – BatchProcessor provides customized fit_batch() and evaluate_batch() methods

evaluate(val_data, batch_axis=0, event_handlers=None)

Evaluate model on validation data.

This function calls evaluate_batch() on each of the batches from the validation data loader. Thus, for custom use cases, it’s possible to inherit the estimator class and override evaluate_batch().

Parameters
  • val_data (DataLoader) – Validation data loader with data and labels.

  • batch_axis (int, default 0) – Batch axis to split the validation data into devices.

  • event_handlers (EventHandler or list of EventHandler) – List of EventHandlers to apply during validation. Besides event handlers specified here, a default MetricHandler and a LoggingHandler will be added if not specified explicitly.

fit(train_data, val_data=None, epochs=None, event_handlers=None, batches=None, batch_axis=0)

Trains the model with a given DataLoader for a specified number of epochs or batches. The batch size is inferred from the data loader’s batch_size.

This function calls fit_batch() on each of the batches from the training data loader. Thus, for custom use cases, it’s possible to inherit the estimator class and override fit_batch().

Parameters
  • train_data (DataLoader) – Training data loader with data and labels.

  • val_data (DataLoader, default None) – Validation data loader with data and labels.

  • epochs (int, default None) – Number of epochs to iterate on the training data. Specify exactly one of epochs or batches.

  • event_handlers (EventHandler or list of EventHandler) – List of EventHandlers to apply during training. Besides the event handlers specified here, a StoppingHandler, LoggingHandler and MetricHandler will be added by default if not yet specified manually. If validation data is provided, a ValidationHandler is also added if not already specified.

  • batches (int, default None) – Number of batches to iterate on the training data. Specify exactly one of epochs or batches.

  • batch_axis (int, default 0) – Batch axis to split the training data into devices.

logger = None

logging.Logger object associated with the Estimator.

The logger is used for all logs generated by this estimator and its handlers. A new logging.Logger is created during Estimator construction and configured to write all logs with level logging.INFO or higher to sys.stdout.

You can modify the logging settings using the standard Python methods. For example, to save logs to a file in addition to printing them to stdout output, you can attach a logging.FileHandler to the logger.

>>> import logging
>>> est = Estimator(net, loss)
>>> est.logger.addHandler(logging.FileHandler('train.log'))

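The logger configuration described above can be reproduced with the standard library alone. This is a sketch under assumptions: the logger name "toy_estimator" and the temporary log path are made up for illustration, and the code imitates, rather than calls, what the Estimator does on construction.

```python
import logging
import os
import sys
import tempfile

# Hypothetical stand-in for the logger an Estimator creates on construction:
# messages at logging.INFO or higher are written to sys.stdout.
logger = logging.getLogger("toy_estimator")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(sys.stdout))

# Additionally save logs to a file by attaching a logging.FileHandler.
log_path = os.path.join(tempfile.mkdtemp(), "train.log")
file_handler = logging.FileHandler(log_path)
logger.addHandler(file_handler)

logger.info("epoch 1 finished")
logger.debug("below INFO, so it is dropped")
file_handler.flush()
```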
class mxnet.gluon.contrib.estimator.GradientUpdateHandler(priority=-2000)[source]

Bases: mxnet.gluon.contrib.estimator.event_handler.BatchEnd

Gradient Update Handler that applies gradients to network weights.

GradientUpdateHandler takes a priority level and updates the weight parameters at the end of each batch.

Parameters
  • priority (scalar, default -2000) – Priority level of the GradientUpdateHandler. Priority levels are sorted in ascending order: the lower the number, the higher the handler's priority.

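The ascending priority order can be shown with a plain sort. The default priorities below come from the signatures in this section (GradientUpdateHandler -2000, MetricHandler -1000, LoggingHandler inf); the sorting code itself is an illustrative sketch, not MXNet's dispatch logic.

```python
# Illustrative sketch: handlers sorted by ascending priority, so the lowest
# number runs first. Priorities are the documented defaults.
handlers = [
    ("LoggingHandler", float("inf")),
    ("MetricHandler", -1000),
    ("GradientUpdateHandler", -2000),
]

run_order = [name for name, priority in sorted(handlers, key=lambda h: h[1])]
print(run_order)  # ['GradientUpdateHandler', 'MetricHandler', 'LoggingHandler']
```

With these defaults, weights are updated before metrics are computed, and logging runs last.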

class mxnet.gluon.contrib.estimator.LoggingHandler(log_interval='epoch', metrics=None, priority=inf)[source]

Bases: mxnet.gluon.contrib.estimator.event_handler.TrainBegin, mxnet.gluon.contrib.estimator.event_handler.TrainEnd, mxnet.gluon.contrib.estimator.event_handler.EpochBegin, mxnet.gluon.contrib.estimator.event_handler.EpochEnd, mxnet.gluon.contrib.estimator.event_handler.BatchBegin, mxnet.gluon.contrib.estimator.event_handler.BatchEnd

Basic Logging Handler that applies to every Gluon estimator by default.

LoggingHandler logs hyper-parameters, training statistics, and other useful information during training.

Parameters
  • log_interval (int or str, default 'epoch') – Logging interval during training. log_interval='epoch' displays metrics every epoch; log_interval=k (an integer) displays metrics every k batches.

  • metrics (list of EvalMetrics) – Metrics to be logged at batch end, epoch end, and train end.

  • priority (scalar, default np.Inf) – Priority level of the LoggingHandler. Priority levels are sorted in ascending order: the lower the number, the higher the handler's priority.
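The batch-end behavior of log_interval can be summarized with a small helper. `logs_at_batch_end` is a hypothetical function written for illustration, not part of MXNet, and it assumes batches are counted from 1.

```python
# Hypothetical helper (not MXNet code) showing when metrics would be displayed
# at batch end for each documented form of log_interval.
def logs_at_batch_end(log_interval, batch_index):
    # 'epoch': nothing at batch end; metrics are displayed once per epoch.
    if log_interval == "epoch":
        return False
    # Integer k: display metrics after every k-th batch (1-based count).
    return batch_index % log_interval == 0


every_third = [i for i in range(1, 11) if logs_at_batch_end(3, i)]
print(every_third)  # [3, 6, 9]
```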

class mxnet.gluon.contrib.estimator.MetricHandler(metrics, priority=-1000)[source]

Bases: mxnet.gluon.contrib.estimator.event_handler.EpochBegin, mxnet.gluon.contrib.estimator.event_handler.BatchEnd

Metric Handler that updates metric values at batch end.

MetricHandler takes model predictions and true labels and updates the metrics; it also updates the loss metric wrapper with the loss values. Validation loss and metrics are handled by ValidationHandler.

Parameters
  • metrics (List of EvalMetrics) – Metrics to be updated at batch end.

  • priority (scalar, default -1000) – Priority level of the MetricHandler. Priority levels are sorted in ascending order: the lower the number, the higher the handler's priority.
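The batch-end update pattern can be sketched in pure Python. `ToyAccuracy` and `ToyMetricHandler` are hypothetical stand-ins for an EvalMetric and for MetricHandler. The sketch assumes that the EpochBegin hook resets the metrics, which is what the EpochBegin base class suggests but is not stated explicitly above.

```python
class ToyAccuracy:
    """Hypothetical stand-in for an EvalMetric: tracks a running accuracy."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.correct = 0
        self.total = 0

    def update(self, labels, preds):
        self.correct += sum(int(l == p) for l, p in zip(labels, preds))
        self.total += len(labels)

    def get(self):
        return self.correct / self.total if self.total else float("nan")


class ToyMetricHandler:
    """Hypothetical sketch: reset metrics at epoch begin (assumed), update at batch end."""

    def __init__(self, metrics):
        self.metrics = metrics

    def epoch_begin(self):
        for metric in self.metrics:
            metric.reset()

    def batch_end(self, labels, preds):
        for metric in self.metrics:
            metric.update(labels, preds)


acc = ToyAccuracy()
handler = ToyMetricHandler([acc])
handler.epoch_begin()
handler.batch_end(labels=[1, 0, 1, 1], preds=[1, 1, 1, 0])  # 2 of 4 correct
handler.batch_end(labels=[0, 0], preds=[0, 0])              # 2 of 2 correct
print(acc.get())  # running accuracy: 4 correct out of 6
```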

class mxnet.gluon.contrib.estimator.StoppingHandler(max_epoch=None, max_batch=None)[source]

Bases: mxnet.gluon.contrib.estimator.event_handler.TrainBegin, mxnet.gluon.contrib.estimator.event_handler.BatchEnd, mxnet.gluon.contrib.estimator.event_handler.EpochEnd

Stop conditions for training: stop when the maximum number of batches or epochs is reached.

Parameters
  • max_epoch (int, default None) – Maximum number of epochs to train.

  • max_batch (int, default None) – Maximum number of batches to train.
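The two stop conditions can be illustrated with a hypothetical toy handler and driver loop. Neither `ToyStoppingHandler` nor `run` is MXNet code; the attribute name `stop_training` is an assumption made for this sketch.

```python
class ToyStoppingHandler:
    """Hypothetical sketch of the documented stop conditions (not MXNet code)."""

    def __init__(self, max_epoch=None, max_batch=None):
        self.max_epoch = max_epoch
        self.max_batch = max_batch
        self.current_epoch = 0
        self.current_batch = 0
        self.stop_training = False

    def batch_end(self):
        self.current_batch += 1
        if self.max_batch is not None and self.current_batch >= self.max_batch:
            self.stop_training = True

    def epoch_end(self):
        self.current_epoch += 1
        if self.max_epoch is not None and self.current_epoch >= self.max_epoch:
            self.stop_training = True


def run(handler, batches_per_epoch):
    """Drive the handler until it asks to stop; return the batches processed."""
    if handler.max_epoch is None and handler.max_batch is None:
        raise ValueError("set max_epoch or max_batch, or the loop never ends")
    processed = 0
    while not handler.stop_training:
        for _ in range(batches_per_epoch):
            processed += 1
            handler.batch_end()
            if handler.stop_training:
                break  # stop mid-epoch once max_batch is reached
        else:
            handler.epoch_end()  # only reached when the epoch completed
    return processed


epochs_run = run(ToyStoppingHandler(max_epoch=2), batches_per_epoch=4)
batches_run = run(ToyStoppingHandler(max_batch=5), batches_per_epoch=4)
```

With 4 batches per epoch, max_epoch=2 stops after 8 batches, while max_batch=5 stops partway through the second epoch.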

class mxnet.gluon.contrib.estimator.ValidationHandler(val_data, eval_fn, epoch_period=1, batch_period=None, priority=-1000, event_handlers=None)[source]

Bases: mxnet.gluon.contrib.estimator.event_handler.TrainBegin, mxnet.gluon.contrib.estimator.event_handler.BatchEnd, mxnet.gluon.contrib.estimator.event_handler.EpochEnd

Validation Handler that evaluates the model on a validation dataset.

ValidationHandler takes a validation dataset, an evaluation function, metrics to be evaluated, and how often to run validation. You can provide a custom evaluation function or use the one provided by Estimator.

Parameters
  • val_data (DataLoader) – Validation data set to run evaluation.

  • eval_fn (function) – A function that defines how to run evaluation and calculate loss and metrics.

  • epoch_period (int, default 1) – How often to run validation at epoch end; by default, ValidationHandler validates every epoch.

  • batch_period (int, default None) – How often to run validation at batch end; by default, ValidationHandler does not validate at batch end.

  • priority (scalar, default -1000) – Priority level of the ValidationHandler. Priority levels are sorted in ascending order: the lower the number, the higher the handler's priority.

  • event_handlers (EventHandler or list of EventHandlers) – List of EventHandlers to apply during validation. This argument is used by the self.eval_fn function to process customized event handlers.
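How epoch_period and batch_period combine can be sketched as a schedule. `validation_points` is a hypothetical helper, not MXNet code. It assumes batches are counted across epochs (not reset each epoch), and it accepts epoch_period=None to disable epoch-end validation, which is a convenience of this sketch only.

```python
def validation_points(num_epochs, batches_per_epoch, epoch_period=1, batch_period=None):
    """Hypothetical sketch: list the points at which validation would run.

    ("batch", n) means after the n-th batch overall; ("epoch", e) means after epoch e.
    """
    points = []
    batch = 0
    for epoch in range(1, num_epochs + 1):
        for _ in range(batches_per_epoch):
            batch += 1
            if batch_period is not None and batch % batch_period == 0:
                points.append(("batch", batch))
        if epoch_period is not None and epoch % epoch_period == 0:
            points.append(("epoch", epoch))
    return points


print(validation_points(2, 3))  # [('epoch', 1), ('epoch', 2)]
```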