Gluon Recurrent Neural Network API

Overview

This document lists the recurrent neural network API in Gluon:

Recurrent Layers

Recurrent layers can be used in Sequential with other regular neural network layers. For example, to construct a sequence labeling model where a prediction is made for each time-step:

model = mx.gluon.nn.Sequential()
with model.name_scope():
    model.add(mx.gluon.nn.Embedding(30, 10))
    model.add(mx.gluon.rnn.LSTM(20))
    model.add(mx.gluon.nn.Dense(5, flatten=False))
model.initialize()
model(mx.nd.ones((2,3)))
RNN Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.
LSTM Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.
GRU Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.

Recurrent Cells

Recurrent cells allows fine-grained control when defining recurrent models. User can explicit step and unroll to construct complex networks. It provides more flexibility but is slower than recurrent layers. Recurrent cells can be stacked with SequentialRNNCell:

model = mx.gluon.rnn.SequentialRNNCell()
with model.name_scope():
    model.add(mx.gluon.rnn.LSTMCell(20))
    model.add(mx.gluon.rnn.LSTMCell(20))
states = model.begin_state(batch_size=32)
inputs = mx.nd.random.uniform(shape=(5, 32, 10))
outputs = []
for i in range(5):
    output, states = model(inputs[i], states)
    outputs.append(output)
RNNCell Elman RNN recurrent neural network cell.
LSTMCell Long-Short Term Memory (LSTM) network cell.
GRUCell Gated Rectified Unit (GRU) network cell.
RecurrentCell Abstract base class for RNN cells
SequentialRNNCell Sequentially stacking multiple RNN cells.
BidirectionalCell Bidirectional RNN cell.
DropoutCell Applies dropout on input.
ZoneoutCell Applies Zoneout on base cell.
ResidualCell Adds residual connection as described in Wu et al, 2016 (https://arxiv.org/abs/1609.08144).

API Reference

Recurrent neural network module.

class mxnet.gluon.rnn.BidirectionalCell(l_cell, r_cell, output_prefix='bi_')[source]

Bidirectional RNN cell.

Parameters:
class mxnet.gluon.rnn.DropoutCell(rate, axes=(), prefix=None, params=None)[source]

Applies dropout on input.

Parameters:
  • rate (float) – Percentage of elements to drop out, which is 1 - percentage to retain.
  • axes (tuple of int, default ()) – The axes on which dropout mask is shared. If empty, regular dropout is applied.
Inputs:
  • data: input tensor with shape (batch_size, size).
  • states: a list of recurrent state tensors.
Outputs:
  • out: output tensor with shape (batch_size, size).
  • next_states: returns input states directly.
class mxnet.gluon.rnn.GRU(hidden_size, num_layers=1, layout='TNC', dropout=0, bidirectional=False, input_size=0, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', **kwargs)[source]

Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.

For each element in the input sequence, each layer computes the following function:

\[\begin{split}\begin{array}{ll} r_t = sigmoid(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\ i_t = sigmoid(W_{ii} x_t + b_{ii} + W_hi h_{(t-1)} + b_{hi}) \\ n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)}+ b_{hn})) \\ h_t = (1 - i_t) * n_t + i_t * h_{(t-1)} \\ \end{array}\end{split}\]

where \(h_t\) is the hidden state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(r_t\), \(i_t\), \(n_t\) are the reset, input, and new gates, respectively.

Parameters:
  • hidden_size (int) – The number of features in the hidden state h
  • num_layers (int, default 1) – Number of recurrent layers.
  • layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.
  • dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer
  • bidirectional (bool, default False) – If True, becomes a bidirectional RNN.
  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
  • i2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
  • h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
  • input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
  • prefix (str or None) – Prefix of this Block.
  • params (ParameterDict or None) – Shared Parameters for this Block.
Inputs:
  • data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts dimensions are permuted accordingly.
  • states: initial recurrent state tensor with shape (num_layers, batch_size, num_hidden). If bidirectional is True, shape will instead be (2*num_layers, batch_size, num_hidden). If states is None, zeros will be used as default begin states.
Outputs:
  • out: output tensor with shape (sequence_length, batch_size, num_hidden) when layout is “TNC”. If bidirectional is True, output shape will instead be (sequence_length, batch_size, 2*num_hidden)
  • out_states: output recurrent state tensor with the same shape as states. If states is None out_states will not be returned.

Examples

>>> layer = mx.gluon.rnn.GRU(100, 3)
>>> layer.initialize()
>>> input = mx.nd.random.uniform(shape=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> output, hn = layer(input, h0)
class mxnet.gluon.rnn.GRUCell(hidden_size, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, prefix=None, params=None)[source]

Gated Rectified Unit (GRU) network cell. Note: this is an implementation of the cuDNN version of GRUs (slight modification compared to Cho et al. 2014).

Each call computes the following function:

\[\begin{split}\begin{array}{ll} r_t = sigmoid(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\ i_t = sigmoid(W_{ii} x_t + b_{ii} + W_hi h_{(t-1)} + b_{hi}) \\ n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)}+ b_{hn})) \\ h_t = (1 - i_t) * n_t + i_t * h_{(t-1)} \\ \end{array}\end{split}\]

where \(h_t\) is the hidden state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(r_t\), \(i_t\), \(n_t\) are the reset, input, and new gates, respectively.

Parameters:
  • hidden_size (int) – Number of units in output symbol.
  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
  • i2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
  • h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
  • prefix (str, default ‘gru_‘) – prefix for name of Block`s (and name of weight if params is `None).
  • params (Parameter or None) – Container for weight sharing between cells. Created if None.
Inputs:
  • data: input tensor with shape (batch_size, input_size).
  • states: a list of one initial recurrent state tensor with shape (batch_size, num_hidden).
Outputs:
  • out: output tensor with shape (batch_size, num_hidden).
  • next_states: a list of one output recurrent state tensor with the same shape as states.
class mxnet.gluon.rnn.HybridRecurrentCell(prefix=None, params=None)[source]

HybridRecurrentCell supports hybridize.

class mxnet.gluon.rnn.LSTM(hidden_size, num_layers=1, layout='TNC', dropout=0, bidirectional=False, input_size=0, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', **kwargs)[source]

Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.

For each element in the input sequence, each layer computes the following function:

\[\begin{split}\begin{array}{ll} i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ f_t = sigmoid(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hc} h_{(t-1)} + b_{hg}) \\ o_t = sigmoid(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \end{array}\end{split}\]

where \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and out gates, respectively.

Parameters:
  • hidden_size (int) – The number of features in the hidden state h.
  • num_layers (int, default 1) – Number of recurrent layers.
  • layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.
  • dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.
  • bidirectional (bool, default False) – If True, becomes a bidirectional RNN.
  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
  • i2h_bias_initializer (str or Initializer, default 'lstmbias') – Initializer for the bias vector. By default, bias for the forget gate is initialized to 1 while all other biases are initialized to zero.
  • h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
  • input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
  • prefix (str or None) – Prefix of this Block.
  • params (ParameterDict or None) – Shared Parameters for this Block.
Inputs:
  • data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts dimensions are permuted accordingly.
  • states: a list of two initial recurrent state tensors. Each has shape (num_layers, batch_size, num_hidden). If bidirectional is True, shape will instead be (2*num_layers, batch_size, num_hidden). If states is None, zeros will be used as default begin states.
Outputs:
  • out: output tensor with shape (sequence_length, batch_size, num_hidden) when layout is “TNC”. If bidirectional is True, output shape will instead be (sequence_length, batch_size, 2*num_hidden)
  • out_states: a list of two output recurrent state tensors with the same shape as in states. If states is None out_states will not be returned.

Examples

>>> layer = mx.gluon.rnn.LSTM(100, 3)
>>> layer.initialize()
>>> input = mx.nd.random.uniform(shape=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> c0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> output, hn = layer(input, [h0, c0])
class mxnet.gluon.rnn.LSTMCell(hidden_size, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, prefix=None, params=None)[source]

Long-Short Term Memory (LSTM) network cell.

Each call computes the following function:

\[\begin{split}\begin{array}{ll} i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ f_t = sigmoid(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hc} h_{(t-1)} + b_{hg}) \\ o_t = sigmoid(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \end{array}\end{split}\]

where \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and out gates, respectively.

Parameters:
  • hidden_size (int) – Number of units in output symbol.
  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
  • i2h_bias_initializer (str or Initializer, default 'lstmbias') – Initializer for the bias vector. By default, bias for the forget gate is initialized to 1 while all other biases are initialized to zero.
  • h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
  • prefix (str, default ‘lstm_‘) – Prefix for name of Block`s (and name of weight if params is `None).
  • params (Parameter or None) – Container for weight sharing between cells. Created if None.
Inputs:
  • data: input tensor with shape (batch_size, input_size).
  • states: a list of two initial recurrent state tensors. Each has shape (batch_size, num_hidden).
Outputs:
  • out: output tensor with shape (batch_size, num_hidden).
  • next_states: a list of two output recurrent state tensors. Each has the same shape as states.
class mxnet.gluon.rnn.ModifierCell(base_cell)[source]

Base class for modifier cells. A modifier cell takes a base cell, apply modifications on it (e.g. Zoneout), and returns a new cell.

After applying modifiers the base cell should no longer be called directly. The modifier cell should be used instead.

class mxnet.gluon.rnn.RNN(hidden_size, num_layers=1, activation='relu', layout='TNC', dropout=0, bidirectional=False, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, **kwargs)[source]

Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.

For each element in the input sequence, each layer computes the following function:

\[h_t = \tanh(w_{ih} * x_t + b_{ih} + w_{hh} * h_{(t-1)} + b_{hh})\]

where \(h_t\) is the hidden state at time t, and \(x_t\) is the output of the previous layer at time t or \(input_t\) for the first layer. If nonlinearity=’relu’, then ReLU is used instead of tanh.

Parameters:
  • hidden_size (int) – The number of features in the hidden state h.
  • num_layers (int, default 1) – Number of recurrent layers.
  • activation ({'relu' or 'tanh'}, default 'relu') – The activation function to use.
  • layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.
  • dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.
  • bidirectional (bool, default False) – If True, becomes a bidirectional RNN.
  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
  • i2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
  • h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
  • input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
  • prefix (str or None) – Prefix of this Block.
  • params (ParameterDict or None) – Shared Parameters for this Block.
Inputs:
  • data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts dimensions are permuted accordingly.
  • states: initial recurrent state tensor with shape (num_layers, batch_size, num_hidden). If bidirectional is True, shape will instead be (2*num_layers, batch_size, num_hidden). If states is None, zeros will be used as default begin states.
Outputs:
  • out: output tensor with shape (sequence_length, batch_size, num_hidden) when layout is “TNC”. If bidirectional is True, output shape will instead be (sequence_length, batch_size, 2*num_hidden)
  • out_states: output recurrent state tensor with the same shape as states. If states is None out_states will not be returned.

Examples

>>> layer = mx.gluon.rnn.RNN(100, 3)
>>> layer.initialize()
>>> input = mx.nd.random.uniform(shape=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> output, hn = layer(input, h0)
class mxnet.gluon.rnn.RNNCell(hidden_size, activation='tanh', i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, prefix=None, params=None)[source]

Elman RNN recurrent neural network cell.

Each call computes the following function:

\[h_t = \tanh(w_{ih} * x_t + b_{ih} + w_{hh} * h_{(t-1)} + b_{hh})\]

where \(h_t\) is the hidden state at time t, and \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer. If nonlinearity=’relu’, then ReLU is used instead of tanh.

Parameters:
  • hidden_size (int) – Number of units in output symbol
  • activation (str or Symbol, default 'tanh') – Type of activation function.
  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
  • i2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
  • h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
  • prefix (str, default ‘rnn_‘) – Prefix for name of Block`s (and name of weight if params is `None).
  • params (Parameter or None) – Container for weight sharing between cells. Created if None.
Inputs:
  • data: input tensor with shape (batch_size, input_size).
  • states: a list of one initial recurrent state tensor with shape (batch_size, num_hidden).
Outputs:
  • out: output tensor with shape (batch_size, num_hidden).
  • next_states: a list of one output recurrent state tensor with the same shape as states.
class mxnet.gluon.rnn.RecurrentCell(prefix=None, params=None)[source]

Abstract base class for RNN cells

Parameters:
  • prefix (str, optional) – Prefix for names of Block`s (this prefix is also used for names of weights if `params is None i.e. if params are being created and not reused)
  • params (Parameter or None, optional) – Container for weight sharing between cells. A new Parameter container is created if params is None.
begin_state(batch_size=0, func=, **kwargs)[source]

Initial state for this cell.

Parameters:
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.
  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.
Returns:

states – Starting states for the first RNN step.

Return type:

nested list of Symbol

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters:
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size * num_units).
  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().
Returns:

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.
  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state()
This function can provide the states for the first time step.
unroll()
This function unrolls an RNN for a given number of (>=1) time steps.
reset()[source]

Reset before re-using the cell for another graph.

state_info(batch_size=0)[source]

shape and layout information of states

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters:
  • length (int) – Number of steps to unroll.
  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, ...) if layout is ‘NTC’, or (length, batch_size, ...) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, ...).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, ...) if layout is ‘NTC’, or (length, batch_size, ...) if layout is ‘TNC’. If None, output whatever is faster.
  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be return and the padded outputs will be masked with 0. Note that valid_length must be smaller or equal to length.
Returns:

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

class mxnet.gluon.rnn.ResidualCell(base_cell)[source]

Adds residual connection as described in Wu et al, 2016 (https://arxiv.org/abs/1609.08144). Output of the cell is output of the base cell plus input.

class mxnet.gluon.rnn.SequentialRNNCell(prefix=None, params=None)[source]

Sequentially stacking multiple RNN cells.

add(cell)[source]

Appends a cell into the stack.

Parameters:cell (RecurrentCell) – The cell to add.
class mxnet.gluon.rnn.ZoneoutCell(base_cell, zoneout_outputs=0.0, zoneout_states=0.0)[source]

Applies Zoneout on base cell.