# gluon.rnn

Built-in recurrent neural network layers are provided in the following two modules:

| Module | Description |
| --- | --- |
| mxnet.gluon.rnn | Recurrent neural network module. |
| mxnet.gluon.contrib.rnn | Contrib recurrent neural network module. |

## Recurrent Cells

| Class | Description |
| --- | --- |
| rnn.LSTMCell | Long-Short Term Memory (LSTM) network cell. |
| rnn.GRUCell | Gated Recurrent Unit (GRU) network cell. |
| rnn.RecurrentCell | Abstract base class for RNN cells. |
| rnn.SequentialRNNCell | Sequentially stacking multiple RNN cells. |
| rnn.BidirectionalCell | Bidirectional RNN cell. |
| rnn.DropoutCell | Applies dropout on input. |
| rnn.ZoneoutCell | Applies Zoneout on base cell. |
| rnn.ResidualCell | Adds residual connection as described in Wu et al, 2016 (https://arxiv.org/abs/1609.08144). |

## Recurrent Layers

| Class | Description |
| --- | --- |
| rnn.RNN | Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence. |
| rnn.LSTM | Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence. |
| rnn.GRU | Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence. |

## API Reference

Recurrent neural network module.

Classes

| Class | Description |
| --- | --- |
| BidirectionalCell(l_cell, r_cell[, …]) | Bidirectional RNN cell. |
| DropoutCell(rate[, axes, prefix, params]) | Applies dropout on input. |
| GRU(hidden_size[, num_layers, layout, …]) | Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence. |
| GRUCell(hidden_size[, …]) | Gated Recurrent Unit (GRU) network cell. |
| HybridRecurrentCell([prefix, params]) | HybridRecurrentCell supports hybridize. |
| HybridSequentialRNNCell([prefix, params]) | Sequentially stacking multiple HybridRNN cells. |
| LSTM(hidden_size[, num_layers, layout, …]) | Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence. |
| LSTMCell(hidden_size[, …]) | Long-Short Term Memory (LSTM) network cell. |
| ModifierCell(base_cell) | Base class for modifier cells. |
| RNN(hidden_size[, num_layers, activation, …]) | Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence. |
| RNNCell(hidden_size[, activation, …]) | Elman RNN recurrent neural network cell. |
| RecurrentCell([prefix, params]) | Abstract base class for RNN cells. |
| ResidualCell(base_cell) | Adds residual connection as described in Wu et al, 2016 (https://arxiv.org/abs/1609.08144). |
| SequentialRNNCell([prefix, params]) | Sequentially stacking multiple RNN cells. |
| ZoneoutCell(base_cell[, zoneout_outputs, …]) | Applies Zoneout on base cell. |

class mxnet.gluon.rnn.BidirectionalCell(l_cell, r_cell, output_prefix='bi_')[source]

Bases: mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell

Bidirectional RNN cell.

Parameters

Methods

| Method | Description |
| --- | --- |
| begin_state(**kwargs) | Initial state for this cell. |
| state_info([batch_size]) | Shape and layout information of states. |
| unroll(length, inputs[, begin_state, …]) | Unrolls an RNN cell across time steps. |

begin_state(**kwargs)[source]

Initial state for this cell.

Parameters
• func (callable, default symbol.zeros) –

Function for creating initial state.

For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

• batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

• **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns

states – Starting states for the first RNN step.

Return type

nested list of Symbol
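
As a rough illustration of the description above, the initial states can be thought of as one array per state of the cell, each created by calling func with a shape and any extra keyword arguments. In the sketch below NumPy stands in for the NDArray API, and the helper names and the state description format (a list of dicts with a "shape" key) are assumptions made for illustration, not the library's implementation.

```python
import numpy as np

def sketch_state_info(batch_size, hidden_size):
    # Hypothetical stand-in for state_info(): one entry per state tensor.
    # An LSTM-like cell carries two states (h and c) of the same shape.
    return [{"shape": (batch_size, hidden_size)},
            {"shape": (batch_size, hidden_size)}]

def sketch_begin_state(func, batch_size, hidden_size, **kwargs):
    # Call func once per state and forward extra keyword arguments (e.g. dtype).
    return [func(info["shape"], **kwargs)
            for info in sketch_state_info(batch_size, hidden_size)]

states = sketch_begin_state(np.zeros, batch_size=4, hidden_size=8, dtype="float32")
```

Swapping np.zeros for np.ones mirrors the choice between ndarray.zeros and ndarray.ones mentioned above.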

state_info(batch_size=0)[source]

shape and layout information of states

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters
• length (int) – Number of steps to unroll.

• inputs (Symbol, list of Symbol, or None) –

If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

• begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

• layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

• merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

• valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be less than or equal to length.

Returns

• outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

• states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().
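
The effect of valid_length can be sketched with NumPy. This is only an illustration of the masking behaviour described above, assuming 'NTC' layout, every valid_length entry being at least 1, and a simple cell whose state equals its per-step output; the function name is hypothetical.

```python
import numpy as np

def mask_and_last_state(outputs, valid_length):
    """outputs: (batch_size, length, hidden); valid_length: (batch_size,) ints."""
    batch_size, length, _ = outputs.shape
    steps = np.arange(length)[None, :]            # (1, length)
    mask = steps < valid_length[:, None]          # True for non-padded steps
    masked = outputs * mask[:, :, None]           # padded steps become 0
    # Pick the output at the last valid step of each sequence.
    last = masked[np.arange(batch_size), valid_length - 1]
    return masked, last

outputs = np.arange(2 * 3 * 2, dtype="float32").reshape(2, 3, 2)
valid_length = np.array([3, 1])
masked, last = mask_and_last_state(outputs, valid_length)
```

Here the second sequence has only one valid step, so its outputs at steps 1 and 2 are zeroed and its returned state comes from step 0.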

class mxnet.gluon.rnn.DropoutCell(rate, axes=(), prefix=None, params=None)[source]

Bases: mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell

Applies dropout on input.

Parameters
• rate (float) – Percentage of elements to drop out, which is 1 - percentage to retain.

• axes (tuple of int, default ()) – The axes on which dropout mask is shared. If empty, regular dropout is applied.
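
One way to picture a dropout mask shared along axes is a mask that has size 1 on those axes and is broadcast over the input. The NumPy sketch below is an assumption-laden illustration: the function name is hypothetical, and the rescaling by 1 / (1 - rate) is the usual inverted-dropout convention rather than something stated in this page.

```python
import numpy as np

def shared_axis_dropout(x, rate, axes=(), rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # The mask has size 1 along each shared axis, so broadcasting reuses
    # the same keep/drop decision along that axis.
    mask_shape = tuple(1 if i in axes else n for i, n in enumerate(x.shape))
    keep = rng.random(mask_shape) >= rate
    # Inverted-dropout scaling (an assumption of this sketch).
    return x * keep / (1.0 - rate)

x = np.ones((4, 5, 6))
y = shared_axis_dropout(x, rate=0.5, axes=(1,), rng=np.random.default_rng(0))
```

With axes=(1,), every slice along axis 1 of y receives the same mask; with axes=() each element is dropped independently.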

Methods

| Method | Description |
| --- | --- |
| hybrid_forward(F, inputs, states) | Overrides to construct symbolic graph for this Block. |
| state_info([batch_size]) | Shape and layout information of states. |
| unroll(length, inputs[, begin_state, …]) | Unrolls an RNN cell across time steps. |

Inputs:
• data: input tensor with shape (batch_size, size).

• states: a list of recurrent state tensors.

Outputs:
• out: output tensor with shape (batch_size, size).

• next_states: returns input states directly.

hybrid_forward(F, inputs, states)[source]

Overrides to construct symbolic graph for this Block.

Parameters
• x (Symbol or NDArray) – The first input tensor.

• *args (list of Symbol or list of NDArray) – Additional input tensors.

state_info(batch_size=0)[source]

shape and layout information of states

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters
• length (int) – Number of steps to unroll.

• inputs (Symbol, list of Symbol, or None) –

If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

• begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

• layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

• merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

• valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be less than or equal to length.

Returns

• outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

• states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

class mxnet.gluon.rnn.GRU(hidden_size, num_layers=1, layout='TNC', dropout=0, bidirectional=False, input_size=0, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', dtype='float32', **kwargs)[source]

Bases: mxnet.gluon.rnn.rnn_layer._RNNLayer

Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence. Note: this is an implementation of the cuDNN version of GRUs (slight modification compared to Cho et al. 2014; the reset gate $$r_t$$ is applied after matrix multiplication).

For each element in the input sequence, each layer computes the following function:

$\begin{split}\begin{array}{ll} r_t = sigmoid(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\ i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)} + b_{hn})) \\ h_t = (1 - i_t) * n_t + i_t * h_{(t-1)} \\ \end{array}\end{split}$

where $$h_t$$ is the hidden state at time t, $$x_t$$ is the hidden state of the previous layer at time t or $$input_t$$ for the first layer, and $$r_t$$, $$i_t$$, $$n_t$$ are the reset, input, and new gates, respectively.
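
The equations above can be traced with a small NumPy sketch of a single GRU step. It is not the library's implementation: the function names, the stacking of the three gates into single weight matrices, and the gate ordering are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_i, W_h, b_i, b_h):
    """One GRU step for a batch.

    W_i: (3*H, I), W_h: (3*H, H), b_i and b_h: (3*H,).
    Rows are stacked as reset, input, new (ordering chosen for this sketch).
    """
    H = h_prev.shape[-1]
    gi = x_t @ W_i.T + b_i       # input projections, shape (N, 3H)
    gh = h_prev @ W_h.T + b_h    # hidden projections, shape (N, 3H)
    r_t = sigmoid(gi[:, :H] + gh[:, :H])
    i_t = sigmoid(gi[:, H:2 * H] + gh[:, H:2 * H])
    # The reset gate multiplies the already-projected hidden term.
    n_t = np.tanh(gi[:, 2 * H:] + r_t * gh[:, 2 * H:])
    return (1.0 - i_t) * n_t + i_t * h_prev

rng = np.random.default_rng(0)
N, I, H = 2, 4, 3
h_t = gru_step(rng.normal(size=(N, I)), np.zeros((N, H)),
               rng.normal(size=(3 * H, I)), rng.normal(size=(3 * H, H)),
               np.zeros(3 * H), np.zeros(3 * H))
```

Note that r_t scales the product of the hidden weights and the previous state, which is the "applied after matrix multiplication" variant mentioned above.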

Parameters
• hidden_size (int) – The number of features in the hidden state h.

• num_layers (int, default 1) – Number of recurrent layers.

• layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.

• dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.

• bidirectional (bool, default False) – If True, becomes a bidirectional RNN.

• i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.

• h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.

• i2h_bias_initializer (str or Initializer) – Initializer for the bias vector.

• h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.

• dtype (str, default 'float32') – Data type used to initialize the parameters and default states.

• input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.

• prefix (str or None) – Prefix of this Block.

• params (ParameterDict or None) – Shared Parameters for this Block.

Inputs:
• data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts, dimensions are permuted accordingly using transpose() operator which adds performance overhead. Consider creating batches in TNC layout during data batching step.

• states: initial recurrent state tensor with shape (num_layers, batch_size, num_hidden). If bidirectional is True, shape will instead be (2*num_layers, batch_size, num_hidden). If states is None, zeros will be used as default begin states.

Outputs:
• out: output tensor with shape (sequence_length, batch_size, num_hidden) when layout is “TNC”. If bidirectional is True, output shape will instead be (sequence_length, batch_size, 2*num_hidden)

• out_states: output recurrent state tensor with the same shape as states. If states is None out_states will not be returned.
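
The shape rules above can be summarised in a small helper. This is a plain-Python sketch with a hypothetical function name; it assumes that the state shape does not depend on layout and that non-TNC layouts simply permute the data and output axes.

```python
def rnn_layer_shapes(seq_len, batch_size, input_size, hidden_size,
                     num_layers=1, bidirectional=False, layout="TNC"):
    """Return (data_shape, state_shape, out_shape) following the rules above."""
    directions = 2 if bidirectional else 1
    dims = {"T": seq_len, "N": batch_size}
    # Order the data and output axes according to the layout string.
    data = tuple({**dims, "C": input_size}[axis] for axis in layout)
    out = tuple({**dims, "C": directions * hidden_size}[axis] for axis in layout)
    state = (directions * num_layers, batch_size, hidden_size)
    return data, state, out

data_shape, state_shape, out_shape = rnn_layer_shapes(5, 3, 10, 100, num_layers=3)
```

For a three-layer, unidirectional layer with hidden size 100 this gives a (5, 3, 10) input, a (3, 3, 100) state and a (5, 3, 100) output.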

Examples

>>> import mxnet as mx
>>> layer = mx.gluon.rnn.GRU(100, 3)
>>> layer.initialize()
>>> input = mx.nd.random.uniform(shape=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> output, hn = layer(input, h0)

class mxnet.gluon.rnn.GRUCell(hidden_size, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, prefix=None, params=None)[source]

Bases: mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell

Gated Recurrent Unit (GRU) network cell. Note: this is an implementation of the cuDNN version of GRUs (slight modification compared to Cho et al. 2014; the reset gate $$r_t$$ is applied after matrix multiplication).

Each call computes the following function:

$\begin{split}\begin{array}{ll} r_t = sigmoid(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\ i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)} + b_{hn})) \\ h_t = (1 - i_t) * n_t + i_t * h_{(t-1)} \\ \end{array}\end{split}$

where $$h_t$$ is the hidden state at time t, $$x_t$$ is the hidden state of the previous layer at time t or $$input_t$$ for the first layer, and $$r_t$$, $$i_t$$, $$n_t$$ are the reset, input, and new gates, respectively.

Methods

| Method | Description |
| --- | --- |
| hybrid_forward(F, inputs, states, …) | Overrides to construct symbolic graph for this Block. |
| state_info([batch_size]) | Shape and layout information of states. |

Parameters
• hidden_size (int) – Number of units in output symbol.

• i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.

• h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.

• i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.

• h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.

• prefix (str, default 'gru_') – prefix for name of Blocks (and name of weight if params is None).

• params (Parameter or None, default None) – Container for weight sharing between cells. Created if None.

Inputs:
• data: input tensor with shape (batch_size, input_size).

• states: a list of one initial recurrent state tensor with shape (batch_size, num_hidden).

Outputs:
• out: output tensor with shape (batch_size, num_hidden).

• next_states: a list of one output recurrent state tensor with the same shape as states.

hybrid_forward(F, inputs, states, i2h_weight, h2h_weight, i2h_bias, h2h_bias)[source]

Overrides to construct symbolic graph for this Block.

Parameters
• x (Symbol or NDArray) – The first input tensor.

• *args (list of Symbol or list of NDArray) – Additional input tensors.

state_info(batch_size=0)[source]

shape and layout information of states

class mxnet.gluon.rnn.HybridRecurrentCell(prefix=None, params=None)[source]

Bases: mxnet.gluon.rnn.rnn_cell.RecurrentCell, mxnet.gluon.block.HybridBlock

HybridRecurrentCell supports hybridize.

Methods

| Method | Description |
| --- | --- |
| hybrid_forward(F, x, *args, **kwargs) | Overrides to construct symbolic graph for this Block. |

hybrid_forward(F, x, *args, **kwargs)[source]

Overrides to construct symbolic graph for this Block.

Parameters
• x (Symbol or NDArray) – The first input tensor.

• *args (list of Symbol or list of NDArray) – Additional input tensors.

class mxnet.gluon.rnn.HybridSequentialRNNCell(prefix=None, params=None)[source]

Bases: mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell

Sequentially stacking multiple HybridRNN cells.

Methods

| Method | Description |
| --- | --- |
| add(cell) | Appends a cell into the stack. |
| begin_state(**kwargs) | Initial state for this cell. |
| hybrid_forward(F, inputs, states) | Overrides to construct symbolic graph for this Block. |
| state_info([batch_size]) | Shape and layout information of states. |
| unroll(length, inputs[, begin_state, …]) | Unrolls an RNN cell across time steps. |

add(cell)[source]

Appends a cell into the stack.

Parameters

cell (RecurrentCell) – The cell to add.

begin_state(**kwargs)[source]

Initial state for this cell.

Parameters
• func (callable, default symbol.zeros) –

Function for creating initial state.

For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

• batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

• **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns

states – Starting states for the first RNN step.

Return type

nested list of Symbol

hybrid_forward(F, inputs, states)[source]

Overrides to construct symbolic graph for this Block.

Parameters
• x (Symbol or NDArray) – The first input tensor.

• *args (list of Symbol or list of NDArray) – Additional input tensors.

state_info(batch_size=0)[source]

shape and layout information of states

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters
• length (int) – Number of steps to unroll.

• inputs (Symbol, list of Symbol, or None) –

If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

• begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

• layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

• merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

• valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be less than or equal to length.

Returns

• outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

• states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().
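
The roles of layout and merge_outputs can be illustrated with a generic unrolling loop. The sketch below uses NumPy arrays in place of Symbols and a toy step function; the names are hypothetical, merge_outputs=None is not modelled, and this is not how the library implements unroll.

```python
import numpy as np

def unroll(step, length, inputs, begin_state, layout="NTC", merge_outputs=False):
    """step(x_t, state) -> (out_t, new_state); inputs is one array or a list of per-step arrays."""
    t_axis = layout.find("T")
    if not isinstance(inputs, list):
        # Split a single array into per-step arrays of shape (batch_size, ...).
        inputs = [np.take(inputs, t, axis=t_axis) for t in range(length)]
    state, outputs = begin_state, []
    for x_t in inputs[:length]:
        out_t, state = step(x_t, state)
        outputs.append(out_t)
    if merge_outputs:
        # Concatenate the per-step outputs back along the time axis.
        outputs = np.stack(outputs, axis=t_axis)
    return outputs, state

def running_sum(x_t, state):
    # Toy step used only to make the data flow visible.
    new_state = x_t + state
    return new_state, new_state

x = np.ones((2, 4, 3))  # (batch_size, length, features) for 'NTC'
outs, last = unroll(running_sum, 4, x, np.zeros((2, 3)), layout="NTC", merge_outputs=True)
```

With merge_outputs=False the same call returns a list of four arrays of shape (2, 3) instead of one (2, 4, 3) array.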

class mxnet.gluon.rnn.LSTM(hidden_size, num_layers=1, layout='TNC', dropout=0, bidirectional=False, input_size=0, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', projection_size=None, h2r_weight_initializer=None, state_clip_min=None, state_clip_max=None, state_clip_nan=False, dtype='float32', **kwargs)[source]

Bases: mxnet.gluon.rnn.rnn_layer._RNNLayer

Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.

For each element in the input sequence, each layer computes the following function:

$\begin{split}\begin{array}{ll} i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ f_t = sigmoid(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\ o_t = sigmoid(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \end{array}\end{split}$

where $$h_t$$ is the hidden state at time t, $$c_t$$ is the cell state at time t, $$x_t$$ is the hidden state of the previous layer at time t or $$input_t$$ for the first layer, and $$i_t$$, $$f_t$$, $$g_t$$, $$o_t$$ are the input, forget, cell, and out gates, respectively.
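
A single LSTM step following these equations can be sketched in NumPy. The function names, the stacking of the four gates into single weight matrices, and the gate ordering are assumptions made for illustration, not the layer's actual parameter layout.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_i, W_h, b_i, b_h):
    """One LSTM step for a batch.

    W_i: (4*H, I), W_h: (4*H, H), b_i and b_h: (4*H,).
    Rows are stacked as input, forget, cell, output (ordering chosen for this sketch).
    """
    H = h_prev.shape[-1]
    z = x_t @ W_i.T + b_i + h_prev @ W_h.T + b_h   # pre-activations, shape (N, 4H)
    i_t = sigmoid(z[:, :H])
    f_t = sigmoid(z[:, H:2 * H])
    g_t = np.tanh(z[:, 2 * H:3 * H])
    o_t = sigmoid(z[:, 3 * H:])
    c_t = f_t * c_prev + i_t * g_t                 # new cell state
    h_t = o_t * np.tanh(c_t)                       # new hidden state
    return h_t, c_t

N, I, H = 1, 2, 3
h_t, c_t = lstm_step(np.ones((N, I)), np.zeros((N, H)), np.ones((N, H)),
                     np.zeros((4 * H, I)), np.zeros((4 * H, H)),
                     np.zeros(4 * H), np.zeros(4 * H))
```

With all weights and biases at zero, every sigmoid gate equals 0.5 and g_t is 0, so the cell state is simply halved at each step.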

Parameters
• hidden_size (int) – The number of features in the hidden state h.

• num_layers (int, default 1) – Number of recurrent layers.

• layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.

• dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.

• bidirectional (bool, default False) – If True, becomes a bidirectional RNN.

• i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.

• h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.

• i2h_bias_initializer (str or Initializer, default 'lstmbias') – Initializer for the bias vector. By default, bias for the forget gate is initialized to 1 while all other biases are initialized to zero.

• h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.

• projection_size (int, default None) – The number of features after projection.

• h2r_weight_initializer (str or Initializer, default None) – Initializer for the projected recurrent weights matrix, used for the linear transformation of the recurrent state to the projected space.

• state_clip_min (float or None, default None) – Minimum clip value of LSTM states. This option must be used together with state_clip_max. If None, clipping is not applied.

• state_clip_max (float or None, default None) – Maximum clip value of LSTM states. This option must be used together with state_clip_min. If None, clipping is not applied.

• state_clip_nan (boolean, default False) – Whether to stop NaN from propagating in state by clipping it to min/max. If the clipping range is not specified, this option is ignored.

• dtype (str, default 'float32') – Data type used to initialize the parameters and default states.

• input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.

• prefix (str or None) – Prefix of this Block.

• params (ParameterDict or None) – Shared Parameters for this Block.

Inputs:
• data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts, dimensions are permuted accordingly using transpose() operator which adds performance overhead. Consider creating batches in TNC layout during data batching step.

• states: a list of two initial recurrent state tensors. Each has shape (num_layers, batch_size, num_hidden). If bidirectional is True, shape will instead be (2*num_layers, batch_size, num_hidden). If states is None, zeros will be used as default begin states.

Outputs:
• out: output tensor with shape (sequence_length, batch_size, num_hidden) when layout is “TNC”. If bidirectional is True, output shape will instead be (sequence_length, batch_size, 2*num_hidden)

• out_states: a list of two output recurrent state tensors with the same shape as in states. If states is None out_states will not be returned.

Examples

>>> import mxnet as mx
>>> layer = mx.gluon.rnn.LSTM(100, 3)
>>> layer.initialize()
>>> input = mx.nd.random.uniform(shape=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> c0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> output, hn = layer(input, [h0, c0])

class mxnet.gluon.rnn.LSTMCell(hidden_size, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, prefix=None, params=None, activation='tanh', recurrent_activation='sigmoid')[source]

Bases: mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell

Long-Short Term Memory (LSTM) network cell.

Each call computes the following function:

$\begin{split}\begin{array}{ll} i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ f_t = sigmoid(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\ o_t = sigmoid(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \end{array}\end{split}$

where $$h_t$$ is the hidden state at time t, $$c_t$$ is the cell state at time t, $$x_t$$ is the hidden state of the previous layer at time t or $$input_t$$ for the first layer, and $$i_t$$, $$f_t$$, $$g_t$$, $$o_t$$ are the input, forget, cell, and out gates, respectively.

Methods

| Method | Description |
| --- | --- |
| hybrid_forward(F, inputs, states, …) | Overrides to construct symbolic graph for this Block. |
| state_info([batch_size]) | Shape and layout information of states. |

Parameters
• hidden_size (int) – Number of units in output symbol.

• i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.

• h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.

• i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.

• h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.

• prefix (str, default 'lstm_') – Prefix for name of Blocks (and name of weight if params is None).

• params (Parameter or None, default None) – Container for weight sharing between cells. Created if None.

• activation (str, default 'tanh') – Activation type to use. See nd/symbol Activation for supported types.

• recurrent_activation (str, default 'sigmoid') – Activation type to use for the recurrent step. See nd/symbol Activation for supported types.

Inputs:
• data: input tensor with shape (batch_size, input_size).

• states: a list of two initial recurrent state tensors. Each has shape (batch_size, num_hidden).

Outputs:
• out: output tensor with shape (batch_size, num_hidden).

• next_states: a list of two output recurrent state tensors. Each has the same shape as states.

hybrid_forward(F, inputs, states, i2h_weight, h2h_weight, i2h_bias, h2h_bias)[source]

Overrides to construct symbolic graph for this Block.

Parameters
• x (Symbol or NDArray) – The first input tensor.

• *args (list of Symbol or list of NDArray) – Additional input tensors.

state_info(batch_size=0)[source]

shape and layout information of states

class mxnet.gluon.rnn.ModifierCell(base_cell)[source]

Bases: mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell

Base class for modifier cells. A modifier cell takes a base cell, applies modifications to it (e.g. Zoneout), and returns a new cell.

After applying modifiers the base cell should no longer be called directly. The modifier cell should be used instead.

Methods

| Method | Description |
| --- | --- |
| begin_state([func]) | Initial state for this cell. |
| hybrid_forward(F, inputs, states) | Overrides to construct symbolic graph for this Block. |
| state_info([batch_size]) | Shape and layout information of states. |

Attributes

| Attribute | Description |
| --- | --- |
| params | Returns this Block’s parameter dictionary (does not include its children’s parameters). |

begin_state(func=<function zeros>, **kwargs)[source]

Initial state for this cell.

Parameters
• func (callable, default symbol.zeros) –

Function for creating initial state.

For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

• batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

• **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns

states – Starting states for the first RNN step.

Return type

nested list of Symbol

hybrid_forward(F, inputs, states)[source]

Overrides to construct symbolic graph for this Block.

Parameters
• x (Symbol or NDArray) – The first input tensor.

• *args (list of Symbol or list of NDArray) – Additional input tensors.

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

state_info(batch_size=0)[source]

shape and layout information of states

class mxnet.gluon.rnn.RNN(hidden_size, num_layers=1, activation='relu', layout='TNC', dropout=0, bidirectional=False, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, dtype='float32', **kwargs)[source]

Bases: mxnet.gluon.rnn.rnn_layer._RNNLayer

Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.

For each element in the input sequence, each layer computes the following function:

$h_t = \tanh(w_{ih} * x_t + b_{ih} + w_{hh} * h_{(t-1)} + b_{hh})$

where $$h_t$$ is the hidden state at time t, and $$x_t$$ is the output of the previous layer at time t or $$input_t$$ for the first layer. If activation='relu' (the default for this layer), then ReLU is used instead of tanh.
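
The recurrence above amounts to one affine map of the input, one of the previous hidden state, and an element-wise non-linearity. A minimal NumPy sketch, with a hypothetical function name and NumPy arrays standing in for the layer's parameters:

```python
import numpy as np

def rnn_step(x_t, h_prev, w_ih, w_hh, b_ih, b_hh, activation="relu"):
    """One Elman RNN step. w_ih: (H, I), w_hh: (H, H), biases: (H,)."""
    pre = x_t @ w_ih.T + b_ih + h_prev @ w_hh.T + b_hh
    if activation == "tanh":
        return np.tanh(pre)
    if activation == "relu":
        return np.maximum(pre, 0.0)
    raise ValueError("activation must be 'relu' or 'tanh'")

x_t = np.array([[1.0, -1.0]])
h_prev = np.zeros((1, 2))
w_ih, w_hh = np.eye(2), np.zeros((2, 2))
b = np.zeros(2)
h_relu = rnn_step(x_t, h_prev, w_ih, w_hh, b, b, activation="relu")
h_tanh = rnn_step(x_t, h_prev, w_ih, w_hh, b, b, activation="tanh")
```

With identity input weights, ReLU zeroes the negative component of the input while tanh keeps its sign.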

Parameters
• hidden_size (int) – The number of features in the hidden state h.

• num_layers (int, default 1) – Number of recurrent layers.

• activation ({'relu' or 'tanh'}, default 'relu') – The activation function to use.

• layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.

• dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.

• bidirectional (bool, default False) – If True, becomes a bidirectional RNN.

• i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.

• h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.

• i2h_bias_initializer (str or Initializer) – Initializer for the bias vector.

• h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.

• input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.

• dtype (str, default 'float32') – Data type used to initialize the parameters and default states.

• prefix (str or None) – Prefix of this Block.

• params (ParameterDict or None) – Shared Parameters for this Block.

Inputs:
• data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts, dimensions are permuted accordingly using transpose() operator which adds performance overhead. Consider creating batches in TNC layout during data batching step.

• states: initial recurrent state tensor with shape (num_layers, batch_size, num_hidden). If bidirectional is True, shape will instead be (2*num_layers, batch_size, num_hidden). If states is None, zeros will be used as default begin states.

Outputs:
• out: output tensor with shape (sequence_length, batch_size, num_hidden) when layout is “TNC”. If bidirectional is True, output shape will instead be (sequence_length, batch_size, 2*num_hidden)

• out_states: output recurrent state tensor with the same shape as states. If states is None out_states will not be returned.

Examples

>>> import mxnet as mx
>>> layer = mx.gluon.rnn.RNN(100, 3)
>>> layer.initialize()
>>> input = mx.nd.random.uniform(shape=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> output, hn = layer(input, h0)

class mxnet.gluon.rnn.RNNCell(hidden_size, activation='tanh', i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, prefix=None, params=None)[source]

Bases: mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell

Elman RNN recurrent neural network cell.

Each call computes the following function:

$h_t = \tanh(w_{ih} * x_t + b_{ih} + w_{hh} * h_{(t-1)} + b_{hh})$

Methods

• hybrid_forward(F, inputs, states, …) – Overrides to construct symbolic graph for this Block.

• state_info([batch_size]) – Shape and layout information of states.

where $$h_t$$ is the hidden state at time t, and $$x_t$$ is the hidden state of the previous layer at time t or $$input_t$$ for the first layer. If activation='relu', then ReLU is used instead of tanh.
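The update rule above can be written out for a single sample. This is a minimal sketch in plain Python rather than the MXNet implementation: the helper names `matvec` and `elman_step` are hypothetical, and the weights are plain nested lists instead of NDArrays.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def elman_step(x_t, h_prev, w_ih, w_hh, b_ih, b_hh, activation="tanh"):
    """Compute h_t = act(w_ih * x_t + b_ih + w_hh * h_prev + b_hh)."""
    if activation == "tanh":
        act = math.tanh
    elif activation == "relu":
        act = lambda z: max(0.0, z)
    else:
        raise ValueError("activation must be 'tanh' or 'relu'")
    i2h = matvec(w_ih, x_t)      # input-to-hidden transformation
    h2h = matvec(w_hh, h_prev)   # hidden-to-hidden transformation
    return [act(a + bi + b + bh) for a, bi, b, bh in zip(i2h, b_ih, h2h, b_hh)]


# Toy weights: identity input weights, zero recurrent weights, zero biases.
w_ih = [[1.0, 0.0], [0.0, 1.0]]
w_hh = [[0.0, 0.0], [0.0, 0.0]]
zeros = [0.0, 0.0]

h_tanh = elman_step([0.5, -0.5], zeros, w_ih, w_hh, zeros, zeros)
h_relu = elman_step([0.5, -0.5], zeros, w_ih, w_hh, zeros, zeros,
                    activation="relu")
```

With these toy weights the tanh variant returns tanh applied to each input element, and the ReLU variant clips the negative element to zero.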

Parameters
• hidden_size (int) – Number of units in output symbol.

• activation (str or Symbol, default 'tanh') – Type of activation function.

• i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.

• h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.

• i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the input-to-hidden bias vector.

• h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the hidden-to-hidden bias vector.

• prefix (str, default 'rnn_') – Prefix for name of Blocks (and name of weight if params is None).

• params (Parameter or None) – Container for weight sharing between cells. Created if None.

Inputs:
• data: input tensor with shape (batch_size, input_size).

• states: a list of one initial recurrent state tensor with shape (batch_size, num_hidden).

Outputs:
• out: output tensor with shape (batch_size, num_hidden).

• next_states: a list of one output recurrent state tensor with the same shape as states.

hybrid_forward(F, inputs, states, i2h_weight, h2h_weight, i2h_bias, h2h_bias)[source]

Overrides to construct symbolic graph for this Block.

Parameters
• x (Symbol or NDArray) – The first input tensor.

• *args (list of Symbol or list of NDArray) – Additional input tensors.

state_info(batch_size=0)[source]

Shape and layout information of states.

class mxnet.gluon.rnn.RecurrentCell(prefix=None, params=None)[source]

Bases: mxnet.gluon.block.Block

Abstract base class for RNN cells

Parameters
• prefix (str, optional) – Prefix for names of Blocks. This prefix is also used for names of weights if params is None, i.e. if params are being created and not reused.

• params (Parameter or None, default None) – Container for weight sharing between cells. A new Parameter container is created if params is None.

Methods

• begin_state([batch_size, func]) – Initial state for this cell.

• forward(inputs, states) – Unrolls the recurrent cell for one time step.

• reset() – Reset before re-using the cell for another graph.

• state_info([batch_size]) – Shape and layout information of states.

• unroll(length, inputs[, begin_state, …]) – Unrolls an RNN cell across time steps.

begin_state(batch_size=0, func=<function zeros>, **kwargs)[source]

Initial state for this cell.

Parameters
• func (callable, default symbol.zeros) –

Function for creating initial state.

For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

• batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

• **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns

states – Starting states for the first RNN step.

Return type

nested list of Symbol
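The interplay of func, batch_size and **kwargs can be illustrated without MXNet. This is a minimal sketch in plain Python under stated assumptions: `filled`, `toy_state_info` and `toy_begin_state` are hypothetical stand-ins, the dictionary keys are illustrative, and nested lists replace Symbols or NDArrays.

```python
def filled(shape, value=0.0):
    """Stand-in for a creation function such as ndarray.zeros: nested lists."""
    if len(shape) == 1:
        return [value] * shape[0]
    return [filled(shape[1:], value) for _ in range(shape[0])]


def toy_state_info(batch_size=0, hidden_size=4):
    """Hypothetical state description: one state of shape (batch_size, hidden_size)."""
    return [{"shape": (batch_size, hidden_size), "layout": "NC"}]


def toy_begin_state(batch_size=0, func=filled, hidden_size=4, **kwargs):
    """Create one initial state per state_info entry by calling func."""
    states = []
    for info in toy_state_info(batch_size, hidden_size):
        # Extra keyword arguments are forwarded to func unchanged.
        states.append(func(shape=info["shape"], **kwargs))
    return states


zero_states = toy_begin_state(batch_size=2)
one_states = toy_begin_state(batch_size=2, value=1.0)
```

The second call shows how a keyword argument (here the hypothetical `value`) reaches the creation function, in the same way that mean, std or dtype would be forwarded.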

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters
• inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).

• states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns

• output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

• states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

• begin_state() – This function can provide the states for the first time step.

• unroll() – This function unrolls an RNN for a given number of (>=1) time steps.

reset()[source]

Reset before re-using the cell for another graph.

state_info(batch_size=0)[source]

Shape and layout information of states.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters
• length (int) – Number of steps to unroll.

• inputs (Symbol, list of Symbol, or None) –

If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

• begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

• layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

• merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

• valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.

Returns

• outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

• states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().
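The effect of valid_length on the outputs and the returned states can be traced with a toy cell. This is a minimal sketch in plain Python, not the MXNet implementation: `toy_cell` and `toy_unroll` are hypothetical, each time step carries a single float, and padded steps are simply skipped, which gives the documented result (masked outputs, last valid state) for this toy cell.

```python
def toy_cell(x, h):
    """Hypothetical one-step cell: returns (output, new_state)."""
    h_new = 0.5 * h + x
    return h_new, h_new


def toy_unroll(cell, length, inputs, begin_state=0.0, valid_length=None):
    """Unroll `cell` over inputs of shape (batch_size, length)."""
    if valid_length is None:
        # Without valid_length, every sequence uses all `length` steps.
        valid_length = [length] * len(inputs)
    outputs, states = [], []
    for sequence, n_valid in zip(inputs, valid_length):
        if n_valid > length:
            raise ValueError("valid_length must be <= length")
        h = begin_state
        sequence_outputs = []
        for t in range(length):
            if t < n_valid:
                out, h = cell(sequence[t], h)
                sequence_outputs.append(out)
            else:
                # Padded step: output is masked with 0, state stays frozen.
                sequence_outputs.append(0.0)
        outputs.append(sequence_outputs)
        states.append(h)
    return outputs, states


batch = [[1.0, 2.0, 3.0],   # full-length sequence
         [4.0, 0.0, 0.0]]   # one valid step followed by padding
outputs, states = toy_unroll(toy_cell, 3, batch, valid_length=[3, 1])
```

For the second sequence only the first step is valid, so its outputs after that step are zero and its returned state is the state after step one.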

class mxnet.gluon.rnn.ResidualCell(base_cell)[source]

Bases: mxnet.gluon.rnn.rnn_cell.ModifierCell

Adds residual connection as described in Wu et al, 2016 (https://arxiv.org/abs/1609.08144). Output of the cell is output of the base cell plus input.
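The residual connection amounts to one extra addition per step. This is a minimal sketch in plain Python, assuming the base cell's output has the same size as its input; `residual_step` and `toy_base_cell` are hypothetical names, not MXNet functions.

```python
def toy_base_cell(inputs, states):
    """Hypothetical base cell: doubles the input and counts steps in its state."""
    return [2.0 * x for x in inputs], [states[0] + 1]


def residual_step(base_cell, inputs, states):
    """Run the base cell, then add the input to its output element-wise."""
    output, new_states = base_cell(inputs, states)
    if len(output) != len(inputs):
        raise ValueError("residual connection needs output and input of equal size")
    return [o + x for o, x in zip(output, inputs)], new_states


output, new_states = residual_step(toy_base_cell, [1.0, -2.0], [0])
```

The states pass through unchanged from the base cell; only the output is modified.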

Methods

• hybrid_forward(F, inputs, states) – Overrides to construct symbolic graph for this Block.

• unroll(length, inputs[, begin_state, …]) – Unrolls an RNN cell across time steps.

hybrid_forward(F, inputs, states)[source]

Overrides to construct symbolic graph for this Block.

Parameters
• x (Symbol or NDArray) – The first input tensor.

• *args (list of Symbol or list of NDArray) – Additional input tensors.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters
• length (int) – Number of steps to unroll.

• inputs (Symbol, list of Symbol, or None) –

If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

• begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

• layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

• merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

• valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.

Returns

• outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

• states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

class mxnet.gluon.rnn.SequentialRNNCell(prefix=None, params=None)[source]

Bases: mxnet.gluon.rnn.rnn_cell.RecurrentCell

Sequentially stacking multiple RNN cells.

Methods

• add(cell) – Appends a cell into the stack.

• begin_state(**kwargs) – Initial state for this cell.

• state_info([batch_size]) – Shape and layout information of states.

• unroll(length, inputs[, begin_state, …]) – Unrolls an RNN cell across time steps.

add(cell)[source]

Appends a cell into the stack.

Parameters

cell (RecurrentCell) – The cell to add.
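Stacking means that the output of each cell becomes the input of the next one, while the states of all cells are kept side by side. This is a minimal sketch in plain Python: `ToyCell` and `ToySequentialCell` are hypothetical classes, and keeping the states in one flat list is an assumption of this sketch rather than a statement about the library's internals.

```python
class ToyCell:
    """Hypothetical cell with one scalar state: h_new = scale * x + h."""
    num_states = 1

    def __init__(self, scale):
        self.scale = scale

    def __call__(self, x, states):
        h_new = self.scale * x + states[0]
        return h_new, [h_new]


class ToySequentialCell:
    """Feeds each cell's output into the next cell; states are concatenated."""

    def __init__(self):
        self.cells = []

    def add(self, cell):
        self.cells.append(cell)

    def begin_state(self):
        return [0.0 for cell in self.cells for _ in range(cell.num_states)]

    def __call__(self, x, states):
        next_states = []
        position = 0
        for cell in self.cells:
            # Slice out the states that belong to this cell.
            cell_states = states[position:position + cell.num_states]
            position += cell.num_states
            x, new_states = cell(x, cell_states)
            next_states.extend(new_states)
        return x, next_states


stack = ToySequentialCell()
stack.add(ToyCell(2.0))
stack.add(ToyCell(3.0))
out1, states1 = stack(1.0, stack.begin_state())
out2, states2 = stack(1.0, states1)
```

The returned state list has one entry per stacked cell, in the order in which the cells were added.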

begin_state(**kwargs)[source]

Initial state for this cell.

Parameters
• func (callable, default symbol.zeros) –

Function for creating initial state.

For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

• batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

• **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns

states – Starting states for the first RNN step.

Return type

nested list of Symbol

state_info(batch_size=0)[source]

Shape and layout information of states.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters
• length (int) – Number of steps to unroll.

• inputs (Symbol, list of Symbol, or None) –

If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

• begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

• layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

• merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

• valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.

Returns

• outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

• states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

class mxnet.gluon.rnn.ZoneoutCell(base_cell, zoneout_outputs=0.0, zoneout_states=0.0)[source]

Bases: mxnet.gluon.rnn.rnn_cell.ModifierCell

Applies Zoneout on base cell.
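Zoneout randomly keeps some units at their previous value instead of taking the newly computed one. This is a minimal sketch of that idea in plain Python, describing training-time behaviour only; the `zoneout` helper is hypothetical and is not how MXNet implements the cell, and `rate` stands for a probability such as zoneout_outputs or zoneout_states.

```python
import random


def zoneout(previous, new, rate, rng):
    """Keep each element of `previous` with probability `rate`, else use `new`."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    # rng.random() lies in [0, 1), so rate=0 never keeps and rate=1 always keeps.
    return [p if rng.random() < rate else n for p, n in zip(previous, new)]


rng = random.Random(0)  # seeded so repeated runs draw the same mask
no_zoneout = zoneout([1.0, 2.0], [9.0, 8.0], 0.0, rng)
full_zoneout = zoneout([1.0, 2.0], [9.0, 8.0], 1.0, rng)
mixed = zoneout([1.0] * 50, [9.0] * 50, 0.5, rng)
```

A rate of 0.0 leaves the base cell untouched, which corresponds to the default values of zoneout_outputs and zoneout_states in the signature above.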

Methods

• hybrid_forward(F, inputs, states) – Overrides to construct symbolic graph for this Block.

• reset() – Reset before re-using the cell for another graph.

hybrid_forward(F, inputs, states)[source]

Overrides to construct symbolic graph for this Block.

Parameters
• x (Symbol or NDArray) – The first input tensor.

• *args (list of Symbol or list of NDArray) – Additional input tensors.

reset()[source]

Reset before re-using the cell for another graph.