mx.symbol.LayerNorm

Description

Layer normalization.

Normalizes the channels of the input tensor by mean and variance, and applies a scale gamma as well as offset beta.

Assume the input has more than one dimension and is normalized along axis 1. The mean and variance are first computed along this axis, and the normalized output, which has the same shape as the input, is then computed as follows:

\[out = \frac{data - mean(data, axis)}{\sqrt{var(data, axis) + \epsilon}} * gamma + beta\]

Both gamma and beta are learnable parameters.

Unlike BatchNorm and InstanceNorm, the mean and variance are computed along the channel dimension.

Assume the input has size k on axis 1; then both gamma and beta have shape (k,). If output_mean_var is set to true, the operator also outputs data_mean and data_std. Note that no gradient is passed through these two outputs.
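The formula and shapes described above can be sketched in plain Python for a small 2-D input normalized along axis 1. This is only an illustration and not MXNet code: the function name layer_norm_rows is hypothetical, and the sketch assumes that the variance is the population variance (divided by k) and that the returned std is sqrt(var + eps).

```python
import math


def layer_norm_rows(data, gamma, beta, eps=1e-5):
    """Normalize each row of a 2-D nested list along axis 1.

    Returns (out, means, stds). Illustrative sketch only, not MXNet code.
    """
    out, means, stds = [], [], []
    for row in data:
        k = len(row)
        mean = sum(row) / k
        # Assumption: population variance (divide by k, not k - 1).
        var = sum((x - mean) ** 2 for x in row) / k
        # Assumption: the reported std includes eps under the square root.
        std = math.sqrt(var + eps)
        # gamma and beta each hold one value per position on the normalized axis.
        out.append([(x - mean) / std * g + b
                    for x, g, b in zip(row, gamma, beta)])
        means.append(mean)
        stds.append(std)
    return out, means, stds


data = [[1.0, 2.0, 3.0],
        [4.0, 4.0, 10.0]]
gamma = [1.0, 1.0, 1.0]  # shape (k,) with k = 3
beta = [0.0, 0.0, 0.0]   # shape (k,)

out, means, stds = layer_norm_rows(data, gamma, beta)
```

With gamma set to ones and beta to zeros, each output row has a mean of approximately zero; means and stds each contain one entry per row.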

The parameter axis specifies which axis of the input shape denotes the ‘channel’ (separately normalized groups). The default is -1, which sets the channel axis to be the last item in the input shape.
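As a rough illustration of how a negative axis value maps onto the input shape, the following plain-Python sketch resolves the axis and derives the matching shape for gamma and beta. The helper resolve_axis and the example shape are hypothetical and are not part of the MXNet API.

```python
def resolve_axis(axis, ndim):
    """Map a possibly negative axis to its non-negative index."""
    if not -ndim <= axis < ndim:
        raise ValueError("axis %d is out of range for %d dimensions" % (axis, ndim))
    return axis % ndim


shape = (2, 5, 8)                     # hypothetical input shape
axis = resolve_axis(-1, len(shape))   # the default -1 selects the last axis
param_shape = (shape[axis],)          # shape expected for gamma and beta
```

For this example shape, the default axis resolves to index 2, so gamma and beta would each have shape (8,).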

Usage

mx.symbol.LayerNorm(...)

Arguments

Argument

Description

data

NDArray-or-Symbol.

Input data to layer normalization

gamma

NDArray-or-Symbol.

The gamma (scale) array.

beta

NDArray-or-Symbol.

The beta (offset) array.

axis

int, optional, default=’-1’.

The axis along which to perform layer normalization. Usually, this should be the axis of the channel dimension. Negative values mean indexing from right to left.

eps

float, optional, default=9.99999975e-06.

An epsilon parameter to prevent division by 0.

output.mean.var

boolean, optional, default=0.

Output the mean and std calculated along the given axis.

name

string, optional.

Name of the resulting symbol.
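Regarding the eps argument above: the documented default, 9.99999975e-06, appears to be 1e-5 as stored in a 32-bit float. The plain-Python sketch below checks that reading with the standard struct module, and also shows why eps matters when a slice along the normalized axis is constant. It is an illustration under those assumptions, not MXNet code.

```python
import math
import struct

# Round-trip 1e-5 through a 32-bit float to see the single-precision value.
eps32 = struct.unpack("f", struct.pack("f", 1e-5))[0]
eps_text = format(eps32, ".8e")  # matches the documented default text

# A constant row has zero variance, so dividing by sqrt(var) alone would fail.
row = [3.0, 3.0, 3.0]
mean = sum(row) / len(row)
var = sum((x - mean) ** 2 for x in row) / len(row)

# With eps under the square root the denominator stays positive.
normalized = [(x - mean) / math.sqrt(var + eps32) for x in row]
```

Here var is exactly zero, and the normalized row is all zeros instead of raising a division error.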