mx.symbol.BatchNorm
Description
Batch normalization.
Normalizes a data batch by mean and variance, and applies a scale gamma as well as an offset beta.
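Assume the input has more than one dimension and we normalize along axis 1. We first compute the mean and variance along this axis:
data_mean[i] = mean(data[:,i,:,...])
data_var[i] = var(data[:,i,:,...])
Then compute the normalized output, which has the same shape as the input, as follows:
out[:,i,:,...] = (data[:,i,:,...] - data_mean[i]) / sqrt(data_var[i] + epsilon) * gamma[i] + beta[i]
Both mean and var return a scalar by treating the input as a vector.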
Assume the input has size k on axis 1, then both gamma and beta have shape (k,). If output_mean_var is set to true, the operator outputs both data_mean and the inverse of data_var, which are needed for the backward pass. Note that the gradients of these two outputs are blocked.
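The per-channel computation described above can be sketched in plain Python (not the R binding). This is an illustration only: the function name `batch_norm_axis1` is hypothetical, the input is assumed to be a nested list of shape (N, k, W), and the variance is assumed to be the population variance (divided by the element count).

```python
import math

def batch_norm_axis1(data, gamma, beta, eps=1e-3):
    """Normalize a nested list of shape (N, k, W) along axis 1 (hypothetical helper)."""
    k = len(gamma)
    data_mean, data_var = [], []
    for c in range(k):
        # Treat every element belonging to channel c as one flat vector.
        values = [x for sample in data for x in sample[c]]
        mean = sum(values) / len(values)
        # Assumption: population variance (divide by the number of elements).
        var = sum((x - mean) ** 2 for x in values) / len(values)
        data_mean.append(mean)
        data_var.append(var)
    out = [
        [
            [
                (x - data_mean[c]) / math.sqrt(data_var[c] + eps) * gamma[c] + beta[c]
                for x in sample[c]
            ]
            for c in range(k)
        ]
        for sample in data
    ]
    return out, data_mean, data_var

# Two samples, k = 2 channels, two values per channel.
data = [
    [[1.0, 2.0], [10.0, 20.0]],
    [[3.0, 4.0], [30.0, 40.0]],
]
out, data_mean, data_var = batch_norm_axis1(data, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

Here `gamma` and `beta` each have length k = 2, matching the shape (k,) stated above, and `out` has the same shape as `data`.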
Besides the inputs and the outputs, this operator accepts two auxiliary states, moving_mean and moving_var, which are k-length vectors. They are global statistics for the whole dataset, which are updated by:
moving_mean = moving_mean * momentum + data_mean * (1 - momentum)
moving_var = moving_var * momentum + data_var * (1 - momentum)
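As a rough illustration of the two update rules, the following plain-Python sketch applies them element-wise to k-length lists. The helper name `update_moving_stats` and the starting values are assumptions made for the example.

```python
def update_moving_stats(moving_mean, moving_var, data_mean, data_var, momentum=0.9):
    """Return updated moving statistics as new lists (hypothetical helper)."""
    new_mean = [m * momentum + d * (1 - momentum)
                for m, d in zip(moving_mean, data_mean)]
    new_var = [v * momentum + d * (1 - momentum)
               for v, d in zip(moving_var, data_var)]
    return new_mean, new_var

# Assumed starting state: zero mean and unit variance for k = 2 channels.
moving_mean, moving_var = [0.0, 0.0], [1.0, 1.0]
moving_mean, moving_var = update_moving_stats(
    moving_mean, moving_var, data_mean=[2.5, 25.0], data_var=[1.25, 125.0]
)
```

With a momentum close to 1, each batch moves the running statistics only slightly, so they approximate an average over many batches.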
If ``use_global_stats`` is set to be true, then ``moving_mean`` and
``moving_var`` are used instead of ``data_mean`` and ``data_var`` to compute
the output. It is often used during inference.
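The switch between batch statistics and moving statistics might be pictured as follows for a single channel. This is a sketch in plain Python under stated assumptions: `normalize_channel` is a hypothetical helper, and the population variance is assumed for the batch statistics.

```python
import math

def normalize_channel(values, gamma, beta, moving_mean, moving_var,
                      use_global_stats=False, eps=1e-3):
    """Normalize one channel's values (hypothetical helper)."""
    if use_global_stats:
        # Inference-style path: fixed statistics, so the operation reduces
        # to a per-channel scale and shift.
        mean, var = moving_mean, moving_var
    else:
        # Training-style path: statistics computed from the current batch.
        mean = sum(values) / len(values)
        var = sum((x - mean) ** 2 for x in values) / len(values)
    scale = gamma / math.sqrt(var + eps)
    return [(x - mean) * scale + beta for x in values]

values = [1.0, 3.0, 5.0]
inference_out = normalize_channel(values, 1.0, 0.0, moving_mean=1.0,
                                  moving_var=4.0, use_global_stats=True, eps=0.0)
training_out = normalize_channel(values, 1.0, 0.0, moving_mean=1.0,
                                 moving_var=4.0, use_global_stats=False)
```

In the first call the output depends only on the stored statistics, so the same input always maps to the same output regardless of what else is in the batch.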
The parameter ``axis`` specifies which axis of the input shape denotes
the 'channel' (separately normalized groups). The default is 1. Specifying -1 sets the channel
axis to be the last item in the input shape.
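One way to picture the axis convention is a small helper that maps a possibly negative axis to a position in the input shape. The helper `resolve_channel_axis` is hypothetical, and treating every negative value as counting from the end is an assumption; the text above only states the behaviour for -1.

```python
def resolve_channel_axis(axis, shape):
    """Map an axis value to a non-negative index into shape (hypothetical helper)."""
    ndim = len(shape)
    if not -ndim <= axis < ndim:
        raise ValueError("axis %d is out of range for %d dimensions" % (axis, ndim))
    # Assumption: negative values count from the end, so -1 is the last axis.
    return axis % ndim

shape = (8, 3, 32, 32)                       # e.g. batch, channel, height, width
default_axis = resolve_channel_axis(1, shape)
last_axis = resolve_channel_axis(-1, shape)
k = shape[default_axis]                      # number of separately normalized groups
```

With the default axis the example has k = 3 groups; with -1 the last dimension (size 32) would be treated as the channel instead.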
Both ``gamma`` and ``beta`` are learnable parameters. However, if ``fix_gamma`` is true,
``gamma`` is set to 1 and its gradient is set to 0.
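The effect of ``fix_gamma`` can be sketched in plain Python as an override applied to the scale and its gradient. The helper `apply_fix_gamma` is hypothetical and only mirrors the sentence above; it does not describe how the operator is implemented internally.

```python
def apply_fix_gamma(gamma, gamma_grad, fix_gamma=True):
    """Return the gamma values and gradient that take effect (hypothetical helper)."""
    if fix_gamma:
        # gamma is treated as 1 and receives no gradient, so it never changes.
        return [1.0] * len(gamma), [0.0] * len(gamma_grad)
    return list(gamma), list(gamma_grad)

fixed_gamma, fixed_grad = apply_fix_gamma([0.5, 2.0], [0.1, -0.3], fix_gamma=True)
free_gamma, free_grad = apply_fix_gamma([0.5, 2.0], [0.1, -0.3], fix_gamma=False)
```

Since ``fix_gamma`` is true by default according to the argument list, a learnable scale requires setting it to false explicitly.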
Note
When fix_gamma is set to True, no sparse support is provided. If fix_gamma is set to False, the sparse tensors will fall back.
Usage
mx.symbol.BatchNorm(...)
Arguments
Argument | Description |
---|---|
data | NDArray-or-Symbol. Input data to batch normalization |
gamma | NDArray-or-Symbol. gamma array |
beta | NDArray-or-Symbol. beta array |
moving.mean | NDArray-or-Symbol. Running mean of input |
moving.var | NDArray-or-Symbol. Running variance of input |
eps | double, optional, default=0.0010000000474974513. Epsilon to prevent division by zero. Must be no less than CUDNN_BN_MIN_EPSILON defined in cudnn.h when using cudnn (usually 1e-5) |
momentum | float, optional, default=0.899999976. Momentum for the moving average |
fix.gamma | boolean, optional, default=1. Fix gamma while training |
use.global.stats | boolean, optional, default=0. Whether to use global moving statistics instead of local batch statistics. This forces batch-norm to act as a scale-shift operator. |
output.mean.var | boolean, optional, default=0. Output the mean and inverse std |
axis | int, optional, default='1'. Specify which shape axis denotes the channel |
cudnn.off | boolean, optional, default=0. Do not select the CUDNN operator, if available |
min.calib.range | float or None, optional, default=None. The minimum scalar value in the form of float32 obtained through calibration. If present, it will be used by the quantized batch norm op to calculate the primitive scale. Note: this calib_range is used to calibrate the bn output. |
max.calib.range | float or None, optional, default=None. The maximum scalar value in the form of float32 obtained through calibration. If present, it will be used by the quantized batch norm op to calculate the primitive scale. Note: this calib_range is used to calibrate the bn output. |
name | string, optional. Name of the resulting symbol. |
Value
out
The result mx.symbol
Link to Source Code: http://github.com/apache/incubator-mxnet/blob/1.6.0/src/operator/nn/batch_norm.cc#L571