contrib.symbol

Symbol namespace used to register contrib functions

Functions

AdaptiveAvgPooling2D([data, kernel, …])

Applies a 2D adaptive average pooling over a 4D input with the shape of (NCHW).

BilinearResize2D([data, like, height, …])

Perform 2D resizing (upsampling or downsampling) for 4D input using bilinear interpolation.

CTCLoss([data, label, data_lengths, …])

Connectionist Temporal Classification Loss.

DeformablePSROIPooling([data, rois, trans, …])

Performs deformable position-sensitive region-of-interest pooling on inputs.

MultiBoxDetection([cls_prob, loc_pred, …])

Convert multibox detection predictions.

MultiBoxPrior([data, sizes, ratios, clip, …])

Generate prior(anchor) boxes from data, sizes and ratios.

MultiBoxTarget([anchor, label, cls_pred, …])

Compute Multibox training targets

MultiProposal([cls_prob, bbox_pred, …])

Generate region proposals via RPN

PSROIPooling([data, rois, spatial_scale, …])

Performs region-of-interest pooling on inputs.

Proposal([cls_prob, bbox_pred, im_info, …])

Generate region proposals via RPN

ROIAlign([data, rois, pooled_size, …])

This operator takes a 4D feature map as an input array and region proposals as rois, then aligns the feature map over sub-regions of the input and produces a fixed-sized output array.

RROIAlign([data, rois, pooled_size, …])

Performs Rotated ROI Align on the input array.

SyncBatchNorm([data, gamma, beta, …])

Batch normalization.

allclose([a, b, rtol, atol, equal_nan, …])

This operator implements numpy.allclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False).

arange_like([data, start, step, repeat, …])

Return an array with evenly spaced values.

backward_gradientmultiplier([data, scalar, …])

param data

source input

backward_hawkesll([name, attr, out])

param name

Name of the resulting symbol.

backward_index_copy([name, attr, out])

param name

Name of the resulting symbol.

backward_quadratic([name, attr, out])

param name

Name of the resulting symbol.

bipartite_matching([data, is_ascend, …])

Compute bipartite matching.

boolean_mask([data, index, axis, name, …])

Given an n-d NDArray data and a 1-d NDArray index, the operator produces an n-d NDArray out whose shape cannot be determined in advance; it holds the rows of data where the corresponding element in index is non-zero.

box_decode([data, anchors, std0, std1, …])

Decode bounding boxes training target with normalized center offsets.

box_encode([samples, matches, anchors, …])

Encode bounding boxes training target with normalized center offsets.

box_iou([lhs, rhs, format, name, attr, out])

Bounding box overlap of two arrays.

box_nms([data, overlap_thresh, …])

Apply non-maximum suppression to input.

box_non_maximum_suppression([data, …])

Apply non-maximum suppression to input.

calibrate_entropy([hist, hist_edges, …])

Provide calibrated min/max for input histogram.

count_sketch([data, h, s, out_dim, …])

Apply CountSketch to input: map d-dimensional data to k-dimensional data.

ctc_loss([data, label, data_lengths, …])

Connectionist Temporal Classification Loss.

dequantize([data, min_range, max_range, …])

Dequantize the input tensor into a float tensor.

dgl_adjacency([data, name, attr, out])

This operator converts a CSR matrix whose values are edge Ids to an adjacency matrix whose values are ones.

dgl_csr_neighbor_non_uniform_sample(…)

This operator samples sub-graphs from a CSR graph via a non-uniform probability.

dgl_csr_neighbor_uniform_sample(…)

This operator samples sub-graphs from a CSR graph via a uniform probability.

dgl_graph_compact(*graph_data, **kwargs)

This operator compacts a CSR matrix generated by dgl_csr_neighbor_uniform_sample and dgl_csr_neighbor_non_uniform_sample.

dgl_subgraph(*data, **kwargs)

This operator constructs an induced subgraph for a given set of vertices from a graph.

div_sqrt_dim([data, name, attr, out])

Rescale the input by the square root of the channel dimension.

dynamic_reshape([data, shape, name, attr, out])

Experimental support for reshape operator with dynamic shape.

edge_id([data, u, v, name, attr, out])

This operator implements the edge_id function for a graph stored in a CSR matrix (the value of the CSR stores the edge Id of the graph).

fft([data, compute_size, name, attr, out])

Apply 1D FFT to input.

getnnz([data, axis, name, attr, out])

Number of stored values for a sparse tensor, including explicit zeros.

gradientmultiplier([data, scalar, is_int, …])

This operator implements the gradient multiplier function.

group_adagrad_update([weight, grad, …])

Update function for Group AdaGrad optimizer.

hawkesll([lda, alpha, beta, state, lags, …])

Computes the log likelihood of a univariate Hawkes process.

index_array([data, axes, name, attr, out])

Returns an array of indexes of the input array.

index_copy([old_tensor, index_vector, …])

Copies the elements of a new_tensor into the old_tensor.

interleaved_matmul_encdec_qk([queries, …])

Compute the matrix multiplication between the projections of queries and keys in multihead attention used as encoder-decoder.

interleaved_matmul_encdec_valatt([…])

Compute the matrix multiplication between the projections of values and the attention weights in multihead attention used as encoder-decoder.

interleaved_matmul_selfatt_qk([…])

Compute the matrix multiplication between the projections of queries and keys in multihead attention used as self attention.

interleaved_matmul_selfatt_valatt([…])

Compute the matrix multiplication between the projections of values and the attention weights in multihead attention used as self attention.

intgemm_fully_connected([data, weight, …])

Multiply matrices using 8-bit integers.

intgemm_maxabsolute([data, name, attr, out])

Compute the maximum absolute value in a tensor of float32 fast on a CPU.

intgemm_prepare_data([data, maxabs, name, …])

This operator quantizes float32 to int8 while also banning -128.

intgemm_prepare_weight([weight, maxabs, …])

This operator converts a weight matrix in column-major format to intgemm’s internal fast representation of weight matrices.

intgemm_take_weight([weight, indices, name, …])

Index a weight matrix stored in intgemm’s weight format.

mrcnn_mask_target([rois, gt_masks, matches, …])

Generate mask targets for Mask-RCNN.

quadratic([data, a, b, c, name, attr, out])

This operator implements the quadratic function.

quantize([data, min_range, max_range, …])

Quantize an input tensor from float to out_type, with user-specified min_range and max_range.

quantize_asym([data, min_calib_range, …])

Quantize an input tensor from float to uint8_t.

quantize_v2([data, out_type, …])

Quantize an input tensor from float to out_type, with user-specified min_calib_range and max_calib_range or the input range collected at runtime.

quantized_act([data, min_data, max_data, …])

Activation operator for input and output data type of int8.

quantized_batch_norm([data, gamma, beta, …])

BatchNorm operator for input and output data type of int8.

quantized_batch_norm_relu([data, gamma, …])

BatchNorm with ReLU operator for input and output data type of int8.

quantized_concat(*data, **kwargs)

Joins input arrays along a given axis.

quantized_conv([data, weight, bias, …])

Convolution operator for input, weight and bias data type of int8, and accumulates in type int32 for the output.

quantized_elemwise_add([lhs, rhs, lhs_min, …])

elemwise_add operator for input dataA and input dataB data type of int8,

quantized_elemwise_mul([lhs, rhs, lhs_min, …])

Multiplies arguments int8 element-wise.

quantized_embedding([data, weight, …])

Maps integer indices to int8 vector representations (embeddings).

quantized_flatten([data, min_data, …])

param data

A ndarray/symbol of type float32

quantized_fully_connected([data, weight, …])

Fully Connected operator for input, weight and bias data type of int8, and accumulates in type int32 for the output.

quantized_npi_add([lhs, rhs, lhs_min, …])

elemwise_add operator for input dataA and input dataB data type of int8,

quantized_pooling([data, min_data, …])

Pooling operator for input and output data type of int8.

quantized_reshape([data, min_data, …])

param data

Array to be reshaped.

quantized_rnn([data, parameters, state, …])

RNN operator for input data type of uint8.

quantized_transpose([data, min_data, …])

param data

Array to be transposed.

requantize([data, min_range, max_range, …])

Given data that is quantized in int32 and the corresponding thresholds, requantize the data into int8 using min and max thresholds either calculated at runtime or from calibration.

round_ste([data, name, attr, out])

Straight-through-estimator of round().

sign_ste([data, name, attr, out])

Straight-through-estimator of sign().

sldwin_atten_context([score, value, …])

Compute the context vector for sliding window attention, used in Longformer (https://arxiv.org/pdf/2004.05150.pdf).

sldwin_atten_mask_like([score, dilation, …])

Compute the mask for the sliding window attention score, used in Longformer (https://arxiv.org/pdf/2004.05150.pdf).

sldwin_atten_score([query, key, dilation, …])

Compute the sliding window attention score, which is used in Longformer (https://arxiv.org/pdf/2004.05150.pdf).

AdaptiveAvgPooling2D(data=None, kernel=_Null, pool_type=_Null, global_pool=_Null, cudnn_off=_Null, pooling_convention=_Null, stride=_Null, pad=_Null, p_value=_Null, count_include_pad=_Null, layout=_Null, output_size=_Null, name=None, attr=None, out=None, **kwargs)

Applies a 2D adaptive average pooling over a 4D input with the shape of (NCHW). The pooling kernel and stride sizes are automatically chosen for desired output sizes.

  • If a single integer is provided for output_size, the output size is (N x C x output_size x output_size) for any input (NCHW).

  • If a tuple of integers (height, width) are provided for output_size, the output size is (N x C x height x width) for any input (NCHW).
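The two output_size cases above can be illustrated with a small pure-Python sketch that pools a single (H, W) plane. It is not the operator's implementation: the helper name `adaptive_avg_pool2d` is hypothetical, and the floor/ceil rule used to pick each window is an assumption (it is the rule commonly used for adaptive pooling).

```python
import math


def adaptive_avg_pool2d(plane, output_size):
    """Average-pool one 2D plane (a list of rows) to a fixed output size.

    Hypothetical helper for illustration only; the window boundaries use
    an assumed floor/ceil rule so that the windows cover the whole input.
    """
    # A single integer means a square output; a tuple is (height, width).
    if isinstance(output_size, int):
        out_h = out_w = output_size
    else:
        out_h, out_w = output_size
    in_h, in_w = len(plane), len(plane[0])

    out = []
    for oh in range(out_h):
        h0 = (oh * in_h) // out_h                  # window start (floor)
        h1 = math.ceil((oh + 1) * in_h / out_h)    # window end (ceil)
        row = []
        for ow in range(out_w):
            w0 = (ow * in_w) // out_w
            w1 = math.ceil((ow + 1) * in_w / out_w)
            window = [plane[y][x] for y in range(h0, h1) for x in range(w0, w1)]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


# A 4x4 plane holding the values 0..15 in row-major order.
plane = [[float(4 * r + c) for c in range(4)] for r in range(4)]
pooled_square = adaptive_avg_pool2d(plane, 2)       # output is 2 x 2
pooled_rect = adaptive_avg_pool2d(plane, (1, 4))    # output is 1 x 4
```

In a 4D (NCHW) input the same pooling would be applied independently to every (N, C) plane, which is why the N and C dimensions are unchanged in the output.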

Defined in /work/mxnet/src/operator/contrib/adaptive_avg_pooling.cc:L308

Parameters
  • data (Symbol) – Input data

  • kernel (Shape(tuple), optional, default=[]) – Pooling kernel size: (y, x) or (d, y, x)

  • pool_type ({'avg', 'lp', 'max', 'sum'},optional, default='max') – Pooling type to be applied.

  • global_pool (boolean, optional, default=0) – Ignore kernel size, do global pooling based on current input feature map.

  • cudnn_off (boolean, optional, default=0) – Turn off cudnn pooling and use MXNet pooling operator.

  • pooling_convention ({'full', 'same', 'valid'},optional, default='valid') – Pooling convention to be applied.

  • stride (Shape(tuple), optional, default=[]) – Stride: for pooling (y, x) or (d, y, x). Defaults to 1 for each dimension.

  • pad (Shape(tuple), optional, default=[]) – Pad for pooling: (y, x) or (d, y, x). Defaults to no padding.

  • p_value (int or None, optional, default='None') – Value of p for Lp pooling, can be 1 or 2, required for Lp Pooling.

  • count_include_pad (boolean or None, optional, default=None) – Only used for AvgPool, specify whether to count padding elements for average calculation. For example, with a 5*5 kernel on a 3*3 corner of an image, the sum of the 9 valid elements will be divided by 25 if this is set to true, or it will be divided by 9 if this is set to false. Defaults to true.

  • layout ({None, 'NCDHW', 'NCHW', 'NCW', 'NDHWC', 'NHWC', 'NWC'},optional, default='None') – Set layout for input and output. Empty for default layout: NCW for 1d, NCHW for 2d and NCDHW for 3d.

  • output_size (Shape or None, optional, default=None) – Only used for Adaptive Pooling. int (output size) or a tuple of int for output (height, width).

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

BilinearResize2D(data=None, like=None, height=_Null, width=_Null, scale_height=_Null, scale_width=_Null, mode=_Null, align_corners=_Null, name=None, attr=None, out=None, **kwargs)

Perform 2D resizing (upsampling or downsampling) for 4D input using bilinear interpolation.

The expected input is a 4-dimensional NDArray (NCHW), and the output has the shape (N x C x height x width). The key idea of bilinear interpolation is to perform linear interpolation first in one direction, and then again in the other direction. See the Wikipedia article on bilinear interpolation for more details.
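The "interpolate in one direction, then the other" idea can be sketched in pure Python for a single (H, W) plane. This is an illustration, not the operator's code: the helper name `bilinear_resize` is hypothetical, and the sketch assumes a corner-aligned coordinate mapping in which the first and last output pixels coincide with the first and last input pixels.

```python
def bilinear_resize(plane, out_h, out_w):
    """Resize one 2D plane (a list of rows) with bilinear interpolation.

    Hypothetical helper; assumes corner-aligned coordinates, so output
    index i maps to input coordinate i * (in - 1) / (out - 1).
    """
    in_h, in_w = len(plane), len(plane[0])
    scale_y = (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
    scale_x = (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0

    out = []
    for i in range(out_h):
        y = i * scale_y
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * scale_x
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Linear interpolation along x on the two neighbouring rows...
            top = plane[y0][x0] * (1 - dx) + plane[y0][x1] * dx
            bottom = plane[y1][x0] * (1 - dx) + plane[y1][x1] * dx
            # ...then linear interpolation along y between those results.
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


# Upsample a 2x2 plane to 3x3: the new centre row and column are midpoints.
resized = bilinear_resize([[0.0, 2.0], [4.0, 6.0]], 3, 3)
```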

Defined in /work/mxnet/src/operator/contrib/bilinear_resize.cc:L211

Parameters
  • data (Symbol) – Input data

  • like (Symbol) – Resize data to its shape

  • height (int, optional, default='1') – output height (required, but ignored if scale_height is defined or mode is not “size”)

  • width (int, optional, default='1') – output width (required, but ignored if scale_width is defined or mode is not “size”)

  • scale_height (float or None, optional, default=None) – sampling scale of the height (optional, used in modes “scale” and “odd_scale”)

  • scale_width (float or None, optional, default=None) – sampling scale of the width (optional, used in modes “scale” and “odd_scale”)

  • mode ({'like', 'odd_scale', 'size', 'to_even_down', 'to_even_up', 'to_odd_down', 'to_odd_up'},optional, default='size') – Resizing mode. “size”: the output height equals the “height” parameter if the “scale_height” parameter is not defined, or the input height multiplied by “scale_height” otherwise; the same applies to width. “odd_scale”: if the original height or width is odd, the result height is calculated as result_h = (original_h - 1) * scale + 1; for scale > 1 the result shape is the same as for a deconvolution with kernel = (1, 1) and stride = (height_scale, width_scale), and for scale < 1 the shape is the same as for a convolution with kernel = (1, 1) and stride = (int(1 / height_scale), int(1 / width_scale)). “like”: resize the first input to the height and width of the second input. “to_even_down”: resize the input to the nearest lower even height and width (if the original height is odd, then result height = original height - 1). “to_even_up”: resize the input to the nearest higher even height and width (if the original height is odd, then result height = original height + 1). “to_odd_down”: resize the input to the nearest lower odd height and width (if the original height is even, then result height = original height - 1). “to_odd_up”: resize the input to the nearest higher odd height and width (if the original height is even, then result height = original height + 1).

  • align_corners (boolean, optional, default=1) – With align_corners = True, the interpolation does not proportionally align the output and input pixels, and thus the output values can depend on the input size.

  • name (string, optional.) – Name of the resulting symbol.
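The effect of the mode parameter on the output height (or width) can be summarized with a small helper. This is only a sketch of the rules stated in the parameter description: the function `resized_length` and its argument names are hypothetical, truncation to an integer with `int()` is an assumption, the behaviour of “odd_scale” for even lengths (plain multiplication) is an assumption, and for the to_odd modes the sketch assumes that an even length is moved by one to reach an odd value.

```python
def resized_length(original, mode, size=1, scale=None, like=None):
    """Return the output length along one spatial axis for a given mode.

    Hypothetical helper: `size` stands for height/width, `scale` for
    scale_height/scale_width, and `like` for the matching dimension of
    the second input.
    """
    if mode == "size":
        # Use the explicit size unless a scale is given.
        return size if scale is None else int(original * scale)
    if mode == "odd_scale":
        if original % 2 == 1:
            return int((original - 1) * scale) + 1
        return int(original * scale)  # assumed behaviour for even lengths
    if mode == "like":
        return like
    if mode == "to_even_down":
        return original - 1 if original % 2 == 1 else original
    if mode == "to_even_up":
        return original + 1 if original % 2 == 1 else original
    if mode == "to_odd_down":
        return original - 1 if original % 2 == 0 else original
    if mode == "to_odd_up":
        return original + 1 if original % 2 == 0 else original
    raise ValueError("unknown mode: %r" % (mode,))


odd_scaled = resized_length(5, "odd_scale", scale=2)  # (5 - 1) * 2 + 1
even_down = resized_length(7, "to_even_down")
odd_up = resized_length(8, "to_odd_up")
```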

Returns

The result symbol.

Return type

Symbol

CTCLoss(data=None, label=None, data_lengths=None, label_lengths=None, use_data_lengths=_Null, use_label_lengths=_Null, blank_label=_Null, name=None, attr=None, out=None, **kwargs)

Connectionist Temporal Classification Loss.

Note

The existing alias contrib_CTCLoss is deprecated.

The shapes of the inputs and outputs:

  • data: (sequence_length, batch_size, alphabet_size)

  • label: (batch_size, label_sequence_length)

  • out: (batch_size)

The data tensor consists of sequences of activation vectors (without applying softmax), with i-th channel in the last dimension corresponding to i-th label for i between 0 and alphabet_size-1 (i.e. always 0-indexed). Alphabet size should include one additional value reserved for blank label. When blank_label is "first", the 0-th channel is reserved for activation of blank label, or otherwise if it is “last”, (alphabet_size-1)-th channel should be reserved for blank label.

label is an index matrix of integers. When blank_label is "first", the value 0 is then reserved for blank label, and should not be passed in this matrix. Otherwise, when blank_label is "last", the value (alphabet_size-1) is reserved for blank label.

If a sequence of labels is shorter than label_sequence_length, use the special padding value at the end of the sequence to conform it to the correct length. The padding value is 0 when blank_label is "first", and -1 otherwise.

For example, suppose the vocabulary is [a, b, c], and in one batch we have three sequences ‘ba’, ‘cbb’, and ‘abac’. When blank_label is "first", we can index the labels as {‘a’: 1, ‘b’: 2, ‘c’: 3}, and we reserve the 0-th channel for blank label in data tensor. The resulting label tensor should be padded to be:

[[2, 1, 0, 0], [3, 2, 2, 0], [1, 2, 1, 3]]

When blank_label is "last", we can index the labels as {‘a’: 0, ‘b’: 1, ‘c’: 2}, and we reserve the channel index 3 for blank label in data tensor. The resulting label tensor should be padded to be:

[[1, 0, -1, -1], [2, 1, 1, -1], [0, 1, 0, 2]]
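The indexing and padding rules from the two examples above can be reproduced with a few lines of plain Python. The helper `pad_ctc_labels` is hypothetical and only builds the label matrix; it does not compute the loss.

```python
def pad_ctc_labels(sequences, vocabulary, blank_label="first"):
    """Build a padded label matrix following the rules described above.

    Hypothetical helper: with blank_label="first", token indices start at 1
    and the padding value is 0; with blank_label="last", token indices
    start at 0 and the padding value is -1.
    """
    if blank_label == "first":
        offset, pad = 1, 0
    elif blank_label == "last":
        offset, pad = 0, -1
    else:
        raise ValueError("blank_label must be 'first' or 'last'")

    index = {token: i + offset for i, token in enumerate(vocabulary)}
    width = max(len(seq) for seq in sequences)  # label_sequence_length
    return [
        [index[token] for token in seq] + [pad] * (width - len(seq))
        for seq in sequences
    ]


batch = ["ba", "cbb", "abac"]
labels_first = pad_ctc_labels(batch, "abc", blank_label="first")
labels_last = pad_ctc_labels(batch, "abc", blank_label="last")
```

With a three-token vocabulary, the data tensor needs alphabet_size = 4 channels in its last dimension in both cases, since one channel is reserved for the blank label.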

out is a list of CTC loss values, one per example in the batch.

See Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks, A. Graves et al. for more information on the definition and the algorithm.

Defined in /work/mxnet/src/operator/nn/ctc_loss.cc:L104

Parameters
  • data (Symbol) – Input ndarray

  • label (Symbol) – Ground-truth labels for the loss.

  • data_lengths (Symbol) – Lengths of data for each of the samples. Only required when use_data_lengths is true.

  • label_lengths (Symbol) – Lengths of labels for each of the samples. Only required when use_label_lengths is true.

  • use_data_lengths (boolean, optional, default=0) – Whether the data lengths are decided by data_lengths. If false, the lengths are equal to the max sequence length.

  • use_label_lengths (boolean, optional, default=0) – Whether the label lengths are decided by label_lengths, or derived from padding_mask. If false, the lengths are derived from the first occurrence of the value of padding_mask. The value of padding_mask is 0 when first CTC label is reserved for blank, and -1 when last label is reserved for blank. See blank_label.

  • blank_label ({'first', 'last'},optional, default='first') – Set the label that is reserved for blank label. If “first”, 0-th label is reserved, and label values for tokens in the vocabulary are between 1 and alphabet_size-1, and the padding mask is 0. If “last”, last label value alphabet_size-1 is reserved for blank label instead, and label values for tokens in the vocabulary are between 0 and alphabet_size-2, and the padding mask is -1.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

DeformablePSROIPooling(data=None, rois=None, trans=None, spatial_scale=_Null, output_dim=_Null, group_size=_Null, pooled_size=_Null, part_size=_Null, sample_per_part=_Null, trans_std=_Null, no_trans=_Null, name=None, attr=None, out=None, **kwargs)

Performs deformable position-sensitive region-of-interest pooling on inputs. The DeformablePSROIPooling operation is described in https://arxiv.org/abs/1703.06211. batch_size will change to the number of region bounding boxes after DeformablePSROIPooling.

Parameters
  • data (Symbol) – Input data to the pooling operator, a 4D feature map

  • rois (Symbol) – Bounding box coordinates, a 2D array of [[batch_index, x1, y1, x2, y2]]. (x1, y1) and (x2, y2) are the top-left and bottom-right corners of the designated region of interest. batch_index indicates the index of the corresponding image in the input data

  • trans (Symbol) – transition parameter

  • spatial_scale (float, required) – Ratio of input feature map height (or w) to raw image height (or w). Equals the reciprocal of total stride in convolutional layers

  • output_dim (long, required) – fix output dim

  • group_size (long, required) – fix group size

  • pooled_size (long, required) – fix pooled size

  • part_size (long, optional, default=0) – fix part size

  • sample_per_part (long, optional, default=1) – fix samples per part

  • trans_std (float, optional, default=0) – fix transition std

  • no_trans (boolean, optional, default=0) – Whether to disable trans parameter.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

MultiBoxDetection(cls_prob=None, loc_pred=None, anchor=None, clip=_Null, threshold=_Null, background_id=_Null, nms_threshold=_Null, force_suppress=_Null, variances=_Null, nms_topk=_Null, name=None, attr=None, out=None, **kwargs)

Convert multibox detection predictions.

Parameters
  • cls_prob (Symbol) – Class probabilities.

  • loc_pred (Symbol) – Location regression predictions.

  • anchor (Symbol) – Multibox prior anchor boxes

  • clip (boolean, optional, default=1) – Clip out-of-boundary boxes.

  • threshold (float, optional, default=0.00999999978) – Threshold to be a positive prediction.

  • background_id (int, optional, default='0') – Background id.

  • nms_threshold (float, optional, default=0.5) – Non-maximum suppression threshold.

  • force_suppress (boolean, optional, default=0) – Suppress all detections regardless of class_id.

  • variances (tuple of <float>, optional, default=[0.1,0.1,0.2,0.2]) – Variances to be decoded from box regression output.

  • nms_topk (int, optional, default='-1') – Keep maximum top k detections before nms, -1 for no limit.

  • name (string, optional.) – Name of the resulting symbol.
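The interaction of nms_threshold, force_suppress and nms_topk can be illustrated with a greedy non-maximum suppression sketch in plain Python. This is not the operator's implementation: the function names are hypothetical, and the detection layout (class_id, score, xmin, ymin, xmax, ymax) is an assumption made for the example.

```python
def box_iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detections, nms_threshold=0.5, force_suppress=False, nms_topk=-1):
    """Keep the highest-scoring boxes and drop boxes that overlap them.

    Hypothetical helper; each detection is assumed to be a tuple
    (class_id, score, xmin, ymin, xmax, ymax).
    """
    ordered = sorted(detections, key=lambda d: d[1], reverse=True)
    if nms_topk > 0:
        ordered = ordered[:nms_topk]  # keep only the top k before NMS
    kept = []
    for det in ordered:
        suppressed = any(
            # Without force_suppress, only boxes of the same class compete.
            (force_suppress or k[0] == det[0])
            and box_iou(k[2:], det[2:]) > nms_threshold
            for k in kept
        )
        if not suppressed:
            kept.append(det)
    return kept


detections = [
    (0, 0.9, 0.0, 0.0, 1.0, 1.0),
    (0, 0.8, 0.0, 0.0, 1.0, 0.9),  # same class, overlaps the first box
    (1, 0.7, 0.0, 0.0, 1.0, 1.0),  # different class, same location
]
per_class = greedy_nms(detections)
across_classes = greedy_nms(detections, force_suppress=True)
```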

Returns

The result symbol.

Return type

Symbol

MultiBoxPrior(data=None, sizes=_Null, ratios=_Null, clip=_Null, steps=_Null, offsets=_Null, name=None, attr=None, out=None, **kwargs)

Generate prior(anchor) boxes from data, sizes and ratios.

Parameters
  • data (Symbol) – Input data.

  • sizes (tuple of <float>, optional, default=[1]) – List of sizes of generated MultiBoxPriors.

  • ratios (tuple of <float>, optional, default=[1]) – List of aspect ratios of generated MultiBoxPriors.

  • clip (boolean, optional, default=0) – Whether to clip out-of-boundary boxes.

  • steps (tuple of <float>, optional, default=[-1,-1]) – Priorbox step across y and x, -1 for auto calculation.

  • offsets (tuple of <float>, optional, default=[0.5,0.5]) – Priorbox center offsets, y and x respectively

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

MultiBoxTarget(anchor=None, label=None, cls_pred=None, overlap_threshold=_Null, ignore_label=_Null, negative_mining_ratio=_Null, negative_mining_thresh=_Null, minimum_negative_samples=_Null, variances=_Null, name=None, attr=None, out=None, **kwargs)

Compute Multibox training targets

Parameters
  • anchor (Symbol) – Generated anchor boxes.

  • label (Symbol) – Object detection labels.

  • cls_pred (Symbol) – Class predictions.

  • overlap_threshold (float, optional, default=0.5) – Anchor-GT overlap threshold to be regarded as a positive match.

  • ignore_label (float, optional, default=-1) – Label for ignored anchors.

  • negative_mining_ratio (float, optional, default=-1) – Max negative to positive samples ratio, use -1 to disable mining

  • negative_mining_thresh (float, optional, default=0.5) – Threshold used for negative mining.

  • minimum_negative_samples (int, optional, default='0') – Minimum number of negative samples.

  • variances (tuple of <float>, optional, default=[0.1,0.1,0.2,0.2]) – Variances to be encoded in box regression target.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

MultiProposal(cls_prob=None, bbox_pred=None, im_info=None, rpn_pre_nms_top_n=_Null, rpn_post_nms_top_n=_Null, threshold=_Null, rpn_min_size=_Null, scales=_Null, ratios=_Null, feature_stride=_Null, output_score=_Null, iou_loss=_Null, name=None, attr=None, out=None, **kwargs)

Generate region proposals via RPN

Parameters
  • cls_prob (Symbol) – Score of how likely proposal is object.

  • bbox_pred (Symbol) – BBox Predicted deltas from anchors for proposals

  • im_info (Symbol) – Image size and scale.

  • rpn_pre_nms_top_n (int, optional, default='6000') – Number of top scoring boxes to keep before applying NMS to RPN proposals

  • rpn_post_nms_top_n (int, optional, default='300') – Number of top scoring boxes to keep after applying NMS to RPN proposals

  • threshold (float, optional, default=0.699999988) – NMS value, below which to suppress.

  • rpn_min_size (int, optional, default='16') – Minimum height or width in proposal

  • scales (tuple of <float>, optional, default=[4,8,16,32]) – Used to generate anchor windows by enumerating scales

  • ratios (tuple of <float>, optional, default=[0.5,1,2]) – Used to generate anchor windows by enumerating ratios

  • feature_stride (int, optional, default='16') – The size of the receptive field of each unit in the convolution layer of the RPN, for example the product of all strides prior to this layer.

  • output_score (boolean, optional, default=0) – Add score to outputs

  • iou_loss (boolean, optional, default=0) – Usage of IoU Loss

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

PSROIPooling(data=None, rois=None, spatial_scale=_Null, output_dim=_Null, pooled_size=_Null, group_size=_Null, name=None, attr=None, out=None, **kwargs)

Performs region-of-interest pooling on inputs. Resize bounding box coordinates by spatial_scale and crop input feature maps accordingly. The cropped feature maps are pooled by max pooling to a fixed size output indicated by pooled_size. batch_size will change to the number of region bounding boxes after PSROIPooling

Parameters
  • data (Symbol) – Input data to the pooling operator, a 4D feature map

  • rois (Symbol) – Bounding box coordinates, a 2D array of [[batch_index, x1, y1, x2, y2]]. (x1, y1) and (x2, y2) are the top-left and bottom-right corners of the designated region of interest. batch_index indicates the index of the corresponding image in the input data

  • spatial_scale (float, required) – Ratio of input feature map height (or w) to raw image height (or w). Equals the reciprocal of total stride in convolutional layers

  • output_dim (int, required) – fix output dim

  • pooled_size (int, required) – fix pooled size

  • group_size (int, optional, default='0') – fix group size

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

Proposal(cls_prob=None, bbox_pred=None, im_info=None, rpn_pre_nms_top_n=_Null, rpn_post_nms_top_n=_Null, threshold=_Null, rpn_min_size=_Null, scales=_Null, ratios=_Null, feature_stride=_Null, output_score=_Null, iou_loss=_Null, name=None, attr=None, out=None, **kwargs)

Generate region proposals via RPN

Parameters
  • cls_prob (Symbol) – Score of how likely proposal is object.

  • bbox_pred (Symbol) – BBox Predicted deltas from anchors for proposals

  • im_info (Symbol) – Image size and scale.

  • rpn_pre_nms_top_n (int, optional, default='6000') – Number of top scoring boxes to keep before applying NMS to RPN proposals

  • rpn_post_nms_top_n (int, optional, default='300') – Number of top scoring boxes to keep after applying NMS to RPN proposals

  • threshold (float, optional, default=0.699999988) – NMS value, below which to suppress.

  • rpn_min_size (int, optional, default='16') – Minimum height or width in proposal

  • scales (tuple of <float>, optional, default=[4,8,16,32]) – Used to generate anchor windows by enumerating scales

  • ratios (tuple of <float>, optional, default=[0.5,1,2]) – Used to generate anchor windows by enumerating ratios

  • feature_stride (int, optional, default='16') – The size of the receptive field of each unit in the convolution layer of the RPN, for example the product of all strides prior to this layer.

  • output_score (boolean, optional, default=0) – Add score to outputs

  • iou_loss (boolean, optional, default=0) – Usage of IoU Loss

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

ROIAlign(data=None, rois=None, pooled_size=_Null, spatial_scale=_Null, sample_ratio=_Null, position_sensitive=_Null, aligned=_Null, name=None, attr=None, out=None, **kwargs)

This operator takes a 4D feature map as an input array and region proposals as rois, then aligns the feature map over sub-regions of the input and produces a fixed-sized output array. This operator is typically used in Faster R-CNN and Mask R-CNN networks. If the roi batchid is less than 0, the roi is ignored and the corresponding output is set to 0.

Different from ROI pooling, ROI Align removes the harsh quantization, properly aligning the extracted features with the input. RoIAlign computes the value of each sampling point by bilinear interpolation from the nearby grid points on the feature map. No quantization is performed on any coordinates involved in the RoI, its bins, or the sampling points. Bilinear interpolation is used to compute the exact values of the input features at four regularly sampled locations in each RoI bin. Then the feature map can be aggregated by avgpooling.
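The sampling scheme described above (no coordinate quantization, bilinear interpolation at regularly spaced points in each bin, then averaging) can be sketched in pure Python for one (H, W) plane and one RoI. The function names and the RoI layout (x1, y1, x2, y2) are assumptions for illustration, and the minimum RoI size of one feature-map cell is also an assumption; this is not the operator's implementation.

```python
def bilinear_sample(plane, y, x):
    """Bilinearly interpolate a 2D plane at a fractional (y, x) location."""
    h, w = len(plane), len(plane[0])
    # Clamp to the valid range so samples on the border stay defined.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = plane[y0][x0] * (1 - dx) + plane[y0][x1] * dx
    bottom = plane[y1][x0] * (1 - dx) + plane[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align_plane(plane, roi, pooled_size, spatial_scale, samples=2):
    """Average bilinear samples over each bin of one RoI.

    Hypothetical helper; `roi` is (x1, y1, x2, y2) in image coordinates
    and `samples` is the number of sampling points per bin along each axis
    (2 gives the four sampled locations per bin mentioned above).
    """
    x1, y1, x2, y2 = (c * spatial_scale for c in roi)  # to feature-map scale
    pooled_h, pooled_w = pooled_size
    bin_h = max(y2 - y1, 1.0) / pooled_h  # fractional bin sizes, no rounding
    bin_w = max(x2 - x1, 1.0) / pooled_w
    out = []
    for i in range(pooled_h):
        row = []
        for j in range(pooled_w):
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    y = y1 + i * bin_h + (sy + 0.5) * bin_h / samples
                    x = x1 + j * bin_w + (sx + 0.5) * bin_w / samples
                    total += bilinear_sample(plane, y, x)
            row.append(total / (samples * samples))
        out.append(row)
    return out


# A plane whose value equals its x coordinate makes the result easy to read:
# each output is the x coordinate of the centre of its bin.
ramp = [[float(x) for x in range(8)] for _ in range(8)]
aligned = roi_align_plane(ramp, (0, 0, 4, 4), (2, 2), spatial_scale=1.0)
```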

References

He, Kaiming, et al. “Mask R-CNN.” ICCV, 2017

Defined in /work/mxnet/src/operator/contrib/roi_align.cc:L551

Parameters
  • data (Symbol) – Input data to the pooling operator, a 4D feature map

  • rois (Symbol) – Bounding box coordinates, a 2D array, if batchid is less than 0, it will be ignored.

  • pooled_size (Shape(tuple), required) – ROI Align output roi feature map height and width: (h, w)

  • spatial_scale (float, required) – Ratio of input feature map height (or w) to raw image height (or w). Equals the reciprocal of total stride in convolutional layers

  • sample_ratio (int, optional, default='-1') – Optional sampling ratio of ROI align, using adaptive size by default.

  • position_sensitive (boolean, optional, default=0) – Whether to perform position-sensitive RoI pooling. PSRoIPooling was first proposed in R-FCN, and it can reduce the input channels by ph*pw times, where (ph, pw) is the pooled_size

  • aligned (boolean, optional, default=0) – Center-aligned ROIAlign introduced in Detectron2. To enable, set aligned to True.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

RROIAlign(data=None, rois=None, pooled_size=_Null, spatial_scale=_Null, sampling_ratio=_Null, name=None, attr=None, out=None, **kwargs)

Performs Rotated ROI Align on the input array.

This operator takes a 4D feature map as an input array and region proposals as rois, then aligns the feature map over sub-regions of the input and produces a fixed-sized output array.

Different from ROI Align, RROI Align uses rotated rois, which is suitable for text detection. RRoIAlign computes the value of each sampling point by bilinear interpolation from the nearby grid points on the rotated feature map. No quantization is performed on any coordinates involved in the RoI, its bins, or the sampling points. Bilinear interpolation is used to compute the exact values of the input features at four regularly sampled locations in each RoI bin. Then the feature map can be aggregated by avgpooling.

References

Ma, Jianqi, et al. “Arbitrary-Oriented Scene Text Detection via Rotation Proposals.” IEEE Transactions on Multimedia, 2018.

Defined in /work/mxnet/src/operator/contrib/rroi_align.cc:L300

Parameters
  • data (Symbol) – Input data to the pooling operator, a 4D feature map

  • rois (Symbol) – Bounding box coordinates, a 2D array

  • pooled_size (Shape(tuple), required) – RROI align output shape (h,w)

  • spatial_scale (float, required) – Ratio of input feature map height (or width) to raw image height (or width). Equals the reciprocal of total stride in convolutional layers

  • sampling_ratio (int, optional, default='-1') – Optional sampling ratio of RROI align, using adaptive size by default.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

SyncBatchNorm(data=None, gamma=None, beta=None, moving_mean=None, moving_var=None, eps=_Null, momentum=_Null, fix_gamma=_Null, use_global_stats=_Null, output_mean_var=_Null, ndev=_Null, key=_Null, name=None, attr=None, out=None, **kwargs)

Batch normalization.

Normalizes a data batch by mean and variance, and applies a scale gamma as well as an offset beta. Standard BN [1] implementations only normalize the data within each device. SyncBN normalizes the input within the whole mini-batch. We follow the sync-once implementation described in the paper [2].

Assume the input has more than one dimension and we normalize along axis 1. We first compute the mean and variance along this axis:

\[\begin{split}data\_mean[i] = mean(data[:,i,:,...]) \\ data\_var[i] = var(data[:,i,:,...])\end{split}\]

Then compute the normalized output, which has the same shape as the input, as follows:

\[out[:,i,:,...] = \frac{data[:,i,:,...] - data\_mean[i]}{\sqrt{data\_var[i]+\epsilon}} * gamma[i] + beta[i]\]

Both mean and var return a scalar by treating the input as a vector.

Assume the input has size k on axis 1, then both gamma and beta have shape (k,). If output_mean_var is set to true, the operator also outputs data_mean and data_var, which are needed for the backward pass.

Besides the inputs and the outputs, this operator accepts two auxiliary states, moving_mean and moving_var, which are k-length vectors. They are global statistics for the whole dataset, which are updated by:

moving_mean = moving_mean * momentum + data_mean * (1 - momentum)
moving_var = moving_var * momentum + data_var * (1 - momentum)

If use_global_stats is set to be true, then moving_mean and moving_var are used instead of data_mean and data_var to compute the output. It is often used during inference.

Both gamma and beta are learnable parameters. However, if fix_gamma is true, gamma is set to 1 and its gradient to 0.
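
The normalization and moving-statistics formulas above can be sketched for a single device as follows. This is a minimal illustration under stated assumptions, not the operator itself: the function name batch_norm_2d is hypothetical, the input is a (batch, k) nested list, the variance is the biased (population) variance, and the cross-device synchronization that distinguishes SyncBN is not modelled.

```python
import math


def batch_norm_2d(data, gamma, beta, moving_mean, moving_var,
                  eps=1e-3, momentum=0.9, use_global_stats=False):
    """Normalize a (batch, k) nested list per channel.

    Returns (out, new_moving_mean, new_moving_var).
    """
    n, k = len(data), len(data[0])
    mean = [sum(row[i] for row in data) / n for i in range(k)]
    var = [sum((row[i] - mean[i]) ** 2 for row in data) / n for i in range(k)]
    if use_global_stats:
        # Inference-style path: use the stored statistics and leave them unchanged.
        used_mean, used_var = moving_mean, moving_var
        new_mean, new_var = list(moving_mean), list(moving_var)
    else:
        used_mean, used_var = mean, var
        new_mean = [m * momentum + b * (1 - momentum)
                    for m, b in zip(moving_mean, mean)]
        new_var = [v * momentum + b * (1 - momentum)
                   for v, b in zip(moving_var, var)]
    out = [[(row[i] - used_mean[i]) / math.sqrt(used_var[i] + eps)
            * gamma[i] + beta[i] for i in range(k)]
           for row in data]
    return out, new_mean, new_var


data = [[1.0, 2.0], [3.0, 6.0]]
out, new_mean, new_var = batch_norm_2d(
    data, gamma=[1.0, 1.0], beta=[0.0, 0.0],
    moving_mean=[0.0, 0.0], moving_var=[1.0, 1.0], eps=0.0)
```

With eps set to 0 for readability, each channel of the two-sample batch is mapped to -1 and 1, and the moving mean moves one tenth of the way towards the batch mean.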

References:

[1] Ioffe, Sergey, and Christian Szegedy. “Batch normalization: Accelerating deep network training by reducing internal covariate shift.” ICML 2015

[2] Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, and Amit Agrawal. “Context Encoding for Semantic Segmentation.” CVPR 2018

Defined in /work/mxnet/src/operator/contrib/sync_batch_norm.cc:L97

Parameters
  • data (Symbol) – Input data to batch normalization

  • gamma (Symbol) – gamma array

  • beta (Symbol) – beta array

  • moving_mean (Symbol) – running mean of input

  • moving_var (Symbol) – running variance of input

  • eps (float, optional, default=0.00100000005) – Epsilon to prevent div 0

  • momentum (float, optional, default=0.899999976) – Momentum for moving average

  • fix_gamma (boolean, optional, default=1) – Fix gamma while training

  • use_global_stats (boolean, optional, default=0) – Whether to use global moving statistics instead of local batch statistics. This forces batch-norm to act as a scale-shift operator.

  • output_mean_var (boolean, optional, default=0) – Output the mean and variance in addition to the normalized output

  • ndev (int, optional, default='1') – The count of GPU devices

  • key (string, required) – Hash key for synchronization. Please set the same hash key for the same layer; Block.prefix is typically used, as in gluon.nn.contrib.SyncBatchNorm.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

allclose(a=None, b=None, rtol=_Null, atol=_Null, equal_nan=_Null, name=None, attr=None, out=None, **kwargs)

This operator implements numpy.allclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False)

\[f(a, b) = |a-b| \le atol + rtol \cdot |b|\]

where \(a, b\) are the input tensors of equal types and shapes, and \(atol, rtol\) are the values of absolute and relative tolerance (by default, rtol=1e-05, atol=1e-08).

Examples:

a = [1e10, 1e-7]
b = [1.00001e10, 1e-8]
y = allclose(a, b)
y = False

a = [1e10, 1e-8]
b = [1.00001e10, 1e-9]
y = allclose(a, b)
y = True
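
The element-wise test above can be reproduced with a small pure-Python sketch. The function name allclose_sketch is hypothetical, the inputs are flat lists of floats, and the equal_nan default here follows the numpy signature quoted above rather than the operator's own default.

```python
import math


def allclose_sketch(a, b, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Return True if every pair satisfies |a - b| <= atol + rtol * |b|."""
    for x, y in zip(a, b):
        if math.isnan(x) or math.isnan(y):
            # NaNs only compare equal to each other, and only if requested.
            if not (equal_nan and math.isnan(x) and math.isnan(y)):
                return False
            continue
        if abs(x - y) > atol + rtol * abs(y):
            return False
    return True


first = allclose_sketch([1e10, 1e-7], [1.00001e10, 1e-8])
second = allclose_sketch([1e10, 1e-8], [1.00001e10, 1e-9])
```

In the first pair the small elements differ by about 9e-8, which exceeds the tolerance of roughly 1e-8, so the result is False. In the second pair both elements fall within tolerance.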

Defined in /work/mxnet/src/operator/contrib/allclose_op.cc:L56

Parameters
  • a (Symbol) – Input array a

  • b (Symbol) – Input array b

  • rtol (float, optional, default=9.99999975e-06) – Relative tolerance.

  • atol (float, optional, default=9.99999994e-09) – Absolute tolerance.

  • equal_nan (boolean, optional, default=1) – Whether to compare NaN’s as equal. If True, NaN’s in A will be considered equal to NaN’s in B in the output array.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

arange_like(data=None, start=_Null, step=_Null, repeat=_Null, ctx=_Null, axis=_Null, name=None, attr=None, out=None, **kwargs)

Return an array with evenly spaced values. If axis is not given, the output will have the same shape as the input array. Otherwise, the output will be a 1-D array with size of the specified axis in input shape.

Examples:

x = [[0.14883883 0.7772398  0.94865847 0.7225052 ]
     [0.23729339 0.6112595  0.66538996 0.5132841 ]
     [0.30822644 0.9912457  0.15502319 0.7043658 ]]
     <NDArray 3x4 @cpu(0)>

out = mx.nd.contrib.arange_like(x, start=0)

  [[ 0.  1.  2.  3.]
   [ 4.  5.  6.  7.]
   [ 8.  9. 10. 11.]]
   <NDArray 3x4 @cpu(0)>

out = mx.nd.contrib.arange_like(x, start=0, axis=-1)

  [0. 1. 2. 3.]
  <NDArray 4 @cpu(0)>
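
The two behaviours shown above (same shape as the input, or a 1-D array sized by one axis) can be sketched on nested lists. The helper arange_like_shape is hypothetical and takes the input's shape rather than the input itself; the repeat parameter is deliberately left out of this sketch.

```python
from math import prod


def arange_like_shape(shape, start=0.0, step=1.0, axis=None):
    """Return evenly spaced values laid out like an input of the given shape."""
    if axis is not None:
        # Negative axes count from the end, as with normal Python indexing.
        return [start + i * step for i in range(shape[axis])]
    flat = [start + i * step for i in range(prod(shape))]
    # Fold the flat sequence back into nested lists, innermost axis first.
    for dim in reversed(shape[1:]):
        flat = [flat[i:i + dim] for i in range(0, len(flat), dim)]
    return flat


full = arange_like_shape((3, 4), start=0.0)
last_axis = arange_like_shape((3, 4), start=0.0, axis=-1)
```
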
Parameters
  • data (Symbol) – The input

  • start (double, optional, default=0) – Start of interval. The interval includes this value. The default start value is 0.

  • step (double, optional, default=1) – Spacing between values.

  • repeat (int, optional, default='1') – The number of times each element is repeated. E.g. with repeat=3, the element a will be repeated three times –> a, a, a.

  • ctx (string, optional, default='') – Context of output, in format [cpu|gpu|cpu_pinned](n). Only used for imperative calls.

  • axis (int or None, optional, default='None') – Arange elements according to the size of a certain axis of the input array. Negative numbers are interpreted as counting from the end. If not provided, elements are arranged according to the input shape.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

backward_gradientmultiplier(data=None, scalar=_Null, is_int=_Null, name=None, attr=None, out=None, **kwargs)
Parameters
  • data (Symbol) – source input

  • scalar (double, optional, default=1) – Scalar input value

  • is_int (boolean, optional, default=1) – Indicate whether scalar input is int type

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

backward_hawkesll(name=None, attr=None, out=None, **kwargs)
Parameters

name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

backward_index_copy(name=None, attr=None, out=None, **kwargs)
Parameters

name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

backward_quadratic(name=None, attr=None, out=None, **kwargs)
Parameters

name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

bipartite_matching(data=None, is_ascend=_Null, threshold=_Null, topk=_Null, name=None, attr=None, out=None, **kwargs)
Compute bipartite matching.

The matching is performed on a score matrix with shape [B, N, M]:

  • B: batch_size

  • N: number of rows to match

  • M: number of columns as reference to be matched against.

Returns:

  • x: matched column indices; -1 indicates non-matched elements in rows.

  • y: matched row indices.

Note:

Zero gradients are back-propagated in this op for now.

Example:

s = [[0.5, 0.6], [0.1, 0.2], [0.3, 0.4]]
x, y = bipartite_matching(s, threshold=1e-12, is_ascend=False)
x = [1, -1, 0]
y = [2, 0]
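
One way to reproduce the result above is a greedy pass over the scores in sorted order. The sketch below assumes that strategy (the text does not spell out the operator's algorithm or tie-breaking), handles a single (N, M) matrix rather than a batch, and uses the hypothetical name bipartite_matching_sketch.

```python
def bipartite_matching_sketch(scores, threshold, is_ascend=False, topk=-1):
    """Greedily match rows to columns on one (N, M) score matrix."""
    n, m = len(scores), len(scores[0])
    entries = [(scores[r][c], r, c) for r in range(n) for c in range(m)]
    # Best scores first: descending by default, ascending if is_ascend.
    entries.sort(key=lambda e: e[0], reverse=not is_ascend)
    row_match = [-1] * n   # x: matched column index for each row
    col_match = [-1] * m   # y: matched row index for each column
    matched = 0
    for score, r, c in entries:
        # Scores on the wrong side of the threshold are ignored.
        out_of_range = score > threshold if is_ascend else score < threshold
        if out_of_range:
            break
        if row_match[r] == -1 and col_match[c] == -1:
            row_match[r] = c
            col_match[c] = r
            matched += 1
            if topk > 0 and matched >= topk:
                break
    return row_match, col_match


s = [[0.5, 0.6], [0.1, 0.2], [0.3, 0.4]]
x, y = bipartite_matching_sketch(s, threshold=1e-12, is_ascend=False)
```

Row 0 takes column 1 (score 0.6), row 2 takes the remaining column 0 (score 0.3), and row 1 is left unmatched.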

Defined in /work/mxnet/src/operator/contrib/bounding_box.cc:L183

Parameters
  • data (Symbol) – The input

  • is_ascend (boolean, optional, default=0) – Use ascend order for scores instead of descending. Please set threshold accordingly.

  • threshold (float, required) – Ignore matching when score < thresh, if is_ascend=false, or ignore score > thresh, if is_ascend=true.

  • topk (int, optional, default='-1') – Limit the number of matches to topk, set -1 for no limit

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

boolean_mask(data=None, index=None, axis=_Null, name=None, attr=None, out=None, **kwargs)

Given an n-d NDArray data and a 1-d NDArray index, the operator produces an n-d NDArray out whose shape cannot be determined in advance. It contains the rows of data where the corresponding element in index is non-zero.

>>> data = mx.nd.array([[1, 2, 3],[4, 5, 6],[7, 8, 9]])
>>> index = mx.nd.array([0, 1, 0])
>>> out = mx.nd.contrib.boolean_mask(data, index)
>>> out

[[4. 5. 6.]]
<NDArray 1x3 @cpu(0)>
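
For readers without MXNet at hand, the same row selection can be sketched on nested lists. The helper boolean_mask_sketch is hypothetical and only covers the default axis=0.

```python
def boolean_mask_sketch(data, index):
    """Keep the rows of data whose mask entry is non-zero (axis=0 only)."""
    return [list(row) for row, keep in zip(data, index) if keep != 0]


data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
index = [0, 1, 0]
out = boolean_mask_sketch(data, index)
```

The number of output rows equals the number of non-zero mask entries, which is why the output shape is only known once the mask values are.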

Defined in /work/mxnet/src/operator/contrib/boolean_mask.cc:L213

Parameters
  • data (Symbol) – Data

  • index (Symbol) – Mask

  • axis (int, optional, default='0') – An integer that represents the axis in NDArray to mask from.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

box_decode(data=None, anchors=None, std0=_Null, std1=_Null, std2=_Null, std3=_Null, clip=_Null, format=_Null, name=None, attr=None, out=None, **kwargs)
Decode bounding boxes training target with normalized center offsets.

Input bounding boxes use either corner type (x_{min}, y_{min}, x_{max}, y_{max}) or center type (x, y, width, height).
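
The relationship between the two box layouts can be sketched as a pair of conversions. The helper names are hypothetical, and the sketch assumes that (x, y) in the center layout denotes the box center.

```python
def corner_to_center(box):
    """[xmin, ymin, xmax, ymax] -> [x, y, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [(xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin]


def center_to_corner(box):
    """[x, y, width, height] -> [xmin, ymin, xmax, ymax]."""
    x, y, w, h = box
    return [x - w / 2, y - h / 2, x + w / 2, y + h / 2]


center = corner_to_center([0.0, 0.0, 2.0, 4.0])
corner = center_to_corner(center)
```
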

Defined in /work/mxnet/src/operator/contrib/bounding_box.cc:L241

Parameters
  • data (Symbol) – (B, N, 4) predicted bbox offset

  • anchors (Symbol) – (1, N, 4) encoded in corner or center

  • std0 (float, optional, default=1) – value to be divided from the 1st encoded values

  • std1 (float, optional, default=1) – value to be divided from the 2nd encoded values

  • std2 (float, optional, default=1) – value to be divided from the 3rd encoded values

  • std3 (float, optional, default=1) – value to be divided from the 4th encoded values

  • clip (float, optional, default=-1) – If larger than 0, bounding box target will be clipped to this value.

  • format ({'center', 'corner'}, optional, default='center') – The box encoding type. “corner” means boxes are encoded as [xmin, ymin, xmax, ymax], “center” means boxes are encoded as [x, y, width, height].

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

box_encode(samples=None, matches=None, anchors=None, refs=None, means=None, stds=None, name=None, attr=None, out=None, **kwargs)
Encode bounding boxes training target with normalized center offsets.

Input bounding boxes use corner type (x_{min}, y_{min}, x_{max}, y_{max}).

Defined in /work/mxnet/src/operator/contrib/bounding_box.cc:L212

Parameters
  • samples (Symbol) – (B, N) value +1 (positive), -1 (negative), 0 (ignore)

  • matches (Symbol) – (B, N) value range [0, M)

  • anchors (Symbol) – (B, N, 4) encoded in corner

  • refs (Symbol) – (B, M, 4) encoded in corner

  • means (Symbol) – (4,) Mean value to be subtracted from encoded values

  • stds (Symbol) – (4,) Std value to be divided from encoded values

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

box_iou(lhs=None, rhs=None, format=_Null, name=None, attr=None, out=None, **kwargs)
Bounding box overlap of two arrays.

The overlap is defined as Intersection-over-Union, aka, IOU.

  • lhs: (a_1, a_2, …, a_n, 4) array

  • rhs: (b_1, b_2, …, b_n, 4) array

  • output: (a_1, a_2, …, a_n, b_1, b_2, …, b_n) array

Note:

Zero gradients are back-propagated in this op for now.

Example:

x = [[0.5, 0.5, 1.0, 1.0], [0.0, 0.0, 0.5, 0.5]]
y = [[0.25, 0.25, 0.75, 0.75]]
box_iou(x, y, format='corner') = [[0.1428], [0.1428]]
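
The value 0.1428 in the example can be checked by hand with a short sketch. The helper names are hypothetical; only 2-D inputs in corner format are handled.

```python
def iou_corner(a, b):
    """IoU of two boxes given as [xmin, ymin, xmax, ymax]."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def box_iou_sketch(lhs, rhs):
    """Pairwise IoU: the result has one row per lhs box, one column per rhs box."""
    return [[iou_corner(a, b) for b in rhs] for a in lhs]


x = [[0.5, 0.5, 1.0, 1.0], [0.0, 0.0, 0.5, 0.5]]
y = [[0.25, 0.25, 0.75, 0.75]]
iou = box_iou_sketch(x, y)
```

Each box in x shares a 0.25 x 0.25 square with the box in y, so the intersection is 0.0625 and the union is 0.25 + 0.25 - 0.0625 = 0.4375, giving 1/7.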

Defined in /work/mxnet/src/operator/contrib/bounding_box.cc:L136

Parameters
  • lhs (Symbol) – The first input

  • rhs (Symbol) – The second input

  • format ({'center', 'corner'}, optional, default='corner') – The box encoding type. “corner” means boxes are encoded as [xmin, ymin, xmax, ymax], “center” means boxes are encoded as [x, y, width, height].

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

box_nms(data=None, overlap_thresh=_Null, valid_thresh=_Null, topk=_Null, coord_start=_Null, score_index=_Null, id_index=_Null, background_id=_Null, force_suppress=_Null, in_format=_Null, out_format=_Null, name=None, attr=None, out=None, **kwargs)

Apply non-maximum suppression to input.

The output is sorted in descending order of score. Boxes whose overlap with a higher-scoring box exceeds overlap_thresh, as well as background boxes, are removed and filled with -1; the corresponding positions are recorded for backward propagation.

During back-propagation, the gradient will be copied to the original position according to the input index. For positions that have been suppressed, the in_grad will be assigned 0. In summary, gradients stick to their boxes and are either moved or discarded according to their original index in the input.

Input requirements:

1. The input tensor has at least 2 dimensions, (n, k); any higher dimensions are regarded as batch, e.g. (a, b, c, d, n, k) == (a*b*c*d, n, k)
2. n is the number of boxes in each batch
3. k is the width of each box item.

By default, a box is [id, score, xmin, ymin, xmax, ymax, …], additional elements are allowed.

  • id_index: optional, use -1 to ignore. Useful if force_suppress=False, in which case highly overlapping boxes are not suppressed against each other when they belong to different classes (for example, one is an apple and the other a car).

  • background_id: optional, default=-1, class id for background boxes, useful when id_index >= 0 which means boxes with background id will be filtered before nms.

  • coord_start: required, default=2, the starting index of the 4 coordinates. Two formats are supported:

    • corner: [xmin, ymin, xmax, ymax]

    • center: [x, y, width, height]

  • score_index: required, default=1, box score/confidence. When two boxes overlap IOU > overlap_thresh, the one with smaller score will be suppressed.

  • in_format and out_format: default=’corner’, specify in/out box formats.

Examples:

x = [[0, 0.5, 0.1, 0.1, 0.2, 0.2], [1, 0.4, 0.1, 0.1, 0.2, 0.2],
     [0, 0.3, 0.1, 0.1, 0.14, 0.14], [2, 0.6, 0.5, 0.5, 0.7, 0.8]]
box_nms(x, overlap_thresh=0.1, coord_start=2, score_index=1, id_index=0,
    force_suppress=True, in_format='corner', out_format='corner') =
    [[2, 0.6, 0.5, 0.5, 0.7, 0.8], [0, 0.5, 0.1, 0.1, 0.2, 0.2],
     [-1, -1, -1, -1, -1, -1], [-1, -1, -1, -1, -1, -1]]
out_grad = [[0.1, 0.1, 0.1, 0.1, 0.1, 0.1], [0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
            [0.3, 0.3, 0.3, 0.3, 0.3, 0.3], [0.4, 0.4, 0.4, 0.4, 0.4, 0.4]]
# exe.backward
in_grad = [[0.2, 0.2, 0.2, 0.2, 0.2, 0.2], [0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0], [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]]
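
The forward result in the example can be reproduced with a compact sketch of greedy non-maximum suppression. The names iou_corner and nms_sketch are hypothetical; the sketch handles one batch in corner format and leaves out topk, background_id, and format conversion.

```python
def iou_corner(a, b):
    """IoU of two boxes given as [xmin, ymin, xmax, ymax]."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms_sketch(boxes, overlap_thresh=0.5, valid_thresh=0.0, coord_start=2,
               score_index=1, id_index=-1, force_suppress=False):
    """Return (output rows, kept input indices) for one batch of boxes."""
    order = [i for i, b in enumerate(boxes) if b[score_index] > valid_thresh]
    order.sort(key=lambda i: boxes[i][score_index], reverse=True)
    kept = []
    for i in order:
        box_i = boxes[i][coord_start:coord_start + 4]
        suppressed = False
        for j in kept:
            # Without force_suppress, only boxes of the same class compete.
            compete = (force_suppress or id_index < 0
                       or boxes[i][id_index] == boxes[j][id_index])
            box_j = boxes[j][coord_start:coord_start + 4]
            if compete and iou_corner(box_i, box_j) > overlap_thresh:
                suppressed = True
                break
        if not suppressed:
            kept.append(i)
    width = len(boxes[0])
    out = [list(boxes[i]) for i in kept]
    # Removed boxes are represented by rows filled with -1.
    out += [[-1] * width for _ in range(len(boxes) - len(kept))]
    return out, kept


x = [[0, 0.5, 0.1, 0.1, 0.2, 0.2], [1, 0.4, 0.1, 0.1, 0.2, 0.2],
     [0, 0.3, 0.1, 0.1, 0.14, 0.14], [2, 0.6, 0.5, 0.5, 0.7, 0.8]]
out, kept = nms_sketch(x, overlap_thresh=0.1, id_index=0, force_suppress=True)
```

The list of kept input indices is also what the backward pass needs: output row 0 came from input row 3 and output row 1 from input row 0, which matches the in_grad layout in the example.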

Defined in /work/mxnet/src/operator/contrib/bounding_box.cc:L93

Parameters
  • data (Symbol) – The input

  • overlap_thresh (float, optional, default=0.5) – Overlapping (IoU) threshold to suppress objects with smaller scores.

  • valid_thresh (float, optional, default=0) – Filter input boxes to those whose scores are greater than valid_thresh.

  • topk (int, optional, default='-1') – Apply nms to the topk boxes with descending scores; -1 for no restriction.

  • coord_start (int, optional, default='2') – Start index of the consecutive 4 coordinates.

  • score_index (int, optional, default='1') – Index of the scores/confidence of boxes.

  • id_index (int, optional, default='-1') – Optional, index of the class categories, -1 to disable.

  • background_id (int, optional, default='-1') – Optional, id of the background class which will be ignored in nms.

  • force_suppress (boolean, optional, default=0) – Optional, if set to false and id_index is provided, nms will only apply to boxes that belong to the same category

  • in_format ({'center', 'corner'}, optional, default='corner') – The input box encoding type. “corner” means boxes are encoded as [xmin, ymin, xmax, ymax], “center” means boxes are encoded as [x, y, width, height].

  • out_format ({'center', 'corner'}, optional, default='corner') – The output box encoding type. “corner” means boxes are encoded as [xmin, ymin, xmax, ymax], “center” means boxes are encoded as [x, y, width, height].

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

box_non_maximum_suppression(data=None, overlap_thresh=_Null, valid_thresh=_Null, topk=_Null, coord_start=_Null, score_index=_Null, id_index=_Null, background_id=_Null, force_suppress=_Null, in_format=_Null, out_format=_Null, name=None, attr=None, out=None, **kwargs)

Apply non-maximum suppression to input.

The output is sorted in descending order of score. Boxes whose overlap with a higher-scoring box exceeds overlap_thresh, as well as background boxes, are removed and filled with -1; the corresponding positions are recorded for backward propagation.

During back-propagation, the gradient will be copied to the original position according to the input index. For positions that have been suppressed, the in_grad will be assigned 0. In summary, gradients stick to their boxes and are either moved or discarded according to their original index in the input.

Input requirements:

1. The input tensor has at least 2 dimensions, (n, k); any higher dimensions are regarded as batch, e.g. (a, b, c, d, n, k) == (a*b*c*d, n, k)
2. n is the number of boxes in each batch
3. k is the width of each box item.

By default, a box is [id, score, xmin, ymin, xmax, ymax, …], additional elements are allowed.

  • id_index: optional, use -1 to ignore. Useful if force_suppress=False, in which case highly overlapping boxes are not suppressed against each other when they belong to different classes (for example, one is an apple and the other a car).

  • background_id: optional, default=-1, class id for background boxes, useful when id_index >= 0 which means boxes with background id will be filtered before nms.

  • coord_start: required, default=2, the starting index of the 4 coordinates. Two formats are supported:

    • corner: [xmin, ymin, xmax, ymax]

    • center: [x, y, width, height]

  • score_index: required, default=1, box score/confidence. When two boxes overlap IOU > overlap_thresh, the one with smaller score will be suppressed.

  • in_format and out_format: default=’corner’, specify in/out box formats.

Examples:

x = [[0, 0.5, 0.1, 0.1, 0.2, 0.2], [1, 0.4, 0.1, 0.1, 0.2, 0.2],
     [0, 0.3, 0.1, 0.1, 0.14, 0.14], [2, 0.6, 0.5, 0.5, 0.7, 0.8]]
box_nms(x, overlap_thresh=0.1, coord_start=2, score_index=1, id_index=0,
    force_suppress=True, in_format='corner', out_format='corner') =
    [[2, 0.6, 0.5, 0.5, 0.7, 0.8], [0, 0.5, 0.1, 0.1, 0.2, 0.2],
     [-1, -1, -1, -1, -1, -1], [-1, -1, -1, -1, -1, -1]]
out_grad = [[0.1, 0.1, 0.1, 0.1, 0.1, 0.1], [0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
            [0.3, 0.3, 0.3, 0.3, 0.3, 0.3], [0.4, 0.4, 0.4, 0.4, 0.4, 0.4]]
# exe.backward
in_grad = [[0.2, 0.2, 0.2, 0.2, 0.2, 0.2], [0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0], [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]]

Defined in /work/mxnet/src/operator/contrib/bounding_box.cc:L93

Parameters
  • data (Symbol) – The input

  • overlap_thresh (float, optional, default=0.5) – Overlapping (IoU) threshold to suppress objects with smaller scores.

  • valid_thresh (float, optional, default=0) – Filter input boxes to those whose scores are greater than valid_thresh.

  • topk (int, optional, default='-1') – Apply nms to the topk boxes with descending scores; -1 for no restriction.

  • coord_start (int, optional, default='2') – Start index of the consecutive 4 coordinates.

  • score_index (int, optional, default='1') – Index of the scores/confidence of boxes.

  • id_index (int, optional, default='-1') – Optional, index of the class categories, -1 to disable.

  • background_id (int, optional, default='-1') – Optional, id of the background class which will be ignored in nms.

  • force_suppress (boolean, optional, default=0) – Optional, if set to false and id_index is provided, nms will only apply to boxes that belong to the same category

  • in_format ({'center', 'corner'}, optional, default='corner') – The input box encoding type. “corner” means boxes are encoded as [xmin, ymin, xmax, ymax], “center” means boxes are encoded as [x, y, width, height].

  • out_format ({'center', 'corner'}, optional, default='corner') – The output box encoding type. “corner” means boxes are encoded as [xmin, ymin, xmax, ymax], “center” means boxes are encoded as [x, y, width, height].

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

calibrate_entropy(hist=None, hist_edges=None, num_quantized_bins=_Null, name=None, attr=None, out=None, **kwargs)

Provide calibrated min/max for input histogram.

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /work/mxnet/src/operator/quantization/calibrate.cc:L207

Parameters
  • hist (Symbol) – A ndarray/symbol of type float32

  • hist_edges (Symbol) – A ndarray/symbol of type float32

  • num_quantized_bins (int, optional, default='255') – The number of quantized bins.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

count_sketch(data=None, h=None, s=None, out_dim=_Null, processing_batch_size=_Null, name=None, attr=None, out=None, **kwargs)

Apply CountSketch to input: map d-dimensional data to k-dimensional data.

Note

count_sketch is only available on GPU.

Assume input data has shape (N, d), sign hash table s has shape (N, d), index hash table h has shape (N, d) and mapping dimension out_dim = k, each element in s is either +1 or -1, each element in h is a random integer from 0 to k-1. Then the operator computes:

\[out[h[i]] += data[i] * s[i]\]

Example:

out_dim = 5
x = [[1.2, 2.5, 3.4],[3.2, 5.7, 6.6]]
h = [[0, 3, 4]]
s = [[1, -1, 1]]
mx.contrib.ndarray.count_sketch(data=x, h=h, s=s, out_dim = 5) = [[1.2, 0, 0, -2.5, 3.4],
                                                                  [3.2, 0, 0, -5.7, 6.6]]
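
Since the operator itself is only available on GPU, the example can be checked on CPU with a plain-Python sketch of the update rule. The name count_sketch_cpu is hypothetical, and h and s are passed as flat lists shared by all rows, as in the example.

```python
def count_sketch_cpu(data, h, s, out_dim):
    """Apply out[h[i]] += data[i] * s[i] to every row of data."""
    out = []
    for row in data:
        sketch = [0.0] * out_dim
        for i, value in enumerate(row):
            sketch[h[i]] += value * s[i]
        out.append(sketch)
    return out


x = [[1.2, 2.5, 3.4], [3.2, 5.7, 6.6]]
h = [0, 3, 4]
s = [1, -1, 1]
sketched = count_sketch_cpu(x, h, s, out_dim=5)
```

When two input positions hash to the same output index, their signed values are summed in that slot; the example avoids such collisions, so every input lands in its own slot.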

Defined in /work/mxnet/src/operator/contrib/count_sketch.cc:L67

Parameters
  • data (Symbol) – Input data to the CountSketchOp.

  • h (Symbol) – The index vector

  • s (Symbol) – The sign vector

  • out_dim (int, required) – The output dimension.

  • processing_batch_size (int, optional, default='32') – How many sketch vectors to process at one time.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

ctc_loss(data=None, label=None, data_lengths=None, label_lengths=None, use_data_lengths=_Null, use_label_lengths=_Null, blank_label=_Null, name=None, attr=None, out=None, **kwargs)

Connectionist Temporal Classification Loss.

Note

The existing alias contrib_CTCLoss is deprecated.

The shapes of the inputs and outputs:

  • data: (sequence_length, batch_size, alphabet_size)

  • label: (batch_size, label_sequence_length)

  • out: (batch_size)

The data tensor consists of sequences of activation vectors (without applying softmax), with i-th channel in the last dimension corresponding to i-th label for i between 0 and alphabet_size-1 (i.e. always 0-indexed). Alphabet size should include one additional value reserved for blank label. When blank_label is "first", the 0-th channel is reserved for the activation of the blank label; otherwise, if it is "last", the (alphabet_size-1)-th channel should be reserved for the blank label.

label is an index matrix of integers. When blank_label is "first", the value 0 is then reserved for blank label, and should not be passed in this matrix. Otherwise, when blank_label is "last", the value (alphabet_size-1) is reserved for blank label.

If a sequence of labels is shorter than label_sequence_length, use the special padding value at the end of the sequence to conform it to the correct length. The padding value is 0 when blank_label is "first", and -1 otherwise.

For example, suppose the vocabulary is [a, b, c], and in one batch we have three sequences ‘ba’, ‘cbb’, and ‘abac’. When blank_label is "first", we can index the labels as {‘a’: 1, ‘b’: 2, ‘c’: 3}, and we reserve the 0-th channel for blank label in data tensor. The resulting label tensor should be padded to be:

[[2, 1, 0, 0], [3, 2, 2, 0], [1, 2, 1, 3]]

When blank_label is "last", we can index the labels as {‘a’: 0, ‘b’: 1, ‘c’: 2}, and we reserve the channel index 3 for blank label in data tensor. The resulting label tensor should be padded to be:

[[1, 0, -1, -1], [2, 1, 1, -1], [0, 1, 0, 2]]
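
The two padded label tensors above can be generated with a small helper. The function encode_ctc_labels is hypothetical and only illustrates the indexing and padding conventions described in this section.

```python
def encode_ctc_labels(sequences, vocabulary, blank_label="first"):
    """Index and pad label sequences following the blank_label convention."""
    if blank_label == "first":
        offset, pad = 1, 0    # index 0 is the blank label; pad with 0
    elif blank_label == "last":
        offset, pad = 0, -1   # the last index is the blank label; pad with -1
    else:
        raise ValueError("blank_label must be 'first' or 'last'")
    index = {token: i + offset for i, token in enumerate(vocabulary)}
    width = max(len(seq) for seq in sequences)
    return [[index[t] for t in seq] + [pad] * (width - len(seq))
            for seq in sequences]


vocabulary = ["a", "b", "c"]
sequences = ["ba", "cbb", "abac"]
labels_first = encode_ctc_labels(sequences, vocabulary, "first")
labels_last = encode_ctc_labels(sequences, vocabulary, "last")
```
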

out is a list of CTC loss values, one per example in the batch.

See Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks, A. Graves et al. for more information on the definition and the algorithm.

Defined in /work/mxnet/src/operator/nn/ctc_loss.cc:L104

Parameters
  • data (Symbol) – Input ndarray

  • label (Symbol) – Ground-truth labels for the loss.

  • data_lengths (Symbol) – Lengths of data for each of the samples. Only required when use_data_lengths is true.

  • label_lengths (Symbol) – Lengths of labels for each of the samples. Only required when use_label_lengths is true.

  • use_data_lengths (boolean, optional, default=0) – Whether the data lengths are decided by data_lengths. If false, the lengths are equal to the max sequence length.

  • use_label_lengths (boolean, optional, default=0) – Whether the label lengths are decided by label_lengths, or derived from padding_mask. If false, the lengths are derived from the first occurrence of the value of padding_mask. The value of padding_mask is 0 when first CTC label is reserved for blank, and -1 when last label is reserved for blank. See blank_label.

  • blank_label ({'first', 'last'}, optional, default='first') – Set the label that is reserved for blank label. If “first”, 0-th label is reserved, and label values for tokens in the vocabulary are between 1 and alphabet_size-1, and the padding mask is 0. If “last”, last label value alphabet_size-1 is reserved for blank label instead, and label values for tokens in the vocabulary are between 0 and alphabet_size-2, and the padding mask is -1.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

dequantize(data=None, min_range=None, max_range=None, out_type=_Null, name=None, attr=None, out=None, **kwargs)

Dequantize the input tensor into a float tensor. min_range and max_range are scalar floats that specify the range for the output data.

When input data type is uint8, the output is calculated using the following equation:

out[i] = in[i] * (max_range - min_range) / 255.0

When the input data type is int8, the output is calculated using the following equation, which keeps the quantized value zero-centered:

out[i] = in[i] * MaxAbs(min_range, max_range) / 127.0
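
The two equations can be written out as a short sketch. The function dequantize_sketch and its in_type argument are hypothetical; MaxAbs is taken to mean the larger of the two absolute values.

```python
def dequantize_sketch(values, min_range, max_range, in_type="uint8"):
    """Apply the documented dequantization formulas to a flat list of ints."""
    if in_type == "uint8":
        return [v * (max_range - min_range) / 255.0 for v in values]
    if in_type == "int8":
        max_abs = max(abs(min_range), abs(max_range))
        return [v * max_abs / 127.0 for v in values]
    raise ValueError("in_type must be 'uint8' or 'int8'")


from_uint8 = dequantize_sketch([0, 51, 255], 0.0, 1.0)
from_int8 = dequantize_sketch([-127, 0, 127], -2.0, 1.0, in_type="int8")
```

In the int8 case the scale depends only on the larger magnitude of the two range ends, so the quantized value 0 always maps back to 0.0.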

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /work/mxnet/src/operator/quantization/dequantize.cc:L81

Parameters
  • data (Symbol) – A ndarray/symbol of type uint8

  • min_range (Symbol) – The minimum scalar value possibly produced for the input in float32

  • max_range (Symbol) – The maximum scalar value possibly produced for the input in float32

  • out_type ({'float32'},optional, default='float32') – Output data type.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

dgl_adjacency(data=None, name=None, attr=None, out=None, **kwargs)

This operator converts a CSR matrix whose values are edge Ids to an adjacency matrix whose values are ones. The output CSR matrix always has values of type float32.

Example

x = [[ 1, 0, 0 ],
     [ 0, 2, 0 ],
     [ 0, 0, 3 ]]
dgl_adjacency(x) =
    [[ 1, 0, 0 ],
     [ 0, 1, 0 ],
     [ 0, 0, 1 ]]
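
In CSR terms the conversion keeps the sparsity structure and replaces every stored value with 1.0. The sketch below uses plain lists for the three CSR arrays; the helper names are hypothetical.

```python
def adjacency_from_csr(data, indices, indptr):
    """Keep the CSR structure and replace every stored edge Id with 1.0."""
    return [1.0] * len(data), list(indices), list(indptr)


def csr_to_dense(data, indices, indptr, num_cols):
    """Expand CSR arrays into a dense list of rows (for inspection only)."""
    dense = []
    for r in range(len(indptr) - 1):
        row = [0.0] * num_cols
        for p in range(indptr[r], indptr[r + 1]):
            row[indices[p]] = data[p]
        dense.append(row)
    return dense


# CSR form of the 3x3 example: edge Ids 1, 2, 3 on the diagonal.
edge_ids, indices, indptr = [1, 2, 3], [0, 1, 2], [0, 1, 2, 3]
adj = adjacency_from_csr(edge_ids, indices, indptr)
dense_adj = csr_to_dense(*adj, num_cols=3)
```
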

Defined in /work/mxnet/src/operator/contrib/dgl_graph.cc:L1419

Parameters
  • data (Symbol) – Input ndarray

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

dgl_csr_neighbor_non_uniform_sample(*seed_arrays, **kwargs)

This operator samples sub-graphs from a csr graph with a non-uniform probability. The operator is designed for DGL.

The operator outputs four sets of NDArrays to represent the sampled results (the number of NDArrays in each set is the same as the number of seed NDArrays): 1) a set of 1D NDArrays containing the sampled vertices, 2) a set of CSRNDArrays representing the sampled edges, 3) a set of 1D NDArrays with the probability that vertices are sampled, 4) a set of 1D NDArrays indicating the layer where a vertex is sampled. Each NDArray in the first set has a length of max_num_vertices+1. The last element in an NDArray indicates the actual number of vertices in a subgraph. The NDArrays in the third and fourth sets have a length of max_num_vertices, and the valid number of vertices is the same as in the first set.

Example

shape = (5, 5)
prob = mx.nd.array([0.9, 0.8, 0.2, 0.4, 0.1], dtype=np.float32)
data_np = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20], dtype=np.int64)
indices_np = np.array([1,2,3,4,0,2,3,4,0,1,3,4,0,1,2,4,0,1,2,3], dtype=np.int64)
indptr_np = np.array([0,4,8,12,16,20], dtype=np.int64)
a = mx.nd.sparse.csr_matrix((data_np, indices_np, indptr_np), shape=shape)
seed = mx.nd.array([0,1,2,3,4], dtype=np.int64)
out = mx.nd.contrib.dgl_csr_neighbor_non_uniform_sample(a, prob, seed, num_args=3, num_hops=1, num_neighbor=2, max_num_vertices=5)

out[0]
[0 1 2 3 4 5]
<NDArray 6 @cpu(0)>

out[1].asnumpy()
array([[ 0,  1,  2,  0,  0],
       [ 5,  0,  6,  0,  0],
       [ 9, 10,  0,  0,  0],
       [13, 14,  0,  0,  0],
       [ 0, 18, 19,  0,  0]])

out[2]
[0.9 0.8 0.2 0.4 0.1]
<NDArray 5 @cpu(0)>

out[3]
[0 0 0 0 0]
<NDArray 5 @cpu(0)>

Defined in /work/mxnet/src/operator/contrib/dgl_graph.cc:L886

This function supports variable-length positional input.

Parameters
  • csr_matrix (Symbol) – csr matrix

  • probability (Symbol) – probability vector

  • seed_arrays (Symbol[]) – seed vertices

  • num_hops (long, optional, default=1) – Number of hops.

  • num_neighbor (long, optional, default=2) – Number of neighbor.

  • max_num_vertices (long, optional, default=100) – Max number of vertices.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

dgl_csr_neighbor_uniform_sample(*seed_arrays, **kwargs)

This operator samples sub-graphs from a csr graph with a uniform probability. The operator is designed for DGL.

The operator outputs three sets of NDArrays to represent the sampled results (the number of NDArrays in each set is the same as the number of seed NDArrays): 1) a set of 1D NDArrays containing the sampled vertices, 2) a set of CSRNDArrays representing the sampled edges, 3) a set of 1D NDArrays indicating the layer where a vertex is sampled. Each NDArray in the first set has a length of max_num_vertices+1. The last element in an NDArray indicates the actual number of vertices in a subgraph. The NDArrays in the third set have a length of max_num_vertices, and the valid number of vertices is the same as in the first set.

Example

shape = (5, 5)
data_np = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20], dtype=np.int64)
indices_np = np.array([1,2,3,4,0,2,3,4,0,1,3,4,0,1,2,4,0,1,2,3], dtype=np.int64)
indptr_np = np.array([0,4,8,12,16,20], dtype=np.int64)
a = mx.nd.sparse.csr_matrix((data_np, indices_np, indptr_np), shape=shape)
a.asnumpy()
seed = mx.nd.array([0,1,2,3,4], dtype=np.int64)
out = mx.nd.contrib.dgl_csr_neighbor_uniform_sample(a, seed, num_args=2, num_hops=1, num_neighbor=2, max_num_vertices=5)

out[0]
[0 1 2 3 4 5]
<NDArray 6 @cpu(0)>

out[1].asnumpy()
array([[ 0,  1,  0,  3,  0],
       [ 5,  0,  0,  7,  0],
       [ 9,  0,  0, 11,  0],
       [13,  0, 15,  0,  0],
       [17,  0, 19,  0,  0]])

out[2]
[0 0 0 0 0]
<NDArray 5 @cpu(0)>

Defined in /work/mxnet/src/operator/contrib/dgl_graph.cc:L777

This function supports variable-length positional input.

Parameters
  • csr_matrix (Symbol) – csr matrix

  • seed_arrays (Symbol[]) – seed vertices

  • num_hops (long, optional, default=1) – Number of hops.

  • num_neighbor (long, optional, default=2) – Number of neighbors.

  • max_num_vertices (long, optional, default=100) – Max number of vertices.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

dgl_graph_compact(*graph_data, **kwargs)

This operator compacts a CSR matrix generated by dgl_csr_neighbor_uniform_sample and dgl_csr_neighbor_non_uniform_sample. The CSR matrices generated by these two operators may have many empty rows at the end and many empty columns. This operator removes these empty rows and empty columns.

Example

shape = (5, 5)
data_np = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20], dtype=np.int64)
indices_np = np.array([1,2,3,4,0,2,3,4,0,1,3,4,0,1,2,4,0,1,2,3], dtype=np.int64)
indptr_np = np.array([0,4,8,12,16,20], dtype=np.int64)
a = mx.nd.sparse.csr_matrix((data_np, indices_np, indptr_np), shape=shape)
seed = mx.nd.array([0,1,2,3,4], dtype=np.int64)
out = mx.nd.contrib.dgl_csr_neighbor_uniform_sample(a, seed, num_args=2, num_hops=1,
        num_neighbor=2, max_num_vertices=6)
subg_v = out[0]
subg = out[1]
compact = mx.nd.contrib.dgl_graph_compact(subg, subg_v,
        graph_sizes=(subg_v[-1].asnumpy()[0]), return_mapping=False)

compact.asnumpy()
array([[0, 0, 0, 1, 0],
       [2, 0, 3, 0, 0],
       [0, 4, 0, 0, 5],
       [0, 6, 0, 0, 7],
       [8, 9, 0, 0, 0]])

Defined in /work/mxnet/src/operator/contrib/dgl_graph.cc:L1608

This function supports variable length of positional input.

Parameters
  • graph_data (Symbol[]) – Input graphs and input vertex Ids.

  • return_mapping (boolean, required) – Return mapping of vid and eid between the subgraph and the parent graph.

  • graph_sizes (tuple of <long>, required) – the number of vertices in each graph.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

dgl_subgraph(*data, **kwargs)

This operator constructs an induced subgraph for a given set of vertices from a graph. The operator accepts multiple sets of vertices as input. For each set of vertices, it returns a pair of CSR matrices if return_mapping is True: the first matrix contains edges with new edge Ids, the second matrix contains edges with the original edge Ids.

Example

x=[[1, 0, 0, 2],
  [3, 0, 4, 0],
  [0, 5, 0, 0],
  [0, 6, 7, 0]]
v = [0, 1, 2]
dgl_subgraph(x, v, return_mapping=True) =
  [[1, 0, 0],
   [2, 0, 3],
   [0, 4, 0]],
  [[1, 0, 0],
   [3, 0, 4],
   [0, 5, 0]]
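The example above can be reproduced with a dense, pure-Python sketch. This is only an illustration of the described semantics, not the operator's implementation: the function name induced_subgraph is hypothetical, the graph is a dense list of lists rather than a CSR matrix, and the assumption that new edge Ids are numbered from 1 in row-major order is read off the example.

```python
def induced_subgraph(x, v):
    """Build the induced subgraph of dense edge-id matrix `x` on vertices `v`.

    Returns (new_ids, orig_ids): the first matrix holds freshly numbered edge
    ids, the second holds the edge ids of the parent graph.
    """
    new_ids, orig_ids = [], []
    next_id = 1  # assumption: new ids start at 1, as in the example
    for r in v:
        new_row, orig_row = [], []
        for c in v:
            eid = x[r][c]
            orig_row.append(eid)
            if eid != 0:
                new_row.append(next_id)
                next_id += 1
            else:
                new_row.append(0)
        new_ids.append(new_row)
        orig_ids.append(orig_row)
    return new_ids, orig_ids


x = [[1, 0, 0, 2],
     [3, 0, 4, 0],
     [0, 5, 0, 0],
     [0, 6, 7, 0]]
new_ids, orig_ids = induced_subgraph(x, [0, 1, 2])
print(new_ids)   # [[1, 0, 0], [2, 0, 3], [0, 4, 0]]
print(orig_ids)  # [[1, 0, 0], [3, 0, 4], [0, 5, 0]]
```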

Defined in /work/mxnet/src/operator/contrib/dgl_graph.cc:L1154

This function supports variable length of positional input.

Parameters
  • graph (Symbol) – Input graph where we sample vertices.

  • data (Symbol[]) – The input arrays that include data arrays and states.

  • return_mapping (boolean, required) – Return mapping of vid and eid between the subgraph and the parent graph.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

div_sqrt_dim(data=None, name=None, attr=None, out=None, **kwargs)

Rescale the input by the square root of the channel dimension.

out = data / sqrt(data.shape[-1])
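As a rough illustration of the formula, the following pure-Python sketch applies it to a 2D list of rows. The function name and the list-of-lists representation are assumptions made for the example; the operator itself works on arrays of any rank.

```python
import math


def div_sqrt_dim(rows):
    """Divide every element by the square root of the last-dimension size."""
    return [[x / math.sqrt(len(row)) for x in row] for row in rows]


# The last dimension has size 4, so every value is divided by 2.
print(div_sqrt_dim([[2.0, 4.0, 6.0, 8.0]]))  # [[1.0, 2.0, 3.0, 4.0]]
```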

Defined in /work/mxnet/src/operator/contrib/transformer.cc:L881

Parameters
  • data (Symbol) – The input array.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

dynamic_reshape(data=None, shape=None, name=None, attr=None, out=None, **kwargs)

Experimental support for reshape operator with dynamic shape.

Accepts 2 inputs - data and shape. The output returns data in the new shape.

Some dimensions of the shape can take special values from the set {0, -1, -2, -3, -4}. The significance of each is explained below:

  • 0 copy this dimension from the input to the output shape. Example:

    - input shape = (2,3,4), shape = (4,0,2), output shape = (4,3,2)
    - input shape = (2,3,4), shape = (2,0,0), output shape = (2,3,4)

  • -1 infers the dimension of the output shape by using the remainder of the input dimensions keeping the size of the new array same as that of the input array. At most one dimension of shape can be -1. Example:

    - input shape = (2,3,4), shape = (6,1,-1), output shape = (6,1,4)
    - input shape = (2,3,4), shape = (3,-1,8), output shape = (3,1,8)
    - input shape = (2,3,4), shape=(-1,), output shape = (24,)
    
  • -2 copy all/remainder of the input dimensions to the output shape. Example:

    - input shape = (2,3,4), shape = (-2,), output shape = (2,3,4)
    - input shape = (2,3,4), shape = (2,-2), output shape = (2,3,4)
    - input shape = (2,3,4), shape = (-2,1,1), output shape = (2,3,4,1,1)
    
  • -3 use the product of two consecutive dimensions of the input shape as the output dimension. Example:

    - input shape = (2,3,4), shape = (-3,4), output shape = (6,4)
    - input shape = (2,3,4,5), shape = (-3,-3), output shape = (6,20)
    - input shape = (2,3,4), shape = (0,-3), output shape = (2,12)
    - input shape = (2,3,4), shape = (-3,-2), output shape = (6,4)
    
  • -4 split one dimension of the input into two dimensions passed subsequent to -4 in shape (can contain -1). Example:

    - input shape = (2,3,4), shape = (-4,1,2,-2), output shape =(1,2,3,4)
    - input shape = (2,3,4), shape = (2,-4,-1,3,-2), output shape = (2,1,3,4)
    

Example:

data = mx.nd.array(np.random.normal(0,1,(2,3,5,5)))
shape = mx.nd.array((0,-1))
out = mx.sym.contrib.dynamic_reshape(data = data, shape = shape)
# out will be of shape (2,75)
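The special values can be checked with a small shape-inference sketch in pure Python. It is written from the rules listed above and is not the operator's actual implementation; the function name infer_reshape is hypothetical, and error handling is reduced to the minimum.

```python
from math import prod


def infer_reshape(in_shape, spec):
    """Resolve the special values 0, -1, -2, -3, -4 in `spec` against `in_shape`."""
    out = []
    src = 0            # position in in_shape consumed so far
    infer_at = None    # position in `out` of the single -1, if any
    i = 0
    while i < len(spec):
        d = spec[i]
        if d == 0:                       # copy one input dimension
            out.append(in_shape[src])
            src += 1
        elif d == -1:                    # infer later from the element count
            if infer_at is not None:
                raise ValueError("at most one dimension can be -1")
            infer_at = len(out)
            out.append(1)
            src += 1
        elif d == -2:                    # copy all remaining input dimensions
            out.extend(in_shape[src:])
            src = len(in_shape)
        elif d == -3:                    # merge two consecutive dimensions
            out.append(in_shape[src] * in_shape[src + 1])
            src += 2
        elif d == -4:                    # split one dimension into two
            whole = in_shape[src]
            d1, d2 = spec[i + 1], spec[i + 2]
            if d1 == -1:
                d1 = whole // d2
            elif d2 == -1:
                d2 = whole // d1
            if d1 * d2 != whole:
                raise ValueError("split sizes do not match the input dimension")
            out.extend([d1, d2])
            src += 1
            i += 2                       # skip the two split sizes
        else:                            # explicit size
            out.append(d)
            src += 1
        i += 1
    if infer_at is not None:
        out[infer_at] = prod(in_shape) // prod(out)
    return tuple(out)


print(infer_reshape((2, 3, 5, 5), (0, -1)))           # (2, 75)
print(infer_reshape((2, 3, 4), (2, -4, -1, 3, -2)))   # (2, 1, 3, 4)
```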

Defined in /work/mxnet/src/operator/contrib/dynamic_shape_ops.cc:L120

Parameters
  • data (Symbol) – Data

  • shape (Symbol) – Shape

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

edge_id(data=None, u=None, v=None, name=None, attr=None, out=None, **kwargs)

This operator implements the edge_id function for a graph stored in a CSR matrix (the value of the CSR stores the edge Id of the graph). output[i] = input[u[i], v[i]] if there is an edge between u[i] and v[i], otherwise output[i] will be -1. Both u and v should be 1D vectors.

Example

x = [[ 1, 0, 0 ],
     [ 0, 2, 0 ],
     [ 0, 0, 3 ]]
u = [ 0, 0, 1, 1, 2, 2 ]
v = [ 0, 1, 1, 2, 0, 2 ]
edge_id(x, u, v) = [ 1, -1, 2, -1, -1, 3 ]

The storage type of edge_id output depends on storage types of inputs:
  • edge_id(csr, default, default) = default

  • default and rsp inputs are not supported
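The lookup can be sketched in pure Python on the three CSR arrays. This is an illustrative reference only: the function signature below (separate data, indices and indptr lists) is an assumption of the sketch and differs from the operator, which takes the CSR matrix as a single input.

```python
def edge_id(data, indices, indptr, u, v):
    """Look up the edge id stored at (u[i], v[i]) in a CSR matrix, or -1."""
    out = []
    for row, col in zip(u, v):
        found = -1
        # Scan the stored entries of this row for the requested column.
        for pos in range(indptr[row], indptr[row + 1]):
            if indices[pos] == col:
                found = data[pos]
                break
        out.append(found)
    return out


# CSR form of the diagonal matrix x from the example above.
data, indices, indptr = [1, 2, 3], [0, 1, 2], [0, 1, 2, 3]
print(edge_id(data, indices, indptr, [0, 0, 1, 1, 2, 2], [0, 1, 1, 2, 0, 2]))
# [1, -1, 2, -1, -1, 3]
```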

Defined in /work/mxnet/src/operator/contrib/dgl_graph.cc:L1347

Parameters
  • data (Symbol) – Input ndarray

  • u (Symbol) – u ndarray

  • v (Symbol) – v ndarray

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

fft(data=None, compute_size=_Null, name=None, attr=None, out=None, **kwargs)

Apply 1D FFT to input.

Note

fft is only available on GPU.

Currently accepts two input data shapes: (N, d) or (N1, N2, N3, d). The data can only be real numbers. The output data has shape (N, 2*d) or (N1, N2, N3, 2*d). The format is: [real0, imag0, real1, imag1, …].

Example:

data = np.random.normal(0,1,(3,4))
out = mx.contrib.ndarray.fft(data = mx.nd.array(data,ctx = mx.gpu(0)))
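The interleaved output format can be illustrated on the CPU with a naive discrete Fourier transform in pure Python. This is only a sketch of the layout: the function name dft_interleaved is hypothetical, and the sign convention (forward transform with a negative exponent) is an assumption rather than something stated above.

```python
import cmath


def dft_interleaved(row):
    """Naive 1D DFT of a real sequence, returned as [real0, imag0, real1, imag1, ...]."""
    d = len(row)
    out = []
    for k in range(d):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * n / d)
                  for n, x in enumerate(row))
        out.extend([acc.real, acc.imag])
    return out


# A row of length d=4 produces 2*d=8 output values.
print(len(dft_interleaved([1.0, 0.0, 0.0, 0.0])))  # 8
```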

Defined in /work/mxnet/src/operator/contrib/fft.cc:L56

Parameters
  • data (Symbol) – Input data to the FFTOp.

  • compute_size (int, optional, default='128') – Maximum size of sub-batch to be forwarded at one time

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

getnnz(data=None, axis=_Null, name=None, attr=None, out=None, **kwargs)

Number of stored values for a sparse tensor, including explicit zeros.

This operator only supports CSR matrix on CPU.
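A minimal pure-Python sketch of the counting follows. It works directly on the CSR index arrays; the function signature is hypothetical, and the mapping of axis values (None for the whole matrix, 0 for per-column counts, 1 for per-row counts) is an assumption borrowed from the usual SciPy convention, not confirmed by the text above.

```python
def getnnz(indices, indptr, num_cols, axis=None):
    """Count stored values of a CSR matrix, explicit zeros included."""
    if axis is None:
        return indptr[-1]                      # total number of stored values
    if axis == 1:
        return [indptr[i + 1] - indptr[i]      # stored values in each row
                for i in range(len(indptr) - 1)]
    if axis == 0:
        counts = [0] * num_cols                # stored values in each column
        for col in indices:
            counts[col] += 1
        return counts
    raise ValueError("axis must be None, 0 or 1")


# 2x3 matrix with stored entries at (0,0), (0,2) and (1,1).
indices, indptr = [0, 2, 1], [0, 2, 3]
print(getnnz(indices, indptr, 3))           # 3
print(getnnz(indices, indptr, 3, axis=1))   # [2, 1]
print(getnnz(indices, indptr, 3, axis=0))   # [1, 1, 1]
```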

Defined in /work/mxnet/src/operator/contrib/nnz.cc:L177

Parameters
  • data (Symbol) – Input

  • axis (int or None, optional, default='None') – Select between the number of values across the whole matrix, in each column, or in each row.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

gradientmultiplier(data=None, scalar=_Null, is_int=_Null, name=None, attr=None, out=None, **kwargs)

This operator implements the gradient multiplier function. In the forward pass it acts as an identity transform. During backpropagation it multiplies the gradient from the subsequent layer by a scalar factor lambda and passes it to the preceding layer.
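The two passes can be sketched in a few lines of pure Python. The function names forward and backward are hypothetical and operate on flat lists; they only illustrate the behaviour described above.

```python
def forward(x):
    """Forward pass: identity."""
    return list(x)


def backward(out_grad, scalar):
    """Backward pass: scale the incoming gradient by the scalar factor."""
    return [g * scalar for g in out_grad]


print(forward([1.0, 2.0]))           # [1.0, 2.0]
print(backward([1.0, -2.0], -0.5))   # [-0.5, 1.0]
```

A negative scalar flips the sign of the gradient that reaches the preceding layer.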

Defined in /work/mxnet/src/operator/contrib/gradient_multiplier_op.cc:L77

Parameters
  • data (Symbol) – The input array.

  • scalar (double, optional, default=1) – Scalar input value

  • is_int (boolean, optional, default=1) – Indicate whether scalar input is int type

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

group_adagrad_update(weight=None, grad=None, history=None, lr=_Null, rescale_grad=_Null, clip_gradient=_Null, epsilon=_Null, name=None, attr=None, out=None, **kwargs)

Update function for Group AdaGrad optimizer.

Based on Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, available at http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf, but uses only a single learning rate for every row of the parameter array.

Updates are applied by:

grad = clip(grad * rescale_grad, clip_gradient)
history += mean(square(grad), axis=1, keepdims=True)
div = grad / (sqrt(history) + epsilon)
weight -= div * lr

Weights are updated lazily if the gradient is sparse.

Note that non-zero values for the weight decay option are not supported.
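The update rule above can be followed step by step in a pure-Python sketch. It is a dense reference only, with a hypothetical function name; history is kept as one float per row (a flat list instead of a column of shape (rows, 1)), and the lazy sparse update is not modelled.

```python
import math


def group_adagrad_update(weight, grad, history, lr,
                         rescale_grad=1.0, clip_gradient=-1.0, epsilon=1e-5):
    """Apply one Group AdaGrad step in place on a 2D weight (list of rows)."""
    for r, (w_row, g_row) in enumerate(zip(weight, grad)):
        g = [x * rescale_grad for x in g_row]
        if clip_gradient > 0:  # clipping is off when clip_gradient <= 0
            g = [max(min(x, clip_gradient), -clip_gradient) for x in g]
        # One accumulator per row: mean of squared gradients along axis 1.
        history[r] += sum(x * x for x in g) / len(g)
        denom = math.sqrt(history[r]) + epsilon
        for c, x in enumerate(g):
            w_row[c] -= lr * x / denom
    return weight, history


w, h = group_adagrad_update([[1.0, 2.0]], [[0.5, 0.5]], [0.0], lr=0.1)
print(h)  # [0.25]
```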

Defined in /work/mxnet/src/operator/contrib/optimizer_op.cc:L69

Parameters
  • weight (Symbol) – Weight

  • grad (Symbol) – Gradient

  • history (Symbol) – History

  • lr (float, required) – Learning rate

  • rescale_grad (float, optional, default=1) – Rescale gradient to grad = rescale_grad*grad.

  • clip_gradient (float, optional, default=-1) – Clip gradient to the range of [-clip_gradient, clip_gradient]. If clip_gradient <= 0, gradient clipping is turned off. grad = max(min(grad, clip_gradient), -clip_gradient).

  • epsilon (float, optional, default=9.99999975e-06) – Epsilon for numerical stability

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

hawkesll(lda=None, alpha=None, beta=None, state=None, lags=None, marks=None, valid_length=None, max_time=None, name=None, attr=None, out=None, **kwargs)

Computes the log likelihood of a univariate Hawkes process.

The log likelihood is calculated on point process observations represented as ragged matrices for lags (interarrival times w.r.t. the previous point), and marks (identifiers for the process ID). Note that each mark is considered independent, i.e., the operator computes the joint likelihood of a set of Hawkes processes determined by the conditional intensity:

\[\lambda_k^*(t) = \lambda_k + \alpha_k \sum_{\{t_i < t, y_i = k\}} \beta_k \exp(-\beta_k (t - t_i))\]

where \(\lambda_k\) specifies the background intensity lda, \(\alpha_k\) specifies the branching ratio or alpha, and \(\beta_k\) the delay density parameter beta.

lags and marks are two NDArrays of shape (N, T) and correspond to the representation of the point process observation: the first dimension corresponds to the batch index, and the second to the sequence. These are “left-aligned” ragged matrices (the first index of the second dimension is the beginning of every sequence). The length of each sequence is given by valid_length, of shape (N,), where valid_length[i] corresponds to the number of valid points in lags[i, :] and marks[i, :].

max_time is the length of the observation period of the point process. That is, specifying max_time[i] = 5 computes the likelihood of the i-th sample as observed on the time interval \((0, 5]\). Naturally, the sum of all valid lags[i, :valid_length[i]] must be less than or equal to 5.

The input state specifies the memory of the Hawkes process. Invoking the memoryless property of exponential decays, we compute the memory as

\[s_k(t) = \sum_{t_i < t} \exp(-\beta_k (t - t_i)).\]

The state to be provided is \(s_k(0)\) and carries the added intensity due to past events before the current batch. \(s_k(T)\) is returned from the function, where \(T\) is max_time[i] for the i-th sample.
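The memory term can be computed recursively, which is what the memoryless property buys. The sketch below handles a single sequence and a single mark in pure Python; the function names are hypothetical, and it is meant as a check of the formula for \(s_k(t)\), not as a reproduction of the operator.

```python
import math


def hawkes_state(lags, beta, max_time, s0=0.0):
    """Return s(max_time) for one mark, given interarrival times `lags`."""
    s, t = s0, 0.0
    for lag in lags:
        s *= math.exp(-beta * lag)  # decay the memory up to the next event
        s += 1.0                    # the new event contributes exp(0) = 1
        t += lag
    return s * math.exp(-beta * (max_time - t))  # decay to the end of the window


def hawkes_state_direct(lags, beta, max_time):
    """Same quantity from the definition, with an empty starting state."""
    times, t = [], 0.0
    for lag in lags:
        t += lag
        times.append(t)
    return sum(math.exp(-beta * (max_time - ti)) for ti in times)


print(hawkes_state([1.0, 2.0], beta=0.5, max_time=5.0))
print(hawkes_state_direct([1.0, 2.0], beta=0.5, max_time=5.0))
```

Both calls evaluate the same sum, so they agree up to floating-point rounding.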

Example:

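from mxnet import nd
import numpy as np

N, T, K = 4, 4, 3  # batch size, sequence length, number of marks (processes)

# define the Hawkes process parameters
lda = nd.array([1.5, 2.0, 3.0]).tile((N, 1))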
alpha = nd.array([0.2, 0.3, 0.4])  # branching ratios should be < 1
beta = nd.array([1.0, 2.0, 3.0])

# the "data", or observations
ia_times = nd.array([[6, 7, 8, 9], [1, 2, 3, 4], [3, 4, 5, 6], [8, 9, 10, 11]])
marks = nd.zeros((N, T)).astype(np.int32)

# starting "state" of the process
states = nd.zeros((N, K))

valid_length = nd.array([1, 2, 3, 4])  # number of valid points in each sequence
max_time = nd.ones((N,)) * 100.0  # length of the observation period

A = nd.contrib.hawkesll(
    lda, alpha, beta, states, ia_times, marks, valid_length, max_time
)

References:

  • Bacry, E., Mastromatteo, I., & Muzy, J. F. (2015). Hawkes processes in finance. Market Microstructure and Liquidity, 1(01), 1550005.

Defined in /work/mxnet/src/operator/contrib/hawkes_ll.cc:L83

Parameters
  • lda (Symbol) – Shape (N, K) The intensity for each of the K processes, for each sample

  • alpha (Symbol) – Shape (K,) The infectivity factor (branching ratio) for each process

  • beta (Symbol) – Shape (K,) The decay parameter for each process

  • state (Symbol) – Shape (N, K) the Hawkes state for each process

  • lags (Symbol) – Shape (N, T) the interarrival times

  • marks (Symbol) – Shape (N, T) the marks (process ids)

  • valid_length (Symbol) – The number of valid points in the process

  • max_time (Symbol) – the length of the interval where the processes were sampled

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

index_array(data=None, axes=_Null, name=None, attr=None, out=None, **kwargs)

Returns an array of indexes of the input array.

For an input array with shape \((d_1, d_2, ..., d_n)\), index_array returns a \((d_1, d_2, ..., d_n, n)\) array idx, where \(idx[i_1, i_2, ..., i_n, :] = [i_1, i_2, ..., i_n]\).

Additionally, when the parameter axes is specified, idx will be a \((d_1, d_2, ..., d_n, m)\) array where m is the length of axes, and the following equality will hold: \(idx[i_1, i_2, ..., i_n, j] = i_{axes[j]}\).

Examples:

x = mx.nd.ones((3, 2))

mx.nd.contrib.index_array(x) = [[[0 0]
                                 [0 1]]

                                [[1 0]
                                 [1 1]]

                                [[2 0]
                                 [2 1]]]

x = mx.nd.ones((3, 2, 2))

mx.nd.contrib.index_array(x, axes=(1, 0)) = [[[[0 0]
                                               [0 0]]

                                              [[1 0]
                                               [1 0]]]


                                             [[[0 1]
                                               [0 1]]

                                              [[1 1]
                                               [1 1]]]


                                             [[[0 2]
                                               [0 2]]

                                              [[1 2]
                                               [1 2]]]]
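The definition can be sketched in pure Python by building the nested lists recursively. The function below is a hypothetical reference that takes a shape instead of an array (only the shape matters for the result), and normalizes negative axes with a modulo.

```python
def index_array(shape, axes=None):
    """Return nested lists idx with idx[i_1]...[i_n] = [i_axes[0], i_axes[1], ...]."""
    n = len(shape)
    axes = tuple(range(n)) if axes is None else tuple(a % n for a in axes)

    def build(prefix):
        if len(prefix) == n:                       # reached one element
            return [prefix[a] for a in axes]
        return [build(prefix + (i,)) for i in range(shape[len(prefix)])]

    return build(())


print(index_array((3, 2)))
# [[[0, 0], [0, 1]], [[1, 0], [1, 1]], [[2, 0], [2, 1]]]
print(index_array((3, 2, 2), axes=(1, 0))[2][1][0])  # [1, 2]
```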

Defined in /work/mxnet/src/operator/contrib/index_array.cc:L117

Parameters
  • data (Symbol) – Input data

  • axes (Shape or None, optional, default=None) – The axes to include in the index array. Supports negative values.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

index_copy(old_tensor=None, index_vector=None, new_tensor=None, name=None, attr=None, out=None, **kwargs)

Copies the elements of a new_tensor into the old_tensor.

This operator copies the elements by selecting the indices in the order given in index. The output will be a new tensor containing the remaining elements of old_tensor and the copied elements of new_tensor. For example, if index[i] == j, then the i-th row of new_tensor is copied to the j-th row of output.

The index must be a vector and it must have the same size as the 0-th dimension of new_tensor. Also, the 0-th dimension of old_tensor must be >= the 0-th dimension of new_tensor, or an error will be raised.

Examples:

x = mx.nd.zeros((5,3))
t = mx.nd.array([[1,2,3],[4,5,6],[7,8,9]])
index = mx.nd.array([0,4,2])

mx.nd.contrib.index_copy(x, index, t)

[[1. 2. 3.]
 [0. 0. 0.]
 [7. 8. 9.]
 [0. 0. 0.]
 [4. 5. 6.]]
<NDArray 5x3 @cpu(0)>
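The row-copy rule can be written out in pure Python as follows. This is a sketch on lists of rows with a hypothetical function of the same name; it mirrors the two size checks described above and returns a new list rather than modifying old_tensor.

```python
def index_copy(old_tensor, index_vector, new_tensor):
    """Copy row i of new_tensor into row index_vector[i] of a copy of old_tensor."""
    if len(index_vector) != len(new_tensor):
        raise ValueError("index must have one entry per row of new_tensor")
    if len(old_tensor) < len(new_tensor):
        raise ValueError("old_tensor must have at least as many rows as new_tensor")
    out = [list(row) for row in old_tensor]
    for i, j in enumerate(index_vector):
        out[int(j)] = list(new_tensor[i])
    return out


x = [[0.0] * 3 for _ in range(5)]
t = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
for row in index_copy(x, [0, 4, 2], t):
    print(row)
```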

Defined in /work/mxnet/src/operator/contrib/index_copy.cc:L206

Parameters
  • old_tensor (Symbol) – Old tensor

  • index_vector (Symbol) – Index vector

  • new_tensor (Symbol) – New tensor to be copied

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

interleaved_matmul_encdec_qk(queries=None, keys_values=None, heads=_Null, name=None, attr=None, out=None, **kwargs)

Compute the matrix multiplication between the projections of queries and keys in multihead attention used as encoder-decoder attention.

The inputs must be a tensor of projections of queries following the layout: (seq_length, batch_size, num_heads * head_dim)

and a tensor of interleaved projections of keys and values following the layout: (seq_length, batch_size, num_heads * head_dim * 2)

The equivalent code would be:

q_proj = mx.nd.reshape(queries, shape=(0, 0, num_heads, -1))
q_proj = mx.nd.transpose(q_proj, axes=(1, 2, 0, 3))
q_proj = mx.nd.reshape(q_proj, shape=(-1, 0, 0), reverse=True)
q_proj = mx.nd.contrib.div_sqrt_dim(q_proj)
tmp = mx.nd.reshape(keys_values, shape=(0, 0, num_heads, 2, -1))
k_proj = mx.nd.transpose(tmp[:,:,:,0,:], axes=(1, 2, 0, 3))
k_proj = mx.nd.reshape(k_proj, shape=(-1, 0, 0), reverse=True)
output = mx.nd.batch_dot(q_proj, k_proj, transpose_b=True)

Defined in /work/mxnet/src/operator/contrib/transformer.cc:L797

Parameters
  • queries (Symbol) – Queries

  • keys_values (Symbol) – Keys and values interleaved

  • heads (int, required) – Set number of heads

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

interleaved_matmul_encdec_valatt(keys_values=None, attention=None, heads=_Null, name=None, attr=None, out=None, **kwargs)

Compute the matrix multiplication between the projections of values and the attention weights in multihead attention used as encoder-decoder attention.

The inputs must be a tensor of interleaved projections of keys and values following the layout: (seq_length, batch_size, num_heads * head_dim * 2)

and the attention weights following the layout: (batch_size, seq_length, seq_length)

The equivalent code would be:

tmp = mx.nd.reshape(keys_values, shape=(0, 0, num_heads, 2, -1))
v_proj = mx.nd.transpose(tmp[:,:,:,1,:], axes=(1, 2, 0, 3))
v_proj = mx.nd.reshape(v_proj, shape=(-1, 0, 0), reverse=True)
output = mx.nd.batch_dot(attention, v_proj)
output = mx.nd.reshape(output, shape=(-1, num_heads, 0, 0), reverse=True)
output = mx.nd.transpose(output, axes=(2, 0, 1, 3))
output = mx.nd.reshape(output, shape=(0, 0, -1))

Defined in /work/mxnet/src/operator/contrib/transformer.cc:L847

Parameters
  • keys_values (Symbol) – Keys and values interleaved

  • attention (Symbol) – Attention maps

  • heads (int, required) – Set number of heads

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

interleaved_matmul_selfatt_qk(queries_keys_values=None, heads=_Null, name=None, attr=None, out=None, **kwargs)

Compute the matrix multiplication between the projections of queries and keys in multihead attention used as self-attention.

The input must be a single tensor of interleaved projections of queries, keys and values following the layout: (seq_length, batch_size, num_heads * head_dim * 3)

The equivalent code would be:

tmp = mx.nd.reshape(queries_keys_values, shape=(0, 0, num_heads, 3, -1))
q_proj = mx.nd.transpose(tmp[:,:,:,0,:], axes=(1, 2, 0, 3))
q_proj = mx.nd.reshape(q_proj, shape=(-1, 0, 0), reverse=True)
q_proj = mx.nd.contrib.div_sqrt_dim(q_proj)
k_proj = mx.nd.transpose(tmp[:,:,:,1,:], axes=(1, 2, 0, 3))
k_proj = mx.nd.reshape(k_proj, shape=(-1, 0, 0), reverse=True)
output = mx.nd.batch_dot(q_proj, k_proj, transpose_b=True)
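The interleaved layout implied by the reshape to (num_heads, 3, head_dim) in the code above can be made concrete with a pure-Python sketch for one position of the sequence. The helper name split_interleaved_qkv is hypothetical; it only shows which slice of the last dimension belongs to the query, key and value of each head.

```python
def split_interleaved_qkv(vec, num_heads):
    """Split one vector of size num_heads * head_dim * 3 into per-head q, k, v."""
    head_dim = len(vec) // (num_heads * 3)
    q, k, v = [], [], []
    for h in range(num_heads):
        base = h * 3 * head_dim  # each head stores its q, k, v back to back
        q.append(vec[base:base + head_dim])
        k.append(vec[base + head_dim:base + 2 * head_dim])
        v.append(vec[base + 2 * head_dim:base + 3 * head_dim])
    return q, k, v


q, k, v = split_interleaved_qkv(list(range(12)), num_heads=2)
print(q)  # [[0, 1], [6, 7]]
print(k)  # [[2, 3], [8, 9]]
print(v)  # [[4, 5], [10, 11]]
```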

Defined in /work/mxnet/src/operator/contrib/transformer.cc:L694

Parameters
  • queries_keys_values (Symbol) – Interleaved queries, keys and values

  • heads (int, required) – Set number of heads

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

interleaved_matmul_selfatt_valatt(queries_keys_values=None, attention=None, heads=_Null, name=None, attr=None, out=None, **kwargs)

Compute the matrix multiplication between the projections of values and the attention weights in multihead attention used as self-attention.

The inputs must be a tensor of interleaved projections of queries, keys and values following the layout: (seq_length, batch_size, num_heads * head_dim * 3)

and the attention weights following the layout: (batch_size, seq_length, seq_length)

The equivalent code would be:

tmp = mx.nd.reshape(queries_keys_values, shape=(0, 0, num_heads, 3, -1))
v_proj = mx.nd.transpose(tmp[:,:,:,2,:], axes=(1, 2, 0, 3))
v_proj = mx.nd.reshape(v_proj, shape=(-1, 0, 0), reverse=True)
output = mx.nd.batch_dot(attention, v_proj)
output = mx.nd.reshape(output, shape=(-1, num_heads, 0, 0), reverse=True)
output = mx.nd.transpose(output, axes=(2, 0, 1, 3))
output = mx.nd.reshape(output, shape=(0, 0, -1))

Defined in /work/mxnet/src/operator/contrib/transformer.cc:L745

Parameters
  • queries_keys_values (Symbol) – Queries, keys and values interleaved

  • attention (Symbol) – Attention maps

  • heads (int, required) – Set number of heads

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

intgemm_fully_connected(data=None, weight=None, scaling=None, bias=None, num_hidden=_Null, no_bias=_Null, flatten=_Null, out_type=_Null, name=None, attr=None, out=None, **kwargs)

Multiply matrices using 8-bit integers. data * weight.

Input tensor arguments are: data weight [scaling] [bias]

data: either float32 or prepared using intgemm_prepare_data (in which case it is int8).

weight: must be prepared using intgemm_prepare_weight.

scaling: present if and only if out_type is float32. If so this is multiplied by the result before adding bias. Typically:

  • scaling = (max passed to intgemm_prepare_weight)/127.0 if data is in float32

  • scaling = (max passed to intgemm_prepare_data)/127.0 * (max passed to intgemm_prepare_weight)/127.0 if data is in int8

bias: present if and only if !no_bias. This is added to the output after scaling and has the same number of columns as the output.

out_type: type of the output.
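The typical choice of scaling described above can be written as a small helper. This is a plain-Python sketch with a hypothetical function name; it returns a float and does not call any intgemm operator.

```python
def output_scaling(weight_max, data_max=None):
    """Typical scaling factor for a float32 output.

    weight_max: the max passed to intgemm_prepare_weight.
    data_max:   the max passed to intgemm_prepare_data, or None when data
                is given as float32 and quantized on the fly.
    """
    scale = weight_max / 127.0
    if data_max is not None:
        scale *= data_max / 127.0
    return scale


print(output_scaling(127.0))          # 1.0
print(output_scaling(127.0, 254.0))   # 2.0
```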

Defined in /work/mxnet/src/operator/contrib/intgemm/intgemm_fully_connected_op.cc:L284

Parameters
  • data (Symbol) – First argument to multiplication. Tensor of float32 (quantized on the fly) or int8 from intgemm_prepare_data. If you use a different quantizer, be sure to ban -128. The last dimension must be a multiple of 64.

  • weight (Symbol) – Second argument to multiplication. Tensor of int8 from intgemm_prepare_weight. The last dimension must be a multiple of 64. The product of non-last dimensions must be a multiple of 8.

  • scaling (Symbol) – Scaling factor to apply if output type is float32.

  • bias (Symbol) – Bias term.

  • num_hidden (int, required) – Number of hidden nodes of the output.

  • no_bias (boolean, optional, default=0) – Whether to disable bias parameter.

  • flatten (boolean, optional, default=1) – Whether to collapse all but the first axis of the input data tensor.

  • out_type ({'float32', 'int32'},optional, default='float32') – Output data type.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

intgemm_maxabsolute(data=None, name=None, attr=None, out=None, **kwargs)

Compute the maximum absolute value in a tensor of float32 fast on a CPU. The tensor’s total size must be a multiple of 16 and aligned to a multiple of 64 bytes. mxnet.nd.contrib.intgemm_maxabsolute(arr) == arr.abs().max()

Defined in /work/mxnet/src/operator/contrib/intgemm/max_absolute_op.cc:L102

Parameters
  • data (Symbol) – Tensor to compute maximum absolute value of

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

intgemm_prepare_data(data=None, maxabs=None, name=None, attr=None, out=None, **kwargs)

This operator quantizes float32 to int8 while also banning -128.

It is suitable for preparing a data matrix for use by intgemm’s C=data * weights operation.

The float32 values are scaled such that maxabs maps to 127. Typically maxabs = maxabsolute(A).
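The scaling rule can be sketched in pure Python. The function name is hypothetical, and two details are assumptions of the sketch rather than facts from the text above: values are rounded to the nearest integer with Python's round, and anything outside the range is saturated to [-127, 127] so that -128 never appears.

```python
def quantize_int8(values, maxabs):
    """Scale floats so that maxabs maps to 127 and clamp to [-127, 127]."""
    scale = 127.0 / maxabs
    out = []
    for x in values:
        q = int(round(x * scale))           # rounding mode is an assumption
        out.append(max(-127, min(127, q)))  # saturate; -128 is never produced
    return out


print(quantize_int8([1.0, -1.0, 0.25, -3.0], maxabs=1.0))  # [127, -127, 32, -127]
```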

Defined in /work/mxnet/src/operator/contrib/intgemm/prepare_data_op.cc:L112

Parameters
  • data (Symbol) – Activation matrix to be prepared for multiplication.

  • maxabs (Symbol) – Maximum absolute value to be used for scaling. (The values will be multiplied by 127.0 / maxabs.)

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

intgemm_prepare_weight(weight=None, maxabs=None, already_quantized=_Null, name=None, attr=None, out=None, **kwargs)

This operator converts a weight matrix in column-major format to intgemm’s internal fast representation of weight matrices. MXNet customarily stores weight matrices in column-major (transposed) format. This operator is not meant to be fast; it is meant to be run offline to quantize a model.

In other words, it prepares weight for the operation C = data * weight^T.

If the provided weight matrix is float32, it will be quantized first. The quantization function is (int8_t)(127.0 / max * weight) where max is provided as argument 1 (the weight matrix is argument 0). Then the matrix will be rearranged into the CPU-dependent format.

If the provided weight matrix is already int8, the matrix will only be rearranged into the CPU-dependent format. This way one can quantize with intgemm_prepare_data (which just quantizes), store to disk in a consistent format, then at load time convert to CPU-dependent format with intgemm_prepare_weight.

The internal representation depends on register length. So AVX512, AVX2, and SSSE3 have different formats. AVX512BW and AVX512VNNI have the same representation.

Defined in /work/mxnet/src/operator/contrib/intgemm/prepare_weight_op.cc:L152

Parameters
  • weight (Symbol) – Parameter matrix to be prepared for multiplication.

  • maxabs (Symbol) – Maximum absolute value for scaling. The weights will be multiplied by 127.0 / maxabs.

  • already_quantized (boolean, optional, default=0) – Is the weight matrix already quantized?

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

intgemm_take_weight(weight=None, indices=None, name=None, attr=None, out=None, **kwargs)

Index a weight matrix stored in intgemm’s weight format. The indices select the outputs of matrix multiplication, not the inner dot product dimension.

Defined in /work/mxnet/src/operator/contrib/intgemm/take_weight_op.cc:L125

Parameters
  • weight (Symbol) – Tensor already in intgemm weight format to select from

  • indices (Symbol) – indices to select on the 0th dimension of weight

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

mrcnn_mask_target(rois=None, gt_masks=None, matches=None, cls_targets=None, num_rois=_Null, num_classes=_Null, mask_size=_Null, sample_ratio=_Null, aligned=_Null, name=None, attr=None, out=None, **kwargs)

Generate mask targets for Mask-RCNN.

Parameters
  • rois (Symbol) – Bounding box coordinates, a 3D array

  • gt_masks (Symbol) – Input masks of full image size, a 4D array

  • matches (Symbol) – Index to a gt_mask, a 2D array

  • cls_targets (Symbol) – Value [0, num_class), excluding background class, a 2D array

  • num_rois (int, required) – Number of sampled RoIs.

  • num_classes (int, required) – Number of classes.

  • mask_size (Shape(tuple), required) – Size of the pooled masks height and width: (h, w).

  • sample_ratio (int, optional, default='2') – Sampling ratio of ROI align. Set to -1 to use adaptive size.

  • aligned (boolean, optional, default=0) – Center-aligned ROIAlign introduced in Detectron2. To enable, set aligned to True.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

quadratic(data=None, a=_Null, b=_Null, c=_Null, name=None, attr=None, out=None, **kwargs)

This operator implements the quadratic function.

\[f(x) = ax^2+bx+c\]

where \(x\) is an input tensor and all operations in the function are element-wise.

Example:

x = [[1, 2], [3, 4]]
y = quadratic(data=x, a=1, b=2, c=3)
y = [[6, 11], [18, 27]]

The storage type of quadratic output depends on storage types of inputs:
  • quadratic(csr, a, b, 0) = csr

  • quadratic(default, a, b, c) = default
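The element-wise formula is easy to check with a pure-Python sketch on a list of rows. The function below is a hypothetical stand-in for the operator and reproduces the example above.

```python
def quadratic(x, a=0.0, b=0.0, c=0.0):
    """Apply f(v) = a*v^2 + b*v + c to every element of a 2D list."""
    return [[a * v * v + b * v + c for v in row] for row in x]


print(quadratic([[1, 2], [3, 4]], a=1, b=2, c=3))  # [[6, 11], [18, 27]]
```

With c = 0 the function maps 0 to 0, so zero entries stay zero, which is consistent with the CSR output listed for that case.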

Defined in /work/mxnet/src/operator/contrib/quadratic_op.cc:L50

Parameters
  • data (Symbol) – Input ndarray

  • a (float, optional, default=0) – Coefficient of the quadratic term in the quadratic function.

  • b (float, optional, default=0) – Coefficient of the linear term in the quadratic function.

  • c (float, optional, default=0) – Constant term in the quadratic function.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

quantize(data=None, min_range=None, max_range=None, out_type=_Null, name=None, attr=None, out=None, **kwargs)

Quantize an input tensor from float to out_type, with user-specified min_range and max_range.

min_range and max_range are scalar floats that specify the range for the input data.

When out_type is uint8, the output is calculated using the following equation:

out[i] = (in[i] - min_range) * range(OUTPUT_TYPE) / (max_range - min_range) + 0.5,

where range(T) = numeric_limits<T>::max() - numeric_limits<T>::min().

When out_type is int8, the output is calculated using the following equation, keeping the quantized value zero-centered:

out[i] = sign(in[i]) * min(abs(in[i] * scale) + 0.5f, quantized_range),

where quantized_range = MinAbs(max(int8), min(int8)) and scale = quantized_range / MaxAbs(min_range, max_range).
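Both equations can be evaluated in a pure-Python sketch for a single value. The function names are hypothetical, range(uint8) is taken as 255, and the final conversion to an integer by truncation is an assumption about how the floating-point result is cast.

```python
def quantize_to_uint8(x, min_range, max_range):
    """uint8 rule: shift by min_range, scale to 0..255, add 0.5 and truncate."""
    return int((x - min_range) * 255.0 / (max_range - min_range) + 0.5)


def quantize_to_int8(x, min_range, max_range):
    """int8 rule: symmetric scaling that keeps zero at zero."""
    quantized_range = min(abs(127), abs(-128))  # MinAbs(max(int8), min(int8)) = 127
    scale = quantized_range / max(abs(min_range), abs(max_range))
    sign = (x > 0) - (x < 0)
    return int(sign * min(abs(x * scale) + 0.5, quantized_range))


print(quantize_to_uint8(0.5, 0.0, 1.0))   # 128
print(quantize_to_int8(-0.5, -1.0, 1.0))  # -64
```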

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /work/mxnet/src/operator/quantization/quantize.cc:L92

Parameters
  • data (Symbol) – An ndarray/symbol of type float32

  • min_range (Symbol) – The minimum scalar value possibly produced for the input

  • max_range (Symbol) – The maximum scalar value possibly produced for the input

  • out_type ({'int8', 'uint8'},optional, default='uint8') – Output data type.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

quantize_asym(data=None, min_calib_range=_Null, max_calib_range=_Null, name=None, attr=None, out=None, **kwargs)

Quantize an input tensor from float to uint8_t. Output scale and shift are scalar floats that specify the quantization parameters for the input data. The output is calculated using the following equation:

out[i] = in[i] * scale + shift + 0.5,

where scale = uint8_range / (max_range - min_range) and shift = numeric_limits<T>::max - max_range * scale.
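The asymmetric rule can be checked numerically with a pure-Python sketch. The function name is hypothetical; uint8_range and numeric_limits<T>::max are both taken as 255 here, and the result is truncated to an integer, all of which are assumptions of the sketch.

```python
def quantize_asym(x, min_range, max_range):
    """Asymmetric uint8 quantization: out = x * scale + shift + 0.5, truncated."""
    uint8_range = 255.0                       # assumed value of uint8_range
    scale = uint8_range / (max_range - min_range)
    shift = 255.0 - max_range * scale         # maps max_range to 255
    return int(x * scale + shift + 0.5)


print(quantize_asym(-1.0, -1.0, 1.0))  # 0
print(quantize_asym(0.0, -1.0, 1.0))   # 128
print(quantize_asym(1.0, -1.0, 1.0))   # 255
```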

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /work/mxnet/src/operator/quantization/quantize_asym.cc:L119

Parameters
  • data (Symbol) – An ndarray/symbol of type float32

  • min_calib_range (float or None, optional, default=None) – The minimum scalar value in the form of float32. If present, it will be used to quantize the fp32 data.

  • max_calib_range (float or None, optional, default=None) – The maximum scalar value in the form of float32. If present, it will be used to quantize the fp32 data.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

quantize_v2(data=None, out_type=_Null, min_calib_range=_Null, max_calib_range=_Null, name=None, attr=None, out=None, **kwargs)

Quantize an input tensor from float to out_type, with user-specified min_calib_range and max_calib_range or the input range collected at runtime.

Output min_range and max_range are scalar floats that specify the range for the input data.

When out_type is uint8, the output is calculated using the following equation:

out[i] = (in[i] - min_range) * range(OUTPUT_TYPE) / (max_range - min_range) + 0.5,

where range(T) = numeric_limits<T>::max() - numeric_limits<T>::min().

When out_type is int8, the output is calculated using the following equation, which keeps the quantized value zero-centered:

out[i] = sign(in[i]) * min(abs(in[i]) * scale + 0.5f, quantized_range),

where quantized_range = MinAbs(max(int8), min(int8)) and scale = quantized_range / MaxAbs(min_range, max_range).

When out_type is auto, the output type is determined automatically from min_calib_range, if present. If min_calib_range < 0.0f, the output type is int8; otherwise it is uint8. If min_calib_range is not present, the output type is int8.
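The selection rule for out_type can be written as a small decision function. This is only a restatement of the rule above in Python; resolve_out_type is a hypothetical helper, not part of the MXNet API.

```python
def resolve_out_type(out_type="int8", min_calib_range=None):
    """Return the effective output type for the rule described above."""
    if out_type != "auto":
        return out_type  # explicit 'int8' or 'uint8' is used as given
    if min_calib_range is None:
        return "int8"  # no calibration range available
    return "int8" if min_calib_range < 0.0 else "uint8"


print(resolve_out_type("auto", min_calib_range=-0.3))
print(resolve_out_type("auto", min_calib_range=0.0))
print(resolve_out_type("auto"))
```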

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /work/mxnet/src/operator/quantization/quantize_v2.cc:L95

Parameters
  • data (Symbol) – An ndarray/symbol of type float32

  • out_type ({'auto', 'int8', 'uint8'},optional, default='int8') – Output data type. auto can be specified to automatically determine output type according to min_calib_range.

  • min_calib_range (float or None, optional, default=None) – The minimum scalar value in the form of float32. If present, it will be used to quantize the fp32 data into int8 or uint8.

  • max_calib_range (float or None, optional, default=None) – The maximum scalar value in the form of float32. If present, it will be used to quantize the fp32 data into int8 or uint8.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

quantized_act(data=None, min_data=None, max_data=None, act_type=_Null, name=None, attr=None, out=None, **kwargs)

Activation operator for input and output data of type int8. The input and output data come with min and max thresholds for quantizing the float32 data into int8.

Note

This operator only supports forward propagation. DO NOT use it in training. This operator only supports relu.

Defined in /work/mxnet/src/operator/quantization/quantized_activation.cc:L92

Parameters
  • data (Symbol) – Input data.

  • min_data (Symbol) – Minimum value of data.

  • max_data (Symbol) – Maximum value of data.

  • act_type ({'log_sigmoid', 'mish', 'relu', 'sigmoid', 'softrelu', 'softsign', 'tanh'}, required) – Activation function to be applied.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

quantized_batch_norm(data=None, gamma=None, beta=None, moving_mean=None, moving_var=None, min_data=None, max_data=None, eps=_Null, momentum=_Null, fix_gamma=_Null, use_global_stats=_Null, output_mean_var=_Null, axis=_Null, cudnn_off=_Null, min_calib_range=_Null, max_calib_range=_Null, name=None, attr=None, out=None, **kwargs)

BatchNorm operator for input and output data of type int8. The input and output data come with min and max thresholds for quantizing the float32 data into int8.

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /work/mxnet/src/operator/quantization/quantized_batch_norm.cc:L96

Parameters
  • data (Symbol) – Input data.

  • gamma (Symbol) – gamma.

  • beta (Symbol) – beta.

  • moving_mean (Symbol) – moving_mean.

  • moving_var (Symbol) – moving_var.

  • min_data (Symbol) – Minimum value of data.

  • max_data (Symbol) – Maximum value of data.

  • eps (double, optional, default=0.0010000000474974513) – Epsilon to prevent div 0. Must be no less than CUDNN_BN_MIN_EPSILON defined in cudnn.h when using cudnn (usually 1e-5)

  • momentum (float, optional, default=0.899999976) – Momentum for moving average

  • fix_gamma (boolean, optional, default=1) – Fix gamma while training

  • use_global_stats (boolean, optional, default=0) – Whether to use global moving statistics instead of local batch-norm. This will force batch-norm to become a scale-shift operator.

  • output_mean_var (boolean, optional, default=0) – Output the mean and inverse std

  • axis (int, optional, default='1') – Specifies which axis of the shape is the channel axis

  • cudnn_off (boolean, optional, default=0) – Do not select CUDNN operator, if available

  • min_calib_range (float or None, optional, default=None) – The minimum scalar value in the form of float32 obtained through calibration. If present, it will be used by the quantized batch norm op to calculate the primitive scale. Note: this calib_range is used to calibrate the bn output.

  • max_calib_range (float or None, optional, default=None) – The maximum scalar value in the form of float32 obtained through calibration. If present, it will be used by the quantized batch norm op to calculate the primitive scale. Note: this calib_range is used to calibrate the bn output.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

quantized_batch_norm_relu(data=None, gamma=None, beta=None, moving_mean=None, moving_var=None, min_data=None, max_data=None, eps=_Null, momentum=_Null, fix_gamma=_Null, use_global_stats=_Null, output_mean_var=_Null, axis=_Null, cudnn_off=_Null, min_calib_range=_Null, max_calib_range=_Null, name=None, attr=None, out=None, **kwargs)

BatchNorm with ReLU operator for input and output data of type int8. The input and output data come with min and max thresholds for quantizing the float32 data into int8.

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /work/mxnet/src/operator/quantization/quantized_batch_norm_relu.cc:L96

Parameters
  • data (Symbol) – Input data.

  • gamma (Symbol) – gamma.

  • beta (Symbol) – beta.

  • moving_mean (Symbol) – moving_mean.

  • moving_var (Symbol) – moving_var.

  • min_data (Symbol) – Minimum value of data.

  • max_data (Symbol) – Maximum value of data.

  • eps (double, optional, default=0.0010000000474974513) – Epsilon to prevent div 0. Must be no less than CUDNN_BN_MIN_EPSILON defined in cudnn.h when using cudnn (usually 1e-5)

  • momentum (float, optional, default=0.899999976) – Momentum for moving average

  • fix_gamma (boolean, optional, default=1) – Fix gamma while training

  • use_global_stats (boolean, optional, default=0) – Whether to use global moving statistics instead of local batch-norm. This will force batch-norm to become a scale-shift operator.

  • output_mean_var (boolean, optional, default=0) – Output the mean and inverse std

  • axis (int, optional, default='1') – Specifies which axis of the shape is the channel axis

  • cudnn_off (boolean, optional, default=0) – Do not select CUDNN operator, if available

  • min_calib_range (float or None, optional, default=None) – The minimum scalar value in the form of float32 obtained through calibration. If present, it will be used by the quantized batch norm op to calculate the primitive scale. Note: this calib_range is used to calibrate the bn output.

  • max_calib_range (float or None, optional, default=None) – The maximum scalar value in the form of float32 obtained through calibration. If present, it will be used by the quantized batch norm op to calculate the primitive scale. Note: this calib_range is used to calibrate the bn output.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

quantized_concat(*data, **kwargs)

Joins input arrays along a given axis.

The dimensions of the input arrays should be the same except along the axis on which they will be concatenated. The dimension of the output array along the concatenated axis will be equal to the sum of the corresponding dimensions of the input arrays. Inputs with different min/max values will be rescaled using the largest [min, max] pair. If any input holds int8, the output will be int8; otherwise the output will be uint8.
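The shape and type rules in this paragraph can be sketched as two small helpers. Both names are hypothetical, and reading "largest [min, max] pair" as the widest range covering all inputs is an assumption of this sketch rather than something the text states.

```python
def concat_output_shape(shapes, dim=1):
    """Output shape: sizes add along `dim`, all other dimensions must match."""
    first = shapes[0]
    for s in shapes[1:]:
        if len(s) != len(first) or any(
            a != b for i, (a, b) in enumerate(zip(first, s)) if i != dim
        ):
            raise ValueError("shapes must agree except along the concat axis")
    out = list(first)
    out[dim] = sum(s[dim] for s in shapes)
    return tuple(out)


def concat_output_spec(inputs):
    """inputs: list of (dtype, min_value, max_value), one tuple per input array."""
    out_dtype = "int8" if any(d == "int8" for d, _, _ in inputs) else "uint8"
    out_min = min(lo for _, lo, _ in inputs)  # assumed: widest covering range
    out_max = max(hi for _, _, hi in inputs)
    return out_dtype, out_min, out_max


print(concat_output_shape([(2, 3, 4), (2, 5, 4)], dim=1))
print(concat_output_spec([("uint8", 0.0, 6.0), ("int8", -2.0, 2.0)]))
```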

Defined in /work/mxnet/src/operator/quantization/quantized_concat.cc:L113

This function supports a variable number of positional inputs.

Parameters
  • data (Symbol[]) – List of arrays to concatenate

  • dim (int or None, optional, default='1') – The dimension along which to concatenate.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

quantized_conv(data=None, weight=None, bias=None, min_data=None, max_data=None, min_weight=None, max_weight=None, min_bias=None, max_bias=None, kernel=_Null, stride=_Null, dilate=_Null, pad=_Null, num_filter=_Null, num_group=_Null, workspace=_Null, no_bias=_Null, cudnn_tune=_Null, cudnn_off=_Null, layout=_Null, name=None, attr=None, out=None, **kwargs)

Convolution operator for input, weight and bias data of type int8, accumulating in int32 for the output. For each argument, two more arguments of type float32 must be provided, representing the thresholds used to quantize that argument from float32 to int8. The final outputs contain the convolution result in int32, and the min and max thresholds for quantizing the float32 output into int32.

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /work/mxnet/src/operator/quantization/quantized_conv.cc:L185

Parameters
  • data (Symbol) – Input data.

  • weight (Symbol) – weight.

  • bias (Symbol) – bias.

  • min_data (Symbol) – Minimum value of data.

  • max_data (Symbol) – Maximum value of data.

  • min_weight (Symbol) – Minimum value of weight.

  • max_weight (Symbol) – Maximum value of weight.

  • min_bias (Symbol) – Minimum value of bias.

  • max_bias (Symbol) – Maximum value of bias.

  • kernel (Shape(tuple), required) – Convolution kernel size: (w,), (h, w) or (d, h, w)

  • stride (Shape(tuple), optional, default=[]) – Convolution stride: (w,), (h, w) or (d, h, w). Defaults to 1 for each dimension.

  • dilate (Shape(tuple), optional, default=[]) – Convolution dilate: (w,), (h, w) or (d, h, w). Defaults to 1 for each dimension.

  • pad (Shape(tuple), optional, default=[]) – Zero pad for convolution: (w,), (h, w) or (d, h, w). Defaults to no padding.

  • num_filter (int (non-negative), required) – Convolution filter(channel) number

  • num_group (int (non-negative), optional, default=1) – Number of group partitions.

  • workspace (long (non-negative), optional, default=1024) – Maximum temporary workspace allowed (MB) in convolution. This parameter has two usages. When CUDNN is not used, it determines the effective batch size of the convolution kernel. When CUDNN is used, it controls the maximum temporary storage used for tuning the best CUDNN kernel when limited_workspace strategy is used.

  • no_bias (boolean, optional, default=0) – Whether to disable bias parameter.

  • cudnn_tune ({None, 'fastest', 'limited_workspace', 'off'},optional, default='None') – Whether to pick convolution algo by running performance test.

  • cudnn_off (boolean, optional, default=0) – Turn off cudnn for this layer.

  • layout ({None, 'NCDHW', 'NCHW', 'NCW', 'NDHWC', 'NHWC', 'NWC'},optional, default='None') – Set layout for input, output and weight. Empty for default layout: NCW for 1d, NCHW for 2d and NCDHW for 3d. NHWC and NDHWC are only supported on GPU.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

quantized_elemwise_add(lhs=None, rhs=None, lhs_min=None, lhs_max=None, rhs_min=None, rhs_max=None, name=None, attr=None, out=None, **kwargs)

elemwise_add operator for inputs dataA and dataB of type int8, accumulating in int32 for the output. For each argument, two more arguments of type float32 must be provided, representing the thresholds used to quantize that argument from float32 to int8. The final outputs contain the result in int32, and the min and max thresholds for quantizing the float32 output into int32.

Note

This operator only supports forward propagation. DO NOT use it in training.

Parameters
  • lhs (Symbol) – first input

  • rhs (Symbol) – second input

  • lhs_min (Symbol) – 3rd input

  • lhs_max (Symbol) – 4th input

  • rhs_min (Symbol) – 5th input

  • rhs_max (Symbol) – 6th input

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

quantized_elemwise_mul(lhs=None, rhs=None, lhs_min=None, lhs_max=None, rhs_min=None, rhs_max=None, min_calib_range=_Null, max_calib_range=_Null, enable_float_output=_Null, name=None, attr=None, out=None, **kwargs)

Multiplies arguments int8 element-wise.

Defined in /work/mxnet/src/operator/quantization/quantized_elemwise_mul.cc:L214

Parameters
  • lhs (Symbol) – first input

  • rhs (Symbol) – second input

  • lhs_min (Symbol) – Minimum value of first input.

  • lhs_max (Symbol) – Maximum value of first input.

  • rhs_min (Symbol) – Minimum value of second input.

  • rhs_max (Symbol) – Maximum value of second input.

  • min_calib_range (float or None, optional, default=None) – The minimum scalar value in the form of float32 obtained through calibration. If present, it will be used to requantize the int8 output data.

  • max_calib_range (float or None, optional, default=None) – The maximum scalar value in the form of float32 obtained through calibration. If present, it will be used to requantize the int8 output data.

  • enable_float_output (boolean, optional, default=0) – Whether to enable float32 output

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

quantized_embedding(data=None, weight=None, min_weight=None, max_weight=None, input_dim=_Null, output_dim=_Null, dtype=_Null, sparse_grad=_Null, name=None, attr=None, out=None, **kwargs)

Maps integer indices to int8 vector representations (embeddings).

Defined in /work/mxnet/src/operator/quantization/quantized_indexing_op.cc:L134

Parameters
  • data (Symbol) – The input array to the embedding operator.

  • weight (Symbol) – The embedding weight matrix.

  • min_weight (Symbol) – Minimum value of data.

  • max_weight (Symbol) – Maximum value of data.

  • input_dim (long, required) – Vocabulary size of the input indices.

  • output_dim (long, required) – Dimension of the embedding vectors.

  • dtype ({'bfloat16', 'float16', 'float32', 'float64', 'int32', 'int64', 'int8', 'uint8'},optional, default='float32') – Data type of weight.

  • sparse_grad (boolean, optional, default=0) – Compute row sparse gradient in the backward calculation. If set to True, the grad’s storage type is row_sparse.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

quantized_flatten(data=None, min_data=None, max_data=None, name=None, attr=None, out=None, **kwargs)
Parameters
  • data (Symbol) – An ndarray/symbol of type float32

  • min_data (Symbol) – The minimum scalar value possibly produced for the data

  • max_data (Symbol) – The maximum scalar value possibly produced for the data

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

quantized_fully_connected(data=None, weight=None, bias=None, min_data=None, max_data=None, min_weight=None, max_weight=None, min_bias=None, max_bias=None, num_hidden=_Null, no_bias=_Null, flatten=_Null, name=None, attr=None, out=None, **kwargs)

Fully connected operator for input, weight and bias data of type int8, accumulating in int32 for the output. For each argument, two more arguments of type float32 must be provided, representing the thresholds used to quantize that argument from float32 to int8. The final outputs contain the fully connected result in int32, and the min and max thresholds for quantizing the float32 output into int32.
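To illustrate why the accumulator is wider than the inputs, here is a minimal sketch of one output element: an int8 dot product summed in a wide integer, then mapped back to float using the input thresholds. The helper names are hypothetical, and the zero-centered scale MaxAbs(min, max) / 127 for both operands is an assumption of the sketch; the operator's actual output thresholds are not reproduced here.

```python
INT8_MAX = 127


def int8_dot(data_q, weight_q):
    """Dot product of two int8 vectors, accumulated in a wider integer."""
    acc = 0
    for d, w in zip(data_q, weight_q):
        if not (-128 <= d <= 127 and -128 <= w <= 127):
            raise ValueError("operands must fit in int8")
        acc += d * w  # each product can need up to 15 bits plus sign
    if not (-2**31 <= acc < 2**31):
        raise OverflowError("accumulator no longer fits in int32")
    return acc


def accumulator_to_float(acc, data_range, weight_range):
    """Approximate float result, assuming zero-centered scales for both inputs."""
    data_scale = max(abs(data_range[0]), abs(data_range[1])) / INT8_MAX
    weight_scale = max(abs(weight_range[0]), abs(weight_range[1])) / INT8_MAX
    return acc * data_scale * weight_scale


acc = int8_dot([127, 64], [127, -127])
print(acc, accumulator_to_float(acc, (-1.0, 1.0), (-1.0, 1.0)))
```

A single int8 product already exceeds the int8 range, which is why the result is kept in int32 and a separate requantization step is needed to return to int8.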

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /work/mxnet/src/operator/quantization/quantized_fully_connected.cc:L326

Parameters
  • data (Symbol) – Input data.

  • weight (Symbol) – weight.

  • bias (Symbol) – bias.

  • min_data (Symbol) – Minimum value of data.

  • max_data (Symbol) – Maximum value of data.

  • min_weight (Symbol) – Minimum value of weight.

  • max_weight (Symbol) – Maximum value of weight.

  • min_bias (Symbol) – Minimum value of bias.

  • max_bias (Symbol) – Maximum value of bias.

  • num_hidden (int, required) – Number of hidden nodes of the output.

  • no_bias (boolean, optional, default=0) – Whether to disable bias parameter.

  • flatten (boolean, optional, default=1) – Whether to collapse all but the first axis of the input data tensor.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

quantized_npi_add(lhs=None, rhs=None, lhs_min=None, lhs_max=None, rhs_min=None, rhs_max=None, name=None, attr=None, out=None, **kwargs)

elemwise_add operator for inputs dataA and dataB of type int8, accumulating in int32 for the output. For each argument, two more arguments of type float32 must be provided, representing the thresholds used to quantize that argument from float32 to int8. The final outputs contain the result in int32, and the min and max thresholds for quantizing the float32 output into int32.

Note

This operator only supports forward propagation. DO NOT use it in training.

Parameters
  • lhs (Symbol) – first input

  • rhs (Symbol) – second input

  • lhs_min (Symbol) – 3rd input

  • lhs_max (Symbol) – 4th input

  • rhs_min (Symbol) – 5th input

  • rhs_max (Symbol) – 6th input

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

quantized_pooling(data=None, min_data=None, max_data=None, kernel=_Null, pool_type=_Null, global_pool=_Null, cudnn_off=_Null, pooling_convention=_Null, stride=_Null, pad=_Null, p_value=_Null, count_include_pad=_Null, layout=_Null, output_size=_Null, name=None, attr=None, out=None, **kwargs)

Pooling operator for input and output data of type int8. The input and output data come with min and max thresholds for quantizing the float32 data into int8.

Note

This operator only supports forward propagation. DO NOT use it in training. This operator only supports pool_type of avg or max.

Defined in /work/mxnet/src/operator/quantization/quantized_pooling.cc:L184

Parameters
  • data (Symbol) – Input data.

  • min_data (Symbol) – Minimum value of data.

  • max_data (Symbol) – Maximum value of data.

  • kernel (Shape(tuple), optional, default=[]) – Pooling kernel size: (y, x) or (d, y, x)

  • pool_type ({'avg', 'lp', 'max', 'sum'},optional, default='max') – Pooling type to be applied.

  • global_pool (boolean, optional, default=0) – Ignore kernel size, do global pooling based on current input feature map.

  • cudnn_off (boolean, optional, default=0) – Turn off cudnn pooling and use MXNet pooling operator.

  • pooling_convention ({'full', 'same', 'valid'},optional, default='valid') – Pooling convention to be applied.

  • stride (Shape(tuple), optional, default=[]) – Stride: for pooling (y, x) or (d, y, x). Defaults to 1 for each dimension.

  • pad (Shape(tuple), optional, default=[]) – Pad for pooling: (y, x) or (d, y, x). Defaults to no padding.

  • p_value (int or None, optional, default='None') – Value of p for Lp pooling, can be 1 or 2, required for Lp Pooling.

  • count_include_pad (boolean or None, optional, default=None) – Only used for AvgPool; specifies whether to count padding elements in the average calculation. For example, with a 5*5 kernel on a 3*3 corner of an image, the sum of the 9 valid elements will be divided by 25 if this is set to true, or by 9 if this is set to false. Defaults to true.

  • layout ({None, 'NCDHW', 'NCHW', 'NCW', 'NDHWC', 'NHWC', 'NWC'},optional, default='None') – Set layout for input and output. Empty for default layout: NCW for 1d, NCHW for 2d and NCDHW for 3d.

  • output_size (Shape or None, optional, default=None) – Only used for Adaptive Pooling. int (output size) or a tuple of int for output (height, width).

  • name (string, optional.) – Name of the resulting symbol.
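The count_include_pad example in the parameter list (a 5*5 kernel covering a 3*3 corner of an image) can be checked with a few lines of arithmetic. The function below is a hypothetical helper for one pooling window, not the operator itself.

```python
def window_average(valid_values, kernel_area, count_include_pad=True):
    """Average of one pooling window whose padded positions contribute zero."""
    divisor = kernel_area if count_include_pad else len(valid_values)
    return sum(valid_values) / divisor


corner = [1.0] * 9  # the 9 valid elements under a 5*5 kernel at an image corner
print(window_average(corner, kernel_area=25, count_include_pad=True))
print(window_average(corner, kernel_area=25, count_include_pad=False))
```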

Returns

The result symbol.

Return type

Symbol

quantized_reshape(data=None, min_data=None, max_data=None, shape=_Null, reverse=_Null, target_shape=_Null, keep_highest=_Null, name=None, attr=None, out=None, **kwargs)
Parameters
  • data (Symbol) – Array to be reshaped.

  • min_data (Symbol) – The minimum scalar value possibly produced for the data

  • max_data (Symbol) – The maximum scalar value possibly produced for the data

  • shape (Shape(tuple), optional, default=[]) – The target shape

  • reverse (boolean, optional, default=0) – If true then the special values are inferred from right to left

  • target_shape (Shape(tuple), optional, default=[]) – (Deprecated! Use shape instead.) Target new shape. One and only one dim can be 0, in which case it will be inferred from the rest of dims

  • keep_highest (boolean, optional, default=0) – (Deprecated! Use shape instead.) Whether to keep the highest dim unchanged. If set to true, the first dim in target_shape is ignored and is always fixed to that of the input

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

quantized_rnn(data=None, parameters=None, state=None, state_cell=None, data_scale=None, data_shift=None, state_size=_Null, num_layers=_Null, bidirectional=_Null, mode=_Null, p=_Null, state_outputs=_Null, projection_size=_Null, lstm_state_clip_min=_Null, lstm_state_clip_max=_Null, lstm_state_clip_nan=_Null, use_sequence_length=_Null, name=None, attr=None, out=None, **kwargs)

RNN operator for input data of type uint8. The weights of each gate are converted to int8, while the bias is accumulated in float32. The hidden state and cell state are in float32. For the input data, two more arguments of type float32 must be provided, representing the thresholds used to quantize the argument from float32 to uint8. The final outputs contain the recurrent result in float32. Quantization is only supported for the vanilla LSTM network.

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /work/mxnet/src/operator/quantization/quantized_rnn.cc:L301

Parameters
  • data (Symbol) – Input data.

  • parameters (Symbol) – weight.

  • state (Symbol) – initial hidden state of the RNN

  • state_cell (Symbol) – initial cell state for LSTM networks (only for LSTM)

  • data_scale (Symbol) – quantization scale of data.

  • data_shift (Symbol) – quantization shift of data.

  • state_size (int (non-negative), required) – size of the state for each layer

  • num_layers (int (non-negative), required) – number of stacked layers

  • bidirectional (boolean, optional, default=0) – whether to use bidirectional recurrent layers

  • mode ({'gru', 'lstm', 'rnn_relu', 'rnn_tanh'}, required) – the type of RNN to compute

  • p (float, optional, default=0) – drop rate of the dropout on the outputs of each RNN layer, except the last layer.

  • state_outputs (boolean, optional, default=0) – Whether to have the states as symbol outputs.

  • projection_size (int or None, optional, default='None') – size of the projection

  • lstm_state_clip_min (double or None, optional, default=None) – Minimum clip value of LSTM states. This option must be used together with lstm_state_clip_max.

  • lstm_state_clip_max (double or None, optional, default=None) – Maximum clip value of LSTM states. This option must be used together with lstm_state_clip_min.

  • lstm_state_clip_nan (boolean, optional, default=0) – Whether to stop NaN from propagating in state by clipping it to min/max. If clipping range is not specified, this option is ignored.

  • use_sequence_length (boolean, optional, default=0) – If set to true, this layer takes in an extra input parameter sequence_length to specify variable length sequence

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

quantized_transpose(data=None, min_data=None, max_data=None, axes=_Null, name=None, attr=None, out=None, **kwargs)
Parameters
  • data (Symbol) – Array to be transposed.

  • min_data (Symbol) – The minimum scalar value possibly produced for the data

  • max_data (Symbol) – The maximum scalar value possibly produced for the data

  • axes (Shape(tuple), optional, default=[]) – Target axis order. By default the axes will be inverted.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

requantize(data=None, min_range=None, max_range=None, out_type=_Null, min_calib_range=_Null, max_calib_range=_Null, name=None, attr=None, out=None, **kwargs)

Given data that is quantized in int32 and the corresponding thresholds, requantize the data into int8 using min and max thresholds either calculated at runtime or obtained from calibration. It is highly recommended to pre-calculate the min and max thresholds through calibration, since this saves operator runtime and improves inference accuracy.
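A rough sketch of the idea follows. The text does not give the exact arithmetic, so everything here is an assumption: the int32 data is taken to be zero-centered with scale MaxAbs(min_range, max_range) / (2^31 - 1), the int8 step reuses the zero-centered rounding from quantize, and requantize_sketch is a hypothetical name.

```python
INT32_MAX = 2**31 - 1
INT8_MAX = 127


def requantize_sketch(values, min_range, max_range,
                      min_calib_range=None, max_calib_range=None):
    """Map int32 values to int8; returns (int8 values, out_min, out_max)."""
    # Step 1 (assumed): recover approximate float values from the int32 data.
    in_scale = max(abs(min_range), abs(max_range)) / INT32_MAX
    real = [v * in_scale for v in values]

    # Step 2: take the output threshold from calibration, else from the data.
    if min_calib_range is not None and max_calib_range is not None:
        real_range = max(abs(min_calib_range), abs(max_calib_range))
    else:
        real_range = max((abs(r) for r in real), default=0.0)  # runtime scan
    if real_range == 0.0:
        return [0] * len(values), 0.0, 0.0

    # Step 3: zero-centered int8 quantization, clamped to [-127, 127].
    out_scale = INT8_MAX / real_range
    out = []
    for r in real:
        sign = (r > 0) - (r < 0)
        out.append(sign * int(min(abs(r) * out_scale + 0.5, INT8_MAX)))
    return out, -real_range, real_range


print(requantize_sketch([INT32_MAX, 2**29, 0, -INT32_MAX], -4.0, 4.0, -2.0, 2.0))
```

In this sketch the calibrated path skips the scan over the data in step 2, which is the runtime saving the paragraph above refers to; values beyond the calibrated range saturate at ±127.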

Note

This operator only supports forward propagation. DO NOT use it in training.

Defined in /work/mxnet/src/operator/quantization/requantize.cc:L80

Parameters
  • data (Symbol) – An ndarray/symbol of type int32

  • min_range (Symbol) – The original minimum scalar value in the form of float32 used for quantizing data into int32.

  • max_range (Symbol) – The original maximum scalar value in the form of float32 used for quantizing data into int32.

  • out_type ({'auto', 'int8', 'uint8'},optional, default='int8') – Output data type. auto can be specified to automatically determine output type according to min_calib_range.

  • min_calib_range (float or None, optional, default=None) – The minimum scalar value in the form of float32 obtained through calibration. If present, it will be used to requantize the int32 data into int8.

  • max_calib_range (float or None, optional, default=None) – The maximum scalar value in the form of float32 obtained through calibration. If present, it will be used to requantize the int32 data into int8.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

round_ste(data=None, name=None, attr=None, out=None, **kwargs)

Straight-through-estimator of round().

In forward pass, returns element-wise rounded value to the nearest integer of the input (same as round()).

In backward pass, returns gradients of 1 everywhere (instead of 0 everywhere as in round()): \(\frac{d}{dx}{round\_ste(x)} = 1\) vs. \(\frac{d}{dx}{round(x)} = 0\). This is useful for quantized training.

Reference: Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation.

Example::

    x = round_ste([-1.5, 1.5, -1.9, 1.9, 2.7])
    x.backward()
    x = [-2., 2., -2., 2., 3.]
    x.grad() = [1., 1., 1., 1., 1.]
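The same forward and backward behaviour can be mimicked without MXNet. The sketch below uses hypothetical helper names and assumes that halves round away from zero, which is consistent with the example values above but is not stated in the text.

```python
import math


def round_half_away(x):
    """Round to the nearest integer, with halves going away from zero (assumed)."""
    return math.copysign(math.floor(abs(x) + 0.5), x)


def round_ste_forward(xs):
    """Forward pass: ordinary element-wise rounding."""
    return [round_half_away(x) for x in xs]


def round_ste_backward(upstream_grads):
    """Backward pass: the local derivative is treated as 1, so the
    upstream gradient passes through unchanged."""
    return list(upstream_grads)


xs = [-1.5, 1.5, -1.9, 1.9, 2.7]
print(round_ste_forward(xs))
print(round_ste_backward([1.0] * len(xs)))
```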

The storage type of round_ste output depends upon the input storage type:
  • round_ste(default) = default

  • round_ste(row_sparse) = row_sparse

  • round_ste(csr) = csr

Defined in /work/mxnet/src/operator/contrib/stes_op.cc:L54

Parameters
  • data (Symbol) – The input array.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

sign_ste(data=None, name=None, attr=None, out=None, **kwargs)

Straight-through-estimator of sign().

In forward pass, returns element-wise sign of the input (same as sign()).

In backward pass, returns gradients of 1 everywhere (instead of 0 everywhere as in sign()): \(\frac{d}{dx}{sign\_ste(x)} = 1\) vs. \(\frac{d}{dx}{sign(x)} = 0\). This is useful for quantized training.

Reference: Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation.

Example::

    x = sign_ste([-2, 0, 3])
    x.backward()
    x = [-1., 0., 1.]
    x.grad() = [1., 1., 1.]
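To see why the straight-through gradient matters for quantized training, consider a dot product with binarized weights. The sketch below uses hypothetical helper names and hand-applied chain rule; it compares the weight gradient under the straight-through rule with the gradient of plain sign(), which is zero everywhere.

```python
def sign(x):
    """Element-wise sign: -1.0, 0.0 or 1.0."""
    return float((x > 0) - (x < 0))


def binarized_dot(weights, inputs):
    """Forward pass: y = sum_i sign(w_i) * x_i."""
    return sum(sign(w) * x for w, x in zip(weights, inputs))


def grad_wrt_weights(inputs, straight_through=True):
    """dy/dw_i = x_i * d(sign)/dw_i, where the last factor is 1 under the
    straight-through estimator and 0 for plain sign()."""
    local = 1.0 if straight_through else 0.0
    return [local * x for x in inputs]


weights, inputs = [-2.0, 0.0, 3.0], [1.0, 5.0, 2.0]
print(binarized_dot(weights, inputs))
print(grad_wrt_weights(inputs, straight_through=True))
print(grad_wrt_weights(inputs, straight_through=False))
```

With plain sign() every weight gradient is zero and the weights would never update; the straight-through rule keeps a usable training signal.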

The storage type of sign_ste output depends upon the input storage type:
  • sign_ste(default) = default

  • sign_ste(row_sparse) = row_sparse

  • sign_ste(csr) = csr

Defined in /work/mxnet/src/operator/contrib/stes_op.cc:L80

Parameters
  • data (Symbol) – The input array.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

sldwin_atten_context(score=None, value=None, dilation=None, w=_Null, symmetric=_Null, name=None, attr=None, out=None, **kwargs)

Compute the context vector for sliding window attention, used in Longformer (https://arxiv.org/pdf/2004.05150.pdf).

In this attention pattern, given a fixed window size 2w, each token attends to w tokens on the left side if we use causal attention (setting symmetric to False), otherwise each token attends to w tokens on each side.

The shapes of the inputs are:

  • score : (batch_size, seq_length, num_heads, w + w + 1) if symmetric is True, (batch_size, seq_length, num_heads, w + 1) otherwise

  • value : (batch_size, seq_length, num_heads, num_head_units)

  • dilation : (num_heads,)

The shape of the output is:

  • context_vec : (batch_size, seq_length, num_heads, num_head_units)
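For intuition, the context computation can be sketched for a single batch element and a single head in plain Python. The helper name is hypothetical, and the window layout is an assumption of this sketch: column c of score is taken to refer to position i + (c - w) * dilation, and positions outside the sequence are skipped.

```python
def sldwin_context_single_head(score, value, w, dilation=1, symmetric=True):
    """score: seq_length rows of window scores; value: seq_length vectors.
    Returns one context vector per token."""
    seq_length, units = len(value), len(value[0])
    offsets = list(range(-w, w + 1)) if symmetric else list(range(-w, 1))
    context = []
    for i in range(seq_length):
        vec = [0.0] * units
        for col, off in enumerate(offsets):
            j = i + off * dilation  # assumed mapping from window column to position
            if 0 <= j < seq_length:
                for u in range(units):
                    vec[u] += score[i][col] * value[j][u]
        context.append(vec)
    return context


value = [[1.0], [2.0], [3.0]]
score = [[1.0, 1.0, 1.0] for _ in value]  # w = 1, symmetric: 2w + 1 columns
print(sldwin_context_single_head(score, value, w=1))
```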

Defined in /work/mxnet/src/operator/contrib/transformer.cc:L1045

Parameters
  • score (Symbol) – score

  • value (Symbol) – value

  • dilation (Symbol) – dilation

  • w (int, required) – The one-sided window length

  • symmetric (boolean, required) – If false, each token will only attend to itself and the previous tokens.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

sldwin_atten_mask_like(score=None, dilation=None, valid_length=None, w=_Null, symmetric=_Null, name=None, attr=None, out=None, **kwargs)

Compute the mask for the sliding window attention score, used in Longformer (https://arxiv.org/pdf/2004.05150.pdf).

In this attention pattern, given a fixed window size 2w, each token attends to w tokens on the left side if we use causal attention (setting symmetric to False), otherwise each token attends to w tokens on each side.

The shapes of the inputs are:

  • score : (batch_size, seq_length, num_heads, w + w + 1) if symmetric is True, (batch_size, seq_length, num_heads, w + 1) otherwise.

  • dilation : (num_heads,)

  • valid_length : (batch_size,)

The shape of the output is:

  • mask : same as the shape of score
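A single-head sketch of such a mask is shown below. It is an illustration under assumptions rather than the operator's definition: the helper name is hypothetical, column c is taken to refer to position i + (c - w) * dilation, and an entry is set to 1 only when both the attending token and the attended position fall inside valid_length.

```python
def sldwin_mask_single_head(seq_length, valid_length, w, dilation=1, symmetric=True):
    """Return a seq_length x window mask of 0/1 entries."""
    offsets = list(range(-w, w + 1)) if symmetric else list(range(-w, 1))
    mask = []
    for i in range(seq_length):
        row = []
        for off in offsets:
            j = i + off * dilation  # assumed mapping from window column to position
            row.append(1 if i < valid_length and 0 <= j < valid_length else 0)
        mask.append(row)
    return mask


for row in sldwin_mask_single_head(seq_length=4, valid_length=3, w=1):
    print(row)
```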

Defined in /work/mxnet/src/operator/contrib/transformer.cc:L909

Parameters
  • score (Symbol) – sliding window attention score

  • dilation (Symbol) – dilation

  • valid_length (Symbol) – valid length

  • w (int, required) – The one-sided window length

  • symmetric (boolean, required) – If false, each token will only attend to itself and the previous tokens.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol

sldwin_atten_score(query=None, key=None, dilation=None, w=_Null, symmetric=_Null, name=None, attr=None, out=None, **kwargs)

Compute the sliding window attention score, which is used in Longformer (https://arxiv.org/pdf/2004.05150.pdf). In this attention pattern, given a fixed window size 2w, each token attends to w tokens on the left side if we use causal attention (setting symmetric to False), otherwise each token attends to w tokens on each side.

The shapes of the inputs are:

  • query : (batch_size, seq_length, num_heads, num_head_units)

  • key : (batch_size, seq_length, num_heads, num_head_units)

  • dilation : (num_heads,)

The shape of the output is:

  • score : (batch_size, seq_length, num_heads, w + w + 1) if symmetric is True, (batch_size, seq_length, num_heads, w + 1) otherwise.
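The score layout can be illustrated for one batch element and one head in plain Python. This is a sketch under assumptions, not the operator's code: the helper name is hypothetical, column c is taken to hold the dot product with the key at position i + (c - w) * dilation, and out-of-range positions are filled with 0.0 on the expectation that they are masked afterwards.

```python
def sldwin_score_single_head(query, key, w, dilation=1, symmetric=True):
    """query, key: seq_length vectors of num_head_units floats.
    Returns seq_length rows of 2w + 1 (symmetric) or w + 1 (causal) scores."""
    seq_length = len(query)
    offsets = list(range(-w, w + 1)) if symmetric else list(range(-w, 1))
    score = []
    for i in range(seq_length):
        row = []
        for off in offsets:
            j = i + off * dilation  # assumed mapping from window column to position
            if 0 <= j < seq_length:
                row.append(sum(q * k for q, k in zip(query[i], key[j])))
            else:
                row.append(0.0)  # placeholder for positions outside the sequence
        score.append(row)
    return score


q = k = [[1.0], [2.0], [3.0]]
print(sldwin_score_single_head(q, k, w=1, symmetric=True))
print(sldwin_score_single_head(q, k, w=1, symmetric=False))
```

Each row has a fixed width that depends only on w, so memory grows linearly with seq_length instead of quadratically as in full attention.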

Defined in /work/mxnet/src/operator/contrib/transformer.cc:L969

Parameters
  • query (Symbol) – query

  • key (Symbol) – key

  • dilation (Symbol) – dilation

  • w (int, required) – The one-sided window length

  • symmetric (boolean, required) – If false, each token will only attend to itself and the previous tokens.

  • name (string, optional.) – Name of the resulting symbol.

Returns

The result symbol.

Return type

Symbol