gluon.nn¶
Gluon provides a large number of build-in neural network layers in the following two modules:
Neural network layers. |
We group all layers in these two modules according to their categories.
Sequential Containers¶
Stacks Blocks sequentially. |
|
Stacks HybridBlocks sequentially. |
Concatenation Containers¶
Lays Block s concurrently. |
|
Lays HybridBlock s concurrently. |
Basic Layers¶
Just your regular densely-connected NN layer. |
|
Applies an activation function to input. |
|
Applies Dropout to the input. |
|
Flattens the input to two dimensional. |
|
Wraps an operator or an expression as a Block object. |
|
Wraps an operator or an expression as a HybridBlock object. |
|
Block that passes through the input directly. |
Convolutional Layers¶
1D convolution layer (e.g. |
|
2D convolution layer (e.g. |
|
3D convolution layer (e.g. |
|
Transposed 1D convolution layer (sometimes called Deconvolution). |
|
Transposed 2D convolution layer (sometimes called Deconvolution). |
|
Transposed 3D convolution layer (sometimes called Deconvolution). |
|
2-D Deformable Convolution v_1 (Dai, 2017). |
|
2-D Deformable Convolution v2 (Dai, 2018). |
Pixel Shuffle Layers¶
Pixel-shuffle layer for upsampling in 1 dimension. |
|
Pixel-shuffle layer for upsampling in 2 dimensions. |
|
Pixel-shuffle layer for upsampling in 3 dimensions. |
Pooling Layers¶
Max pooling operation for one dimensional data. |
|
Max pooling operation for two dimensional (spatial) data. |
|
Max pooling operation for 3D data (spatial or spatio-temporal). |
|
Average pooling operation for temporal data. |
|
Average pooling operation for spatial data. |
|
Average pooling operation for 3D data (spatial or spatio-temporal). |
|
Gloabl max pooling operation for one dimensional (temporal) data. |
|
Global max pooling operation for two dimensional (spatial) data. |
|
Global max pooling operation for 3D data (spatial or spatio-temporal). |
|
Global average pooling operation for temporal data. |
|
Global average pooling operation for spatial data. |
|
Global average pooling operation for 3D data (spatial or spatio-temporal). |
|
Pads the input tensor using the reflection of the input boundary. |
Normalization Layers¶
Batch normalization layer (Ioffe and Szegedy, 2014). |
|
Applies instance normalization to the n-dimensional input array. |
|
Applies layer normalization to the n-dimensional input array. |
|
Cross-GPU Synchronized Batch normalization (SyncBN) |
Embedding Layers¶
Turns non-negative integers (indexes/tokens) into dense vectors of fixed size. |
Advanced Activation Layers¶
Leaky version of a Rectified Linear Unit. |
|
Parametric leaky version of a Rectified Linear Unit. |
|
Exponential Linear Unit (ELU) |
|
Scaled Exponential Linear Unit (SELU) |
|
Swish Activation function (SiLU with a hyperparameter) |
|
Sigmoid Linear Units |
|
Gaussian Exponential Linear Unit (GELU) |
API Reference¶
Neural network layers.
Classes
|
Applies an activation function to input. |
|
Average pooling operation for temporal data. |
|
Average pooling operation for spatial data. |
|
Average pooling operation for 3D data (spatial or spatio-temporal). |
|
Batch normalization layer (Ioffe and Szegedy, 2014). |
|
Base class for all neural network layers and models. |
|
Lays Block s concurrently. |
|
1D convolution layer (e.g. temporal convolution). |
|
Transposed 1D convolution layer (sometimes called Deconvolution). |
|
2D convolution layer (e.g. spatial convolution over images). |
|
Transposed 2D convolution layer (sometimes called Deconvolution). |
|
3D convolution layer (e.g. spatial convolution over volumes). |
|
Transposed 3D convolution layer (sometimes called Deconvolution). |
|
2-D Deformable Convolution v_1 (Dai, 2017). |
|
Just your regular densely-connected NN layer. |
|
Applies Dropout to the input. |
|
Exponential Linear Unit (ELU) |
|
Turns non-negative integers (indexes/tokens) into dense vectors of fixed size. |
|
Flattens the input to two dimensional. |
|
Gaussian Exponential Linear Unit (GELU) |
|
Global average pooling operation for temporal data. |
|
Global average pooling operation for spatial data. |
|
Global average pooling operation for 3D data (spatial or spatio-temporal). |
|
Gloabl max pooling operation for one dimensional (temporal) data. |
|
Global max pooling operation for two dimensional (spatial) data. |
|
Global max pooling operation for 3D data (spatial or spatio-temporal). |
|
Applies group normalization to the n-dimensional input array. |
HybridBlock supports forwarding with both Symbol and NDArray. |
|
|
Lays HybridBlock s concurrently. |
|
Wraps an operator or an expression as a HybridBlock object. |
Stacks HybridBlocks sequentially. |
|
|
Block that passes through the input directly. |
|
Applies instance normalization to the n-dimensional input array. |
|
Wraps an operator or an expression as a Block object. |
|
Applies layer normalization to the n-dimensional input array. |
|
Leaky version of a Rectified Linear Unit. |
|
Max pooling operation for one dimensional data. |
|
Max pooling operation for two dimensional (spatial) data. |
|
Max pooling operation for 3D data (spatial or spatio-temporal). |
|
2-D Deformable Convolution v2 (Dai, 2018). |
|
Parametric leaky version of a Rectified Linear Unit. |
|
Pixel-shuffle layer for upsampling in 1 dimension. |
|
Pixel-shuffle layer for upsampling in 2 dimensions. |
|
Pixel-shuffle layer for upsampling in 3 dimensions. |
|
Pads the input tensor using the reflection of the input boundary. |
|
Scaled Exponential Linear Unit (SELU) |
Stacks Blocks sequentially. |
|
|
Sigmoid Linear Units |
|
Swish Activation function (SiLU with a hyperparameter) |
|
Construct block from symbol. |
|
Cross-GPU Synchronized Batch normalization (SyncBN) |
-
class
Activation
(activation, **kwargs)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
Applies an activation function to input.
- Parameters
activation (str) – Name of activation function to use. See
Activation()
for available choices.
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(*args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).- Inputs:
data: input tensor with arbitrary shape.
- Outputs:
out: output tensor with the same shape as data.
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
AvgPool1D
(pool_size=2, strides=None, padding=0, layout='NCW', ceil_mode=False, count_include_pad=True, **kwargs)[source]¶ Bases:
mxnet.gluon.nn.conv_layers._Pooling
Average pooling operation for temporal data.
- Parameters
pool_size (int) – Size of the average pooling windows.
strides (int, or None) – Factor by which to downscale. E.g. 2 will halve the input size. If None, it will default to pool_size.
padding (int) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
layout (str, default 'NCW') – Dimension ordering of data and out (‘NCW’ or ‘NWC’). ‘N’, ‘C’, ‘W’ stands for batch, channel, and width (time) dimensions respectively. padding is applied on ‘W’ dimension.
ceil_mode (bool, default False) – When True, will use ceil instead of floor to compute the output shape.
count_include_pad (bool, default True) – When ‘False’, will exclude padding elements when computing the average value.
- Inputs:
data: 3D input tensor with shape (batch_size, in_channels, width) when layout is NCW. For other layouts shape is permuted accordingly.
- Outputs:
out: 3D output tensor with shape (batch_size, channels, out_width) when layout is NCW. out_width is calculated as:
out_width = floor((width+2*padding-pool_size)/strides)+1
When ceil_mode is True, ceil will be used instead of floor in this equation.
-
class
AvgPool2D
(pool_size=(2, 2), strides=None, padding=0, ceil_mode=False, layout='NCHW', count_include_pad=True, **kwargs)[source]¶ Bases:
mxnet.gluon.nn.conv_layers._Pooling
Average pooling operation for spatial data.
- Parameters
pool_size (int or list/tuple of 2 ints,) – Size of the average pooling windows.
strides (int, list/tuple of 2 ints, or None.) – Factor by which to downscale. E.g. 2 will halve the input size. If None, it will default to pool_size.
padding (int or list/tuple of 2 ints,) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
layout (str, default 'NCHW') – Dimension ordering of data and out (‘NCHW’ or ‘NHWC’). ‘N’, ‘C’, ‘H’, ‘W’ stands for batch, channel, height, and width dimensions respectively. padding is applied on ‘H’ and ‘W’ dimension.
ceil_mode (bool, default False) – When True, will use ceil instead of floor to compute the output shape.
count_include_pad (bool, default True) – When ‘False’, will exclude padding elements when computing the average value.
- Inputs:
data: 4D input tensor with shape (batch_size, in_channels, height, width) when layout is NCHW. For other layouts shape is permuted accordingly.
- Outputs:
out: 4D output tensor with shape (batch_size, channels, out_height, out_width) when layout is NCHW. out_height and out_width are calculated as:
out_height = floor((height+2*padding[0]-pool_size[0])/strides[0])+1 out_width = floor((width+2*padding[1]-pool_size[1])/strides[1])+1
When ceil_mode is True, ceil will be used instead of floor in this equation.
-
class
AvgPool3D
(pool_size=(2, 2, 2), strides=None, padding=0, ceil_mode=False, layout='NCDHW', count_include_pad=True, **kwargs)[source]¶ Bases:
mxnet.gluon.nn.conv_layers._Pooling
Average pooling operation for 3D data (spatial or spatio-temporal).
- Parameters
pool_size (int or list/tuple of 3 ints,) – Size of the average pooling windows.
strides (int, list/tuple of 3 ints, or None.) – Factor by which to downscale. E.g. 2 will halve the input size. If None, it will default to pool_size.
padding (int or list/tuple of 3 ints,) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
layout (str, default 'NCDHW') – Dimension ordering of data and out (‘NCDHW’ or ‘NDHWC’). ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stands for batch, channel, height, width and depth dimensions respectively. padding is applied on ‘D’, ‘H’ and ‘W’ dimension.
ceil_mode (bool, default False) – When True, will use ceil instead of floor to compute the output shape.
count_include_pad (bool, default True) – When ‘False’, will exclude padding elements when computing the average value.
- Inputs:
data: 5D input tensor with shape (batch_size, in_channels, depth, height, width) when layout is NCDHW. For other layouts shape is permuted accordingly.
- Outputs:
out: 5D output tensor with shape (batch_size, channels, out_depth, out_height, out_width) when layout is NCDHW. out_depth, out_height and out_width are calculated as:
out_depth = floor((depth+2*padding[0]-pool_size[0])/strides[0])+1 out_height = floor((height+2*padding[1]-pool_size[1])/strides[1])+1 out_width = floor((width+2*padding[2]-pool_size[2])/strides[2])+1
When ceil_mode is True, ceil will be used instead of floor in this equation.
-
class
BatchNorm
(axis=1, momentum=0.9, epsilon=1e-05, center=True, scale=True, use_global_stats=False, beta_initializer='zeros', gamma_initializer='ones', running_mean_initializer='zeros', running_variance_initializer='ones', in_channels=0, **kwargs)[source]¶ Bases:
mxnet.gluon.nn.basic_layers._BatchNorm
Batch normalization layer (Ioffe and Szegedy, 2014). Normalizes the input at each batch, i.e. applies a transformation that maintains the mean activation close to 0 and the activation standard deviation close to 1.
- Parameters
axis (int, default 1) – The axis that should be normalized. This is typically the channels (C) axis. For instance, after a Conv2D layer with layout=’NCHW’, set axis=1 in BatchNorm. If layout=’NHWC’, then set axis=3.
momentum (float, default 0.9) – Momentum for the moving average.
epsilon (float, default 1e-5) – Small float added to variance to avoid dividing by zero.
center (bool, default True) – If True, add offset of beta to normalized tensor. If False, beta is ignored.
scale (bool, default True) – If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (also e.g. nn.relu), this can be disabled since the scaling will be done by the next layer.
use_global_stats (bool, default False) – If True, use global moving statistics instead of local batch-norm. This will force change batch-norm into a scale shift operator. If False, use local batch-norm.
beta_initializer (str or Initializer, default ‘zeros’) – Initializer for the beta weight.
gamma_initializer (str or Initializer, default ‘ones’) – Initializer for the gamma weight.
running_mean_initializer (str or Initializer, default ‘zeros’) – Initializer for the running mean.
running_variance_initializer (str or Initializer, default ‘ones’) – Initializer for the running variance.
in_channels (int, default 0) – Number of channels (feature maps) in input data. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
- Inputs:
data: input tensor with arbitrary shape.
- Outputs:
out: output tensor with the same shape as data.
-
class
Block
[source]¶ Bases:
object
Base class for all neural network layers and models. Your models should subclass this class.
Block
can be nested recursively in a tree structure. You can create and assign childBlock
as regular attributes:import mxnet as mx from mxnet.gluon import Block, nn class Model(Block): def __init__(self, **kwargs): super(Model, self).__init__(**kwargs) self.dense0 = nn.Dense(20) self.dense1 = nn.Dense(20) def forward(self, x): x = mx.npx.relu(self.dense0(x)) return mx.npx.relu(self.dense1(x)) model = Model() model.initialize(device=mx.cpu(0)) model(mx.np.zeros((10, 10), device=mx.cpu(0)))
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.cast
(dtype)Cast this Block to use another data type.
collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.forward
(*args)Overrides to implement forward computation using
NDArray
.hybridize
([active])Please refer description of HybridBlock hybridize().
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
register_child
(block[, name])Registers block as a child of self.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install callback monitor.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).Child
Block
assigned this way will be registered andcollect_params()
will collect their Parameters recursively. You can also manually register child blocks withregister_child()
.-
apply
(fn)[source]¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
cast
(dtype)[source]¶ Cast this Block to use another data type.
- Parameters
dtype (str or numpy.dtype) – The new data type.
-
collect_params
(select=None)[source]¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
forward
(*args)[source]¶ Overrides to implement forward computation using
NDArray
. Only accepts positional arguments.- Parameters
*args (list of NDArray) – Input tensors.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)[source]¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)[source]¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')[source]¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')[source]¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_child
(block, name=None)[source]¶ Registers block as a child of self.
Block
s assigned to self as attributes will be registered automatically.
-
register_forward_hook
(hook)[source]¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)[source]¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)[source]¶ Install callback monitor.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_device
(device)[source]¶ Re-assign all Parameters to other devices.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)[source]¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)[source]¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)[source]¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)[source]¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
-
class
Concatenate
(axis=-1)[source]¶ Bases:
mxnet.gluon.nn.basic_layers.Sequential
Lays Block s concurrently.
This block feeds its input to all children blocks, and produce the output by concatenating all the children blocks’ outputs on the specified axis.
Example:
net = Concatenate() net.add(nn.Dense(10, activation='relu')) net.add(nn.Dense(20)) net.add(Identity())
Methods
add
(*blocks)Adds block on top of the stack.
apply
(fn)Applies
fn
recursively to every child block as well as self.cast
(dtype)Cast this Block to use another data type.
collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.forward
(x)Overrides to implement forward computation using
NDArray
.hybridize
([active])Activates or deactivates HybridBlock s recursively.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
register_child
(block[, name])Registers block as a child of self.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install callback monitor.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).- Parameters
axis (int, default -1) – The axis on which to concatenate the outputs.
-
add
(*blocks)¶ Adds block on top of the stack.
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
cast
(dtype)¶ Cast this Block to use another data type.
- Parameters
dtype (str or numpy.dtype) – The new data type.
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
forward
(x)[source]¶ Overrides to implement forward computation using
NDArray
. Only accepts positional arguments.- Parameters
*args (list of NDArray) – Input tensors.
-
hybridize
(active=True, **kwargs)¶ Activates or deactivates HybridBlock s recursively. Has no effect on non-hybrid children.
- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
**kwargs (string) – Additional flags for hybridized operator.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_child
(block, name=None)¶ Registers block as a child of self.
Block
s assigned to self as attributes will be registered automatically.
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install callback monitor.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
Block.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
Conv1D
(channels, kernel_size, strides=1, padding=0, dilation=1, groups=1, layout='NCW', activation=None, use_bias=True, weight_initializer=None, bias_initializer='zeros', in_channels=0, **kwargs)[source]¶ Bases:
mxnet.gluon.nn.conv_layers._Conv
1D convolution layer (e.g. temporal convolution).
This layer creates a convolution kernel that is convolved with the layer input over a single spatial (or temporal) dimension to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well.
If in_channels is not specified, Parameter initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
- Parameters
channels (int) – The dimensionality of the output space, i.e. the number of output channels (filters) in the convolution.
kernel_size (int or tuple/list of 1 int) – Specifies the dimensions of the convolution window.
strides (int or tuple/list of 1 int,) – Specify the strides of the convolution.
padding (int or a tuple/list of 1 int,) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points
dilation (int or tuple/list of 1 int) – Specifies the dilation rate to use for dilated convolution.
groups (int) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
layout (str, default 'NCW') – Dimension ordering of data and weight. Only supports ‘NCW’ layout for now. ‘N’, ‘C’, ‘W’ stands for batch, channel, and width (time) dimensions respectively. Convolution is applied on the ‘W’ dimension.
in_channels (int, default 0) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
activation (str) – Activation function to use. See
activation()
. If you don’t specify anything, no activation is applied (ie. “linear” activation: a(x) = x).use_bias (bool) – Whether the layer uses a bias vector.
weight_initializer (str or Initializer) – Initializer for the weight weights matrix.
bias_initializer (str or Initializer) – Initializer for the bias vector.
- Inputs:
data: 3D input tensor with shape (batch_size, in_channels, width) when layout is NCW. For other layouts shape is permuted accordingly.
- Outputs:
out: 3D output tensor with shape (batch_size, channels, out_width) when layout is NCW. out_width is calculated as:
out_width = floor((width+2*padding-dilation*(kernel_size-1)-1)/stride)+1
-
class
Conv1DTranspose
(channels, kernel_size, strides=1, padding=0, output_padding=0, dilation=1, groups=1, layout='NCW', activation=None, use_bias=True, weight_initializer=None, bias_initializer='zeros', in_channels=0, **kwargs)[source]¶ Bases:
mxnet.gluon.nn.conv_layers._Conv
Transposed 1D convolution layer (sometimes called Deconvolution).
The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution.
If in_channels is not specified, Parameter initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
- Parameters
channels (int) – The dimensionality of the output space, i.e. the number of output channels (filters) in the convolution.
kernel_size (int or tuple/list of 1 int) – Specifies the dimensions of the convolution window.
strides (int or tuple/list of 1 int) – Specify the strides of the convolution.
padding (int or a tuple/list of 1 int,) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points
output_padding (int or a tuple/list of 1 int) – Controls the amount of implicit zero-paddings on both sides of the output for output_padding number of points for each dimension.
dilation (int or tuple/list of 1 int) – Controls the spacing between the kernel points; also known as the a trous algorithm
groups (int) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
layout (str, default 'NCW') – Dimension ordering of data and weight. Only supports ‘NCW’ layout for now. ‘N’, ‘C’, ‘W’ stands for batch, channel, and width (time) dimensions respectively. Convolution is applied on the ‘W’ dimension.
in_channels (int, default 0) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
activation (str) – Activation function to use. See
activation()
. If you don’t specify anything, no activation is applied (ie. “linear” activation: a(x) = x).use_bias (bool) – Whether the layer uses a bias vector.
weight_initializer (str or Initializer) – Initializer for the weight weights matrix.
bias_initializer (str or Initializer) – Initializer for the bias vector.
- Inputs:
data: 3D input tensor with shape (batch_size, in_channels, width) when layout is NCW. For other layouts shape is permuted accordingly.
- Outputs:
out: 3D output tensor with shape (batch_size, channels, out_width) when layout is NCW. out_width is calculated as:
out_width = (width-1)*strides-2*padding+kernel_size+output_padding
-
class
Conv2D
(channels, kernel_size, strides=(1, 1), padding=(0, 0), dilation=(1, 1), groups=1, layout='NCHW', activation=None, use_bias=True, weight_initializer=None, bias_initializer='zeros', in_channels=0, **kwargs)[source]¶ Bases:
mxnet.gluon.nn.conv_layers._Conv
2D convolution layer (e.g. spatial convolution over images).
This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well.
If in_channels is not specified, Parameter initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
- Parameters
channels (int) – The dimensionality of the output space, i.e. the number of output channels (filters) in the convolution.
kernel_size (int or tuple/list of 2 int) – Specifies the dimensions of the convolution window.
strides (int or tuple/list of 2 int,) – Specify the strides of the convolution.
padding (int or a tuple/list of 2 int,) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points
dilation (int or tuple/list of 2 int) – Specifies the dilation rate to use for dilated convolution.
groups (int) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
layout (str, default 'NCHW') – Dimension ordering of data and weight. Only supports ‘NCHW’ and ‘NHWC’ layout for now. ‘N’, ‘C’, ‘H’, ‘W’ stands for batch, channel, height, and width dimensions respectively. Convolution is applied on the ‘H’ and ‘W’ dimensions.
in_channels (int, default 0) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
activation (str) – Activation function to use. See
activation()
. If you don’t specify anything, no activation is applied (ie. “linear” activation: a(x) = x).use_bias (bool) – Whether the layer uses a bias vector.
weight_initializer (str or Initializer) – Initializer for the weight weights matrix.
bias_initializer (str or Initializer) – Initializer for the bias vector.
- Inputs:
data: 4D input tensor with shape (batch_size, in_channels, height, width) when layout is NCHW. For other layouts shape is permuted accordingly.
- Outputs:
out: 4D output tensor with shape (batch_size, channels, out_height, out_width) when layout is NCHW. out_height and out_width are calculated as:
out_height = floor((height+2*padding[0]-dilation[0]*(kernel_size[0]-1)-1)/stride[0])+1 out_width = floor((width+2*padding[1]-dilation[1]*(kernel_size[1]-1)-1)/stride[1])+1
-
class
Conv2DTranspose
(channels, kernel_size, strides=(1, 1), padding=(0, 0), output_padding=(0, 0), dilation=(1, 1), groups=1, layout='NCHW', activation=None, use_bias=True, weight_initializer=None, bias_initializer='zeros', in_channels=0, **kwargs)[source]¶ Bases:
mxnet.gluon.nn.conv_layers._Conv
Transposed 2D convolution layer (sometimes called Deconvolution).
The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution.
If in_channels is not specified, Parameter initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
- Parameters
channels (int) – The dimensionality of the output space, i.e. the number of output channels (filters) in the convolution.
kernel_size (int or tuple/list of 2 int) – Specifies the dimensions of the convolution window.
strides (int or tuple/list of 2 int) – Specify the strides of the convolution.
padding (int or a tuple/list of 2 int,) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points
output_padding (int or a tuple/list of 2 int) – Controls the amount of implicit zero-paddings on both sides of the output for output_padding number of points for each dimension.
dilation (int or tuple/list of 2 int) – Controls the spacing between the kernel points; also known as the a trous algorithm
groups (int) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
layout (str, default 'NCHW') – Dimension ordering of data and weight. Only supports ‘NCHW’ and ‘NHWC’ layout for now. ‘N’, ‘C’, ‘H’, ‘W’ stands for batch, channel, height, and width dimensions respectively. Convolution is applied on the ‘H’ and ‘W’ dimensions.
in_channels (int, default 0) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
activation (str) – Activation function to use. See
activation()
. If you don’t specify anything, no activation is applied (ie. “linear” activation: a(x) = x).use_bias (bool) – Whether the layer uses a bias vector.
weight_initializer (str or Initializer) – Initializer for the weight weights matrix.
bias_initializer (str or Initializer) – Initializer for the bias vector.
- Inputs:
data: 4D input tensor with shape (batch_size, in_channels, height, width) when layout is NCHW. For other layouts shape is permuted accordingly.
- Outputs:
out: 4D output tensor with shape (batch_size, channels, out_height, out_width) when layout is NCHW. out_height and out_width are calculated as:
out_height = (height-1)*strides[0]-2*padding[0]+kernel_size[0]+output_padding[0] out_width = (width-1)*strides[1]-2*padding[1]+kernel_size[1]+output_padding[1]
-
class
Conv3D
(channels, kernel_size, strides=(1, 1, 1), padding=(0, 0, 0), dilation=(1, 1, 1), groups=1, layout='NCDHW', activation=None, use_bias=True, weight_initializer=None, bias_initializer='zeros', in_channels=0, **kwargs)[source]¶ Bases:
mxnet.gluon.nn.conv_layers._Conv
3D convolution layer (e.g. spatial convolution over volumes).
This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well.
If in_channels is not specified, Parameter initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
- Parameters
channels (int) – The dimensionality of the output space, i.e. the number of output channels (filters) in the convolution.
kernel_size (int or tuple/list of 3 int) – Specifies the dimensions of the convolution window.
strides (int or tuple/list of 3 int,) – Specify the strides of the convolution.
padding (int or a tuple/list of 3 int,) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points
dilation (int or tuple/list of 3 int) – Specifies the dilation rate to use for dilated convolution.
groups (int) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
layout (str, default 'NCDHW') – Dimension ordering of data and weight. Only supports ‘NCDHW’ and ‘NDHWC’ layout for now. ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stands for batch, channel, height, width and depth dimensions respectively. Convolution is applied on the ‘D’, ‘H’ and ‘W’ dimensions.
in_channels (int, default 0) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
activation (str) – Activation function to use. See
activation()
. If you don’t specify anything, no activation is applied (ie. “linear” activation: a(x) = x).use_bias (bool) – Whether the layer uses a bias vector.
weight_initializer (str or Initializer) – Initializer for the weight weights matrix.
bias_initializer (str or Initializer) – Initializer for the bias vector.
- Inputs:
data: 5D input tensor with shape (batch_size, in_channels, depth, height, width) when layout is NCDHW. For other layouts shape is permuted accordingly.
- Outputs:
out: 5D output tensor with shape (batch_size, channels, out_depth, out_height, out_width) when layout is NCDHW. out_depth, out_height and out_width are calculated as:
out_depth = floor((depth+2*padding[0]-dilation[0]*(kernel_size[0]-1)-1)/stride[0])+1 out_height = floor((height+2*padding[1]-dilation[1]*(kernel_size[1]-1)-1)/stride[1])+1 out_width = floor((width+2*padding[2]-dilation[2]*(kernel_size[2]-1)-1)/stride[2])+1
-
class
Conv3DTranspose
(channels, kernel_size, strides=(1, 1, 1), padding=(0, 0, 0), output_padding=(0, 0, 0), dilation=(1, 1, 1), groups=1, layout='NCDHW', activation=None, use_bias=True, weight_initializer=None, bias_initializer='zeros', in_channels=0, **kwargs)[source]¶ Bases:
mxnet.gluon.nn.conv_layers._Conv
Transposed 3D convolution layer (sometimes called Deconvolution).
The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution.
If in_channels is not specified, Parameter initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
- Parameters
channels (int) – The dimensionality of the output space, i.e. the number of output channels (filters) in the convolution.
kernel_size (int or tuple/list of 3 int) – Specifies the dimensions of the convolution window.
strides (int or tuple/list of 3 int) – Specify the strides of the convolution.
padding (int or a tuple/list of 3 int,) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points
output_padding (int or a tuple/list of 3 int) – Controls the amount of implicit zero-paddings on both sides of the output for output_padding number of points for each dimension.
dilation (int or tuple/list of 3 int) – Controls the spacing between the kernel points; also known as the a trous algorithm.
groups (int) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
layout (str, default 'NCDHW') – Dimension ordering of data and weight. Only supports ‘NCDHW’ and ‘NDHWC’ layout for now. ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stands for batch, channel, height, width and depth dimensions respectively. Convolution is applied on the ‘D’, ‘H’ and ‘W’ dimensions.
in_channels (int, default 0) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
activation (str) – Activation function to use. See
activation()
. If you don’t specify anything, no activation is applied (ie. “linear” activation: a(x) = x).use_bias (bool) – Whether the layer uses a bias vector.
weight_initializer (str or Initializer) – Initializer for the weight weights matrix.
bias_initializer (str or Initializer) – Initializer for the bias vector.
- Inputs:
data: 5D input tensor with shape (batch_size, in_channels, depth, height, width) when layout is NCDHW. For other layouts shape is permuted accordingly.
- Outputs:
out: 5D output tensor with shape (batch_size, channels, out_depth, out_height, out_width) when layout is NCDHW. out_depth, out_height and out_width are calculated as:
out_depth = (depth-1)*strides[0]-2*padding[0]+kernel_size[0]+output_padding[0] out_height = (height-1)*strides[1]-2*padding[1]+kernel_size[1]+output_padding[1] out_width = (width-1)*strides[2]-2*padding[2]+kernel_size[2]+output_padding[2]
-
class
DeformableConvolution
(channels, kernel_size=(1, 1), strides=(1, 1), padding=(0, 0), dilation=(1, 1), groups=1, num_deformable_group=1, layout='NCHW', use_bias=True, in_channels=0, activation=None, weight_initializer=None, bias_initializer='zeros', offset_weight_initializer='zeros', offset_bias_initializer='zeros', offset_use_bias=True, op_name='DeformableConvolution', adj=None)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
2-D Deformable Convolution v_1 (Dai, 2017). Normal Convolution uses sampling points in a regular grid, while the sampling points of Deformablem Convolution can be offset. The offset is learned with a separate convolution layer during the training. Both the convolution layer for generating the output features and the offsets are included in this gluon layer.
- Parameters
channels (int,) – The dimensionality of the output space i.e. the number of output channels in the convolution.
kernel_size (int or tuple/list of 2 ints, (Default value = (1,1))) – Specifies the dimensions of the convolution window.
strides (int or tuple/list of 2 ints, (Default value = (1,1))) – Specifies the strides of the convolution.
padding (int or tuple/list of 2 ints, (Default value = (0,0))) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
dilation (int or tuple/list of 2 ints, (Default value = (1,1))) – Specifies the dilation rate to use for dilated convolution.
groups (int, (Default value = 1)) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two convolution layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
num_deformable_group (int, (Default value = 1)) – Number of deformable group partitions.
layout (str, (Default value = NCHW)) – Dimension ordering of data and weight. Can be ‘NCW’, ‘NWC’, ‘NCHW’, ‘NHWC’, ‘NCDHW’, ‘NDHWC’, etc. ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stands for batch, channel, height, width and depth dimensions respectively. Convolution is performed over ‘D’, ‘H’, and ‘W’ dimensions.
use_bias (bool, (Default value = True)) – Whether the layer for generating the output features uses a bias vector.
in_channels (int, (Default value = 0)) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and input channels will be inferred from the shape of input data.
activation (str, (Default value = None)) – Activation function to use. See
activation()
. If you don’t specify anything, no activation is applied (ie. “linear” activation: a(x) = x).weight_initializer (str or Initializer, (Default value = None)) – Initializer for the weight weights matrix for the convolution layer for generating the output features.
bias_initializer (str or Initializer, (Default value = zeros)) – Initializer for the bias vector for the convolution layer for generating the output features.
offset_weight_initializer (str or Initializer, (Default value = zeros)) – Initializer for the weight weights matrix for the convolution layer for generating the offset.
offset_bias_initializer (str or Initializer, (Default value = zeros),) – Initializer for the bias vector for the convolution layer for generating the offset.
offset_use_bias (bool, (Default value = True)) – Whether the layer for generating the offset uses a bias vector.
Inputs –
data: 4D input tensor with shape (batch_size, in_channels, height, width) when layout is NCHW. For other layouts shape is permuted accordingly.
Outputs –
out: 4D output tensor with shape (batch_size, channels, out_height, out_width) when layout is NCHW. out_height and out_width are calculated as:
out_height = floor((height+2*padding[0]-dilation[0]*(kernel_size[0]-1)-1)/stride[0])+1 out_width = floor((width+2*padding[1]-dilation[1]*(kernel_size[1]-1)-1)/stride[1])+1
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(x)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
Pre-infer the shape of offsite weight parameter based on kernel size,
Pre-infer the shape of weight parameter based on kernel size, group size and channels
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
pre_infer_offset_weight
()[source]¶ Pre-infer the shape of offsite weight parameter based on kernel size, group size and offset channels
-
pre_infer_weight
()[source]¶ Pre-infer the shape of weight parameter based on kernel size, group size and channels
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
Dense
(units, activation=None, use_bias=True, flatten=True, dtype='float32', weight_initializer=None, bias_initializer='zeros', in_units=0, **kwargs)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
Just your regular densely-connected NN layer.
Dense implements the operation: output = activation(dot(input, weight.T) + bias) where activation is the element-wise activation function passed as the activation argument, weight is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
- Parameters
units (int) – Dimensionality of the output space.
activation (str) – Activation function to use. See help on Activation layer. If you don’t specify anything, no activation is applied (ie. “linear” activation: a(x) = x).
use_bias (bool, default True) – Whether the layer uses a bias vector.
flatten (bool, default True) – Whether the input tensor should be flattened. If true, all but the first axis of input data are collapsed together. If false, all but the last axis of input data are kept the same, and the transformation applies on the last axis.
dtype (str or np.dtype, default 'float32') – Data type of output embeddings.
weight_initializer (str or Initializer) – Initializer for the kernel weights matrix.
bias_initializer (str or Initializer) – Initializer for the bias vector.
in_units (int, optional) – Size of the input data. If not specified, initialization will be deferred to the first time forward is called and in_units will be inferred from the shape of input data.
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(x, *args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).- Inputs:
data: if flatten is True, data should be a tensor with shape (batch_size, x1, x2, …, xn), where x1 * x2 * … * xn is equal to in_units. If flatten is False, data should have shape (x1, x2, …, xn, in_units).
- Outputs:
out: if flatten is True, out will be a tensor with shape (batch_size, units). If flatten is False, out will have shape (x1, x2, …, xn, units).
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
Dropout
(rate, axes=(), **kwargs)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
Applies Dropout to the input.
Dropout consists in randomly setting a fraction rate of input units to 0 at each update during training time, which helps prevent overfitting.
- Parameters
rate (float) – Fraction of the input units to drop. Must be a number between 0 and 1.
axes (tuple of int, default ()) – The axes on which dropout mask is shared. If empty, regular dropout is applied.
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(*args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).- Inputs:
data: input tensor with arbitrary shape.
- Outputs:
out: output tensor with the same shape as data.
References
Dropout: A Simple Way to Prevent Neural Networks from Overfitting
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
ELU
(alpha=1.0, **kwargs)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
- Exponential Linear Unit (ELU)
“Fast and Accurate Deep Network Learning by Exponential Linear Units”, Clevert et al, 2016 https://arxiv.org/abs/1511.07289 Published as a conference paper at ICLR 2016
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(*args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).- Parameters
alpha (float) – The alpha parameter as described by Clevert et al, 2016
- Inputs:
data: input tensor with arbitrary shape.
- Outputs:
out: output tensor with the same shape as data.
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
Embedding
(input_dim, output_dim, dtype='float32', weight_initializer=None, sparse_grad=False, **kwargs)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
Turns non-negative integers (indexes/tokens) into dense vectors of fixed size. eg. [4, 20] -> [[0.25, 0.1], [0.6, -0.2]]
Note
if sparse_grad is set to True, the gradient w.r.t weight will be sparse. Only a subset of optimizers support sparse gradients, including SGD, AdaGrad and Adam. By default lazy updates is turned on, which may perform differently from standard updates. For more details, please check the Optimization API at: https://mxnet.apache.org/versions/master/api/python/docs/api/optimizer/index.html
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(*args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).- Parameters
input_dim (int) – Size of the vocabulary, i.e. maximum integer index + 1.
output_dim (int) – Dimension of the dense embedding.
dtype (str or np.dtype, default 'float32') – Data type of output embeddings.
weight_initializer (Initializer) – Initializer for the embeddings matrix.
sparse_grad (bool) – If True, gradient w.r.t. weight will be a ‘row_sparse’ NDArray.
Inputs –
data: (N-1)-D tensor with shape: (x1, x2, …, xN-1).
Output –
out: N-D tensor with shape: (x1, x2, …, xN-1, output_dim).
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
Flatten
(**kwargs)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
Flattens the input to two dimensional.
- Inputs:
data: input tensor with arbitrary shape (N, x1, x2, …, xn)
- Output:
out: 2D tensor with shape: (N, x1 cdot x2 cdot … cdot xn)
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(*args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
GELU
(approximation='erf', **kwargs)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
- Gaussian Exponential Linear Unit (GELU)
“Gaussian Error Linear Units (GELUs)”, Hendrycks et al, 2016 https://arxiv.org/abs/1606.08415
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(*args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).- Parameters
approximation (string) – Which approximation of GELU calculation to use (erf or tanh).
Inputs –
data: input tensor with arbitrary shape.
Outputs –
out: output tensor with the same shape as data.
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
GlobalAvgPool1D
(layout='NCW', **kwargs)[source]¶ Bases:
mxnet.gluon.nn.conv_layers._Pooling
Global average pooling operation for temporal data.
- Parameters
layout (str, default 'NCW') – Dimension ordering of data and out (‘NCW’ or ‘NWC’). ‘N’, ‘C’, ‘W’ stands for batch, channel, and width (time) dimensions respectively. padding is applied on ‘W’ dimension.
- Inputs:
data: 3D input tensor with shape (batch_size, in_channels, width) when layout is NCW. For other layouts shape is permuted accordingly.
- Outputs:
out: 3D output tensor with shape (batch_size, channels, 1).
-
class
GlobalAvgPool2D
(layout='NCHW', **kwargs)[source]¶ Bases:
mxnet.gluon.nn.conv_layers._Pooling
Global average pooling operation for spatial data.
- Parameters
layout (str, default 'NCHW') – Dimension ordering of data and out (‘NCHW’ or ‘NHWC’). ‘N’, ‘C’, ‘H’, ‘W’ stands for batch, channel, height, and width dimensions respectively.
- Inputs:
data: 4D input tensor with shape (batch_size, in_channels, height, width) when layout is NCHW. For other layouts shape is permuted accordingly.
- Outputs:
out: 4D output tensor with shape (batch_size, channels, 1, 1) when layout is NCHW.
-
class
GlobalAvgPool3D
(layout='NCDHW', **kwargs)[source]¶ Bases:
mxnet.gluon.nn.conv_layers._Pooling
Global average pooling operation for 3D data (spatial or spatio-temporal).
- Parameters
layout (str, default 'NCDHW') – Dimension ordering of data and out (‘NCDHW’ or ‘NDHWC’). ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stands for batch, channel, height, width and depth dimensions respectively. padding is applied on ‘D’, ‘H’ and ‘W’ dimension.
- Inputs:
data: 5D input tensor with shape (batch_size, in_channels, depth, height, width) when layout is NCDHW. For other layouts shape is permuted accordingly.
- Outputs:
out: 5D output tensor with shape (batch_size, channels, 1, 1, 1) when layout is NCDHW.
-
class
GlobalMaxPool1D
(layout='NCW', **kwargs)[source]¶ Bases:
mxnet.gluon.nn.conv_layers._Pooling
Gloabl max pooling operation for one dimensional (temporal) data.
- Parameters
layout (str, default 'NCW') – Dimension ordering of data and out (‘NCW’ or ‘NWC’). ‘N’, ‘C’, ‘W’ stands for batch, channel, and width (time) dimensions respectively. Pooling is applied on the W dimension.
- Inputs:
data: 3D input tensor with shape (batch_size, in_channels, width) when layout is NCW. For other layouts shape is permuted accordingly.
- Outputs:
out: 3D output tensor with shape (batch_size, channels, 1) when layout is NCW.
-
class
GlobalMaxPool2D
(layout='NCHW', **kwargs)[source]¶ Bases:
mxnet.gluon.nn.conv_layers._Pooling
Global max pooling operation for two dimensional (spatial) data.
- Parameters
layout (str, default 'NCHW') – Dimension ordering of data and out (‘NCHW’ or ‘NHWC’). ‘N’, ‘C’, ‘H’, ‘W’ stands for batch, channel, height, and width dimensions respectively. padding is applied on ‘H’ and ‘W’ dimension.
- Inputs:
data: 4D input tensor with shape (batch_size, in_channels, height, width) when layout is NCHW. For other layouts shape is permuted accordingly.
- Outputs:
out: 4D output tensor with shape (batch_size, channels, 1, 1) when layout is NCHW.
-
class
GlobalMaxPool3D
(layout='NCDHW', **kwargs)[source]¶ Bases:
mxnet.gluon.nn.conv_layers._Pooling
Global max pooling operation for 3D data (spatial or spatio-temporal).
- Parameters
layout (str, default 'NCDHW') – Dimension ordering of data and out (‘NCDHW’ or ‘NDHWC’). ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stands for batch, channel, height, width and depth dimensions respectively. padding is applied on ‘D’, ‘H’ and ‘W’ dimension.
- Inputs:
data: 5D input tensor with shape (batch_size, in_channels, depth, height, width) when layout is NCW. For other layouts shape is permuted accordingly.
- Outputs:
out: 5D output tensor with shape (batch_size, channels, 1, 1, 1) when layout is NCDHW.
-
class
GroupNorm
(num_groups=1, epsilon=1e-05, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', in_channels=0)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
Applies group normalization to the n-dimensional input array. This operator takes an n-dimensional input array where the leftmost 2 axis are batch and channel respectively:
\[x = x.reshape((N, num_groups, C // num_groups, ...)) axis = (2, ...) out = \frac{x - mean[x, axis]}{ \sqrt{Var[x, axis] + \epsilon}} * gamma + beta\]Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(data)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(data, *args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).- Parameters
num_groups (int, default 1) – Number of groups to separate the channel axis into.
epsilon (float, default 1e-5) – Small float added to variance to avoid dividing by zero.
center (bool, default True) – If True, add offset of beta to normalized tensor. If False, beta is ignored.
scale (bool, default True) – If True, multiply by gamma. If False, gamma is not used.
beta_initializer (str or Initializer, default ‘zeros’) – Initializer for the beta weight.
gamma_initializer (str or Initializer, default ‘ones’) – Initializer for the gamma weight.
- Inputs:
data: input tensor with shape (N, C, …).
- Outputs:
out: output tensor with the same shape as data.
References
Examples
>>> # Input of shape (2, 3, 4) >>> x = mx.np.array([[[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]], [[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]]) >>> # Group normalization is calculated with the above formula >>> layer = GroupNorm() >>> layer.initialize(device=mx.cpu(0)) >>> layer(x) [[[-1.5932543 -1.3035717 -1.0138891 -0.7242065] [-0.4345239 -0.1448413 0.1448413 0.4345239] [ 0.7242065 1.0138891 1.3035717 1.5932543]] [[-1.5932543 -1.3035717 -1.0138891 -0.7242065] [-0.4345239 -0.1448413 0.1448413 0.4345239] [ 0.7242065 1.0138891 1.3035717 1.5932543]]]
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
HybridBlock
[source]¶ Bases:
mxnet.gluon.block.Block
HybridBlock supports forwarding with both Symbol and NDArray.
HybridBlock is similar to Block, with a few differences:
import mxnet as mx from mxnet.gluon import HybridBlock, nn class Model(HybridBlock): def __init__(self, **kwargs): super(Model, self).__init__(**kwargs) self.dense0 = nn.Dense(20) self.dense1 = nn.Dense(20) def forward(self, x): x = mx.npx.relu(self.dense0(x)) return mx.npx.relu(self.dense1(x)) model = Model() model.initialize(device=mx.cpu(0)) model.hybridize() model(mx.np.zeros((10, 10), device=mx.cpu(0)))
Methods
cast
(dtype)Cast this Block to use another data type.
export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x, *args)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(*args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_child
(block[, name])Registers block as a child of self.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
Forward computation in
HybridBlock
must be static to work withSymbol
s, i.e. you cannot callNDArray.asnumpy()
,NDArray.shape
,NDArray.dtype
, NDArray indexing (x[i]) etc on tensors. Also, you cannot use branching or loop logic that bases on non-constant expressions like random numbers or intermediate results, since they change the graph structure for each iteration.Before activating with
hybridize()
,HybridBlock
works just like normalBlock
. After activation,HybridBlock
will create a symbolic graph representing the forward computation and cache it. On subsequent forwards, the cached graph will be used instead offorward()
.Please see references for detailed tutorial.
References
Hybridize - A Hybrid of Imperative and Symbolic Programming
-
cast
(dtype)[source]¶ Cast this Block to use another data type.
- Parameters
dtype (str or numpy.dtype) – The new data type.
-
export
(path, epoch=0, remove_amp_cast=True)[source]¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
forward
(x, *args)[source]¶ Overrides the forward computation. Arguments must be
mxnet.numpy.ndarray
.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)[source]¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)[source]¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
register_child
(block, name=None)[source]¶ Registers block as a child of self.
Block
s assigned to self as attributes will be registered automatically.
-
register_op_hook
(callback, monitor_all=False)[source]¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)[source]¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)[source]¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
-
class
HybridConcatenate
(axis=-1)[source]¶ Bases:
mxnet.gluon.nn.basic_layers.HybridSequential
Lays HybridBlock s concurrently.
This block feeds its input to all children blocks, and produce the output by concatenating all the children blocks’ outputs on the specified axis.
Example:
net = HybridConcatenate() net.add(nn.Dense(10, activation='relu')) net.add(nn.Dense(20)) net.add(Identity())
Methods
add
(*blocks)Adds block on top of the stack.
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(*args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).- Parameters
axis (int, default -1) – The axis on which to concatenate the outputs.
-
add
(*blocks)¶ Adds block on top of the stack.
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
HybridLambda
(function)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
Wraps an operator or an expression as a HybridBlock object.
- Parameters
function (str or function) –
Function used in lambda must be one of the following: 1) The name of an operator that is available in both symbol and ndarray. For example:
block = HybridLambda('tanh')
A function that conforms to
def function(F, data, *args)
. For example:block = HybridLambda(lambda F, x: F.LeakyReLU(x, slope=0.1))
Inputs –
- ** args *: one or more input data. First argument must be symbol or ndarray. Their
shapes depend on the function.
Output –
** outputs *: one or more output data. Their shapes depend on the function.
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x, *args)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(*args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
forward
(x, *args)[source]¶ Overrides the forward computation. Arguments must be
mxnet.numpy.ndarray
.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
HybridSequential
[source]¶ Bases:
mxnet.gluon.block.HybridBlock
Stacks HybridBlocks sequentially.
Example:
net = nn.HybridSequential() net.add(nn.Dense(10, activation='relu')) net.add(nn.Dense(20)) net.hybridize()
Methods
add
(*blocks)Adds block on top of the stack.
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x, *args)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(*args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
forward
(x, *args)[source]¶ Overrides the forward computation. Arguments must be
mxnet.numpy.ndarray
.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
-
class
Identity
[source]¶ Bases:
mxnet.gluon.block.HybridBlock
Block that passes through the input directly.
This block can be used in conjunction with HybridConcatenate block for residual connection.
Example:
net = HybridConcatenate() net.add(nn.Dense(10, activation='relu')) net.add(nn.Dense(20)) net.add(Identity())
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(*args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
-
class
InstanceNorm
(axis=1, epsilon=1e-05, center=True, scale=False, beta_initializer='zeros', gamma_initializer='ones', in_channels=0, **kwargs)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
Applies instance normalization to the n-dimensional input array. This operator takes an n-dimensional input array where (n>2) and normalizes the input using the following formula:
\[ \begin{align}\begin{aligned}\bar{C} = \{i \mid i \neq 0, i \neq axis\}\\out = \frac{x - mean[data, \bar{C}]}{ \sqrt{Var[data, \bar{C}]} + \epsilon} * gamma + beta\end{aligned}\end{align} \]Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(x, *args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).- Parameters
axis (int, default 1) – The axis that will be excluded in the normalization process. This is typically the channels (C) axis. For instance, after a Conv2D layer with layout=’NCHW’, set axis=1 in InstanceNorm. If layout=’NHWC’, then set axis=3. Data will be normalized along axes excluding the first axis and the axis given.
epsilon (float, default 1e-5) – Small float added to variance to avoid dividing by zero.
center (bool, default True) – If True, add offset of beta to normalized tensor. If False, beta is ignored.
scale (bool, default True) – If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (also e.g. nn.relu), this can be disabled since the scaling will be done by the next layer.
beta_initializer (str or Initializer, default ‘zeros’) – Initializer for the beta weight.
gamma_initializer (str or Initializer, default ‘ones’) – Initializer for the gamma weight.
in_channels (int, default 0) – Number of channels (feature maps) in input data. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
- Inputs:
data: input tensor with arbitrary shape.
- Outputs:
out: output tensor with the same shape as data.
References
Instance Normalization: The Missing Ingredient for Fast Stylization
Examples
>>> # Input of shape (2,1,2) >>> x = mx.np.array([[[ 1.1, 2.2]], ... [[ 3.3, 4.4]]]) >>> # Instance normalization is calculated with the above formula >>> layer = InstanceNorm() >>> layer.initialize(device=mx.cpu(0)) >>> layer(x) [[[-0.99998355 0.99998331]] [[-0.99998319 0.99998361]]]
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
Lambda
(function)[source]¶ Bases:
mxnet.gluon.block.Block
Wraps an operator or an expression as a Block object.
- Parameters
function (str or function) –
Function used in lambda must be one of the following: 1) the name of an operator that is available in ndarray. For example:
block = Lambda('tanh')
a function that conforms to
def function(*args)
. For example:block = Lambda(lambda x: npx.leaky_relu(x, slope=0.1))
Inputs –
** args *: one or more input data. Their shapes depend on the function.
Output –
** outputs *: one or more output data. Their shapes depend on the function.
Methods
forward
(*args)Overrides to implement forward computation using
NDArray
.
-
class
LayerNorm
(axis=-1, epsilon=1e-05, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', in_channels=0)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
Applies layer normalization to the n-dimensional input array. This operator takes an n-dimensional input array and normalizes the input using the given axis:
\[out = \frac{x - mean[data, axis]}{ \sqrt{Var[data, axis] + \epsilon}} * gamma + beta\]Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(data)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(data, *args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).- Parameters
axis (int, default -1) – The axis that should be normalized. This is typically the axis of the channels.
epsilon (float, default 1e-5) – Small float added to variance to avoid dividing by zero.
center (bool, default True) – If True, add offset of beta to normalized tensor. If False, beta is ignored.
scale (bool, default True) – If True, multiply by gamma. If False, gamma is not used.
beta_initializer (str or Initializer, default ‘zeros’) – Initializer for the beta weight.
gamma_initializer (str or Initializer, default ‘ones’) – Initializer for the gamma weight.
in_channels (int, default 0) – Number of channels (feature maps) in input data. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
- Inputs:
data: input tensor with arbitrary shape.
- Outputs:
out: output tensor with the same shape as data.
References
Examples
>>> # Input of shape (2, 5) >>> x = mx.np.array([[1, 2, 3, 4, 5], [1, 1, 2, 2, 2]]) >>> # Layer normalization is calculated with the above formula >>> layer = LayerNorm() >>> layer.initialize(device=mx.cpu(0)) >>> layer(x) [[-1.41421 -0.707105 0. 0.707105 1.41421 ] [-1.2247195 -1.2247195 0.81647956 0.81647956 0.81647956]]
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
LeakyReLU
(alpha, **kwargs)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
Leaky version of a Rectified Linear Unit.
It allows a small gradient when the unit is not active
\[\begin{split}f\left(x\right) = \left\{ \begin{array}{lr} \alpha x & : x \lt 0 \\ x & : x \geq 0 \\ \end{array} \right.\\\end{split}\]Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(*args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).- Parameters
alpha (float) – slope coefficient for the negative half axis. Must be >= 0.
- Inputs:
data: input tensor with arbitrary shape.
- Outputs:
out: output tensor with the same shape as data.
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
MaxPool1D
(pool_size=2, strides=None, padding=0, layout='NCW', ceil_mode=False, **kwargs)[source]¶ Bases:
mxnet.gluon.nn.conv_layers._Pooling
Max pooling operation for one dimensional data.
- Parameters
pool_size (int) – Size of the max pooling windows.
strides (int, or None) – Factor by which to downscale. E.g. 2 will halve the input size. If None, it will default to pool_size.
padding (int) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
layout (str, default 'NCW') – Dimension ordering of data and out (‘NCW’ or ‘NWC’). ‘N’, ‘C’, ‘W’ stands for batch, channel, and width (time) dimensions respectively. Pooling is applied on the W dimension.
ceil_mode (bool, default False) – When True, will use ceil instead of floor to compute the output shape.
- Inputs:
data: 3D input tensor with shape (batch_size, in_channels, width) when layout is NCW. For other layouts shape is permuted accordingly.
- Outputs:
out: 3D output tensor with shape (batch_size, channels, out_width) when layout is NCW. out_width is calculated as:
out_width = floor((width+2*padding-pool_size)/strides)+1
When ceil_mode is True, ceil will be used instead of floor in this equation.
-
class
MaxPool2D
(pool_size=(2, 2), strides=None, padding=0, layout='NCHW', ceil_mode=False, **kwargs)[source]¶ Bases:
mxnet.gluon.nn.conv_layers._Pooling
Max pooling operation for two dimensional (spatial) data.
- Parameters
pool_size (int or list/tuple of 2 ints,) – Size of the max pooling windows.
strides (int, list/tuple of 2 ints, or None.) – Factor by which to downscale. E.g. 2 will halve the input size. If None, it will default to pool_size.
padding (int or list/tuple of 2 ints,) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
layout (str, default 'NCHW') – Dimension ordering of data and out (‘NCHW’ or ‘NHWC’). ‘N’, ‘C’, ‘H’, ‘W’ stands for batch, channel, height, and width dimensions respectively. padding is applied on ‘H’ and ‘W’ dimension.
ceil_mode (bool, default False) – When True, will use ceil instead of floor to compute the output shape.
- Inputs:
data: 4D input tensor with shape (batch_size, in_channels, height, width) when layout is NCHW. For other layouts shape is permuted accordingly.
- Outputs:
out: 4D output tensor with shape (batch_size, channels, out_height, out_width) when layout is NCHW. out_height and out_width are calculated as:
out_height = floor((height+2*padding[0]-pool_size[0])/strides[0])+1 out_width = floor((width+2*padding[1]-pool_size[1])/strides[1])+1
When ceil_mode is True, ceil will be used instead of floor in this equation.
-
class
MaxPool3D
(pool_size=(2, 2, 2), strides=None, padding=0, ceil_mode=False, layout='NCDHW', **kwargs)[source]¶ Bases:
mxnet.gluon.nn.conv_layers._Pooling
Max pooling operation for 3D data (spatial or spatio-temporal).
- Parameters
pool_size (int or list/tuple of 3 ints,) – Size of the max pooling windows.
strides (int, list/tuple of 3 ints, or None.) – Factor by which to downscale. E.g. 2 will halve the input size. If None, it will default to pool_size.
padding (int or list/tuple of 3 ints,) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
layout (str, default 'NCDHW') – Dimension ordering of data and out (‘NCDHW’ or ‘NDHWC’). ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stands for batch, channel, height, width and depth dimensions respectively. padding is applied on ‘D’, ‘H’ and ‘W’ dimension.
ceil_mode (bool, default False) – When True, will use ceil instead of floor to compute the output shape.
- Inputs:
data: 5D input tensor with shape (batch_size, in_channels, depth, height, width) when layout is NCW. For other layouts shape is permuted accordingly.
- Outputs:
out: 5D output tensor with shape (batch_size, channels, out_depth, out_height, out_width) when layout is NCDHW. out_depth, out_height and out_width are calculated as:
out_depth = floor((depth+2*padding[0]-pool_size[0])/strides[0])+1 out_height = floor((height+2*padding[1]-pool_size[1])/strides[1])+1 out_width = floor((width+2*padding[2]-pool_size[2])/strides[2])+1
When ceil_mode is True, ceil will be used instead of floor in this equation.
-
class
ModulatedDeformableConvolution
(channels, kernel_size=(1, 1), strides=(1, 1), padding=(0, 0), dilation=(1, 1), groups=1, num_deformable_group=1, layout='NCHW', use_bias=True, in_channels=0, activation=None, weight_initializer=None, bias_initializer='zeros', offset_weight_initializer='zeros', offset_bias_initializer='zeros', offset_use_bias=True, op_name='ModulatedDeformableConvolution', adj=None)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
2-D Deformable Convolution v2 (Dai, 2018).
The modulated deformable convolution operation is described in https://arxiv.org/abs/1811.11168
- Parameters
channels (int,) – The dimensionality of the output space i.e. the number of output channels in the convolution.
kernel_size (int or tuple/list of 2 ints, (Default value = (1,1))) – Specifies the dimensions of the convolution window.
strides (int or tuple/list of 2 ints, (Default value = (1,1))) – Specifies the strides of the convolution.
padding (int or tuple/list of 2 ints, (Default value = (0,0))) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
dilation (int or tuple/list of 2 ints, (Default value = (1,1))) – Specifies the dilation rate to use for dilated convolution.
groups (int, (Default value = 1)) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two convolution layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
num_deformable_group (int, (Default value = 1)) – Number of deformable group partitions.
layout (str, (Default value = NCHW)) – Dimension ordering of data and weight. Can be ‘NCW’, ‘NWC’, ‘NCHW’, ‘NHWC’, ‘NCDHW’, ‘NDHWC’, etc. ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stands for batch, channel, height, width and depth dimensions respectively. Convolution is performed over ‘D’, ‘H’, and ‘W’ dimensions.
use_bias (bool, (Default value = True)) – Whether the layer for generating the output features uses a bias vector.
in_channels (int, (Default value = 0)) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and input channels will be inferred from the shape of input data.
activation (str, (Default value = None)) – Activation function to use. See
Activation()
. If you don’t specify anything, no activation is applied (ie. “linear” activation: a(x) = x).weight_initializer (str or Initializer, (Default value = None)) – Initializer for the weight weights matrix for the convolution layer for generating the output features.
bias_initializer (str or Initializer, (Default value = zeros)) – Initializer for the bias vector for the convolution layer for generating the output features.
offset_weight_initializer (str or Initializer, (Default value = zeros)) – Initializer for the weight weights matrix for the convolution layer for generating the offset.
offset_bias_initializer (str or Initializer, (Default value = zeros),) – Initializer for the bias vector for the convolution layer for generating the offset.
offset_use_bias (bool, (Default value = True)) – Whether the layer for generating the offset uses a bias vector.
Inputs –
data: 4D input tensor with shape (batch_size, in_channels, height, width) when layout is NCHW. For other layouts shape is permuted accordingly.
Outputs –
out: 4D output tensor with shape (batch_size, channels, out_height, out_width) when layout is NCHW. out_height and out_width are calculated as:
out_height = floor((height+2*padding[0]-dilation[0]*(kernel_size[0]-1)-1)/stride[0])+1 out_width = floor((width+2*padding[1]-dilation[1]*(kernel_size[1]-1)-1)/stride[1])+1
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(x)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
Pre-infer the shape of offsite weight parameter based on kernel size,
Pre-infer the shape of weight parameter based on kernel size, group size and channels
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
pre_infer_offset_weight
()[source]¶ Pre-infer the shape of offsite weight parameter based on kernel size, group size and offset channels
-
pre_infer_weight
()[source]¶ Pre-infer the shape of weight parameter based on kernel size, group size and channels
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
PReLU
(alpha_initializer=<mxnet.initializer.Constant object>, in_channels=1, **kwargs)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
Parametric leaky version of a Rectified Linear Unit. <https://arxiv.org/abs/1502.01852>`_ paper.
It learns a gradient when the unit is not active
\[\begin{split}f\left(x\right) = \left\{ \begin{array}{lr} \alpha x & : x \lt 0 \\ x & : x \geq 0 \\ \end{array} \right.\\\end{split}\]Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(*args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).where alpha is a learned parameter.
- Parameters
alpha_initializer (Initializer) – Initializer for the embeddings matrix.
in_channels (int, default 1) – Number of channels (alpha parameters) to learn. Can either be 1 or n where n is the size of the second dimension of the input tensor.
Inputs –
data: input tensor with arbitrary shape.
Outputs –
out: output tensor with the same shape as data.
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
PixelShuffle1D
(factor)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
Pixel-shuffle layer for upsampling in 1 dimension.
Pixel-shuffling is the operation of taking groups of values along the channel dimension and regrouping them into blocks of pixels along the
W
dimension, thereby effectively multiplying that dimension by a constant factor in size.For example, a feature map of shape \((fC, W)\) is reshaped into \((C, fW)\) by forming little value groups of size \(f\) and arranging them in a grid of size \(W\).
- Parameters
factor (int or 1-tuple of int) – Upsampling factor, applied to the
W
dimension.Inputs –
data: Tensor of shape
(N, f*C, W)
.
Outputs –
out: Tensor of shape
(N, C, W*f)
.
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Perform pixel-shuffling on the input.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(*args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).Examples
>>> pxshuf = PixelShuffle1D(2) >>> x = mx.np.zeros((1, 8, 3)) >>> pxshuf(x).shape (1, 4, 6)
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
PixelShuffle2D
(factor)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
Pixel-shuffle layer for upsampling in 2 dimensions.
Pixel-shuffling is the operation of taking groups of values along the channel dimension and regrouping them into blocks of pixels along the
H
andW
dimensions, thereby effectively multiplying those dimensions by a constant factor in size.For example, a feature map of shape \((f^2 C, H, W)\) is reshaped into \((C, fH, fW)\) by forming little \(f \times f\) blocks of pixels and arranging them in an \(H \times W\) grid.
Pixel-shuffling together with regular convolution is an alternative, learnable way of upsampling an image by arbitrary factors. It is reported to help overcome checkerboard artifacts that are common in upsampling with transposed convolutions (also called deconvolutions). See the paper Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network for further details.
- Parameters
factor (int or 2-tuple of int) – Upsampling factors, applied to the
H
andW
dimensions, in that order.Inputs –
data: Tensor of shape
(N, f1*f2*C, H, W)
.
Outputs –
out: Tensor of shape
(N, C, H*f1, W*f2)
.
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Perform pixel-shuffling on the input.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(*args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).Examples
>>> pxshuf = PixelShuffle2D((2, 3)) >>> x = mx.np.zeros((1, 12, 3, 5)) >>> pxshuf(x).shape (1, 2, 6, 15)
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
PixelShuffle3D
(factor)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
Pixel-shuffle layer for upsampling in 3 dimensions.
Pixel-shuffling (or voxel-shuffling in 3D) is the operation of taking groups of values along the channel dimension and regrouping them into blocks of voxels along the
D
,H
andW
dimensions, thereby effectively multiplying those dimensions by a constant factor in size.For example, a feature map of shape \((f^3 C, D, H, W)\) is reshaped into \((C, fD, fH, fW)\) by forming little \(f \times f \times f\) blocks of voxels and arranging them in a \(D \times H \times W\) grid.
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Perform pixel-shuffling on the input.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(*args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).Pixel-shuffling together with regular convolution is an alternative, learnable way of upsampling an image by arbitrary factors. It is reported to help overcome checkerboard artifacts that are common in upsampling with transposed convolutions (also called deconvolutions). See the paper Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network for further details.
- Parameters
factor (int or 3-tuple of int) – Upsampling factors, applied to the
D
,H
andW
dimensions, in that order.Inputs –
data: Tensor of shape
(N, f1*f2*f3*C, D, H, W)
.
Outputs –
out: Tensor of shape
(N, C, D*f1, H*f2, W*f3)
.
Examples
>>> pxshuf = PixelShuffle3D((2, 3, 4)) >>> x = mx.np.zeros((1, 48, 3, 5, 7)) >>> pxshuf(x).shape (1, 2, 6, 15, 28)
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
ReflectionPad2D
(padding=0, **kwargs)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
Pads the input tensor using the reflection of the input boundary.
- Parameters
padding (int) – An integer padding size
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Use pad operator in numpy extension module,
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(*args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).- Inputs:
data: input tensor with the shape \((N, C, H_{in}, W_{in})\).
- Outputs:
out: output tensor with the shape \((N, C, H_{out}, W_{out})\), where
\[ \begin{align}\begin{aligned}H_{out} = H_{in} + 2 \cdot padding\\W_{out} = W_{in} + 2 \cdot padding\end{aligned}\end{align} \]
Examples
>>> m = nn.ReflectionPad2D(3) >>> input = mx.np.random.normal(size=(16, 3, 224, 224)) >>> output = m(input)
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
forward
(x)[source]¶ Use pad operator in numpy extension module, which has backward support for reflect mode
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
SELU
(**kwargs)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
- Scaled Exponential Linear Unit (SELU)
“Self-Normalizing Neural Networks”, Klambauer et al, 2017 https://arxiv.org/abs/1706.02515
- Inputs:
data: input tensor with arbitrary shape.
- Outputs:
out: output tensor with the same shape as data.
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(*args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
Sequential
[source]¶ Bases:
mxnet.gluon.block.Block
Stacks Blocks sequentially.
Example:
net = nn.Sequential() net.add(nn.Dense(10, activation='relu')) net.add(nn.Dense(20))
Methods
add
(*blocks)Adds block on top of the stack.
forward
(x, *args)Overrides to implement forward computation using
NDArray
.hybridize
([active])Activates or deactivates HybridBlock s recursively.
-
class
SiLU
(**kwargs)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
- Sigmoid Linear Units
Originally proposed “Gaussian Error Linear Units (GELUs)”, Hendrycks et al, 2016 https://arxiv.org/abs/1606.08415
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(*args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).- Parameters
beta (float) – silu(x) = x * sigmoid(x)
- Inputs:
data: input tensor with arbitrary shape.
- Outputs:
out: output tensor with the same shape as data.
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
Swish
(beta=1.0, **kwargs)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
- Swish Activation function (SiLU with a hyperparameter)
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_shape
(*args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).- Parameters
beta (float) – swish(x) = x * sigmoid(beta*x)
- Inputs:
data: input tensor with arbitrary shape.
- Outputs:
out: output tensor with the same shape as data.
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
SymbolBlock
(outputs, inputs, params=None)[source]¶ Bases:
mxnet.gluon.block.HybridBlock
Construct block from symbol. This is useful for using pre-trained models as feature extractors. For example, you may want to extract the output from fc2 layer in AlexNet.
- Parameters
Methods
cast
(dtype)Cast this Block to use another data type.
forward
(x, *args)Overrides the forward computation.
imports
(symbol_file, input_names[, …])Import model previously saved by gluon.HybridBlock.export as a gluon.SymbolBlock for use in Gluon.
infer_shape
(*args)Infers shape of Parameters from inputs.
Examples
>>> # To extract the feature from fc1 and fc2 layers of AlexNet: >>> alexnet = gluon.model_zoo.vision.alexnet(pretrained=True, device=mx.cpu()) >>> inputs = mx.sym.var('data') >>> out = alexnet(inputs) >>> internals = out.get_internals() >>> print(internals.list_outputs()) ['data', ..., 'features_9_act_fwd_output', ..., 'features_11_act_fwd_output', ...] >>> outputs = [internals['features_9_act_fwd_output'], internals['features_11_act_fwd_output']] >>> # Create SymbolBlock that shares parameters with alexnet >>> feat_model = gluon.SymbolBlock(outputs, inputs, params=alexnet.collect_params()) >>> x = mx.nd.random.normal(shape=(16, 3, 224, 224)) >>> print(feat_model(x))
-
cast
(dtype)[source]¶ Cast this Block to use another data type.
- Parameters
dtype (str or numpy.dtype) – The new data type.
-
forward
(x, *args)[source]¶ Overrides the forward computation. Arguments must be
mxnet.numpy.ndarray
.
-
static
imports
(symbol_file, input_names, param_file=None, device=None, allow_missing=False, ignore_extra=False)[source]¶ Import model previously saved by gluon.HybridBlock.export as a gluon.SymbolBlock for use in Gluon.
- Parameters
symbol_file (str) – Path to symbol file.
input_names (list of str) – List of input variable names
param_file (str, optional) – Path to parameter file.
device (Device, default None) – The device to initialize gluon.SymbolBlock on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
- Returns
gluon.SymbolBlock loaded from symbol and parameter files.
- Return type
Examples
>>> net1 = gluon.model_zoo.vision.resnet18_v1(pretrained=True) >>> net1.hybridize() >>> x = mx.nd.random.normal(shape=(1, 3, 32, 32)) >>> out1 = net1(x) >>> net1.export('net1', epoch=1) >>> >>> net2 = gluon.SymbolBlock.imports( ... 'net1-symbol.json', ['data'], 'net1-0001.params') >>> out2 = net2(x)
-
class
SyncBatchNorm
(in_channels=0, num_devices=None, momentum=0.9, epsilon=1e-05, center=True, scale=True, use_global_stats=False, beta_initializer='zeros', gamma_initializer='ones', running_mean_initializer='zeros', running_variance_initializer='ones', **kwargs)[source]¶ Bases:
mxnet.gluon.nn.basic_layers.BatchNorm
Cross-GPU Synchronized Batch normalization (SyncBN)
Standard BN 1 implementation only normalize the data within each device. SyncBN normalizes the input within the whole mini-batch. We follow the implementation described in the paper 2.
Note: Current implementation of SyncBN does not support FP16 training. For FP16 inference, use standard nn.BatchNorm instead of SyncBN.
- Parameters
in_channels (int, default 0) – Number of channels (feature maps) in input data. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
num_devices (int, default number of visible GPUs) –
momentum (float, default 0.9) – Momentum for the moving average.
epsilon (float, default 1e-5) – Small float added to variance to avoid dividing by zero.
center (bool, default True) – If True, add offset of beta to normalized tensor. If False, beta is ignored.
scale (bool, default True) – If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (also e.g. nn.relu), this can be disabled since the scaling will be done by the next layer.
use_global_stats (bool, default False) – If True, use global moving statistics instead of local batch-norm. This will force change batch-norm into a scale shift operator. If False, use local batch-norm.
beta_initializer (str or Initializer, default ‘zeros’) – Initializer for the beta weight.
gamma_initializer (str or Initializer, default ‘ones’) – Initializer for the gamma weight.
running_mean_initializer (str or Initializer, default ‘zeros’) – Initializer for the running mean.
running_variance_initializer (str or Initializer, default ‘ones’) – Initializer for the running variance.
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(x)Overrides the forward computation.
hybridize
([active, partition_if_dynamic, …])Activates or deactivates
HybridBlock
s recursively.infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install op hook for block recursively.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).- Inputs:
data: input tensor with arbitrary shape.
- Outputs:
out: output tensor with the same shape as data.
- Reference:
- 1
Ioffe, Sergey, and Christian Szegedy. “Batch normalization: Accelerating deep network training by reducing internal covariate shift.” ICML 2015
- 2
Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, and Amit Agrawal. “Context Encoding for Semantic Segmentation.” CVPR 2018
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectDict
which match some given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there are only one input, it will have name data. When there Are more than one inputs, they will be named as data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None)¶ Activates or deactivates
HybridBlock
s recursively. Has no effect on non-hybrid children.- Parameters
active (bool, default True) – Whether to turn hybrid on or off.
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represents in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file block.optimize_for(x, backend=’myPart’) block.export(‘partitioned’)
# partition and then run inference block.optimize_for(x, backend=’myPart’) block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install op hook for block recursively.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
HybridBlock.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices. If the Block is hybridized, it will reset the _cached_op_args.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:dense0 = nn.Dense(20) dense1 = nn.Dense(20) dense1.share_parameters(dense0.collect_params())
- which equals to
dense1.weight = dense0.weight dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.