MXNet supports automatic differentiation with the
autograd allows you to differentiate a graph of NDArray operations
with the chain rule.
This is called define-by-run, i.e., the network is defined on-the-fly by
running forward computation. You can define exotic network structures
and differentiate them, and each iteration can have a totally different
import mxnet as mx from mxnet import autograd
autograd, we must first mark variables that require gradient and
attach gradient buffers to them:
x = mx.nd.array([[1, 2], [3, 4]]) x.attach_grad()
Now we can define the network while running forward computation by wrapping
it inside a
record (operations out of
record does not define
a graph and cannot be differentiated):
with autograd.record(): y = x * 2 z = y * x
Let’s backprop with
z.backward(), which is equivalent to
z.backward(mx.nd.ones_like(z)). When z has more than one entry,
is equivalent to
Now, let’s see if this is the expected output.
Here, y = f(x), z = f(y) = f(g(x)) which means y = 2 * x and z = 2 * x * x.
After, doing backprop with
z.backward(), we will get gradient dz/dx as follows:
dy/dx = 2, dz/dx = 4 * x
So, we should get x.grad as an array of [[4, 8],[12, 16]].