# Custom Loss Function
This tutorial shows how to define and use a customized loss function to guide training of your neural network.
Letâ€™s begin with a small regression example. We can build, train, and evaluate a regression model on the Boston Housing dataset with the following code:
```R
data(BostonHousing, package = "mlbench")
BostonHousing[, sapply(BostonHousing, is.factor)] <-
as.numeric(as.character(BostonHousing[, sapply(BostonHousing, is.factor)]))
BostonHousing <- data.frame(scale(BostonHousing))
test.ind = seq(1, 506, 5) # every 5th point is held-out for testing
train.x = data.matrix(BostonHousing[-test.ind, -14])
train.y = BostonHousing[-test.ind, 14]
test.x = data.matrix(BostonHousing[test.ind, -14])
test.y = BostonHousing[test.ind, 14]
```
```R
require(mxnet)
data <- mx.symbol.Variable("data")
label <- mx.symbol.Variable("label")
fc1 <- mx.symbol.FullyConnected(data, num_hidden = 14, name = "fc1")
tanh1 <- mx.symbol.Activation(fc1, act_type = "tanh", name = "tanh1")
fc2 <- mx.symbol.FullyConnected(tanh1, num_hidden = 1, name = "fc2")
lro <- mx.symbol.LinearRegressionOutput(fc2, name = "lro")
mx.set.seed(0)
model <- mx.model.FeedForward.create(lro, X = train.x, y = train.y,
ctx = mx.cpu(),
num.round = 5,
array.batch.size = 60,
optimizer = "rmsprop",
verbose = TRUE,
array.layout = "rowmajor",
batch.end.callback = NULL,
epoch.end.callback = NULL)
pred <- predict(model, test.x)
mse.test <- mean((test.y - pred[1,])^2)
sprintf("MSE on test data: %f",mse.test)
```
**mxnet** provides the following built-in loss functionsÂ (with corresponding appropriate output transformations):
- ``LinearRegressionOutput``: seeks to estimate conditional expectations via the mean-squared-error loss.
- ``MAERegressionOutput``: seeks to estimate conditional medians via the mean-absolute-error loss. Useful when our training data is contaminated by outliers.
- ``LogisticRegressionOutput``: seeks to estimate conditional class-probabilities via the logistic loss (first applies sigmoid transformation to output and then computes binary cross-entropy loss). Useful when we are predicting binary class labels rather than numerical target values.
However, we may wish to use a different loss function in our own application.
You can provide your own loss function by using ``mx.symbol.MakeLoss`` when constructing the network.
## Using Your Own Loss Function
We still use the same regression task and neural network architecture from the previous example. However, this time we use ``mx.symbol.MakeLoss`` to minimize a custom loss function that is applied to the ``fc2`` output from our neural network. Below, we provide an example showing how to use the [pseudo-Huber loss function](https://en.wikipedia.org/wiki/Huber_loss#Pseudo-Huber_loss_function), a popular choice for robust regression.
First, we construct the pseudo-Huber loss function below. Note that all operations in our loss function must be defined in terms of **mxnet** ``Symbol`` objects rather than R arrays, in order to allow for subsequent automatic differentiation of the loss during network training. The **mnxnet** package contains a variety of ``Symbol`` operations you can combine to form pretty much any loss function. The loss should take in model predictions and the corresponding ground truth labels (as well as other auxiliary parameters, such as the ``delta`` value used in the Huber loss).
```R
pseudoHuberLoss <- function(pred, label, delta=1) {
diff <- mx.symbol.Reshape(fc2, shape = 0) - label
(mx.symbol.sqrt(1 + mx.symbol.square(diff/delta)) - 1) * delta^2
}
```
Define our neural network archictecture which makes use of this custom loss function:
```R
data <- mx.symbol.Variable("data")
label <- mx.symbol.Variable("label")
fc1 <- mx.symbol.FullyConnected(data, num_hidden = 14, name = "fc1")
tanh1 <- mx.symbol.Activation(fc1, act_type = "tanh", name = "tanh1")
fc2 <- mx.symbol.FullyConnected(tanh1, num_hidden = 1, name = "fc2")
lro2 <- mx.symbol.MakeLoss(pseudoHuberLoss(fc2,label), name="psuedohuber")
```
Now we can train the network just as usual:
```R
mx.set.seed(0)
model2 <- mx.model.FeedForward.create(lro2, X = train.x, y = train.y,
ctx = mx.cpu(),
num.round = 5,
array.batch.size = 60,
optimizer = "rmsprop",
verbose = TRUE,
array.layout = "rowmajor",
batch.end.callback = NULL,
epoch.end.callback = NULL)
```
Finally, we can evaluate the pseudo-Huber loss of our trained network on the test data.
**Caution:** the output of ``mx.symbol.MakeLoss`` is the gradient of the loss with respect to the input data.
Thus, we cannot simply call ``predict`` on ``model2``.
Instead, here's how to get predictions from this trained model:
```R
# pred.wrong <- predict(model2, test.x) # This would produce INVALID predictions.
internals = internals(model2$symbol)
fc_symbol = internals[[match("fc2_output", outputs(internals))]]
model.huber <- list(symbol = fc_symbol,
arg.params = model2$arg.params,
aux.params = model2$aux.params)
class(model.huber) <- "MXFeedForwardModel"
pred.huber <- predict(model.huber, test.x)
losses.test <- apply(matrix(c(pred.huber[1,],test.y),nrow=2,byrow=T), MARGIN=2,
function(x,delta=1) (sqrt(1 + ((x[1]-x[2])/delta)^2) - 1) * delta^2
) # since we don't need gradients, losses are computed via R arrays rather than Symbol objects
sprintf("Pseudo-Huber Loss on test data: %f", mean(losses.test))
```