mxnet.lr_scheduler

Learning rate scheduling.

Classes

CosineScheduler(max_update[, base_lr, …])

Reduce the learning rate according to a cosine function.

FactorScheduler(step[, factor, …])

Reduce the learning rate by a factor every n steps.

LRScheduler([base_lr, warmup_steps, …])

Base class of a learning rate scheduler.

MultiFactorScheduler(step[, factor, …])

Reduce the learning rate at each step in a given list of steps.

PolyScheduler(max_update[, base_lr, pwr, …])

Reduce the learning rate according to a polynomial of given power.

class CosineScheduler(max_update, base_lr=0.01, final_lr=0, warmup_steps=0, warmup_begin_lr=0, warmup_mode='linear')

Bases: mxnet.lr_scheduler.LRScheduler

Reduce the learning rate according to a cosine function.

Calculate the new learning rate by:

final_lr + (start_lr - final_lr) * (1+cos(pi * nup/max_nup))/2
if nup < max_nup, final_lr otherwise.
Parameters
  • max_update (int) – maximum number of updates before the decay reaches the final learning rate

  • base_lr (float) – base learning rate

  • final_lr (float) – final learning rate after all steps

  • warmup_steps (int) – number of warmup steps used before this scheduler starts decay

  • warmup_begin_lr (float) – if using warmup, the learning rate from which it starts warming up

  • warmup_mode (string) – warmup can be done in two modes: ‘linear’ mode gradually increases lr with each step in equal increments; ‘constant’ mode keeps lr at warmup_begin_lr for warmup_steps.
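
The cosine formula above can be sketched in plain Python. This is an illustrative re-implementation of the documented formula only, not the library's code: the function name `cosine_lr` is hypothetical, warmup is ignored, and the rate is assumed to hold at `final_lr` once `nup` reaches `max_nup` (which equals 0 under the default `final_lr=0`).

```python
import math


def cosine_lr(nup, max_nup, start_lr=0.01, final_lr=0.0):
    """Cosine decay from start_lr to final_lr over max_nup updates.

    Hypothetical helper that mirrors the documented formula; it does not
    model warmup.
    """
    if nup < max_nup:
        cosine_term = (1 + math.cos(math.pi * nup / max_nup)) / 2
        return final_lr + (start_lr - final_lr) * cosine_term
    # Assumption: after max_nup updates the rate stays at final_lr.
    return final_lr


if __name__ == "__main__":
    # Print the rate at the start, midpoint, and end of a 100-update schedule.
    for nup in (0, 50, 100):
        print(nup, cosine_lr(nup, max_nup=100))
```

The rate starts at `start_lr`, passes through the midpoint between `start_lr` and `final_lr` halfway through the schedule, and ends at `final_lr`.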

class FactorScheduler(step, factor=1, stop_factor_lr=1e-08, base_lr=0.01, warmup_steps=0, warmup_begin_lr=0, warmup_mode='linear')

Bases: mxnet.lr_scheduler.LRScheduler

Reduce the learning rate by a factor every n steps.

It returns a new learning rate by:

base_lr * pow(factor, floor(num_update/step))
Parameters
  • step (int) – Change the learning rate every n updates.

  • factor (float, optional) – The factor by which the learning rate is multiplied at each change.

  • stop_factor_lr (float, optional) – Stop updating the learning rate if it is less than this value.
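
As a rough illustration of the formula above, the following sketch evaluates `base_lr * pow(factor, floor(num_update/step))` directly. The function name `factor_lr` is hypothetical, warmup is ignored, and treating `stop_factor_lr` as a lower bound on the result is one interpretation of the parameter description, not a statement about the library's exact behavior at step boundaries.

```python
def factor_lr(num_update, step, factor=1.0, base_lr=0.01, stop_factor_lr=1e-8):
    """Stepwise decay following the documented formula.

    Hypothetical helper; it does not model warmup.
    """
    # floor(num_update / step) for non-negative integers
    num_decays = num_update // step
    lr = base_lr * pow(factor, num_decays)
    # Assumption: the rate is never reduced below stop_factor_lr.
    return max(lr, stop_factor_lr)


if __name__ == "__main__":
    # Halve the rate every 10 updates, starting from 1.0.
    for num_update in (0, 9, 10, 25):
        print(num_update, factor_lr(num_update, step=10, factor=0.5, base_lr=1.0))
```

With `factor=0.5` and `step=10`, the rate halves at updates 10, 20, 30, and so on, until it would fall below `stop_factor_lr`.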

class LRScheduler(base_lr=0.01, warmup_steps=0, warmup_begin_lr=0, warmup_mode='linear')

Bases: object

Base class of a learning rate scheduler.

A scheduler returns a new learning rate based on the number of updates that have been performed.

Parameters
  • base_lr (float, optional) – The initial learning rate.

  • warmup_steps (int) – number of warmup steps used before this scheduler starts decay

  • warmup_begin_lr (float) – if using warmup, the learning rate from which it starts warming up

  • warmup_mode (string) – warmup can be done in two modes: ‘linear’ mode gradually increases lr with each step in equal increments; ‘constant’ mode keeps lr at warmup_begin_lr for warmup_steps.
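
The two warmup modes described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name `warmup_lr` is hypothetical, and the linear mode is assumed to ramp from `warmup_begin_lr` toward `base_lr` in equal increments over `warmup_steps` updates.

```python
def warmup_lr(num_update, warmup_steps, base_lr=0.01,
              warmup_begin_lr=0.0, warmup_mode="linear"):
    """Learning rate during the warmup phase (num_update < warmup_steps).

    Hypothetical helper illustrating the two documented modes.
    """
    if not 0 <= num_update < warmup_steps:
        raise ValueError("num_update must lie in [0, warmup_steps)")
    if warmup_mode == "linear":
        # Assumption: equal increments from warmup_begin_lr toward base_lr.
        increment = (base_lr - warmup_begin_lr) / warmup_steps
        return warmup_begin_lr + increment * num_update
    if warmup_mode == "constant":
        # The rate stays at warmup_begin_lr for the whole warmup phase.
        return warmup_begin_lr
    raise ValueError("warmup_mode must be 'linear' or 'constant'")


if __name__ == "__main__":
    for mode in ("linear", "constant"):
        print(mode, [warmup_lr(n, 4, base_lr=1.0, warmup_begin_lr=0.2,
                               warmup_mode=mode) for n in range(4)])
```

After warmup ends, the scheduler's own decay rule takes over.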

class MultiFactorScheduler(step, factor=1, base_lr=0.01, warmup_steps=0, warmup_begin_lr=0, warmup_mode='linear')

Bases: mxnet.lr_scheduler.LRScheduler

Reduce the learning rate at each step in a given list of steps.

Assume there exists k such that:

step[k] <= num_update and num_update < step[k+1]

Then calculate the new learning rate by:

base_lr * pow(factor, k+1)
Parameters
  • step (list of int) – The list of update counts at which the learning rate changes.

  • factor (float) – The factor by which the learning rate is multiplied at each change.

  • warmup_steps (int) – number of warmup steps used before this scheduler starts decay

  • warmup_begin_lr (float) – if using warmup, the learning rate from which it starts warming up

  • warmup_mode (string) – warmup can be done in two modes: ‘linear’ mode gradually increases lr with each step in equal increments; ‘constant’ mode keeps lr at warmup_begin_lr for warmup_steps.
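
The rule above amounts to counting how many entries of `step` have been reached. A minimal sketch, assuming `step` is sorted in increasing order and ignoring warmup; the function name `multifactor_lr` is hypothetical, and the code follows the documented inequality rather than any particular boundary handling in the library.

```python
from bisect import bisect_right


def multifactor_lr(num_update, step, factor=1.0, base_lr=0.01):
    """Decay by `factor` once for every entry of `step` already reached.

    Hypothetical helper; assumes `step` is sorted in increasing order.
    """
    # Number of entries s in `step` with s <= num_update. With 0-based k
    # such that step[k] <= num_update < step[k+1], this count is k + 1.
    steps_reached = bisect_right(step, num_update)
    return base_lr * pow(factor, steps_reached)


if __name__ == "__main__":
    # Multiply the rate by 0.1 at updates 10, 20, and 30.
    for num_update in (5, 10, 25, 30):
        print(num_update, multifactor_lr(num_update, [10, 20, 30],
                                         factor=0.1, base_lr=1.0))
```

Before the first entry of `step` is reached, the count is zero and the rate is simply `base_lr`.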

class PolyScheduler(max_update, base_lr=0.01, pwr=2, final_lr=0, warmup_steps=0, warmup_begin_lr=0, warmup_mode='linear')

Bases: mxnet.lr_scheduler.LRScheduler

Reduce the learning rate according to a polynomial of given power.

Calculate the new learning rate, after warmup if any, by:

final_lr + (start_lr - final_lr) * (1-nup/max_nup)^pwr
if nup < max_nup, final_lr otherwise.
Parameters
  • max_update (int) – maximum number of updates before the decay reaches final learning rate.

  • base_lr (float) – base learning rate to start from

  • pwr (int) – power of the decay term as a function of the current number of updates.

  • final_lr (float) – final learning rate after all steps

  • warmup_steps (int) – number of warmup steps used before this scheduler starts decay

  • warmup_begin_lr (float) – if using warmup, the learning rate from which it starts warming up

  • warmup_mode (string) – warmup can be done in two modes: ‘linear’ mode gradually increases lr with each step in equal increments; ‘constant’ mode keeps lr at warmup_begin_lr for warmup_steps.
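
The polynomial formula above can be sketched in plain Python. As with any such sketch, this is an illustration of the documented formula rather than the library's code: the function name `poly_lr` is hypothetical, warmup is ignored, and the rate is assumed to hold at `final_lr` once `nup` reaches `max_nup` (which equals 0 under the default `final_lr=0`).

```python
def poly_lr(nup, max_nup, start_lr=0.01, final_lr=0.0, pwr=2):
    """Polynomial decay from start_lr to final_lr over max_nup updates.

    Hypothetical helper that mirrors the documented formula; it does not
    model warmup.
    """
    if nup < max_nup:
        remaining = 1 - nup / max_nup  # fraction of the schedule left
        return final_lr + (start_lr - final_lr) * remaining ** pwr
    # Assumption: after max_nup updates the rate stays at final_lr.
    return final_lr


if __name__ == "__main__":
    # Compare linear (pwr=1) and quadratic (pwr=2) decay at the midpoint.
    for pwr in (1, 2):
        print(pwr, poly_lr(50, max_nup=100, start_lr=1.0, pwr=pwr))
```

A larger `pwr` makes the rate fall faster early in the schedule and flatten out as it approaches `final_lr`.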