pycmtensor.optimizers

PyCMTensor optimizers module

Module Contents

class pycmtensor.optimizers.Optimizer(params, name, b1=0.0, b2=0.0, m=0.0, rho=0.0, epsilon=1e-08)

Base optimizer class

Parameters

params (list) – a list of expressions.TensorVariable objects used to construct the optimizer parameters.

class pycmtensor.optimizers.Adam(params: list, b1=0.9, b2=0.999, **kwargs)

Bases: Optimizer

An optimizer that implements the Adam algorithm [1]

Parameters
  • params (list) – a list of Betas and/or Weights

  • b1 (float, optional) – exponential decay rate for the 1st moment estimates. Defaults to 0.9

  • b2 (float, optional) – exponential decay rate for the 2nd moment estimates. Defaults to 0.999

[1] Kingma and Ba, 2014. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980

update(cost, params: list, lr=0.001)

Calls the optimizer to generate a list of updates

Parameters
  • cost (TensorVariable) – a scalar expression for the cost function, whose derivatives with respect to params are computed

  • params (list) – a list of Betas and/or Weights

  • lr (float, optional) – learning rate. Defaults to 0.001

Returns

a list of tuples of (p, p_t), (m, m_t), (v, v_t), (t, t_new)

Return type

list
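The update pairs listed above follow the naming of the Adam paper. As a rough illustration of the arithmetic behind them, here is a minimal scalar sketch in plain Python. It is not PyCMTensor's implementation: the function `adam_step` and the toy cost are hypothetical, and the real class operates on symbolic tensor expressions rather than floats.

```python
import math


def adam_step(p, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, epsilon=1e-8):
    """One Adam step for a scalar parameter p with gradient g (illustration only)."""
    t_new = t + 1
    m_t = b1 * m + (1 - b1) * g        # 1st moment estimate
    v_t = b2 * v + (1 - b2) * g * g    # 2nd moment estimate
    m_hat = m_t / (1 - b1 ** t_new)    # bias-corrected 1st moment
    v_hat = v_t / (1 - b2 ** t_new)    # bias-corrected 2nd moment
    p_t = p - lr * m_hat / (math.sqrt(v_hat) + epsilon)
    return p_t, m_t, v_t, t_new


# Hypothetical toy cost (p - 3)**2, whose gradient is 2 * (p - 3).
p, m, v, t = 0.0, 0.0, 0.0, 0
for _ in range(3):
    p, m, v, t = adam_step(p, 2 * (p - 3), m, v, t)
```

Because of the bias correction, each early step has a magnitude close to lr regardless of the gradient scale.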

class pycmtensor.optimizers.Adamax(params: list, b1=0.9, b2=0.999, **kwargs)

Bases: Optimizer

An optimizer that implements the Adamax algorithm [2], a variant of the Adam algorithm based on the infinity norm

Parameters
  • params (list) – a list of Betas and/or Weights

  • b1 (float, optional) – exponential decay rate for the 1st moment estimates. Defaults to 0.9

  • b2 (float, optional) – exponential decay rate for the 2nd moment estimates. Defaults to 0.999

[2] Kingma and Ba, 2014. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980

update(cost, params: list, lr=0.001)

Calls the optimizer to generate a list of updates

Parameters
  • cost (TensorVariable) – a scalar expression for the cost function, whose derivatives with respect to params are computed

  • params (list) – a list of Betas and/or Weights

  • lr (float, optional) – learning rate. Defaults to 0.001

Returns

a list of tuples of (p, p_t), (m, m_t), (v, v_t), (t, t_new)

Return type

list
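For comparison with Adam, the sketch below shows the Adamax rule from the paper on a single scalar. It is an illustration under assumptions, not the library's code: `adamax_step` is a hypothetical helper, `v` plays the role of the exponentially weighted infinity norm, and `epsilon` is added here only to guard against division by zero.

```python
def adamax_step(p, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, epsilon=1e-8):
    """One Adamax step for a scalar parameter p with gradient g (illustration only)."""
    t_new = t + 1
    m_t = b1 * m + (1 - b1) * g    # 1st moment estimate, as in Adam
    v_t = max(b2 * v, abs(g))      # exponentially weighted infinity norm
    # Only the 1st moment needs bias correction in Adamax.
    p_t = p - (lr / (1 - b1 ** t_new)) * m_t / (v_t + epsilon)
    return p_t, m_t, v_t, t_new


# Hypothetical first step on the toy cost (p - 3)**2 starting from p = 0.
p_t, m_t, v_t, t_new = adamax_step(0.0, 2 * (0.0 - 3), 0.0, 0.0, 0)
```

Replacing the squared-gradient average with a running maximum is the only structural difference from the Adam step.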

class pycmtensor.optimizers.Adadelta(params: list, rho=0.95, **kwargs)

Bases: Optimizer

An optimizer that implements the Adadelta algorithm [3]

Adadelta is a stochastic gradient descent method that adapts the learning rate per dimension to address two drawbacks of Adagrad:

  • The continual decay of learning rates throughout training

  • The need for a manually selected global learning rate

Parameters
  • params (list) – a list of Betas and/or Weights

  • rho (float, optional) – decay rate of the running averages used to adapt the learning rate. Defaults to 0.95

[3] Zeiler, 2012. ADADELTA: An Adaptive Learning Rate Method. http://arxiv.org/abs/1212.5701

update(cost, params: list, lr=1.0)

Calls the optimizer to generate a list of updates

Parameters
  • cost (TensorVariable) – a scalar expression for the cost function, whose derivatives with respect to params are computed

  • params (list) – a list of Betas and/or Weights

  • lr (float, optional) – learning rate. Defaults to 1.0

Returns

a list of tuples of (param, param_new), (a, a_t), (d, d_t)

Return type

list

Note

Because the Adadelta algorithm adapts the step size itself, the learning rate defaults to 1.0
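To make the roles of the two accumulators concrete, here is a minimal scalar sketch of the Adadelta rule from Zeiler's paper. The helper `adadelta_step` is hypothetical and only mirrors the names `a` and `d` from the Returns entry; it is not taken from PyCMTensor's source.

```python
import math


def adadelta_step(p, g, a, d, lr=1.0, rho=0.95, epsilon=1e-8):
    """One Adadelta step for a scalar parameter p with gradient g (illustration only)."""
    a_t = rho * a + (1 - rho) * g * g    # running average of squared gradients
    # Step size is the ratio of the RMS of past updates to the RMS of gradients.
    step = g * math.sqrt(d + epsilon) / math.sqrt(a_t + epsilon)
    d_t = rho * d + (1 - rho) * step * step    # running average of squared updates
    param_new = p - lr * step
    return param_new, a_t, d_t


# Hypothetical first step on the toy cost (p - 3)**2 starting from p = 0.
param_new, a_t, d_t = adadelta_step(0.0, 2 * (0.0 - 3), 0.0, 0.0)
```

With `d` starting at zero, the first steps are small because only `epsilon` contributes to the numerator; the step size grows as `d` accumulates.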

class pycmtensor.optimizers.RMSProp(params: list, rho=0.9, **kwargs)

Bases: Optimizer

An optimizer that implements the RMSprop algorithm [4]

Parameters
  • params (list) – a list of Betas and/or Weights

  • rho (float, optional) – discounting factor that weights the gradient history against the incoming gradient. Defaults to 0.9

[4] Hinton, 2012. rmsprop: Divide the gradient by a running average of its recent magnitude. http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

update(cost, params: list, lr=0.001)

Calls the optimizer to generate a list of updates

Parameters
  • cost (TensorVariable) – a scalar expression for the cost function, whose derivatives with respect to params are computed

  • params (list) – a list of Betas and/or Weights

  • lr (float, optional) – learning rate. Defaults to 0.001

Returns

a list of tuples of (param, param_new), (a, a_t)

Return type

list
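The following scalar sketch illustrates the RMSprop idea of dividing the gradient by a running average of its recent magnitude. `rmsprop_step` is a hypothetical helper, and placing `epsilon` outside the square root is an assumption; the library may differ.

```python
import math


def rmsprop_step(p, g, a, lr=0.001, rho=0.9, epsilon=1e-8):
    """One RMSprop step for a scalar parameter p with gradient g (illustration only)."""
    a_t = rho * a + (1 - rho) * g * g    # running average of squared gradients
    param_new = p - lr * g / (math.sqrt(a_t) + epsilon)
    return param_new, a_t


# Hypothetical first step on the toy cost (p - 3)**2 starting from p = 0.
param_new, a_t = rmsprop_step(0.0, 2 * (0.0 - 3), 0.0)
```

Unlike the Adam sketch, there is no bias correction here, so the first step is scaled by 1 / sqrt(1 - rho).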

class pycmtensor.optimizers.Momentum(params: list, momentum=0.9, nesterov=True, **kwargs)

Bases: Optimizer

An optimizer that implements the Momentum algorithm [5]

Parameters
  • params (list) – a list of Betas and/or Weights

  • momentum (float, optional) – factor that accelerates updates in the relevant direction and dampens oscillations. Defaults to 0.9

  • nesterov (bool, optional) – whether to apply Nesterov momentum. Defaults to True

[5] Sutskever et al., 2013. On the importance of initialization and momentum in deep learning. http://jmlr.org/proceedings/papers/v28/sutskever13.pdf

update(cost, params: list, lr=0.001)

Calls the optimizer to generate a list of updates

Parameters
  • cost (TensorVariable) – a scalar expression for the cost function, whose derivatives with respect to params are computed

  • params (list) – a list of Betas and/or Weights

  • lr (float, optional) – the learning rate. Defaults to 0.001

Returns

a list of tuples of (param, param_new), (v, v_t)

Return type

list
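The difference between classical and Nesterov momentum can be seen in a short scalar sketch. This uses one common formulation of the two rules and is an assumption about the general technique, not a copy of PyCMTensor's code; `momentum_step` is a hypothetical helper.

```python
def momentum_step(p, g, v, lr=0.001, momentum=0.9, nesterov=True):
    """One momentum step for a scalar parameter p with gradient g (illustration only)."""
    v_t = momentum * v - lr * g    # velocity accumulates past gradients
    if nesterov:
        # Look-ahead form: apply the velocity once more before the gradient step.
        param_new = p + momentum * v_t - lr * g
    else:
        param_new = p + v_t
    return param_new, v_t


# Hypothetical first step on the toy cost (p - 3)**2 starting from p = 0.
grad = 2 * (0.0 - 3)
classical, _ = momentum_step(0.0, grad, 0.0, nesterov=False)
lookahead, _ = momentum_step(0.0, grad, 0.0, nesterov=True)
```

In this formulation the Nesterov variant takes a larger first step than the classical one, because the fresh velocity is applied in addition to the plain gradient step.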

class pycmtensor.optimizers.AdaGrad(params: list, **kwargs)

Bases: Optimizer

An optimizer that implements the Adagrad algorithm [6]

Adagrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. The more updates a parameter receives, the smaller the updates.

Parameters

params (list) – a list of Betas and/or Weights

[6] Duchi et al., 2011. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. https://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf

update(cost, params: list, lr=1.0)

Calls the optimizer to generate a list of updates

Parameters
  • cost (TensorVariable) – a scalar expression for the cost function, whose derivatives with respect to params are computed

  • params (list) – a list of Betas and/or Weights

  • lr (float, optional) – the learning rate. Defaults to 1.0

Returns

a list of tuples of (param, param_new), (a, a_t)

Return type

list
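The claim that updates shrink as a parameter receives more of them follows directly from the accumulator. The scalar sketch below illustrates this with a hypothetical helper `adagrad_step`; the placement of `epsilon` is an assumption and the library's version may differ.

```python
import math


def adagrad_step(p, g, a, lr=1.0, epsilon=1e-8):
    """One Adagrad step for a scalar parameter p with gradient g (illustration only)."""
    a_t = a + g * g    # accumulated squared gradients, never decays
    param_new = p - lr * g / (math.sqrt(a_t) + epsilon)
    return param_new, a_t


# Apply the same hypothetical gradient twice and compare the step sizes.
p0, a0 = 0.0, 0.0
p1, a1 = adagrad_step(p0, -6.0, a0)
p2, a2 = adagrad_step(p1, -6.0, a1)
first_step, second_step = p1 - p0, p2 - p1
```

Even with an identical gradient, `second_step` is smaller than `first_step` because the accumulator `a` has grown.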

class pycmtensor.optimizers.SGD(params: list, **kwargs)

Bases: Optimizer

An optimizer that implements the stochastic gradient descent (SGD) algorithm

Parameters

params (list) – a list of Betas and/or Weights

update(cost, params: list, lr=0.001)

Calls the optimizer to generate a list of updates

Parameters
  • cost (TensorVariable) – a scalar expression for the cost function, whose derivatives with respect to params are computed

  • params (list) – a list of Betas and/or Weights

  • lr (float, optional) – the learning rate. Defaults to 0.001

Returns

a list of tuples of (param, param_new)

Return type

list
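As a baseline for the adaptive methods above, plain gradient descent keeps no state beyond the parameter itself. The scalar sketch below uses a hypothetical helper `sgd_step` and a toy cost, with a larger learning rate than the documented default so that convergence is visible in a few iterations.

```python
def sgd_step(p, g, lr=0.001):
    """One gradient descent step for a scalar parameter p with gradient g (illustration only)."""
    return p - lr * g


# Hypothetical toy cost (p - 3)**2, whose gradient is 2 * (p - 3).
p = 0.0
for _ in range(100):
    p = sgd_step(p, 2 * (p - 3), lr=0.1)
```

On this cost each step shrinks the distance to the minimum at 3 by a constant factor of 1 - 2 * lr.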