pycmtensor.optimizers
PyCMTensor optimizers module
Module Contents
- class pycmtensor.optimizers.Optimizer(params, name, b1=0.0, b2=0.0, m=0.0, rho=0.0, epsilon=1e-08)
Base optimizer class.
- Parameters
params (list) – a list of expressions.TensorVariable type objects, used to construct the optimizer parameters.
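The update() methods documented below do not modify parameters in place. Each one returns a list of (variable, updated value) tuples. In Theano/Aesara, a list like this is what gets passed as the updates argument when a training function is compiled, and every update in it is applied together after each call. The pure-Python sketch below mimics that contract on plain floats. Var, sgd_updates and apply_updates are hypothetical helpers for illustration only, not part of pycmtensor.

```python
from dataclasses import dataclass


@dataclass
class Var:
    """Hypothetical stand-in for a stateful (shared) model variable."""

    name: str
    value: float


def sgd_updates(params, grads, lr):
    """Return (variable, new_value) pairs; nothing is modified yet."""
    return [(p, p.value - lr * g) for p, g in zip(params, grads)]


def apply_updates(updates):
    """Assign each new value; all of them were computed from the pre-step state."""
    for var, new_value in updates:
        var.value = new_value


# f(x, y) = x**2 + y**2 has gradient (2x, 2y)
x, y = Var("x", 3.0), Var("y", -1.0)
updates = sgd_updates([x, y], [2 * x.value, 2 * y.value], lr=0.25)
apply_updates(updates)
print(x.value, y.value)  # 1.5 -0.5
```

Keeping "compute the new values" separate from "assign them" is what lets optimizer state (moment estimates, step counters, velocities) travel in the same list as the parameters themselves.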
- class pycmtensor.optimizers.Adam(params: list, b1=0.9, b2=0.999, **kwargs)
Bases: Optimizer
An optimizer that implements the Adam algorithm [1].
- Parameters
params (list) – a list of Betas and/or Weights
b1 (float, optional) – exponential decay rate for the 1st moment estimates. Defaults to 0.9
b2 (float, optional) – exponential decay rate for the 2nd moment estimates. Defaults to 0.999
- [1] Kingma and Ba, 2014. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980
- update(cost, params: list, lr=0.001)
Generates the list of updates for this optimizer.
- Parameters
cost (TensorVariable) – a scalar expression of the cost function, from which the derivatives are calculated
params (list) – a list of Betas and/or Weights
lr (float, optional) – learning rate. Defaults to 0.001
- Returns
a list of (variable, updated value) tuples: (p, p_t), (m, m_t), (v, v_t), (t, t_new)
- Return type
list
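To make the Adam update rule concrete, here is a minimal sketch of a single Adam step written in plain Python on lists of floats, rather than on symbolic TensorVariable expressions as pycmtensor presumably does. adam_step is a hypothetical helper. Its keyword names mirror the documented b1, b2 and lr, and epsilon mirrors the base-class argument. The m, v and t values it returns play the role of the m, v and t variables in the update tuples above.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.001, b1=0.9, b2=0.999, epsilon=1e-8):
    """One Adam step on lists of floats; returns (new_params, new_m, new_v, new_t)."""
    t_new = t + 1
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_t = b1 * m_i + (1 - b1) * g      # biased 1st moment estimate
        v_t = b2 * v_i + (1 - b2) * g * g  # biased 2nd (raw) moment estimate
        m_hat = m_t / (1 - b1 ** t_new)    # bias-corrected estimates
        v_hat = v_t / (1 - b2 ** t_new)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + epsilon))
        new_m.append(m_t)
        new_v.append(v_t)
    return new_params, new_m, new_v, t_new


# Minimise f(x) = (x - 3)**2, whose gradient is 2 * (x - 3)
x, m, v, t = [0.0], [0.0], [0.0], 0
for _ in range(2000):
    x, m, v, t = adam_step(x, [2 * (x[0] - 3)], m, v, t, lr=0.1)
print(x[0])
```

On the first step the bias-corrected ratio m_hat / sqrt(v_hat) is about ±1, so each parameter moves by roughly lr regardless of how large its gradient is.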
- class pycmtensor.optimizers.Adamax(params: list, b1=0.9, b2=0.999, **kwargs)
Bases: Optimizer
An optimizer that implements the Adamax algorithm [2], a variant of Adam based on the infinity norm.
- Parameters
params (list) – a list of Betas and/or Weights
b1 (float, optional) – exponential decay rate for the 1st moment estimates. Defaults to 0.9
b2 (float, optional) – exponential decay rate for the weighted infinity norm of past gradients. Defaults to 0.999
- [2] Kingma and Ba, 2014. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980
- update(cost, params: list, lr=0.001)
Generates the list of updates for this optimizer.
- Parameters
cost (TensorVariable) – a scalar expression of the cost function, from which the derivatives are calculated
params (list) – a list of Betas and/or Weights
lr (float, optional) – learning rate. Defaults to 0.001
- Returns
a list of (variable, updated value) tuples: (p, p_t), (m, m_t), (v, v_t), (t, t_new)
- Return type
list
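Adamax replaces Adam's second-moment estimate with an exponentially weighted infinity norm, u_t = max(b2 * u, |g|), which needs no bias correction. A minimal plain-Python sketch of one step follows. adamax_step is hypothetical, and the epsilon added to the denominator is an assumption to avoid division by zero when every gradient seen so far is 0; the update rule in the paper has no such term.

```python
def adamax_step(params, grads, m, u, t, lr=0.001, b1=0.9, b2=0.999, epsilon=1e-8):
    """One Adamax step on lists of floats; returns (new_params, new_m, new_u, new_t)."""
    t_new = t + 1
    step_size = lr / (1 - b1 ** t_new)  # bias correction is needed for m only
    new_params, new_m, new_u = [], [], []
    for p, g, m_i, u_i in zip(params, grads, m, u):
        m_t = b1 * m_i + (1 - b1) * g  # 1st moment estimate, as in Adam
        u_t = max(b2 * u_i, abs(g))    # exponentially weighted infinity norm
        # epsilon is an assumed guard against u_t == 0; the paper's rule has none
        new_params.append(p - step_size * m_t / (u_t + epsilon))
        new_m.append(m_t)
        new_u.append(u_t)
    return new_params, new_m, new_u, t_new


# Minimise f(x) = (x - 3)**2, whose gradient is 2 * (x - 3)
x, m, u, t = [0.0], [0.0], [0.0], 0
for _ in range(2000):
    x, m, u, t = adamax_step(x, [2 * (x[0] - 3)], m, u, t, lr=0.1)
print(x[0])
```

Because u_t is a running maximum, a single large gradient keeps the step size small for a while, and u only forgets it at the rate set by b2.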
- class pycmtensor.optimizers.Adadelta(params: list, rho=0.95, **kwargs)
Bases: Optimizer
An optimizer that implements the Adadelta algorithm [3].
Adadelta is a stochastic gradient descent method that uses an adaptive, per-dimension learning rate to address two drawbacks of Adagrad:
The continual decay of learning rates throughout training
The need for a manually selected global learning rate
- Parameters
params (list) – a list of Betas and/or Weights
rho (float, optional) – decay rate of the running averages of squared gradients and squared updates. Defaults to 0.95
- [3] Zeiler, 2012. ADADELTA: An Adaptive Learning Rate Method. http://arxiv.org/abs/1212.5701
- update(cost, params: list, lr=1.0)
Generates the list of updates for this optimizer.
- Parameters
cost (TensorVariable) – a scalar expression of the cost function, from which the derivatives are calculated
params (list) – a list of Betas and/or Weights
lr (float, optional) – learning rate. Defaults to 1.0
- Returns
a list of (variable, updated value) tuples: (param, param_new), (a, a_t), (d, d_t)
- Return type
list
Note
Since Adadelta adapts its step size per dimension, the learning rate defaults to 1.0.
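The following minimal sketch follows Zeiler's per-dimension update: a running average of squared gradients (a) and of squared updates (d), whose RMS ratio sets the step size. It works on plain Python floats. adadelta_step is hypothetical, and multiplying the final step by lr (default 1.0) is an assumption about how the documented lr argument is applied.

```python
import math


def adadelta_step(params, grads, a, d, lr=1.0, rho=0.95, epsilon=1e-8):
    """One Adadelta step on lists of floats; returns (new_params, new_a, new_d).

    a: running average of squared gradients, E[g^2]
    d: running average of squared updates,   E[dx^2]
    """
    new_params, new_a, new_d = [], [], []
    for p, g, a_i, d_i in zip(params, grads, a, d):
        a_t = rho * a_i + (1 - rho) * g * g
        # Step = -(RMS of past updates / RMS of gradients) * g
        delta = -math.sqrt(d_i + epsilon) / math.sqrt(a_t + epsilon) * g
        d_t = rho * d_i + (1 - rho) * delta * delta
        new_params.append(p + lr * delta)  # lr scaling is an assumption
        new_a.append(a_t)
        new_d.append(d_t)
    return new_params, new_a, new_d


# Minimise f(x) = (x - 3)**2, whose gradient is 2 * (x - 3)
x, a, d = [0.0], [0.0], [0.0]
for _ in range(100):
    x, a, d = adadelta_step(x, [2 * (x[0] - 3)], a, d)
print(x[0])
```

Because d starts at zero, the first steps are only on the order of sqrt(epsilon) and grow as d accumulates, so Adadelta has a characteristic slow warm-up even with lr = 1.0.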
- class pycmtensor.optimizers.RMSProp(params: list, rho=0.9, **kwargs)
Bases: Optimizer
An optimizer that implements the RMSprop algorithm [4].
- Parameters
params (list) – a list of Betas and/or Weights
rho (float, optional) – discounting factor for the moving average of past squared gradients. Defaults to 0.9
- [4] Hinton, 2012. rmsprop: Divide the gradient by a running average of its recent magnitude. http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
- update(cost, params: list, lr=0.001)
Generates the list of updates for this optimizer.
- Parameters
cost (TensorVariable) – a scalar expression of the cost function, from which the derivatives are calculated
params (list) – a list of Betas and/or Weights
lr (float, optional) – learning rate. Defaults to 0.001
- Returns
a list of (variable, updated value) tuples: (param, param_new), (a, a_t)
- Return type
list
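Here is a minimal plain-Python sketch of one RMSProp step: the gradient is divided by the square root of a moving average of its squared values, with rho as the decay factor. rmsprop_step is a hypothetical helper, and adding epsilon after the square root is an assumption; implementations differ on this detail.

```python
import math


def rmsprop_step(params, grads, a, lr=0.001, rho=0.9, epsilon=1e-8):
    """One RMSProp step; a holds the moving average of squared gradients."""
    new_params, new_a = [], []
    for p, g, a_i in zip(params, grads, a):
        a_t = rho * a_i + (1 - rho) * g * g
        # Divide the gradient by the root of its recent mean square
        new_params.append(p - lr * g / (math.sqrt(a_t) + epsilon))
        new_a.append(a_t)
    return new_params, new_a


# Minimise f(x) = (x - 3)**2, whose gradient is 2 * (x - 3)
x, a = [0.0], [0.0]
for _ in range(2000):
    x, a = rmsprop_step(x, [2 * (x[0] - 3)], a, lr=0.01)
print(x[0])
```

Since a_t is at least (1 - rho) * g**2, no single step can exceed lr / sqrt(1 - rho), about 0.03 here, which bounds how far the iterate can overshoot the minimum.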
- class pycmtensor.optimizers.Momentum(params: list, momentum=0.9, nesterov=True, **kwargs)
Bases: Optimizer
An optimizer that implements the Momentum algorithm [5].
- Parameters
params (list) – a list of Betas and/or Weights
momentum (float, optional) – acceleration factor that speeds up updates in the relevant direction and dampens oscillations. Defaults to 0.9
nesterov (bool, optional) – whether to apply Nesterov momentum. Defaults to True
- [5] Sutskever et al., 2013. On the importance of initialization and momentum in deep learning. http://jmlr.org/proceedings/papers/v28/sutskever13.pdf
- update(cost, params: list, lr=0.001)
Generates the list of updates for this optimizer.
- Parameters
cost (TensorVariable) – a scalar expression of the cost function, from which the derivatives are calculated
params (list) – a list of Betas and/or Weights
lr (float, optional) – the learning rate. Defaults to 0.001
- Returns
a list of (variable, updated value) tuples: (param, param_new), (v, v_t)
- Return type
list
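The two variants selected by nesterov can be sketched using the formulation of Sutskever et al. (2013). Classical momentum evaluates the gradient at the current parameters, while Nesterov momentum evaluates it at the look-ahead point params + momentum * velocity. momentum_step and grad are hypothetical. pycmtensor may use an algebraically rearranged form of the Nesterov update, so treat this as an illustration of the idea rather than its exact code.

```python
def momentum_step(params, velocity, grad_fn, lr=0.001, momentum=0.9, nesterov=True):
    """One classical or Nesterov momentum step on lists of floats."""
    if nesterov:
        # Evaluate the gradient at the look-ahead point params + momentum * velocity
        point = [p + momentum * v for p, v in zip(params, velocity)]
    else:
        point = params
    grads = grad_fn(point)
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


def grad(point):
    # Gradient of f(x) = (x - 3)**2
    return [2 * (point[0] - 3)]


results = {}
for use_nesterov in (True, False):
    x, vel = [0.0], [0.0]
    for _ in range(1000):
        x, vel = momentum_step(x, vel, grad, lr=0.01, nesterov=use_nesterov)
    results[use_nesterov] = x[0]
print(results)
```

With the velocity starting at zero, the first step of both variants is identical because the look-ahead point coincides with the current parameters; they only diverge once velocity has built up.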
- class pycmtensor.optimizers.AdaGrad(params: list, **kwargs)
Bases: Optimizer
An optimizer that implements the Adagrad algorithm [6].
Adagrad uses parameter-specific learning rates that are adapted according to how often, and how strongly, each parameter is updated during training: the more updates a parameter receives, the smaller its subsequent updates become.
- Parameters
params (list) – a list of Betas and/or Weights
- [6] Duchi et al., 2011. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. https://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
- update(cost, params: list, lr=1.0)
Generates the list of updates for this optimizer.
- Parameters
cost (TensorVariable) – a scalar expression of the cost function, from which the derivatives are calculated
params (list) – a list of Betas and/or Weights
lr (float, optional) – the learning rate. Defaults to 1.0
- Returns
a list of (variable, updated value) tuples: (param, param_new), (a, a_t)
- Return type
list
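The shrinking per-parameter step size can be seen in this minimal plain-Python sketch. The accumulator a sums every squared gradient and never decays, so the effective rate lr / sqrt(a) can only decrease. adagrad_step is hypothetical and epsilon is an assumed numerical safeguard. lr defaults to 1.0, as in the documented update().

```python
import math


def adagrad_step(params, grads, a, lr=1.0, epsilon=1e-8):
    """One AdaGrad step; a accumulates the sum of all squared gradients so far."""
    new_params, new_a = [], []
    for p, g, a_i in zip(params, grads, a):
        a_t = a_i + g * g  # never decays, so the effective rate only shrinks
        new_params.append(p - lr * g / (math.sqrt(a_t) + epsilon))
        new_a.append(a_t)
    return new_params, new_a


# Minimise f(x) = (x - 3)**2, whose gradient is 2 * (x - 3)
x, a = [0.0], [0.0]
effective_rates = []  # lr / sqrt(a) after each step, with lr = 1.0
for _ in range(200):
    x, a = adagrad_step(x, [2 * (x[0] - 3)], a)
    effective_rates.append(1.0 / (math.sqrt(a[0]) + 1e-8))
print(x[0], effective_rates[0], effective_rates[-1])
```

On a problem where gradients keep arriving (for example a sparse feature that appears late in training), this monotone decay is exactly the "continual decay of learning rates" that Adadelta was designed to avoid.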
- class pycmtensor.optimizers.SGD(params: list, **kwargs)
Bases: Optimizer
An optimizer that implements the stochastic gradient descent algorithm.
- Parameters
params (list) – a list of Betas and/or Weights
- update(cost, params: list, lr=0.001)
Generates the list of updates for this optimizer.
- Parameters
cost (TensorVariable) – a scalar expression of the cost function, from which the derivatives are calculated
params (list) – a list of Betas and/or Weights
lr (float, optional) – the learning rate. Defaults to 0.001
- Returns
a list of (variable, updated value) tuples: (param, param_new)
- Return type
list
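For comparison, plain stochastic gradient descent needs only the current gradient and keeps no extra state. The sketch below (sgd_step is hypothetical) estimates a mean by applying the update to one randomly drawn sample at a time, which is what makes the gradient stochastic.

```python
import random


def sgd_step(params, grads, lr=0.001):
    """One gradient descent step: every parameter moves against its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]


# Estimate mu by minimising (y - mu)**2 one random sample at a time.
# The per-sample gradient with respect to mu is -2 * (y - mu).
data = [1.0, 2.0, 3.0, 4.0, 5.0]
rng = random.Random(0)  # fixed seed so the run is reproducible
mu = [0.0]
for _ in range(10_000):
    y = rng.choice(data)
    mu = sgd_step(mu, [-2 * (y - mu[0])], lr=0.001)
print(mu[0])  # should be close to the sample mean of 3.0
```

With a constant learning rate the estimate keeps fluctuating around the minimum; a smaller lr reduces the fluctuation at the cost of slower progress.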