pycmtensor.optimizers
PyCMTensor optimizers module
Module Contents
- class pycmtensor.optimizers.Adam(params: list, b1: float = 0.9, b2: float = 0.999, **kwargs)
Bases: Optimizer
An optimizer that implements the Adam algorithm [1]
- Parameters:
params (list) – a list of TensorSharedVariable
b1 (float, optional) – exponential decay rate for the 1st moment estimates. Defaults to 0.9
b2 (float, optional) – exponential decay rate for the 2nd moment estimates. Defaults to 0.999
- update(cost, params: list, lr: float = 0.001)
Generate a list of updates
- Parameters:
cost (TensorVariable) – the scalar cost expression whose derivatives with respect to params are computed
params (list) – a list of TensorSharedVariable
lr (float, optional) – learning rate. Defaults to 0.001
- Returns:
a list of tuples of (p, p_t), (m, m_t), (v, v_t), (t, t_new)
- Return type:
list
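To illustrate the quantities named in the return value, here is a minimal sketch of one textbook Adam step on a scalar parameter, written in plain Python. It is not PyCMTensor's implementation: the function name `adam_step`, the `eps` constant, and the toy cost are assumptions made for this example.

```python
import math


def adam_step(p, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step for a scalar parameter p with gradient g.

    Returns (p_t, m_t, v_t, t_new), mirroring the pairs listed above.
    """
    t_new = t + 1
    m_t = b1 * m + (1 - b1) * g        # 1st moment (mean) estimate
    v_t = b2 * v + (1 - b2) * g * g    # 2nd moment (uncentred variance) estimate
    m_hat = m_t / (1 - b1 ** t_new)    # bias-corrected 1st moment
    v_hat = v_t / (1 - b2 ** t_new)    # bias-corrected 2nd moment
    p_t = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p_t, m_t, v_t, t_new


# Toy problem (assumption): minimise f(p) = (p - 3)^2, gradient 2 * (p - 3).
p, m, v, t = 0.0, 0.0, 0.0, 0
for _ in range(2000):
    p, m, v, t = adam_step(p, 2 * (p - 3), m, v, t, lr=0.01)
```

Because of the bias correction, the very first step has a magnitude close to `lr` regardless of the gradient scale.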
- class pycmtensor.optimizers.Nadam(params: list, b1: float = 0.99, b2: float = 0.999, **kwargs)
Bases: Adam
An optimizer that implements the Nesterov Adam algorithm [2]
- Parameters:
params (list) – a list of TensorSharedVariable
b1 (float, optional) – exponential decay rate for the 1st moment estimates. Defaults to 0.99
b2 (float, optional) – exponential decay rate for the 2nd moment estimates. Defaults to 0.999
[2] Dozat, T., 2016. Incorporating Nesterov momentum into Adam. Available from: http://cs229.stanford.edu/proj2015/054_report.pdf
- update(cost, params: list, lr: float = 0.001)
Generate a list of updates
- Parameters:
cost (TensorVariable) – the scalar cost expression whose derivatives with respect to params are computed
params (list) – a list of TensorSharedVariable
lr (float, optional) – learning rate. Defaults to 0.001
- Returns:
a list of tuples of (p, p_t), (m, m_t), (v, v_t), (t, t_new)
- Return type:
list
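The sketch below shows a commonly used simplified form of the Nadam step on a scalar parameter: the Adam update with a Nesterov-style look-ahead applied to the first moment. It omits the momentum schedule described in [2] and may differ from what PyCMTensor computes; `nadam_step` and `eps` are hypothetical names introduced for illustration.

```python
import math


def nadam_step(p, g, m, v, t, lr=0.001, b1=0.99, b2=0.999, eps=1e-8):
    """One simplified Nadam step; returns (p_t, m_t, v_t, t_new)."""
    t_new = t + 1
    m_t = b1 * m + (1 - b1) * g
    v_t = b2 * v + (1 - b2) * g * g
    m_hat = m_t / (1 - b1 ** t_new)
    v_hat = v_t / (1 - b2 ** t_new)
    # Nesterov look-ahead: blend the corrected momentum with the current gradient.
    m_bar = b1 * m_hat + (1 - b1) * g / (1 - b1 ** t_new)
    p_t = p - lr * m_bar / (math.sqrt(v_hat) + eps)
    return p_t, m_t, v_t, t_new


p_t, m_t, v_t, t_new = nadam_step(0.0, -6.0, 0.0, 0.0, 0)
```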
- class pycmtensor.optimizers.Adamax(params: list, b1: float = 0.9, b2: float = 0.999, **kwargs)
Bases: Adam
An optimizer that implements the Adamax algorithm [3], a variant of the Adam algorithm based on the infinity norm
- Parameters:
params (list) – a list of TensorSharedVariable
b1 (float, optional) – exponential decay rate for the 1st moment estimates. Defaults to 0.9
b2 (float, optional) – exponential decay rate for the 2nd moment estimates. Defaults to 0.999
[3] Kingma et al., 2014. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980
- update(cost, params: list, lr: float = 0.001)
Generate a list of updates
- Parameters:
cost (TensorVariable) – the scalar cost expression whose derivatives with respect to params are computed
params (list) – a list of TensorSharedVariable
lr (float, optional) – learning rate. Defaults to 0.001
- Returns:
a list of tuples of (p, p_t), (m, m_t), (v, v_t), (t, t_new)
- Return type:
list
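For comparison with Adam, a minimal scalar sketch of the Adamax step from [3] follows. The second accumulator tracks an exponentially weighted infinity norm instead of a squared-gradient average. The name `adamax_step` and the small `eps` guard against division by zero are assumptions for this example, not part of the PyCMTensor API.

```python
def adamax_step(p, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adamax step for a scalar parameter; returns (p_t, m_t, v_t, t_new)."""
    t_new = t + 1
    m_t = b1 * m + (1 - b1) * g      # 1st moment estimate, as in Adam
    v_t = max(b2 * v, abs(g))        # exponentially weighted infinity norm
    # Only the 1st moment needs bias correction.
    p_t = p - (lr / (1 - b1 ** t_new)) * m_t / (v_t + eps)
    return p_t, m_t, v_t, t_new


p_t, m_t, v_t, t_new = adamax_step(0.0, -6.0, 0.0, 0.0, 0)
```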
- class pycmtensor.optimizers.Adadelta(params: list, rho: float = 0.95, **kwargs)
Bases: Optimizer
An optimizer that implements the Adadelta algorithm [4]
Adadelta is a stochastic gradient descent method that adapts the learning rate per dimension to address two drawbacks of Adagrad:
- The continual decay of learning rates throughout training
- The need for a manually selected global learning rate
- Parameters:
params (list) – a list of TensorSharedVariable
rho (float, optional) – decay rate for the running averages of squared gradients and squared updates. Defaults to 0.95
[4] Zeiler, 2012. ADADELTA: An Adaptive Learning Rate Method. http://arxiv.org/abs/1212.5701
- update(cost, params: list, lr: float = 1.0)
Generate a list of updates
- Parameters:
cost (TensorVariable) – the scalar cost expression whose derivatives with respect to params are computed
params (list) – a list of TensorSharedVariable
lr (float, optional) – learning rate. Defaults to 1.0
- Returns:
a list of tuples of (param, param_new), (a, a_t), (d, d_t)
- Return type:
list
Note
Because Adadelta adapts the step size itself, the learning rate defaults to 1.0
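The following sketch of one Adadelta step, as described in [4], shows why a learning rate of 1.0 is a natural default: the step size is the ratio of two running root-mean-square values, so `lr` acts only as an extra scale. The names `adadelta_step` and `eps` are illustrative assumptions, and the accumulators `a` and `d` are taken to correspond to the pairs in the return value.

```python
import math


def adadelta_step(p, g, a, d, lr=1.0, rho=0.95, eps=1e-6):
    """One Adadelta step for a scalar parameter; returns (p_t, a_t, d_t)."""
    a_t = rho * a + (1 - rho) * g * g                       # running avg of squared gradients
    step = -math.sqrt(d + eps) / math.sqrt(a_t + eps) * g   # RMS(updates) / RMS(gradients)
    d_t = rho * d + (1 - rho) * step * step                 # running avg of squared updates
    p_t = p + lr * step
    return p_t, a_t, d_t


p_t, a_t, d_t = adadelta_step(0.0, -6.0, 0.0, 0.0)
```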
- class pycmtensor.optimizers.RMSProp(params: list, rho: float = 0.9, **kwargs)
Bases: Optimizer
An optimizer that implements the RMSprop algorithm [5]
- Parameters:
params (list) – a list of TensorSharedVariable
rho (float, optional) – discounting factor for the history/coming gradient. Defaults to 0.9
[5] Hinton, 2012. rmsprop: Divide the gradient by a running average of its recent magnitude. http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
- update(cost, params: list, lr: float = 0.001)
Generate a list of updates
- Parameters:
cost (TensorVariable) – the scalar cost expression whose derivatives with respect to params are computed
params (list) – a list of TensorSharedVariable
lr (float, optional) – learning rate. Defaults to 0.001
- Returns:
a list of tuples of (param, param_new), (a, a_t)
- Return type:
list
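A minimal scalar sketch of the RMSprop rule from [5] is shown below: the gradient is divided by the square root of a running average of its squared magnitude. The function name `rmsprop_step` and the `eps` constant are assumptions for this example; PyCMTensor's own expression may differ in detail.

```python
import math


def rmsprop_step(p, g, a, lr=0.001, rho=0.9, eps=1e-8):
    """One RMSprop step for a scalar parameter; returns (p_t, a_t)."""
    a_t = rho * a + (1 - rho) * g * g             # running avg of squared gradients
    p_t = p - lr * g / (math.sqrt(a_t) + eps)     # gradient scaled by its recent RMS
    return p_t, a_t


p_t, a_t = rmsprop_step(0.0, -6.0, 0.0)
```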
- class pycmtensor.optimizers.Momentum(params: list, mu: float = 0.9, **kwargs)
Bases: Optimizer
An optimizer that implements the Momentum algorithm [6]
- Parameters:
params (list) – a list of TensorSharedVariable
mu (float, optional) – momentum factor that accelerates updates in the relevant direction and dampens oscillations. Defaults to 0.9
[6] Sutskever et al., 2013. On the importance of initialization and momentum in deep learning. http://jmlr.org/proceedings/papers/v28/sutskever13.pdf
- update(cost, params: list, lr: float = 0.001)
Generate a list of updates
- Parameters:
cost (TensorVariable) – the scalar cost expression whose derivatives with respect to params are computed
params (list) – a list of TensorSharedVariable
lr (float, optional) – the learning rate. Defaults to 0.001
- Returns:
a list of tuples of (param, param_new), (v, v_t)
- Return type:
list
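The classical momentum rule discussed in [6] can be sketched on a scalar parameter as follows. The velocity `v` accumulates past gradients, so repeated gradients in the same direction produce growing steps. `momentum_step` is a hypothetical helper written for this example, not a PyCMTensor function.

```python
def momentum_step(p, g, v, lr=0.001, mu=0.9):
    """One classical momentum step; returns (p_t, v_t)."""
    v_t = mu * v - lr * g    # decay the old velocity, add the new gradient step
    p_t = p + v_t
    return p_t, v_t


# Two steps with the same gradient: the second step is larger than the first.
p1, v1 = momentum_step(0.0, -6.0, 0.0)
p2, v2 = momentum_step(p1, -6.0, v1)
```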
- class pycmtensor.optimizers.NAG(params: list, mu: float = 0.99, **kwargs)
Bases: Momentum
An optimizer that implements the Nesterov Accelerated Gradient algorithm [7]
- Parameters:
params (list) – a list of TensorSharedVariable
mu (float, optional) – momentum factor that accelerates updates in the relevant direction and dampens oscillations. Defaults to 0.99
[7] Sutskever et al., 2013. On the importance of initialization and momentum in deep learning. http://jmlr.org/proceedings/papers/v28/sutskever13.pdf
- update(cost, params: list, lr: float = 0.001)
Generate a list of updates
- Parameters:
cost (TensorVariable) – the scalar cost expression whose derivatives with respect to params are computed
params (list) – a list of TensorSharedVariable
lr (float, optional) – the learning rate. Defaults to 0.001
- Returns:
a list of tuples of (param, param_new), (v, v_t)
- Return type:
list
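One common way to write Nesterov momentum without evaluating the gradient at a look-ahead point is the reformulated rule sketched below, in which the parameter moves by the updated velocity scaled by `mu` plus a plain gradient step. This is an assumed formulation chosen for illustration; the source does not state which variant PyCMTensor uses, and `nag_step` is a hypothetical name.

```python
def nag_step(p, g, v, lr=0.001, mu=0.99):
    """One reformulated Nesterov momentum step; returns (p_t, v_t)."""
    v_t = mu * v - lr * g          # same velocity update as classical momentum
    p_t = p + mu * v_t - lr * g    # look-ahead applied to the parameter update
    return p_t, v_t


p_t, v_t = nag_step(0.0, -6.0, 0.0)
```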
- class pycmtensor.optimizers.SGD(params: list, **kwargs)
Bases: Optimizer
An optimizer that implements the stochastic gradient descent algorithm
- Parameters:
params (list) – a list of TensorSharedVariable
- update(cost, params: list, lr: float = 0.001)
Generate a list of updates
- Parameters:
cost (TensorVariable) – the scalar cost expression whose derivatives with respect to params are computed
params (list) – a list of TensorSharedVariable
lr (float, optional) – the learning rate. Defaults to 0.001
- Returns:
a list of (param, param_new) tuples
- Return type:
list
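Plain stochastic gradient descent keeps no state, which is why only (param, param_new) pairs are returned. A minimal sketch over a list of scalar parameters, with `sgd_step` as a hypothetical helper and the gradients supplied by the caller:

```python
def sgd_step(params, grads, lr=0.001):
    """Return (param, param_new) pairs for one gradient descent step."""
    return [(p, p - lr * g) for p, g in zip(params, grads)]


updates = sgd_step([1.0, -2.0], [2.0, -4.0], lr=0.1)
new_params = [p_new for _, p_new in updates]
```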