:py:mod:`pycmtensor.optimizers`
===============================

.. py:module:: pycmtensor.optimizers

.. autoapi-nested-parse::

   PyCMTensor optimizers module


Module Contents
---------------

.. py:class:: Optimizer(params, name, b1=0.0, b2=0.0, m=0.0, rho=0.0, epsilon=1e-08)

   
   Base optimizer class

   :param params: a list of :class:`expressions.TensorVariable` type objects.
                  Used for constructing optimizer parameters.
   :type params: list


.. py:class:: Adam(params: list, b1=0.9, b2=0.999, **kwargs)

   Bases: :py:obj:`Optimizer`

   
   An optimizer that implements the Adam algorithm [#]_

   :param params: a list of :class:`Betas` and/or :class:`Weights`
   :type params: list
   :param b1: exponential decay rate for the 1st moment estimates.
              Defaults to ``0.9``
   :type b1: float, optional
   :param b2: exponential decay rate for the 2nd moment estimates.
              Defaults to ``0.999``
   :type b2: float, optional

   .. [#] Kingma et al., 2014. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980

   .. py:method:: update(cost, params: list, lr=0.001)

      Calls the optimizer to generate a list of updates

      :param cost: a scalar expression of the cost function from which
                   the derivatives are calculated
      :type cost: TensorVariable
      :param params: a list of :class:`Betas` and/or :class:`Weights`
      :type params: list
      :param lr: learning rate. Defaults to ``0.001``
      :type lr: float, optional

      :returns: a list of tuples of ``(p, p_t), (m, m_t), (v, v_t), (t, t_new)``
      :rtype: list
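
The update pairs listed above can be illustrated with a plain-Python sketch of the Adam rule from Kingma et al. This is a scalar illustration only, not PyCMTensor's symbolic implementation; the helper name ``adam_step`` and the placement of ``epsilon`` outside the square root are assumptions.

```python
import math


def adam_step(p, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, epsilon=1e-8):
    """Hypothetical helper: one Adam step for a scalar parameter p with gradient g."""
    t_new = t + 1
    m_t = b1 * m + (1 - b1) * g        # 1st moment estimate
    v_t = b2 * v + (1 - b2) * g * g    # 2nd moment estimate
    m_hat = m_t / (1 - b1 ** t_new)    # bias-corrected 1st moment
    v_hat = v_t / (1 - b2 ** t_new)    # bias-corrected 2nd moment
    p_t = p - lr * m_hat / (math.sqrt(v_hat) + epsilon)
    return p_t, m_t, v_t, t_new


# Minimise f(p) = p**2, whose gradient is 2 * p, starting from p = 1.0.
p, m, v, t = 1.0, 0.0, 0.0, 0
for _ in range(5):
    p, m, v, t = adam_step(p, 2 * p, m, v, t)
```

Each returned value corresponds to the new half of one ``(old, new)`` pair: ``p_t``, ``m_t``, ``v_t`` and ``t_new``.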


.. py:class:: Adamax(params: list, b1=0.9, b2=0.999, **kwargs)

   Bases: :py:obj:`Optimizer`

   
   An optimizer that implements the Adamax algorithm [#]_, a variant of
   Adam based on the infinity norm

   :param params: a list of :class:`Betas` and/or :class:`Weights`
   :type params: list
   :param b1: exponential decay rate for the 1st moment estimates.
              Defaults to ``0.9``
   :type b1: float, optional
   :param b2: exponential decay rate for the 2nd moment estimates.
              Defaults to ``0.999``
   :type b2: float, optional

   .. [#] Kingma et al., 2014. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980

   .. py:method:: update(cost, params: list, lr=0.001)

      Calls the optimizer to generate a list of updates

      :param cost: a scalar expression of the cost function from which the derivatives are calculated
      :type cost: TensorVariable
      :param params: a list of :class:`Betas` and/or :class:`Weights`
      :type params: list
      :param lr: learning rate. Defaults to ``0.001``
      :type lr: float, optional

      :returns: a list of tuples of ``(p, p_t), (m, m_t), (v, v_t), (t, t_new)``
      :rtype: list
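
As a rough sketch of how Adamax differs from Adam, the second accumulator tracks an exponentially weighted infinity norm of the gradients instead of their squares. The helper name ``adamax_step`` is hypothetical, and adding ``epsilon`` to the denominator is an assumption made here to avoid division by zero; the library's symbolic implementation may differ.

```python
def adamax_step(p, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, epsilon=1e-8):
    """Hypothetical helper: one Adamax step for a scalar parameter p with gradient g."""
    t_new = t + 1
    m_t = b1 * m + (1 - b1) * g      # 1st moment estimate, as in Adam
    v_t = max(b2 * v, abs(g))        # exponentially weighted infinity norm
    # Only the 1st moment needs a bias correction.
    p_t = p - (lr / (1 - b1 ** t_new)) * m_t / (v_t + epsilon)
    return p_t, m_t, v_t, t_new


# One step on f(p) = p**2 at p = 1.0, where the gradient is 2.0.
p_t, m_t, v_t, t_new = adamax_step(p=1.0, g=2.0, m=0.0, v=0.0, t=0)
```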


.. py:class:: Adadelta(params: list, rho=0.95, **kwargs)

   Bases: :py:obj:`Optimizer`

   
   An optimizer that implements the Adadelta algorithm [#]_

   Adadelta is a stochastic gradient descent method that adapts the
   learning rate per dimension to address two drawbacks of Adagrad:

   - The continual decay of learning rates throughout training
   - The need for a manually selected global learning rate

   :param params: a list of :class:`Betas` and/or :class:`Weights`
   :type params: list
   :param rho: decay rate for the running averages of squared gradients
               and squared updates. Defaults to ``0.95``
   :type rho: float, optional

   .. [#] Zeiler, 2012. ADADELTA: An Adaptive Learning Rate Method. http://arxiv.org/abs/1212.5701

   .. py:method:: update(cost, params: list, lr=1.0)

      Calls the optimizer to generate a list of updates

      :param cost: a scalar expression of the cost function from which the derivatives are calculated
      :type cost: TensorVariable
      :param params: a list of :class:`Betas` and/or :class:`Weights`
      :type params: list
      :param lr: learning rate. Defaults to ``1.0``
      :type lr: float, optional

      :returns: a list of tuples of ``(param, param_new), (a, a_t), (d, d_t)``
      :rtype: list

      .. Note::

          Since the Adadelta algorithm uses an adaptive learning rate, the
          learning rate defaults to ``1.0``
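
A minimal sketch of the Adadelta rule from Zeiler's paper, written for a single scalar parameter. The helper name ``adadelta_step`` is hypothetical; treating ``a`` as the running average of squared gradients and ``d`` as the running average of squared updates is an assumption based on the return description above.

```python
import math


def adadelta_step(p, g, a, d, lr=1.0, rho=0.95, epsilon=1e-8):
    """Hypothetical helper: one Adadelta step for a scalar parameter p with gradient g."""
    a_t = rho * a + (1 - rho) * g * g    # running average of squared gradients
    # The step size is the ratio of the two RMS terms, so no global rate is required.
    delta = -math.sqrt(d + epsilon) / math.sqrt(a_t + epsilon) * g
    d_t = rho * d + (1 - rho) * delta * delta    # running average of squared updates
    p_new = p + lr * delta
    return p_new, a_t, d_t


# One step on f(p) = p**2 at p = 1.0, where the gradient is 2.0.
p_new, a_t, d_t = adadelta_step(p=1.0, g=2.0, a=0.0, d=0.0)
```

With ``lr=1.0`` the computed ``delta`` is applied unchanged, which is why no manually tuned rate is needed in this sketch.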


.. py:class:: RMSProp(params: list, rho=0.9, **kwargs)

   Bases: :py:obj:`Optimizer`

   
   An optimizer that implements the RMSprop algorithm [#]_

   :param params: a list of :class:`Betas` and/or :class:`Weights`
   :type params: list
   :param rho: discounting factor that weights the history of squared
               gradients against the incoming gradient. Defaults to ``0.9``
   :type rho: float, optional

   .. [#] Hinton, 2012. rmsprop: Divide the gradient by a running average of its recent magnitude. http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

   .. py:method:: update(cost, params: list, lr=0.001)

      Calls the optimizer to generate a list of updates

      :param cost: a scalar expression of the cost function from which the derivatives are calculated
      :type cost: TensorVariable
      :param params: a list of :class:`Betas` and/or :class:`Weights`
      :type params: list
      :param lr: learning rate. Defaults to ``0.001``
      :type lr: float, optional

      :returns: a list of tuples of ``(param, param_new), (a, a_t)``
      :rtype: list
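
The RMSprop idea of dividing the gradient by a running average of its recent magnitude can be sketched as follows. The helper name ``rmsprop_step`` is hypothetical, and placing ``epsilon`` outside the square root is an assumption; implementations vary on this detail.

```python
import math


def rmsprop_step(p, g, a, lr=0.001, rho=0.9, epsilon=1e-8):
    """Hypothetical helper: one RMSprop step for a scalar parameter p with gradient g."""
    a_t = rho * a + (1 - rho) * g * g    # running average of squared gradients
    p_new = p - lr * g / (math.sqrt(a_t) + epsilon)
    return p_new, a_t


# One step on f(p) = p**2 at p = 1.0, where the gradient is 2.0.
p_new, a_t = rmsprop_step(p=1.0, g=2.0, a=0.0)
```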


.. py:class:: Momentum(params: list, momentum=0.9, nesterov=True, **kwargs)

   Bases: :py:obj:`Optimizer`

   
   An optimizer that implements the Momentum algorithm [#]_

   :param params: a list of :class:`Betas` and/or :class:`Weights`
   :type params: list
   :param momentum: factor that accelerates descent in the relevant
                    direction and dampens oscillations. Defaults to ``0.9``
   :type momentum: float, optional
   :param nesterov: whether to apply Nesterov momentum.
                    Defaults to ``True``
   :type nesterov: bool, optional

   .. [#] Sutskever et al., 2013. On the importance of initialization and momentum in deep learning. http://jmlr.org/proceedings/papers/v28/sutskever13.pdf

   .. py:method:: update(cost, params: list, lr=0.001)

      Calls the optimizer to generate a list of updates

      :param cost: a scalar expression of the cost function from which the derivatives are calculated
      :type cost: TensorVariable
      :param params: a list of :class:`Betas` and/or :class:`Weights`
      :type params: list
      :param lr: the learning rate. Defaults to ``0.001``
      :type lr: float, optional

      :returns: a list of tuples of ``(param, param_new), (v, v_t)``
      :rtype: list
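
The effect of the ``nesterov`` flag can be sketched for a scalar parameter as below. This follows one common formulation in which the Nesterov variant applies the velocity a second time as a look-ahead; the helper name ``momentum_step`` is hypothetical and the exact form used by the library is an assumption.

```python
def momentum_step(p, g, v, lr=0.001, momentum=0.9, nesterov=True):
    """Hypothetical helper: one momentum step for a scalar parameter p with gradient g."""
    v_t = momentum * v - lr * g    # velocity accumulates past gradients
    if nesterov:
        # Look-ahead form: apply the new velocity once more on top of the gradient step.
        p_new = p + momentum * v_t - lr * g
    else:
        # Classical momentum: move by the velocity alone.
        p_new = p + v_t
    return p_new, v_t


# Compare both variants on f(p) = p**2 at p = 1.0, where the gradient is 2.0.
p_nesterov, v_nesterov = momentum_step(1.0, 2.0, 0.0, nesterov=True)
p_classic, v_classic = momentum_step(1.0, 2.0, 0.0, nesterov=False)
```

From a zero initial velocity, the Nesterov variant takes the larger first step because the look-ahead term adds to the plain gradient step.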


.. py:class:: AdaGrad(params: list, **kwargs)

   Bases: :py:obj:`Optimizer`

   
   An optimizer that implements the Adagrad algorithm [#]_

   Adagrad is an optimizer with parameter-specific learning rates, which are
   adapted relative to how frequently a parameter gets updated during training.
   The more updates a parameter receives, the smaller the updates.

   :param params: a list of :class:`Betas` and/or :class:`Weights`
   :type params: list

   .. [#] Duchi et al., 2011. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. https://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf

   .. py:method:: update(cost, params: list, lr=1.0)

      Calls the optimizer to generate a list of updates

      :param cost: a scalar expression of the cost function from which the derivatives are calculated
      :type cost: TensorVariable
      :param params: a list of :class:`Betas` and/or :class:`Weights`
      :type params: list
      :param lr: the learning rate. Defaults to ``1.0``
      :type lr: float, optional

      :returns: a list of tuples of ``(param, param_new), (a, a_t)``
      :rtype: list
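
The shrinking step size described above can be seen in a short scalar sketch. The helper name ``adagrad_step`` is hypothetical, and ``lr=0.1`` is chosen only to keep the numbers readable; ``epsilon`` placement is likewise an assumption.

```python
import math


def adagrad_step(p, g, a, lr=1.0, epsilon=1e-8):
    """Hypothetical helper: one Adagrad step for a scalar parameter p with gradient g."""
    a_t = a + g * g    # squared gradients accumulate and never decay
    p_new = p - lr * g / (math.sqrt(a_t) + epsilon)
    return p_new, a_t


# Feed the same gradient three times and record how far the parameter moves.
p, a = 0.0, 0.0
steps = []
for _ in range(3):
    p_new, a = adagrad_step(p, 1.0, a, lr=0.1)
    steps.append(p - p_new)
    p = p_new
```

Even with an identical gradient at every step, each entry in ``steps`` is smaller than the previous one because the accumulator ``a`` keeps growing.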


.. py:class:: SGD(params: list, **kwargs)

   Bases: :py:obj:`Optimizer`

   
   An optimizer that implements the stochastic gradient descent (SGD) algorithm

   :param params: a list of :class:`Betas` and/or :class:`Weights`
   :type params: list

   .. py:method:: update(cost, params: list, lr=0.001)

      Calls the optimizer to generate a list of updates

      :param cost: a scalar expression of the cost function from which the derivatives are calculated
      :type cost: TensorVariable
      :param params: a list of :class:`Betas` and/or :class:`Weights`
      :type params: list
      :param lr: the learning rate. Defaults to ``0.001``
      :type lr: float, optional

      :returns: a list of tuples of ``(param, param_new)``
      :rtype: list
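
For comparison with the adaptive methods, plain SGD builds one ``(param, param_new)`` pair per parameter. The sketch below uses floats in place of symbolic variables, and the helper name ``sgd_updates`` is hypothetical.

```python
def sgd_updates(params, grads, lr=0.001):
    """Hypothetical helper: return (param, param_new) pairs for plain gradient descent."""
    return [(p, p - lr * g) for p, g in zip(params, grads)]


# Two scalar parameters with their gradients.
updates = sgd_updates([1.0, -2.0], [2.0, -4.0], lr=0.1)
```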