:py:mod:`pycmtensor.optimizers`
===============================

.. py:module:: pycmtensor.optimizers

.. autoapi-nested-parse::

   PyCMTensor optimizers module


Module Contents
---------------

.. py:class:: Adam(params: list, b1: float = 0.9, b2: float = 0.999, **kwargs)


   Bases: :py:obj:`Optimizer`

   
   An optimizer that implements the Adam algorithm [#]_

   :param params: a list of ``TensorSharedVariable``
   :type params: list
   :param b1: exponential decay rate for the 1st moment estimates.
              Defaults to ``0.9``
   :type b1: float, optional
   :param b2: exponential decay rate for the 2nd moment estimates.
              Defaults to ``0.999``
   :type b2: float, optional

   .. [#] Kingma et al., 2014. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980
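
   The update rule can be sketched in plain Python for a single scalar
   parameter. This is an illustration under stated assumptions, not
   pycmtensor's implementation: ``adam_step`` is a hypothetical helper,
   ``eps = 1e-8`` is an assumed stability constant, and the library itself
   builds symbolic updates for ``TensorSharedVariable`` objects rather than
   operating on floats.

   .. code-block:: python

      import math


      # Hypothetical helper for illustration; not part of pycmtensor.
      def adam_step(p, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
          """One Adam step for a scalar parameter ``p`` with gradient ``g``.

          Returns ``(p_t, m_t, v_t, t_new)``. ``eps`` is an assumed constant.
          """
          t_new = t + 1
          m_t = b1 * m + (1.0 - b1) * g      # 1st moment (mean) estimate
          v_t = b2 * v + (1.0 - b2) * g * g  # 2nd moment (uncentred variance) estimate
          m_hat = m_t / (1.0 - b1 ** t_new)  # bias-corrected 1st moment
          v_hat = v_t / (1.0 - b2 ** t_new)  # bias-corrected 2nd moment
          p_t = p - lr * m_hat / (math.sqrt(v_hat) + eps)
          return p_t, m_t, v_t, t_new


      # Minimise f(p) = (p - 3)^2, whose gradient is 2 * (p - 3)
      p, m, v, t = 0.0, 0.0, 0.0, 0
      for _ in range(5000):
          p, m, v, t = adam_step(p, 2.0 * (p - 3.0), m, v, t, lr=0.01)

   The four returned values mirror the ``(p, p_t), (m, m_t), (v, v_t),
   (t, t_new)`` pairs that ``update`` is documented to produce. Because of the
   bias correction, the first step has a magnitude of roughly ``lr``
   regardless of the gradient's scale.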

   .. py:property:: t


   .. py:property:: m_prev


   .. py:property:: v_prev


   .. py:method:: update(cost, params: list, lr: float = 0.001)

      Generate a list of updates

      :param cost: the scalar cost expression from which the gradients with
                   respect to ``params`` are computed
      :type cost: TensorVariable
      :param params: a list of ``TensorSharedVariable``
      :type params: list
      :param lr: the learning rate. Defaults to ``0.001``
      :type lr: float, optional

      :returns: a list of tuples of ``(p, p_t), (m, m_t), (v, v_t), (t, t_new)``
      :rtype: list


.. py:class:: Nadam(params: list, b1: float = 0.99, b2: float = 0.999, **kwargs)


   Bases: :py:obj:`Adam`

   
   An optimizer that implements the Nesterov Adam algorithm [#]_

   :param params: a list of ``TensorSharedVariable``
   :type params: list
   :param b1: exponential decay rate for the 1st moment estimates.
              Defaults to ``0.99``
   :type b1: float, optional
   :param b2: exponential decay rate for the 2nd moment estimates.
              Defaults to ``0.999``
   :type b2: float, optional

   .. [#] Dozat, T., 2016. Incorporating Nesterov Momentum into Adam. http://cs229.stanford.edu/proj2015/054_report.pdf
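
   A scalar sketch of one Nadam step, using the common simplification with a
   constant ``b1``; Dozat's report also describes a schedule for the momentum
   coefficient, which is omitted here. ``nadam_step`` and ``eps`` are
   hypothetical, and this page does not confirm pycmtensor's exact formulation.

   .. code-block:: python

      import math


      # Hypothetical helper for illustration; not part of pycmtensor.
      def nadam_step(p, g, m, v, t, lr=0.001, b1=0.99, b2=0.999, eps=1e-8):
          """One Nadam step (constant-``b1`` form); returns ``(p_t, m_t, v_t, t_new)``."""
          t_new = t + 1
          m_t = b1 * m + (1.0 - b1) * g
          v_t = b2 * v + (1.0 - b2) * g * g
          # Nesterov look-ahead: blend the bias-corrected momentum with the
          # bias-corrected current gradient
          m_hat = (b1 * m_t / (1.0 - b1 ** (t_new + 1))
                   + (1.0 - b1) * g / (1.0 - b1 ** t_new))
          v_hat = v_t / (1.0 - b2 ** t_new)
          p_t = p - lr * m_hat / (math.sqrt(v_hat) + eps)  # eps is an assumed constant
          return p_t, m_t, v_t, t_new

   Compared with Adam, the only change is ``m_hat``: the current gradient
   enters the step directly, which applies the momentum "one step ahead".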

   .. py:method:: update(cost, params: list, lr: float = 0.001)

      Generate a list of updates

      :param cost: the scalar cost expression from which the gradients with
                   respect to ``params`` are computed
      :type cost: TensorVariable
      :param params: a list of ``TensorSharedVariable``
      :type params: list
      :param lr: the learning rate. Defaults to ``0.001``
      :type lr: float, optional

      :returns: a list of tuples of ``(p, p_t), (m, m_t), (v, v_t), (t, t_new)``
      :rtype: list


.. py:class:: Adamax(params: list, b1: float = 0.9, b2: float = 0.999, **kwargs)


   Bases: :py:obj:`Adam`

   
   An optimizer that implements the Adamax algorithm [#]_, a variant of Adam
   based on the infinity norm

   :param params: a list of ``TensorSharedVariable``
   :type params: list
   :param b1: exponential decay rate for the 1st moment estimates.
              Defaults to ``0.9``
   :type b1: float, optional
   :param b2: exponential decay rate for the 2nd moment estimates.
              Defaults to ``0.999``
   :type b2: float, optional

   .. [#] Kingma et al., 2014. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980
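
   Adamax replaces Adam's second-moment estimate with an exponentially
   weighted infinity norm ``u``, which needs no bias correction. A scalar
   sketch, assuming a hypothetical ``adamax_step`` helper and an added ``eps``
   guard that the paper's algorithm does not include:

   .. code-block:: python

      # Hypothetical helper for illustration; not part of pycmtensor.
      def adamax_step(p, g, m, u, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
          """One Adamax step; returns ``(p_t, m_t, u_t, t_new)``."""
          t_new = t + 1
          m_t = b1 * m + (1.0 - b1) * g  # 1st moment estimate, as in Adam
          u_t = max(b2 * u, abs(g))      # exponentially weighted infinity norm
          # only m_t is bias-corrected; the assumed eps avoids division by zero
          p_t = p - (lr / (1.0 - b1 ** t_new)) * m_t / (u_t + eps)
          return p_t, m_t, u_t, t_new

   Here ``u`` plays the role that the 2nd-moment accumulator ``v`` plays in
   Adam, so the step magnitude is bounded by roughly ``lr``.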

   .. py:method:: update(cost, params: list, lr: float = 0.001)

      Generate a list of updates

      :param cost: the scalar cost expression from which the gradients with respect to ``params`` are computed
      :type cost: TensorVariable
      :param params: a list of ``TensorSharedVariable``
      :type params: list
      :param lr: learning rate. Defaults to ``0.001``
      :type lr: float, optional

      :returns: a list of tuples of ``(p, p_t), (m, m_t), (v, v_t), (t, t_new)``
      :rtype: list


.. py:class:: Adadelta(params: list, rho: float = 0.95, **kwargs)


   Bases: :py:obj:`Optimizer`

   
   An optimizer that implements the Adadelta algorithm [#]_

   Adadelta is a stochastic gradient descent method that adapts the learning
   rate per dimension. It was designed to address two drawbacks of Adagrad:

   - The continual decay of learning rates throughout training
   - The need for a manually selected global learning rate

   :param params: a list of ``TensorSharedVariable``
   :type params: list
   :param rho: decay rate of the running averages of squared gradients and
               squared updates. Defaults to ``0.95``
   :type rho: float, optional

   .. [#] Zeiler, 2012. ADADELTA: An Adaptive Learning Rate Method. http://arxiv.org/abs/1212.5701
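
   One Adadelta step on a scalar parameter, following Zeiler's update with the
   extra ``lr`` multiplier from the ``update`` signature. ``adadelta_step`` is
   hypothetical and ``eps = 1e-6`` is an assumed constant; the running averages
   ``a`` and ``d`` presumably correspond to the ``accumulator`` and ``delta``
   properties, though this page does not state so.

   .. code-block:: python

      import math


      # Hypothetical helper for illustration; not part of pycmtensor.
      def adadelta_step(p, g, a, d, lr=1.0, rho=0.95, eps=1e-6):
          """One Adadelta step; returns ``(p_new, a_t, d_t)``.

          ``a`` is the running average of squared gradients and ``d`` the
          running average of squared updates. ``eps`` is an assumed constant.
          """
          a_t = rho * a + (1.0 - rho) * g * g
          # unit-corrected step: RMS of past updates over RMS of gradients
          dx = -math.sqrt(d + eps) / math.sqrt(a_t + eps) * g
          d_t = rho * d + (1.0 - rho) * dx * dx
          p_new = p + lr * dx
          return p_new, a_t, d_t

   The ratio of the two RMS terms replaces a hand-tuned global learning rate,
   which is why ``lr`` defaults to ``1.0``.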

   .. py:property:: accumulator


   .. py:property:: delta


   .. py:method:: update(cost, params: list, lr: float = 1.0)

      Generate a list of updates

      :param cost: the scalar cost expression from which the gradients with respect to ``params`` are computed
      :type cost: TensorVariable
      :param params: a list of ``TensorSharedVariable``
      :type params: list
      :param lr: learning rate. Defaults to ``1.0``
      :type lr: float, optional

      :returns: a list of tuples of ``(param, param_new), (a, a_t), (d, d_t)``
      :rtype: list

      .. Note::

          Because Adadelta adapts the step size per dimension, the default
          learning rate is ``1.0``, so the computed update is applied unscaled.


.. py:class:: RMSProp(params: list, rho: float = 0.9, **kwargs)


   Bases: :py:obj:`Optimizer`

   
   An optimizer that implements the RMSprop algorithm [#]_

   :param params: a list of ``TensorSharedVariable``
   :type params: list
   :param rho: decay rate of the moving average of squared gradients.
               Defaults to ``0.9``
   :type rho: float, optional

   .. [#] Hinton, 2012. rmsprop: Divide the gradient by a running average of its recent magnitude. http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
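
   RMSProp divides each gradient by the root of a moving average of its recent
   squared magnitudes. A scalar sketch with a hypothetical ``rmsprop_step``
   helper; placing the assumed ``eps`` outside the square root is one common
   choice and may differ from pycmtensor.

   .. code-block:: python

      import math


      # Hypothetical helper for illustration; not part of pycmtensor.
      def rmsprop_step(p, g, a, lr=0.001, rho=0.9, eps=1e-8):
          """One RMSProp step; returns ``(p_new, a_t)``."""
          a_t = rho * a + (1.0 - rho) * g * g          # moving average of g^2
          p_new = p - lr * g / (math.sqrt(a_t) + eps)  # scale by its root (eps assumed)
          return p_new, a_t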

   .. py:property:: accumulator


   .. py:method:: update(cost, params: list, lr: float = 0.001)

      Generate a list of updates

      :param cost: the scalar cost expression from which the gradients with respect to ``params`` are computed
      :type cost: TensorVariable
      :param params: a list of ``TensorSharedVariable``
      :type params: list
      :param lr: learning rate. Defaults to ``0.001``
      :type lr: float, optional

      :returns: a list of tuples of ``(param, param_new), (a, a_t)``
      :rtype: list


.. py:class:: Momentum(params: list, mu: float = 0.9, **kwargs)


   Bases: :py:obj:`Optimizer`

   
   An optimizer that implements the Momentum algorithm [#]_

   :param params: a list of ``TensorSharedVariable``
   :type params: list
   :param mu: momentum coefficient, which accelerates updates along
              consistent descent directions and dampens oscillations.
              Defaults to ``0.9``
   :type mu: float, optional

   .. [#] Sutskever et al., 2013. On the importance of initialization and momentum in deep learning. http://jmlr.org/proceedings/papers/v28/sutskever13.pdf
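
   Classical momentum, in the form given by Sutskever et al., keeps a velocity
   that accumulates a decaying sum of past gradient steps. A scalar sketch
   (``momentum_step`` is hypothetical) that minimises a one-dimensional
   quadratic:

   .. code-block:: python

      # Hypothetical helper for illustration; not part of pycmtensor.
      def momentum_step(p, g, v, lr=0.001, mu=0.9):
          """Classical momentum step; returns ``(p_new, v_t)``."""
          v_t = mu * v - lr * g  # decayed previous velocity plus new gradient step
          p_new = p + v_t
          return p_new, v_t


      # Minimise f(p) = (p - 3)^2, whose gradient is 2 * (p - 3)
      p, v = 0.0, 0.0
      for _ in range(1000):
          p, v = momentum_step(p, 2.0 * (p - 3.0), v, lr=0.01)

   The two returned values mirror the ``(param, param_new), (v, v_t)`` pairs
   documented for ``update``.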

   .. py:property:: velocity


   .. py:method:: update(cost, params: list, lr: float = 0.001)

      Generate a list of updates

      :param cost: the scalar cost expression from which the gradients with respect to ``params`` are computed
      :type cost: TensorVariable
      :param params: a list of ``TensorSharedVariable``
      :type params: list
      :param lr: the learning rate. Defaults to ``0.001``
      :type lr: float, optional

      :returns: a list of tuples of ``(param, param_new), (v, v_t)``
      :rtype: list


.. py:class:: NAG(params: list, mu: float = 0.99, **kwargs)


   Bases: :py:obj:`Momentum`

   
   An optimizer that implements the Nesterov Accelerated Gradient algorithm [#]_

   :param params: a list of ``TensorSharedVariable``
   :type params: list
   :param mu: momentum coefficient, which accelerates updates along
              consistent descent directions and dampens oscillations.
              Defaults to ``0.99``
   :type mu: float, optional

   .. [#] Sutskever et al., 2013. On the importance of initialization and momentum in deep learning. http://jmlr.org/proceedings/papers/v28/sutskever13.pdf
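
   NAG differs from classical momentum only in where the gradient is
   evaluated: at the look-ahead point ``p + mu * v`` instead of at ``p``. In
   this sketch the gradient is passed as a function so it can be evaluated
   there; ``nag_step`` and ``grad`` are hypothetical, and how pycmtensor
   realises the look-ahead symbolically is not shown on this page.

   .. code-block:: python

      # Hypothetical helpers for illustration; not part of pycmtensor.
      def nag_step(p, grad_fn, v, lr=0.001, mu=0.99):
          """Nesterov accelerated gradient step; returns ``(p_new, v_t)``."""
          # gradient taken at the look-ahead position p + mu * v
          v_t = mu * v - lr * grad_fn(p + mu * v)
          p_new = p + v_t
          return p_new, v_t


      def grad(x):
          """Gradient of f(x) = (x - 3)^2."""
          return 2.0 * (x - 3.0)


      p, v = 0.0, 0.0
      for _ in range(2000):
          p, v = nag_step(p, grad, v, lr=0.01)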

   .. py:property:: t


   .. py:method:: update(cost, params: list, lr: float = 0.001)

      Generate a list of updates

      :param cost: the scalar cost expression from which the gradients with respect to ``params`` are computed
      :type cost: TensorVariable
      :param params: a list of ``TensorSharedVariable``
      :type params: list
      :param lr: the learning rate. Defaults to ``0.001``
      :type lr: float, optional

      :returns: a list of tuples of ``(param, param_new), (v, v_t)``
      :rtype: list


.. py:class:: SGD(params: list, **kwargs)


   Bases: :py:obj:`Optimizer`

   
   An optimizer that implements the stochastic gradient descent algorithm

   :param params: a list of ``TensorSharedVariable``
   :type params: list
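
   Plain gradient descent subtracts ``lr`` times the gradient from each
   parameter. A sketch that builds ``(param, param_new)`` pairs in the same
   shape as the documented return value of ``update``, with floats standing in
   for shared variables (``sgd_updates`` is hypothetical):

   .. code-block:: python

      # Hypothetical helper for illustration; not part of pycmtensor.
      def sgd_updates(params, grads, lr=0.001):
          """Return ``(param, param_new)`` pairs for plain gradient descent."""
          return [(p, p - lr * g) for p, g in zip(params, grads)]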

   .. py:method:: update(cost, params: list, lr: float = 0.001)

      Generate a list of updates

      :param cost: the scalar cost expression from which the gradients with respect to ``params`` are computed
      :type cost: TensorVariable
      :param params: a list of ``TensorSharedVariable``
      :type params: list
      :param lr: the learning rate. Defaults to ``0.001``
      :type lr: float, optional

      :returns: a list of ``(param, param_new)`` tuples
      :rtype: list