:py:mod:`pycmtensor.optimizers`
===============================

.. py:module:: pycmtensor.optimizers

.. autoapi-nested-parse::

   PyCMTensor optimizers module


Module Contents
---------------

.. py:class:: Optimizer(params, name, b1=0.0, b2=0.0, m=0.0, rho=0.0, epsilon=1e-08)

   Base optimizer class

   :param params: a list of :class:`expressions.TensorVariable` objects, used to
                  construct the optimizer parameters
   :type params: list


.. py:class:: Adam(params: list, b1=0.9, b2=0.999, **kwargs)

   Bases: :py:obj:`Optimizer`

   An optimizer that implements the Adam algorithm [#]_

   :param params: a list of :class:`Betas` and/or :class:`Weights`
   :type params: list
   :param b1: exponential decay rate for the 1st moment estimates. Defaults to ``0.9``
   :type b1: float, optional
   :param b2: exponential decay rate for the 2nd moment estimates. Defaults to ``0.999``
   :type b2: float, optional

   .. [#] Kingma and Ba, 2014. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980

   .. py:method:: update(cost, params: list, lr=0.001)

      Calls the optimizer to generate a list of updates

      :param cost: a scalar expression of the cost function, from which the derivatives are calculated
      :type cost: TensorVariable
      :param params: a list of :class:`Betas` and/or :class:`Weights`
      :type params: list
      :param lr: learning rate. Defaults to ``0.001``
      :type lr: float, optional

      :returns: a list of tuples of ``(p, p_t), (m, m_t), (v, v_t), (t, t_new)``
      :rtype: list


.. py:class:: Adamax(params: list, b1=0.9, b2=0.999, **kwargs)

   Bases: :py:obj:`Optimizer`

   An optimizer that implements the Adamax algorithm [#]_. It is a variant of the Adam algorithm

   :param params: a list of :class:`Betas` and/or :class:`Weights`
   :type params: list
   :param b1: exponential decay rate for the 1st moment estimates. Defaults to ``0.9``
   :type b1: float, optional
   :param b2: exponential decay rate for the 2nd moment estimates. Defaults to ``0.999``
   :type b2: float, optional

   .. [#] Kingma and Ba, 2014. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980

   .. py:method:: update(cost, params: list, lr=0.001)

      Calls the optimizer to generate a list of updates

      :param cost: a scalar expression of the cost function, from which the derivatives are calculated
      :type cost: TensorVariable
      :param params: a list of :class:`Betas` and/or :class:`Weights`
      :type params: list
      :param lr: learning rate. Defaults to ``0.001``
      :type lr: float, optional

      :returns: a list of tuples of ``(p, p_t), (m, m_t), (v, v_t), (t, t_new)``
      :rtype: list


.. py:class:: Adadelta(params: list, rho=0.95, **kwargs)

   Bases: :py:obj:`Optimizer`

   An optimizer that implements the Adadelta algorithm [#]_

   Adadelta is a stochastic gradient descent method based on an adaptive
   learning rate per dimension. It addresses two drawbacks:

   - The continual decay of learning rates throughout training
   - The need for a manually selected global learning rate

   :param params: a list of :class:`Betas` and/or :class:`Weights`
   :type params: list
   :param rho: the decay rate for the learning rate. Defaults to ``0.95``
   :type rho: float, optional

   .. [#] Zeiler, 2012. ADADELTA: An Adaptive Learning Rate Method. http://arxiv.org/abs/1212.5701

   .. py:method:: update(cost, params: list, lr=1.0)

      Calls the optimizer to generate a list of updates

      :param cost: a scalar expression of the cost function, from which the derivatives are calculated
      :type cost: TensorVariable
      :param params: a list of :class:`Betas` and/or :class:`Weights`
      :type params: list
      :param lr: learning rate. Defaults to ``1.0``
      :type lr: float, optional

      :returns: a list of tuples of ``(param, param_new), (a, a_t), (d, d_t)``
      :rtype: list

      .. Note:: Since the Adadelta algorithm uses an adaptive learning rate, the learning rate is set to ``1.0``


.. py:class:: RMSProp(params: list, rho=0.9, **kwargs)

   Bases: :py:obj:`Optimizer`

   An optimizer that implements the RMSprop algorithm [#]_

   :param params: a list of :class:`Betas` and/or :class:`Weights`
   :type params: list
   :param rho: discounting factor for the history/coming gradient. Defaults to ``0.9``
   :type rho: float, optional

   .. [#] Hinton, 2012. rmsprop: Divide the gradient by a running average of its recent magnitude. http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

   .. py:method:: update(cost, params: list, lr=0.001)

      Calls the optimizer to generate a list of updates

      :param cost: a scalar expression of the cost function, from which the derivatives are calculated
      :type cost: TensorVariable
      :param params: a list of :class:`Betas` and/or :class:`Weights`
      :type params: list
      :param lr: learning rate. Defaults to ``0.001``
      :type lr: float, optional

      :returns: a list of tuples of ``(param, param_new), (a, a_t)``
      :rtype: list


.. py:class:: Momentum(params: list, momentum=0.9, nesterov=True, **kwargs)

   Bases: :py:obj:`Optimizer`

   An optimizer that implements the Momentum algorithm [#]_

   :param params: a list of :class:`Betas` and/or :class:`Weights`
   :type params: list
   :param momentum: factor that accelerates updates in the relevant direction and dampens oscillations. Defaults to ``0.9``
   :type momentum: float, optional
   :param nesterov: whether to apply Nesterov momentum. Defaults to ``True``
   :type nesterov: bool, optional

   .. [#] Sutskever et al., 2013. On the importance of initialization and momentum in deep learning. http://jmlr.org/proceedings/papers/v28/sutskever13.pdf

   .. py:method:: update(cost, params: list, lr=0.001)

      Calls the optimizer to generate a list of updates

      :param cost: a scalar expression of the cost function, from which the derivatives are calculated
      :type cost: TensorVariable
      :param params: a list of :class:`Betas` and/or :class:`Weights`
      :type params: list
      :param lr: the learning rate. Defaults to ``0.001``
      :type lr: float, optional

      :returns: a list of tuples of ``(param, param_new), (v, v_t)``
      :rtype: list


.. py:class:: AdaGrad(params: list, **kwargs)

   Bases: :py:obj:`Optimizer`

   An optimizer that implements the Adagrad algorithm [#]_

   Adagrad is an optimizer with parameter-specific learning rates, which are
   adapted relative to how frequently a parameter gets updated during
   training. The more updates a parameter receives, the smaller the updates.

   :param params: a list of :class:`Betas` and/or :class:`Weights`
   :type params: list

   .. [#] Duchi et al., 2011. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. https://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf

   .. py:method:: update(cost, params: list, lr=1.0)

      Calls the optimizer to generate a list of updates

      :param cost: a scalar expression of the cost function, from which the derivatives are calculated
      :type cost: TensorVariable
      :param params: a list of :class:`Betas` and/or :class:`Weights`
      :type params: list
      :param lr: the learning rate. Defaults to ``1.0``
      :type lr: float, optional

      :returns: a list of tuples of ``(param, param_new), (a, a_t)``
      :rtype: list


.. py:class:: SGD(params: list, **kwargs)

   Bases: :py:obj:`Optimizer`

   An optimizer that implements the stochastic gradient descent algorithm

   :param params: a list of :class:`Betas` and/or :class:`Weights`
   :type params: list

   .. py:method:: update(cost, params: list, lr=0.001)

      Calls the optimizer to generate a list of updates

      :param cost: a scalar expression of the cost function, from which the derivatives are calculated
      :type cost: TensorVariable
      :param params: a list of :class:`Betas` and/or :class:`Weights`
      :type params: list
      :param lr: the learning rate. Defaults to ``0.001``
      :type lr: float, optional

      :returns: a list of tuples of ``(param, param_new)``
      :rtype: list
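
The ``update`` methods above return lists of ``(old, new)`` pairs. As a rough illustration of what such pairs contain, the sketch below applies the textbook SGD and Adam rules to plain Python floats. It is not PyCMTensor's implementation: the function names ``sgd_update`` and ``adam_update`` and the explicit ``grads`` argument are hypothetical, whereas the library works on symbolic tensor variables and derives the gradients from ``cost``.

```python
import math


def sgd_update(params, grads, lr=0.001):
    """Return [(param, param_new)] pairs for one gradient descent step.

    Hypothetical helper: gradients are passed in explicitly here.
    """
    return [(p, p - lr * g) for p, g in zip(params, grads)]


def adam_update(params, grads, m, v, t, lr=0.001, b1=0.9, b2=0.999, epsilon=1e-8):
    """Return (updates, m_new, v_new, t_new) for one Adam step on floats.

    ``m`` and ``v`` hold the running 1st and 2nd moment estimates and
    ``t`` is the step counter before this update.
    """
    t_new = t + 1
    updates, m_new, v_new = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_t = b1 * m_i + (1 - b1) * g        # 1st moment, decayed by b1
        v_t = b2 * v_i + (1 - b2) * g * g    # 2nd moment, decayed by b2
        m_hat = m_t / (1 - b1 ** t_new)      # bias-corrected 1st moment
        v_hat = v_t / (1 - b2 ** t_new)      # bias-corrected 2nd moment
        p_t = p - lr * m_hat / (math.sqrt(v_hat) + epsilon)
        updates.append((p, p_t))
        m_new.append(m_t)
        v_new.append(v_t)
    return updates, m_new, v_new, t_new


# Usage: minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x = 0.0
for _step in range(100):
    x = sgd_update([x], [2 * (x - 3)], lr=0.1)[0][1]
print(x)
```

On the first Adam step the bias-corrected ratio ``m_hat / sqrt(v_hat)`` has magnitude close to one, so the parameter moves by roughly ``lr`` regardless of the gradient's scale. That is one reason the Adam-style optimizers above default to a small ``lr`` such as ``0.001``.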