TensorFlow Optimizer (Optimizer Study Notes)

Published: 2024-03-04

According to the official documentation, TensorFlow's Optimizer class (tf.train.Optimizer) has the following subclasses:

AdadeltaOptimizer: Optimizer that implements the Adadelta algorithm.
AdagradDAOptimizer: Adagrad Dual Averaging algorithm for sparse linear models.
AdagradOptimizer: Optimizer that implements the Adagrad algorithm.
AdamOptimizer: Optimizer that implements the Adam algorithm.
FtrlOptimizer: Optimizer that implements the FTRL algorithm.
MomentumOptimizer: Optimizer that implements the Momentum algorithm.
GradientDescentOptimizer: Optimizer that implements the gradient descent algorithm.
ProximalAdagradOptimizer: Optimizer that implements the Proximal Adagrad algorithm.
ProximalGradientDescentOptimizer: Optimizer that implements the proximal gradient descent algorithm.
RMSPropOptimizer: Optimizer that implements the RMSProp algorithm.
SyncReplicasOptimizer: Class to synchronize, aggregate gradients and pass them to the optimizer.

There are quite a few optimizers. This post mainly summarizes the three that are related to gradient descent: GradientDescentOptimizer, ProximalGradientDescentOptimizer, and SyncReplicasOptimizer.

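GradientDescentOptimizer implements the gradient descent algorithm. The theory behind gradient descent is not covered in detail here; see the references.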

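The GradientDescentOptimizer initializer takes three parameters:
name: the name of the optimizer.
learning_rate: the learning rate, which controls how fast the parameters are updated. A value that is too large or too small affects both the running time and the result: too large and training tends to diverge, too small and training takes too long.
use_locking: defaults to False. Variables allow concurrent reads and writes; if set to True, concurrent updates to the variables are prevented.
The official documentation FAQ explains: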

How do variables behave when they are concurrently accessed?
Variables allow concurrent read and write operations. The value read from a variable may change if it is concurrently updated. By default, concurrent assignment operations to a variable are allowed to run with no mutual exclusion. To acquire a lock when assigning to a variable, pass use_locking=True to tf.Variable.assign.
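Returning to the learning_rate parameter described above, the effect of its size can be illustrated with a minimal plain-Python sketch. It does not use TensorFlow; the function gradient_descent and the toy objective f(x) = x**2 are assumptions chosen only for illustration.

```python
def gradient_descent(learning_rate, steps=50, x0=1.0):
    """Minimize the toy objective f(x) = x**2 and return the final x."""
    x = x0
    for _ in range(steps):
        grad = 2.0 * x                 # derivative of x**2
        x = x - learning_rate * grad   # plain gradient descent update
    return x

too_large = gradient_descent(1.1)    # |1 - 2 * 1.1| > 1, so |x| grows every step
too_small = gradient_descent(0.001)  # x shrinks by only 0.2% per step
moderate = gradient_descent(0.1)     # x shrinks by 20% per step

print(abs(too_large), abs(too_small), abs(moderate))
```

With the same budget of 50 steps, the large rate moves away from the minimum at 0, the small rate has barely moved from the starting point, and the moderate rate ends up close to 0.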

The learning rate (learning_rate) can be varied with exponential decay, which TensorFlow wraps as tf.train.exponential_decay.

The formula it implements is:
$decayed\_learning\_rate = learning\_rate \times decay\_rate^{(global\_step / decay\_steps)}$

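As global_step increases, learning_rate decays exponentially.
The staircase argument selects the decay mode: when staircase=True, global_step / decay_steps is an integer division, so the decayed learning rate follows a step function.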
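The decay formula and the staircase switch can be sketched in plain Python as follows. This is a stand-in that mirrors the formula above, not the TensorFlow implementation; the function name and argument order are assumptions made for readability.

```python
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate,
                      staircase=False):
    """Return learning_rate * decay_rate ** (global_step / decay_steps)."""
    if staircase:
        # Integer division: the exponent only changes every decay_steps steps.
        exponent = global_step // decay_steps
    else:
        exponent = global_step / decay_steps
    return learning_rate * decay_rate ** exponent

# At step 250 with decay_steps=100, the smooth exponent is 2.5
# while the staircase exponent is 2.
smooth = exponential_decay(0.1, 250, 100, 0.9)
stepped = exponential_decay(0.1, 250, 100, 0.9, staircase=True)
print(smooth, stepped)
```

In staircase mode the rate stays constant within each block of decay_steps steps, so at step 250 it is still the value reached at step 200.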

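Core method: minimize

Its two main parameters:
loss: the loss to be minimized, of type Tensor.
global_step: usually used together with learning-rate decay; an optional variable that is incremented by 1 after the variables are updated.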

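Example: the learning rate decays in a staircase fashion by a factor of 0.9 every 100 steps. Here loss is the loss function, which you need to write yourself.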
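The schedule described above can be imitated without TensorFlow. The sketch below is a plain-Python stand-in, not the original listing: the quadratic loss (w - 3)**2, the starting rate 0.1, and the 300 iterations are all assumptions made for illustration.

```python
def loss_gradient(w):
    """Gradient of the hypothetical loss (w - 3)**2."""
    return 2.0 * (w - 3.0)

w = 0.0
global_step = 0
base_learning_rate = 0.1   # assumed starting learning rate

for _ in range(300):
    # Staircase decay: multiply the rate by 0.9 once every 100 steps.
    learning_rate = base_learning_rate * 0.9 ** (global_step // 100)
    w -= learning_rate * loss_gradient(w)   # gradient descent update
    global_step += 1                        # incremented after the update

print(w, global_step)
```

The loop plays the role of repeatedly running the training op: each pass updates the variable once and then advances global_step, which in turn drives the decay.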

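Proximal gradient method: according to Wikipedia, Proximal Gradient Descent is a generalized form of projection used to solve non-differentiable convex optimization problems. For the theory, see this paper (http://papers.nips.cc/paper/3793-efficient-learning-using-forward-backward-splitting.pdf).
The algorithm solves problems of the form:
$\min_x F(x) + R(x)$
where F(x) is convex and differentiable, and R(x) is convex but possibly non-differentiable.
For a derivation of the formulas, see http://roachsinai.github.io/2016/08/03/1Proximal_Method/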
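To make the split between F and R concrete, here is a minimal plain-Python sketch of proximal gradient descent on a one-dimensional problem. The choices F(x) = 0.5 * (x - a)**2 and R(x) = lam * |x|, as well as the names soft_threshold and proximal_gradient, are assumptions made for illustration; for this R the proximal step is the well-known soft-thresholding operator.

```python
def soft_threshold(v, threshold):
    """Proximal operator of threshold * |x|: shrink v toward zero."""
    if v > threshold:
        return v - threshold
    if v < -threshold:
        return v + threshold
    return 0.0

def proximal_gradient(a, lam, learning_rate=0.5, steps=100):
    """Minimize F(x) + R(x) with F(x) = 0.5*(x - a)**2 and R(x) = lam*|x|."""
    x = 0.0
    for _ in range(steps):
        v = x - learning_rate * (x - a)              # gradient step on F only
        x = soft_threshold(v, learning_rate * lam)   # proximal step on R
    return x

print(proximal_gradient(2.0, 0.5))   # converges toward a - lam = 1.5
print(proximal_gradient(0.3, 0.5))   # |a| <= lam, so the minimizer is 0
```

The gradient is only ever taken of the differentiable part F; the non-differentiable part R is handled entirely by the proximal step, which is why small solutions are set exactly to zero.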

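Initialization of this method:

Besides the learning rate, there are two more parameters, l1_regularization_strength and l2_regularization_strength. Setting these two values selects L1 regularization, L2 regularization, or a mix of both.
The optimization method is as follows:

The paper describes the logic of mixed regularization as follows: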
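As a rough illustration of a mixed L1/L2 update, the sketch below follows the per-element formula given in the TensorFlow op documentation for ApplyProximalGradientDescent (prox_v = var - alpha * delta, then shrink and rescale), not the paper's own notation. It is plain Python rather than the library code, and the function name proximal_update is hypothetical.

```python
def proximal_update(var, grad, learning_rate, l1=0.0, l2=0.0):
    """One proximal gradient descent step on a single scalar variable."""
    prox_v = var - learning_rate * grad                   # ordinary gradient step
    shrunk = max(abs(prox_v) - learning_rate * l1, 0.0)   # L1 part: soft threshold
    sign = 1.0 if prox_v >= 0 else -1.0
    return sign * shrunk / (1.0 + learning_rate * l2)     # L2 part: rescale

print(proximal_update(1.0, 0.5, 0.1))                   # l1 = l2 = 0: plain gradient descent
print(proximal_update(1.0, 0.5, 0.1, l1=0.2, l2=1.0))   # mixed regularization
print(proximal_update(1.0, 0.5, 0.1, l1=20.0))          # strong L1 drives the value to 0
```

Setting only l1 gives pure soft-thresholding, setting only l2 gives a multiplicative shrink, and setting both combines the two in a single step.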

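SyncReplicasOptimizer: in a typical asynchronous training environment, it is common to have some stale gradients. For example, with N replicas training asynchronously, gradients are applied to the variables N times independently. Depending on each replica's training speed, some gradients may be computed from copies of the variables that are several steps old (N-1 steps on average). This optimizer avoids stale gradients by collecting the gradients from all replicas, averaging them, and then applying them to the variables in one shot, after which the replicas can fetch the new variables and continue.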
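The collect, average, apply-once idea can be simulated in a single process. The sketch below is plain Python and does not use the SyncReplicasOptimizer API; the three replicas, their per-replica targets, and the helper names are assumptions made for illustration.

```python
def gradient(w, target):
    """Gradient of the hypothetical per-replica loss (w - target)**2."""
    return 2.0 * (w - target)

def sync_update(variable, replica_gradients, learning_rate):
    """Average the gradients from all replicas, then apply them once."""
    averaged = sum(replica_gradients) / len(replica_gradients)
    return variable - learning_rate * averaged

w = 0.0
replica_targets = [1.0, 2.0, 3.0]   # hypothetical data seen by each replica

for _ in range(200):
    # Every replica computes its gradient from the same, current value of w,
    # so no gradient is stale when the update is applied.
    grads = [gradient(w, t) for t in replica_targets]
    w = sync_update(w, grads, learning_rate=0.1)

print(w)   # approaches the mean of the targets, 2.0
```

In the asynchronous case each replica would instead apply its own gradient as soon as it finished, possibly after w had already been changed by the others.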
Usage:

References:
Gradient descent (Wikipedia): https://zh.wikipedia.org/wiki/%E6%A2%AF%E5%BA%A6%E4%B8%8B%E9%99%8D%E6%B3%95
Gradient descent summary: https://www.cnblogs.com/pinard/p/5970503.html
Proximal Gradient Descent: http://papers.nips.cc/paper/3793-efficient-learning-using-forward-backward-splitting.pdf
TensorFlow 1.11 official API: https://www.tensorflow.org/api_docs/python/tf/train
