python - How can I customize the gradient computation at training time in keras?

Posted on 2020-11-28 14:34:28

I want to implement the natural gradient in a keras layer. This has to happen inside a custom gradient that is already in place, and I want to be able to choose which implementation (regular or natural gradient) is computed when the optimizer is invoked.

The problem I am facing is that AutoGraph is not happy when I pass the boolean nat_grad=True to the Op at training time rather than at graph-construction time.

Currently, pseudocode for what is happening looks like this:

import numpy as np
import tensorflow as tf

@tf.custom_gradient
def MyOp(inputs, w, nat_grad=False):
    output = w*inputs
    def grad(dy):
        if nat_grad:
            return dy, 1.0
        else:
            return -dy, -1.0
    return output, grad


class MyKerasLayer(tf.keras.layers.Layer):
    def __init__(self):
        super().__init__()
        self.nat_grad = False
        
    def build(self, input_shape):
        self.w = self.add_weight("w", dtype=tf.float32, trainable=True, initializer=tf.random_normal_initializer)
        super().build(input_shape)

    def call(self, inputs):
        return MyOp(inputs, self.w, self.nat_grad)


class MyModel(tf.keras.Sequential):
    def __init__(self, num_layers):
        super().__init__([tf.keras.Input(shape=[1], batch_size=None, dtype=tf.float32)]+[MyKerasLayer() for _ in range(num_layers)])


def optimize(model, X, Y, nat_grad:bool):
    for layer in model.layers:
        layer.nat_grad = nat_grad
    model.fit(x=X, y=Y)

    
model = MyModel(5)
model.compile(optimizer='SGD', loss=lambda x,y:x-y, metrics=[])
X = np.array([1.0, 2.0, 3.0])
Y = np.array([1.0, 2.0, 3.0])
optimize(model, X, Y, nat_grad=True)
>>> OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.
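
The failure seems to boil down to a tensor being evaluated as a Python bool inside a traced function: tf.custom_gradient appears to convert its arguments to tensors during tracing, so inside grad the closed-over nat_grad is no longer a Python bool. A minimal sketch that reproduces the same error in isolation (autograph is disabled here to force the graph-mode behaviour; the function name is illustrative):

import tensorflow as tf

@tf.function(autograph=False)
def branch(flag):
    if flag:  # bool() on a symbolic tensor raises in graph mode
        return tf.constant(1.0)
    return tf.constant(-1.0)

branch(True)               # fine: Python bool, resolved at trace time
branch(tf.constant(True))  # OperatorNotAllowedInGraphError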

What is the correct way to do this?

Questioner: Ziofil
David Vander Mijnsbrugge 2020-11-29 01:26:56

Tensorflow 2.x allows functions to be executed as tf.graphs [1]. Decorating grad(dy) with @tf.function should therefore work, but you will run into a new error: because MyOp takes nat_grad as an input, it will also expect a gradient for that variable [2]. That is why grad has to return a third value here, 0., as the gradient with respect to nat_grad:

@tf.custom_gradient
def MyOp(inputs, w, nat_grad=False):
    output = w*inputs
    @tf.function
    def grad(dy):
        # one gradient per input of MyOp: inputs, w, and nat_grad
        if nat_grad:
            return dy, 1.0, 0.
        else:
            return -dy, -1.0, 0.
    return output, grad

In my opinion, though, this is not the way to do it. Rather, split the gradient op into two separate ops and pick one of them in call:

@tf.custom_gradient
def NatOp(inputs, w):
    output = w*inputs
    def grad(dy):
        return dy, 1.0     
    return output, grad

@tf.custom_gradient
def RegOp(inputs, w):
    output = w*inputs
    def grad(dy):
        return -dy, -1.0
    return output, grad

class MyKerasLayer(tf.keras.layers.Layer):
    def __init__(self):
        super().__init__()
        self.nat_grad = False
        
    def build(self, input_shape):
        self.w = self.add_weight("w", dtype=tf.float32, trainable=True, initializer=tf.random_normal_initializer)
        super().build(input_shape)

    def call(self, inputs):
        # the branch is resolved in Python at trace time, so AutoGraph
        # never has to evaluate a tensor as a Python bool
        return NatOp(inputs, self.w) if self.nat_grad else RegOp(inputs, self.w)
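
For completeness, a minimal sketch of how this version would be exercised, reusing MyModel and optimize from the question. The extra compile call is my own assumption: Keras caches the traced train step, and since nat_grad is read at trace time, flipping it only takes effect after a retrace.

model = MyModel(5)
model.compile(optimizer='SGD', loss=lambda x, y: x - y, metrics=[])
X = np.array([1.0, 2.0, 3.0])
Y = np.array([1.0, 2.0, 3.0])

optimize(model, X, Y, nat_grad=True)   # traces the NatOp branch

# recompiling forces a retrace, so the flipped flag is picked up
model.compile(optimizer='SGD', loss=lambda x, y: x - y, metrics=[])
optimize(model, X, Y, nat_grad=False)  # retraces with RegOp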

[1] https://www.tensorflow.org/api_docs/python/tf/function

[2] https://www.tensorflow.org/api_docs/python/tf/custom_gradient