How do I implement neural network pruning?

python tensorflow optimization deep-learning inference

12

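I have trained a model in Keras and am now thinking about pruning my fully connected network, but I am not sure how to prune the layers.

The authors of "Learning both Weights and Connections for Efficient Neural Networks" say that they add a mask to threshold the weights of a layer. I can try to do the same and fine-tune the trained model. But how does that reduce the model size and the number of computations?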

- Illuminati0x5B

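To be specific, you want to know how to prune particular weights in a neural network? For example, given a matrix W, you want to set some of its elements to 0? - gorjan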

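@gorjan My goal is to reduce the size of the final model and speed up inference. I am not sure whether setting some values of W to 0 will reduce the model size. I need a way to remove the connections. As far as I know, TensorRT and TensorFlow Lite can do this? - Illuminati0x5B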

2

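You can't really "delete" weights. What you can do is set certain weights to 0 and then treat the matrix as a sparse matrix. TF has some limited support for dense-sparse and sparse-sparse matrix multiplication, which can be used to speed up inference. Here is a related Stack Overflow thread: https://dev59.com/kKPia4cB1Zd3GeqPshDT - gorjan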

@gorjan That makes sense. I thought there was more to it. Let me try to implement something like that. - Illuminati0x5B

Sure! As an answer, I will post a method that, given a weight matrix w: tf.Variable and k: int, removes the k% smallest weights (matrix elements) according to their norm. - gorjan

2 Answers

4

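If you add a mask, only a subset of your weights contributes to the computation, so your model is effectively pruned. For example, autoregressive models use a mask to block the weights that refer to future data, so that the output at time step t depends only on time steps 0, 1, ..., t-1.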
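
To make the masking idea concrete, here is a minimal NumPy sketch of a dense layer with a static binary mask. The layer sizes, the choice of masked unit, and the helper name masked_dense are assumptions made for illustration only; they are not taken from the paper or from Keras.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes, chosen only for illustration.
x = rng.normal(size=(2, 4))   # batch of 2 inputs with 4 features
w = rng.normal(size=(4, 3))   # weights of a dense layer with 3 units
b = np.zeros(3)

# Static binary mask: 1 keeps a connection, 0 removes it.
# Here every connection into unit 1 is masked out.
mask = np.ones_like(w)
mask[:, 1] = 0.0


def masked_dense(x, w, b, mask):
    """Dense layer whose effective weights are w * mask (hypothetical helper)."""
    return x @ (w * mask) + b


y = masked_dense(x, w, b, mask)

# Changing a masked weight has no effect on the output.
w_changed = w.copy()
w_changed[:, 1] += 100.0
y_changed = masked_dense(x, w_changed, b, mask)
```

Note that the full matrix w is still stored and multiplied here, so the mask by itself changes which weights matter, not how much memory or compute is used.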

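In your case, since you have a simple fully connected layer, it is better to use dropout. It randomly turns off some neurons at each iteration step, so it reduces the computational complexity. However, the main reason dropout was invented is to tackle overfitting: by randomly turning off some neurons, you reduce the co-dependencies between neurons, i.e. you avoid some neurons relying on others. Moreover, at each iteration your model is different (a different number of active neurons and different connections between them), so your final model can be interpreted as an ensemble of several different models, each specialized (hopefully) in understanding a specific subset of the input space.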

- Neb

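Yes. But my goal is to speed up inference and reduce the model size. If I use a mask, I still have to store the weights of all the layers and still have to compute the whole W.X+b (with some W_ij set to 0). - Illuminati0x5B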

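If your task is to reduce the model size, you can't do it with a dynamic mask. If the mask is static, just delete the weights you are not interested in. That way your network becomes sparser. - Neb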

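Using a mask does speed up computation. Consider a mask that filters out the first 3 columns of a matrix W. You can implement it as W[:, 3:], so the computation is done only on the remaining part of the matrix. For more complex masks (non-contiguous ones, etc.) you still gain something, because gradients are not computed for the weights that are 0. - Neb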

But, again, the reason behind masks is usually not to speed up training. - Neb

- gorjan · Accepted Answer

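Based on the discussion in the comments, here is a way to prune a layer (a weight matrix) of your neural network. The method selects the k% smallest weights (matrix elements) based on their norm, i.e. their absolute value, and sets them to zero. The matrix can then be treated as a sparse matrix, and a dense-sparse matrix multiplication can be faster if enough weights are pruned.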

import tensorflow as tf


def weight_pruning(w: tf.Variable, k: float) -> tf.Variable:
    """Performs pruning on a weight matrix w in the following way:

    - The absolute value of all elements in the weight matrix are computed.
    - The indices of the smallest k% elements based on their absolute values are selected.
    - All elements with the matching indices are set to 0.

    Args:
        w: The weight matrix.
        k: The fraction of values (weights) to prune from the matrix, e.g. 0.3 for 30%.

    Returns:
        The weight pruned weight matrix.

    """
    # Number of elements to prune.
    num_pruned = tf.cast(tf.round(tf.cast(tf.size(w), tf.float32) * k), dtype=tf.int32)
    w_reshaped = tf.reshape(w, [-1])
    # top_k on the negated magnitudes yields the indices of the smallest weights.
    _, indices = tf.nn.top_k(tf.negative(tf.abs(w_reshaped)), num_pruned, sorted=True)
    # Mask of ones with zeros at the selected indices.
    # tf.tensor_scatter_nd_update replaces the TF 1.x-only tf.scatter_nd_update.
    mask = tf.tensor_scatter_nd_update(
        tf.ones_like(w_reshaped),
        tf.reshape(indices, [-1, 1]),
        tf.zeros([num_pruned], w_reshaped.dtype),
    )

    return w.assign(tf.reshape(w_reshaped * mask, tf.shape(w)))
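
Once the small weights are zero, the sparse multiplication mentioned above could look roughly like the sketch below. The 4x3 matrix, the 0.1 threshold, and the input shape are made-up values for illustration; the threshold simply stands in for the result of a magnitude-based pruning step. Whether this is actually faster than the dense product depends on the sparsity level and the hardware, so it is worth measuring.

```python
import tensorflow as tf

# Hypothetical 4x3 weight matrix; the values are made up for illustration.
w = tf.constant(
    [[0.9, -0.05, 0.4],
     [0.01, 0.7, -0.02],
     [-0.6, 0.03, 0.8],
     [0.02, -0.5, 0.04]]
)

# Zero out every weight whose magnitude is below an assumed threshold of 0.1,
# mimicking the result of magnitude-based pruning.
pruned = tf.where(tf.abs(w) < 0.1, tf.zeros_like(w), w)

# Keep only the non-zero entries (indices and values) in a SparseTensor.
sparse_w = tf.sparse.from_dense(pruned)

x = tf.random.normal([3, 5])  # 3 input features, batch of 5 columns

# Sparse-dense product and, for comparison, the dense product on the pruned matrix.
y_sparse = tf.sparse.sparse_dense_matmul(sparse_w, x)
y_dense = tf.matmul(pruned, x)

# Number of weights that actually have to be stored.
num_nonzero = int(tf.size(sparse_w.values))
```

Here half of the twelve weights survive, and only those six values plus their indices are kept in sparse_w.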

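While the method above prunes individual connections (weights), the method below prunes whole neurons from a weight matrix. That is, it selects the k% smallest neurons (columns of the weight matrix) by Euclidean norm and sets them to zero.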

def unit_pruning(w: tf.Variable, k: float) -> tf.Variable:
    """Performs pruning on a weight matrix w in the following way:

    - The euclidean norm of each column is computed.
    - The indices of the smallest k% columns based on their euclidean norms are selected.
    - All elements in the columns that have the matching indices are set to 0.

    Args:
        w: The weight matrix.
        k: The fraction of columns to prune from the matrix, e.g. 0.3 for 30%.

    Returns:
        The unit pruned weight matrix.

    """
    num_rows = tf.shape(w)[0]
    # Number of columns (units) to prune.
    num_pruned = tf.cast(
        tf.round(tf.cast(tf.shape(w)[1], tf.float32) * k), dtype=tf.int32
    )
    norm = tf.norm(w, axis=0)
    # top_k on the negated norms yields the columns with the smallest norms.
    _, col_indices = tf.nn.top_k(tf.negative(norm), num_pruned, sorted=True)
    # Pair every row index with every selected column index.
    row_indices = tf.tile(tf.range(num_rows), [num_pruned])
    col_indices = tf.reshape(
        tf.tile(tf.reshape(col_indices, [-1, 1]), [1, num_rows]), [-1]
    )
    indices = tf.stack([row_indices, col_indices], axis=1)

    # tf.tensor_scatter_nd_update replaces the TF 1.x-only tf.scatter_nd_update.
    return w.assign(
        tf.tensor_scatter_nd_update(
            w, indices, tf.zeros([num_rows * num_pruned], w.dtype)
        )
    )
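
Zeroing whole columns does not shrink the stored matrices by itself, but it makes it possible to physically drop the pruned units, which is what the question was after. The NumPy sketch below shows one way this could be done for a hypothetical two-layer network. It assumes the biases of the pruned units are set to zero as well and that the activation maps 0 to 0 (ReLU here); all sizes and the forward helper are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer network: 5 inputs -> 6 hidden units -> 2 outputs.
w1 = rng.normal(size=(5, 6))
b1 = rng.normal(size=6)
w2 = rng.normal(size=(6, 2))
b2 = rng.normal(size=2)

# Pretend unit pruning zeroed hidden units 1 and 4.
# Assumption: the biases of the pruned units are zeroed too.
w1[:, [1, 4]] = 0.0
b1[[1, 4]] = 0.0


def forward(x, w1, b1, w2, b2):
    """Two dense layers with a ReLU in between (hypothetical helper)."""
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return hidden @ w2 + b2


# Keep only the hidden units whose incoming column is not entirely zero.
keep = np.linalg.norm(w1, axis=0) > 0
w1_small, b1_small = w1[:, keep], b1[keep]
# The matching rows of the next layer's weights can be dropped as well.
w2_small = w2[keep, :]

x = rng.normal(size=(3, 5))
y_full = forward(x, w1, b1, w2, b2)
y_small = forward(x, w1_small, b1_small, w2_small, b2)
```

The smaller network produces the same outputs as the zeroed one while storing 4 hidden units instead of 6, so both the parameter count and the matrix products shrink.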

Finally, this GitHub repository goes through the pruning methods explained here and runs experiments on the MNIST dataset.