Implementing Adagrad in Python

Question

Tags: python, numpy, pytorch, autograd

7

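I am trying to implement Adagrad in Python. For learning purposes, I am using matrix factorisation as an example, and I will use Autograd to compute the gradients.

My main question is whether the implementation is correct.

Problem description:

Given a matrix A of size (M x N) with some missing entries, decompose it into W and H of sizes (M x k) and (k x N) respectively. The goal is to learn W and H using Adagrad. I will be following this guide for the Autograd implementation.

Note: I am well aware that ALS-based implementations are well suited to this problem. I am using Adagrad only for learning purposes.

Customary imports: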

import autograd.numpy as np
import pandas as pd

Creating the matrix to be decomposed

A = np.array([[3, 4, 5, 2],
              [4, 4, 3, 3],
              [5, 5, 4, 3]], dtype=np.float32).T

Masking one entry

A[0, 0] = np.nan

Defining the cost function

def cost(W, H):
    pred = np.dot(W, H)
    mask = ~np.isnan(A)
    return np.sqrt(((pred - A)[mask].flatten() ** 2).mean(axis=None))
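To make the masking concrete, here is a small sketch of the same masked RMSE written with plain NumPy and checked by hand on a 2 x 2 example. The name `masked_rmse` and the demo arrays are illustrative assumptions, not part of the original post.

```python
import numpy as np

def masked_rmse(A, W, H):
    """RMSE between W @ H and A over the observed (non-NaN) entries only."""
    mask = ~np.isnan(A)               # True where A is observed
    residual = (W @ H - A)[mask]      # missing entries are dropped here
    return np.sqrt(np.mean(residual ** 2))

# Hypothetical toy data: one missing entry, rank-1 factors.
A_demo = np.array([[np.nan, 2.0],
                   [3.0, 4.0]])
W_demo = np.array([[1.0], [2.0]])
H_demo = np.array([[1.0, 2.0]])

# W_demo @ H_demo = [[1, 2], [2, 4]]; the observed residuals are 0, -1, 0,
# so the cost is sqrt(1/3).
demo_cost = masked_rmse(A_demo, W_demo, H_demo)
```

The missing entry contributes nothing to the cost, so it contributes nothing to the gradient either.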

Decomposition parameters

rank = 2
learning_rate=0.01
n_steps = 10000

Gradient of the cost with respect to the parameters W and H

from autograd import grad, multigrad
grad_cost= multigrad(cost, argnums=[0,1])
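One way to gain confidence in any gradient function, automatic or hand-written, is to compare it against a finite-difference estimate. The helper below is a minimal sketch using central differences in plain NumPy; `numerical_grad` and the step size `h` are assumptions introduced for illustration and are not part of Autograd.

```python
import numpy as np

def numerical_grad(f, x, h=1e-6):
    """Central-difference estimate of the gradient of a scalar-valued f at x."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = h                 # perturb one coordinate at a time
        g[idx] = (f(x + step) - f(x - step)) / (2 * h)
    return g

# Sanity check on a function with a known gradient: d/dx sum(x^2) = 2x.
x_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
g_demo = numerical_grad(lambda v: (v ** 2).sum(), x_demo)
```

For the factorisation, the same check could be applied to W with H held fixed, for example `numerical_grad(lambda W_: cost(W_, H), W)`, and the result compared against `del_W`.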

Main Adagrad routine (this needs to be checked)

shape = A.shape

# Initialising W and H
H =  np.abs(np.random.randn(rank, shape[1]))
W =  np.abs(np.random.randn(shape[0], rank))

# gt_w and gt_h accumulate the sum of squared gradients
gt_w = np.zeros_like(W)
gt_h = np.zeros_like(H)

# stability factor
eps = 1e-8
print("Iteration, Cost")
for i in range(n_steps):

    if i % 1000 == 0:
        print("*" * 20)
        print(i, ",", cost(W, H))

    # computing grad. wrt W and H
    del_W, del_H = grad_cost(W, H)

    # Adding square of gradient
    gt_w+= np.square(del_W)
    gt_h+= np.square(del_H)

    # modified learning rate
    mod_learning_rate_W = np.divide(learning_rate, np.sqrt(gt_w+eps))
    mod_learning_rate_H = np.divide(learning_rate, np.sqrt(gt_h+eps))
    W =  W-del_W*mod_learning_rate_W
    H =  H-del_H*mod_learning_rate_H

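While the problem converges and I get a reasonable solution, I am wondering whether the implementation is correct. Specifically, is my understanding of the accumulated sum of squared gradients, and of how the adaptive learning rate is computed from it, correct?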
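For readers without Autograd installed, the routine can be sketched in plain NumPy by deriving the gradient of the masked RMSE by hand. This is an illustrative sketch, not the code from the post: the function names, the fixed `seed`, and the shorter `n_steps` are assumptions, and the hand-written gradient stands in for `grad_cost`.

```python
import numpy as np

def masked_rmse_and_grads(A, W, H):
    """Return the masked RMSE and its gradients with respect to W and H."""
    mask = ~np.isnan(A)
    resid = np.where(mask, W @ H - A, 0.0)   # zero residual at missing entries
    n_obs = mask.sum()
    cost = np.sqrt((resid ** 2).sum() / n_obs)
    G = resid / (n_obs * cost)               # d cost / d (W @ H); assumes cost > 0
    return cost, G @ H.T, W.T @ G

def adagrad_factorize(A, rank=2, learning_rate=0.01, n_steps=2000,
                      eps=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    M, N = A.shape
    W = np.abs(rng.standard_normal((M, rank)))
    H = np.abs(rng.standard_normal((rank, N)))
    gt_w = np.zeros_like(W)                  # running sum of squared gradients
    gt_h = np.zeros_like(H)
    history = []
    for _ in range(n_steps):
        cost, del_W, del_H = masked_rmse_and_grads(A, W, H)
        history.append(cost)
        gt_w += del_W ** 2
        gt_h += del_H ** 2
        # Per-parameter step: learning_rate / sqrt(accumulated squares + eps)
        W = W - learning_rate * del_W / np.sqrt(gt_w + eps)
        H = H - learning_rate * del_H / np.sqrt(gt_h + eps)
    return W, H, history

A = np.array([[3, 4, 5, 2],
              [4, 4, 3, 3],
              [5, 5, 4, 3]], dtype=float).T
A[0, 0] = np.nan
W, H, history = adagrad_factorize(A)
```

Comparing `history[0]` with `history[-1]` gives a quick indication of whether the updates are reducing the cost.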

- Nipun Batra

1

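Your implementation looks very good! - Nuageux

2

If you know the code works and are only looking for general refactoring or efficiency tips, you should post it on Code Review. Cool code! - Engineero

1

@Engineero: Thanks. Posted at https://codereview.stackexchange.com/questions/165371/implementing-adagrad-in-python - Nipun Batra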

1 Answer

- serv-inc · Accepted Answer

At a cursory glance, your code closely matches the code at https://github.com/benbo/adagrad/blob/master/adagrad.py

del_W, del_H = grad_cost(W, H)

matches

grad=f_grad(w,sd,*args)

gt_w+= np.square(del_W)
gt_h+= np.square(del_H)

matches

gti+=grad**2

mod_learning_rate_W = np.divide(learning_rate, np.sqrt(gt_w+eps))
mod_learning_rate_H = np.divide(learning_rate, np.sqrt(gt_h+eps))

matches

adjusted_grad = grad / (fudge_factor + np.sqrt(gti))

W =  W-del_W*mod_learning_rate_W
H =  H-del_H*mod_learning_rate_H

matches

w = w - stepsize*adjusted_grad

So, assuming adagrad.py is correct and the translation is correct, your code is correct. (Consensus does not prove your code right, but it can be a hint.)
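One small difference between the two snippets is where the stability constant sits: the question adds `eps` inside the square root, while the quoted reference line adds `fudge_factor` outside it. The sketch below compares the two forms numerically. The helper names and sample gradients are assumptions for illustration, and the same `eps` value from the question is used in both forms.

```python
import numpy as np

def step_eps_inside(grad, gti, learning_rate=0.01, eps=1e-8):
    """Step as in the question: eps added inside the square root."""
    return learning_rate * grad / np.sqrt(gti + eps)

def step_eps_outside(grad, gti, learning_rate=0.01, eps=1e-8):
    """Step as in the quoted reference line: constant added outside the square root."""
    return learning_rate * grad / (eps + np.sqrt(gti))

# Hypothetical gradients after a single step, so gti = grad ** 2.
grad = np.array([0.5, -2.0, 1e-3])
gti = grad ** 2
inside = step_eps_inside(grad, gti)
outside = step_eps_outside(grad, gti)
max_gap = np.max(np.abs(inside - outside))
```

The two forms agree closely whenever the accumulated squared gradient is much larger than the constant, and the gap only becomes visible for very small gradients.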