I have two 2-D numpy arrays A and B. I want to use scipy.stats.multivariate_normal to compute the joint logpdf of each row of A, using the corresponding row of B as the covariance matrix. Is there a way to do this without explicitly looping over the rows? Applying scipy.stats.multivariate_normal directly to A and B computes the logpdf of each row of A (which is what I want), but uses the entire 2-D array B as a single covariance matrix, which is not what I want (I need each row of B to define a different covariance matrix). I am looking for a solution that uses numpy vectorization and avoids explicitly looping over both arrays.
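For concreteness, the explicit loop to be avoided looks like the following minimal sketch. The shapes and the zero means are illustrative assumptions; each row of B is treated as the diagonal of a covariance matrix, which scipy accepts as a 1-D `cov` argument:

```python
import numpy as np
import scipy.stats

# Hypothetical small example: each row of A is a data point,
# each row of B holds the diagonal of that point's covariance matrix.
rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, (4, 3))
B = rng.uniform(0.5, 2.0, (4, 3))
means = np.zeros((4, 3))

# Explicit loop: one scipy call per row pair (this is what the
# question wants to eliminate).
logpdfs = np.array([
    scipy.stats.multivariate_normal.logpdf(a, mean=m, cov=b)
    for a, m, b in zip(A, means, B)
])
print(logpdfs.shape)  # one logpdf per row
```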
import numpy as np

def vectorized_gaussian_logpdf(X, means, covariances):
    """
    Compute log N(x_i; mu_i, sigma_i) for each x_i, mu_i, sigma_i.

    Args:
        X : shape (n, d)
            Data points
        means : shape (n, d)
            Mean vectors
        covariances : shape (n, d)
            Diagonal covariance matrices (one diagonal per row)

    Returns:
        logpdfs : shape (n,)
            Log probabilities
    """
    _, d = X.shape
    constant = d * np.log(2 * np.pi)
    log_determinants = np.log(np.prod(covariances, axis=1))
    deviations = X - means
    inverses = 1 / covariances
    return -0.5 * (constant + log_determinants +
                   np.sum(deviations * inverses * deviations, axis=1))
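One numerical caveat: `np.log(np.prod(covariances, axis=1))` can overflow or underflow for large d, since the product of many variances is formed before taking the log. Summing the logs directly is mathematically equivalent and more stable. A sketch of that drop-in variant (the function name is mine, not from the answer):

```python
import numpy as np

def diag_gaussian_logpdf_stable(X, means, covariances):
    """Diagonal-covariance Gaussian logpdf per row, summing logs
    instead of taking log(prod(...)) so large d does not overflow."""
    _, d = X.shape
    log_det = np.sum(np.log(covariances), axis=1)   # log-determinant of diag cov
    deviations = X - means
    quad = np.sum(deviations * deviations / covariances, axis=1)  # Mahalanobis term
    return -0.5 * (d * np.log(2 * np.pi) + log_det + quad)
```

It accepts the same (n, d) arrays and returns the same (n,) result as the original function.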
import time

import scipy.stats

def test_vectorized_gaussian_logpdf():
    n = 128**2
    d = 64
    means = np.random.uniform(-1, 1, (n, d))
    covariances = np.random.uniform(0, 2, (n, d))
    X = np.random.uniform(-1, 1, (n, d))

    # Reference: one scipy call per row (the explicit loop).
    refs = []
    ref_start = time.time()
    for x, mean, covariance in zip(X, means, covariances):
        refs.append(scipy.stats.multivariate_normal.logpdf(x, mean, covariance))
    ref_time = time.time() - ref_start

    # Vectorized version.
    fast_start = time.time()
    results = vectorized_gaussian_logpdf(X, means, covariances)
    fast_time = time.time() - fast_start

    print("Reference time:", ref_time)
    print("Vectorized time:", fast_time)
    print("Speedup:", ref_time / fast_time)
    assert np.allclose(results, refs)
I get a speedup of roughly 250x. (Yes, my application requires evaluating 16384 different Gaussians.)
scipy.stats.multivariate_normal(cov=B).logpdf(A) - W.Ali

If a numpy (or scipy) function cannot handle higher-dimensional input, then avoiding an explicit loop with numpy vectorization becomes difficult. - hpaulj