How can I sample data from a 2D NumPy array that represents a 2D distribution, using NumPy or SciPy functions?

Question

8

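Given a 2D numpy array dist of shape (200,200), where each element of the array is the joint probability of (x1, x2) for all x1, x2 in {0, 1, ..., 199}, how can I sample bivariate data x = (x1, x2) from this probability distribution using the NumPy or SciPy API?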

- jabberwoo

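Please provide an example of the input and output. - umn

You can inspect the input array by downloading the npy file. The output should be a NumPy array of shape [sample size, 2]. For example, if the sample size == 2, the output might be [[122,199],[182,28]]. This is actually part of the cs294-158 course homework (I am not taking the course, this is just out of personal interest); see part 2 there, the two-dimensional data. - jabberwoo

Any suggestions? - jabberwoo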

4 Answers

3

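Here is one approach, though I'm sure a more elegant solution exists using scipy. numpy.random cannot handle 2D probability mass functions, so you need to do some reshaping.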

import numpy as np

# construct a toy joint pmf
dist = np.random.random(size=(200, 200))  # here's your joint pmf
dist /= dist.sum()  # it has to be normalized

# generate the set of all x,y pairs represented by the pmf
# note: .T reverses all axes, so pairs[i, j] == [j, i],
# i.e. each pair is stored as (column, row) of dist
pairs = np.indices(dimensions=(200, 200)).T  # here are all of the x,y pairs

# make n random selections from the flattened pmf without replacement
# whether you want replacement depends on your application
# (use replace=True for independent draws from the distribution)
n = 50
inds = np.random.choice(np.arange(200**2), p=dist.reshape(-1), size=n, replace=False)

# inds is the set of n randomly chosen indices into the flattened dist array...
# therefore the random x,y selections
# come from selecting the associated elements
# from the flattened pairs array
# (use selections[:, ::-1] if you need (row, column) order, i.e. (x1, x2) for dist[x1, x2])
selections = pairs.reshape(-1, 2)[inds]

- kevinkayaks

If we had a three-dimensional table (i.e. one more dimension), what would reshape(-1,2) look like? Thanks. - Alexander Cska
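For a three-dimensional table, the same pattern appears to carry over with reshape(-1, 3). Below is a minimal sketch under stated assumptions: the shape (4, 5, 6) and the names dist3, triples and same are made up for illustration, and the newer np.random.default_rng generator is used in place of np.random.choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3D joint pmf; the shape (4, 5, 6) is an arbitrary choice for illustration.
dist3 = rng.random((4, 5, 6))
dist3 /= dist3.sum()

# All (i, j, k) index triples, with the coordinate axis moved last: shape (4, 5, 6, 3).
triples = np.moveaxis(np.indices(dist3.shape), 0, -1)

# Draw flat indices according to the flattened pmf, then look up the triples.
n = 10
inds = rng.choice(dist3.size, size=n, p=dist3.reshape(-1))
selections = triples.reshape(-1, 3)[inds]   # shape (n, 3)

# The same coordinates without building the lookup table.
same = np.column_stack(np.unravel_index(inds, dist3.shape))
```

Here np.moveaxis keeps each triple in (i, j, k) order, so a row of selections indexes dist3 directly. The last line suggests the lookup table is optional, since np.unravel_index does the conversion for any number of dimensions.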

2

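I can't comment either, but @applemonkey496's suggestion for getting multiple samples is not quite right. Other than that, it is a very good solution.

Instead of

adjusted_index = np.array(zip(*adjusted_index))

adjusted_index should be converted to a Python list before it is passed to np.array (NumPy does not expand a zip object), for example: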

adjusted_index = np.array(list(zip(*adjusted_index)))
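To see why the list() call matters in Python 3, here is a small sketch. The flat indices and the (3, 4) shape are made up for illustration, and np.column_stack is offered only as one possible vectorized alternative, not as part of the original answer.

```python
import numpy as np

# Hypothetical flat indices into a (3, 4) array, for illustration only.
flat_indices = np.array([0, 5, 11])
adjusted_index = np.unravel_index(flat_indices, (3, 4))  # tuple: (rows, cols)

# In Python 3, zip returns an iterator, so NumPy wraps it as a single object.
wrapped = np.array(zip(*adjusted_index))
print(wrapped.shape)   # () : a 0-d object array, not a list of coordinates

# Materializing the iterator first gives one row per sample.
coords = np.array(list(zip(*adjusted_index)))

# Stacking the per-dimension arrays as columns avoids the Python-level loop.
coords_stacked = np.column_stack(adjusted_index)
```

Both coords and coords_stacked should have shape (3, 2), with one (row, column) pair per sampled index.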

- Andrew Reeves

1

OK, thanks for pointing that out! I'll edit my answer. - applemonkey496

1

I can't comment, but to improve on kevinkayaks's answer:

pairs=np.indices(dimensions=(200,200)).T
selections = pairs.reshape(-1,2)[inds]

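can be replaced with:

np.array([inds//m, inds%m]).T

where m is the number of columns (200 here). The pairs matrix is then no longer needed. Note that this form returns each pair in (row, column) order.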
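A quick check of the integer-arithmetic form, assuming a row-major (C-order) array and taking m to be the number of columns. The random flat indices below are stand-ins for the sampled inds, and the names by_arithmetic and by_unravel are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (200, 200)
m = shape[1]                      # number of columns (assumed meaning of m)

# Stand-in flat indices; in the answer these come from np.random.choice.
inds = rng.integers(0, shape[0] * shape[1], size=50)

# Row index is the quotient, column index the remainder.
by_arithmetic = np.array([inds // m, inds % m]).T

# np.unravel_index does the same conversion for any number of dimensions.
by_unravel = np.column_stack(np.unravel_index(inds, shape))
```

The two arrays should be identical, which suggests np.unravel_index is the more general choice once the table has more than two dimensions.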

- Hv0nnus HACH

- applemonkey496 · Accepted Answer

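This solution works for probability distributions with any number of dimensions, assuming they are valid probability distributions (the entries must sum to 1, and so on). It flattens the distribution, samples from it, and converts the random index back to match the shape of the original array.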

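import numpy as np

# Stand-in input so the snippet runs on its own; use your own distribution here
array = np.random.random((200, 200))
array /= array.sum()  # the entries must sum to 1

# Create a flat copy of the array
flat = array.flatten()

# Then, sample an index from the 1D array with the
# probability distribution from the original array
sample_index = np.random.choice(a=flat.size, p=flat)

# Take this index and adjust it so it matches the original array
adjusted_index = np.unravel_index(sample_index, array.shape)
print(adjusted_index)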

In addition, to get multiple samples, add a size keyword argument to the np.random.choice call and modify adjusted_index before printing:

adjusted_index = np.array(list(zip(*adjusted_index)))

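This is necessary because, when np.random.choice is given a size argument, it returns an array of flat indices, and np.unravel_index then returns one array of indices per coordinate dimension; zipping them turns that into a list of coordinate tuples. It is also much more efficient than simply repeating the first snippet.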
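Putting the pieces together, here is a minimal sketch of a reusable sampler with a sanity check. The function name sample_from_pmf, its parameters, and the tiny 2x2 pmf are hypothetical and exist only for illustration; it uses np.random.default_rng and np.column_stack rather than the np.random.choice and zip calls in the answer.

```python
import numpy as np

def sample_from_pmf(pmf, size, rng=None):
    """Draw `size` index tuples from an N-dimensional pmf.

    Returns an integer array of shape (size, pmf.ndim).
    """
    rng = np.random.default_rng() if rng is None else rng
    pmf = np.asarray(pmf, dtype=float)
    flat = pmf.ravel() / pmf.sum()          # normalize defensively
    flat_idx = rng.choice(flat.size, size=size, p=flat)
    return np.column_stack(np.unravel_index(flat_idx, pmf.shape))

# Sanity check on a tiny pmf: empirical frequencies should approach the pmf.
toy = np.array([[0.1, 0.2],
                [0.3, 0.4]])
samples = sample_from_pmf(toy, size=100_000, rng=np.random.default_rng(0))

# Count how often each (x1, x2) cell was drawn.
counts = np.zeros_like(toy)
np.add.at(counts, (samples[:, 0], samples[:, 1]), 1)
empirical = counts / counts.sum()
```

For the question's array, a call such as sample_from_pmf(dist, size=2) would return an array of shape (2, 2), with one (x1, x2) pair per row.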