Counting the number of values that fall between a set of x, y, z coordinates

Question

3

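I am trying to write a method that lets me count the number of objects in 3D space that fall inside another object with 3D coordinates. That object also has a radius, so in effect I am trying to count the objects inside a sphere.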

I won't post my current script, but I'll illustrate with an example: I have galaxies with 3D coordinates gal_pos and radii gal_rad.

import numpy as np
gal_pos = np.array(
  [[ 528.1373291 ,  432.18615723,  443.8348999 ],
   [ 540.12231445,  450.08154297,  442.07891846],
   [ 590.73675537,  234.6769104 ,  296.02798462],
   [ 529.98809814,  161.75544739,  567.58203125],
   [ 552.45446777,  312.1973877 ,  375.42492676],
   [ 700.94335938,   65.46828461,  172.71842957],
   [ 463.43258667,   73.57706451,  285.4147644 ],
   [ 547.74414062,  330.9855957 ,  401.49771118],
   [ 591.89801025,  196.19670105,  274.60073853],
   [ 581.28320312,  376.70013428,  359.81851196],
   [ 520.09820557,  302.17849731,  371.68771362],
   [ 812.84539795,   97.41672516,  150.87428284],
   [ 541.6552124 ,   17.40070724,  373.07562256],
   [ 523.34509277,  302.18151855,  503.6333313 ]])

gal_rad = np.array(
  [ 1.14752779,  1.02471195,  0.79648002,  0.6085083 ,  0.78725676,
    1.07809084,  0.57744866,  0.93733404,  0.76053329,  0.68979678,
    0.61188519,  1.07989271,  0.83872035,  0.59899661])

I also have stars with 3D positions star_pos.

star_pos = np.array(
  [[ 517.0300293 ,  264.54165649,  547.87835693],
   [ 530.37280273,  358.40835571,  455.68734741],
   [ 530.42211914,  358.20803833,  455.80908203],
   [ 530.86737061,  324.91717529,  407.96405029],
   [ 547.05175781,  333.9262085 ,  403.82403564],
   [ 530.61053467,  325.91259766,  407.04153442],
   [ 533.9979248 ,  331.18804932,  451.3795166 ],
   [ 531.20678711,  326.75308228,  406.44711304],
   [ 550.81237793,  340.88101196,  408.75830078],
   [ 519.52880859,  299.91259766,  516.25140381],
   [ 525.82739258,  301.46209717,  501.66738892],
   [ 524.87988281,  268.88357544,  510.0123291 ],
   [ 524.43371582,  299.99725342,  512.36077881],
   [ 524.40429688,  299.8979187 ,  512.57452393],
   [ 524.40765381,  299.89120483,  512.5032959 ],
   [ 545.57440186,  331.59066772,  401.20291138],
   [ 532.29016113,  306.27557373,  491.26434326],
   [ 530.77410889,  326.18057251,  407.06216431],
   [ 524.14819336,  306.60586548,  509.55993652]])

The above is only a small sample of the values I have.

xmax_rad = gal_pos[:,0]+gal_rad
xmin_rad = gal_pos[:,0]-gal_rad

ymax_rad = gal_pos[:,1]+gal_rad
ymin_rad = gal_pos[:,1]-gal_rad

zmax_rad = gal_pos[:,2]+gal_rad
zmin_rad = gal_pos[:,2]-gal_rad

tot_pop = [] # Total population found each galaxy

Nind = [(x,y,z) for x,y,z in enumerate(star_pos) 
        if any(xmin_rad <=x<= xmax_rad) and 
        any(ymin_rad<=y<=ymax_rad) 
        and any(zmin_rad<=x<=zmax_rad)]
tot_pop.append(Nind)

print tot_pop

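This approach makes the most sense to me because it breaks the problem down element by element, but it only works for arrays of size around 300, and for Nind it returns ValueError: need more than 2 values to unpack. Most likely because my iteration can't unpack 3 objects?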
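For reference, the unpacking error can be reproduced in isolation. The sketch below uses a small hypothetical array named stars (not the data above) and relies only on the fact that enumerate yields (index, row) pairs, so each item has two elements rather than three. Under Python 3 the message is worded differently ("not enough values to unpack"), but the cause is the same.

```python
import numpy as np

# Hypothetical two-star array, used only to reproduce the error.
stars = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

# enumerate yields (index, row) pairs, so each item has exactly two elements.
pairs = list(enumerate(stars))
first_index, first_row = pairs[0]

try:
    for x, y, z in enumerate(stars):  # tries to unpack 2 items into 3 names
        pass
    unpack_failed = False
except ValueError:
    unpack_failed = True

# Iterating over the array directly yields rows of three values each.
coords = [(x, y, z) for x, y, z in stars]
```

Even with the unpacking fixed, a chained comparison such as xmin_rad <= x <= xmax_rad against whole arrays does not give a per-galaxy answer, so the comparison has to be done per galaxy or with broadcasting.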

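I have tried other approaches, such as taking the magnitude of each position, which returns incorrect results, and counting the values with a histogram, which again returns incorrect results (I checked by plotting everything in a 2D histogram). The following approach, where I index each galaxy, returns an empty array for every galaxy: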

tot_pop = []
for k in np.arange(len(gal_pos)):
    Nind = [(x,y) for x,y in enumerate(star_pos) 
        if xmin_rad[k] <=x<= xmax_rad[k]) and 
        ymin_rad[k]<=y<=ymax_rad[k]]

    tot_pop.append(Nind)

- iron2man

What are the shapes of the relevant inputs? - Divakar

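@MSeifert @Divakar Hi, sorry, I should have included that information. In my analysis, star_pos has shape (128, 3) and is a NumPy array. Likewise, gal_pos has shape (14, 3) and is also a NumPy array. I'll add some values to my post right away! - iron2man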

2 Answers

3

Here is an almost vectorized approach that uses NumPy broadcasting and slicing:

# Define low and high limits
l = gal_pos - gal_rad[:,None]
h = gal_pos + gal_rad[:,None]

# Get mask of valid ones for each row of star_pos
mask = np.ones(star_pos.shape[0], dtype=bool)
for i in range(star_pos.shape[1]):
    mask &= ((l[:,i,None] <= star_pos[:,i]) & (h[:,i,None] >= star_pos[:,i])).any(0)

# Finally use the mask to select valid rows off star_pos
out = star_pos[mask]

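I call it almost vectorized because we still iterate over the columns of star_pos. Since we are dealing with X, Y, Z data, that is only 3 iterations, so it is safe to call it almost vectorized.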

For the given sample, this is what I get:

In [302]: out
Out[302]: array([], shape=(0, 3), dtype=float64)

So none of the points in star_pos satisfy the limits.
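One caveat may be worth illustrating. The per-axis mask above applies any over the galaxies separately for each coordinate, so a star whose x lies in one galaxy's range and whose y and z lie in another's would still pass. A minimal sketch of a variant that requires all three coordinates to match the same galaxy follows. The toy values are hypothetical and were chosen only to show the difference, and the intermediate array has shape (n_galaxies, n_stars, 3), so memory grows with the product of the two lengths.

```python
import numpy as np

# Toy data (hypothetical values, not from the question).
gal_pos = np.array([[0.0, 0.0, 0.0],
                    [10.0, 10.0, 10.0]])
gal_rad = np.array([1.0, 2.0])
star_pos = np.array([[0.5, -0.5, 0.2],     # inside galaxy 0's box
                     [9.0, 11.0, 10.5],    # inside galaxy 1's box
                     [0.5, 11.0, 10.5],    # x fits galaxy 0, y and z fit galaxy 1
                     [50.0, 50.0, 50.0]])  # far from both

l = gal_pos - gal_rad[:, None]
h = gal_pos + gal_rad[:, None]

# Per-axis mask, as in the loop above: each axis may match a different galaxy.
per_axis = np.ones(star_pos.shape[0], dtype=bool)
for i in range(star_pos.shape[1]):
    per_axis &= ((l[:, i, None] <= star_pos[:, i]) & (h[:, i, None] >= star_pos[:, i])).any(0)

# inside[g, s] is True when star s lies within galaxy g's box on all three axes.
inside = ((l[:, None, :] <= star_pos) & (star_pos <= h[:, None, :])).all(axis=2)

counts = inside.sum(axis=1)  # number of stars per galaxy
mask = inside.any(axis=0)    # stars that fall inside at least one galaxy
out = star_pos[mask]
```

With these toy values the third star passes per_axis but not mask, and counts gives the per-galaxy totals directly.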

- Divakar

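Hmm, interesting. I have a bit of trouble with the any and the |. I don't think that's correct. I think all three coordinates should be within the "range". - MSeifert

@MSeifert I think any(xmin_rad <=x<= xmax_rad) confused me. Fixed. - Divakar

Much better now :) For this kind of problem I tend not to use fully vectorized solutions (by expanding dimensions), because it is easy to run out of memory: you end up with two n*m arrays, and if the datasets are large, even a computer with a lot of RAM can run out. In this case only arrays of length 300 are involved, so it is no problem, +1 :) - MSeifert

@MSeifert Well, this stays 2D because of the slicing. It just takes advantage of the fact that we only have XYZ (3 columns). - Divakar

Yes, but if both arrays have thousands of elements, a 2D array whose size is the length of the first array (galaxies) times the length of the second (stars) will use a lot of RAM. - MSeifert

@MSeifert Yes, exactly! As with any vectorized solution that needs that kind of setup up front. - Divakar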

- MSeifert · Accepted Answer

You can use zip to iterate over the galaxies and radii, then use broadcasting and boolean indexing to find the matches:

result = []
for galaxy, galaxy_radius in zip(gal_pos, gal_rad):
    # With broadcasting you can simply subtract the positions from the galaxy center
    # and using abs avoids checking lower and upper bound.
    rel_star_pos = abs(star_pos - galaxy)
    # Check which distances are below the radius and keep these which are
    # within the radius for x, y and z
    matches = (rel_star_pos <= galaxy_radius).all(axis=1)
    # use boolean indexing to append the stars which satisfy the above condition
    result.append(star_pos[matches])
print(result)

If you want to append the indices (rather than the actual star coordinates), change the append line to:

result.append(np.where(matches)[0])

If you only want the number of matches:

result.append(np.sum(matches))
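Since the sample data produces no matches, a small self-contained sketch with hypothetical toy values may help show what the three variants return when some stars do match. The list names stars_in, index_in and count_in are illustrative and not part of the code above.

```python
import numpy as np

# Toy data (hypothetical, chosen so that some stars match).
gal_pos = np.array([[0.0, 0.0, 0.0],
                    [10.0, 10.0, 10.0]])
gal_rad = np.array([1.0, 2.0])
star_pos = np.array([[0.5, -0.5, 0.2],
                     [9.0, 11.0, 10.5],
                     [11.5, 9.5, 8.5],
                     [50.0, 50.0, 50.0]])

stars_in, index_in, count_in = [], [], []
for galaxy, galaxy_radius in zip(gal_pos, gal_rad):
    # True for stars within galaxy_radius of the center on every axis
    matches = (abs(star_pos - galaxy) <= galaxy_radius).all(axis=1)
    stars_in.append(star_pos[matches])        # coordinates of matching stars
    index_in.append(np.where(matches)[0])     # row indices of matching stars
    count_in.append(int(np.sum(matches)))     # number of matching stars

print(count_in)
```

Here the first galaxy contains one star and the second contains two, so count_in is [1, 2].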

However, I found no matches with the data provided:

[array([], shape=(0, 3), dtype=float64),
 array([], shape=(0, 3), dtype=float64),
 array([], shape=(0, 3), dtype=float64),
 array([], shape=(0, 3), dtype=float64),
 array([], shape=(0, 3), dtype=float64),
 array([], shape=(0, 3), dtype=float64),
 array([], shape=(0, 3), dtype=float64),
 array([], shape=(0, 3), dtype=float64),
 array([], shape=(0, 3), dtype=float64),
 array([], shape=(0, 3), dtype=float64),
 array([], shape=(0, 3), dtype=float64),
 array([], shape=(0, 3), dtype=float64),
 array([], shape=(0, 3), dtype=float64),
 array([], shape=(0, 3), dtype=float64)]
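Note that the check above tests each axis separately, which selects stars inside a cube of half-width galaxy_radius rather than a sphere. If a true spherical test is wanted, as the question describes, one possible sketch compares the Euclidean distance to the radius with np.linalg.norm. The toy values below are hypothetical and were chosen so that one star lies in a cube's corner but outside the sphere.

```python
import numpy as np

# Toy data (hypothetical values, not from the question).
gal_pos = np.array([[0.0, 0.0, 0.0],
                    [10.0, 10.0, 10.0]])
gal_rad = np.array([1.0, 2.0])
star_pos = np.array([[0.5, -0.5, 0.2],     # inside cube and sphere of galaxy 0
                     [0.9, 0.9, 0.9],      # inside the cube, outside the sphere
                     [9.0, 11.0, 10.5],    # inside cube and sphere of galaxy 1
                     [50.0, 50.0, 50.0]])  # far from both

sphere_counts, cube_counts = [], []
for galaxy, galaxy_radius in zip(gal_pos, gal_rad):
    offset = star_pos - galaxy
    # Sphere: Euclidean distance from the galaxy center within the radius
    in_sphere = np.linalg.norm(offset, axis=1) <= galaxy_radius
    # Cube: each coordinate within the radius, as in the loop above
    in_cube = (abs(offset) <= galaxy_radius).all(axis=1)
    sphere_counts.append(int(in_sphere.sum()))
    cube_counts.append(int(in_cube.sum()))

print(sphere_counts, cube_counts)
```

For these values the cube test counts two stars for the first galaxy while the sphere test counts one, because the star at (0.9, 0.9, 0.9) is about 1.56 from the center.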