Calculate the average of elements in one array based on the data in another array

Question



I need to average the Y values that correspond to each value in the X array...

X=np.array([  1,  1,  2,  2,  2,  2,  3,  3 ... ])

Y=np.array([ 10, 30, 15, 10, 16, 10, 15, 20 ... ])

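In other words, the 1s in X correspond to 10 and 30 in Y, whose average is 20; the 2s correspond to 15, 10, 16 and 10, whose average is 12.75; and so on...

How can I calculate these averages?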

- kyazgan


If the groups are ascending integers starting at 1, this translates to: np.bincount(X-1, Y) / np.bincount(X-1). - Michael Szczesny
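Spelled out, the comment's one-liner divides per-group sums by per-group counts. This is a sketch that, like the comment, assumes X holds consecutive integers starting at 1 (a label that never occurs would produce 0/0, i.e. nan, in its slot).

```python
import numpy as np

X = np.array([1, 1, 2, 2, 2, 2, 3, 3])
Y = np.array([10, 30, 15, 10, 16, 10, 15, 20])

# Shift labels so that group 1 lands in bin 0 (assumes labels are 1, 2, 3, ...)
idx = X - 1

# With weights, bincount sums the Y values falling into each bin
sums = np.bincount(idx, weights=Y)
# Without weights, it counts how many elements fall into each bin
counts = np.bincount(idx)

means = sums / counts
print(means)
```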

5 Answers

You can try using pandas.

import pandas as pd
import numpy as np

N = pd.DataFrame(np.transpose([X,Y]),
             columns=['X', 'Y']).groupby('X')['Y'].mean().to_numpy()
# array([20.  , 12.75, 17.5 ])

- It_is_Chris

Why so complicated? pd.Series(Y).groupby(X).mean().to_numpy() ;) - mozway

That's much cleaner. I always forget that you can group by an array. - It_is_Chris
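A minimal sketch of mozway's version that also keeps the group labels, which `.to_numpy()` alone discards. It assumes X and Y have the same length, since the X array is used directly as the grouping key.

```python
import numpy as np
import pandas as pd

X = np.array([1, 1, 2, 2, 2, 2, 3, 3])
Y = np.array([10, 30, 15, 10, 16, 10, 15, 20])

# Group the Y values by the parallel X array and average each group
group_means = pd.Series(Y).groupby(X).mean()

labels = group_means.index.to_numpy()  # distinct X values, sorted
means = group_means.to_numpy()         # average of Y for each label
```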


import numpy as np

X = np.array([  1,  1,  2,  2,  2,  2,  3,  3])

Y = np.array([ 10, 30, 15, 10, 16, 10, 15, 20])

# Only the unique values of X
unique_vals = np.unique(X)

# Loop over every unique value
for val in unique_vals:
    # Find the indices in X where this value occurs
    idx = np.where(X == val)
    # Mean of the Y values at those indices
    aver = np.mean(Y[idx])
    print(f"Average for {val}: {aver}")

Result:

Average for 1: 20.0

Average for 2: 12.75

Average for 3: 17.5
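The same per-value loop can be vectorized. This is a sketch using `np.unique` with `return_inverse` and `return_counts` plus unbuffered accumulation via `np.add.at`; unlike the `bincount` comment, it works for arbitrary (non-consecutive) labels.

```python
import numpy as np

X = np.array([1, 1, 2, 2, 2, 2, 3, 3])
Y = np.array([10, 30, 15, 10, 16, 10, 15, 20])

# labels: sorted distinct X values; inverse[i] is the position of X[i] in labels;
# counts[k] is how often labels[k] occurs in X
labels, inverse, counts = np.unique(X, return_inverse=True, return_counts=True)

# Accumulate each Y value into the slot of its label
sums = np.zeros(len(labels))
np.add.at(sums, inverse, Y)

means = sums / counts
for val, aver in zip(labels, means):
    print(f"Average for {val}: {aver}")
```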

- TreshUp

You can use the following code:

import numpy as np

X=np.array([  1,  1,  2,  2,  2,  2,  3,  3])

Y=np.array([ 10, 30, 15, 10, 16, 10, 15, 20])


def groupby(a, b):
    # Get argsort indices, to be used to sort a and b in the next steps
    sidx = b.argsort(kind='mergesort')
    a_sorted = a[sidx]
    b_sorted = b[sidx]

    # Get the group limit indices (start, stop of groups)
    cut_idx = np.flatnonzero(np.r_[True,b_sorted[1:] != b_sorted[:-1],True])

    # Split input array with those start, stop ones
    out = [a_sorted[i:j] for i,j in zip(cut_idx[:-1],cut_idx[1:])]
    return out

group_by_array=groupby(Y,X)
for item in group_by_array:
    print(np.average(item))
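The list comprehension over `cut_idx` can equivalently be written with `np.split`, which takes only the interior cut positions. A sketch under the same stable-sort idea, not the linked answer's own code:

```python
import numpy as np

X = np.array([1, 1, 2, 2, 2, 2, 3, 3])
Y = np.array([10, 30, 15, 10, 16, 10, 15, 20])

# Stable sort so that equal X values keep their original relative order
sidx = X.argsort(kind="mergesort")
x_sorted, y_sorted = X[sidx], Y[sidx]

# Interior boundaries: positions where the sorted label changes
boundaries = np.flatnonzero(x_sorted[1:] != x_sorted[:-1]) + 1

# Cut y_sorted at those positions, giving one sub-array per group
groups = np.split(y_sorted, boundaries)
means = np.array([g.mean() for g in groups])
```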

I based this answer on the information in the following link: grouping a numpy array into multiple sub-arrays using an array of values.

- Hossein Biniazian

I think this solution should work:

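import numpy as np

avg_arr = []
i = 1
while i <= np.max(x):
    inds = np.where(x == i)
    # Average every y whose x equals i (slicing from the first to the
    # last index would drop the last match and assume contiguous groups)
    my_val = np.average(y[inds])
    avg_arr.append(my_val)
    i += 1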

It isn't the cleanest approach, but I was able to test it quickly and it works.

- rsenne

Content provided by Stack Overflow.

- j1-lee · Accepted Answer

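One option is to use a property of linear regression with categorical variables: if y is regressed on one-hot (dummy) indicators of the categories with no intercept, the least-squares coefficient of each category equals the mean of y within that category.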

import numpy as np

x = np.array([  1,  1,  2,  2,  2,  2,  3,  3 ])
y = np.array([ 10, 30, 15, 10, 16, 10, 15, 20 ])

x_dummies = x[:, None] == np.unique(x)
means = np.linalg.lstsq(x_dummies, y, rcond=None)[0]
print(means) # [20.   12.75 17.5 ]
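Why the regression returns the group means, as a short derivation. The notation is introduced here for illustration: D is the n-by-K one-hot matrix `x_dummies`, u_k is the k-th distinct value of x, and n_k is how many times it occurs. Because each row of D has exactly one 1, its columns are mutually orthogonal and the normal equations decouple:

```latex
\begin{aligned}
D^\top D\,\beta &= D^\top y,\\
D^\top D &= \operatorname{diag}(n_1,\dots,n_K),\qquad
(D^\top y)_k = \sum_{i:\,x_i = u_k} y_i,\\
\beta_k &= \frac{1}{n_k}\sum_{i:\,x_i = u_k} y_i .
\end{aligned}
```

For the example data, n = (2, 4, 2) and D^T y = (40, 51, 35), so beta = (20, 12.75, 17.5), matching the printed result.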