Efficiently grouping array elements by value

Suppose I have:

lags = [0, 30, 60, 90, 120, 150, 180, np.inf]

and

list = [[500, 800, 1000, 200, 1500], [220, 450, 350, 1070, 1780], [900, 450, 1780, 1450, 100], 
        [340, 670, 830, 1370, 1420], [850, 630, 1230, 1670, 910]]

angle = [[50, 80, 100, 20, 150], [22, 45, 35, 107, 178], [90, 45, 178, 145, 10], 
        [34, 67, 83, 137, 142], [85, 63, 123, 167, 91]]

I want to take each element of `list` and store it in a separate array according to the value of the corresponding element in `angle`, for example:
# values whose angle is less than 30
list1 = [200, 220, 100]
# values whose angle is between 30 and 60
list2 = [500, 450, 350, 450, 340]
# values whose angle is between 60 and 90
list3 = [800, 670, 830, 850, 630]
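The grouping rule above amounts to putting each element into a half-open interval [lags[i], lags[i+1]). A minimal sketch, assuming NumPy and assuming that a boundary value such as 90 belongs to the higher interval, computes each angle's interval index with `np.digitize`:

```python
import numpy as np

lags = np.array([0, 30, 60, 90, 120, 150, 180, np.inf])
angle = np.array([[50, 80, 100, 20, 150], [22, 45, 35, 107, 178],
                  [90, 45, 178, 145, 10], [34, 67, 83, 137, 142],
                  [85, 63, 123, 167, 91]])

# For increasing bins (right=False), np.digitize returns i such that
# lags[i-1] <= x < lags[i]; subtracting 1 gives the index of the interval
# [lags[i], lags[i+1]) that each angle falls into.
group_idx = np.digitize(angle, lags) - 1
print(group_idx[0])  # [1 2 3 0 5]
```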

I did something like the following:

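import numpy as np

# `list` is the 2-D data defined above; because it shadows the built-in,
# defaultdict(list) would raise a TypeError, so a plain dict is used here.
# np.unique is dropped: it sorts and deduplicates each array independently,
# so ulist[k] would no longer correspond to uangle[k].
values = np.ravel(list)
angles = np.ravel(angle)
sortlist = {}
for count in range(len(lags) - 1):
    sortlist[count] = []
    for k in range(len(angles)):
        # keep values whose angle lies in [lags[count], lags[count + 1])
        if lags[count] <= angles[k] < lags[count + 1]:
            sortlist[count].append(values[k])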

I'd like to know whether there is a more Pythonic/efficient way to do this that improves performance.

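Do you care about the order of the elements in each output list? - Divakar
@Divakar: I don't care about the order. - user2554925
@Divakar, I just removed it. If I wanted it, I would have to add '='. - user2554925
@user7436576 Updated with the values for all lags. - Mohammad Yusuf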
3 Answers

Answer 1
Here's a vectorized approach -
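import numpy as np

# angle and list1 must be NumPy arrays (see the sample input below)
an = angle.ravel()                                   # flatten the angles
sidx = an.argsort()                                  # order that sorts the angles
cut_idx = np.searchsorted(an[sidx], lags)            # position of each lag in the sorted angles
out = np.split(list1.ravel()[sidx], cut_idx[1:-1])   # reorder values by angle, cut at interior lags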

Sample input and output -

In [97]: lags = np.array([0, 30, 60, 90, 120, 150, 180, np.inf])
    ...: 
    ...: list1 = np.array([[500, 800, 1000, 200, 1500],
    ...:                   [220, 450, 350, 1070, 1780],
    ...:                   [900, 450, 1780, 1450, 100],
    ...:                   [340, 670, 830, 1370, 1420],
    ...:                   [850, 630, 1230, 1670, 910]])
    ...: 
    ...: angle = np.array([[50, 80, 100, 20, 150],
    ...:                   [22, 45, 35, 107, 178],
    ...:                   [90, 45, 178, 145, 10],
    ...:                   [34, 67, 83, 137, 142],
    ...:                   [85, 63, 123, 167, 91]])
    ...: 

In [99]: out
Out[99]: 
[array([100, 200, 220]),            # <----- 0 to 30
 array([340, 350, 450, 450, 500]),  # <----- 30 to 60
 array([630, 670, 800, 830, 850]),  # <----- 60 to 90
 array([ 900,  910, 1000, 1070]),   # <----- 90 to 120
 array([1230, 1370, 1420, 1450]),   # <----- 120 to 150
 array([1500, 1670, 1780, 1780]),   # <----- 150 to 180
 array([], dtype=int64)]            # <----- 180 to Inf
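To see what each step of the vectorized approach does, here is a trace on hypothetical toy data (three values, two intervals); the names `values`, `angles` and `groups` are illustrative only:

```python
import numpy as np

# Hypothetical toy data: three values with their angles, two intervals.
values = np.array([5, 1, 3])
angles = np.array([50, 10, 35])
lags = np.array([0, 30, 60])

sidx = angles.argsort()                           # [1, 2, 0]: angles in ascending order
sorted_angles = angles[sidx]                      # [10, 35, 50]
cut_idx = np.searchsorted(sorted_angles, lags)    # [0, 1, 3]: first position >= each lag
groups = np.split(values[sidx], cut_idx[1:-1])    # split [1, 3, 5] at position 1
print([g.tolist() for g in groups])               # [[1], [3, 5]]
```

Because the values are reordered by angle before splitting, each output group comes out sorted by angle, which is why the order differs from the row-major order in the question.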

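Why do you use cut_idx[1:-1]? - Mohammad Yusuf
@MYGz Well, np.split only needs the interior indices at which to split the array, i.e. the boundaries between adjacent intervals. So we don't need the start and end indices. - Divakar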
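A small illustration of the point about `np.split`: passing the start and end indices as well just produces an empty piece at each end. Dropping `cut_idx[0]` and `cut_idx[-1]` in the answer assumes every angle lies in [lags[0], lags[-1]), which holds here because the angles are non-negative and the last lag is infinity.

```python
import numpy as np

a = np.arange(10)

# Interior cut points only: three pieces, a[:3], a[3:7], a[7:].
inner = np.split(a, [3, 7])

# Also passing the start (0) and end (len(a)) indices adds an empty
# piece at each end.
with_ends = np.split(a, [0, 3, 7, 10])

print([p.tolist() for p in inner])      # [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]
print([p.tolist() for p in with_ends])  # [[], [0, 1, 2], [3, 4, 5, 6], [7, 8, 9], []]
```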

Answer 2

Using numpy:

import numpy as np

lags = [0, 30, 60, 90, 120, 150, 180, np.inf]
alist = np.array([[500, 800, 1000, 200, 1500], [220, 450, 350, 1070, 1780], [900, 450, 1780, 1450, 100], 
        [340, 670, 830, 1370, 1420], [850, 630, 1230, 1670, 910]])
angle = np.array([[50, 80, 100, 20, 150], [22, 45, 35, 107, 178], [90, 45, 178, 145, 10], 
        [34, 67, 83, 137, 142], [85, 63, 123, 167, 91]])

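# For each interval [lags[i], lags[i+1]), the boolean mask selects the
# matching entries of alist; .tolist() prints them as a plain list.
for i in range(len(lags) - 1):
    print(alist[(lags[i] <= angle) & (angle < lags[i + 1])].tolist())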

Output:

[200, 220, 100]
[500, 450, 350, 450, 340]
[800, 670, 830, 850, 630]
[1000, 1070, 900, 910]
[1450, 1370, 1420, 1230]
[1500, 1780, 1780, 1670]
[]
    

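`(lags[i] <= angle) & (angle < lags[i+1])` builds a boolean index (mask) over `angle`; indexing `alist` with it keeps only the values whose angle lies in that interval and masks out the rest.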
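The loop above builds one mask per interval. A minimal sketch, assuming the same data as above with `lags` converted to a NumPy array, builds all the masks at once via broadcasting:

```python
import numpy as np

lags = np.array([0, 30, 60, 90, 120, 150, 180, np.inf])
alist = np.array([[500, 800, 1000, 200, 1500], [220, 450, 350, 1070, 1780],
                  [900, 450, 1780, 1450, 100], [340, 670, 830, 1370, 1420],
                  [850, 630, 1230, 1670, 910]])
angle = np.array([[50, 80, 100, 20, 150], [22, 45, 35, 107, 178],
                  [90, 45, 178, 145, 10], [34, 67, 83, 137, 142],
                  [85, 63, 123, 167, 91]])

a = angle.ravel()
v = alist.ravel()
# lags[:-1, None] has shape (7, 1) and a has shape (25,), so the comparison
# broadcasts to (7, 25): row i is the mask for [lags[i], lags[i+1]).
masks = (lags[:-1, None] <= a) & (a < lags[1:, None])
groups = [v[m].tolist() for m in masks]
for g in groups:
    print(g)
```

The mask matrix has one row per interval and one column per element, so its memory grows with both; for data of this size that is negligible.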


Using zip() and a list comprehension:

import numpy as np

lags = [0, 30, 60, 90, 120, 150, 180, np.inf]

alist = [[500, 800, 1000, 200, 1500], [220, 450, 350, 1070, 1780], [900, 450, 1780, 1450, 100], 
        [340, 670, 830, 1370, 1420], [850, 630, 1230, 1670, 910]]

angle = [[50, 80, 100, 20, 150], [22, 45, 35, 107, 178], [90, 45, 178, 145, 10], 
        [34, 67, 83, 137, 142], [85, 63, 123, 167, 91]]

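# zip(alist, angle) pairs up rows; the inner zip pairs each value with its angle.
for i in range(len(lags) - 1):
    print([v for row_v, row_a in zip(alist, angle)
             for v, a in zip(row_v, row_a)
             if lags[i] <= a < lags[i + 1]])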

Output:

[200, 220, 100]
[500, 450, 350, 450, 340]
[800, 670, 830, 850, 630]
[1000, 1070, 900, 910]
[1450, 1370, 1420, 1230]
[1500, 1780, 1780, 1670]
[]
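The comprehension above rescans all the data once per interval. A sketch of a single-pass alternative in pure Python, assuming the standard library's `bisect` module and `float('inf')` in place of `np.inf`, looks up each angle's interval with a binary search over `lags`:

```python
from bisect import bisect_right

lags = [0, 30, 60, 90, 120, 150, 180, float('inf')]
alist = [[500, 800, 1000, 200, 1500], [220, 450, 350, 1070, 1780],
         [900, 450, 1780, 1450, 100], [340, 670, 830, 1370, 1420],
         [850, 630, 1230, 1670, 910]]
angle = [[50, 80, 100, 20, 150], [22, 45, 35, 107, 178],
         [90, 45, 178, 145, 10], [34, 67, 83, 137, 142],
         [85, 63, 123, 167, 91]]

# One list per interval [lags[i], lags[i+1]).
groups = [[] for _ in range(len(lags) - 1)]
for row_vals, row_angles in zip(alist, angle):
    for value, a in zip(row_vals, row_angles):
        # bisect_right(lags, a) counts the lags <= a, so subtracting 1 gives
        # the index of the interval whose lower bound is the largest lag <= a.
        i = bisect_right(lags, a) - 1
        if 0 <= i < len(groups):  # skip angles outside [lags[0], lags[-1])
            groups[i].append(value)

for g in groups:
    print(g)
```

This visits each element once and costs O(log k) per element for k lags, instead of k full passes over the data.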

Answer 3

A pure Python solution (without numpy)

You can store the values in a dict instead of in separate variables (collections.defaultdict is even better).

You can write a function that returns the group for a given angle:

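def get_group_from_angle(angle):
    # Branches are checked in order, so each only needs an upper bound;
    # boundary values (30, 60) go to the higher group instead of falling through.
    if angle < 30:
        return 'a'
    elif angle < 60:
        return 'b'
    elif angle < 90:
        return 'c'
    return ''  # angle >= 90: not in any of the listed ranges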

Then use that function in a for loop to build the required dict:

from collections import defaultdict
my_dict = defaultdict(list)

# `alist` and `angle` hold the values given in the question
# (`list` must not be shadowed, or defaultdict(list) fails)
for ll, aa in zip(alist, angle):  # pairs of rows
    for l, a in zip(ll, aa):      # (value, angle) pairs
        my_dict[get_group_from_angle(a)].append(l)
my_dict ends up holding:

{ 
    'a': [200, 220, 100], 
    'b': [500, 450, 350, 450, 340],
    'c': [800, 670, 830, 850, 630], 
    '': [1000, 1500, 1070, 1780, 900, 1780, 1450, 1370, 1420, 1230, 1670, 910]
    # ^ values whose angle does not fall in any of the specified ranges
}
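The if/elif chain can also be made table-driven, so adding a range means adding one entry rather than another branch. A sketch, where `GROUPS` is a hypothetical name for a list of (exclusive upper bound, label) pairs checked in order:

```python
# Hypothetical table of (exclusive upper bound, label) pairs, checked in order.
GROUPS = [(30, 'a'), (60, 'b'), (90, 'c')]

def get_group_from_angle(angle):
    # Return the label of the first range whose upper bound exceeds angle.
    for upper, label in GROUPS:
        if angle < upper:
            return label
    return ''  # angle >= 90: not in any of the listed ranges
```

It behaves the same as the chain above for the question's data and can be used unchanged in the loop that fills `my_dict`.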

Page content provided by Stack Overflow.