Python: index of the first element below a threshold in a reverse-sorted list

Question



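A similar question has already been asked for a sorted list (link here), but that solution uses bisect, which does not work for a list sorted in reverse order.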

Suppose I have a list that is sorted by its middle element, in descending order.

my_list = [[3,0.99,1], [2,0.98,54], [10,.85,4], [1,0.7,10], [12,0.69,31], [12,0.65,43], [1,0.56,0] ....]

I want to apply a list of thresholds, which is itself sorted separately, to the middle elements.

threshold = [0.97, 0.90, 0.83, 0.6]

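For each threshold, I am trying to find the index of the first element whose middle value is less than that threshold. For the example above, it should return: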

index_list = [2, 2, 3, 6]

What is the fastest way to do this?

- R.Bahl
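For reference, the required behaviour can be pinned down with a plain linear scan. This is a minimal sketch, not one of the answers below: the function name `first_below_linear` is hypothetical, it uses the first seven rows of the example (reading the last row as `[1, 0.56, 0]`), and it assumes that a threshold with no smaller middle value should map to `len(rows)`. It costs O(n·m) for n rows and m thresholds, so it is mainly useful for checking the faster approaches.

```python
def first_below_linear(rows, thresholds):
    """Reference scan: for each threshold t, the index of the first row with row[1] < t.

    Assumption: a threshold that no middle value falls below maps to len(rows).
    """
    n = len(rows)
    return [next((i for i, row in enumerate(rows) if row[1] < t), n) for t in thresholds]


my_list = [[3, 0.99, 1], [2, 0.98, 54], [10, .85, 4], [1, 0.7, 10],
           [12, 0.69, 31], [12, 0.65, 43], [1, 0.56, 0]]
threshold = [0.97, 0.90, 0.83, 0.6]
print(first_below_linear(my_list, threshold))  # [2, 2, 3, 6]
```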

5 Answers


Using numpy, which I think looks cleaner than a pure-Python implementation and is almost certainly faster:


import numpy as np
arr = np.array([[3,0.99,1], [2,0.98,54], [10,.85,4], [1,0.7,10], [12,0.69,31], [12,0.65,43], [10,0.50, 24]])
thresholds = [0.97, 0.90, 0.83, 0.60]
col = arr[:, 1]  # middle element of each row
# first index where col < t; thresholds with no match are skipped
idx = [int(np.where(col < t)[0][0]) for t in thresholds if (col < t).any()]
print(idx)
[2, 2, 3, 6]

- reptilicus

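NumPy does work well, but what I have observed is that, although it is faster than pure Python, the overhead of converting the list to a NumPy array can quickly become the bottleneck, especially when the data is large. - R.Bahl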
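One way to cut part of that conversion cost, assuming only the middle value of each row is needed, is to copy just that column into a 1-D array with `np.fromiter` rather than converting the whole list of lists into a 2-D array. This is a sketch; whether it is noticeably faster depends on the data and should be measured.

```python
import numpy as np

my_list = [[3, 0.99, 1], [2, 0.98, 54], [10, .85, 4], [1, 0.7, 10],
           [12, 0.69, 31], [12, 0.65, 43], [1, 0.56, 0]]

# Copy only the middle value of each row into a 1-D float array;
# passing count= lets NumPy allocate the output once, up front.
keys = np.fromiter((row[1] for row in my_list), dtype=float, count=len(my_list))
print(keys.shape)  # (7,)
```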

Is there a way to return just the final index, i.e. len, when no element is below the threshold? Currently NumPy raises an error. - R.Bahl

You can add an if statement to test whether the np.where() call found anything. I edited the statement above as an example... - reptilicus
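To get `len` back instead of skipping a threshold, one option is to swap `np.where` for `np.searchsorted`, which returns the array length when nothing qualifies. A minimal sketch, assuming the middle column is sorted in descending order; the helper name `first_below` is hypothetical. Negating the column makes it ascending, and `side="right"` keeps a value exactly equal to a threshold from counting as "below" it.

```python
import numpy as np


def first_below(col, thresholds):
    """Index of the first element of descending `col` that is < each threshold.

    Returns len(col) when no element is below a threshold.
    """
    neg = -np.asarray(col, dtype=float)  # descending -> ascending
    # first position where neg > -t, i.e. where col < t
    return np.searchsorted(neg, -np.asarray(thresholds, dtype=float), side="right").tolist()


arr = np.array([[3, 0.99, 1], [2, 0.98, 54], [10, .85, 4], [1, 0.7, 10],
                [12, 0.69, 31], [12, 0.65, 43], [10, 0.50, 24]])
print(first_below(arr[:, 1], [0.97, 0.90, 0.83, 0.60, 0.40]))  # [2, 2, 3, 6, 7]
```

The search is vectorized over all thresholds at once, so no Python-level loop or `if` is needed.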


Try the following:

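my_list = [[3,0.99,1], [2,0.98,54], [10,.85,4], [1,0.7,10], [12,0.69,31], [12,0.65,43], [1,.56,0]]
threshold = [0.97, 0.90, 0.83, 0.6]

index_list = []
ti = 0  # position of the next threshold still waiting for an index
for i, item in enumerate(my_list):
    # a single row may be the first one below several consecutive thresholds
    while ti < len(threshold) and item[1] < threshold[ti]:
        index_list.append(i)
        ti += 1
    if ti == len(threshold):  # every threshold resolved; stop scanning
        break
# thresholds that no row falls below map to len(my_list)
index_list.extend([len(my_list)] * (len(threshold) - ti))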

- Andrew Clark

You could use itertools.dropwhile and itertools.cycle. - Joel Cornett
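A minimal sketch of the `itertools.dropwhile` part of that suggestion, assuming the goal is the same first-index-below-threshold result with `len(rows)` as the fallback. The comment does not spell out how `itertools.cycle` would fit in, so it is not used here, and the function name is hypothetical. Each threshold restarts the scan from the beginning, so this is O(n·m).

```python
from itertools import dropwhile


def first_below_dropwhile(rows, thresholds):
    n = len(rows)
    result = []
    for t in thresholds:
        # drop (index, row) pairs while the middle value is still >= t
        rest = dropwhile(lambda pair: pair[1][1] >= t, enumerate(rows))
        # the first surviving pair holds the answer; default to n if none remain
        result.append(next(rest, (n, None))[0])
    return result


my_list = [[3, 0.99, 1], [2, 0.98, 54], [10, .85, 4], [1, 0.7, 10],
           [12, 0.69, 31], [12, 0.65, 43], [1, 0.56, 0]]
print(first_below_dropwhile(my_list, [0.97, 0.90, 0.83, 0.6]))  # [2, 2, 3, 6]
```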


I think you should extract the keys and reverse them. Then bisect works.

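from bisect import bisect_left

keys = [vals[1] for vals in my_list]
keys.reverse()  # now ascending, as bisect requires
mylen = len(my_list)
# the number of keys >= t equals the index of the first element < t in the original order
[mylen-bisect_left(keys,t) for t in threshold]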

If you have numpy installed:

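import numpy as np

my_array = np.array([[3,0.99,1], [2,0.98,54], [10,.85,4], [1,0.7,10], [12,0.69,31], [12,0.65,43], [10,0.50, 24]])
thresholds = [0.97, 0.90, 0.83, 0.60]

# reverse the middle column so it is ascending, then count the values >= each threshold
my_array.shape[0] - my_array[::-1, 1].searchsorted(thresholds)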

- Phil Cooper


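import bisect
keys = sorted(z[1] for z in my_list)  # middle values, ascending
index_list = []
for x in threshold:
    # bisect_left, not bisect: an element equal to x must not count as "below" x
    index_list.append(len(my_list) - bisect.bisect_left(keys, x))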

- iruvar
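The choice between `bisect_left` and `bisect` (an alias of `bisect_right`) only matters when a middle value equals a threshold. The short check below uses the middle values of the example, reversed into ascending order, and a threshold of 0.85 chosen to create such a tie; `len(keys) - bisect_left(...)` gives the first index whose value is strictly below the threshold, while `bisect` lands on the equal element itself.

```python
from bisect import bisect, bisect_left

# middle values of the example list, reversed into ascending order
keys = [0.56, 0.65, 0.69, 0.7, 0.85, 0.98, 0.99]
n = len(keys)
t = 0.85  # a threshold equal to one of the values

print(n - bisect_left(keys, t))  # 3: first index (descending order) with value < 0.85
print(n - bisect(keys, t))       # 2: the index of 0.85 itself, i.e. first value <= 0.85
```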

Page content provided by Stack Overflow.

- xvatar · Accepted Answer

Building on @gnibbler's excellent answer, I modified his code slightly so that it works for your case.

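The optimization is that, because your thresholds are also sorted, we do not need to search the whole list each time; instead, each search starts from the index returned by the previous one.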

def reverse_binary_search(a, x, lo=0, hi=None):
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi: 
        mid = (lo+hi)//2  # integer division (Python 3)
        if x > a[mid][1]:  # compare against the middle element of each row
            hi = mid 
        else:
            lo = mid+1
    return lo

my_list = [[3,0.99,1], [2,0.98,54], [10,.85,4], [1,0.7,10], [12,0.69,31], [12,0.65,43], [1,0.56,0]]
threshold = [0.97, 0.90, 0.83, 0.6]

index_list = []
last_index = 0
for t in threshold:
    last_index = reverse_binary_search(my_list, t, last_index) # next time start search from last_index
    index_list.append(last_index)

Thanks to @PhilCooper for the valuable suggestion. Here is the code using the generator he suggested:

def reverse_binary_search(a, threshold):
    lo = 0
    for t in threshold:
        # lo carries over from the previous threshold, which is valid because thresholds are descending
        hi = len(a)
        while lo < hi:
            mid = (lo+hi)//2
            if t > a[mid][1]:
                hi = mid 
            else:
                lo = mid+1
        yield lo

my_list = [[3,0.99,1], [2,0.98,54], [10,.85,4], [1,0.7,10], [12,0.69,31], [12,0.65,43], [1,0.56,0]]
threshold = [0.97, 0.90, 0.83, 0.6]

index_list = list(reverse_binary_search(my_list, threshold))