Python - Split a list into sublists based on another list

Question

5

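I have two lists: l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9] and l2 = [0.5, 1.0, 1.5, 2.0]. I want to split l1 into sublists, each holding the elements that fall between two consecutive entries of l2. For example, l1 would become [[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]].

Here is my solution: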

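l3 = []
b = 0  # lower bound of the current bucket
for i in l2:  # i is the upper bound of the current bucket
    temp = []
    for p in l1:
        if b <= p < i:
            temp.append(p)
    l3.append(temp)
    b += 0.5  # advance the lower bound by the bucket width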

This solution is a huge bottleneck in my code. Is there a faster way to do this?

- jaydh

So these are "buckets". It's a histogram! - Peter Wood

1

@PeterWood Or a hash map! Or an interval tree! So many possibilities! - Maciej Gol

4 Answers

3

As a fast approach, you can use numpy, which is the most efficient option for large lists:

>>> import numpy as np
>>> np.split(l1,np.searchsorted(l1,l2))
[array([ 0.   ,  0.002,  0.3  ]), array([ 0.5,  0.6,  0.9]), array([ 1.3]), array([ 1.9]), array([], dtype=float64)]

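np.searchsorted finds the indices at which the items of l2 would be inserted into l1 while preserving its sorted order (using the default side), and np.split splits the list at a given list of indices.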
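To make those two calls more concrete, here is a rough pure-Python sketch of the same idea, assuming l1 is sorted. It uses bisect.bisect_left from the standard library as a stand-in for np.searchsorted and plain slicing as a stand-in for np.split. The name split_at_boundaries is made up for illustration and is not part of numpy.

```python
from bisect import bisect_left


def split_at_boundaries(values, boundaries):
    """Split sorted `values` at the insertion points of `boundaries`."""
    # Stand-in for np.searchsorted(values, boundaries): for each boundary,
    # the index of the first element that is >= that boundary.
    cuts = [bisect_left(values, b) for b in boundaries]
    # Stand-in for np.split(values, cuts): slice between consecutive cut
    # points, including the (possibly empty) tail after the last cut.
    edges = [0] + cuts + [len(values)]
    pieces = [values[lo:hi] for lo, hi in zip(edges, edges[1:])]
    return cuts, pieces


l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]
l2 = [0.5, 1.0, 1.5, 2.0]
cuts, pieces = split_at_boundaries(l1, l2)
print(cuts)    # [3, 6, 7, 8]
print(pieces)  # [[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9], []]
```

The trailing empty piece corresponds to the empty array at the end of the numpy output: the last cut index equals len(l1), so nothing is left after it.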

Here is a benchmark against the accepted answer on a list 1000 times larger:

from timeit import timeit

s1="""

def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist

l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000
l2 = [0.5, 1.0, 1.5, 2.0]
list(partition(l1, l2))

"""

s2="""
l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000
l2 = [0.5, 1.0, 1.5, 2.0]
np.split(l1,np.searchsorted(l1,l2))
   """

print('1st: ', timeit(stmt=s1, number=10000))
print('2nd : ', timeit(stmt=s2, number=10000, setup="import numpy as np"))

Results:

1st:  17.5872459412
2nd :  10.3306460381

- Mazdak

1

You should move the creation of l1 and l2 (and the definition of partition()) out of the timeit test. - Martijn Pieters

I get 1.43 vs. 1.1 with 1000 repetitions. That's not bad for a pure-Python implementation. - Martijn Pieters

1

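Ah, there is a problem. You can't simply multiply l1 by 1000, because my solution requires l1 to be sorted. And if you do sort the list properly, that doesn't help my case, because it makes my solution slightly slower since it then produces more results. - Martijn Pieters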
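The two points raised in these comments (keep the definitions out of the timed statement, and sort the enlarged list) could be combined roughly as follows. This is only a sketch of one way to arrange the benchmark, not the setup any of the posters actually ran; the repetition count of 100 is an arbitrary choice to keep it quick.

```python
from timeit import timeit

# Everything in `setup` runs once before timing starts and is not measured.
setup = """
def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist

# Sort after repeating, since partition() expects sorted input.
l1 = sorted([0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9] * 1000)
l2 = [0.5, 1.0, 1.5, 2.0]
"""

# Only this statement is timed, repeated `number` times.
elapsed = timeit(stmt="list(partition(l1, l2))", setup=setup, number=100)
print("partition:", elapsed)
```

The printed figure is the total time in seconds for all repetitions and will differ from machine to machine.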

Answered my own question: my solution is faster using pure Python. - Padraic Cunningham

1

@PadraicCunningham 10000000000000000 times larger! :-D - Mazdak

1

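def split_l(a, b):
    # Assumes a is sorted, all elements are below the last boundary in b,
    # and consecutive elements never jump past more than one boundary.
    it = iter(b)
    start, sub = next(it), []  # current upper boundary and current sublist
    for ele in a:
        if ele >= start:
            # ele belongs to the next bucket: emit the finished one
            yield sub
            sub, start = [], next(it)
        sub.append(ele)
    yield sub  # emit the last bucket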

print(list(split_l(l1,l2)))
[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]

Using Kasra's input, this beats both the accepted answer and the numpy solution:

In [14]: l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000

In [15]: l1.sort()

In [16]: l2 = [0.5, 1.0, 1.5, 2.0]

In [17]: timeit list(partition(l1,l2))
1000 loops, best of 3: 1.53 ms per loop

In [18]: timeit list(split_l(l1,l2))
1000 loops, best of 3: 703 µs per loop

In [19]: timeit np.split(l1,np.searchsorted(l1,l2))
1000 loops, best of 3: 802 µs per loop

In [20]: list(split_l(l1,l2))  == list(partition(l1,l2))
Out[20]: True

Creating a local reference to append shaves off even more:

def split_l(a, b):
    it = iter(b)
    start, sub = next(it), []
    append = sub.append
    for ele in a:
        if start <= ele:
            yield sub
            start, sub = next(it), []
            append = sub.append
        append(ele)
    yield sub

This runs in a little over half the time of the numpy solution:

In [47]: l1.sort()

In [48]: timeit list(split_l(l1,l2))
1000 loops, best of 3: 498 µs per loop

In [49]: timeit list(partition(l1,l2))
1000 loops, best of 3: 1.73 ms per loop

In [50]: timeit np.split(l1,np.searchsorted(l1,l2))
1000 loops, best of 3: 812 µs per loop

- Padraic Cunningham

0

l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]
l2 = [0.5, 1.0, 1.5, 2.0]

def partition(values, indices):
    temp = []
    p_list = []

    for j in range(len(indices)):
        for i in range(len(values)):
            if indices[j] > values[i]:
                temp.append(values[i])

        p_list.append(temp)

        # drop the values just added to the partition from the front of the list
        values = values[len(temp):]

        temp = []

    print(p_list)

partition(l1, l2)

[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]

- LetzerWille

This is just as bad as the OP's solution; you are using a quadratic O(N*M) algorithm here. - Martijn Pieters

- Martijn Pieters · Accepted Answer

Your lists are already sorted, so there is no need for a double loop here.

The following generates the sublists from the two input lists:

def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist

You can iterate over partition(l1, l2) to get the individual sublists, or call list() to produce the whole list in one go:

>>> l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9] 
>>> l2 = [0.5, 1.0, 1.5, 2.0]
>>> list(partition(l1, l2))
[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]
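As an illustration of the first option, iterating instead of calling list(), the sketch below consumes the generator one sublist at a time and records only each sublist's size, in the spirit of the histogram remark in the comments. The variable name sizes is just an illustrative choice.

```python
def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist


l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]
l2 = [0.5, 1.0, 1.5, 2.0]

# Each sublist is produced on demand and can be discarded after use.
sizes = []
for bucket in partition(l1, l2):
    sizes.append(len(bucket))

print(sizes)  # [3, 3, 1, 1]
```

Because partition() skips empty sublists, the sizes line up one-to-one with the entries of l2 only when every bucket contains at least one value, as it does here.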