How to loop through a Python list in batches?

Question

21

A file contains 10000 lines, with one entry per line. I need to process the file in batches (small chunks).

file = open("data.txt", "r")
data = file.readlines()
file.close()

total_count = len(data) # equals to ~10000 or less
max_batch = 50 # loop through 'data' with 50 entries at max in each loop.

for i in range(total_count):
     batch = data[i:i+50] # first 50 entries
     result = process_data(batch) # some time consuming processing on 50 entries
     if result == True:
           # add to DB that 50 entries are processed successfully!
     else:
           return 0 # quit the operation
           # later start again from the point it failed.
           # say 51st or 2560th or 9950th entry

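How do I select entries 51 to 100 in the next loop iteration, and so on?

If the operation fails and stops midway, the loop needs to restart only from the failed batch, based on the DB records.

I can't work out the right logic. Should I keep two lists? Or something else?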

- uwy59998

3

range(0, total_count, 50) - Shane

Your slice needs to be [i*50:(i+1)*50]. Also, why wait for one batch to finish? You could run process_data in a thread; see https://www.tutorialspoint.com/python/python_multithreading.htm - serup

5 Answers

18

You are pretty close.

chunks = (total_count - 1) // 50 + 1
for i in range(chunks):
     batch = data[i*50:(i+1)*50]

- wvdz

3

This solution seems to be wrong if you don't want an empty batch when "total_count % 50 == 0". In that case the number of chunks should be "(total_count - 1) // 50 + 1". - AndersTornkvist

@AndersTornkvist: Good catch! Edited. - wvdz
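
A small check of the chunk-count arithmetic discussed in these comments may help. This is only a sketch: `count_chunks` and `split_by_index` are hypothetical helper names introduced for illustration, not part of the answer.

```python
def count_chunks(total_count, batch_size):
    """Ceiling division: batches needed to cover total_count items."""
    return (total_count - 1) // batch_size + 1


def split_by_index(data, batch_size):
    """Slice data using the index arithmetic from the answer above."""
    return [data[i * batch_size:(i + 1) * batch_size]
            for i in range(count_chunks(len(data), batch_size))]


# Try sizes just below, exactly at, and just above a multiple of 50.
for total in (99, 100, 101):
    batches = split_by_index(list(range(total)), 50)
    print(total, [len(b) for b in batches])
```

For 100 entries this gives two batches of 50, whereas a formula such as `total_count // 50 + 1` would give three, the last of them empty.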

5

def chunk_list(datas, chunksize):
    """Split a list into chunks.

    Params:
        datas     (list): the data to split into chunks
        chunksize (int) : maximum number of items in each chunk

    Returns:
        chunks (generator): yields successive slices of the list
    """

    for i in range(0, len(datas), chunksize):
        yield datas[i:i + chunksize]

Reference: https://www.codegrepper.com/code-examples/python/python+function+to+split+lists+into+batches

- Galuh Ramaditya
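
As a usage sketch for the generator above: pairing it with `enumerate` gives the batch number, from which the index of the first entry in each batch can be derived (useful for recording progress). The `data` list here is made-up stand-in data, not the asker's file.

```python
def chunk_list(datas, chunksize):
    """Yield successive slices of datas, each at most chunksize long."""
    for i in range(0, len(datas), chunksize):
        yield datas[i:i + chunksize]


# Stand-in for the lines read from the file (assumption: 120 entries).
data = [f"entry-{n}" for n in range(1, 121)]

sizes = []
for batch_number, batch in enumerate(chunk_list(data, 50)):
    first_index = batch_number * 50  # index in data of this batch's first entry
    last_index = first_index + len(batch) - 1
    sizes.append(len(batch))
    print(f"batch {batch_number}: entries {first_index}..{last_index}")
```

The last batch is simply shorter when the list length is not a multiple of the chunk size.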

2

I really like using funcy. This function splits your list into chunks for you: https://funcy.readthedocs.io/en/stable/seqs.html#chunks

- Dan Fuller

1

With Python 3.12, you can use the `itertools.batched` function (documentation):

import itertools

for batch in itertools.batched(data, 50):
    result = process_data(batch)  # some time-consuming processing on up to 50 entries
    if result:
        pass  # add to DB that these entries were processed successfully!
    else:
        break  # quit the operation

- ndclt
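
On interpreters older than 3.12, a fallback along the lines of the rough equivalent shown in the itertools documentation could be used. This is a sketch, not the standard-library implementation; like `itertools.batched`, it yields tuples rather than lists.

```python
from itertools import islice


def batched(iterable, n):
    """Yield tuples of up to n items taken from iterable, in order."""
    if n < 1:
        raise ValueError("n must be at least one")
    iterator = iter(iterable)
    # islice pulls the next n items; an empty tuple means the input is exhausted.
    while batch := tuple(islice(iterator, n)):
        yield batch


print(list(batched(range(7), 3)))
```

Because it consumes an iterator, this version also works on a file object directly, without first reading every line into a list.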

- talsegal · Accepted Answer

l = [1,2,3,4,5,6,7,8,9,10]
batch_size = 3    

for i in range(0, len(l), batch_size):
    print(l[i:i+batch_size])
    # more logic here

Output:

[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
[10]

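I think this is the most straightforward and readable approach. If you need to retry a certain batch, you can retry inside the loop (serially), or you can open a thread per batch; it depends on the application...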
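
The serial retry, combined with the question's requirement to restart from the failed batch, might look like the sketch below. The names `process_in_batches`, `start`, `max_attempts` and `flaky` are hypothetical, and the returned index stands in for whatever would actually be stored in the DB.

```python
def process_in_batches(data, batch_size, process_data, start=0, max_attempts=3):
    """Process data[start:] in batches and return the index to resume from.

    Returns len(data) when every batch succeeded, otherwise the start index
    of the first batch that still failed after max_attempts tries.
    """
    for i in range(start, len(data), batch_size):
        batch = data[i:i + batch_size]
        for _attempt in range(max_attempts):
            if process_data(batch):
                break  # success; this is where progress would be recorded
        else:
            return i  # every attempt failed: resume later from this index
    return len(data)


calls = []


def flaky(batch):
    """Hypothetical stand-in for process_data: the batch starting with 7 fails."""
    calls.append(batch)
    return batch[0] != 7


items = list(range(1, 11))
resume_at = process_in_batches(items, 3, flaky)
print(resume_at)  # index of the first entry in the failed batch

# A later run picks up from the saved index instead of starting over.
finished_at = process_in_batches(items, 3, lambda batch: True, start=resume_at)
print(finished_at)
```

Storing a single resume index avoids keeping two lists: everything before the index is known to be processed.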