使用openpyxl中的优化读取器和ws.iter_rows迭代一系列行。

11

我需要读取一个10x5324单元格的xlsx文件

这是我尝试做的要点:

from openpyxl import load_workbook
filename = 'file_path'

wb = load_workbook(filename)
ws = wb.get_sheet_by_name('LOG')

col = {'Time':0 ...}

for i in ws.columns[col['Time']][1:]:
    print i.value.hour
代码运行时间比预期要长得多(我正在执行操作,而不是打印),一段时间后我变得不耐烦并取消了它。 有什么建议可以在优化读取器中解决这个问题吗? 我需要迭代一系列的行,而不是遍历所有的行。 这是我尝试过的,但是它是错误的:
wb = load_workbook(filename, use_iterators = True)
ws = wb.get_sheet_by_name('LOG')
for i in ws.iter_rows[1:]:
    print i[col['Time']].value.hour

有没有不使用range函数的方法可以实现它?

我猜一个方法是:

for i in ws.iter_rows[1:]:
    if i.row == startrow:
        continue
    print i[col['Time']].value.hour
    if i.row == endrow:
        break

但有没有更优雅的解决方案呢? (顺便说一下,那个也不起作用)

2个回答

22

最简单的带下限的解决方案可能是这样的:

# Your code:
from openpyxl import load_workbook
filename = 'file_path'
wb = load_workbook(filename, use_iterators=True)
ws = wb.get_sheet_by_name('LOG')

# Solution 1:
for row in ws.iter_rows(row_offset=1):
    # code to execute per row...

以下是另一种执行您所描述操作的方式,使用enumerate函数:

# Solution 2:
start, stop = 1, 100    # This will allow you to set a lower and upper limit
for index, row in enumerate(ws.iter_rows()):
    if start < index < stop:
        # code to execute per row...

索引变量保持计算当前行数,因此可以将其用于range或xrange的位置。这种方法非常直接,与range或切片不同,可以与迭代器一起使用,而且如果需要,也可以仅使用下限。干杯!


1
可以在解决方案2中添加一个else语句来跳出for循环,一旦超过stop行,就停止迭代所有行。 - Craig Waddington

5

文档中得知:

Note: When a worksheet is created in memory, it contains no cells. They are created when first accessed. This way we don’t create objects that would never be accessed, thus reducing the memory footprint.

Warning: Because of this feature, scrolling through cells instead of accessing them directly will create them all in memory, even if you don’t assign them a value. Something like

>>> for i in xrange(0,100):
...             for j in xrange(0,100):
...                     ws.cell(row = i, column = j)

will create 100x100 cells in memory, for nothing.

However, there is a way to clean all those unwanted cells, we’ll see that later.

我认为访问列或行属性会导致许多单元格被加载到内存中。我建议只尝试直接访问您需要的单元格。
例如:
col_name = 'A'
start_row = 1
end_row = 99

range_expr = "{col}{start_row}:{col}{end_row}".format(
    col=col_name, start_row=start_row, end_row=end_row)

for (time_cell,) in ws.iter_rows(range_string=range_expr):
    print time_cell.value.hour

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接