Manipulating an existing Excel table with openpyxl

Question


python openpyxl

11

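I'm currently honing my Python/Excel skills and have run into an issue with openpyxl.

I'm trying to open a workbook, replace the rows in an existing table, and save the workbook again.

Ideally I'd also like to be able to first delete all rows from the table (while keeping the table structure).

My initial workbook contains a sheet named "inputData". In it I have a table named "Data" with columns A, B, C and 2 rows of data.

I also have a CSV file named "input.csv" containing the same three columns but 4 rows of data.

When I run my code, the data is written to the worksheet, but the table structure is not expanded to include the two new rows of data.

Is there any way to change the data source of a named table structure using openpyxl?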

import csv
from openpyxl import load_workbook
from openpyxl.worksheet.table import Table, TableStyleInfo

wb = load_workbook(filename = 'workbook.xlsx')
ws = wb["inputData"]

with open('input.csv', newline='', encoding='utf-8-sig') as f:
    reader = csv.reader(f, delimiter=';')
    for i, row in enumerate(reader):
        if i != 0:  # skip the CSV header row
            for j, cell in enumerate(row): 
                ws.cell(row=i+1, column=j+1).value = cell

wb.save('output.xlsx')

- Henrik Poulsen
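
For the "delete all rows but keep the table structure" part of the question, one possible approach is to blank the data cells and shrink the table's ref to the header plus a single empty row. What follows is only a sketch under stated assumptions: clear_table_rows is a hypothetical helper, the workbook is built in memory as a stand-in for workbook.xlsx, the table has exactly one header row, and the installed openpyxl exposes ws.tables. One empty data row is kept on the assumption that Excel expects a table to have at least one.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter
from openpyxl.utils.cell import range_boundaries
from openpyxl.worksheet.table import Table


def clear_table_rows(ws, table_name):
    """Blank every data row of a table and shrink it to header + one empty row."""
    table = ws.tables[table_name]
    min_col, min_row, max_col, max_row = range_boundaries(table.ref)

    # Assumption: a single header row at min_row; data starts on the next row.
    for row in ws.iter_rows(min_row=min_row + 1, max_row=max_row,
                            min_col=min_col, max_col=max_col):
        for cell in row:
            cell.value = None

    # Keep one (now empty) data row inside the table range.
    table.ref = (f"{get_column_letter(min_col)}{min_row}:"
                 f"{get_column_letter(max_col)}{min_row + 1}")
    return table.ref


# Hypothetical in-memory stand-in for workbook.xlsx
wb = Workbook()
ws = wb.active
ws.title = "inputData"
for values in (["A", "B", "C"], [1, 2, 3], [4, 5, 6]):
    ws.append(values)
ws.add_table(Table(displayName="Data", ref="A1:C3"))

cleared_ref = clear_table_rows(ws, "Data")
print(cleared_ref)
```

The header cells are left untouched, so the column names of the table survive the clearing step.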

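Can you explain better what you are trying to do? What do you mean by table structure/data source? Do you mean "ws"? Are you trying to delete a row or add one? The code seems to just change cells - is that what you want? - kabanus

In Excel you can either enter data directly into a worksheet or create a table structure (https://support.office.com/en-us/article/overview-of-excel-tables-7ab0bb7d-3a9e-4b56-a3c9-6c94334e492c). I want to use Python to manipulate an existing table, not just add rows to a flat worksheet. - Henrik Poulsen

You'll need to change the table's structure yourself. This should be possible but is not documented. - Charlie Clark

So you want to add rows to "ws" (for example) and then save the changed table back to the original document? - kabanus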

4 Answers

6

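I ran into this problem in 2020 with openpyxl==3.0.5 and would like to share my solution so that others can benefit from it too.

Goal: read new data from data.csv and add it to the existing file.xlsx so that the formulas still work. Column names stay the same.

Input:

An XLSX template file (file.xlsx) with formulas on one sheet and data on another
A data.csv file with the new data

Output: an XLSX file with the new data and an updated table range that the formulas use.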

# Imports
import openpyxl
import pandas

# Load the input workbook
wb = openpyxl.load_workbook(filename='file.xlsx')

# Select the worksheet named 'data'
ws = wb['data']

# Read the new data from data.csv into a pandas DataFrame
new_dataframe = pandas.read_csv("data.csv")

# Iterate over the DataFrame rows and write the values to the worksheet
for i, row in new_dataframe.iterrows():
    # ws.append leaves the first data row empty, so the first
    # DataFrame row is written cell by cell into worksheet row 2
    if i == 0:
        for c, value in enumerate(row, start=1):
            ws.cell(row=2, column=c).value = value
    else:
        current_row = [row.col1, row.col2, row.col3]
        ws.append(current_row)

# Change the table range (hard-coded here; it must cover the data written above)
ws.tables['MyTable'].ref = "A1:E5"

# Save the workbook
wb.save(filename='file.xlsx')

In answer to Ethan's question, here is how to change only the range:

# Find right table
my_table = ws.tables['Template']
# Change range
my_table.ref = "A7:{}{}".format(colnum_string(maxRef[1]), maxRef[0])
# change style
my_table.tableStyleInfo = my_style
wb.save('WorkbookName')

- Varje
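
Rather than hard-coding a range such as "A1:E5", the new ref can be derived from the worksheet itself. The sketch below is one way this might look, not the answer author's code: resize_table_to_data is a hypothetical helper, the workbook is created in memory instead of loading file.xlsx, and it assumes nothing else sits below the table on the sheet (so ws.max_row is the table's last row). It also assumes that a table loaded from a file may carry its own filter range, which is kept in step with the table when present.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter
from openpyxl.utils.cell import range_boundaries
from openpyxl.worksheet.table import Table


def resize_table_to_data(ws, table_name):
    """Stretch a table's ref down to the worksheet's last used row."""
    table = ws.tables[table_name]
    min_col, min_row, max_col, _ = range_boundaries(table.ref)
    new_ref = (f"{get_column_letter(min_col)}{min_row}:"
               f"{get_column_letter(max_col)}{ws.max_row}")
    table.ref = new_ref

    # Assumption: keep an existing filter range aligned with the table.
    if table.autoFilter is not None:
        table.autoFilter.ref = new_ref
    return new_ref


# Hypothetical in-memory stand-in for file.xlsx
wb = Workbook()
ws = wb.active
ws.title = "data"
ws.append(["col1", "col2", "col3"])
ws.append([1, 2, 3])
ws.add_table(Table(displayName="MyTable", ref="A1:C2"))

# New rows arrive below the existing table
ws.append([4, 5, 6])
ws.append([7, 8, 9])

new_ref = resize_table_to_data(ws, "MyTable")
print(new_ref)
```

Only the bottom row of the range changes; the top-left corner and the column span are taken from the table's current ref.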

1

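First of all, thanks for your post. I am trying to extend an existing table in an existing Excel file (a copy of a template). I have not managed to extend the table to the range where the data is actually placed (note: some table elements contain formulas that need to be preserved).

What I do is open the Excel file and copy and paste the data into the right sheet and the right cells. That works as expected. What does not work is extending the range of the table, which initially covers only the first row (besides the header).

Using the code above, I am able to identify the table and I tried to copy the style: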

for i, table in enumerate(ws._tables):
    if table.name == 'Template':
        tableRef = i
        mystyle = table.tableStyleInfo
resTable = Table(displayName="Template", ref="A7:{}{}".format(colnum_string(maxRef[1]), maxRef[0]))
resTable.tableStyleInfo = mystyle
ws._tables[tableRef] = resTable

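Since it does not work, I am probably overlooking something. The table is not extended. Any help would be much appreciated.

For a better understanding of the problem:
Table header: A7:BA7
First row (empty, with some elements carrying formulas and formatting): A8:BA8
Final data range after copying the data (the calculations, for example, are correct): A8:BA255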

- Ethan
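
The ranges listed above imply that the table's ref has to start at the header row and end at the last data row, i.e. A7:BA255 rather than A8:BA255. As a small worked example of that arithmetic, the sketch below builds the ref from the header row, the first column, the number of columns, and the number of data rows. table_ref is a hypothetical helper name; colnum_string follows the helper of the same name used elsewhere on this page.

```python
def colnum_string(n):
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    string = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        string = chr(65 + remainder) + string
    return string


def table_ref(header_row, first_col, n_cols, n_data_rows):
    """Build a table ref that spans the header row plus all data rows."""
    last_col = first_col + n_cols - 1
    last_row = header_row + n_data_rows
    return f"{colnum_string(first_col)}{header_row}:{colnum_string(last_col)}{last_row}"


# Header in row 7, columns A..BA (53 columns), data in rows 8..255 (248 rows)
ref = table_ref(header_row=7, first_col=1, n_cols=53, n_data_rows=248)
print(ref)
```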

0

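Here is a small function I wrote to quickly add data to named tables and cells.

It expects the tables to be empty, and is meant to be used after you have created and saved an Excel template report, to then populate it with data from Python.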

import pathlib
from typing import Any, Optional, Union
import pandas
import openpyxl
from openpyxl.utils import column_index_from_string

def find_worksheet_and_table(workbook, table_name):
    
    for worksheet in workbook.worksheets:
        for table in worksheet.tables.values():
            if table.name == table_name:
                return worksheet, table
    
    raise KeyError(f'Unable to find table with name "{table_name}" in workbook!')
            
def populate_table(workbook, table_name, dataframe):
    
    """Modifies table and inserts values. Formulas are kept as is."""
    
    worksheet, table = find_worksheet_and_table(workbook, table_name)
    
    table_start, table_end = table.ref.split(':')
    table_start_row = int(''.join(x for x in table_start if x.isdigit()))
    table_start_column = column_index_from_string(''.join(x for x in table_start if not x.isdigit()))
    table_end_column_letter = ''.join(x for x in table_end if not x.isdigit())
    new_table_ref = table_start + ':' + table_end_column_letter + str(table_start_row + len(dataframe))
    table_first_cell = worksheet.cell(row=table_start_row, column=table_start_column)

    table.ref = new_table_ref

    for col_ix, column in enumerate(table.tableColumns):
        if column.name in dataframe.columns:
            for row_ix, row_value in enumerate(dataframe[column.name]):
                cell_to_fill = table_first_cell.offset(column=col_ix, row=row_ix+1)
                cell_to_fill.value = row_value
        elif column.calculatedColumnFormula:
            for ix in range(len(dataframe)):
                cell_to_fill = table_first_cell.offset(column=col_ix, row=ix+1)
                cell_to_fill.value = '='+table.tableColumns[col_ix].calculatedColumnFormula.attr_text


def populate_excel(
    input_path: Union[str, pathlib.Path],
    output_path: Union[str, pathlib.Path],
    named_cell_using_single_values: Optional[dict[str, Any]],
    table_using_dataframe : Optional[dict[str, pandas.DataFrame]],
):
    
    """
    input_path: Excel file to read.
    output_path: Save location for output Excel file.
    named_cell_using_single_values: Dict of named cells with value to populate them with.
    table_using_dataframe: Dict of named table with dataframes to populate them with.
    
    Function to insert data into Excel using named cells and named tables.
    
    """
    
    workbook = openpyxl.load_workbook(input_path)
    
    if table_using_dataframe:
        for table_name, dataframe in table_using_dataframe.items():
            populate_table(workbook, table_name, dataframe)
            
    if named_cell_using_single_values:
        
        for named_cell, value in named_cell_using_single_values.items():
            if named_cell not in workbook.defined_names:
                raise KeyError(f"Unable to find named cell {named_cell} in workbook!")
        
            worksheet_to_update, coordinate_to_update = next(workbook.defined_names[named_cell].destinations)
            cell_to_update = workbook[worksheet_to_update][coordinate_to_update]
            cell_to_update.value = value
        
    workbook.save(output_path)
    workbook.close()
        
    
        

dataframe_to_write = pandas.DataFrame(data = {'Column1': range(500), 'Column2': range(500), 'Column3': range(500)})

populate_excel(
    'input.xlsx', 
    'output.xlsx', 
    {'my_named_cell': 'value for named cell!', 'my_named_cell2': 20},
    {'MyTable': dataframe_to_write}
)

- hirolau
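
The function above splits the table ref by filtering digits and letters by hand. As a possible alternative, openpyxl ships helpers for the same job; the sketch below uses coordinate_from_string and column_index_from_string. split_ref is a hypothetical name introduced for illustration, and the sample ref is taken from the ranges mentioned in an earlier answer.

```python
from openpyxl.utils.cell import column_index_from_string, coordinate_from_string


def split_ref(ref):
    """Split 'A7:BA8' into ((start_col_index, start_row), (end_col_letter, end_row))."""
    start, end = ref.split(":")
    start_col, start_row = coordinate_from_string(start)
    end_col, end_row = coordinate_from_string(end)
    return (column_index_from_string(start_col), start_row), (end_col, end_row)


(start_col_index, start_row), (end_col_letter, end_row) = split_ref("A7:BA8")
print(start_col_index, start_row, end_col_letter, end_row)
```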


- Henrik Poulsen · Accepted Answer

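I found the answer to my question.

I can access the table from openpyxl, change the ref (range), and then save it back again.

This lets me enter more data into the same table and have formulas on my other worksheets take the new data into account.

This will be a very useful feature when I need to push a lot of data into an existing Excel sheet without breaking references in the future.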

import csv
from openpyxl import load_workbook
from openpyxl.worksheet.table import Table, TableStyleInfo
tableName = 'Data'

style = TableStyleInfo(name="TableStyleMedium9", showFirstColumn=False,
                       showLastColumn=False, showRowStripes=True, showColumnStripes=False)

def colnum_string(n):
    string = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        string = chr(65 + remainder) + string
    return string

wb = load_workbook(filename = 'workbook.xlsx')
ws = wb["inputData"]

with open('input.csv', newline='', encoding='utf-8-sig') as f:
    reader = csv.reader(f, delimiter=';')
    for i, row in enumerate(reader):
        for j, cell in enumerate(row): 
            if not i == 0:
                ws.cell(row=i+1, column=j+1).value = float(cell)
            else:
                ws.cell(row=i+1, column=j+1).value = cell

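            # Track the last cell written as 1-based [row, column]
            maxRef = [i + 1, j + 1]

# Locate the existing table by position; this assumes ws._tables behaves
# like a list, which may not hold in newer openpyxl releases
for i, table in enumerate(ws._tables):
    if table.name == tableName:
        tableRef = i

# The ref takes the column letter first, then the row number
resTable = Table(displayName=tableName, ref="A1:{}{}".format(colnum_string(maxRef[1]), maxRef[0]))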
resTable.tableStyleInfo = style

ws._tables[tableRef] = resTable

wb.save('output.xlsx')
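
As a variation on the code above, the existing table can also be updated in place through ws.tables instead of being replaced via the private ws._tables attribute. The sketch below is an illustration under assumptions rather than a drop-in replacement: the workbook and the CSV content are hypothetical in-memory stand-ins for workbook.xlsx and input.csv, and it assumes an openpyxl version that provides ws.tables. It saves to a buffer and reloads to check that the new range persists.

```python
import csv
import io

from openpyxl import Workbook, load_workbook
from openpyxl.utils import get_column_letter
from openpyxl.worksheet.table import Table

# Hypothetical stand-in for workbook.xlsx: table "Data" with 2 data rows
wb = Workbook()
ws = wb.active
ws.title = "inputData"
for values in (["A", "B", "C"], [1, 2, 3], [4, 5, 6]):
    ws.append(values)
ws.add_table(Table(displayName="Data", ref="A1:C3"))

# Hypothetical stand-in for input.csv: same columns, 4 data rows
csv_text = "A;B;C\n1;2;3\n4;5;6\n7;8;9\n10;11;12\n"

n_rows = n_cols = 0
for i, row in enumerate(csv.reader(io.StringIO(csv_text), delimiter=";")):
    for j, cell in enumerate(row):
        # The header row stays text; data rows are converted to numbers
        ws.cell(row=i + 1, column=j + 1).value = cell if i == 0 else float(cell)
    n_rows, n_cols = i + 1, len(row)

# Update the existing table in place rather than replacing the object
ws.tables["Data"].ref = f"A1:{get_column_letter(n_cols)}{n_rows}"

# Round-trip through memory to check that the new range is saved
buffer = io.BytesIO()
wb.save(buffer)
buffer.seek(0)
reloaded_ref = load_workbook(buffer)["inputData"].tables["Data"].ref
print(reloaded_ref)
```

Editing the ref in place keeps the table's existing style and column definitions, so no new TableStyleInfo has to be constructed.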