如何在使用Python的OpenPyXL包写入数据后保持样式格式不变?

9
我正在使用openpyxl库包来读写现有的Excel文件test.xlsx中的一些数据。
在向其写入一些数据之前,文件的内容如下所示:

enter image description here

  • A1单元格包含高棉Unicode字符,英文字符以粗体显示。

  • A3单元格使用了Lemons1字体,英文字符以斜体显示。

我使用以下脚本将数据“这是我”写入该Excel文件的B2单元格:
from openpyxl import load_workbook
import os
FILENAME1 = os.path.dirname(__file__)+'/test.xlsx'
from flask import make_response
from openpyxl.writer.excel import save_virtual_workbook
from app import app

@app.route('/testexel', methods=['GET'])
def testexel():
    with app.app_context():
        try:
            filename = 'test'
            workbook = load_workbook(FILENAME1, keep_links=False)
            sheet = workbook['Sheet1']
            sheet['B2']='It is me'

            response = make_response(save_virtual_workbook(workbook))
            response.headers['Cache-Control'] = 'no-cache'
            response.headers["Content-Disposition"] = "attachment; filename=%s.xlsx" % filename
            response.headers["Content-type"] = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet; charset=utf-8"

            return response

        except Exception as e:

            raise

然后生成的Excel文件格式被修改成这样,而我从未想过它会变成这个样子:enter image description here
在向文件写入数据之前,格式风格与原始文件相比有很大的不同:
- 单元格A1中的所有数据都是粗体,采用英文字符的样式格式。 - 单元格B3中的英文字符变成了普通样式,并且字体已更改为字体limons1,取自其前面的柬埔寨字符。
我尝试实现的目标是,在向文件写入附加数据的同时,保持文件现有内容的格式(样式和字体)不变。
请问我的脚本有什么问题,如何才能在运行上述脚本后保持现有的样式和字体不变?谢谢。

1
问题是重复的,答案没有改变。 - Charlie Clark
@CharlieClark 如果不可能的话,您能提供另一种解决方案吗?可以是另一个包吗?非常感谢您的解决方案。谢谢。 - Houy Narun
@HouyNarun 如果您将新数据写入现有内容之后的单元格(例如B4单元格),会发生什么? - Yvonne Aburrow
看起来他们可能需要给你另一个Excel文件。 :( - Yvonne Aburrow
@YvonneAburrow 是的,我的任务是将数据写入给定的Excel文件,同时保持其他内容不变。 - Houy Narun
显示剩余6条评论
4个回答

14

Excel文件(扩展名为.xlsx)实际上是zip存档。 (您实际上可以使用7-zip或某些类似的程序打开Excel文件。)因此,Excel文件包含一堆xml文件,其中存储了数据。 Openpyxl所做的就是在打开Excel文件时从这些xml文件中读取数据,并在保存Excel文件时创建带有xml文件的zip存档。简单地说,openpyxl会读取一些xml文件,然后解析该数据,然后您可以使用openpyxl库中的函数来更改和添加数据,最后当您保存工作簿时,openpyxl将创建xml文件,将数据写入它们,并将它们保存为zip存档(即Excel文件)。这些XML文件包含存储在Excel文件中的所有数据(一个XML文件包含Excel文件中的公式,另一个将包含样式,在其他文件中将包含有关Excel主题的数据等)。我们只关心存储在两个XML文件中的Excel文件中的字符串:

  • sharedStrings.xml

    This file contains all strings in excel file and formatting of those strings, here is an example:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
     <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
         <si>
             <r>
                 <rPr>
                     <b/>
                     <sz val="22"/>
                     <color theme="1"/>
                     <rFont val="Calibri"/>
                     <family val="2"/>
                     <scheme val="minor"/>
                 </rPr>
                 <t>Hello</t>
             </r>
             <r>
                 <rPr>
                     <sz val="22"/>
                     <color theme="1"/>
                     <rFont val="Calibri"/>
                     <family val="2"/>
                     <scheme val="minor"/>
                 </rPr>
                 <t xml:space="preserve"></t>
             </r>
         </si>
     </sst>
    
  • sheet1.xml

    This file contains position of your strings (which cell contains which string). (There will be one file for each sheet in you excel file, but let's say you have only one sheet in your file for purpose of this example.) Here is an example:

     <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
     <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac xr xr2 xr3" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3" xr:uid="{00000000-0001-0000-0000-000000000000}">
         <dimension ref="A1:C3"/>
         <sheetViews>
             <sheetView tabSelected="1" zoomScaleNormal="100" workbookViewId="0">
                 <selection activeCell="A3" sqref="A3"/>
             </sheetView>
         </sheetViews>
         <sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/>
         <cols>
             <col min="1" max="1" width="20.140625" customWidth="1"/>
             <col min="2" max="2" width="10.7109375" customWidth="1"/>
         </cols>
         <sheetData>
             <row r="1" spans="1:3" ht="60.75" customHeight="1" x14ac:dyDescent="0.45">
                 <c r="A1" s="4" t="s">
                     <v>0</v>
                 </c>
             </row>
             <row r="2" spans="1:3" ht="19.5" customHeight="1" x14ac:dyDescent="0.35">
                 <c r="A2" s="1"/>
                 <c r="B2" s="3"/>
             </row>
             <row r="3" spans="1:3" ht="62.25" customHeight="1" x14ac:dyDescent="0.5">
                 <c r="A3" s="5" t="s">
                     <v>1</v>
                 </c>
                 <c r="C3" s="2"/>
             </row>
         </sheetData>
         <pageMargins left="0.75" right="0.75" top="1" bottom="1" header="0.5" footer="0.5"/>
         <pageSetup paperSize="9" orientation="portrait" r:id="rId1"/>
     </worksheet>
    

如果你使用openpyxl打开这个Excel文件并保存(不更改任何数据),那么sharedStrings.xml将会如下所示:

<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" uniqueCount="2">
    <si>
        <t>Hello &#6031;</t>
    </si>
    <si>
        <t>&#6030; sike</t>
    </si>
</sst>

正如您所看到的,您将失去所有单元格(字符串)的原始格式,并且您将获得某种合并格式的单元格(因此,如果单元格中的某些字符是粗体,而另一些字符不是,则保存文件时,整个单元格将是粗体或整个单元格将是普通)。现在人们要求开发人员实现这种富文本选项(link1link2),但他们说这将很难实现。我同意这不容易做到,但我们可以做一些更简单的事情:当我们打开Excel文件时,我们可以从 sharedStrings.xml 中获取数据,然后在我们想要保存Excel文件时使用该xml代码,但仅适用于在我们打开文件时存在的单元格。这可能不容易理解,因此让我们看以下示例:

假设您有这样的Excel文件: enter image description here

对于这个Excel文件,sharedStrings.xml将是这样的:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
    <si>
        <r>
            <rPr>
                <b/>
                <sz val="22"/>
                <color theme="1"/>
                <rFont val="Calibri"/>
                <family val="2"/>
                <scheme val="minor"/>
            </rPr>
            <t>Hello</t>
        </r>
        <r>
            <rPr>
                <sz val="22"/>
                <color theme="1"/>
                <rFont val="Calibri"/>
                <family val="2"/>
                <scheme val="minor"/>
            </rPr>
            <t xml:space="preserve"></t>
        </r>
    </si>
</sst>

如果您运行以下Python代码:
from openpyxl import load_workbook
workbook = load_workbook(FILENAME1, keep_links=False)
sheet = workbook.active
sheet['A2'] = 'It is me'
workbook.save('out.xlsx')

文件 out.xlsx 的样子将会是这样:

enter image description here

对于 out.xlsx 文件,其 sharedStrings.xml 将如下所示:

<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" uniqueCount="2">
    <si>
        <t>Hello &#6031;</t>
    </si>
    <si>
        <t>It is me</t>
    </si>
</sst>

所以我们想要做的是使用这段 XML 代码:

<si>
    <r>
        <rPr>
            <b/>
            <sz val="22"/>
            <color theme="1"/>
            <rFont val="Calibri"/>
            <family val="2"/>
            <scheme val="minor"/>
        </rPr>
        <t>Hello</t>
    </r>
    <r>
        <rPr>
            <sz val="22"/>
            <color theme="1"/>
            <rFont val="Calibri"/>
            <family val="2"/>
            <scheme val="minor"/>
        </rPr>
        <t xml:space="preserve"> ត</t>
    </r>
</si>

对于包含Hello ត的旧单元格A1和以下XML代码:

<si>
    <t>It is me</t>
</si>

对于包含It is me的新单元格A2。

因此,我们可以将这些XML部分组合起来,以获得如下所示的XML文件:

<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" uniqueCount="2">
    <si>
        <r>
            <rPr>
                <b/>
                <sz val="22"/>
                <color theme="1"/>
                <rFont val="Calibri"/>
                <family val="2"/>
                <scheme val="minor"/>
            </rPr>
            <t>Hello</t>
        </r>
        <r>
            <rPr>
                <sz val="22"/>
                <color theme="1"/>
                <rFont val="Calibri"/>
                <family val="2"/>
                <scheme val="minor"/>
            </rPr>
            <t xml:space="preserve"> ត</t>
        </r>
    </si>
    <si>
        <t>It is me</t>
    </si>
</sst>

我编写了一些函数来实现这个功能。 (代码很多,但大部分只是从openpyxl复制过来的。如果您更改了openpyxl库,您可以用10或20行代码完成此操作,但这永远不是一个好主意,所以我宁愿复制整个函数并更改需要更改的那个小部分。)
您可以将以下代码保存在单独的文件extendedopenpyxl.py中:
from openpyxl import load_workbook as openpyxlload_workbook
from openpyxl.reader.excel import _validate_archive, _find_workbook_part
from openpyxl.reader.worksheet import _get_xml_iter
from openpyxl.xml.functions import fromstring, iterparse, safe_iterator, tostring, Element, xmlfile, SubElement
from openpyxl.xml.constants import ARC_CONTENT_TYPES, SHEET_MAIN_NS, SHARED_STRINGS, ARC_ROOT_RELS, ARC_APP, ARC_CORE, ARC_THEME, ARC_SHARED_STRINGS, ARC_STYLE, ARC_WORKBOOK, ARC_WORKBOOK_RELS
from openpyxl.packaging.manifest import Manifest
from openpyxl.packaging.relationship import get_dependents, get_rels_path
from openpyxl.packaging.workbook import WorkbookParser
from openpyxl.packaging.extended import ExtendedProperties
from openpyxl.utils import coordinate_to_tuple
from openpyxl.cell.text import Text
from openpyxl.writer.excel import ExcelWriter as openpyxlExcelWriter
from openpyxl.writer.workbook import write_root_rels, write_workbook_rels, write_workbook
from openpyxl.writer.theme import write_theme
from openpyxl.writer.etree_worksheet import get_rows_to_write
from openpyxl.styles.stylesheet import write_stylesheet
from zipfile import ZipFile, ZIP_DEFLATED
from operator import itemgetter
from io import BytesIO
from xml.etree.ElementTree import tostring as xml_tostring
from xml.etree.ElementTree import register_namespace
from lxml.etree import fromstring as lxml_fromstring

register_namespace('', 'http://schemas.openxmlformats.org/spreadsheetml/2006/main')

def get_value_cells(workbook):
    value_cells = []
    for idx, worksheet in enumerate(workbook.worksheets, 1):
        all_rows = get_rows_to_write(worksheet)
        for row_idx, row in all_rows:
            row = sorted(row, key=itemgetter(0))
            for col, cell in row:
                if cell._value is not None:
                    if cell.data_type == 's':
                        value_cells.append((worksheet.title,(cell.row, cell.col_idx)))
    return value_cells

def check_if_lxml(element):
    if type(element).__module__ == 'xml.etree.ElementTree':
        string = xml_tostring(element)
        el = lxml_fromstring(string)
        return el
    return element

def write_string_table(workbook):
    string_table = workbook.shared_strings
    workbook_data = workbook.new_interal_value_workbook_data
    data_strings = workbook.new_interal_value_data_strings
    value_cells = get_value_cells(workbook)
    out = BytesIO()
    i = 0
    with xmlfile(out) as xf:
        with xf.element("sst", xmlns=SHEET_MAIN_NS, uniqueCount="%d" % len(string_table)):            
            for i, key in enumerate(string_table):
                sheetname, coordinates = value_cells[i]
                if coordinates in workbook_data[sheetname]:
                    value = workbook_data[sheetname][coordinates]
                    xml_el = data_strings[value]
                    el = check_if_lxml(xml_el)
                else:
                    el = Element('si')
                    text = SubElement(el, 't')
                    text.text = key
                    if key.strip() != key:
                        text.set(PRESERVE_SPACE, 'preserve')
                xf.write(el)

    return  out.getvalue()


class ExcelWriter(openpyxlExcelWriter):

    def write_data(self):
        """Write the various xml files into the zip archive."""
        # cleanup all worksheets
        archive = self._archive

        archive.writestr(ARC_ROOT_RELS, write_root_rels(self.workbook))
        props = ExtendedProperties()
        archive.writestr(ARC_APP, tostring(props.to_tree()))

        archive.writestr(ARC_CORE, tostring(self.workbook.properties.to_tree()))
        if self.workbook.loaded_theme:
            archive.writestr(ARC_THEME, self.workbook.loaded_theme)
        else:
            archive.writestr(ARC_THEME, write_theme())

        self._write_worksheets()
        self._write_chartsheets()
        self._write_images()
        self._write_charts()

        string_table_out = write_string_table(self.workbook)
        self._archive.writestr(ARC_SHARED_STRINGS, string_table_out)
        self._write_external_links()

        stylesheet = write_stylesheet(self.workbook)
        archive.writestr(ARC_STYLE, tostring(stylesheet))

        archive.writestr(ARC_WORKBOOK, write_workbook(self.workbook))
        archive.writestr(ARC_WORKBOOK_RELS, write_workbook_rels(self.workbook))

        self._merge_vba()

        self.manifest._write(archive, self.workbook)

        return

    def save(self, filename):
        self.write_data()
        self._archive.close()
        return
    

def get_coordinates(cell, row_count, col_count):
    coordinate = cell.get('r')
    if coordinate:
        row, column = coordinate_to_tuple(coordinate)
    else:
        row, column = row_count, col_count
    return row, column

def parse_cell(cell):
    VALUE_TAG = '{%s}v' % SHEET_MAIN_NS
    value = cell.find(VALUE_TAG)
    if value is not None:
        value = int(value.text)
    return value

def parse_row(row, row_count):
    CELL_TAG = '{%s}c' % SHEET_MAIN_NS
    if row.get('r'):
        row_count = int(row.get('r'))
    else:
        row_count += 1
    col_count = 0
    data = dict()
    for cell in safe_iterator(row, CELL_TAG):
        col_count += 1
        value = parse_cell(cell)
        if value is not None:
            coordinates = get_coordinates(cell, row_count, col_count)
            data[coordinates] = value
    return data


def parse_sheet(xml_source):
    dispatcher = ['{%s}mergeCells' % SHEET_MAIN_NS, '{%s}col' % SHEET_MAIN_NS, '{%s}row' % SHEET_MAIN_NS, '{%s}conditionalFormatting' % SHEET_MAIN_NS, '{%s}legacyDrawing' % SHEET_MAIN_NS, '{%s}sheetProtection' % SHEET_MAIN_NS, '{%s}extLst' % SHEET_MAIN_NS, '{%s}hyperlink' % SHEET_MAIN_NS, '{%s}tableParts' % SHEET_MAIN_NS]
    row_count = 0
    stream = _get_xml_iter(xml_source)
    it = iterparse(stream, tag=dispatcher)
    row_tag = '{%s}row' % SHEET_MAIN_NS
    data = dict()
    for _, element in it:
        tag_name = element.tag
        if tag_name == row_tag:
            row_data = parse_row(element, row_count)
            data.update(row_data)
            element.clear()
    return data


def get_workbook_parser(archive):
    src = archive.read(ARC_CONTENT_TYPES)
    root = fromstring(src)
    package = Manifest.from_tree(root)
    wb_part = _find_workbook_part(package)
    workbook_part_name = wb_part.PartName[1:]
    parser = WorkbookParser(archive, workbook_part_name)
    parser.parse()
    return parser, package

def get_data_strings(xml_source):
    STRING_TAG = '{%s}si' % SHEET_MAIN_NS
    strings = []
    src = _get_xml_iter(xml_source)
    for _, node in iterparse(src):
        if node.tag == STRING_TAG:
            strings.append(node)

    return strings

def load_workbook(filename, *args, **kwargs):
    workbook = openpyxlload_workbook(filename, *args, **kwargs)

    archive = _validate_archive(filename)
    parser, package = get_workbook_parser(archive)

    workbook_data = dict()
    for sheet, rel in parser.find_sheets():
        sheet_name = sheet.name
        worksheet_path = rel.target
        fh = archive.open(worksheet_path)
        sheet_data = parse_sheet(fh)
        workbook_data[sheet_name] = sheet_data

    data_strings = []
    ct = package.find(SHARED_STRINGS)
    if ct is not None:
        strings_path = ct.PartName[1:]
        strings_source = archive.read(strings_path)
        data_strings = get_data_strings(strings_source)

    workbook.new_interal_value_workbook_data = workbook_data
    workbook.new_interal_value_data_strings = data_strings
    return workbook

def save_workbook(workbook, filename,):
    archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
    writer = ExcelWriter(workbook, archive)
    writer.save(filename)
    return True

def save_virtual_workbook(workbook,):
    temp_buffer = BytesIO()
    archive = ZipFile(temp_buffer, 'w', ZIP_DEFLATED, allowZip64=True)
    writer = ExcelWriter(workbook, archive)
    try:
        writer.write_data()
    finally:
        archive.close()
    virtual_workbook = temp_buffer.getvalue()
    temp_buffer.close()
    return virtual_workbook

现在,如果您运行此代码:

from extendedopenpyxl import load_workbook, save_workbook

workbook = load_workbook(FILENAME1, keep_links=False)
sheet = workbook['Sheet']
sheet['A2'] = 'It is me'
save_workbook(workbook, 'out.xlsx')

当我在上面例子中使用的Excel文件上运行这段代码时,我得到了以下结果:

enter image description here

正如您所看到的,单元格A1中的文本格式与原来一样(Hello加粗,不加粗)。

编辑West's comment之后)

如果您使用的是比2.5.14版本更高的openpyxl版本,则上面的代码将无法工作,因为openpyxl完全改变了它在excel文件中存储值的方式。 我已经修复了extendedopenpyxl.py中的部分代码,并且以下代码应该可以在新版本的openpyxl上工作(我在3.0.6版本上测试过):

from openpyxl.reader.excel import ExcelReader, _validate_archive
from openpyxl.xml.constants import SHEET_MAIN_NS, SHARED_STRINGS, ARC_SHARED_STRINGS, ARC_APP, ARC_CORE, ARC_THEME, ARC_STYLE, ARC_ROOT_RELS, ARC_WORKBOOK, ARC_WORKBOOK_RELS
from openpyxl.xml.functions import iterparse, xmlfile, tostring
from openpyxl.utils import coordinate_to_tuple
import openpyxl.cell._writer
from zipfile import ZipFile, ZIP_DEFLATED
from openpyxl.writer.excel import ExcelWriter
from io import BytesIO
from xml.etree.ElementTree import register_namespace
from xml.etree.ElementTree import tostring as xml_tostring
from lxml.etree import fromstring as lxml_fromstring
from openpyxl.worksheet._writer import WorksheetWriter
from openpyxl.workbook._writer import WorkbookWriter
from openpyxl.packaging.extended import ExtendedProperties
from openpyxl.styles.stylesheet import write_stylesheet
from openpyxl.packaging.relationship import Relationship
from openpyxl.cell._writer import write_cell
from openpyxl.drawing.spreadsheet_drawing import SpreadsheetDrawing
from openpyxl import LXML
from openpyxl.packaging.manifest import DEFAULT_OVERRIDE, Override, Manifest


DEFAULT_OVERRIDE.append(Override("/" + ARC_SHARED_STRINGS, SHARED_STRINGS))


def to_integer(value):
    if type(value) == int:
        return value
    if type(value) == str:
        try:
            num = int(value)
            return num
        except ValueError:
            num = float(value)
            if num.is_integer():
                return int(num)
    raise ValueError('Value {} is not an integer.'.format(value))
    return

def parse_cell(cell):
    VALUE_TAG = '{%s}v' % SHEET_MAIN_NS
    data_type = cell.get('t', 'n')
    value = None
    if data_type == 's':
        value = cell.findtext(VALUE_TAG, None) or None
        if value is not None:
            value = int(value)
    return value

def get_coordinates(cell, row_counter, col_counter):
    coordinate = cell.get('r')
    if coordinate:
        row, column = coordinate_to_tuple(coordinate)
    else:
        row, column = row_counter, col_counter
    return row, column

def parse_row(row, row_counter):
    row_counter = to_integer(row.get('r', row_counter + 1))
    col_counter = 0
    data = dict()
    for cell in row:
        col_counter += 1
        value = parse_cell(cell)
        if value is not None:
            coordinates = get_coordinates(cell, row_counter, col_counter)
            data[coordinates] = value
            col_counter = coordinates[1]
    return data, row_counter

def parse_sheet(xml_source):
    ROW_TAG = '{%s}row' % SHEET_MAIN_NS
    row_counter = 0
    it = iterparse(xml_source)
    data = dict()
    for _, element in it:
        tag_name = element.tag
        if tag_name == ROW_TAG:
            pass
            row_data, row_counter = parse_row(element, row_counter)
            data.update(row_data)
            element.clear()
    return data

def extended_archive_open(archive, name):
    with archive.open(name,) as src:
        namespaces = {node[0]: node[1] for _, node in
                      iterparse(src, events=['start-ns'])}
    for key, value in namespaces.items():
        register_namespace(key, value)
    return archive.open(name,)

def get_data_strings(xml_source):
    STRING_TAG = '{%s}si' % SHEET_MAIN_NS
    strings = []
    for _, node in iterparse(xml_source):
        if node.tag == STRING_TAG:
            strings.append(node)
    return strings
    
def load_workbook(filename, read_only=False, keep_vba=False,
                  data_only=False, keep_links=True):
    reader = ExcelReader(filename, read_only, keep_vba,
                        data_only, keep_links)
    reader.read()

    archive = _validate_archive(filename)
    
    workbook_data = dict()
    for sheet, rel in reader.parser.find_sheets():
        if rel.target not in reader.valid_files or "chartsheet" in rel.Type:
            continue
        fh = archive.open(rel.target)
        sheet_data = parse_sheet(fh)
        workbook_data[sheet.name] = sheet_data

    data_strings = []
    ct = reader.package.find(SHARED_STRINGS)
    if ct is not None:
        strings_path = ct.PartName[1:]
        with extended_archive_open(archive, strings_path) as src:
            data_strings = get_data_strings(src)

    archive.close()

    workbook = reader.wb
    workbook._extended_value_workbook_data = workbook_data
    workbook._extended_value_data_strings = data_strings
    return workbook

def check_if_lxml(element):
    if type(element).__module__ == 'xml.etree.ElementTree':
        string = xml_tostring(element)
        el = lxml_fromstring(string)
        return el
    return element

def write_string_table(workbook):
    workbook_data = workbook._extended_value_workbook_data
    data_strings = workbook._extended_value_data_strings
    out = BytesIO()
    with xmlfile(out) as xf:
        with xf.element("sst", xmlns=SHEET_MAIN_NS, uniqueCount="%d" % len(data_strings)):
            for sheet in workbook_data:
                for coordinates, value in workbook_data[sheet].items():
                    xml_el = data_strings[value]
                    el = check_if_lxml(xml_el)
                    xf.write(el)
    return out.getvalue()

def check_cell(cell):
    if cell.data_type != 's':
        return False
    if cell._comment is not None:
        return False
    if cell.hyperlink:
        return False
    return True

def extended_write_cell(xf, worksheet, cell, styled=None):
        workbook_data = worksheet.parent._extended_value_workbook_data
        for sheet in workbook_data.values():
            if (cell.row, cell.column) in sheet and check_cell(cell):
                attributes = {'r': cell.coordinate, 't': cell.data_type}
                if styled:
                    attributes['s'] = '%d' % cell.style_id
                if LXML:
                    with xf.element('c', attributes):
                        with xf.element('v'):
                            xf.write('%.16g' % sheet[(cell.row, cell.column)])                    
                else:
                    el = Element('c', attributes)
                    cell_content = SubElement(el, 'v')
                    cell_content.text = '%.16g' % sheet[(cell.row, cell.column)]
                    xf.write(el)
                break
        else:
            write_cell(xf, worksheet, cell, styled)
        return

class ExtendedWorksheetWriter(WorksheetWriter):

    def write_row(self, xf, row, row_idx):
        attrs = {'r': f"{row_idx}"}
        dims = self.ws.row_dimensions
        attrs.update(dims.get(row_idx, {}))

        with xf.element("row", attrs):

            for cell in row:
                if cell._comment is not None:
                    comment = CommentRecord.from_cell(cell)
                    self.ws._comments.append(comment)
                if (
                    cell._value is None
                    and not cell.has_style
                    and not cell._comment
                    ):
                    continue
                extended_write_cell(xf, self.ws, cell, cell.has_style)
        return


class ExtendedWorkbookWriter(WorkbookWriter):

    def write_rels(self, *args, **kwargs):
        styles =  Relationship(type='sharedStrings', Target='sharedStrings.xml')
        self.rels.append(styles)
        return super().write_rels(*args, **kwargs) 

class ExtendedExcelWriter(ExcelWriter):

    def __init__(self, workbook, archive):
        self._archive = archive
        self.workbook = workbook
        self.manifest = Manifest(Override = DEFAULT_OVERRIDE)
        self.vba_modified = set()
        self._tables = []
        self._charts = []
        self._images = []
        self._drawings = []
        self._comments = []
        self._pivots = []
        return

    def write_data(self):
        archive = self._archive
        props = ExtendedProperties()
        archive.writestr(ARC_APP, tostring(props.to_tree()))
        archive.writestr(ARC_CORE, tostring(self.workbook.properties.to_tree()))
        if self.workbook.loaded_theme:
            archive.writestr(ARC_THEME, self.workbook.loaded_theme)
        else:
            archive.writestr(ARC_THEME, theme_xml)
        self._write_worksheets()
        self._write_chartsheets()
        self._write_images()
        self._write_charts()

        if self.workbook._extended_value_workbook_data \
                and self.workbook._extended_value_data_strings:
            string_table_out = write_string_table(self.workbook)
            self._archive.writestr(ARC_SHARED_STRINGS, string_table_out)
        
        self._write_external_links()
        stylesheet = write_stylesheet(self.workbook)
        archive.writestr(ARC_STYLE, tostring(stylesheet))
        
        writer = ExtendedWorkbookWriter(self.workbook)
        
        archive.writestr(ARC_ROOT_RELS, writer.write_root_rels())
        archive.writestr(ARC_WORKBOOK, writer.write())
        archive.writestr(ARC_WORKBOOK_RELS, writer.write_rels())
        self._merge_vba()
        self.manifest._write(archive, self.workbook)
        return

    def write_worksheet(self, ws):
        ws._drawing = SpreadsheetDrawing()
        ws._drawing.charts = ws._charts
        ws._drawing.images = ws._images
        if self.workbook.write_only:
            if not ws.closed:
                ws.close()
            writer = ws._writer
        else:
            writer = ExtendedWorksheetWriter(ws)
            writer.write()

        ws._rels = writer._rels
        self._archive.write(writer.out, ws.path[1:])
        self.manifest.append(ws)
        writer.cleanup()
        return

def save_workbook(workbook, filename):
    archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
    writer = ExtendedExcelWriter(workbook, archive)
    writer.save()
    return True

@West,我已经编辑了我的答案,如果你使用新的代码替换extendedopenpyxl.py文件,你的程序应该可以正常工作。 - ands
你能在第46行的value = int(value)之前添加一行代码print([value], type(value)),并告诉我它打印出了什么吗? - ands
字符串类型。该值是第一个工作表中单元格B1的文本值。例如,单元格B1中的值为“税额”,错误会将该值选择为“税”。您在您的端上成功测试了解决方案吗? - West
据我所知,sheet1.xml 中的值标签(<v>0</v>)应该只包含整数。是的,我在这个文件上进行了测试。你能否创建一个文件副本,删除除单元格 B1 以外的所有内容,然后测试 Python 代码,如果显示相同的错误,则将该文件发送给我? - ands
我修复了值标签的问题。(我忘记检查只有包含字符串的单元格内的值标签才会被考虑。)我还删除了Zipfile.seek()方法。你能否检查更新后的 extendedopenpyxl.py 代码是否正常运行? - ands
显示剩余3条评论

2
根据这个问题的答案,您可以使用openpyxl格式化Excel中的单元格。
那里给出的答案只将目标单元格更改为粗体,但也许您可以将字体更改回lemons1
from openpyxl.workbook import Workbook
from openpyxl.styles import Font
wb = Workbook()
ws = wb.active
ws['B3'] = "Hello"
ws['B3'].font =  Font(name='lemons1', size=14)
wb.save("FontDemo.xlsx")

然而,根据文档,你只能将样式应用于整个单元格,而不能应用于单元格的一部分。因此,您需要将高棉字符放在一个单元格中,将英文字符放在另一个单元格中。


除此之外,您是否有任何推荐的替代解决方案,例如其他包、库,可以实现上述结果?非常感谢。 - Houy Narun
抱歉,直到看到你的帖子我才第一次听说这个软件包。你可能想要调查与Google电子表格交互的API? - Yvonne Aburrow
@Yvonne,你可以帮忙看一下这个问题吗?https://stackoverflow.com/questions/59888367/how-to-keep-style-format-unchanged-after-writing-data-using-openxlsx-in-r/59889723#59889723 - user10072460
我看了一下,但那个答案似乎很有道理。 - Yvonne Aburrow

0

@ands的回答几乎是正确的,但还有一些遗漏。

sharedStrings.xml管理所有工作表中所有富文本的列表。

这意味着所有工作表都使用一个字符串列表。

而sheet.xml中的123是共享字符串列表中索引为123的位置。

openpyxl包括读取共享字符串的代码,但写入部分尚未完成。

源代码中有关于写共享字符串的注释。 https://foss.heptapod.net/openpyxl/openpyxl/-/blob/branch/3.0/openpyxl/writer/excel.py

        self._write_worksheets()
        self._write_chartsheets()
        self._write_images()
        self._write_charts()

        #self._archive.writestr(ARC_SHARED_STRINGS,
                              #write_string_table(self.workbook.shared_strings))
        self._write_external_links()

        stylesheet = write_stylesheet(self.workbook)
        archive.writestr(ARC_STYLE, tostring(stylesheet))

我更新了@ands的代码,并将其适用于具有多个工作表的文件。

from openpyxl.reader.excel import ExcelReader, _validate_archive
from openpyxl.xml.constants import SHEET_MAIN_NS, SHARED_STRINGS, ARC_SHARED_STRINGS, ARC_APP, ARC_CORE, ARC_THEME, ARC_STYLE, ARC_ROOT_RELS, ARC_WORKBOOK, ARC_WORKBOOK_RELS
from openpyxl.xml.functions import iterparse, xmlfile, tostring
from openpyxl.utils import coordinate_to_tuple
import openpyxl.cell._writer
from zipfile import ZipFile, ZIP_DEFLATED
from openpyxl.writer.excel import ExcelWriter
from io import BytesIO
from xml.etree.ElementTree import register_namespace
from xml.etree.ElementTree import tostring as xml_tostring
from lxml.etree import fromstring as lxml_fromstring
from openpyxl.worksheet._writer import WorksheetWriter
from openpyxl.workbook._writer import WorkbookWriter
from openpyxl.packaging.extended import ExtendedProperties
from openpyxl.styles.stylesheet import write_stylesheet
from openpyxl.packaging.relationship import Relationship
from openpyxl.cell._writer import write_cell
from openpyxl.drawing.spreadsheet_drawing import SpreadsheetDrawing
from openpyxl import LXML
from openpyxl.packaging.manifest import DEFAULT_OVERRIDE, Override, Manifest
import openpyxl

DEFAULT_OVERRIDE.append(Override("/" + ARC_SHARED_STRINGS, SHARED_STRINGS))


def to_integer(value):
    if type(value) == int:
        return value
    if type(value) == str:
        try:
            num = int(value)
            return num
        except ValueError:
            num = float(value)
            if num.is_integer():
                return int(num)
    raise ValueError('Value {} is not an integer.'.format(value))
    return

def parse_cell(cell):
    VALUE_TAG = '{%s}v' % SHEET_MAIN_NS
    data_type = cell.get('t', 'n')
    value = None
    if data_type == 's':
        value = cell.findtext(VALUE_TAG, None) or None
        if value is not None:
            value = int(value)
    return value

def get_coordinates(cell, row_counter, col_counter):
    coordinate = cell.get('r')
    if coordinate:
        row, column = coordinate_to_tuple(coordinate)
    else:
        row, column = row_counter, col_counter
    return row, column

def parse_row(row, row_counter):
    row_counter = to_integer(row.get('r', row_counter + 1))
    col_counter = 0
    data = dict()
    for cell in row:
        col_counter += 1
        value = parse_cell(cell)
        if value is not None:
            coordinates = get_coordinates(cell, row_counter, col_counter)
            data[coordinates] = value
            col_counter = coordinates[1]
    return data, row_counter

def parse_sheet(xml_source):
    ROW_TAG = '{%s}row' % SHEET_MAIN_NS
    row_counter = 0
    it = iterparse(xml_source)
    data = dict()
    for _, element in it:
        tag_name = element.tag
        if tag_name == ROW_TAG:
            pass
            row_data, row_counter = parse_row(element, row_counter)
            data.update(row_data)
            element.clear()
    return data

def extended_archive_open(archive, name):
    with archive.open(name,) as src:
        namespaces = {node[0]: node[1] for _, node in
                      iterparse(src, events=['start-ns'])}
    for key, value in namespaces.items():
        register_namespace(key, value)
    return archive.open(name,)

def get_data_strings(xml_source):
    STRING_TAG = '{%s}si' % SHEET_MAIN_NS
    strings = []
    for _, node in iterparse(xml_source):
        if node.tag == STRING_TAG:
            strings.append(node)
    return strings
    
def load_workbook(filename, read_only=False, keep_vba=False,
                  data_only=False, keep_links=True):
    reader = ExcelReader(filename, read_only, keep_vba,
                        data_only, keep_links)
    reader.read()

    archive = _validate_archive(filename)
    
    workbook_data = dict()
    for sheet, rel in reader.parser.find_sheets():
        if rel.target not in reader.valid_files or "chartsheet" in rel.Type:
            continue
        fh = archive.open(rel.target)
        sheet_data = parse_sheet(fh)
        workbook_data[sheet.name] = sheet_data

    data_strings = []
    ct = reader.package.find(SHARED_STRINGS)
    if ct is not None:
        strings_path = ct.PartName[1:]
        with extended_archive_open(archive, strings_path) as src:
            data_strings = get_data_strings(src)

    archive.close()
    workbook = reader.wb
    workbook._extended_value_workbook_data = workbook_data
    workbook._extended_value_data_strings = data_strings
    return workbook

def check_if_lxml(element):
    if type(element).__module__ == 'xml.etree.ElementTree':
        string = xml_tostring(element)
        el = lxml_fromstring(string)
        return el
    return element
   
def write_string_table(workbook):
    workbook_data = workbook._extended_value_workbook_data
    data_strings = workbook._extended_value_data_strings
    out = BytesIO()
    with xmlfile(out) as xf:
        with xf.element("sst", xmlns=SHEET_MAIN_NS, uniqueCount="%d" % len(data_strings)):
            for i in range(0, len(data_strings)):
                xml_el = data_strings[i]
                el = check_if_lxml(xml_el)
                xf.write(el)
    return out.getvalue()

def check_cell(cell):
    if cell.data_type != 's':
        return False
    if cell._comment is not None:
        return False
    if cell.hyperlink:
        return False
    return True

def extended_write_cell(xf, worksheet, cell, styled=None):
    if worksheet.title not in worksheet.parent._extended_value_workbook_data:
        return
    sheet = worksheet.parent._extended_value_workbook_data[worksheet.title]
    if (cell.row, cell.column) in sheet and check_cell(cell):
        attributes = {'r': cell.coordinate, 't': cell.data_type}
        if styled:
            attributes['s'] = '%d' % cell.style_id
        if LXML:
            with xf.element('c', attributes):
                with xf.element('v'):
                    xf.write('%.16g' % sheet[(cell.row, cell.column)])                    
        else:
            el = Element('c', attributes)
            cell_content = SubElement(el, 'v')
            cell_content.text = '%.16g' % sheet[(cell.row, cell.column)]
            xf.write(el)
    else:
        write_cell(xf, worksheet, cell, styled)
    return

class ExtendedWorksheetWriter(WorksheetWriter):

    def write_row(self, xf, row, row_idx):
        attrs = {'r': f"{row_idx}"}
        dims = self.ws.row_dimensions
        attrs.update(dims.get(row_idx, {}))

        with xf.element("row", attrs):

            for cell in row:
                if cell._comment is not None:
                    comment = openpyxl.comments.comment_sheet.CommentRecord.from_cell(cell)
                    self.ws._comments.append(comment)
                if (
                    cell._value is None
                    and not cell.has_style
                    and not cell._comment
                    ):
                    continue
                extended_write_cell(xf, self.ws, cell, cell.has_style)
        return


class ExtendedWorkbookWriter(WorkbookWriter):

    def write_rels(self, *args, **kwargs):
        styles =  Relationship(type='sharedStrings', Target='sharedStrings.xml')
        self.rels.append(styles)
        return super().write_rels(*args, **kwargs) 

class ExtendedExcelWriter(ExcelWriter):

    def __init__(self, workbook, archive):
        self._archive = archive
        self.workbook = workbook
        self.manifest = Manifest(Override = DEFAULT_OVERRIDE)
        self.vba_modified = set()
        self._tables = []
        self._charts = []
        self._images = []
        self._drawings = []
        self._comments = []
        self._pivots = []
        return

    def write_data(self):
        archive = self._archive
        props = ExtendedProperties()
        archive.writestr(ARC_APP, tostring(props.to_tree()))
        archive.writestr(ARC_CORE, tostring(self.workbook.properties.to_tree()))
        if self.workbook.loaded_theme:
            archive.writestr(ARC_THEME, self.workbook.loaded_theme)
        else:
            archive.writestr(ARC_THEME, theme_xml)
        self._write_worksheets()
        self._write_chartsheets()
        self._write_images()
        self._write_charts()

        if self.workbook._extended_value_workbook_data \
                and self.workbook._extended_value_data_strings:
            string_table_out = write_string_table(self.workbook)
            self._archive.writestr(ARC_SHARED_STRINGS, string_table_out)
        
        self._write_external_links()
        stylesheet = write_stylesheet(self.workbook)
        archive.writestr(ARC_STYLE, tostring(stylesheet))
        
        writer = ExtendedWorkbookWriter(self.workbook)
        
        archive.writestr(ARC_ROOT_RELS, writer.write_root_rels())
        archive.writestr(ARC_WORKBOOK, writer.write())
        archive.writestr(ARC_WORKBOOK_RELS, writer.write_rels())
        self._merge_vba()
        self.manifest._write(archive, self.workbook)
        return

    def write_worksheet(self, ws):
        ws._drawing = SpreadsheetDrawing()
        ws._drawing.charts = ws._charts
        ws._drawing.images = ws._images
        if self.workbook.write_only:
            if not ws.closed:
                ws.close()
            writer = ws._writer
        else:
            writer = ExtendedWorksheetWriter(ws)
            writer.write()

        ws._rels = writer._rels
        self._archive.write(writer.out, ws.path[1:])
        self.manifest.append(ws)
        writer.cleanup()
        return

def save_workbook(workbook, filename):
    archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
    writer = ExtendedExcelWriter(workbook, archive)
    writer.save()
    return True

我能用这个强制将我存储的数字视为文本吗?我一直遇到一些奇怪的错误,将类似CSV的字符串保存到单元格1,2,3中,最终会以一个数字列表的形式存储,并且其中一个数字是“NaN”,这完全破坏了我的程序对该数据的使用。 - nmz787

0

我尝试运行代码时出现了错误:

NameError: name 'CommentRecord' is not defined

CommentRecord 在文件 extendedopenpyxl.py 的第 192 行存在。 如果你添加以下代码,问题就可以解决:

from openpyxl.comments.comment_sheet import *

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接