Extract font name and size from a .docx file with Python (python-docx)

Question

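I want to write a Python program that checks some properties of an MS Word file (.docx), such as margins, font name and font size. (Before going further I should admit that, honestly, I have no idea what I'm doing.)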

The font part is where I've run into real trouble.
According to https://python-docx.readthedocs.io/en/latest/user/styles-understanding.html:

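"A style can inherit properties from another style, somewhat similarly to how Cascading Style Sheets (CSS) works. Inheritance is specified using the base_style attribute. By basing one style on another, an inheritance hierarchy of arbitrary depth can be formed. A style having no base style inherits properties from the document defaults."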

So I tried this code:

from docx import Document

d = Document('1.docx')
d_styles = d.styles

for st in d_styles:
    if st.name != "No List":  # Ignoring the numbering style (it has no base_style)
        print(st.type, st.name, st.base_style)
        # print(dir(st.base_style), '\n')  there is no such thing as font in dir(st.base_style)

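st.base_style returns None.

So, by the rule that "a style having no base style inherits properties from the document defaults", the answer should be in the document defaults. But I don't know how to get to them.

The following code also returns None: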

for st in d_styles:
    if st.name != "No List": #Ignoring The Numbering Style
        print(st.font.name)
#Outputs: None

for para in d.paragraphs:
    for r in para.runs:
        print (r.font.name)
#Outputs: None

for para in d.paragraphs:
    print(para.style.font.name)
#Outputs: None
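A None from font.name at any of these levels generally means "not set here" rather than "no font": the value is inherited, following the rule quoted above. The sketch below resolves the effective value under stated assumptions: it relies on python-docx's run.font, paragraph.style, style.font and style.base_style attributes; the helper names resolve_style_attr and effective_font_attr are hypothetical; character styles applied to runs (run.style) are deliberately ignored; and doc_default must be supplied by you, since python-docx has no documented API for the document defaults. The helpers are duck-typed, so the demo uses SimpleNamespace stand-ins instead of a real file.

```python
from types import SimpleNamespace


def resolve_style_attr(style, attr, default=None, max_depth=50):
    """Return the first explicitly set style.font.<attr>, walking base_style.

    Falls back to `default` (e.g. the document defaults) when no style in
    the chain sets the value. max_depth guards against malformed, cyclic chains.
    """
    for _ in range(max_depth):
        if style is None:
            break
        value = getattr(style.font, attr)
        if value is not None:
            return value
        style = style.base_style
    return default


def effective_font_attr(run, paragraph, attr, doc_default=None):
    """Direct run formatting wins, then the paragraph style chain, then defaults.

    Simplification (assumption): character styles (run.style) are not checked.
    """
    value = getattr(run.font, attr)
    if value is not None:
        return value
    return resolve_style_attr(paragraph.style, attr, doc_default)


# Demo with stand-in objects shaped like python-docx's Run/Paragraph/Style
normal = SimpleNamespace(font=SimpleNamespace(name=None, size=None), base_style=None)
heading = SimpleNamespace(font=SimpleNamespace(name=None, size=None), base_style=normal)
para = SimpleNamespace(style=heading)
run = SimpleNamespace(font=SimpleNamespace(name=None, size=None))

print(effective_font_attr(run, para, "name", doc_default="Calibri"))  # Calibri
normal.font.name = "Arial"
print(effective_font_attr(run, para, "name", doc_default="Calibri"))  # Arial
```

With a real document you would loop over d.paragraphs and para.runs and call effective_font_attr(r, para, "name") or effective_font_attr(r, para, "size"); the walk is the same for both properties.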

I used these resources:
https://python-docx.readthedocs.io/en/latest/api/style.html
https://python-docx.readthedocs.io/en/latest/user/styles-understanding.html

Edit:

I tried treating the Styles object as a dictionary:

for key, value in d_styles.items():
    print(key, value)
#ERROR: 'Styles' object has no attribute 'items'

print(d_styles.items())
#ERROR: 'Styles' object has no attribute 'items'

print(d_styles.keys())
#ERROR: 'Styles' object has no attribute 'keys'

print(d_styles.values())
#ERROR: 'Styles' object has no attribute 'values'

Even this code returns None:

style = d.styles['Normal']
f = style.font
print(f.name)

- AKLMI

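I'd suggest looking at the document's XML and seeing what hints it gives. Ultimately, python-docx is just a user interface over the underlying XML document. You can start with print(d.styles["Normal"]._element.xml). If you don't find any font information there, that would explain why you get None as the font name. - scanny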
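The same XML can also be read straight from the package without touching the private _element attribute: a .docx file is a ZIP archive, and the style definitions live in word/styles.xml. A sketch using only the standard library (zipfile, xml.etree.ElementTree); style_xml is a hypothetical helper name. It matches on w:styleId, which for built-in styles often differs from the display name (for example, the styleId "Heading1" versus the name "heading 1").

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}
# Keep the familiar "w:" prefix when serializing instead of "ns0:"
ET.register_namespace("w", W_NS)


def style_xml(docx_file, style_id):
    """Return the <w:style> element for style_id from word/styles.xml as text.

    docx_file may be a path or a binary file-like object. Returns None when
    no style with that w:styleId exists.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/styles.xml"))
    for style in root.findall("w:style", NS):
        if style.get(f"{{{W_NS}}}styleId") == style_id:
            return ET.tostring(style, encoding="unicode")
    return None


# Usage: print(style_xml("1.docx", "Normal"))
# A <w:rPr> child holding <w:rFonts>/<w:sz> would carry the font settings;
# if it is absent, the style inherits them.
```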

Thank you very much for your help. I didn't find any font information, unless I'm missing something: <w:style xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml" xmlns:w16sdtdh="http://schemas.microsoft.com/office/word/2020/wordml/sdtdatahash" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" w:type="paragraph" w:default="1" w:styleId="Normal"> <w:name w:val="Normal"/> <w:qFormat/> </w:style> - AKLMI

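OK, then I'd expect it to fall back to the document defaults. I'm not sure exactly where those are defined, but I don't think python-docx has any API support for getting or setting them. - scanny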
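For reference, in WordprocessingML the document defaults are stored in word/styles.xml under w:docDefaults/w:rPrDefault/w:rPr, with the font in w:rFonts and the size in w:sz, measured in half-points (so w:val="22" means 11 pt). Often w:rFonts holds only theme references such as w:asciiTheme="minorHAnsi"; the actual typeface name is then defined in the theme part (typically word/theme/theme1.xml). A sketch reading these defaults with the standard library; doc_default_font is a hypothetical helper, it only looks at the ASCII font slot, and it does not resolve theme references.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def _w(attr):
    """Clark-notation name for a w: attribute, e.g. _w("val")."""
    return f"{{{W_NS}}}{attr}"


def doc_default_font(docx_file):
    """Return (font, size_pt) from w:docDefaults in word/styles.xml.

    font is the w:ascii typeface if present, otherwise the w:asciiTheme
    reference (e.g. "minorHAnsi"); size_pt converts w:sz from half-points.
    Either item is None when the document does not set it.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/styles.xml"))
    rpr = root.find("w:docDefaults/w:rPrDefault/w:rPr", NS)
    if rpr is None:
        return None, None

    font = None
    rfonts = rpr.find("w:rFonts", NS)
    if rfonts is not None:
        font = rfonts.get(_w("ascii")) or rfonts.get(_w("asciiTheme"))

    size_pt = None
    sz = rpr.find("w:sz", NS)
    if sz is not None and sz.get(_w("val")) is not None:
        size_pt = int(sz.get(_w("val"))) / 2  # half-points -> points

    return font, size_pt


# Usage: print(doc_default_font("1.docx"))
```

The returned pair can then serve as the final fallback when a style chain sets no font name or size.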

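Any update on this? I also get None for paragraph.style.font.name and size, although I can get the text and the style name. Could this be a bug in the library? I'm using the latest version. - Sojimanatsu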

1 Answer

- ARK1375 · Accepted Answer

According to the documentation:

The Styles object provides dictionary-style access to defined styles by name.

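I think that is your problem: Styles behaves like a dictionary keyed by style name (d_styles['Normal'] works), but it is not a real dict, so methods such as .keys() and .items() don't exist, and iterating over it yields style objects. Try the snippets below, and for future reference, read the Styles documentation carefully.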

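To get the style names (the dictionary keys), use:

from docx import Document

d = Document('1.docx')
d_styles = d.styles
print([style.name for style in d_styles])  # Styles has no .keys(); iterate and read .name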

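After that, you can access each style with d_styles['yourKey'], for example d_styles['Normal']. To get both the names and the style objects, try the snippet below.

d = Document('1.docx')
d_styles = d.styles
for style in d_styles:
    # Iterating yields style objects; look each one up again by name to show dict-style access
    print(f'{style.name} : {d_styles[style.name]}')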

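Remember that each style object (e.g. d_styles['Normal']) exposes its own properties, such as font and base_style, so you can inspect those as well:

d = Document('1.docx')
d_styles = d.styles
for style in d_styles:
    print(f'{style.name} : {style}')
    # Numbering styles have no font; None means "not set on this style" (inherited)
    if hasattr(style, 'font'):
        print(style.font.name, style.font.size)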

Play around with the keys and properties and you should find what you're looking for.