如何使用Python从PDF文件中提取粗体文本？

Question

如何使用Python从PDF文件中提取粗体文本？

pythonnlppython-re

3

Surgical rooms and services - To include surgical suites, major and minor, treatment rooms, endoscopy labs, cardiac cath labs, X-ray.
Facility Basic Charges - pulmonary and cardiology procedural rooms. The hospital's charge for surgical suites and services shall include the entire above listed nursing personnel services, supplies, and equipment

- Shri Tech1404

2个回答

0

使用这段代码：

import pdfplumber
import re
demo = []
with pdfplumber.open('HCSC IL Inpatient_Outpatient Unbundling Policy- Facility.pdf') as pdf: 
    for i in range(0, 50):
        try:
            text = pdf.pages[i]  
            clean_text = text.filter(lambda obj: obj["object_type"] == "char" and "Bold" in obj["fontname"])
            demo.append(str(re.findall(r'(\d+\.\s.*\n?)+', clean_text.extract_text())).replace('[]', ' '))
        except IndexError:
            print("")
            break

- Shrikant Shejwal_IND

请始终提供您的代码为什么有效的解释。 - rikyeah

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Nestor Silva · Accepted Answer

你可以使用以下代码来完成：

import pdfplumber
with pdfplumber.open('test.pdf') as pdf: 
    text = pdf.pages[0]
    clean_text = text.filter(lambda obj: obj["object_type"] == "char" and "Bold" in obj["fontname"])
    print(clean_text.extract_text())

它使用pdfplumber库，因此您可以查看其文档以获取更多信息。