How can Python tabula-py read a table when the PDF table contains line breaks?

5

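I am trying to read a table from a PDF with the Python package tabula-py, but line breaks inside the table cells split the content of a single original cell into multiple cells.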

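I searched for various Python packages to solve this problem. tabula-py seems to be the most stable package for converting PDF tables into pandas data. However, if this problem cannot be solved, I will have to turn to an online service that produces the ideal Excel output for me.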

from tabula import read_pdf
df = read_pdf("C:/Users/Desktop/test.pdf", pages='all')

I would like to be able to convert the PDF table correctly with this.
2 Answers

4

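Tabula no longer has the "spreadsheet" option. Instead, use the "lattice" option to keep line breaks from splitting a cell into new rows. The code is as follows: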

import tabula

# Read pdf into DataFrame
df = tabula.read_pdf("FDA EPC Text Phrases  (updated March 2018.pdf", pages='all',
                     lattice=True)
print(df)
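
Even with lattice=True, the extracted headers and cells can still contain carriage returns (`\r`) where the PDF wrapped the text, as the long column name quoted in the other answer shows. Below is a minimal sketch for normalizing them after extraction. It assumes the result is a pandas DataFrame; `clean_line_breaks` is a hypothetical helper name and the sample frame is made up for illustration. Depending on the tabula-py version, `read_pdf` may return a list of DataFrames instead of a single one, in which case the helper would be applied to each element.

```python
import re

import pandas as pd

# Matches one or more line-break characters plus any whitespace around them.
_BREAKS = re.compile(r"\s*[\r\n]+\s*")


def clean_line_breaks(df, sep=" "):
    """Return a copy of df with line breaks in headers and string cells replaced by sep.

    Hypothetical helper, not part of tabula-py.
    """
    def fix(value):
        # Only touch strings; numbers and NaN pass through unchanged.
        return _BREAKS.sub(sep, value) if isinstance(value, str) else value

    out = df.apply(lambda column: column.map(fix))
    out.columns = [fix(str(name)) for name in df.columns]
    return out


# Made-up sample standing in for what read_pdf(..., lattice=True) might return.
sample = pd.DataFrame(
    {
        "Active Moiety\rName": ["zoledronic\racid", "zolpidem"],
        "Count": [1, 2],
    }
)
cleaned = clean_line_breaks(sample)
print(cleaned)
```

With a real file this would be called as `clean_line_breaks(df)` on the frame returned by `tabula.read_pdf`.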

0
You can use the "spreadsheet" option with the value True to ignore the multi-row NaN values caused by line breaks.
import tabula

# Read pdf into DataFrame
df = tabula.read_pdf("FDA EPC Text Phrases  (updated March 2018.pdf", pages='all', spreadsheet=True)
print(df)
#print(df['Active Moiety Name'])
#print(df['FDA Established Pharmacologic Class\r(EPC) Text Phrase\rPLR regulations require that the following\rstatement is included in the Highlights\rIndications and Usage heading if a drug is a\rmember of an EPC [see 21 CFR\r201.57(a)(6)]: “(Drug) is a (FDA EPC Text\rPhrase) indicated for [indication(s)].” For\reach listed active moiety, the associated\rFDA EPC text phrase is included in this\rdocument. For more information about how\rFDA determines the EPC Text Phrase, see\rthe 2009 "Determining EPC for Use in the\rHighlights" guidance and 2013 "Determining\rEPC for Use in the Highlights" MAPP\r7400.13.'])

Output:

1758                 ziconotide                  N-type calcium channel antagonist
1759                 zidovudine  HIV nucleoside analog reverse transcriptase in...
1760                   zileuton                           5-lipoxygenase inhibitor
1761                zinc cation                        copper absorption inhibitor
1762                ziprasidone                             atypical antipsychotic
1763            zoledronic acid                                     bisphosphonate
1764  zoledronic acid anhydrous                                     bisphosphonate
1765               zolmitriptan     serotonin 5-HT1B/1D receptor agonist (triptan)
1766               zolmitriptan     serotonin 5-HT1B/1D receptor agonist (triptan)
1767                   zolpidem           gamma-aminobutyric acid (GABA) A agonist
1768                 zonisamide                           antiepileptic drug (AED)
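
If wrapped cells still come back as extra rows with NaN in the other columns, one possible workaround is to fold those rows back into the row above them in pandas. This is only a sketch. It assumes a continuation row can be recognized by an empty value in a key column; `merge_continuation_rows`, `key_col`, and the sample data are hypothetical names chosen for illustration.

```python
import pandas as pd


def merge_continuation_rows(df, key_col, sep=" "):
    """Fold rows whose key column is empty into the preceding row.

    Hypothetical helper; all merged values are returned as strings.
    """
    # Every row with a non-null key starts a new logical record.
    record_id = df[key_col].notna().cumsum().rename("record_id")

    def join_parts(series):
        # Join the non-missing fragments of one cell in their original order.
        parts = [str(value) for value in series if pd.notna(value)]
        return sep.join(parts) if parts else pd.NA

    grouped = df.groupby(record_id, sort=False)
    merged = pd.DataFrame({col: grouped[col].agg(join_parts) for col in df.columns})
    return merged.reset_index(drop=True)


# Made-up sample: the second row is a wrapped continuation of the first.
raw = pd.DataFrame(
    {
        "name": ["zolmitriptan", None, "zolpidem"],
        "class": [
            "serotonin 5-HT1B/1D receptor",
            "agonist (triptan)",
            "gamma-aminobutyric acid (GABA) A agonist",
        ],
    }
)
merged = merge_continuation_rows(raw, key_col="name")
print(merged)
```

The grouping relies on row order, so it should run before any sorting or filtering of the extracted frame.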
