用Python将完整的csv表格写入PDF

7
我有一个Python脚本,可以使用reportlab platypus将.csv文件中的文本消息数据写入PDF表格中。它只会将表格的最后一行写入单元格,忽略此之前的所有其他行。它只会将Excel截图中突出显示的最后一行写入PDF中。下面是它写入PDF时的样子。请注意保留HTML标签。

enter image description here

enter image description here

它还会创建三到四页的PDF文件,推断它试图为完整表格腾出空间,但不会写入。这是我迄今为止一直在使用的代码。我想将其以与Excel片段中显示的相同格式写入PDF文档。我应该如何重构我的代码来实现这一点?
# Script to generate a PDF report after data has been parsed into smsInfo.csv file

# import statements
import requests
from reportlab.lib import colors
from reportlab.lib.pagesizes import *
from reportlab.platypus import *
from reportlab.lib.styles import getSampleStyleSheet
import csv
import os
import datetime

now = datetime.datetime.now()

# Get de work directory
cwd = os.getcwd()

# Introduction text
line1 = 'LYIT MOBILE FORENSICS DIVISION'
line2 = 'Date: ' + now.strftime("%d-%m-%y")
line3 = 'Case Number: 10'
line4 = 'This forensic report on sms card data has been compiled by the forensic'
line5 = 'examiner in conclusion to the investigation into the RTA'
line6 = 'case which occurred on 23/01/2018.'


#PDF document layout
table_style = TableStyle([('ALIGN',(1,1),(-2,-2),'RIGHT'),
                       ('TEXTCOLOR',(1,1),(-2,-2),colors.red),
                       ('VALIGN',(0,0),(0,-1),'TOP'),
                       ('TEXTCOLOR',(0,0),(0,-1),colors.blue),
                       ('ALIGN',(0,-1),(-1,-1),'CENTER'),
                       ('VALIGN',(0,-1),(-1,-1),'MIDDLE'),
                       ('TEXTCOLOR',(0,-1),(-1,-1),colors.green),
                       ('INNERGRID', (0,0), (-1,-1), 0.25, colors.black),
                       ('BOX', (0,0), (-1,-1), 0.25, colors.black),
                       ])
styles = getSampleStyleSheet()
styleNormal = styles['Normal']
styleHeading = styles['Heading1']
styleHeading2 = styles['Heading2']
styleHeading.alignment = 1 # centre text (TA_CENTRE)

#Configure style and word wrap
s = getSampleStyleSheet()
s = s["BodyText"]
s.wordWrap = 'CJK'

# File that must be written to report
with open('H:\College Fourth Year\Development Project\Final Year Project 2018\ExtractedEvidence\smsInfo.csv', "r") as csvfile:
    reader = csv.reader(csvfile)
    lista = list(reader)

headers = lista[0]

conteo = 1

for numRecord in range(1,len(lista)):

    record1 = lista[numRecord]

    data = list()
    emptyRecords = list()
    records = list()
    header = list()

    countRecords = 0

    for line in record1:

        if line == '':
            emptyRecords.append(line)
        else:
            records.append(line)
            header.append(headers[countRecords])

            data.append([str(headers[countRecords]), str(line)])

        countRecords = countRecords + 1

    data2 = [[Paragraph(cell, s) for cell in row] for row in data]
    t = Table(data2)
    t.setStyle(table_style)

    elements = []

    # Name of file
    fileName = cwd + '\\' + 'Forensic Reports\\SMS Data Report' + '.pdf'

    conteo = conteo + 1

    archivo_pdf = SimpleDocTemplate(fileName, pagesize = letter, rightMargin = 40, leftMargin = 40, topMargin = 40, bottomMargin = 28)

    #Send the data and build the file
    elements.append(Paragraph(line1, styleNormal))
    elements.append(Paragraph(line2, styleNormal))
    elements.append(Paragraph(line3, styleNormal))
    elements.append(Spacer(inch, .25*inch))
    elements.append(Paragraph(line4, styleNormal))
    elements.append(Paragraph(line5, styleNormal))
    elements.append(Paragraph(line6, styleNormal))
    elements.append(Spacer(inch, .25*inch))
    elements.append(t)

    archivo_pdf.build(elements)
    print ('SMS Data Forensic Report Generated!')

1
从最开始开始。如果您使用print打印数据,所有行是否都出现在控制台上? - Jongware
不是打印到控制台,而是打印到PDF。 - GreenCoder90
我会打印到控制台以检查输出。 - GreenCoder90
它只是打印了五次:“SMS数据取证报告已生成![]”。 - GreenCoder90
当我执行“print(data)”时,它会将所有所需的信息打印到控制台。 - GreenCoder90
1个回答

13

当前脚本似乎会在CSV的每一行上覆盖相同的PDF文件。下面显示的修改后的脚本生成了如电子表格中所示的表格。我进行了一些重大改变,因此您需要修改文件路径、格式、内容等以适应您的应用程序。新代码还包括列宽,并删除了一些似乎不必要的字符。

生成的PDF示例

PDF Example

用于测试的CSV数据:

['ID','Incoming Number','Date & Time','Read','Sent/Replied','Body','Seen']
(1,'555-555-5555','23-01-2018 17:03:52',1,1,'Where are you at? Are you on your way yet?',1)
(2,'555-555-5555','23-01-2018 17:04:08',1,2,'Yes I am driving at the moment',1)
(3,'555-555-5555','23-01-2018 17:04:34',1,1,'Be here soon or I'm going to call you',1)
(4,'555-555-5555','23-01-2018 17:05:10',1,2,'Yes I will try and pick up the speed and make up time. I'm breaking the speed limit already.',1)
(5,'555-555-5555','23-01-2018 17:05:46',1,1,'Ok',1)

修改后的Python脚本

表格样式代码比必要的更长,以帮助演示单元格范围的工作方式。

import csv
import datetime
from reportlab.lib.units import cm, inch
from reportlab.lib import colors
from reportlab.lib.pagesizes import letter
from reportlab.platypus import *
from reportlab.lib.styles import getSampleStyleSheet

# Data from CSV
with open('smsInfo.csv', "r") as csvfile:
    data = list(csv.reader(csvfile))

elements = []

# PDF Text
# PDF Text - Styles
styles = getSampleStyleSheet()
styleNormal = styles['Normal']

# PDF Text - Content
line1 = 'LYIT MOBILE FORENSICS DIVISION'
line2 = 'Date: {}'.format(datetime.datetime.now().strftime("%d-%m-%y"))
line3 = 'Case Number: 10'
line4 = 'This forensic report on sms card data has been compiled by the forensic'
line5 = 'examiner in conclusion to the investigation into the RTA'
line6 = 'case which occurred on 23/01/2018.'

elements.append(Paragraph(line1, styleNormal))
elements.append(Paragraph(line2, styleNormal))
elements.append(Paragraph(line3, styleNormal))
elements.append(Spacer(inch, .25 * inch))
elements.append(Paragraph(line4, styleNormal))
elements.append(Paragraph(line5, styleNormal))
elements.append(Paragraph(line6, styleNormal))
elements.append(Spacer(inch, .25 * inch))

# PDF Table
# PDF Table - Styles
# [(start_column, start_row), (end_column, end_row)]
all_cells = [(0, 0), (-1, -1)]
header = [(0, 0), (-1, 0)]
column0 = [(0, 0), (0, -1)]
column1 = [(1, 0), (1, -1)]
column2 = [(2, 0), (2, -1)]
column3 = [(3, 0), (3, -1)]
column4 = [(4, 0), (4, -1)]
column5 = [(5, 0), (5, -1)]
column6 = [(6, 0), (6, -1)]
table_style = TableStyle([
    ('VALIGN', all_cells[0], all_cells[1], 'TOP'),
    ('LINEBELOW', header[0], header[1], 1, colors.black),
    ('ALIGN', column0[0], column0[1], 'LEFT'),
    ('ALIGN', column1[0], column1[1], 'LEFT'),
    ('ALIGN', column2[0], column2[1], 'LEFT'),
    ('ALIGN', column3[0], column3[1], 'RIGHT'),
    ('ALIGN', column4[0], column4[1], 'RIGHT'),
    ('ALIGN', column5[0], column5[1], 'LEFT'),
    ('ALIGN', column6[0], column6[1], 'RIGHT'),
])

# PDF Table - Column Widths
colWidths = [
    0.7 * cm,  # Column 0
    3.1 * cm,  # Column 1
    3.7 * cm,  # Column 2
    1.2 * cm,  # Column 3
    2.5 * cm,  # Column 4
    6 * cm,  # Column 5
    1.1 * cm,  # Column 6
]

# PDF Table - Strip '[]() and add word wrap to column 5
for index, row in enumerate(data):
    for col, val in enumerate(row):
        if col != 5 or index == 0:
            data[index][col] = val.strip("'[]()")
        else:
            data[index][col] = Paragraph(val, styles['Normal'])

# Add table to elements
t = Table(data, colWidths=colWidths)
t.setStyle(table_style)
elements.append(t)

# Generate PDF
archivo_pdf = SimpleDocTemplate(
    'SMS Data Report.pdf',
    pagesize=letter,
    rightMargin=40,
    leftMargin=40,
    topMargin=40,
    bottomMargin=28)
archivo_pdf.build(elements)
print('SMS Data Forensic Report Generated!')

谢谢。我今天会尝试这个。 - GreenCoder90
它已经在工作,但我想整个表格周围都有网格线。我尝试使用"('INNERGRID', (0,0), (-1,-1), 1, colors.black),",但是表格中的数据似乎超出了网格线。是否有任何办法使网格线根据单元格中的内容移动? - GreenCoder90
我已经解决了。我只是调整了列宽以适应文本的宽度。我的问题现在得到了回答。谢谢。我现在会接受你的问题。 - GreenCoder90
@GreenCoder90 这个示例也已经被更正为适当的宽度。 - Adam Moller
谢谢。我现在明白了布局。 - GreenCoder90

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接