我已经添加了标题,逐行循环并将每行附加到 < tr > 和 < td > 标签上,它应该作为单个表格工作,无需使用这些标签(< tr >< /tr > 和 < td >< /td >[为了可读性而添加的空格])来处理 col1 和 col2。
日志:片段:
MUTHU PAGE
2019/08/19 19:59:25 MUTHUKUMAR_TIME_DATE,line: 118 INFO | Logger
object created for: MUTHUKUMAR_APP_USER_SIGNUP_LOG 2019/08/19 19:59:25
MUTHUKUMAR_DB_USER_SIGN_UP,line: 48 INFO | ***** User SIGNUP page
start ***** 2019/08/19 19:59:25 MUTHUKUMAR_DB_USER_SIGN_UP,line: 49
INFO | Enter first name: [只允许字母字符,最少3个字符,最多20个字符]
HTML 源页面:
'''
<?xml version="1.0" encoding="utf-8"?>
<body>
<table>
<p>
MUTHU PAGE
</p>
<tr>
<td>
2019/08/19 19:59:25 MUTHUKUMAR_TIME_DATE,line: 118 INFO | Logger object created for: MUTHUKUMAR_APP_USER_SIGNUP_LOG
</td>
</tr>
<tr>
<td>
2019/08/19 19:59:25 MUTHUKUMAR_DB_USER_SIGN_UP,line: 48 INFO | ***** User SIGNUP page start *****
</td>
</tr>
<tr>
<td>
2019/08/19 19:59:25 MUTHUKUMAR_DB_USER_SIGN_UP,line: 49 INFO | Enter first name: [Alphabet character only allowed, minimum 3 character to maximum 20 chracter]
'''
代码:
from bs4 import BeautifulSoup
soup = BeautifulSoup(features='xml')
body = soup.new_tag('body')
soup.insert(0, body)
table = soup.new_tag('table')
body.insert(0, table)
with open('C:\\Users\xxxxx\\Documents\\Latest_24_may_2019\\New_27_jun_2019\\DB\\log\\input.txt') as infile:
title_s = soup.new_tag('p')
title_s.string = " MUTHU PAGE "
table.insert(0, title_s)
for line in infile:
row = soup.new_tag('tr')
col1 = list(line.split('\n'))
col1 = [ each for each in col1 if each != '']
for coltext in col1:
col = soup.new_tag('td')
col.string = coltext
row.insert(0, col)
table.insert(len(table.contents), row)
with open('C:\\Users\xxxx\\Documents\\Latest_24_may_2019\\New_27_jun_2019\\DB\\log\\output.html', 'w') as outfile:
outfile.write(soup.prettify())