How to parse an HTML table with Python and BeautifulSoup and write it to a CSV file

Question


I am trying to parse an HTML page, extract the currency values, and write them to a CSV file. I have the following code:

#!/usr/bin/env python

import urllib2
from BeautifulSoup import BeautifulSoup

contenturl = "http://www.bank.gov.ua/control/en/curmetal/detail/currency?period=daily"
soup = BeautifulSoup(urllib2.urlopen(contenturl).read())

table = soup.find('div', attrs={'class': 'content'})

rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True) + ';'
        print text,
    print

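The problem is that I don't know how to retrieve only the currency values. I tried some regular expressions such as "^[0-9]{3}" (starts with three digits), but that didn't work.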
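For illustration, an anchored form of that pattern does match three-digit codes when it is applied to each cell's text separately, after stripping whitespace. This is a minimal Python 3 sketch (the question's code is Python 2), assuming the cell texts have already been collected into a list; `currency_codes` and the sample values are hypothetical, not taken from the real page.

```python
import re

# Anchored on both ends: the whole cell must be exactly three digits.
CODE_RE = re.compile(r"^[0-9]{3}$")


def currency_codes(cell_texts):
    """Return the stripped cell texts that consist of exactly three digits."""
    return [t.strip() for t in cell_texts if CODE_RE.match(t.strip())]


# Hypothetical texts of one table row, as the question's loop would collect them.
cells = ["036", "AUD", "100", "Australian Dollar", "1234.5678"]
print(currency_codes(cells))  # → ['036', '100']
```

Note that the units value `100` matches as well, so a regex alone cannot distinguish the code column from other numeric columns; selecting cells by their position or class is more reliable.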

- user2140323

Why are you using BeautifulSoup 3 instead of 4? It is not really related to your question, but bs4 offers better functionality in some areas. - Martijn Pieters

Are you trying to get the values in the "Official exchange rate" column? - jurgenreza

1 answer

Content provided by Stack Overflow.

- Martijn Pieters · Accepted Answer

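It is better to select specific cells in the table. The td cells with the class cell_c contain the data you are interested in, and the last cell is always the currency exchange rate: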

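rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    # Header rows may contain only <th> cells, and not every <td> has a class,
    # so check both before inspecting the first cell's class.
    if cols and 'cell_c' in cols[0].get('class', ''):
        # currency row: the last cell is always the exchange rate
        digital_code, letter_code, units, name, rate = [c.text for c in cols]
        print digital_code, letter_code, units, name, rate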
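The code above targets Python 2 and BeautifulSoup 3. As a rough Python 3 equivalent of the same row-selection rule (keep a row only if its first td has the class cell_c, then unpack its cells in order), here is a sketch that swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`, so it runs without third-party packages. The `CurrencyRowParser` class, the sample markup, and its values are hypothetical; the real page's markup may differ, and the sketch assumes well-formed `<tr>`/`<td>` tags.

```python
from html.parser import HTMLParser


class CurrencyRowParser(HTMLParser):
    """Collect cell texts of every <tr> whose first <td> has class "cell_c"."""

    def __init__(self):
        super().__init__()
        self.rows = []               # completed currency rows (lists of cell texts)
        self._cells = None           # cells of the current <tr>, None outside a row
        self._first_classes = None   # class names of the current row's first <td>
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
            self._first_classes = None
        elif tag == "td" and self._cells is not None:
            if not self._cells:
                # Only the first cell's classes decide whether the row is kept.
                self._first_classes = (dict(attrs).get("class") or "").split()
            self._cells.append("")
            self._in_td = True

    def handle_data(self, data):
        if self._in_td:
            self._cells[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._cells is not None:
            if self._first_classes and "cell_c" in self._first_classes:
                self.rows.append([c.strip() for c in self._cells])
            self._cells = None


# Hypothetical markup with made-up values; the header row has only <th> cells.
SAMPLE_HTML = """
<table>
  <tr><th>Code</th><th>Letter</th><th>Units</th><th>Name</th><th>Rate</th></tr>
  <tr><td class="cell_c">036</td><td class="cell_c">AUD</td><td class="cell_c">100</td>
      <td class="cell">Australian Dollar</td><td class="cell_c">1234.5678</td></tr>
</table>
"""

parser = CurrencyRowParser()
parser.feed(SAMPLE_HTML)
parser.close()
for digital_code, letter_code, units, name, rate in parser.rows:
    print(digital_code, letter_code, units, name, rate)
```

With bs4 installed, the same rule can instead be written with `find_all`; note that bs4 returns the class attribute as a list of names rather than a single string.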

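Once the data is stored in separate variables, you can convert the text to decimal numbers and store them in a database, or do whatever else you need with them.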
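The question's title asks for CSV output, which the answer leaves open. One possible sketch in Python 3, assuming rows shaped like the (digital_code, letter_code, units, name, rate) values unpacked in the loop above and assuming the rate text uses "." as its decimal point: it converts units with `int` and the rate with `Decimal` (which keeps the exact digits instead of rounding to a binary float), then writes the rows with the standard `csv` module. The function names, header names, file name, and sample row are hypothetical; the ";" delimiter only mirrors the separator printed in the question.

```python
import csv
import io
from decimal import Decimal

HEADER = ["digital_code", "letter_code", "units", "name", "rate"]


def write_rates_csv(rows, f):
    """Write (digital_code, letter_code, units, name, rate) rows to the text file f."""
    # ';' mirrors the question's output; the csv module's default is ','.
    writer = csv.writer(f, delimiter=";")
    writer.writerow(HEADER)
    for digital_code, letter_code, units, name, rate in rows:
        # int() and Decimal() raise on malformed text, which validates the columns.
        writer.writerow([digital_code, letter_code, int(units), name, Decimal(rate)])


def rates_csv_text(rows):
    """Return the same CSV as a string, convenient for quick inspection."""
    buf = io.StringIO()
    write_rates_csv(rows, buf)
    return buf.getvalue()


if __name__ == "__main__":
    sample = [("036", "AUD", "100", "Australian Dollar", "1234.5678")]  # made-up values
    # The csv docs recommend newline="" for files handed to csv.writer.
    with open("rates.csv", "w", newline="", encoding="utf-8") as f:
        write_rates_csv(sample, f)
```

Passing an open file object instead of a path leaves the choice of destination, encoding, and write mode to the caller.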