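I'm very new to Python, but I'd like to build a web scraper that extracts data from an HTML table on a web page and writes it to a CSV file in the same layout.
Here is a sample of the HTML table (it is very large, so I'm only including a few rows).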
<div class="col-xs-12 tab-content">
<div id="historical-data" class="tab-pane active">
<div class="tab-header">
<h2 class="pull-left bottom-margin-2x">Historical data for Bitcoin</h2>
<div class="clear"></div>
<div class="row">
<div class="col-md-12">
<div class="pull-left">
<small>Currency in USD</small>
</div>
<div id="reportrange" class="pull-right">
<i class="glyphicon glyphicon-calendar fa fa-calendar"></i>
<span>Aug 16, 2017 - Sep 15, 2017</span> <b class="caret"></b>
</div>
</div>
</div>
<table class="table">
<thead>
<tr>
<th class="text-left">Date</th>
<th class="text-right">Open</th>
<th class="text-right">High</th>
<th class="text-right">Low</th>
<th class="text-right">Close</th>
<th class="text-right">Volume</th>
<th class="text-right">Market Cap</th>
</tr>
</thead>
<tbody>
<tr class="text-right">
<td class="text-left">Sep 14, 2017</td>
<td>3875.37</td>
<td>3920.60</td>
<td>3153.86</td>
<td>3154.95</td>
<td>2,716,310,000</td>
<td>64,191,600,000</td>
</tr>
<tr class="text-right">
<td class="text-left">Sep 13, 2017</td>
<td>4131.98</td>
<td>4131.98</td>
<td>3789.92</td>
<td>3882.59</td>
<td>2,219,410,000</td>
<td>68,432,200,000</td>
</tr>
<tr class="text-right">
<td class="text-left">Sep 12, 2017</td>
<td>4168.88</td>
<td>4344.65</td>
<td>4085.22</td>
<td>4130.81</td>
<td>1,864,530,000</td>
<td>69,033,400,000</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
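Specifically, I want to recreate the table using the same column headers: "Date", "Open", "High", "Low", "Close", "Volume", "Market Cap". So far I've managed to write a simple script that goes to the URL, downloads the HTML, parses it with BeautifulSoup, and then uses 'for' loops to get the td elements. Here is a sample of my code (URL omitted) and the result: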
from bs4 import BeautifulSoup
import requests
import pandas as pd
import csv

url = "enterURLhere"  # real URL omitted
page = requests.get(url)
pagetext = page.text

# Meant to hold one list per column; nothing fills it in yet
pricetable = {
    "Date": [],
    "Open": [],
    "High": [],
    "Low": [],
    "Close": [],
    "Volume": [],
    "Market Cap": []
}

soup = BeautifulSoup(pagetext, 'html.parser')

# The output file is opened, but nothing is written to it yet
file = open("test.csv", 'w')

# Print the text of every table cell, one value per line
for row in soup.find_all('tr'):
    for col in row.find_all('td'):
        print(col.text)

file.close()
Can anyone give me some pointers on how to reshape the scraped data back into a table? Thanks.
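
One possible direction, as a minimal sketch rather than a definitive solution: collect the cells of each `tr` into a list, so that every table row becomes one CSV row, and hand the lists to `csv.writer`. The sketch assumes the page has the same structure as the sample above (a single `table` with class `table`, headers in `thead`, data in `tbody`). `SAMPLE_HTML`, `table_to_rows`, and `rows_to_csv` are names made up for illustration, and the HTML is inlined so the example runs without a network request.

```python
import csv
import io

from bs4 import BeautifulSoup

# Trimmed copy of the sample table above (assumption: the live page matches it).
SAMPLE_HTML = """
<table class="table">
<thead>
<tr>
<th class="text-left">Date</th>
<th class="text-right">Open</th>
<th class="text-right">High</th>
<th class="text-right">Low</th>
<th class="text-right">Close</th>
<th class="text-right">Volume</th>
<th class="text-right">Market Cap</th>
</tr>
</thead>
<tbody>
<tr class="text-right">
<td class="text-left">Sep 14, 2017</td>
<td>3875.37</td>
<td>3920.60</td>
<td>3153.86</td>
<td>3154.95</td>
<td>2,716,310,000</td>
<td>64,191,600,000</td>
</tr>
<tr class="text-right">
<td class="text-left">Sep 13, 2017</td>
<td>4131.98</td>
<td>4131.98</td>
<td>3789.92</td>
<td>3882.59</td>
<td>2,219,410,000</td>
<td>68,432,200,000</td>
</tr>
</tbody>
</table>
"""


def table_to_rows(html):
    """Return the header row followed by each data row, as lists of strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="table")
    # Header cells live in <thead>; one list entry per <th>.
    header = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]
    rows = [header]
    # Each <tr> in <tbody> becomes one list of cell strings.
    for tr in table.find("tbody").find_all("tr"):
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])
    return rows


def rows_to_csv(rows, out):
    """Write the rows to any file-like object using the csv module."""
    csv.writer(out).writerows(rows)


rows = table_to_rows(SAMPLE_HTML)
buffer = io.StringIO()
rows_to_csv(rows, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, pass a file opened with `open("test.csv", "w", newline="")` to `rows_to_csv`. Values that contain commas, such as the dates and the volume figures, are quoted by `csv.writer` automatically, so the columns stay aligned when the file is read back.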