How to parse a table located inside a div

3
                                <div id="findet_1" name="findet_1" >

                                    <table width="100%" border="0" cellspacing="0" cellpadding="0">

                                        <tr>

                                            <td class="thc01 w160 gL_10 UC" >&nbsp;Standalone</td>

                                            <td class="thc01 w160 gL_10 tar">Jun'16</td>

                                            <td class="thc01 w160 gL_10 tar">Mar'16</td>

                                            <td class="thc01 w160 gL_10 tar">Dec'15</td>

                                            <td class="thc01 w160 gL_10 tar"><div class="PR20">Sep'15</div></td>

                                        </tr>

                                        <tr>

                                            <td class="thc02 w160 gD_12" >Net Sales</td>

                                            <td class="thc02 w160 gD_12 tar">16,339.70</td>

                                            <td class="thc02 w160 gD_12 tar">15,589.40</td>

                                            <td class="thc02 w160 gD_12 tar">15,065.00</td>

                                            <td class="thc02 w160 gD_12 tar"><span class="PR20">14,824.50</span></td>

                                        </tr>

                                        <tr>

                                            <td class="thc02 w160 gD_12" >Other Income</td>

                                            <td class="thc02 w160 gD_12 tar">50.10</td>

                                            <td class="thc02 w160 gD_12 tar">46.30</td>

                                            <td class="thc02 w160 gD_12 tar">153.30</td>

                                            <td class="thc02 w160 gD_12 tar"><span class="PR20">1,087.40</span></td>

                                        </tr>

                                        <tr>

                                            <td class="thc02 w160 gD_12" >PBDIT</td>

                                            <td class="thc02 w160 gD_12 tar">6,612.30</td>

                                            <td class="thc02 w160 gD_12 tar">5,930.60</td>

                                            <td class="thc02 w160 gD_12 tar">5,543.30</td>

                                            <td class="thc02 w160 gD_12 tar"><span class="PR20">5,416.80</span></td>

                                        </tr>

                                        <tr>

                                            <td class="thc02 w160 gD_12" >Net Profit</td>

                                            <td class="thc02 w160 gD_12 tar">1,427.50</td>

                                            <td class="thc02 w160 gD_12 tar">1,693.90</td>

                                            <td class="thc02 w160 gD_12 tar">1,709.10</td>

                                            <td class="thc02 w160 gD_12 tar"><span class="PR20">2,223.70</span></td>

                                        </tr>

                                    </table>

                                </div>

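I am trying to read this table but cannot. I use BeautifulSoup's findAll to look up the div first; the table sits inside that div, yet I cannot get hold of it. My second question is what the best way to iterate over the rows is. For example, I want CSV output with every field enclosed in double quotes, like this:

"STANDALONE","Jun'16","Mar'16","Dec'15","Sep'15"
"Net Sales","16,339.70","15,589.40","15,065.00","14,824.50"
"Other Income","50.10","46.30","153.30","1,087.40"
"PBDIT","6,612.30","5,930.60","5,543.30","5,416.80"
"Net Profit","1,427.50","1,693.90","1,709.10","2,223.70"

My code: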
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
html = urlopen("http://www.moneycontrol.com/india/stockpricequote/computers-software/tataconsultancyservices/TCS")
bsObj = BeautifulSoup(html, "html.parser")
link = bsObj.findAll("div", id="findet_1")
table1 = link.find('table').find_all('tr')

I know we can get the values with get_text and iterate over the rows with a for loop, but I cannot find the table itself :(

2 Answers

3

Try this:

# Search the parsed BeautifulSoup object (bsObj), not the raw urlopen response (html)
table_div = bsObj.find('div', {'id': 'findet_1', 'name': 'findet_1'})
table = table_div.find('table')

Or this, to get the rows directly:

# Search the parsed BeautifulSoup object (bsObj), not the raw urlopen response (html)
table_div = bsObj.find('div', {'id': 'findet_1', 'name': 'findet_1'})
rows = table_div.find_all('tr')

Thanks, this works. How can I store each row in a text file? - Bhavesh Ghodasara
Use the csv module. - arcegk
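
As a minimal sketch of that suggestion: the rows below are hard-coded from the table in the question, standing in for the cell text you would collect from each tr. The helper name rows_to_csv is hypothetical. csv.QUOTE_ALL makes the writer wrap every field in double quotes, which matches the requested output format.

```python
import csv
import io

# Stand-in data: one list of cell strings per table row, copied from the question.
rows = [
    ["Standalone", "Jun'16", "Mar'16", "Dec'15", "Sep'15"],
    ["Net Sales", "16,339.70", "15,589.40", "15,065.00", "14,824.50"],
    ["Other Income", "50.10", "46.30", "153.30", "1,087.40"],
]


def rows_to_csv(rows):
    """Hypothetical helper: return the rows as CSV text with every field quoted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows)
print(csv_text, end="")
```

To write to disk instead, pass a file opened with open(path, "w", newline="") to csv.writer in place of the StringIO buffer. The question's sample shows the first header upper-cased; applying .upper() to that cell would reproduce it.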

-1
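The only difference is that find_all() returns a list of matches (here a list containing a single result), while find() returns the result itself. That is why calling .find('table') directly on the findAll() result fails in the question's code.
If find_all() finds nothing, it returns an empty list. If find() finds nothing, it returns None: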
link = bsObj.findAll("div", id="findet_1")
if link:
    table1 = link[0].find('table').find_all('tr')
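
The difference can be checked on a small inline document. This is only a sketch: the HTML is a trimmed-down stand-in for the page in the question, and the id no_such_id is made up to show the not-found case.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the markup in the question.
html_doc = """
<div id="findet_1" name="findet_1">
  <table>
    <tr><td>&nbsp;Standalone</td><td>Jun'16</td><td><div class="PR20">Sep'15</div></td></tr>
    <tr><td>Net Sales</td><td>16,339.70</td><td><span class="PR20">14,824.50</span></td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

single = soup.find("div", id="findet_1")             # one Tag, or None
many = soup.find_all("div", id="findet_1")           # list of Tags, possibly empty
missing_one = soup.find("div", id="no_such_id")      # made-up id: None
missing_all = soup.find_all("div", id="no_such_id")  # made-up id: empty list

# Walk the rows and collect the text of each cell.
rows = []
if single is not None:
    for tr in single.find("table").find_all("tr"):
        # strip=True trims surrounding whitespace, including the &nbsp; in the first cell
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])

print(rows)
```

Guarding with "is not None" for find(), or with a truthiness check for find_all(), avoids an AttributeError when the div is absent from the page.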
