Get only the children of type Tag (not NavigableString) from a tag

14

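The Beautiful Soup documentation provides the attributes .contents and .children for accessing the children of a given tag (a list and an iterable, respectively). Both include NavigableStrings as well as Tags. I want only the children of type Tag.

I am currently doing this with a list comprehension: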

rows=[x for x in table.tbody.children if type(x)==bs4.element.Tag]

But I wonder whether there is a better, more Pythonic, or built-in way to get only the Tag children.


6
table.tbody.find_all(True, recursive=False) (I haven't tried it) - jfs
You could post your own answer: include a link to the docs, a runnable code example, and the input/output. - jfs
1 Answer

17

Thanks to J.F.Sebastian, the following works:

rows=table.tbody.find_all(True, recursive=False)

The docs are here: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#true
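As a minimal sketch of the difference, the snippet below compares .children with find_all(True, recursive=False) on a tiny table. The HTML string and the variable names are made up for illustration, and it assumes the built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical miniature input; the whitespace between the rows becomes
# NavigableString children of <tbody>.
html = """
<table>
  <tbody>
    <tr><td>Dem.</td></tr>
    <tr><td>Rep.</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
tbody = soup.table.tbody

# .children yields both Tags and the whitespace strings between them.
all_children = list(tbody.children)

# True matches every tag; recursive=False limits the search to direct children.
tag_children = tbody.find_all(True, recursive=False)

# The list comprehension from the question, written with isinstance.
by_comprehension = [x for x in tbody.children if isinstance(x, Tag)]

print([t.name for t in tag_children])  # ['tr', 'tr']
```

Both approaches return the same two tr elements; find_all just does the filtering for you.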

In my case I needed the actual rows of the table, so I ended up using the following, which is more precise and more readable:

rows=table.tbody.find_all('tr')

Docs again: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#navigating-using-tag-names

I think this is better than iterating over all the children of a Tag.

Here is the input used:
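One caveat worth sketching: find_all('tr') searches all descendants by default, so it also returns rows of any table nested inside a cell. The input below has no nested tables, so this does not matter here. The nested-table HTML in this example is hypothetical and only shows how recursive=False can be combined with a tag name.

```python
from bs4 import BeautifulSoup

# Hypothetical input with a table nested inside the first row.
html = """
<table id="outer">
  <tbody>
    <tr class="a"><td>
      <table><tbody><tr class="nested"><td>inner</td></tr></tbody></table>
    </td></tr>
    <tr class="last-row"><td>outer</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
tbody = soup.table.tbody  # first <table>, then its <tbody>

# Default search descends into the nested table as well.
all_rows = tbody.find_all("tr")

# recursive=False keeps only the <tr> elements directly under this <tbody>.
direct_rows = tbody.find_all("tr", recursive=False)

print(len(all_rows), len(direct_rows))  # 3 2
```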

<table cellspacing="0" cellpadding="0">
  <thead>
    <tr class="title-row">
      <th class="title" colspan="100">
        <div style="position:relative;">
          President
            <span class="pct-rpt">
                99% reporting
            </span>
        </div>
      </th>
    </tr>
    <tr class="header-row">
        <th class="photo first">

        </th>
        <th class="candidate ">
          Candidate
        </th>
        <th class="party ">
          Party
        </th>
        <th class="votes ">
          Votes
        </th>
        <th class="pct ">
          Pct.
        </th>
        <th class="change ">
          Change from &lsquo;08
        </th>
        <th class="evotes last">
          Electoral Votes
        </th>
    </tr>
  </thead>
  <tbody>
      <tr class="">
          <td class="photo first">
            <div class="photo_wrap"><img alt="P-barack-obama" height="48" src="http://i1.nyt.com/projects/assets/election_2012/images/candidate_photos/election_night/p-barack-obama.jpg?1352320690" width="68" /></div>
          </td>
          <td class="candidate ">
            <div class="winner dem"><img alt="Hp-checkmark@2x" height="9" src="http://i1.nyt.com/projects/assets/election_2012/images/swatches/hp-checkmark@2x.png?1352320690" width="10" />Barack Obama</div>
          </td>
          <td class="party ">
            Dem.
          </td>
          <td class="votes ">
            2,916,811
          </td>
          <td class="pct ">
            57.3%
          </td>
          <td class="change ">
            -4.6%
          </td>
          <td class="evotes last">
            20
          </td>
      </tr>
      <tr class="">
          <td class="photo first">

          </td>
          <td class="candidate ">
            <div class="not-winner">Mitt Romney</div>
          </td>
          <td class="party ">
            Rep.
          </td>
          <td class="votes ">
            2,090,116
          </td>
          <td class="pct ">
            41.1%
          </td>
          <td class="change ">
            +4.3%
          </td>
          <td class="evotes last">
            0
          </td>
      </tr>
      <tr class="">
          <td class="photo first">

          </td>
          <td class="candidate ">
            <div class="not-winner">Gary Johnson</div>
          </td>
          <td class="party ">
            Lib.
          </td>
          <td class="votes ">
            54,798
          </td>
          <td class="pct ">
            1.1%
          </td>
          <td class="change ">
            &ndash;
          </td>
          <td class="evotes last">
            0
          </td>
      </tr>
      <tr class="last-row">
          <td class="photo first">

          </td>
          <td class="candidate ">
            <div class="not-winner">Jill Stein</div>
          </td>
          <td class="party ">
            Green
          </td>
          <td class="votes ">
            29,336
          </td>
          <td class="pct ">
            0.6%
          </td>
          <td class="change ">
            &ndash;
          </td>
          <td class="evotes last">
            0
          </td>
      </tr>
      <tr>
        <td class="footer" colspan="100">
          <a href="/2012/results/president">President Map</a> &nbsp;|&nbsp;
          <a href="/2012/results/president/big-board">President Big Board</a>&nbsp;|&nbsp;
          <a href="/2012/results/president/exit-polls?state=il">Exit Polls</a>
        </td>
      </tr>
  </tbody>
</table>
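For illustration only, here is a sketch of how the rows might then be processed. It uses an abbreviated copy of the input above (two candidate rows plus the footer row); the results list and the choice to skip rows without candidate/votes cells are assumptions, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Abbreviated copy of the input above, keeping only the cells used here.
html = """
<table>
  <tbody>
    <tr class="">
      <td class="candidate "><div class="winner dem">Barack Obama</div></td>
      <td class="votes ">2,916,811</td>
    </tr>
    <tr class="">
      <td class="candidate "><div class="not-winner">Mitt Romney</div></td>
      <td class="votes ">2,090,116</td>
    </tr>
    <tr>
      <td class="footer" colspan="100"><a href="/2012/results/president">President Map</a></td>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

results = []
for row in soup.table.tbody.find_all("tr", recursive=False):
    candidate = row.find("td", class_="candidate")
    votes = row.find("td", class_="votes")
    if candidate is None or votes is None:
        continue  # the footer row has neither cell, so skip it
    name = candidate.get_text(strip=True)
    count = int(votes.get_text(strip=True).replace(",", ""))
    results.append((name, count))

print(results)  # [('Barack Obama', 2916811), ('Mitt Romney', 2090116)]
```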

