Unable to get the desired XPath element with BeautifulSoup

3

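I have just started learning web scraping and am using Python's BeautifulSoup library. As a test, I would like to extract some property data from a sample web page. Here is my code: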

import requests
from bs4 import BeautifulSoup as Soup

page = "http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/"
response = requests.get(page)
soup = Soup(response.text)

# Now I would like to get the sale price of the property.
# The element in the HTML DOM looks like this:
# <span class="" id="yui_3_18_1_1_1464168312477_3548">$12,895,000<span class="value-suffix"></span></span>
# The XPath of the element is //*[@id="yui_3_18_1_1_1464168312477_3548"]

# I wrote the following code:
value = soup.select('span#yui_3_18_1_1_1464168312477_3548')
print(value)

I don't get any results. What am I doing wrong?


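I opened the page in my browser, viewed the page source, and searched for "yui_3_18_1_1_1464168312477_3548", but found nothing. Are you sure the page has a span element with this id? - Muhammad Tahir
It is not in the source; it is generated dynamically. - Padraic Cunningham
OK, I'm not very experienced with web scraping; in fact, this is my first attempt. So my question is: if I want to get the sale price and the address of the property, how do I get that information? - Arefe
1 Answer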

3
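You are looking at the source in your browser console, which is not the same as the source you get back from requests. The span with id="yui_3_18_1_1_1464170172533_3087" is dynamically generated, so you would need something like Selenium.
Unfortunately, the id is also unique on every visit, so we cannot use it. But the parent div stays consistent, so we can use a CSS selector to get the first span inside the div with the classes main-row home-summary-row: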
In [4]: from selenium import webdriver
In [5]: dr = webdriver.PhantomJS()

In [6]: dr.get("http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/")
In [7]: span = dr.find_element_by_css_selector('div.main-row.home-summary-row span')
In [8]: print(span.text)
$12,895,000

I used PhantomJS for headless browsing; you can also use Firefox or Chrome if you prefer. All the details are in the link.
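On a current Selenium install the session above will probably not run as written: Selenium 4 dropped the PhantomJS driver and the find_element_by_* helpers. A rough equivalent with headless Chrome might look like the sketch below. The names PRICE_SELECTOR and fetch_price are mine, not part of the original answer, and the selector is only known to have matched the page at the time the answer was written.

```python
# Selector taken from the answer above; Zillow's markup may have changed since.
PRICE_SELECTOR = "div.main-row.home-summary-row span"


def fetch_price(url, selector=PRICE_SELECTOR):
    """Load url in headless Chrome and return the text of the first match.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so this sketch can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Raises NoSuchElementException if nothing matches the selector.
        return driver.find_element(By.CSS_SELECTOR, selector).text
    finally:
        # Always close the browser, even if the lookup fails.
        driver.quit()
```

Calling fetch_price(page) with the URL from the question needs Chrome installed locally, and the site may block or redirect automated browsers.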
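Actually, looking at the source again, we can do the same thing with bs4: the id is the only part that is dynamically generated, so if we ignore the id we can still get the price: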
In [26]: soup.select_one("div.main-row.home-summary-row span").text
Out[26]: u'$12,895,000'
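If you are not sure whether an id you saw in the browser is safe to select on, one quick check is to list the ids that are actually present in the HTML that requests returned. The sketch below uses only the standard library's html.parser rather than bs4, and static_html is a made-up stand-in for response.text (I am assuming the price span arrives without the yui id, as described above).

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every id attribute that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def ids_in_html(html):
    """Return the set of element ids present in the given HTML text."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return collector.ids


# Hypothetical stand-in for response.text: the price span is present,
# but without the id that the browser's JavaScript adds later.
static_html = """
<div class="main-row home-summary-row">
  <span class="">$12,895,000<span class="value-suffix"></span></span>
</div>
<div id="footer"></div>
"""

found = ids_in_html(static_html)
print("yui_3_18_1_1_1464168312477_3548" in found)  # id seen only in the browser
print("footer" in found)                           # id present in the static HTML
```

An id that shows up in the browser's inspector but not in this set was added by JavaScript after the page loaded, so a selector built on it will not match the requests response.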

A better approach is to use the meta tags, which carry a lot of information:

import requests
from bs4 import BeautifulSoup as Soup

page = "http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/"
response = requests.get(page)
soup = Soup(response.text,"lxml")
metas = soup.select("meta")

Now if we look at what metas contains:
from pprint import pprint as pp

pp(metas)

[<meta content="on" http-equiv="x-dns-prefetch-control"/>,
 <meta charset="unicode-escape"/>,
 <meta content="View 31 photos of this $12,895,000, 7 bed, 10.0 bath, 10500 sqft single family home located at 1630 Amalfi Dr, Pacific Palisades, CA 90272 built in 2015. MLS # 16-103696." name="description"/>,
 <meta content="Zillow, Inc." name="author"/>,
 <meta content="Copyright (c) 2006-2014 Zillow, Inc." name="Copyright"/>,
 <meta content="none" name="msapplication-config"/>,
 <meta content="ALL" name="ROBOTS"/>,
 <meta content="NOYDIR" name="ROBOTS"/>,
 <meta content="NOODP" name="ROBOTS"/>,
 <meta content="yes" name="apple-mobile-web-app-capable"/>,
 <meta content="black-translucent" name="apple-mobile-web-app-status-bar-style"/>,
 <meta content="telephone=no" name="format-detection"/>,
 <meta content="#3366b8" name="msapplication-TileColor"/>,
 <meta content="http://www.zillowstatic.com/static/images/logos/zillow-logo-win8-tile.png" name="msapplication-TileImage"/>,
 <meta content="/8Me6HBNZX/rt2n5/y1Lo3ZIrkcvkTBimqviTDiurR4=" name="verify-v1"/>,
 <meta content="7cb4abe457d82ae8" name="y_key"/>,
 <meta content="width=device-width, height=device-height, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0, user-scalable=no" name="viewport"/>,
 <meta content="Zillow Real Estate, Rentals, and Mortgage" itemprop="name"/>,
 <meta content="The most trafficked website about home sales and rentals, with real estate values for almost every U.S. home. 1,000,000 listings that you won't find on MLS." itemprop="description"/>,
 <meta content="http://www.zillowstatic.com/static/images/social/share_thumbnail.png" itemprop="image"/>,
 <meta content="691f1bfccade71b5-c065751219a379dd-g64cedb67f5ea020a-a" name="google-translate-customization"/>,
 <meta content="202692,878610170,662000799,100001769907023,10716009,769244502,10716649,503322863" property="fb:admins"/>,
 <meta content="172285552816089" property="fb:app_id"/>,
 <meta content="zillow_fb:home" property="og:type"/>,
 <meta content="1630 Amalfi Dr, Pacific Palisades, CA 90272" property="og:zillow_fb:address"/>,
 <meta content="7" property="zillow_fb:beds"/>,
 <meta content="10" property="zillow_fb:baths"/>,
 <meta content='For sale: $12,895,000. Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, &amp; floating staircase create a grand entrance w/ glass wine cellar, formal living &amp; dining rooms. Floor plan flows openly between gourmet kitchen, family room, &amp; patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, &amp; master suite add warmth to the contemporary feel, &amp; detailed wood paneling &amp; coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views &amp; private patio. Lower level feats. Old Hollywood style theater w/130" screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, &amp; elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, &amp; saltwater pool/spa complete this elegant estate.' property="zillow_fb:description"/>,
 <meta content="http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/" property="og:url"/>,
 <meta content="Pacific Palisades Home For Sale" property="og:title"/>,
 <meta content="http://photos2.zillowstatic.com/p_d/IS5ypcj39edbdc1000000000.jpg" property="og:image"/>,
 <meta content='For sale: $12,895,000. Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, &amp; floating staircase create a grand entrance w/ glass wine cellar, formal living &amp; dining rooms. Floor plan flows openly between gourmet kitchen, family room, &amp; patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, &amp; master suite add warmth to the contemporary feel, &amp; detailed wood paneling &amp; coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views &amp; private patio. Lower level feats. Old Hollywood style theater w/130" screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, &amp; elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, &amp; saltwater pool/spa complete this elegant estate.' property="og:description"/>,
 <meta content="https://videos.zillowstatic.com/production/07a58eebcafbfe833b92f17945131f2e251b5fe5/mp4_600k_landscape_z1/mp4_600k_landscape_z1.mp4" property="og:video"/>,
 <meta content="https://videos.zillowstatic.com/production/07a58eebcafbfe833b92f17945131f2e251b5fe5/mp4_600k_landscape_z1/mp4_600k_landscape_z1.mp4" property="og:video:secure_url"/>,
 <meta content="640" property="og:video:width"/>,
 <meta content="video/mp4" property="og:video:type"/>,
 <meta content="360" property="og:video:height"/>,
 <meta content="238648973530.apps.googleusercontent.com" name="google-signin-clientid"/>,
 <meta content="https://www.googleapis.com/auth/plus.login https://www.googleapis.com/auth/plus.profile.emails.read" name="google-signin-scope"/>,
 <meta content="http://zillow.com" name="google-signin-cookiepolicy"/>,
 <meta content="summary_large_image" name="twitter:card"/>,
 <meta content="@Zillow" name="twitter:site"/>,
 <meta content="@Zillow" name="twitter:creator"/>,
 <meta content="1630 Amalfi Dr" name="twitter:title"/>,
 <meta content="Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, &amp;amp; floating staircase create a grand entrance w/ glass wine cellar, formal living &amp;amp; dining rooms. Floor plan flows openly between gourmet kitchen, family room, &amp;amp; patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, &amp;amp; master suite add warmth to the contemporary feel, &amp;amp; detailed wood paneling &amp;amp; coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views &amp;amp; private patio. Lower level feats. Old Hollywood style theater w/130&amp;quot; screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, &amp;amp; elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, &amp;amp; saltwater pool/spa complete this elegant estate." name="twitter:description"/>,
 <meta content="http://photos2.zillowstatic.com/p_d/IS5ypcj39edbdc1000000000.jpg" name="twitter:image"/>,
 <meta content="1630 Amalfi Dr, Pacific Palisades, CA 90272" itemprop="name"/>,
 <meta content="USD" itemprop="priceCurrency"/>,
 <meta content="$12,895,000" itemprop="price"/>,
 <meta content="34.060605" itemprop="latitude"/>,
 <meta content="-118.501625" itemprop="longitude"/>]
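The same listing can also be folded into a dictionary keyed by the name, property, or itemprop attribute, which is handy if you want several fields at once. This is a dependency-free sketch using html.parser from the standard library instead of bs4. MetaCollector, collect_metas, and sample_html are illustrative names of mine, and sample_html is just a few tags copied from the output above.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Group <meta> content values by their name, property, or itemprop key."""

    KEY_ATTRS = ("name", "property", "itemprop")

    def __init__(self):
        super().__init__()
        self.metas = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        content = attributes.get("content")
        if content is None:
            return
        for key_attr in self.KEY_ATTRS:
            key = attributes.get(key_attr)
            if key:
                # A key can repeat (for example name="ROBOTS"), so keep a list.
                self.metas.setdefault(key, []).append(content)


def collect_metas(html):
    """Return a dict mapping each meta key to the list of its content values."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return collector.metas


# A few tags copied from the listing above, standing in for response.text.
sample_html = """
<head>
<meta content="ALL" name="ROBOTS"/>
<meta content="NOYDIR" name="ROBOTS"/>
<meta content="1630 Amalfi Dr, Pacific Palisades, CA 90272" property="og:zillow_fb:address"/>
<meta content="USD" itemprop="priceCurrency"/>
<meta content="$12,895,000" itemprop="price"/>
<meta content="34.060605" itemprop="latitude"/>
</head>
"""

metas_by_key = collect_metas(sample_html)
print(metas_by_key["price"][0])
print(metas_by_key["og:zillow_fb:address"][0])
print(metas_by_key["ROBOTS"])
```

Values are kept in lists because some keys occur more than once on the page; take the first element when you expect a single value.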

We can use the attributes to get the price and other information:
In [22]: soup = Soup(response.text,"lxml")

In [23]: soup.select_one("meta[itemprop=price]")["content"]
Out[23]: '$12,895,000'

In [24]: soup.select_one('meta[name="twitter:description"]')["content"]
Out[24]: 'Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, &amp; floating staircase create a grand entrance w/ glass wine cellar, formal living &amp; dining rooms. Floor plan flows openly between gourmet kitchen, family room, &amp; patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, &amp; master suite add warmth to the contemporary feel, &amp; detailed wood paneling &amp; coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views &amp; private patio. Lower level feats. Old Hollywood style theater w/130&quot; screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, &amp; elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, &amp; saltwater pool/spa complete this elegant estate.'
In [27]: soup.select_one("meta[itemprop=latitude]")["content"]
Out[27]: '34.060605'
In [28]: soup.select_one("meta[itemprop=longitude]")["content"]
Out[28]: '-118.501625'
In [29]: soup.select_one('meta[property="og:zillow_fb:address"]')["content"]
Out[29]: '1630 Amalfi Dr, Pacific Palisades, CA 90272'
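All of these values come back as strings. If you want to do arithmetic or sorting with them, they need converting first. A minimal sketch follows, assuming the price is formatted in US dollars with a leading "$" and comma separators, as in the output above. parse_price is a hypothetical helper, not something bs4 provides.

```python
from decimal import Decimal


def parse_price(text):
    """Turn a display price such as '$12,895,000' into a Decimal amount.

    Assumes a '$' prefix and ',' thousands separators.
    """
    cleaned = text.strip().replace("$", "").replace(",", "")
    return Decimal(cleaned)


# Values copied from the meta tags shown above.
price = parse_price("$12,895,000")
latitude = float("34.060605")
longitude = float("-118.501625")

print(price)
print(latitude, longitude)
```

Decimal is used for the price so that listings with cents do not pick up floating-point rounding; plain float is fine for the coordinates.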

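Could you write some sample code to get me started? Then I will accept your answer. I want to get the sale price and the address of the property. - Arefe
Yes, I am working on it. The id is unique every time the page is requested, so we need to find a workaround, which makes it more interesting. - Padraic Cunningham
This is really helpful, I greatly appreciate the answer. - Arefe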
