Triggering an onclick event using BeautifulSoup and Python

Question

javascript jquery python beautifulsoup pyqt4

8

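I'm trying to get the links to all accommodations in Cyprus from this website: http://www.zoover.nl/cyprus So far I can only retrieve the first 15 links, which are already displayed. So now I have to invoke a click on the "volgende" link. However, I don't know how to do that, and in the page source I can't track down the function being called, so I can't use an approach like the one posted here: Issues with invoking "on click event" on the html page using beautiful soup in Python I only need the step where the "click" happens, so that I can fetch the next 15 links, and so on.

Does anybody know how to help? Thanks in advance!

EDIT: My code now looks like this: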

def getZooverLinks(country):
    zooverWeb = "http://www.zoover.nl/"
    url = zooverWeb + country
    parsedZooverWeb = parseURL(url)
    driver = webdriver.Firefox()
    driver.get(url)

    button = driver.find_element_by_class_name("next")
    links = []
    for page in xrange(1,3):
        for item in parsedZooverWeb.find_all(attrs={'class': 'blue2'}):
            for link in item.find_all('a'):
                newLink = zooverWeb + link.get('href')
                links.append(newLink)
        button.click()

I get the following error:

selenium.common.exceptions.StaleElementReferenceException: Message: Element is no longer attached to the DOM
Stacktrace:
    at fxdriver.cache.getElementAt (resource://fxdriver/modules/web-element-cache.js:8956)
    at Utils.getElementAt (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:8546)
    at fxdriver.preconditions.visible (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:9585)
    at DelayedCommand.prototype.checkPreconditions_ (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:12257)
    at DelayedCommand.prototype.executeInternal_/h (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:12274)
    at DelayedCommand.prototype.executeInternal_ (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:12279)
    at DelayedCommand.prototype.execute/< (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:12221)

I'm confused :/

- steph

2 Answers

6

I tried the following code and was able to load the next page. Hope this helps you too. Code:

from selenium import webdriver
import os
chromedriver = r"C:\Users\pappuj\Downloads\chromedriver"  # raw string, so the backslashes are not read as escapes
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)
url='http://www.zoover.nl/cyprus'
driver.get(url)
driver.find_element_by_class_name('next').click()

Thanks

- user4901185

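How is this related to the original question? - JabberwockyDecompiler

To get the soup result only after the button click has completed, you can use the following code: soup_level2 = BeautifulSoup(driver.page_source, 'html.parser') - Amirkhm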

- Joost · Accepted Answer

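While it might be tempting to do this with Beautifulsoup's evaluateJavaScript method, in the end Beautifulsoup is a parser rather than an interactive web browsing client.

You should seriously consider solving this with selenium, as briefly shown in this answer. There are pretty good Python bindings available for selenium.

You could use selenium to find the element and click it, then pass the page on to Beautifulsoup, and use your existing code to fetch the links.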
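A rough sketch of that loop, in case it helps. It assumes the button can still be found by the class name "next" and that the links live under elements with class "blue2", both taken from the code in the question. The names collect_links, zoover_links, after_click and run_against_zoover are made up for this illustration. The point is that the button is looked up again on every pass and the soup is built from driver.page_source, rather than reusing a button reference or a page parsed before the click.

```python
def collect_links(driver, extract_links, pages, next_class="next", after_click=None):
    """Gather links from `pages` consecutive result pages.

    driver        -- a Selenium WebDriver, or any object offering the same
                     `page_source` attribute and `find_element` method
    extract_links -- callable that takes an HTML string and returns a list
    after_click   -- optional callable(driver), e.g. a wait for the new results
    """
    links = []
    for page in range(pages):
        # Parse what the browser shows right now, not a copy fetched earlier.
        links.extend(extract_links(driver.page_source))
        if page < pages - 1:
            # Re-locate the button each time: a reference obtained before the
            # page changed is what raises StaleElementReferenceException.
            # "class name" is the string value of selenium's By.CLASS_NAME.
            driver.find_element("class name", next_class).click()
            if after_click is not None:
                after_click(driver)
    return links


def zoover_links(html, base="http://www.zoover.nl/"):
    """Same extraction as in the question, applied to an HTML string."""
    from bs4 import BeautifulSoup  # imported here so collect_links runs without bs4

    soup = BeautifulSoup(html, "html.parser")
    return [
        base + a.get("href")
        for item in soup.find_all(attrs={"class": "blue2"})  # class from the question
        for a in item.find_all("a")
        if a.get("href")
    ]


def run_against_zoover(country="cyprus", pages=3):
    """Not called automatically: needs selenium, bs4, Firefox and network access."""
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get("http://www.zoover.nl/" + country)
        return collect_links(driver, zoover_links, pages)
    finally:
        driver.quit()
```

If the next page is loaded asynchronously, page_source may still hold the old results right after click() returns, so you would probably want to pass an after_click that waits for the new content (for example with selenium's WebDriverWait).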

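Alternatively, you could use the Javascript that is listed in the onclick handler. I pulled this from the source: EntityQuery('Ns=pPopularityScore%7c1&No=30&props=15292&dims=530&As=&N=0+3+10500915');. The No parameter increments by 15 for each page, but the props parameter has me guessing. Still, I'd recommend not getting into this and instead interacting with the website as a client would, using selenium. That is also much more robust to changes on their side.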
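Purely to illustrate the No observation, here is a small sketch that rewrites the offset in that query string. The helper names with_offset and page_queries are made up, and starting the first page at No=0 is an assumption that I have not checked against the site. The other parameters (props in particular) are left exactly as copied, since what they mean is unknown.

```python
import re

# Query string copied verbatim from the onclick handler quoted above.
SAMPLE_QUERY = "Ns=pPopularityScore%7c1&No=30&props=15292&dims=530&As=&N=0+3+10500915"
PAGE_SIZE = 15  # observed: No grows by 15 per page


def with_offset(query, offset):
    """Return `query` with the value of its No parameter replaced by `offset`."""
    # Anchor on start-of-string or '&' so that only the No parameter matches.
    return re.sub(
        r"(^|&)No=\d+",
        lambda m: "%sNo=%d" % (m.group(1), offset),
        query,
    )


def page_queries(query, pages, page_size=PAGE_SIZE):
    """Yield one query string per page, ASSUMING the first page is No=0."""
    for page in range(pages):
        yield with_offset(query, page * page_size)


for q in page_queries(SAMPLE_QUERY, 3):
    print("EntityQuery('%s');" % q)
```

This only builds the strings. Actually running them still needs something that executes the page's Javascript, which is another reason to stick with selenium.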