Setting a timeout for PhantomJS in Selenium WebDriver

Question

python selenium selenium-webdriver phantomjs

10

Scenario

I have a simple Python script that fetches the HTML source of a given URL:

    from selenium import webdriver

    browser = webdriver.PhantomJS()
    browser.get(url)  # url: the page to fetch
    content = browser.page_source

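Occasionally, the URL points to a page with a slow-loading external resource (for example a video file or very slow ad content).

WebDriver waits for those resources to finish loading before completing the .get(url) request.

Note: for unrelated reasons, I need to do this with PhantomJS rather than requests or urllib2.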

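Problem

I'd like to set a timeout on PhantomJS resource loading, so that if a resource takes too long to load, the browser simply assumes it doesn't exist (or something similar).

That would let me run the subsequent .page_source query against whatever the browser has managed to load.

The webdriver.PhantomJS documentation is very thin, and I haven't found a similar question on SO.

Thanks in advance!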

- tohster

2 Answers

11

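PhantomJS provides a resourceTimeout setting that may suit your needs. Quoting the documentation:

(in milliseconds) defines the timeout after which any requested resource will stop trying and the page will proceed with its other parts. The onResourceTimeout callback is called when the timeout occurs.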

So in Ruby, you can do something like this:

    require 'selenium-webdriver'

    capabilities = Selenium::WebDriver::Remote::Capabilities.phantomjs("phantomjs.page.settings.resourceTimeout" => "5000")
    driver = Selenium::WebDriver.for :phantomjs, :desired_capabilities => capabilities

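In Python, I believe it would look something like this (untested, it only shows the logic; you're the Python developer, so hopefully you can work it out):

    driver = webdriver.PhantomJS(desired_capabilities={'phantomjs.page.settings.resourceTimeout': '5000'})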

- Yi Zeng

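Awesome! I hadn't figured out how to configure PhantomJS through desired_capabilities, so this not only answers my question but also shows more broadly how to configure PhantomJS using the API reference you linked. - tohster

I'm setting this, but I still don't see resources timing out. This is with selenium (via python) and phantomjs 1.9.7. - AdamC

I'm also trying to get this working in Python, Adam. Not many people seem to know how. https://dev59.com/GX7aa4cB1Zd3GeqPwOQG#23180327?noredirect=1 - User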

- EwyynTomato · Accepted Answer

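The long explanation is below; if that's too long for you, here is the TL;DR:

The current version of Selenium's Ghostdriver (in PhantomJS 1.9.8) ignores the resourceTimeout option. Use webdriver's implicitly_wait() and set_page_load_timeout() instead, and wrap the get() call in a try-except block.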

    # Python
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException

    browser = webdriver.PhantomJS()
    browser.implicitly_wait(3)
    browser.set_page_load_timeout(3)
    try:
        browser.get("http://url_here")
    except TimeoutException as e:
        # Handle your exception here
        print(e)
    finally:
        browser.quit()

Explanation

To pass PhantomJS page settings to Selenium, you can use webdriver's DesiredCapabilities, for example:

    # Python
    from selenium import webdriver
    # Copy first so the shared DesiredCapabilities.PHANTOMJS dict is not modified
    cap = webdriver.DesiredCapabilities.PHANTOMJS.copy()
    cap["phantomjs.page.settings.resourceTimeout"] = 1000
    cap["phantomjs.page.settings.loadImages"] = False
    cap["phantomjs.page.settings.userAgent"] = "faking it"
    browser = webdriver.PhantomJS(desired_capabilities=cap)

    // Java
    DesiredCapabilities capabilities = DesiredCapabilities.phantomjs();
    capabilities.setCapability("phantomjs.page.settings.resourceTimeout", 1000);
    capabilities.setCapability("phantomjs.page.settings.loadImages", false);
    capabilities.setCapability("phantomjs.page.settings.userAgent", "faking it");
    WebDriver webdriver = new PhantomJSDriver(capabilities);
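Since every page setting shares the "phantomjs.page.settings." prefix, the dictionary can also be built by a small helper. This is only a sketch: phantomjs_capabilities is a hypothetical name, not part of Selenium or PhantomJS, and it only assembles a plain dict without starting a browser.

```python
# Hypothetical helper (not a Selenium or PhantomJS API): adds the
# "phantomjs.page.settings." prefix to plain page-setting names.
PREFIX = "phantomjs.page.settings."


def phantomjs_capabilities(base=None, **page_settings):
    """Return a new capabilities dict with the given page settings added."""
    caps = dict(base or {})  # copy, so the caller's dict is left untouched
    for name, value in page_settings.items():
        caps[PREFIX + name] = value
    return caps


caps = phantomjs_capabilities(
    {"browserName": "phantomjs"},
    resourceTimeout=1000,  # milliseconds
    loadImages=False,
    userAgent="faking it",
)
print(caps)
```

The result could then be passed as desired_capabilities to webdriver.PhantomJS(). Copying the base dict matters if you start from webdriver.DesiredCapabilities.PHANTOMJS, which is a shared class-level dict.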

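But here's the catch: as of today (Dec 11, 2014), with PhantomJS 1.9.8 and its embedded Ghostdriver, resourceTimeout is not applied by Ghostdriver (see Ghostdriver issue #380 on GitHub).

As a workaround, use Selenium's own timeout functions/methods and wrap webdriver's get method in a try-except / try-catch block, for example: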

    # Python
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException

    browser = webdriver.PhantomJS()
    browser.implicitly_wait(3)
    browser.set_page_load_timeout(3)
    try:
        browser.get("http://url_here")
    except TimeoutException as e:
        # Handle your exception here
        print(e)
    finally:
        browser.quit()

    // Java
    WebDriver webdriver = new PhantomJSDriver();
    webdriver.manage().timeouts()
            .pageLoadTimeout(3, TimeUnit.SECONDS)
            .implicitlyWait(3, TimeUnit.SECONDS);
    try {
        webdriver.get("http://url_here");
    } catch (org.openqa.selenium.TimeoutException e) {
        // Handle your exception here
        System.out.println(e.getMessage());
    } finally {
        webdriver.quit();
    }
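If the goal is to keep whatever HTML did load, the same try-except pattern can be wrapped in a function that reads page_source after the timeout. This is a rough sketch, not a tested recipe: fetch_source and FakeBrowser are hypothetical names, FakeBrowser exists only to show the control flow without a real driver, and it is an assumption that the driver still answers page_source after a page-load timeout.

```python
try:
    from selenium.common.exceptions import TimeoutException
except ImportError:
    # Stand-in so the sketch can be exercised without Selenium installed.
    class TimeoutException(Exception):
        pass


def fetch_source(browser, url, timeout_s=3):
    """Load url with a page-load timeout; return (page_source, timed_out)."""
    browser.set_page_load_timeout(timeout_s)  # seconds, unlike resourceTimeout (ms)
    timed_out = False
    try:
        browser.get(url)
    except TimeoutException:
        timed_out = True
    # Assumption: the driver still serves page_source after a timeout.
    return browser.page_source, timed_out


class FakeBrowser:
    """Hypothetical stand-in for a WebDriver, used only to demo the control flow."""
    page_source = "<html>partial</html>"

    def __init__(self, slow):
        self.slow = slow

    def set_page_load_timeout(self, seconds):
        self.timeout = seconds

    def get(self, url):
        if self.slow:
            raise TimeoutException("page load timed out")


print(fetch_source(FakeBrowser(slow=True), "http://url_here"))
print(fetch_source(FakeBrowser(slow=False), "http://url_here"))
```

With a real driver you would pass the webdriver.PhantomJS() instance instead of FakeBrowser and still call quit() when finished.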