Selenium (Python) raises StaleElementReferenceException and does not go on to download all the links returned by webdriver.find_elements_by_partial_link_text()

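I am using the Python bindings for Selenium to download every link on a page whose text contains the string "VS". The problem is that the second item in the list is not a valid web page (it returns a 404 error), and if I click the broken link manually, it returns: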
error.html - 404 error page does not exist.

If I run the following code, it raises an error.

import selenium
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import StaleElementReferenceException, NoSuchElementException, NoSuchWindowException

# To prevent download dialog
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2)  # custom location
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/msword, application/vnd.ms-powerpoint')

driver = webdriver.Firefox(profile)
driver.get("http://www.SOME_URL.com/")

links = driver.find_elements_by_partial_link_text("VS")

for link in links:
    url = link.get_attribute("href")
    try:
        driver.get(url)
    except StaleElementReferenceException:
        pass

Error:

Traceback (most recent call last):
  File "C:\Users\lskrinjar\Dropbox\work\preracun\src\web_data_mining\get_files_from_web.py", line 79, in <module>
    url = link.get_attribute("href")
  File "C:\Python27\lib\site-packages\selenium-2.44.0-py2.7.egg\selenium\webdriver\remote\webelement.py", line 93, in get_attribute
    resp = self._execute(Command.GET_ELEMENT_ATTRIBUTE, {'name': name})
  File "C:\Python27\lib\site-packages\selenium-2.44.0-py2.7.egg\selenium\webdriver\remote\webelement.py", line 385, in _execute
    return self._parent.execute(command, params)
  File "C:\Python27\lib\site-packages\selenium-2.44.0-py2.7.egg\selenium\webdriver\remote\webdriver.py", line 173, in execute
    self.error_handler.check_response(response)
  File "C:\Python27\lib\site-packages\selenium-2.44.0-py2.7.egg\selenium\webdriver\remote\errorhandler.py", line 166, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: Element not found in the cache - perhaps the page has changed since it was looked up
Stacktrace:
    at fxdriver.cache.getElementAt (resource://fxdriver/modules/web-element-cache.js:8329:1)
    at Utils.getElementAt (file:///c:/users/lskrin~1/appdata/local/temp/tmppsn7tm/extensions/fxdriver@googlecode.com/components/command-processor.js:7922:10)
    at WebElement.getElementAttribute (file:///c:/users/lskrin~1/appdata/local/temp/tmppsn7tm/extensions/fxdriver@googlecode.com/components/command-processor.js:11107:31)
    at DelayedCommand.prototype.executeInternal_/h (file:///c:/users/lskrin~1/appdata/local/temp/tmppsn7tm/extensions/fxdriver@googlecode.com/components/command-processor.js:11635:16)
    at fxdriver.Timer.prototype.setTimeout/<.notify (file:///c:/users/lskrin~1/appdata/local/temp/tmppsn7tm/extensions/fxdriver@googlecode.com/components/command-processor.js:548:5)
1 Answer

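The list of links is a list of elements on the web page. Once you navigate away from that page, those elements no longer exist (you are left with a list in which every item points to a web element that is gone). Instead of a list of element references, you should use a list of URL strings: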
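list_of_links = []
links = driver.find_elements_by_partial_link_text("VS")

# Read every href while the elements still belong to the current page.
for link in links:
    list_of_links.append(link.get_attribute("href"))

# From here on only plain strings are used, so navigating away is safe.
for string_link in list_of_links:
    try:
        driver.get(string_link)
    except StaleElementReferenceException:
        pass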
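
The same idea can be sketched as two small helpers. This is only an illustration, not part of the original answer: the names collect_hrefs and visit_all are made up here, and both functions rely only on the get_attribute() and get() methods already used above. The broad except is an assumption made for brevity; in real code you would probably narrow it to selenium.common.exceptions.WebDriverException.

```python
def collect_hrefs(elements):
    """Return the href of every element as a plain string.

    Call this before navigating away, while the elements are still
    attached to the page that produced them.
    """
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        if href:  # skip anchors that have no href at all
            hrefs.append(href)
    return hrefs


def visit_all(driver, urls):
    """Open each URL in turn and return a list of (url, exception) failures.

    A failure on one URL is recorded and the loop continues with the next,
    so a single bad link does not stop the whole run.
    """
    failed = []
    for url in urls:
        try:
            driver.get(url)
        except Exception as exc:  # assumption: narrow this in real code
            failed.append((url, exc))
    return failed
```

With the driver from the question this would be called as visit_all(driver, collect_hrefs(driver.find_elements_by_partial_link_text("VS"))). In Selenium 4 the find_elements_by_* helpers were removed, so the lookup becomes driver.find_elements(By.PARTIAL_LINK_TEXT, "VS") with By imported from selenium.webdriver.common.by. Note also that, as far as I know, driver.get() does not raise on an HTTP 404; the browser simply displays the error page, so the broken link will not show up in the failures list.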
