How do I locate elements in a table using Python and Selenium?

Question


python python-2.7 selenium selenium-webdriver

3

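I am trying to use Selenium to retrieve data from a website that loads its information with JavaScript. The link is here: Animal population. The page shows several selectable fields, and for my purposes I want to retrieve the bee population data for the UK in 2011. Once the fields are selected, the page loads a table with the corresponding data. I only want the "Population" and "Density" figures for "The Whole Country". So far my code only selects the year, country and species fields and, once the table comes back, locates the "The Whole Country" cell (please feel free to suggest improvements to my existing code). I have not been able to retrieve the population and density values for the whole country. I tried XPath with following-sibling, but it raises an exception saying the element cannot be located. I also do not want to rely on row/cell positions, because I will be fetching the same information for later years and the table cells will change position.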

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get('https://www.oie.int/wahis_2/public/wahid.php/Countryinformation/Animalpopulation')



select = Select(driver.find_element_by_id('country6'))
select.select_by_value('GBR')
select = Select(driver.find_element_by_id('year'))
select.select_by_value('2011')

try:
    element = WebDriverWait(driver, 40).until(EC.presence_of_element_located((By.CLASS_NAME, "TableContent ")))
    print element
    select = Select(driver.find_element_by_id('selected_species'))
    select.select_by_value('1')
except:
    print "Not found"

country_td = driver.find_element(By.XPATH, '//td/b[text()="The Whole Country"]')

#population_td = driver.find_element(By.XPATH, '//td/b[text()="The Whole Country"]/following-sibling::text()')
print country_td.text

Thanks for your help.

- Ana

2 Answers

3

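In your example, following-sibling looks for the next sibling of the <b> element, but what you want is a sibling of the <td> element. You can get there by going through the parent element.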

XPath for the population:
//b[text()="The Whole Country"]/../../td[4]/b
or
//td/b[text()="The Whole Country"]/../following-sibling::td[1]/b

XPath for the density:
//b[text()="The Whole Country"]/../../td[5]/b
or
//td/b[text()="The Whole Country"]/../following-sibling::td[2]/b

Both forms work. The .. step navigates to the parent element; from there you can either continue along the sibling axis or pick a cell with td[X]. In this example you can also drop the trailing /b from each XPath.
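The parent-then-sibling idea can be tried offline on a small sample. The sketch below is in Python 3 and uses only the standard library; the table markup is invented for illustration and is not the real page. Note that xml.etree.ElementTree supports only a subset of XPath (no following-sibling axis, no text() test), so the sibling step is emulated in Python. The helper name cells_after_label is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, invented for illustration only (not the real page).
SAMPLE = """
<table class="TableContent">
  <tr><td>Bees</td><td><b>Zone A</b></td><td><b>10</b></td><td><b>0.1</b></td></tr>
  <tr><td>Bees</td><td><b>The Whole Country</b></td><td><b>250</b></td><td><b>1.5</b></td></tr>
</table>
"""


def cells_after_label(table, label, count=2):
    """Return the text of the `count` <td> cells that follow the labelled cell."""
    # Step up from the matching cell to its row, like //td[b[...]]/.. in full XPath.
    row = table.find(".//td[b='%s']/.." % label)
    if row is None:
        raise LookupError("no row labelled %r" % label)
    cells = row.findall("td")
    label_td = row.find("td[b='%s']" % label)
    start = cells.index(label_td) + 1
    # Emulates following-sibling::td[1..count], which ElementTree does not support.
    return ["".join(td.itertext()).strip() for td in cells[start:start + count]]


table = ET.fromstring(SAMPLE.strip())
population, density = cells_after_label(table, "The Whole Country")
print(population, density)
```

Because the lookup starts from the label rather than from a fixed row index, it keeps working if the "The Whole Country" row moves up or down in the table.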

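Note: this is really brittle. Best practice is to always locate elements by unambiguous attributes, but in this example that is not always possible.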

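Also, you should select "Bees" first and then wait for the table to appear, because the table reloads between selecting the year/country and selecting "Bees", which can lead to inconsistent data.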

select = Select(driver.find_element_by_id('selected_species'))
select.select_by_value('1')
element = WebDriverWait(driver, 40).until(EC.presence_of_element_located((By.CLASS_NAME, "TableContent ")))
print element
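The wait above only checks that a table element is present. If the lookup itself can fail while the table is reloading, one option is to retry the whole lookup until it succeeds. Below is a minimal, driver-agnostic sketch of that polling pattern in Python 3; the name wait_until and its parameters are hypothetical. Selenium's own WebDriverWait polls in the same spirit and accepts poll_frequency and ignored_exceptions arguments.

```python
import time


def wait_until(condition, timeout=40.0, poll=0.5, ignored=(Exception,)):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored as exc:
            # Remember the failure and try again on the next poll.
            last_error = exc
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %s s" % timeout) from last_error
        time.sleep(poll)
```

With Selenium, condition would typically be a small function or lambda that performs the XPath lookup and returns the cell text, with ignored narrowed to the exceptions you expect during a reload.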

Tip: there is a Chrome extension called XPath Helper that lets you test your XPath on the site you are visiting.

- Robert G

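Very good, thanks! The reason I did not select bees first is that after selecting bees the table loads again and the species field is reset to "All species". - Ana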

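OK, I see. But I am concerned that you may run into problems, such as getting the wrong data because the table has not reloaded before the XPath is used, or hitting a NoSuchElementException. This would be easier to prevent if the elements had unique attributes. - Robert G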


- Guy · Accepted Answer

You need to use following-sibling to get the data, but you have to go up one level first.

population = driver.find_element(By.XPATH, '//td[b[text()="The Whole Country"]]/following-sibling::td[1]')
density = driver.find_element(By.XPATH, '//td[b[text()="The Whole Country"]]/following-sibling::td[2]')

Or use country_td.

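# The leading "." keeps the search relative to country_td; a path starting with "/" is evaluated from the document root.
population = country_td.find_element(By.XPATH, './../following-sibling::td[1]')
density = country_td.find_element(By.XPATH, './../following-sibling::td[2]')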
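The question also mentions not wanting to depend on cell positions. If the table has a header row (an assumption; the thread does not show the real markup), one way to avoid fixed indexes is to look up the column by its header text. Here is a rough standard-library sketch in Python 3, with an invented sample table and hypothetical helper names:

```python
import xml.etree.ElementTree as ET

# Hypothetical table with a header row; the real page may be laid out differently.
SAMPLE_WITH_HEADERS = """
<table>
  <tr><th>Species</th><th>Division</th><th>Population</th><th>Density</th></tr>
  <tr><td>Bees</td><td><b>The Whole Country</b></td><td><b>250</b></td><td><b>1.5</b></td></tr>
</table>
"""


def cell_text(cell):
    """Return the visible text of a cell, including text inside nested tags."""
    return "".join(cell.itertext()).strip()


def value_by_header(table, row_label, header):
    """Find the row containing row_label and return its value under the given header."""
    rows = table.findall(".//tr")
    headers = [cell_text(c) for c in rows[0]]
    column = headers.index(header)  # raises ValueError if the header is missing
    for row in rows[1:]:
        cells = [cell_text(c) for c in row]
        if row_label in cells:
            return cells[column]
    raise LookupError("no row labelled %r" % row_label)


table_with_headers = ET.fromstring(SAMPLE_WITH_HEADERS.strip())
print(value_by_header(table_with_headers, "The Whole Country", "Population"))
```

The same idea carries over to Selenium by reading the header cells first and then building the td index from the position of the header you need.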