Python Selenium: wait for several elements to load

10

I have a list that is loaded dynamically via AJAX. While it is still loading, its markup looks like this:

<ul><li class="last"><a class="loading" href="#"><ins>&nbsp;</ins>Загрузка...</a></li></ul>

Once the list has loaded, all of the li and a elements are changed, and there is always more than one li.

<ul class="ltr">
<li id="t_b_68" class="closed" rel="simple">
<a id="t_a_68" href="javascript:void(0)">Category 1</a>
</li>
<li id="t_b_64" class="closed" rel="simple">
<a id="t_a_64" href="javascript:void(0)">Category 2</a>
</li>
...

I need to check whether the list has loaded, so I check whether it has more than one li.

So far I have tried:

1) A custom wait condition

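from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait


class more_than_one(object):
    def __init__(self, selector):
        self.selector = selector

    def __call__(self, driver):
        elements = driver.find_elements(By.CSS_SELECTOR, self.selector)
        if len(elements) > 1:
            return True
        return False

...

try:
    query = WebDriverWait(driver, 30).until(more_than_one('li'))
except TimeoutException:
    print("Bad crap")
else:
    # Then load ready list
    pass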

2) A custom polling function based on find_elements

import time

from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By


def wait_for_several_elements(driver, selector, min_amount, limit=60):
    """
    Waits until more than <min_amount> elements match <selector>,
    with a time limit of <limit> seconds.
    """
    step = 1   # polling interval, in seconds
    current_wait = 0
    while current_wait < limit:
        try:
            print("Waiting... " + str(current_wait))
            query = driver.find_elements(By.CSS_SELECTOR, selector)
            if len(query) > min_amount:
                print("Found!")
                return True
            else:
                time.sleep(step)
                current_wait += step
        except WebDriverException:
            time.sleep(step)
            current_wait += step

    return False

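This doesn't work because driver (the current element passed to this function) gets lost in the DOM. The UL itself doesn't change, but for some reason Selenium can no longer find it.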

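3) Explicit wait. This is bad because some lists load instantly, while others take more than 10 seconds. If I use this technique, I have to wait the maximum time on every occurrence, which is very bad in my case.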

4) I also couldn't correctly wait for child elements using XPATH. This one only waits for the ul to appear.

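try:
    print("Going to nested list...")
    #time.sleep(WAIT_TIME)
    query = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, './/ul')))
    nested_list = child.find_element(By.CSS_SELECTOR, 'ul')
except TimeoutException:
    ...

Please tell me the correct way to make sure that a specific element has loaded more than one descendant. Note: all of these checks and searches should be relative to the current element.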

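  1. Why are explicit waits "bad"? They are designed exactly for this purpose.
  2. You're checking for more than one with if len(elements)>1:, but you only pass li as the selector and search for elements with the driver itself. So surely it will search the whole document? You should get the ul first and then search within it.
- Arran
I think by explicit waits he means hard-coded time.sleep() calls. - E.Z.
Thank you, @Arran! For my dynamic list, is there a more efficient way to access selectors/XPaths while traversing the tree, rather than storing them as text from the start and appending and popping parts of them on each recursion? - Ragnar Lodbrok
Yes, @Mr.E. I tried waiting for a guessed short period (say, 5 seconds), but deeper in the list traversal it failed in some cases. - Ragnar Lodbrok
If, in case #2, I pass the element to the function instead of the driver, that element gets lost after the first "not found" case (a "not in DOM" exception). This is strange, because the root list I'm searching in isn't changed during loading; only its contents are. - Ragnar Lodbrok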
5 Answers

6

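First of all, these elements are AJAX elements.

Now, to locate all the desired elements as per your requirement and create a list, the simplest approach is to induce WebDriverWait for visibility_of_all_elements_located(). You can use either of the following Locator Strategies: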

  • Using CSS_SELECTOR:

    elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "ul.ltr li[id^='t_b_'] > a[id^='t_a_'][href]")))
    
  • Using XPATH:

    elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//ul[@class='ltr']//li[starts-with(@id, 't_b_')]/a[starts-with(@id, 't_a_') and starts-with(., 'Category')]")))
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
If your use case is to wait for a specific number of elements to load, e.g. 10 elements, you can use a lambda function as follows:
  • Using >:

    myLength = 9
    WebDriverWait(driver, 20).until(lambda driver: len(driver.find_elements(By.XPATH, "//ul[@class='ltr']//li[starts-with(@id, 't_b_')]/a[starts-with(@id, 't_a_') and starts-with(., 'Category')]")) > int(myLength))
    
  • Using ==:

    myLength = 10
    WebDriverWait(driver, 20).until(lambda driver: len(driver.find_elements(By.XPATH, "//ul[@class='ltr']//li[starts-with(@id, 't_b_')]/a[starts-with(@id, 't_a_') and starts-with(., 'Category')]")) == int(myLength))
    
You can find a relevant discussion in How to wait for number of elements to be loaded using Selenium and Python.
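If you need the count check in several places, the lambda above can be wrapped in a small reusable condition. The sketch below is a hypothetical helper (at_least_n_elements is not part of Selenium). It relies only on find_elements(by, value), which both WebDriver and WebElement provide, and on WebDriverWait accepting a WebElement as its first argument, so the search can be scoped to the current element (e.g. the ul) instead of the whole document:

```python
class at_least_n_elements:
    """Hypothetical expected condition: truthy once at least `n` elements match `locator`."""

    def __init__(self, locator, n):
        self.locator = locator  # a (By, value) tuple, e.g. (By.CSS_SELECTOR, "li")
        self.n = n  # should be >= 1: an empty list is falsy, so n=0 would never finish

    def __call__(self, root):
        # `root` is whatever was given to WebDriverWait (a WebDriver or a WebElement),
        # so the search is relative to that root
        elements = root.find_elements(*self.locator)
        # Returning the list (not just True) makes wait.until() hand the elements back
        return elements if len(elements) >= self.n else False


# Usage (assumes a live `driver` and the usual Selenium imports):
# ul = driver.find_element(By.CSS_SELECTOR, "ul.ltr")
# items = WebDriverWait(ul, 20).until(at_least_n_elements((By.CSS_SELECTOR, "li"), 2))
```

Returning the matched elements instead of True saves a second find_elements call after the wait.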
References:
You can find a couple of relevant detailed discussions in:
- Getting specific elements in selenium
- Cannot find table element from div element in selenium python
- Extract text from an aria-label selenium webdriver (python)

How do you set the number? - User
@User, please check the updated answer and let me know the status. - undetected Selenium
For some reason, waiting for multiple elements with the union operator ("|") in XPath only works with driver.find_elements_by_xpath and not with expected_conditions.visibility_of_all_elements_located - marvinsxtr

1
I created AllEc, which basically piggybacks on the WebDriverWait.until logic.
It waits until either the timeout occurs or all of the conditions are met.
from typing import Callable

from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


class AllEc(object):
    """Use with WebDriverWait to combine expected_conditions in an AND."""

    def __init__(self, *args: Callable, description: str = None):
        self.ecs = args
        self.description = description

    def __call__(self, driver):
        try:
            for fn in self.ecs:
                if not fn(driver):
                    return False
            return True
        except StaleElementReferenceException:
            # An element went stale mid-check; treat it as "not ready" and poll again
            return False

# usage example (locator1..locator3 are (By, value) tuples):
wait = WebDriverWait(driver, timeout)
ec1 = EC.visibility_of_element_located(locator1)
ec2 = EC.visibility_of_element_located(locator2)
ec3 = EC.visibility_of_element_located(locator3)

all_ec = AllEc(ec1, ec2, ec3, description="Required elements to show page has loaded.")
all_found = wait.until(all_ec, "Could not find all expected elements")  # True once every condition holds
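Note that AllEc returns True rather than the elements themselves. If you want wait.until() to hand back what each condition found, a variant along these lines should work. This is a sketch (AllEcResults is a hypothetical name), and it leaves stale-element handling to WebDriverWait's ignored_exceptions argument instead of catching it itself:

```python
class AllEcResults:
    """Hypothetical AND-combinator that returns every condition's result."""

    def __init__(self, *conditions):
        self.conditions = conditions

    def __call__(self, driver):
        results = []
        for condition in self.conditions:
            result = condition(driver)
            if not result:
                # One condition is not met yet; WebDriverWait will poll again
                return False
            results.append(result)
        # All conditions met: a non-empty list is truthy, so until() returns it
        return results


# Usage (assumes Selenium imports and locators as in the AllEc example):
# wait = WebDriverWait(driver, timeout,
#                      ignored_exceptions=[StaleElementReferenceException])
# el1, el2, el3 = wait.until(AllEcResults(EC.visibility_of_element_located(locator1),
#                                         EC.visibility_of_element_located(locator2),
#                                         EC.visibility_of_element_located(locator3)))
```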

I also created AnyEc, which waits for any of several conditions and returns the result of the first one that is met.
class AnyEc(object):
    """
    Use with WebDriverWait to combine expected_conditions in an OR.

    Example usage:

        >>> wait = WebDriverWait(driver, 30)
        >>> either = AnyEc(expectedcondition1, expectedcondition2, expectedcondition3, etc...)
        >>> found = wait.until(either, "Cannot find any of the expected conditions")
    """

    def __init__(self, *args: Callable, description: str = None):
        self.ecs = args
        self.description = description

    def __iter__(self):
        return self.ecs.__iter__()

    def __call__(self, driver):
        for fn in self.ecs:
            try:
                rt = fn(driver)
                if rt:
                    return rt
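            except TypeError:
                # A TypeError usually means a condition was built incorrectly; don't hide it
                raise
            except Exception:
                # Any other error (e.g. a stale element) counts as "not met yet"
                pass
        return False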

    def __repr__(self):
        return " ".join(f"{e!r}," for e in self.ecs)

    def __str__(self):
        return f"{self.description!s}"

either = AnyEc(ec1, ec2, ec3)
found_element = wait.until(either, "Could not find any of the expected elements")

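Finally, if possible, you can try waiting for Ajax to complete. This isn't useful in every case, for example when Ajax is always active, but it can work where Ajax runs and then finishes. Also, some Ajax libraries don't set the active attribute, so make sure you can rely on it.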
def is_ajax_complete(driver):
    # jQuery.active is the number of in-flight jQuery AJAX requests
    rt = driver.execute_script("return jQuery.active")
    return rt == 0

wait.until(is_ajax_complete, "Ajax did not finish")
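Because jQuery.active only exists on pages that load jQuery, the script above raises a JavaScript error on pages without it. A slightly more defensive sketch (jquery_ajax_idle is a hypothetical name) checks that jQuery is defined and also waits for document.readyState. On a page without jQuery it keeps returning False, so the wait ends in a timeout rather than a script error:

```python
def jquery_ajax_idle(driver):
    """Hypothetical condition: True once the page has loaded and jQuery has no pending requests."""
    # `typeof` avoids a ReferenceError in the browser when jQuery is not loaded
    script = (
        "return document.readyState === 'complete'"
        " && typeof jQuery !== 'undefined'"
        " && jQuery.active === 0;"
    )
    return bool(driver.execute_script(script))


# Usage (assumes `wait = WebDriverWait(driver, 30)`):
# wait.until(jquery_ajax_idle, "Ajax did not finish")
```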


0

This is how I solved the problem; I wanted to wait until a certain number of posts had fully loaded via AJAX:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# create a new Chrome session
driver = webdriver.Chrome()

# navigate to your web app.
driver.get("http://my.local.web")

# get the "see more" button
seemore_button = driver.find_element(By.ID, "seemoreID")

# click it to load more posts
seemore_button.click()

# Wait up to 30 sec until the AJAX request has loaded the content
WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "post")))

# Get the list of posts
listpost = driver.find_elements(By.CLASS_NAME, "post")

0

(1) You didn't mention what error it raises

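(2) Since you mentioned

...because driver (current element passed to this function)...

I'll assume this is actually a WebElement. In that case, instead of passing the object itself to your method, just pass the selector that finds that WebElement (in your case, ul). If the "driver gets lost in the DOM", re-creating it inside the while current_wait < limit: loop may alleviate the problem.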
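A minimal sketch of that idea, assuming the list can be found by a selector: the condition below (children_loaded is a hypothetical name) re-locates the parent ul on every poll instead of holding on to a WebElement that may go stale, then counts the li children relative to it. Passing StaleElementReferenceException to WebDriverWait's ignored_exceptions makes a poll that hits a stale reference count as "not yet" instead of aborting the wait; NoSuchElementException is already ignored by WebDriverWait by default.

```python
class children_loaded:
    """Hypothetical condition: truthy once the parent has more than `min_count` children."""

    def __init__(self, parent_locator, child_locator, min_count=1):
        self.parent_locator = parent_locator  # e.g. (By.CSS_SELECTOR, "ul.ltr")
        self.child_locator = child_locator    # e.g. (By.CSS_SELECTOR, "li")
        self.min_count = min_count

    def __call__(self, driver):
        # Look the parent up again on every poll so a re-rendered list is picked up
        parent = driver.find_element(*self.parent_locator)
        # Search relative to the parent, not the whole document
        children = parent.find_elements(*self.child_locator)
        return children if len(children) > self.min_count else False


# Usage (assumes a live `driver` and these Selenium imports):
# from selenium.common.exceptions import StaleElementReferenceException
# from selenium.webdriver.common.by import By
# from selenium.webdriver.support.ui import WebDriverWait
# wait = WebDriverWait(driver, 30, ignored_exceptions=[StaleElementReferenceException])
# items = wait.until(children_loaded((By.CSS_SELECTOR, "ul.ltr"), (By.CSS_SELECTOR, "li")))
```

With the default min_count=1 this matches the question's "more than one li" check.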

(3) Yes, time.sleep() will only get you so far

(4) Since the dynamically loaded li elements contain class=closed, you could try (By.CSS_SELECTOR, 'ul > li.closed') instead of (By.XPATH, './/ul') (see here for more details on CSS selectors).


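(4) definitely works! Thanks a lot. I'll provide more information about the underlying part about an hour after I finish work. - Ragnar Lodbrok
Based on your suggestions, I've added my solution with a code sample and explained the answer. Thanks a lot! I wish I could upvote your answer. P.S. I misused the driver and its scope in my initial example. - Ragnar Lodbrok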

0

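Keeping Mr.E.'s and Arran's comments in mind, I traverse my list entirely with CSS selectors. The tricky parts were my list's structure and markup (changing classes, etc.), and building the required selectors during traversal and keeping them in memory.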

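I handle waiting for several elements by searching for any element that is not in the loading state. You can also use the ":nth-child" selector like this: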

# inside the loop: for i, child in enumerate(children):
selector.append(' > li:nth-child(%i)' % (i + 1))  # identify child <li> by its order pos
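On the earlier question of avoiding append/del on a shared list: one option is to build each child selector from the parent's selector string without mutating anything, so each recursion level needs no cleanup. This is an illustrative sketch; the helper names are hypothetical, and :nth-child() is 1-based, hence index + 1:

```python
def child_item_selector(parent_selector, index):
    """Selector for the index-th (0-based) <li> directly under the <ul> at parent_selector."""
    # :nth-child() counts from 1, enumerate() from 0
    return "%s > li:nth-child(%d)" % (parent_selector, index + 1)


def child_link_selector(parent_selector, index):
    """Selector for the clickable <a> inside that <li>."""
    return child_item_selector(parent_selector, index) + " > a"


def nested_list_selector(item_selector):
    """Selector for the <ul> that appears inside an opened <li>."""
    return item_selector + " > ul"


# Each recursion level gets its own string, so nothing has to be deleted afterwards
root = "ul.ltr"
item = child_item_selector(root, 0)
print(child_link_selector(root, 0))  # ul.ltr > li:nth-child(1) > a
print(nested_list_selector(item))    # ul.ltr > li:nth-child(1) > ul
```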

Here is my solution, with detailed comments:

def parse_crippled_shifted_list(driver, frame, selector, level=1, parent_id=0, path=None):
    """
    Traversal of html list of special structure (you can't know if element has sub list unless you enter it).
    Supports start from remembered list element.

    Nested lists have classes "closed" and "last closed" when closed and "open" and "last open" when opened (on <li>).
    Elements themselves have classes "leaf" and "last leaf" in both cases.
    Nested lists are placed inside an <li> element as a <ul> list. Each <ul> appears after clicking the <a> in its <li>.
    If you click <a> of leaf, page in another frame will load.

    driver - WebDriver; frame - frame of the list; selector - list of CSS selector parts leading to the current list (<ul>);
    level - depth level, just for console output formatting; parent_id - id of the parent category (in the DB);
    path - remaining path of categories (ORM objects) to the target category to start from.
    """

    # Add current level list elements
    # This method selects all but loading. Just what is needed to exclude.
    selector.append(' > li > a:not([class=loading])')

    # Wait for child list to load
    try:
        query = WebDriverWait(driver, WAIT_LONG_TIME).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, ''.join(selector))))

    except TimeoutException:
        print("%s timed out" % ''.join(selector))

    else:
        # List is loaded
        del selector[-1]  # selector correction: delete last part aimed to get loaded content
        selector.append(' > li')

        children = driver.find_elements(By.CSS_SELECTOR, ''.join(selector))  # fetch list elements

        # Walk the whole list
        for i, child in enumerate(children):

            del selector[-1]  # delete non-unique li tag selector
            if selector[-1] != ' > ul' and selector[-1] != 'ul.ltr':
                del selector[-1]

            selector.append(' > li:nth-child(%i)' % (i + 1))  # identify child <li> by its order pos
            selector.append(' > a')  # add 'li > a' reference to click

            child_link = driver.find_element(By.CSS_SELECTOR, ''.join(selector))

            # If we parse freely further (no need to start from remembered position)
            if not path:
                # Open child
                try:
                    double_click(driver, child_link)
                except InvalidElementStateException as exc:
                    print("\n\nERROR\n", exc.msg, '\n\n')
                else:
                    # Determine its type
                    del selector[-1]  # delete changed and already useless link reference
                    # If <li> is category, it would have <ul> as child now and class="open"
                    # Check by class is priority, because <li> exists for sure.
                    current_li = driver.find_element(By.CSS_SELECTOR, ''.join(selector))

                    # Category case - BRANCH
                    if current_li.get_attribute('class') == 'open' or current_li.get_attribute('class') == 'last open':
                        new_parent_id = process_category_case(child_link, parent_id, level)  # add category to DB
                        selector.append(' > ul')  # forward to nested list
                        # Wait for nested list to load
                        try:
                            query = WebDriverWait(driver, WAIT_LONG_TIME).until(
                                EC.presence_of_all_elements_located((By.CSS_SELECTOR, ''.join(selector))))

                        except TimeoutException:
                            print("\t" * level, "%s timed out (%i secs). Failed to load nested list." %
                                  (''.join(selector), WAIT_LONG_TIME))
                        # Parse nested list
                        else:
                            parse_crippled_shifted_list(driver, frame, selector, level + 1, new_parent_id)

                    # Page case - LEAF
                    elif current_li.get_attribute('class') == 'leaf' or current_li.get_attribute('class') == 'last leaf':
                        process_page_case(driver, child_link, level)
                    else:
                        raise Exception('Damn! Alien class: %s' % current_li.get_attribute('class'))

            # If it's required to continue from specified category
            else:
                # Check if it's required category
                if child_link.text == path[0].name:
                    # Open required category
                    try:
                        double_click(driver, child_link)

                    except InvalidElementStateException as exc:
                        print("\n\nERROR\n", exc.msg, '\n\n')

                    else:
                        # This element of list must be always category (have nested list)
                        del selector[-1]  # delete changed and already useless link reference
                        # If <li> is category, it would have <ul> as child now and class="open"
                        # Check by class is priority, because <li> exists for sure.
                        current_li = driver.find_element(By.CSS_SELECTOR, ''.join(selector))

                        # Category case - BRANCH
                        if current_li.get_attribute('class') == 'open' or current_li.get_attribute('class') == 'last open':
                            selector.append(' > ul')  # forward to nested list
                            # Wait for nested list to load
                            try:
                                query = WebDriverWait(driver, WAIT_LONG_TIME).until(
                                    EC.presence_of_all_elements_located((By.CSS_SELECTOR, ''.join(selector))))

                            except TimeoutException:
                                print("\t" * level, "%s timed out (%i secs). Failed to load nested list." %
                                      (''.join(selector), WAIT_LONG_TIME))
                            # Process this nested list
                            else:
                                last = path.pop(0)
                                if len(path) > 0:  # If more to parse
                                    print("\t" * level, "Going deeper to: %s" % ''.join(selector))
                                    parse_crippled_shifted_list(driver, frame, selector, level + 1,
                                                                parent_id=last.id, path=path)
                                else:  # Current is required
                                    print("\t" * level, "Returning target category: ", ''.join(selector))
                                    path = None
                                    parse_crippled_shifted_list(driver, frame, selector, level + 1, last.id, path=None)

                        # Page case - LEAF
                        elif current_li.get_attribute('class') == 'leaf':
                            pass
                else:
                    print("dummy")

        del selector[-2:]

Just saying: this isn't exactly a cross-browser solution. nth-child breaks on IE < 9. - Arran
