How to check whether a web page element is visible

Question

9

I'm developing with Python and BeautifulSoup4 and need to get the links that are visible on a page. Given this code:

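from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly
links = soup('a')  # shorthand for soup.find_all('a')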

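I'd like to write a method called is_visible that checks whether a link is actually displayed on the page.

Solution using Selenium

Since I'm also using Selenium, I know the following solution: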

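from selenium.webdriver import Firefox
from selenium.webdriver.common.by import By

firefox = Firefox()
firefox.get('https://google.com')
# find_elements_by_tag_name is gone from current Selenium 4 releases
links = firefox.find_elements(By.TAG_NAME, 'a')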

for link in links:
    if link.is_displayed():
        print('{} => Visible'.format(link.text))
    else:
        print('{} => Hidden'.format(link.text))

firefox.quit()

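Performance problem

Unfortunately, both the is_displayed method and reading the text attribute perform an HTTP request to retrieve that information. When a page has many links, or when this has to be done many times, things can get really slow.

BeautifulSoup, on the other hand, can perform these parsing operations in practically no time once it has the page source. But I can't figure out how to do this with it.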

- blueSurfer

I think the best you can do is check the 'style' attribute of the Beautiful Soup tag and parse its value to see whether it contains 'display:none' or something similar. - Germano
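As a rough illustration of the style-parsing idea in this comment, here is a minimal sketch that splits an inline style string into declarations and checks them. The function names parse_inline_style and style_hides_element are hypothetical, and treating visibility:hidden as "something similar" is an assumption, not something the comment states. Splitting on ';' is naive and will mis-handle values that themselves contain a semicolon, such as data URLs.

```python
def parse_inline_style(style):
    """Split an inline style string into a {property: value} dict (naive split on ';')."""
    declarations = {}
    for part in (style or '').split(';'):
        name, sep, value = part.partition(':')
        if sep:  # skip fragments that have no "property: value" shape
            value = value.strip().lower().replace('!important', '').strip()
            declarations[name.strip().lower()] = value
    return declarations


def style_hides_element(style):
    """Return True if the inline style alone hides the element.

    Assumption: only display:none and visibility:hidden are checked.
    """
    declarations = parse_inline_style(style)
    return (declarations.get('display') == 'none'
            or declarations.get('visibility') == 'hidden')
```

With a BeautifulSoup tag this could be called as style_hides_element(link.get('style')). Unlike a plain substring test, it does not mistake 'display:block; float:none' for a hidden element.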

7

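Unfortunately, BeautifulSoup is just an HTML parser, not a browser, so it knows nothing about how the page is rendered. I think you'll need to stick with Selenium. - fasouto

As far as I understand, @fasouto is right. BeautifulSoup doesn't actually render anything, and if you read the Selenium documentation you'll see that it automates a browser rather than just working on plain HTML. If you really want to do this, I think you have to stick with Selenium's own methods. - ddavison

Elements are hidden with inline, linked, or internal CSS (input elements are an exception), or they are hidden with JS. Then there are other invisible things, like white text on a white background. What exactly do you want to check? Just CSS display:none? Then you need to parse all the style sheets with tinycss and see whether a rule matches the element. If you find a match, check which styles it applies. The difficulty is the cascading part. Also, children are hidden if a parent is hidden, so you have to check that all of the element's parents are visible too... or just stick with Selenium. - allcaps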
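The point about hidden parents can be sketched without a browser. The example below is only an approximation under stated assumptions: it looks at inline display:none only (no style sheets, no cascade, no JS), and it uses Python's standard html.parser rather than BeautifulSoup to stay dependency-free. VisibleLinkCollector is a hypothetical name.

```python
import re
from html.parser import HTMLParser

DISPLAY_NONE = re.compile(r'display\s*:\s*none', re.IGNORECASE)
# Void elements have no closing tag, so they are never pushed on the stack.
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'}


class VisibleLinkCollector(HTMLParser):
    """Collect href values of <a> tags that are not hidden by an inline
    display:none on the tag itself or on any open ancestor."""

    def __init__(self):
        super().__init__()
        self.open_tags = []      # (tag, hidden) for every element still open
        self.visible_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        own_hidden = DISPLAY_NONE.search(attrs.get('style') or '') is not None
        inherited = self.open_tags[-1][1] if self.open_tags else False
        hidden = own_hidden or inherited
        if tag == 'a' and not hidden:
            self.visible_links.append(attrs.get('href'))
        if tag not in VOID_TAGS:
            self.open_tags.append((tag, hidden))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                del self.open_tags[i:]
                break
```

Usage would be collector = VisibleLinkCollector(); collector.feed(html); and then reading collector.visible_links. Markup with unclosed tags can make a hidden state leak to later siblings, which is one more reason this stays a heuristic.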

1

See this thread: https://dev59.com/b3I-5IYBdhLWcg3wR2Jr - alecxe

2 Answers

1

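As far as I know, BeautifulSoup can only help you parse the actual markup of the HTML document. If that's all you need, you can do something like this (yes, I already know it's not perfect):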

import re

from bs4 import BeautifulSoup

# html_doc is assumed to already hold the page source as a string
soup = BeautifulSoup(html_doc, 'html.parser')

# matches an inline "display: none" declaration, ignoring case and spacing
DISPLAY_NONE = re.compile(r'display\s*:\s*none', re.IGNORECASE)


def is_visible_1(link):
    # looks only at the tag's own inline style; a tag without a style
    # attribute is treated as visible
    style = link.get('style') or ''
    return DISPLAY_NONE.search(style) is None


def is_visible_2(**kwargs):
    # looks the element up first; a missing element counts as not visible
    soup = kwargs.pop('soup', None)
    if soup is None:
        return False
    link = soup.find(**kwargs)
    if link is None:
        return False
    return is_visible_1(link)


# checks links that already exist, not *if* they exist
for link in soup.find_all('a'):
    print(is_visible_1(link))

# checks if an element exists
print(is_visible_2(soup=soup, id='someID'))

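BeautifulSoup does not take into account the other things that determine whether an element is visible, such as CSS, scripts, and dynamic DOM changes. Selenium, on the other hand, does tell you whether an element is actually being rendered, and it generally does so through the accessibility APIs of the given browser. You have to decide whether sacrificing accuracy for speed is worth it. Good luck! :-)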

- UVUCodeMonkey

- ewwink · Accepted Answer

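Try find_elements with an XPath locator (find_elements_by_xpath in older Selenium releases) together with execute_script.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

driver.get("https://www.google.com/?hl=en")
links = driver.find_elements(By.XPATH, '//a')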

driver.execute_script('''
    var links = document.querySelectorAll('a');

    links.forEach(function(a) {
        a.addEventListener("click", function(event) {
            event.preventDefault();
        });
    });
''')

visible = []
hidden = []
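for link in links:
    try:
        # clicking a link that is not displayed raises an exception
        link.click()
        visible.append('{} => Visible'.format(link.text))
    except Exception:
        # .text is empty for hidden elements, so read textContent instead
        hidden.append('{} => Hidden'.format(link.get_attribute('textContent')))

    #time.sleep(0.1)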

print('\n'.join(visible))
print('===============================')
print('\n'.join(hidden))
print('===============================\nTotal links length: %s' % len(links))

driver.execute_script('alert("Finish")')
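
The per-element round trips described in the question can also be avoided by asking the browser about every link in a single execute_script call. The following is a minimal sketch, not part of the answer above: split_links_by_visibility and VISIBILITY_SCRIPT are hypothetical names, and the layout-box test in the script is only a heuristic that will not always agree with is_displayed (for example, an element with visibility:hidden still has a layout box).

```python
# JavaScript run inside the page: one entry per <a>, in document order.
VISIBILITY_SCRIPT = '''
return Array.from(document.querySelectorAll('a')).map(function (a) {
    var rendered = a.offsetWidth > 0 || a.offsetHeight > 0 ||
                   a.getClientRects().length > 0;
    return {text: a.textContent.trim(), visible: rendered};
});
'''


def split_links_by_visibility(driver):
    """Return (visible_texts, hidden_texts) using one execute_script call.

    `driver` is any object with a Selenium-style execute_script method.
    """
    results = driver.execute_script(VISIBILITY_SCRIPT)
    visible = [item['text'] for item in results if item['visible']]
    hidden = [item['text'] for item in results if not item['visible']]
    return visible, hidden
```

With the driver from the answer above, this would be called as visible, hidden = split_links_by_visibility(driver).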