Selenium WebDriver does not fully load the page (Python)

4

I have been trying to use Selenium WebDriver from Python to log in to this website (login page).

To do that, I ran the following in Python:

from selenium import webdriver 
import bs4 as bs


driver = webdriver.Chrome()
driver.get('https://app.chatra.io/')

I then tried to parse the page with Beautiful Soup:
html = driver.execute_script('return document.documentElement.outerHTML')
soup = bs.BeautifulSoup(html, 'html.parser')
print(soup.prettify)

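The main problem is that the page never fully loads. When I load the page in my own browser, everything works fine. However, when Selenium WebDriver loads it, it seems to stop halfway.

Any ideas? Any suggestions for fixing this, or pointers to where I can learn more?

2 Answers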

3
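First, I can reproduce the problem in the latest Chrome as well (with chromedriver 2.34, which is currently the latest version). I am not yet sure what is going on. Workaround: switching to Firefox solved the problem completely for me.
Also, between driver.get() and the HTML parsing, I would add an extra step: an explicit wait ("Explicit Waits"), which blocks until the required condition holds so that the page has loaded properly: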
import bs4 as bs
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Firefox()
driver.get('https://app.chatra.io/')

wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.ID, "signin-email")))

html = driver.execute_script('return document.documentElement.outerHTML')
soup = bs.BeautifulSoup(html, 'html.parser')
print(soup.prettify())

Note that you also need to call prettify() with parentheses, since it is a method.
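
To make the explicit-wait step less opaque, here is a minimal pure-Python sketch of the polling idea behind it: call a condition repeatedly until it returns something truthy or a timeout expires. This is not Selenium's implementation; `wait_until`, `WaitTimeout`, and `page_is_ready` are hypothetical names used only for illustration, and the 0.5 second default interval is simply a common choice for this kind of loop.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Stand-in for a page that only finishes rendering on the third check.
attempts = {"count": 0}


def page_is_ready():
    attempts["count"] += 1
    return attempts["count"] >= 3


print(wait_until(page_is_ready, timeout=2.0, poll_interval=0.01))
```

With a real driver, the condition would be something like "the sign-in field is visible", which is what the `EC.visibility_of_element_located` call in the answer expresses.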

1
Here are a few aspects of the problem you are facing:
  • Since you want to parse the page with BeautifulSoup, you might try fetching it with urlopen from urllib.request, but the error says it all:

    urllib.error.HTTPError: HTTP Error 403: Forbidden
    

    This means the request from urllib.request is being detected and rejected with HTTP Error 403: Forbidden. Hence using webdriver from selenium makes sense.

  • Next, when you use ChromeDriver and Chrome, the website initially opens and renders. But soon ChromeDriver appears to be detected as a WebDriver, and the retrieved document contains no <head> or <body> content. You see only the minimal header:

    <!DOCTYPE html>
    <html xmlns="http://www.w3.org/1999/xhtml" class="supports cssfilters flexwrap chrome webkit win hover web"></html>
    
  • Finally, when you use GeckoDriver and Firefox Quantum, the website opens and renders properly:

    Code Block :

    from selenium import webdriver
    from bs4 import BeautifulSoup as soup
    
    driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
    driver.get('https://app.chatra.io/')
    html = driver.execute_script('return document.documentElement.outerHTML')
    pagesoup = soup(html, "html.parser")
    print(pagesoup)
    

    Console Output :

    <html class="supports cssfilters flexwrap firefox gecko win hover web"><head>
    <link class="" href="https://app.chatra.io/b281cc6b75916e26b334b5a05913e3eb18fd3a4d.css?meteor_css_resource=true&amp;_g_app_v_=51" rel="stylesheet" type="text/css"/>
    <meta charset="utf-8"/>
    <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
    <meta content="width=device-width, initial-scale=1, maximum-scale=1, minimum-scale=1, user-scalable=no, viewport-fit=cover" name="viewport"/>
    .
    .
    .
    <em>··· Chatra</em>
    .
    .
    .
    </div></body></html>
    
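  • Referencing prettify without parentheses prints the bound method object rather than the formatted HTML; call pagesoup.prettify() to get the formatted string: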

    Code Block :

    from selenium import webdriver
    from bs4 import BeautifulSoup as soup
    
    driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
    driver.get('https://app.chatra.io/')
    html = driver.execute_script('return document.documentElement.outerHTML')
    pagesoup = soup(html, "html.parser")
    print(pagesoup.prettify)
    

    Console Output :

    <bound method Tag.prettify of <html class="supports cssfilters flexwrap firefox gecko win hover web"><head>
    <link class="" href="https://app.chatra.io/b281cc6b75916e26b334b5a05913e3eb18fd3a4d.css?meteor_css_resource=true&amp;_g_app_v_=51" rel="stylesheet" type="text/css"/>
    <meta charset="utf-8"/>
    <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
    <meta content="width=device-width, initial-scale=1, maximum-scale=1, minimum-scale=1, user-scalable=no, viewport-fit=cover" name="viewport"/>
    .
    .
    .
    <em>··· Chatra</em>
    .
    .
    .
    </div></body></html>>
    
  • You can also use Selenium's page_source property as follows:

    Code Block :

    from selenium import webdriver
    
    driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
    driver.get('https://app.chatra.io/')
    print(driver.page_source)
    

    Console Output :

<html class="supports cssfilters flexwrap firefox gecko win hover web">

<head>
  <link rel="stylesheet" type="text/css" class="" href="https://app.chatra.io/b281cc6b75916e26b334b5a05913e3eb18fd3a4d.css?meteor_css_resource=true&amp;_g_app_v_=51">
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1, minimum-scale=1, user-scalable=no, viewport-fit=cover">

  <!-- platform specific stuff -->
  <meta name="msapplication-tap-highlight" content="no">
  <meta name="apple-mobile-web-app-capable" content="yes">

  <!-- favicon -->
  <link rel="shortcut icon" href="/static/favicon.ico">

  <!-- win8 tile -->
  <meta name="msapplication-TileImage" content="/static/win-tile.png">
  <meta name="msapplication-TileColor" content="#ffffff">
  <meta name="application-name" content="Chatra">

  <!-- apple touch icon -->
  <!--<link rel="apple-touch-icon" sizes="256x256" href="/static/?????.png">-->

  <title>··· Chatra</title>

  <style>
    body {
      background: #f6f5f7
    }
  </style>

  <style type="text/css"></style>
</head>

<body>



  <script async="" src="https://www.google-analytics.com/analytics.js"></script>
  <script type="text/javascript" src="/meteor_runtime_config.js"></script>

  <script type="text/javascript" src="https://app.chatra.io/9153feecdc706adbf2c71253473a6aa62c803e45.js?meteor_js_resource=true&amp;_g_app_v_=51"></script>



  <div class="body body-layout">
    <div class="body-layout__main main-layout">
      <aside class="main-layout__left-sidebar">
        <div class="left-sidebar-layout">
        </div>
      </aside>
      <div class="main-layout__content">
        <div class="content-layout">


          <main class="content-layout__main is-no-fades js-popover-boundry js-main">

            <div class="center loading loading--light">
              <div class="content-padding nothing">


                <em>··· Chatra</em>


              </div>
            </div>

          </main>
        </div>
      </div>
    </div>
  </div>
</body>
</html>
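
As a rough way to tell the two outcomes above apart in a script, the captured HTML can be checked for any elements inside <body>. The following is a minimal sketch using only the standard library's html.parser; `BodyProbe` and `looks_rendered` are hypothetical helper names, and the two sample strings are shortened stand-ins for the ChromeDriver and GeckoDriver outputs shown in this answer, not real captures.

```python
from html.parser import HTMLParser


class BodyProbe(HTMLParser):
    """Counts element start tags seen inside <body> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.body_tags = 0

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body:
            self.body_tags += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False


def looks_rendered(html):
    """Return True if the document has at least one element inside <body>."""
    probe = BodyProbe()
    probe.feed(html)
    probe.close()
    return probe.body_tags > 0


# Shortened stand-ins for the two kinds of output discussed above.
empty_doc = '<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" class="supports"></html>'
full_doc = '<html><head><title>Chatra</title></head><body><div class="body"><em>Chatra</em></div></body></html>'

print(looks_rendered(empty_doc))
print(looks_rendered(full_doc))
```

In practice the string would come from driver.page_source or the outerHTML script call used earlier. A check like this only says that some body markup exists, not that the login form has finished rendering.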

