Logging in to a website with Python Requests

Question


python html authentication web-scraping python-requests

12

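I'm trying to log in to https://www.voxbeam.com/login with requests so that I can scrape data. I'm a Python beginner; so far I've mostly worked through tutorials and done a bit of web scraping on my own with BeautifulSoup.

Looking at the HTML: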

<form id="loginForm" action="https://www.voxbeam.com//login" method="post" autocomplete="off">

<input name="userName" id="userName" class="text auto_focus" placeholder="Username" autocomplete="off" type="text">

<input name="password" id="password" class="password" placeholder="Password" autocomplete="off" type="password">

<input id="challenge" name="challenge" value="78ed64f09c5bcf53ead08d967482bfac" type="hidden">

<input id="hash" name="hash" type="hidden">

I understand that I should use the POST method and send userName and password.

This is what I'm trying:

import requests
import webbrowser

url = "https://www.voxbeam.com/login"
login = {'userName': 'xxxxxxxxx',
         'password': 'yyyyyyyyy'}

print("Original URL:", url)

r = requests.post(url, data=login)

print("\nNew URL", r.url)
print("Status Code:", r.status_code)
print("History:", r.history)

print("\nRedirection:")
for i in r.history:
    print(i.status_code, i.url)

# Open r in the browser to check if I logged in
new = 2  # open in a new tab, if possible
webbrowser.open(r.url, new=new)

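After a successful login, I expected r to hold the dashboard URL so that I could start scraping the data I need.

When I run the code with my real credentials in place of xxxxxxxxx and yyyyyyyyy, I get the following output: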

Original URL: https://www.voxbeam.com/login

New URL https://www.voxbeam.com/login
Status Code: 200
History: []

Redirection:

Process finished with exit code 0

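A new browser tab opens at www.voxbeam.com/login.

Is there something wrong with the code? Am I missing something in the HTML? Is it reasonable to expect the dashboard URL in r after the redirect and to open that URL in a browser tab to check the response visually, or should I be doing this differently?

I've spent several days reading similar questions here, but every site's authentication process seems to be slightly different. I also looked at http://docs.python-requests.org/en/latest/user/authentication/, which describes other methods, but I found nothing in the HTML suggesting that I should use one of those instead of POST.

I also tried: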

r = requests.get(url, auth=('xxxxxxxx', 'yyyyyyyy'))

but that doesn't seem to work either.

- Pablo

1

You should submit all of the form fields (userName, password, challenge, hash). - t.m.adam

4 Answers

3

Try specifying the URL more explicitly, like this:

  url=https://www.voxbeam.com//login?id=loginForm

This sets the focus on the login form so that the POST method is applied.

- Mohammad Jbber

1

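Depending on how the site handles login, this can be tricky. What I did was use the proxy application Charles to capture the requests my browser sent to the server while I logged in manually. I then copied exactly the same headers and cookies shown in Charles into my own Python code, and it worked! I believe the cookies and headers are used to prevent bots from logging in.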
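A proxy tool usually shows the cookies as one raw Cookie header string, while requests works with name/value pairs. Below is a minimal sketch of converting one into the other and copying the captured state onto a session. The function names are hypothetical, and `session` is assumed to be a requests.Session, whose `headers` and `cookies` attributes both support update().

```python
def cookie_header_to_dict(header):
    """Turn a raw Cookie header value such as 'a=1; b=2' into a dict."""
    cookies = {}
    for part in header.split(";"):
        # Split on the first '=' only, because values may contain '=' themselves.
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def apply_captured_state(session, headers, cookie_header):
    """Copy captured headers and cookies onto a session.

    Assumption: `session` is a requests.Session (or any object whose
    `headers` and `cookies` attributes have an update() method).
    """
    session.headers.update(headers)
    session.cookies.update(cookie_header_to_dict(cookie_header))
    return session
```

With requests installed, this could be called as `apply_captured_state(requests.Session(), {'user-agent': '...'}, 'PHPSESSID=...; loggedIn=yes')`, using the values copied from the proxy. Session cookies expire, so the captured values will probably need refreshing from time to time.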

- Reza Hosseini

0

from webbot import Browser

web = Browser()  # opens a browser window controlled from Python

web.go_to('enter your login page url')

# Click the button that opens the login form; use 'signin' instead if that is its label on your site
web.click('login')

# The id differs from site to site; find it by inspecting the Id/Username/Emailid field. Here it is txtLoginId
web.type('enter your Id/Username/Emailid', into='Id/Username/Emailid', id='txtLoginId')
web.click('NEXT', tag='span')

# Inspect the password field the same way; here its id is txtpasswrd
web.type('Enter Your Password', into='Password', id='txtpasswrd')

# Inspect any further buttons and click through them in order; here the id is "fa fa-home"
web.click('NEXT', id="fa fa-home", tag='span')
web.click('NEXT', tag='span')

- Parajuli Ram Prasad

Content provided by Stack Overflow.

- bl79 · Accepted Answer

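As mentioned above, you should send values for all of the form's fields. You can find them in the browser's web inspector. This form sends two additional hidden values: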

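import requests

url = "https://www.voxbeam.com//login"
# A browser-like User-Agent; the exact value is only an example
headers = {'user-agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36"}
data = {'userName': 'xxxxxxxxx', 'password': 'yyyyyyyyy', 'challenge': 'zzzzzzzzz', 'hash': ''}
# requests form-encodes the values itself, so pass an email with a plain '@' (it is sent as %40)

session = requests.Session()
r = session.post(url, headers=headers, data=data)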
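The challenge value in the form is probably generated anew for each page load, so a hard-coded copy may be rejected. Here is a minimal sketch, assuming the login page can be fetched with a plain GET and that the server accepts the hidden fields exactly as they appear in the HTML: read every input of the form first, then add the credentials. The names FormInputCollector, collect_form_fields, login and left_login_page are hypothetical, and `session` is assumed to be a requests.Session. If the site fills hash with JavaScript before submitting, this alone will not be enough.

```python
from html.parser import HTMLParser


class FormInputCollector(HTMLParser):
    """Collect name/value pairs of <input> tags inside one <form>."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.in_form = True
        elif tag == "input" and self.in_form and attrs.get("name"):
            # Inputs without a value attribute (like "hash") become "".
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def collect_form_fields(html, form_id="loginForm"):
    """Return {name: value} for every named input in the given form."""
    parser = FormInputCollector(form_id)
    parser.feed(html)
    return parser.fields


def login(session, login_url, username, password):
    """GET the form, then POST all of its fields plus the credentials.

    Assumption: `session` is a requests.Session, so cookies set by the
    GET are sent back automatically with the POST.
    """
    page = session.get(login_url)
    data = collect_form_fields(page.text)
    data.update({"userName": username, "password": password})
    return session.post(login_url, data=data)


def left_login_page(response, login_url):
    """Heuristic only: True if the POST was redirected away from the login URL."""
    return bool(response.history) and response.url.rstrip("/") != login_url.rstrip("/")
```

With requests installed, a call would look like `r = login(requests.Session(), "https://www.voxbeam.com/login", "xxxxxxxxx", "yyyyyyyyy")`. An empty `r.history` together with an unchanged `r.url`, as in the question's output, suggests the server simply served the login page again.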

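In addition, many sites protect themselves against bots with hidden form fields, JavaScript, encoded values, and so on. As a workaround, you can:

1) Use the cookies from a manual login: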

url = "https://www.voxbeam.com"
headers = {'user-agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36"}
cookies = {'PHPSESSID':'zzzzzzzzzzzzzzz', 'loggedIn':'yes'}

s = requests.Session()
r = s.post(url, headers=headers, cookies=cookies)

2) Use the Selenium module:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

url = "https://www.voxbeam.com//login"
driver = webdriver.Firefox()
driver.get(url)

# Locate the fields by their name attribute (Selenium 4 API)
u = driver.find_element(By.NAME, 'userName')
u.send_keys('xxxxxxxxx')
p = driver.find_element(By.NAME, 'password')
p.send_keys('yyyyyyyyy')
p.send_keys(Keys.RETURN)  # pressing Enter submits the form
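Once the browser is logged in, its cookies can be handed to requests so that the actual scraping does not need the browser. This is a minimal sketch, assuming the site identifies the session by cookies alone. selenium_cookies_to_dict is a hypothetical helper that takes the list of dicts returned by driver.get_cookies().

```python
def selenium_cookies_to_dict(cookies):
    """Reduce Selenium's cookie records to a plain {name: value} mapping.

    Each record is a dict with at least 'name' and 'value' keys; domain,
    path and expiry are dropped here for simplicity.
    """
    return {cookie["name"]: cookie["value"] for cookie in cookies}
```

With a requests.Session named `s`, the hand-off would be `s.cookies.update(selenium_cookies_to_dict(driver.get_cookies()))`. Because domain and path are discarded, the cookies will be sent to any host the session talks to, so it is safer to use such a session for this one site only.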