Python - Requests/RoboBrowser - ASPX POST JavaScript

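I am porting a bash script that uses curl to send payloads to URLs, and I have run into a problem. After logging in to the site with RoboBrowser, I cannot get a POST through the page's form to work.
The steps are:
- Log in at /SubLogin.aspx
- On success, get redirected to /OptionsSummary.aspx
- GET /FindMe.aspx with parameters
- Click the "Phone Lists" button (this should load the "Phone Lists" table containing the item "Work"), which POSTs to /FindMe.aspx
- Select the "Work" item, which performs a POST to /PhoneLists.aspx (this should load a user list named "Work")
I have successfully authenticated and performed the GET with both RoboBrowser and Requests + bs4, but I am stuck on how to POST back to the page itself.
Using RoboBrowser (liboncall.py):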
#!/usr/bin/python

from robobrowser import RoboBrowser
from bs4 import BeautifulSoup as BS

oc_mailbox = '123456'
oc_password_hashed = 'ABCDEFG'

oc_base_uri = 'http://example.com'
auth_uri = oc_base_uri + '/SubLogin.aspx'
find_uri = oc_base_uri + '/FindMe.aspx'
phne_uri = oc_base_uri + '/PhoneLists.aspx'


p_auth_payload = {
        'SubLoginControl:javascriptTest': 'true',
        'SubLoginControl:mailbox': oc_mailbox,
        'SubLoginControl:phoneNumber': '',
        'SubLoginControl:password': oc_password_hashed,
        'SubLoginControl:btnLogOn': 'Logon',
        'SubLoginControl:webLanguage': 'en-US',
        'SubLoginControl:initialLanguage': 'en-US',
        'SubLoginControl:errorCallBackNumber': 'Entered telephone number contains non-dialable characters.',
        'SubLoginControl:cookieMailbox': 'mailbox',
        'SubLoginControl:cookieCallbackNumber': 'callbackNumber',
        'SubLoginControl:serverDomain': ''
        }

p_find_payload = {
        'FindMeControl:enableFindMe': 'on',
        'FindMeControl:MasterDataControl:focusElement': '',
        'FindMeControl:MasterDataControl:masterList:_ctl0:enabled': 'on',
        'FindMeControl:MasterDataControl:masterList:_ctl0:itemGuid': '',
        'FindMeControl:MasterDataControl:hidSelectedScheduleName': '',
        'FindMeControl:MasterDataControl:hidbtnStatus': '',
        'FindMeControl:MasterDataControl:hidScheduleXML': '',
        'FindMeControl:MasterDataControl:tempScheduleXML': '',
        'FindMeControl:MasterDataControl:hidSelectedScheduleGUID': '',
        'FindMeControl:MasterDataControl:hidChangedScheduleList': '',
        'FindMeControl:btnPhoneLists': 'Phone Lists',
        'FindMeControl:enableFindMeHidden': '',
        'FindMeControl:applySet': 'false'
        }

p_phne_payload = {
        '__EVENTARGUMENT': '',
        '__EVENTTARGET': 'PhoneListsControl$MasterDataControl$masterList$_ctl0$SelectButton',
        'PhoneListsControl:MasterDataControl:focusElement': '',
        'PhoneListsControl:MasterDataControl:masterList:_ctl0:itemGuid': '',
        'PhoneListsControl:MasterDataControl:hidSelectedScheduleName': '',
        'PhoneListsControl:MasterDataControl:hidbtnStatus': '',
        'PhoneListsControl:MasterDataControl:hidScheduleXML': '',
        'PhoneListsControl:MasterDataControl:tempScheduleXML': '',
        'PhoneListsControl:MasterDataControl:hidSelectedScheduleGUID': '',
        'PhoneListsControl:MasterDataControl:hidChangedScheduleList': '',
        'PhoneListsControl:applySet': 'false'
        }


def auth(mailbox, password):
    browser = RoboBrowser(history=False)
    browser.open(auth_uri)

    signin = browser.get_form(id='aspnetForm')
    signin['SubLoginControl:mailbox'].value = mailbox
    signin['SubLoginControl:password'].value = password
    signin['SubLoginControl:javascriptTest'].value = 'true'
    signin['SubLoginControl:btnLogOn'].value = 'Logon'
    signin['SubLoginControl:webLanguage'].value = 'en-US'
    signin['SubLoginControl:initialLanguage'].value = 'en-US'
    signin['SubLoginControl:errorCallBackNumber'].value = 'Entered telephone number contains non-dialable characters.'
    signin['SubLoginControl:cookieMailbox'].value = 'mailbox'
    signin['SubLoginControl:cookieCallbackNumber'].value = 'callbackNumber'
    signin['SubLoginControl:serverDomain'].value = ''

    browser.submit_form(signin)
    return browser
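
A note on the `errorCallBackNumber` value when porting from curl: form values are URL-encoded again on submit, so a value copied from an already-encoded curl payload (with `+` standing in for spaces) is not sent the same way as the plain text. The sketch below uses only the Python 3 standard library and a made-up field name `msg` to show the difference; it does not touch RoboBrowser or the site.

```python
from urllib.parse import urlencode, parse_qs

# Plain text, as it would be typed into the form.
plain = {'msg': 'non-dialable characters.'}
# Text copied from an already URL-encoded payload: '+' stood for a space there.
pre_encoded = {'msg': 'non-dialable+characters.'}

encoded_plain = urlencode(plain)      # spaces become '+'
encoded_pre = urlencode(pre_encoded)  # a literal '+' becomes '%2B'

print(encoded_plain)
print(encoded_pre)

# Decoding shows what the server receives in each case.
print(parse_qs(encoded_plain)['msg'][0])
print(parse_qs(encoded_pre)['msg'][0])
```

For a field that only carries an error message this is unlikely to matter, but the same rule applies to any value copied out of a curl command line.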

Log in to the site and display the URL to verify that we are in:

In [20]: from liboncall import *
In [21]: m = auth(oc_mailbox, oc_password_hashed)

In [22]: m.url
Out[22]: u'http://example.com/OptionsSummary.aspx'

Open "/FindMe.aspx":
In [24]: m.open(find_uri)

In [25]: m.url
Out[25]: u'http://example.com/FindMe.aspx'

Initially, "/FindMe.aspx" loads a form and a button named "Phone Lists" (FindMeControl:btnPhoneLists).

In [26]: m.select('title')
Out[26]: [<title>Find Me</title>]

In [27]: form_find_a = m.get_form(action="FindMe.aspx")

In [28]: for i in form_find_a.keys():
    print(i)
    ....:
    __VIEWSTATE
    __EVENTVALIDATION
    FindMeControl:enableFindMe
    FindMeControl:MasterDataControl:focusElement
    FindMeControl:MasterDataControl:masterList:_ctl0:enabled
    FindMeControl:MasterDataControl:masterList:_ctl0:itemGuid
    FindMeControl:MasterDataControl:btnAdd
    FindMeControl:MasterDataControl:btnDelete
    FindMeControl:MasterDataControl:btnRename
    FindMeControl:MasterDataControl:btnCancel
    FindMeControl:MasterDataControl:btnEnter
    FindMeControl:MasterDataControl:btnUpdate
    FindMeControl:MasterDataControl:hidSelectedScheduleName
    FindMeControl:MasterDataControl:hidbtnStatus
    FindMeControl:MasterDataControl:hidScheduleXML
    FindMeControl:MasterDataControl:tempScheduleXML
    FindMeControl:MasterDataControl:hidSelectedScheduleGUID
    FindMeControl:MasterDataControl:hidChangedScheduleList
    FindMeControl:btnApply
    FindMeControl:btnSchedules
    FindMeControl:btnPhoneLists
    FindMeControl:enableFindMeHidden
    FindMeControl:applySet

Remove the unneeded form fields, fill in the form, and submit it:

In [29]: find_remove = (
'FindMeControl:MasterDataControl:btnAdd',
'FindMeControl:MasterDataControl:btnDelete',
'FindMeControl:MasterDataControl:btnRename',
'FindMeControl:MasterDataControl:btnCancel',
'FindMeControl:MasterDataControl:btnEnter',
'FindMeControl:MasterDataControl:btnUpdate',
'FindMeControl:btnApply',
'FindMeControl:btnSchedules')

In [30]: for i in find_remove:
        form_find_a.fields.pop(i)

In [31]: form_find_a['FindMeControl:enableFindMe'].value = 'on'
form_find_a['FindMeControl:MasterDataControl:focusElement'].value = ''
form_find_a['FindMeControl:MasterDataControl:masterList:_ctl0:enabled'].value = 'on'
form_find_a['FindMeControl:MasterDataControl:masterList:_ctl0:itemGuid'].value = ''
form_find_a['FindMeControl:MasterDataControl:hidSelectedScheduleName'].value = ''
form_find_a['FindMeControl:MasterDataControl:hidbtnStatus'].value = ''
form_find_a['FindMeControl:MasterDataControl:hidScheduleXML'].value = ''
form_find_a['FindMeControl:MasterDataControl:tempScheduleXML'].value = ''
form_find_a['FindMeControl:MasterDataControl:hidSelectedScheduleGUID'].value = ''
form_find_a['FindMeControl:MasterDataControl:hidChangedScheduleList'].value = ''
form_find_a['FindMeControl:btnPhoneLists'].value = 'Phone Lists'
form_find_a['FindMeControl:enableFindMeHidden'].value = ''
form_find_a['FindMeControl:applySet'].value = 'false'
Out [31]: ...

In [32]: m.submit_form(form_find_a)
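
Why the other buttons are dropped before submitting: a browser sends only the submit button that was actually clicked, and the server uses the button name present in the POST to decide what to do. The sketch below shows that filtering on a plain dict. The helper `keep_clicked_button` is hypothetical (it is not part of RoboBrowser), and the `__VIEWSTATE` value is a placeholder.

```python
def keep_clicked_button(fields, submit_names, clicked):
    """Return a copy of `fields` without the submit buttons that were not clicked."""
    return {
        name: value
        for name, value in fields.items()
        if name not in submit_names or name == clicked
    }


# A cut-down version of the FindMe.aspx form fields.
fields = {
    '__VIEWSTATE': 'opaque-state',  # placeholder, the real value comes from the page
    'FindMeControl:btnApply': 'Apply',
    'FindMeControl:btnSchedules': 'Schedules',
    'FindMeControl:btnPhoneLists': 'Phone Lists',
    'FindMeControl:applySet': 'false',
}
submit_names = {
    'FindMeControl:btnApply',
    'FindMeControl:btnSchedules',
    'FindMeControl:btnPhoneLists',
}

payload = keep_clicked_button(fields, submit_names, 'FindMeControl:btnPhoneLists')
print(sorted(payload))
```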

Verify that the page has updated and contains the "Work" list item:

In [33]: m.parsed.find('title')
Out[33]: <title>Phone Lists</title>

In [34]: m.parsed.find('a', id='PhoneListsControl_MasterDataControl_masterList__ctl0_SelectButton')
Out[34]: <a class="linkButtonItem" href="javascript:__doPostBack('PhoneListsControl$MasterDataControl$masterList$_ctl0$SelectButton','')" id="PhoneListsControl_MasterDataControl_masterList__ctl0_SelectButton" onclick="javascript:onClick();">Work</a>

Get the "PhoneLists.aspx" form, remove the unneeded fields, fill it in, and submit it:
In [35]: form_find_b = m.get_form(action='PhoneLists.aspx')

In [36]: phne_remove = (
    'PhoneListsControl:MasterDataControl:btnAdd',
    'PhoneListsControl:MasterDataControl:btnDelete',
    'PhoneListsControl:MasterDataControl:btnRename',
    'PhoneListsControl:MasterDataControl:btnCancel',
    'PhoneListsControl:MasterDataControl:btnEnter',
    'PhoneListsControl:MasterDataControl:btnUpdate',
    'PhoneListsControl:btnApply',
    'PhoneListsControl:btnBack')

In [37]: for i in phne_remove:
            form_find_b.fields.pop(i)

In [38]: form_find_b['PhoneListsControl:MasterDataControl:focusElement'].value = ''             
form_find_b['PhoneListsControl:MasterDataControl:hidChangedScheduleList'].value = ''
form_find_b['PhoneListsControl:MasterDataControl:hidScheduleXML'].value = ''
form_find_b['PhoneListsControl:MasterDataControl:hidSelectedScheduleGUID'].value = ''
form_find_b['PhoneListsControl:MasterDataControl:hidSelectedScheduleName'].value = ''
form_find_b['PhoneListsControl:MasterDataControl:hidbtnStatus'].value = ''
form_find_b['PhoneListsControl:MasterDataControl:masterList:_ctl0:itemGuid'].value = ''
form_find_b['PhoneListsControl:MasterDataControl:tempScheduleXML'].value = ''
form_find_b['PhoneListsControl:applySet'].value = 'false'

In [39]: m.submit_form(form_find_b)

Inspect the response to see whether the user list was loaded. In this case, it was not.

In [40]: m.parsed.findAll('div', id='PhoneListsControl_phoneListMembersText')
Out[40]: [<div class="displayText" id="PhoneListsControl_phoneListMembersText"></div>]

If it had succeeded, the above would have returned:
<div id="PhoneListsControl_phoneListMembersText" class="displayText" style="top: 315px; left: 281px;">&nbsp;&nbsp;Work&nbsp;&nbsp;</div>

along with the following inputs in the table (PhoneListsControl_phoneListDetail):
<input name="PhoneListsControl:phoneListDetail:_ctl2:number" type="text" value="95551234567" maxlength="50" id="PhoneListsControl_phoneListDetail__ctl2_number" onkeyup="enableApplyButton('PhoneListsControl_')" style="width:140px;">
...
<input name="PhoneListsControl:phoneListDetail:_ctl3:number" type="text" value="95551236789" maxlength="50" id="PhoneListsControl_phoneListDetail__ctl2_number" onkeyup="enableApplyButton('PhoneListsControl_')" style="width:140px;">
...

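On this project I found that RoboBrowser does not include all the form data needed for the POST to "PhoneLists.aspx" to work as expected ('__EVENTTARGET': 'PhoneListsControl$MasterDataControl$masterList$_ctl0$SelectButton' and '__EVENTARGUMENT'). Setting the remaining parameters and calling submit_form(form_find_b) does not achieve the desired result either. I was wondering whether add_field() from robobrowser.forms.form could be used, but I do not understand how to use it properly (for example, to add the __EVENTTARGET and __EVENTARGUMENT hidden input fields to the form).
Or am I overlooking something else, and RoboBrowser/Requests does not support this kind of POST? Does the form need JavaScript to be executed, as has been noted for mechanize?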
1 Answer

Solved

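After a lot of searching and reposting my request for help on Reddit, I stumbled upon a RoboBrowser issue that showed me how to use the "fields.add_field()" method properly, and the problem is now solved.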

For example:

import robobrowser.forms.fields

b_e_arg = robobrowser.forms.fields.Input('<input name="__EVENTARGUMENT" value="" />')

b_e_target = robobrowser.forms.fields.Input('<input name="__EVENTTARGET" value="PhoneListsControl$MasterDataControl$masterList$_ctl0$SelectButton" />')

In [30]: form_find_b.add_field(b_e_target)
In [31]: form_find_b.add_field(b_e_arg)
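
These two fields are what the page's JavaScript would normally fill in: `__doPostBack(target, argument)` copies its arguments into the hidden `__EVENTTARGET` and `__EVENTARGUMENT` inputs and then submits the form, so no JavaScript engine is needed as long as the POST carries the same values. A minimal sketch of pulling them out of the link's `href` with the standard library follows; the helper name `parse_postback` is hypothetical, and the regular expression assumes the simple single-quoted form shown in the question.

```python
import re

# Matches javascript:__doPostBack('target','argument') with single-quoted arguments.
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)',\s*'([^']*)'\)")


def parse_postback(href):
    """Return the hidden-field values that __doPostBack would set for this href."""
    match = POSTBACK_RE.search(href)
    if match is None:
        raise ValueError('no __doPostBack call found in: %r' % href)
    target, argument = match.groups()
    return {'__EVENTTARGET': target, '__EVENTARGUMENT': argument}


# The href of the "Work" link from the question.
href = ("javascript:__doPostBack("
        "'PhoneListsControl$MasterDataControl$masterList$_ctl0$SelectButton','')")
postback = parse_postback(href)
print(postback)
```

The resulting values are the same strings used for `b_e_target` and `b_e_arg`, so the same approach should carry over to other links on the page that call `__doPostBack`.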

Once the form has been updated with these values, submitting it to "PhoneLists.aspx" works as expected.

In [33]: m.submit_form(form_find_b)

In [34]: m.url
Out[34]: u'http://example.com/PhoneLists.aspx'

In [35]: m.parsed.findAll('div', id='PhoneListsControl_phoneListMembersText')
Out[35]: [<div class="displayText" id="PhoneListsControl_phoneListMembersText">  Work  </div>]

In [36]: m.parsed.findAll('input', id='PhoneListsControl_phoneListDetail__ctl2_number')
Out[36]: [<input id="PhoneListsControl_phoneListDetail__ctl2_number" maxlength="50" name="PhoneListsControl:phoneListDetail:_ctl2:number" onkeyup="enableApplyButton('PhoneListsControl_')" type="text" value="95551234567"/>]
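
For anyone using plain Requests instead of RoboBrowser, the same idea can be sketched without any form library: collect the inputs the page already contains (including `__VIEWSTATE` and `__EVENTVALIDATION`), then add the two postback fields by hand. The code below uses only the Python 3 standard library; the class and function names are hypothetical, the sample HTML is a made-up stand-in for the real page, and `<select>` and `<textarea>` elements are not handled.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name/value pairs of <input> elements, skipping buttons."""

    SKIP_TYPES = {'submit', 'button', 'image', 'reset'}

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attrs = dict(attrs)
        name = attrs.get('name')
        input_type = (attrs.get('type') or 'text').lower()
        if not name or input_type in self.SKIP_TYPES:
            return
        # Unchecked checkboxes and radio buttons are not sent by a browser.
        if input_type in ('checkbox', 'radio') and 'checked' not in attrs:
            return
        self.fields[name] = attrs.get('value') or ''


def build_postback(html, target, argument=''):
    """Build the POST data for an ASP.NET postback triggered by `target`."""
    collector = InputCollector()
    collector.feed(html)
    payload = dict(collector.fields)
    payload['__EVENTTARGET'] = target
    payload['__EVENTARGUMENT'] = argument
    return payload


# Made-up stand-in for the PhoneLists.aspx form.
sample = '''
<form action="PhoneLists.aspx" method="post">
  <input type="hidden" name="__VIEWSTATE" value="opaque-state" />
  <input type="hidden" name="__EVENTVALIDATION" value="opaque-validation" />
  <input type="hidden" name="PhoneListsControl:applySet" value="false" />
  <input type="submit" name="PhoneListsControl:btnApply" value="Apply" />
</form>
'''
payload = build_postback(
    sample, 'PhoneListsControl$MasterDataControl$masterList$_ctl0$SelectButton')
print(sorted(payload))
```

With Requests, the resulting dict would be passed as `data=payload` to `session.post(...)` on the same `requests.Session` that performed the login, so that the authentication cookies are sent along.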

I hope others who need to scrape ASPX sites will find this useful. Happy hacking!
