Simulating an Ajax request with Python's requests library

Question


Why doesn't requests download the response content for this page?

#!/usr/bin/python

import requests

headers={ 'content-type':'application/x-www-form-urlencoded; charset=UTF-8',
     'Accept-Encoding': 'gzip, deflate',
     'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0',
     'Referer' : 'http://sportsbeta.ladbrokes.com/football',
    }

payload={'N': '4294966750',
     'facetCount_156%23327': '12',
     'facetCount_157%23325': '8',
     'form-trigger':'moreId',
     'moreId':'156%23327',
     'pageId':'p_football_home_page',
     'pageType':'EventClass',
     'type':'ajaxrequest'
     }

url='http://sportsbeta.ladbrokes.com/view/EventDetailPageComponentController'

r = requests.post(url, data=payload, headers=headers)

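These are the POST parameters I see in Firebug, and the response there contains a list (of football leagues), but when I run my Python script like this I get nothing back. You can see the request in Firefox by clicking "See All" in the Competitions section of the left-hand navigation bar on the linked page and watching the XHR in Firebug. The Firebug response shows the HTML body as expected. Does anyone have any ideas? Could the % signs in the payload be causing a problem?

Edit: tried using a session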

from requests import Request, Session

#turn post string into dict: 
def parsePOSTstring(POSTstr):
    paramList = POSTstr.split('&')
    paramDict = dict([param.split('=') for param in paramList])
    return paramDict

headers={'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0',
     'Referer' : 'http://sportsbeta.ladbrokes.com/football'
    }

#prep the data (POSTstr copied from Firebug raw source)
POSTstr = ("moreId=156%23327&facetCount_156%23327=12&event=&N=4294966750&pageType=EventClass&"
           "pageId=p_football_home_page&type=ajaxrequest&eventIDNav=&removedSelectionNav=&"
           "currentSelectedId=&form-trigger=moreId")
payload = parsePOSTstring(POSTstr)

#end url
url='http://sportsbeta.ladbrokes.com/view/EventDetailPageComponentController'

#start a session to manage cookies, and visit football page first so referer agrees
s = Session()
s.get('http://sportsbeta.ladbrokes.com/football')
#now visit desired url with headers/data
r = s.post(url, data=payload, headers=headers)

#print output
print(r.text) #this is empty

Working curl

curl 'http://sportsbeta.ladbrokes.com/view/EventDetailPageComponentController'
-H 'Cookie: JSESSIONID=DE93158F07E02DD3CC1CC32B1AA24A9E.ecomprodsw015;
    geoCode=FRA; 
    FLAGS=en|en|uk|default|ODDS|0|GBP;
    ECOM_BETA_SPORTS=1;
    PLAYED=4%7C0%7C0%7C0%7C0%7C0%7C0'
-H 'Referer: http://sportsbeta.ladbrokes.com/football'
-H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:27.0) 
    Gecko/20100101 Firefox/27.0'  
--data 'facetCount_157%23325=8&moreId=156%23327&
        facetCount_156%23327=12&event=&
        N=4294966750&
        pageType=EventClass&pageId=p_football_home_page&
        type=ajaxrequest&eventIDNav=&
        removedSelectionNav=&currentSelectedId=&
        form-trigger=moreId' --compressed

This curl command works.

- fpghost

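You need to visit http://sportsbeta.ladbrokes.com/football first (not the home page). Then it seems to work. You don't need any headers other than Referer and User-Agent. - Blender

@Blender I've updated my question with the minimal headers you suggested, and I now use a requests session to manage cookies and visit the football page first, since that is where the ajax request originates, but I still get an empty `r.text`. Does this code work for you? - fpghost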

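It works fine as long as you decode the percent-encoded characters properly (i.e. change %23 to # before the data is sent, or fix parsePOSTstring). I didn't see the problem because I used a dictionary from the start. - Blender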

@Blender You can use unquote from urllib.parse for this: urllib.parse.unquote('facetCount_157%23325') - Nivatius
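
To illustrate the two comments above, here is a minimal Python 3 sketch of a repaired parser. The name `parse_post_string` and the shortened sample string are illustrative, not from the original post; it assumes the raw body copied from Firebug is ordinary form-encoded data, which the standard library's `parse_qsl` can decode.

```python
from urllib.parse import parse_qsl, unquote

def parse_post_string(post_str):
    """Turn a raw, percent-encoded POST body into a dict of decoded strings."""
    # parse_qsl splits on '&' and '=' and percent-decodes keys and values.
    # keep_blank_values=True preserves empty fields such as "event=".
    return dict(parse_qsl(post_str, keep_blank_values=True))

# Shortened sample of the raw body (hypothetical subset of the Firebug string)
raw = "moreId=156%23327&facetCount_156%23327=12&event=&form-trigger=moreId"
payload = parse_post_string(raw)

print(payload["moreId"])                 # 156#327
print(unquote("facetCount_157%23325"))   # facetCount_157#325
```

The resulting dict can then be passed as `data=payload`, so the values are encoded only once when the request is sent.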

1 Answer

Content provided by Stack Overflow.

- Blender · Accepted Answer

Here is the most minimal working example I could come up with:

from requests import Session

session = Session()

# HEAD requests ask for *just* the headers, which is all you need to grab the
# session cookie
session.head('http://sportsbeta.ladbrokes.com/football')

response = session.post(
    url='http://sportsbeta.ladbrokes.com/view/EventDetailPageComponentController',
    data={
        'N': '4294966750',
        'form-trigger': 'moreId',
        'moreId': '156#327',
        'pageType': 'EventClass'
    },
    headers={
        'Referer': 'http://sportsbeta.ladbrokes.com/football'
    }
)

print(response.text)

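You weren't decoding the percent-encoded POST data before passing it to requests. The %23 copied from Firebug is the encoded form of #, so the dictionary value should be 156#327 rather than 156%23327; requests encodes it for you when it sends the form.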
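
The effect can be seen without touching the network. The sketch below uses the standard library's `urlencode` as a stand-in for the form encoding applied to a dict passed as `data=` (an assumption for illustration; the variable names are hypothetical). A value that is already percent-encoded gets encoded a second time, so the server never sees a `#`.

```python
from urllib.parse import urlencode

# Value pasted straight from Firebug: already percent-encoded
already_encoded = {"moreId": "156%23327"}
# Value as it should appear in the dict: decoded
decoded = {"moreId": "156#327"}

# The '%' itself is escaped to '%25', producing a double-encoded body
double_encoded_body = urlencode(already_encoded)
# The '#' is escaped once, matching what the browser sent
correct_body = urlencode(decoded)

print(double_encoded_body)  # moreId=156%2523327
print(correct_body)         # moreId=156%23327
```

Only the second body matches the `--data` string in the working curl command.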