Python lxml: getting data from a Steam bundle page - list index out of range error

Question

Tags: python, xpath, python-requests, lxml, steam

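I'm writing a Python program that takes a Steam bundle ID and returns the bundle's current price.

The program uses the requests and lxml libraries.

There are two XPaths that lead to the final price: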

/html/body/div[1]/div[7]/div[4]/div[1]/div[2]/div/div[2]/div[10]/div[3]/div
//*[@id="game_area_purchase"]/div/div/div/div[1]/div/div/div[2]

Example: https://store.steampowered.com/bundle/16140

Here is the code:

import requests
import lxml.html

# Example URL for a Steam bundle
URL = "https://store.steampowered.com/bundle/16140"

html = requests.get(URL)
doc = lxml.html.fromstring(html.content)

# XPath to the price location
price = doc.xpath('/html/body/div[1]/div[7]/div[4]/div[1]/div[2]/div/div[2]/div[10]/div[3]/div/text()')

print(price)

The program returns this:

[]

or this, when I index the result with [0]:

Traceback (most recent call last):
  File <path-to-program>, line 9, in <module>
    price = doc.xpath('/html/body/div[1]/div[7]/div[4]/div[1]/div[2]/div/div[2]/div[10]/div[3]/div/text()')[0]
IndexError: list index out of range

I get the error with both options. What should I do to fix it?

- ernikus

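When you request the page, requests does not return what you expect: it returns a page with a sexual content/nudity warning. You can try creating a requests.Session to send several requests within one session. Note that you need to send a POST request with your birth date. Also note that your approach should already work for other pages that don't require age verification. - JaSON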
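The empty list and the IndexError are the same symptom: the XPath matched nothing in the HTML that requests actually received, so there is no element [0] to read. A minimal sketch of a guard for that case follows. The helper name first_text and its default parameter are hypothetical, made up for this illustration; neither is part of lxml.

```python
def first_text(nodes, default=None):
    """Return the first non-empty, stripped string from an XPath result.

    `nodes` is the (possibly empty) list returned by a call such as
    doc.xpath('.../text()'). `default` is returned when nothing usable
    is found. Both names are hypothetical, chosen for this sketch.
    """
    for node in nodes:
        text = str(node).strip()
        if text:
            return text
    return default


# An XPath that matches nothing yields [], and [][0] raises IndexError.
# The helper returns the default instead of raising.
print(first_text([]))                   # None
print(first_text(["\n\t", " 9,99 "]))   # 9,99
```

With something like price = first_text(doc.xpath(...)), a result of None signals that the page you received is not the page you see in the browser, which is the situation described in the comment above.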

1 Answer

- JaSON · Accepted Answer

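To get the HTML of the page you actually want, you need to send a request that includes a cookie named "birthtime". This cookie "tells" the server that you are old enough to view pages with sexual content or nudity.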

import requests
import lxml.html

URL = "https://store.steampowered.com/bundle/16140"

# Send both requests through one session.
session = requests.Session()
r1 = session.get(URL)
# Birth date in seconds since the Unix epoch (January 1, 1970).
r1.cookies['birthtime'] = '439423201'
r2 = session.get(URL, cookies=r1.cookies)

doc = lxml.html.fromstring(r2.content)
# Match the price by class name rather than by an absolute path.
print(doc.xpath('//div[contains(@class, "discount_final_price")]/text()')[0])
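If you would rather derive the birthtime value from a date than hard-code it, something like the sketch below should do. The function name birthtime_cookie_value is hypothetical, and using midnight UTC is an assumption: the answer only says the value is a count of seconds since the epoch, not which time zone the server expects.

```python
from datetime import datetime, timezone


def birthtime_cookie_value(year, month, day):
    """Return midnight UTC on the given date as a string of epoch seconds.

    Hypothetical helper; the choice of UTC is an assumption.
    """
    moment = datetime(year, month, day, tzinfo=timezone.utc)
    return str(int(moment.timestamp()))


print(birthtime_cookie_value(1970, 1, 1))  # 0
print(birthtime_cookie_value(1990, 1, 1))  # 631152000
```

The returned string can be used in place of the literal '439423201' above. Any date far enough in the past should serve the same purpose.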