Python requests issue: Cloudflare error message "Please enable cookies"

6

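I planned to build a basic web scraper for Sneakersnstuff.com, but an error stopped me early. A request to https://www.sneakersnstuff.com/ returns neither the site's HTML nor a captcha challenge. Instead, it is redirected to a Cloudflare page showing the error message "Please enable cookies". My code and the response are below.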

import requests
import cfscrape


session = requests.session()

response = session.get('https://www.sneakersnstuff.com/')

print(response.text)  # prints the body shown below; response.headers would print only the headers

<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!-->
<html class="no-js" lang="en-US">
<!--<![endif]-->

<head>
    <title>Access denied | www.sneakersnstuff.com used Cloudflare to restrict access</title>
    <meta charset="UTF-8" />
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
    <meta name="robots" content="noindex, nofollow" />
    <meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1" />
    <link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css"
        media="screen,projection" />
    <!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" /><![endif]-->
    <style type="text/css">
        body {
            margin: 0;
            padding: 0
        }
    </style>


    <!--[if gte IE 10]><!-->
    <script type="text/javascript" src="/cdn-cgi/scripts/zepto.min.js"></script>
    <!--<![endif]-->
    <!--[if gte IE 10]><!-->
    <script type="text/javascript" src="/cdn-cgi/scripts/cf.common.js"></script>
    <!--<![endif]-->



</head>

<body>
    <div id="cf-wrapper">
        <div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please
            enable cookies.</div>
        <div id="cf-error-details" class="cf-error-details-wrapper">
            <div class="cf-wrapper cf-header cf-error-overview">
                <h1>
                    <span class="cf-error-type" data-translate="error">Error</span>
                    <span class="cf-error-code">1020</span>
                    <small class="heading-ray-id">Ray ID: 578133293d83e0d6 &bull; 2020-03-22 16:13:25 UTC</small>
                </h1>
                <h2 class="cf-subheadline">Access denied</h2>
            </div><!-- /.header -->

            <section></section><!-- spacer -->

            <div class="cf-section cf-wrapper">
                <div class="cf-columns two">
                    <div class="cf-column">
                        <h2 data-translate="what_happened">What happened?</h2>
                        <p>This website is using a security service to protect itself from online attacks.</p>

                    </div>



                </div>
            </div><!-- /.section -->

            <div class="cf-error-footer cf-wrapper">
                <p>
                    <span class="cf-footer-item">Cloudflare Ray ID: <strong>578133293d83e0d6</strong></span>
                    <span class="cf-footer-separator">&bull;</span>
                    <span class="cf-footer-item"><span>Your IP</span>: 96.241.108.243</span>
                    <span class="cf-footer-separator">&bull;</span>
                    <span class="cf-footer-item"><span>Performance &amp; security by</span> <a
                        href="https://www.cloudflare.com/5xx-error-landing?utm_source=error_footer" id="brand_link"
                        target="_blank">Cloudflare</a></span>

                </p>
            </div><!-- /.error-footer -->


        </div><!-- /#cf-error-details -->
    </div><!-- /#cf-wrapper -->

    <script type="text/javascript">
        window._cf_translation = {};


    </script>

</body>

</html>

I also tried a library called cfscrape, which many people recommend, but it did not work.
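The block page above carries its error code in a span with the class cf-error-code (1020 in this response). A minimal sketch, assuming that markup stays the same, of how a script could detect the block page instead of treating it as real content. The function name cloudflare_error_code is hypothetical and not part of requests or cfscrape.

```python
import re

# Matches the error code span seen in the block page above,
# e.g. <span class="cf-error-code">1020</span>
CF_ERROR_CODE = re.compile(r'class="cf-error-code">\s*(\d+)\s*<')


def cloudflare_error_code(html):
    """Return the Cloudflare error code found in html as an int, or None."""
    match = CF_ERROR_CODE.search(html)
    return int(match.group(1)) if match else None


sample = '<span class="cf-error-code">1020</span>'
print(cloudflare_error_code(sample))           # 1020
print(cloudflare_error_code("<html></html>"))  # None
```

Calling cloudflare_error_code(response.text) right after the request would make the failure explicit before any parsing is attempted.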


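When using requests, I solved my problem by supplying a supported User-Agent in the headers. The User-Agent I used before caused the problem. After changing it to Mozilla (https://dev59.com/Omkv5IYBdhLWcg3wbgQx) it works. Unfortunately, the response message did not really help in finding the cause. - clel
2 Answers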
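The User-Agent approach from the comment above could look roughly like this. It is a sketch only: the User-Agent string is an illustrative Firefox-style value, not the one the commenter used, and make_session is a hypothetical helper name. Whether the site accepts it is not guaranteed.

```python
import requests

# Illustrative browser-like User-Agent (an assumption, not taken from the thread).
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) "
    "Gecko/20100101 Firefox/74.0"
)


def make_session(user_agent=BROWSER_USER_AGENT):
    """Create a requests session that sends user_agent with every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


session = make_session()
print(session.headers["User-Agent"])

# Network call, left commented out here:
# response = session.get("https://www.sneakersnstuff.com/")
```

Setting the header on the session rather than on each call keeps it consistent across every request the scraper makes.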

7

Adding browser/User-Agent filtering to cloudscraper helped me.

import cloudscraper
from bs4 import BeautifulSoup

# Adding browser / User-Agent filtering should help, e.g.:

# will give you only desktop firefox User-Agents on Windows
scraper = cloudscraper.create_scraper(browser={'browser': 'firefox','platform': 'windows','mobile': False})

html = scraper.get("https://www.sneakersnstuff.com/").content

soup = BeautifulSoup(html, 'html.parser')

print(soup)

0
import cloudscraper
from bs4 import BeautifulSoup

scraper = cloudscraper.create_scraper()

html = scraper.get("https://www.sneakersnstuff.com/").content

soup = BeautifulSoup(html, 'html.parser')

print(soup)

Output:

cloudscraper.exceptions.CloudflareReCaptchaProvider: Cloudflare reCaptcha detected, unfortunately you haven't loaded an anti reCaptcha provider correctly via the 'recaptcha' parameter.

What's next?

Third-party reCaptcha solvers: description

cloudscraper currently supports the following third-party reCaptcha solvers, should you need them.

anticaptcha
deathbycaptcha
2captcha
9kw
return_response
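The exception above mentions a 'recaptcha' parameter. A hedged sketch of how one of the listed providers might be configured: the helper recaptcha_config is hypothetical, the api_key credential name is an assumption (credential names differ between providers), and newer cloudscraper releases may name the parameter differently, so check the documentation for the installed version.

```python
# Provider names copied from the list above.
SUPPORTED_PROVIDERS = {
    "anticaptcha", "deathbycaptcha", "2captcha", "9kw", "return_response",
}


def recaptcha_config(provider, **credentials):
    """Build the dict passed as the 'recaptcha' argument (hypothetical helper)."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported reCaptcha provider: {provider!r}")
    return {"provider": provider, **credentials}


config = recaptcha_config("2captcha", api_key="YOUR_API_KEY")  # placeholder key
print(config)

# Assumed usage, not run here (needs cloudscraper and a real solver account):
# import cloudscraper
# scraper = cloudscraper.create_scraper(recaptcha=config)
```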

1
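When I try this, I keep getting the error cloudscraper.exceptions.CloudflareCode1020: Cloudflare has blocked this request (Code 1020 Detected). - Chris Yun
@ChrisYun, your device has been blocked because of too many requests. - αԋɱҽԃ αмєяιcαη
How can this be fixed? - Chris Yun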
1
@ChrisYun You can use the requests library with proxies/SOCKS. - αԋɱҽԃ αмєяιcαη
Is there a local way to do this without using a proxy? - Chris Yun
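The proxy suggestion in the comments could be sketched as follows. The address 127.0.0.1:1080 is a placeholder, socks_proxies is a hypothetical helper, and SOCKS support in requests assumes the optional extra is installed (pip install requests[socks]). Routing through a proxy does not guarantee that the 1020 block is lifted.

```python
def socks_proxies(host, port, scheme="socks5h"):
    """Return a proxies mapping in the format requests accepts."""
    proxy_url = f"{scheme}://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


proxies = socks_proxies("127.0.0.1", 1080)  # placeholder proxy address
print(proxies)

# Assumed usage, not run here (needs requests[socks] and a reachable proxy):
# import requests
# response = requests.get("https://www.sneakersnstuff.com/", proxies=proxies)
```

The socks5h scheme asks the proxy to resolve hostnames as well, while socks5 resolves them locally.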
