I am trying to fetch and decode the web page www.dealstan.com with cURL, using the following code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // Define target site
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return page in string
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2'); // Was $cr, an undefined handle
curl_setopt($ch, CURLOPT_ENCODING , "gzip");
curl_setopt($ch, CURLOPT_TIMEOUT,5);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects
$return = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
$html = str_get_html("$return");
echo $html;
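However, it prints garbage characters.
About 100 lines of output look like this: "��}{w�6����9�X�n���.........."
I looked at the response in hurl.it and noticed something interesting: judging by the headers, the HTML appears to be encoded twice (this is a guess).
Here is the response: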
200 OK 18.87 kB 490 ms
HEADERS
Cache-Control: max-age=0, no-cache
Cf-Ray: 18be7f54f8d80f1b-IAD
Connection: keep-alive
Content-Encoding: gzip, gzip ==============>? I suspect this one. Does anyone know?
Content-Type: text/html; charset=UTF-8
Date: Wed, 19 Nov 2014 18:33:39 GMT
Server: cloudflare-nginx
Set-Cookie: __cfduid=d1cff1e3134c5f32d2bddc10207bae0681416422019; expires=Thu, 19-Nov-15 18:33:39 GMT; path=/; domain=.dealstan.com; HttpOnly
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Page-Speed: 1.8.31.2-3973
X-Pingback: http://www.dealstan.com/xmlrpc.php
X-Powered-By: HHVM/3.2.0
BODY
H4sIAAAAAAAAA5V8Q5AoWrBk27Ztu/u2bdu2bdu2bdu2bds2583f/pjFVOQqozZnUxkVJ7PwoyAA/qeAb3y83LbYHs/3Hv79wKm/2N5cZyJVtCWu1xyteyzLNqYuWbdtHeELCyIZRRp/1Fe7es3+wL3Vfb
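Does anyone know how to decode a response that carries the header "Content-Encoding: gzip, gzip"?
The site loads fine in Firefox, Chrome, and other browsers, but I cannot decode it with cURL.
Please help me solve this.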
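For reference, here is a rough sketch of the kind of workaround I have in mind. It assumes the body really is gzip-compressed twice and that cURL strips only one of the two layers; I have not confirmed either. The helper name gunzip_all and its $maxLayers limit are my own, hypothetical; gzdecode() and gzencode() are the standard PHP zlib functions. The sample data is built locally, not taken from the site.

```php
<?php
// Hypothetical helper: keep removing gzip layers while the data still
// starts with the gzip magic bytes 0x1f 0x8b.
function gunzip_all(string $body, int $maxLayers = 5): string
{
    for ($i = 0; $i < $maxLayers && strncmp($body, "\x1f\x8b", 2) === 0; $i++) {
        $decoded = @gzdecode($body);
        if ($decoded === false) {
            break; // Not valid gzip after all; keep what we have
        }
        $body = $decoded;
    }
    return $body;
}

// Local check with a string that is deliberately gzipped twice
$sample = gzencode(gzencode('<html>hello</html>'));
echo gunzip_all($sample), "\n";
```

If the assumption holds, I would call gunzip_all($return) right after curl_exec() and pass the result to str_get_html(). Data that does not start with the gzip magic bytes is returned unchanged.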