How to capture network traffic with Python

Question

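I am trying to use Python to capture the HTTP(S) traffic between my computer and a website, including all requests and responses, such as images and external calls.

I have tried to find the network traffic inside my hit_site function, but the information I am looking for is not there.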

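import requests
from bs4 import BeautifulSoup

def hit_site(url):
    print(url)
    # No stream=True: the body is read once, so iter_lines() and r.text both work.
    r = requests.get(url)
    print(r)
    print(r.status_code)
    print(r.encoding)
    print(r.headers)          # response headers
    print(r.request.headers)  # headers sent with the request
    # r.json() and r.response were removed: the page is HTML, not JSON,
    # and a Response object has no "response" attribute.
    for line in r.iter_lines():
        print(line)
    data = r.text
    soup = BeautifulSoup(data, "html.parser")
    return soup

hit_site("http://www.google.com")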

Here is an example of the kind of information I want to capture (I used Fiddler2 to get it; all of this and more comes from visiting groupon.com):

# | Result | Protocol | Host | URL | Body | Caching | Content-Type | Process | Comments | Custom
6 | 200 | HTTP | www.groupon.com | / | 23,236 | private, max-age=0, no-cache, no-store, must-revalidate | text/html; charset=utf-8 | chrome:6080 | |
7 | 200 | HTTP | www.groupon.com | /homepage-assets/styles-6fca4e9f48.css | 6,766 | public, max-age=31369910 | text/css; charset=UTF-8 | chrome:6080 | |
8 | 200 | HTTP | Tunnel to | img.grouponcdn.com:443 | 0 | | | chrome:6080 | |
9 | 200 | HTTP | img.grouponcdn.com | /deal/gsPCLbbqioFVfvjT3qbBZo/The-Omni-Mount-Washington-Resort_01-960x582/v1/c550x332.jpg | 94,555 | public, max-age=315279127; Expires: Fri, 18 Oct 2024 22:20:20 GMT | image/jpeg | chrome:6080 | |
10 | 200 | HTTP | img.grouponcdn.com | /deal/d5YmjhxUBi2mgfCMoriV/pE-700x420/v1/c220x134.jpg | 17,832 | public, max-age=298601213; Expires: Mon, 08 Apr 2024 21:35:06 GMT | image/jpeg | chrome:6080 | |
11 | 200 | HTTP | www.groupon.com | /homepage-assets/main-fcfaf867e3.js | 9,604 | public, max-age=31369913 | application/javascript | chrome:6080 | |
12 | 200 | HTTP | www.groupon.com | /homepage-assets/locale.js?locale=en_US&country=US | 1,507 | public, max-age=994 | application/javascript | chrome:6080 | |
13 | 200 | HTTP | www.groupon.com | /tracky | 3 | | application/octet-stream | chrome:6080 | |
14 | 200 | HTTP | www.groupon.com | /cart/widget?consumerId=b577c9c2-4f07-11e4-8305-0025906127fe | 17 | private, max-age=0, no-cache, no-store, must-revalidate | application/json; charset=utf-8 | chrome:6080 | |
15 | 200 | HTTP | www.googletagmanager.com | /gtm.js?id=GTM-B76Z | 39,061 | private, max-age=911; Expires: Wed, 22 Oct 2014 20:48:14 GMT | text/javascript; charset=UTF-8 | chrome:6080 | |

I would greatly appreciate any ideas on how to capture network traffic with Python.

- maudulus

@CharlesDuffy Do you have any suggestions? - maudulus

Take a look at urllib2.build_opener(...) and HTTPHandler(debuglevel=1). - Kijewski
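
A minimal sketch of the suggestion above, assuming Python 3, where urllib2 has become urllib.request. The helper name build_debug_opener is made up for illustration; HTTPHandler, HTTPSHandler and build_opener are the standard-library calls.

```python
import urllib.request


def build_debug_opener():
    """Return an opener whose HTTP(S) handlers echo traffic to stdout."""
    # debuglevel=1 makes http.client print the request line and headers it
    # sends, plus the status line and headers it receives.
    http_handler = urllib.request.HTTPHandler(debuglevel=1)
    https_handler = urllib.request.HTTPSHandler(debuglevel=1)
    return urllib.request.build_opener(http_handler, https_handler)
```

Calling build_debug_opener().open("http://www.google.com") should then print the headers exchanged for that one request. It only covers requests made through this opener, so it will not show the images or scripts that a browser would go on to fetch.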

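"How do I capture network traffic with Python" is a good question. "How do I use Requests (HTTP for Humans) in Python to capture all HTTP(S) traffic between my computer and a website" is not. If the former is what you mean to ask, why not update the title? - Charles Duffy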

Take a look here: http://www.binarytides.com/code-a-packet-sniffer-in-python-with-pcapy-extension/ - Totem

1 Answer

- Charles Duffy · Accepted Answer

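dpkt is a widely used tool (written in Python) for parsing TCP traffic, including support for decoding the packets involved in SSL handshakes. Another Python tool for reading and decoding capture data is pypcapfile.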

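Note that decoding SSL traffic, including its payload, requires knowledge of the private key. For a third-party server you do not control, such as Google, this is a problem, and working around it takes considerable effort. One approach is to set up a proxy server that performs a man-in-the-middle attack using a private key you know (and to install its self-signed CA in the local trust store so that the browser accepts it).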
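
On the client side, the proxy approach might look like the sketch below, which routes a script's requests through an intercepting proxy and trusts that proxy's CA certificate. The helper name build_proxied_opener, the proxy address and the certificate path are all assumptions for illustration; the proxy itself (Fiddler, for instance) has to be set up separately.

```python
import ssl
import urllib.request


def build_proxied_opener(proxy_url, ca_file=None):
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    # With ca_file set, only certificates in that file are trusted, so the
    # proxy's self-signed CA is accepted. With ca_file=None the system
    # default trust store is used instead.
    context = ssl.create_default_context(cafile=ca_file)
    https_handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(proxy_handler, https_handler)
```

For example, `build_proxied_opener("http://127.0.0.1:8888", "proxy-ca.pem").open("https://www.google.com")` would send the request through a proxy assumed to be listening on local port 8888, with proxy-ca.pem standing in for the CA certificate exported from that proxy. The decrypted traffic is then visible in the proxy, not in the script.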