Parsing raw HTTP requests

9

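I am working with an HTTP traffic data set made up of complete POST and GET requests like the one shown below. I have written Java code that separates the requests and stores each one as a String element in an ArrayList.

Now I am unsure how to parse these raw HTTP requests in Java. Is there a better way than parsing them by hand?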
GET http://localhost:8080/tienda1/imagenes/3.gif/ HTTP/1.1
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.8 (like Gecko)
Pragma: no-cache
Cache-control: no-cache
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Encoding: x-gzip, x-deflate, gzip, deflate
Accept-Charset: utf-8, utf-8;q=0.5, *;q=0.5
Accept-Language: en
Host: localhost:8080
Cookie: JSESSIONID=FB018FFB06011CFABD60D8E8AD58CA21
Connection: close

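Where do you need to parse them? In a Servlet or similar technology, or in a plain Java class? - kosa
1
Where does the data come from, and what do you need to parse out of it? - Perception
2
If you really need to work with HTTP directly and this is not for a class, I strongly recommend using something like Apache Commons HttpClient. There are many pitfalls in implementing it yourself (chunked transfer encoding, for example). - Alan Krueger
I am currently using Apache Commons, but so far without any success. Do I need to convert the raw request strings to make it work? - Ali Ahmad
1
@AliAhmad - What exactly are you trying to achieve? If you are using the HttpClient classes, you do not need to parse the HTTP data stream manually. - Perception
1
You asked how to parse HTTP, but that could mean many things depending on what you want to extract from the raw stream. Without a statement of your ultimate goal, this question becomes "not constructive". - Jim Garrison
3 Answers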
3个回答

22

Here is a general-purpose HTTP request parser that works for all method types (GET, POST, etc.), for your convenience:

package util.dpi.capture;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Hashtable;

/**
 * Class for HTTP request parsing as defined by RFC 2616:
 * 
 * <pre>
 * Request = Request-Line              ; Section 5.1
 *           *(( general-header        ; Section 4.5
 *            | request-header         ; Section 5.3
 *            | entity-header ) CRLF)  ; Section 7.1
 *           CRLF
 *           [ message-body ]          ; Section 4.3
 * </pre>
 * 
 * @author izelaya
 *
 */
public class HttpRequestParser {

    private String _requestLine;
    private Hashtable<String, String> _requestHeaders;
    private StringBuffer _messagetBody;

    public HttpRequestParser() {
        _requestHeaders = new Hashtable<String, String>();
        _messagetBody = new StringBuffer();
    }

    /**
     * Parse an HTTP request.
     * 
     * @param request
     *            String holding http request.
     * @throws IOException
     *             If an I/O error occurs reading the input stream.
     * @throws HttpFormatException
     *             If HTTP Request is malformed
     */
    public void parseRequest(String request) throws IOException, HttpFormatException {
        BufferedReader reader = new BufferedReader(new StringReader(request));

        setRequestLine(reader.readLine()); // Request-Line ; Section 5.1

        String header = reader.readLine();
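        // readLine() returns null at end of input, so guard against a request
        // that ends without the empty line that terminates the headers.
        while (header != null && header.length() > 0) {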
            appendHeaderParameter(header);
            header = reader.readLine();
        }

        String bodyLine = reader.readLine();
        while (bodyLine != null) {
            appendMessageBody(bodyLine);
            bodyLine = reader.readLine();
        }

    }

    /**
     * 
     * 5.1 Request-Line The Request-Line begins with a method token, followed by
     * the Request-URI and the protocol version, and ending with CRLF. The
     * elements are separated by SP characters. No CR or LF is allowed except in
     * the final CRLF sequence.
     * 
     * @return String with Request-Line
     */
    public String getRequestLine() {
        return _requestLine;
    }

    private void setRequestLine(String requestLine) throws HttpFormatException {
        if (requestLine == null || requestLine.length() == 0) {
            throw new HttpFormatException("Invalid Request-Line: " + requestLine);
        }
        _requestLine = requestLine;
    }

    private void appendHeaderParameter(String header) throws HttpFormatException {
        int idx = header.indexOf(":");
        if (idx == -1) {
            throw new HttpFormatException("Invalid Header Parameter: " + header);
        }
        // Trim so that "Host: localhost:8080" yields "localhost:8080" without the leading space.
        _requestHeaders.put(header.substring(0, idx).trim(), header.substring(idx + 1).trim());
    }

    /**
     * The message-body (if any) of an HTTP message is used to carry the
     * entity-body associated with the request or response. The message-body
     * differs from the entity-body only when a transfer-coding has been
     * applied, as indicated by the Transfer-Encoding header field (section
     * 14.41).
     * @return String with message-body
     */
    public String getMessageBody() {
        return _messagetBody.toString();
    }

    private void appendMessageBody(String bodyLine) {
        _messagetBody.append(bodyLine).append("\r\n");
    }

    /**
     * For list of available headers refer to sections: 4.5, 5.3, 7.1 of RFC 2616
     * @param headerName Name of header
     * @return String with the value of the header or null if not found.
     */
    public String getHeaderParam(String headerName){
        return _requestHeaders.get(headerName);
    }
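}

// HttpFormatException is used above but was not included in the answer.
// This minimal definition is an assumption that lets the class compile.
class HttpFormatException extends Exception {
    public HttpFormatException(String message) {
        super(message);
    }
}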
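
One limitation of the class above is that getHeaderParam looks header names up case-sensitively, whereas HTTP header names are case-insensitive (the sample request sends "Cache-control", for instance). A minimal sketch of one way around this, assuming a hypothetical helper class HeaderSketch that is not part of the answer's code and that stores headers in a TreeMap ordered by String.CASE_INSENSITIVE_ORDER:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of the original answer.
public class HeaderSketch {

    /**
     * Parses the header lines of a raw request (everything between the
     * request line and the first empty line) into a case-insensitive map.
     * A repeated header name overwrites the earlier value in this sketch.
     */
    public static Map<String, String> parseHeaders(String rawRequest) {
        Map<String, String> headers =
                new TreeMap<String, String>(String.CASE_INSENSITIVE_ORDER);
        // Accept both CRLF and bare LF line endings, since data sets vary.
        String[] lines = rawRequest.split("\r?\n", -1);
        for (int i = 1; i < lines.length; i++) { // lines[0] is the request line
            String line = lines[i];
            if (line.isEmpty()) {
                break; // the empty line ends the header section
            }
            int idx = line.indexOf(':');
            if (idx <= 0) {
                throw new IllegalArgumentException("Invalid header line: " + line);
            }
            headers.put(line.substring(0, idx).trim(), line.substring(idx + 1).trim());
        }
        return headers;
    }

    public static void main(String[] args) {
        String raw = "GET http://localhost:8080/tienda1/imagenes/3.gif/ HTTP/1.1\r\n"
                + "Cache-control: no-cache\r\n"
                + "Host: localhost:8080\r\n"
                + "\r\n";
        Map<String, String> headers = parseHeaders(raw);
        System.out.println(headers.get("host"));
        System.out.println(headers.get("CACHE-CONTROL"));
    }
}
```

Whether the last or the first value should win for a repeated header depends on what you want to extract from the data set, so treat the overwrite behaviour as a placeholder.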

8
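"I am working with an HTTP traffic data set made up of complete POST and GET requests."
So you want to parse a file or list that contains multiple HTTP requests. What data do you want to extract? In any case, here is a Java HTTP parsing class (linked in the original answer) that can read the method, version and URI from the request line and reads all of the headers into a Hashtable.
You can use that class, or write one yourself if you feel like reinventing the wheel. Look at the RFC to see what a request looks like so that you can parse it correctly: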
Request       = Request-Line              ; Section 5.1
                    *(( general-header        ; Section 4.5
                     | request-header         ; Section 5.3
                     | entity-header ) CRLF)  ; Section 7.1
                    CRLF
                    [ message-body ]          ; Section 4.3
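
Per Section 5.1 of the RFC, the Request-Line consists of the method, the Request-URI and the HTTP version, separated by single spaces. A minimal sketch of splitting it apart, assuming a hypothetical class RequestLineSketch that is not the class linked above:

```java
// Hypothetical helper for illustration; not the class linked in the answer.
public class RequestLineSketch {

    public final String method;
    public final String uri;
    public final String version;

    private RequestLineSketch(String method, String uri, String version) {
        this.method = method;
        this.uri = uri;
        this.version = version;
    }

    /** Splits "Method SP Request-URI SP HTTP-Version" into its three parts. */
    public static RequestLineSketch parse(String requestLine) {
        if (requestLine == null) {
            throw new IllegalArgumentException("Request-Line is null");
        }
        String[] parts = requestLine.split(" ");
        if (parts.length != 3 || !parts[2].startsWith("HTTP/")) {
            throw new IllegalArgumentException("Invalid Request-Line: " + requestLine);
        }
        return new RequestLineSketch(parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        RequestLineSketch line =
                parse("GET http://localhost:8080/tienda1/imagenes/3.gif/ HTTP/1.1");
        System.out.println(line.method);
        System.out.println(line.uri);
        System.out.println(line.version);
    }
}
```

The sketch rejects anything that does not have exactly three space-separated parts. For a traffic data set that may contain malformed requests, you might prefer to record the failure and continue rather than throw.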

5
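If you just want to send the request as is, that is easy: simply send the actual string over a TCP socket, like this:
    // host, port and request are defined elsewhere; getContents(request) is a
    // helper from the full code (not shown here) that yields the request's lines.
    Socket socket = new Socket(host, port);

    BufferedWriter out = new BufferedWriter(
            new OutputStreamWriter(socket.getOutputStream(), "UTF8"));

    for (String line : getContents(request)) {
        System.out.println(line);      // echo each line for debugging
        out.write(line + "\r\n");      // HTTP lines are terminated by CRLF
    }

    out.write("\r\n");                 // the empty line ends the header section
    out.flush();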
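
The snippet above calls getContents(request), whose definition is not shown in this answer. A minimal sketch of what such a helper might look like, assuming request is a single raw request held in a String; the class name RawRequestLines and the behaviour of dropping trailing empty lines are assumptions, not the blog post's code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the getContents helper used above.
public class RawRequestLines {

    /**
     * Splits a raw request into lines without their line terminators.
     * Trailing empty lines are dropped, because the sending code writes
     * the final empty line itself.
     */
    public static List<String> getContents(String request) {
        List<String> lines = new ArrayList<String>();
        // Accept both CRLF and bare LF line endings.
        for (String line : request.split("\r?\n", -1)) {
            lines.add(line);
        }
        while (!lines.isEmpty() && lines.get(lines.size() - 1).isEmpty()) {
            lines.remove(lines.size() - 1);
        }
        return lines;
    }

    public static void main(String[] args) {
        String raw = "GET http://localhost:8080/tienda1/imagenes/3.gif/ HTTP/1.1\r\n"
                + "Host: localhost:8080\r\n"
                + "Connection: close\r\n"
                + "\r\n";
        for (String line : getContents(raw)) {
            System.out.println(line);
        }
    }
}
```

This fits bodyless requests such as the GET in the question. For a POST, writing the body line by line with an extra CRLF appended can disagree with the Content-Length header, so the body would need to be sent unchanged instead.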

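See JoeJag's blog post for the complete code (linked in the original answer).

Update: I have since started a project called RawHTTP, which provides parsers for HTTP requests, responses, headers and so on. It turned out well enough that HTTP servers and clients can easily be written on top of it. If you are looking for something low-level, take a look at it (linked in the original answer).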
