C++: extracting data from a string

Question

3

Is there an elegant way to extract data from a string (perhaps using the boost library)?

Content-Type: text/plain
Content-Length: 15
Content-Date: 2/5/2013
Content-Request: Save

hello world

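Suppose I have the string above and want to extract all of the fields, including the "hello world" text. Can someone point me in the right direction?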

- marcwho

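You're looking for some kind of parser. Can you describe the expected format of the string? Is it always 6 lines? Always those four field names? Is the fifth line always empty? - Drew Dormann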

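More information about the format: some of these fields are optional, so they may be omitted. However, assume that each field is on its own line, and that all the fields are followed by an empty line and then the actual content. - marcwho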

8 Answers

4

Here is a fairly compact parser written in C: https://github.com/openwebos/nodejs/blob/master/deps/http_parser/http_parser.c

- Markus Schumann

2

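There are several solutions. If the format is that simple, you can read the file line by line. If a line starts with a keyword, you can simply split it to get the value. If not, the line itself is the value. This can be done easily, and quite elegantly, with the STL.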
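A minimal sketch of the line-by-line approach, assuming the format described in the question's comments: one "Key: value" field per line, then an empty line, then the content. The `Message` struct and `parseMessage` function are hypothetical names introduced only for this illustration.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical container for the parsed result; not part of the question.
struct Message {
    std::map<std::string, std::string> fields;
    std::string body;
};

// Reads "Key: value" lines until the first empty line, then treats
// everything that follows as the body.
Message parseMessage(const std::string& text)
{
    Message msg;
    std::istringstream in(text);
    std::string line;

    while (std::getline(in, line) && !line.empty()) {
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos)
            continue; // not a "Key: value" line, skip it

        std::string value = line.substr(colon + 1);
        // Drop the blanks that follow the colon.
        value.erase(0, value.find_first_not_of(" \t"));
        msg.fields[line.substr(0, colon)] = value;
    }

    // Whatever is left in the stream is the body.
    std::ostringstream rest;
    rest << in.rdbuf();
    msg.body = rest.str();
    return msg;
}
```

Because the fields end up in a map, optional fields need no special handling: a field that was omitted is simply absent from `msg.fields`.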

If the grammar is more complex, and since you added the boost tag, you could consider using Boost Spirit to parse it and get the values out of it.

- Baptiste Wicht

2

I think the simplest solution is to use regular expressions. There is a standard regex in C++11, and regex libraries can be found in boost as well.

- Artem Sobolev

1

I've found Boost.Xpressive very useful for this kind of thing. - Bob Murphy

1

If you want to write the parsing code yourself, start by looking at the HTTP specification. It gives you the grammar:

    generic-message = start-line
                      *(message-header CRLF)
                      CRLF
                      [ message-body ]
    start-line      = Request-Line | Status-Line

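So I would first use split() on CRLF to break the message into its component lines. Then you can iterate over the resulting vector. Until you reach an empty element, you are parsing headers, so you split each line on the first ":" to get the key and the value.

Once you reach the empty element, you are parsing the response body.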

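Warning: I've done this myself in the past, and I can tell you that not all web servers are consistent about line endings (you may find only CR or only LF), and not all browsers and other abstraction layers are consistent in what they pass on to you. So you may find extra CRLFs where you don't expect them, or no CRLF where you do expect one. Good luck.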
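Given the warning about line endings, here is a rough sketch of a splitter that accepts CRLF, a lone LF, or a lone CR as a line terminator. `splitLines` is a hypothetical helper written for this illustration, not a standard or Boost function.

```cpp
#include <string>
#include <vector>

// Splits text into lines, accepting "\r\n", "\n" or a lone "\r"
// as a line ending. Empty lines are kept as empty elements.
std::vector<std::string> splitLines(const std::string& text)
{
    std::vector<std::string> lines;
    std::string current;

    for (std::string::size_type i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\r' || c == '\n') {
            // Treat "\r\n" as one terminator, not two.
            if (c == '\r' && i + 1 < text.size() && text[i + 1] == '\n')
                ++i;
            lines.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }

    if (!current.empty())
        lines.push_back(current); // last line without a terminator
    return lines;
}
```

The first empty element in the result marks the end of the headers. Note that this sketch also splits the body into lines, so a body that must be preserved byte for byte should be taken from the original string instead.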

- i_am_jorf

1

You can use string::find with a space to locate the position of each value, then copy from that position until you find '\n'.
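A short sketch of that idea, assuming each field is written as the name followed by ": ". The `findField` function is a hypothetical helper, not something from the question.

```cpp
#include <string>

// Returns the text that follows "<name>: " up to the end of that line,
// or an empty string if the field is not present.
std::string findField(const std::string& text, const std::string& name)
{
    const std::string key = name + ": ";
    std::string::size_type start = text.find(key);
    if (start == std::string::npos)
        return std::string();

    start += key.size();
    const std::string::size_type end = text.find('\n', start);
    if (end == std::string::npos)
        return text.substr(start); // field is on the last line
    return text.substr(start, end - start);
}
```

This is the least robust of the approaches here: a plain find will also match the key if it happens to appear inside another field's value or in the content.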

- rubbyrubber

0

If you have access to C++11, you can use std::regex (http://en.cppreference.com/w/cpp/regex).

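#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::string input = "Content-Type: text/plain";
    std::regex contentTypeRegex("Content-Type: (.+)");

    std::smatch match;

    // regex_match succeeds only if the whole input matches the pattern
    if (std::regex_match(input, match, contentTypeRegex)) {
        std::ssub_match contentTypeMatch = match[1]; // first capture group
        std::string contentType = contentTypeMatch.str();
        std::cout << contentType;
    }
    // else: not found
}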

A version that compiles and runs is here: http://ideone.com/QTJrue

This regex is a very simplified example, but the principle is the same for multiple fields.

- Robert Prior

0

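If you are prepared to unroll the loop manually, you can use std::istringstream and the regular overloads of the extraction operator (with appropriate manipulators, such as get_time() for handling dates) to extract the data in a simple way.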
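A rough sketch of the unrolled approach for two of the fields. Instead of get_time(), it reads the date as three integers separated by '/', which avoids depending on how a given standard library parses single-digit fields. The `Info` struct and `parseInfo` function are hypothetical, and reading "2/5/2013" as month/day/year is an assumption: the question does not say which order is meant.

```cpp
#include <sstream>
#include <string>

// Hypothetical result type. Month/day order is an assumption.
struct Info {
    int length;
    int month, day, year;
};

// Expects "Content-Length: <int>" followed by "Content-Date: <m>/<d>/<y>",
// one extraction sequence per field (the "unrolled loop").
bool parseInfo(const std::string& text, Info& out)
{
    std::istringstream in(text);
    std::string label;
    char slash1 = 0, slash2 = 0;
    Info r = Info();

    // Field 1: the label is read up to the blank, then the number.
    if (!(in >> label >> r.length) || label != "Content-Length:")
        return false;

    // Field 2: three integers with a '/' between them.
    if (!(in >> label >> r.month >> slash1 >> r.day >> slash2 >> r.year)
        || label != "Content-Date:" || slash1 != '/' || slash2 != '/')
        return false;

    out = r;
    return true;
}
```

The downside is visible in the code: the fields must arrive in a fixed order, so this does not fit well when some of them are optional.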

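Another possibility is to use std::regex to match all patterns of the form <string>:<string> and iterate over all the matches (the egrep grammar looks promising if you have several lines to process).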
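A minimal sketch of iterating over all matches with std::sregex_iterator. It uses the default ECMAScript grammar rather than egrep, and it assumes the fields end at the first empty line. `matchFields` is a hypothetical name used only here.

```cpp
#include <map>
#include <regex>
#include <string>

// Collects every "name: value" pair found before the first empty line.
std::map<std::string, std::string> matchFields(const std::string& text)
{
    // Only search the field part; if there is no empty line,
    // find() returns npos and substr() keeps the whole text.
    const std::string fields = text.substr(0, text.find("\n\n"));

    // Group 1: field name (no ':' or newline).
    // Group 2: the rest of the line, which may itself contain ':'.
    // The "\n" here is a literal newline character inside the pattern.
    static const std::regex pattern("([^:\n]+): ?([^\n]*)");

    std::map<std::string, std::string> result;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(fields.begin(), fields.end(), pattern);
         it != end; ++it) {
        result[(*it)[1].str()] = (*it)[2].str();
    }
    return result;
}
```

Cutting the input at the first empty line keeps a colon in the content from being mistaken for a field.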

Or, if you want to go the hard way and your strings have a specific syntax, you could use Boost.Spirit to define your grammar and generate a parser.

- Andy Prowl

- sehe · Accepted Answer

Try

http://pocoproject.org/

Comes with HTTPServer and Client implementations

http://cpp-netlib.github.com/

Comes with request/response handling

Boost Spirit demo: http://liveworkspace.org/code/3K5TzT

You'd have to specify a simple grammar (or complex, if you wanted to 'catch' all the subtleties of HTTP)

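#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>

// Standard headers for the types used below
#include <iostream>
#include <map>
#include <string>
#include <vector>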

typedef std::map<std::string, std::string> Headers;
typedef std::pair<std::string, std::string> Header;
struct Request { Headers headers; std::vector<char> content; };

BOOST_FUSION_ADAPT_STRUCT(Request, (Headers, headers)(std::vector<char>, content))

namespace qi    = boost::spirit::qi;
namespace karma = boost::spirit::karma;

template <typename It, typename Skipper = qi::blank_type>
    struct parser : qi::grammar<It, Request(), Skipper>
{
    parser() : parser::base_type(start)
    {
        using namespace qi;

        header = +~char_(":\n") > ": " > *(char_ - eol);
        start = header % eol >> eol >> eol >> *char_;
    }

  private:
    qi::rule<It, Header(),  Skipper> header;
    qi::rule<It, Request(), Skipper> start;
};

bool doParse(const std::string& input)
{
    auto f(begin(input)), l(end(input));

    parser<decltype(f), qi::blank_type> p;
    Request data;

    try
    {
        bool ok = qi::phrase_parse(f,l,p,qi::blank,data);
        if (ok)   
        {
            std::cout << "parse success\n";
            std::cout << "data: " << karma::format_delimited(karma::auto_, ' ', data) << "\n";
        }
        else      std::cerr << "parse failed: '" << std::string(f,l) << "'\n";

        if (f!=l) std::cerr << "trailing unparsed: '" << std::string(f,l) << "'\n";
        return ok;
    } catch(const qi::expectation_failure<decltype(f)>& e)
    {
        std::string frag(e.first, e.last);
        std::cerr << e.what() << "'" << frag << "'\n";
    }

    return false;
}

int main()
{
    const std::string input = 
        "Content-Type: text/plain\n"
        "Content-Length: 15\n"
        "Content-Date: 2/5/2013\n"
        "Content-Request: Save\n"
        "\n"
        "hello world";

    bool ok = doParse(input);

    return ok? 0 : 255;
}