boost::spirit::qi期望解析器和解析器分组的意外行为

Question

boost::spirit::qi期望解析器和解析器分组的意外行为

c++parsingboostboost-spiritboost-spirit-qi

4

我希望有人能够为我解答在灵活解析中如何使用>和>>运算符的疑惑。

我已经拥有一个工作的语法，其中顶层规则如下：

test = identifier >> operationRule >> repeat(1,3)[any_string] >> arrow >> any_string >> conditionRule;

它依赖于属性，以自动将解析值分配给融合适配的结构（即boost元组）。

然而，我知道一旦我们匹配了operationRule，我们必须继续或失败（即不希望允许回溯尝试以identifier开头的其他规则）。

test = identifier >> 
           operationRule > repeat(1,3)[any_string] > arrow > any_string > conditionRule;

这会导致一个晦涩的编译器错误（'boost::Container' : use of class template requires template argument list）。稍微调整一下，以下代码就可以编译：

test = identifier >> 
           (operationRule > repeat(1,3)[any_string] > arrow > any_string > conditionRule);

但是属性设置不再起作用——在解析后，我的数据结构包含垃圾。这可以通过添加像 [at_c<0>(_val) = _1] 这样的操作来修复，但这似乎有些笨重-而且根据 boost 文档，这会使事情变慢。

所以，我的问题是：

防止回溯是否值得？
为什么需要分组运算符 ()？
我上面的最后一个示例是否真的在匹配 operationRule 后停止回溯（我怀疑不是，如果 (...) 中的整个解析器失败，则似乎仍然允许回溯）？
如果前一个问题的答案是否定的，如何构建一条规则，如果未匹配 operation，则允许回溯，但一旦匹配了操作，则不允许回溯？
为什么分组运算符会破坏属性文法-需要操作？

我意识到这是一个相当广泛的问题-任何指向正确方向的提示将不胜感激！

- Zero

我的语法更完整的描述在之前的一个问题这里。我认为这与问题无关，所以我没有放进来。 - Zero

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- sehe · Accepted Answer

Is it worth preventing back-tracking?

Absolutely. Preventing back tracking in general is a proven way to improve parser performance.
- reduce the use of (negative) lookahead (operator !, operator - and some operator &)
- order branches (operator |, operator ||, operator^ and some operator */-/+) such that the most frequent/likely branch is ordered first, or that the most costly branch is tried last
Using expectation points (>) does not essentially reduce backtracking: it will just disallow it. This will enable targeted error messages, prevent useless 'parsing-into-the-unknown'.
Why do I need the grouping operator ()

I'm not sure. I had a check using my what_is_the_attr helpers from here
- ident >> op >> repeat(1,3)[any] >> "->" >> any
  synthesizes attribute:
```
fusion::vector4<string, string, vector<string>, string>
```
- ident >> op > repeat(1,3)[any] > "->" > any
  synthesizes attribute:
```
fusion::vector3<fusion::vector2<string, string>, vector<string>, string>
```
I haven't found the need to group subexpressions using parentheses (things compile), but obviously DataT needs to be modified to match the changed layout.
```
typedef boost::tuple<
    boost::tuple<std::string, std::string>, 
    std::vector<std::string>, 
    std::string
> DataT;
```

完整的代码如下所示，展示了我更喜欢使用调整后的结构体来完成这个任务。

我的上面的例子真的在匹配operationRule后停止回溯吗？（我怀疑不是，如果 (...) 内的整个解析器失败，回溯将被允许）

完全正确。如果未满足期望，则会抛出 qi::expectation_failure<> 异常。默认情况下，这将中止解析。您可以使用 qi::on_error 来重试、失败、接受或重新抛出。MiniXML example 有很好的关于使用带有 qi::on_error 的期望点的示例
如果上一个问题的答案是 /否/，~~如何构建一个规则，在 operation 没有匹配时允许回溯，但一旦匹配了 operation，就不再允许回溯？~~
为什么分组运算符会破坏属性文法 - 需要使用操作？

它并没有破坏属性文法，只是改变了暴露的类型。因此，如果将适当的属性引用绑定到规则/语法中，您就不需要语义操作。现在，我觉得应该有不使用分组的方法 ~~，所以让我尝试一下（最好是在您的简短自包含示例中）~~。事实上，我没有发现这样的需要。我已经添加了一个完整的示例，帮助您看到我的测试中发生了什么，并且没有使用语义操作。

完整代码

完整的代码展示了5个场景：

选项1：原始代码，没有期望值

(没有相关更改)
选项2：带有期望值

使用上面所示的修改后的DataT typedef
选项3：适应的结构体，没有期望值

使用BOOST_FUSION_ADAPT_STRUCT定义的用户自定义结构体
选项4：适应的结构体，带有期望值

修改来自选项3的适应结构体
选项5：前瞻性hack

这个选项利用一个“聪明的”（？）hack，将所有的>>转换为期望值，并在之前检测是否存在operationRule匹配。这当然不是最优解，但允许您保持DataT不变，并且不使用语义动作。

显然，在编译之前将OPTION定义为所需的值。

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/fusion/adapted.hpp>
#include <iostream>

namespace qi    = boost::spirit::qi; 
namespace karma = boost::spirit::karma; 

#ifndef OPTION
#define OPTION 5
#endif

#if OPTION == 1 || OPTION == 5 // original without expectations (OR lookahead hack)
    typedef boost::tuple<std::string, std::string, std::vector<std::string>, std::string> DataT;
#elif OPTION == 2 // with expectations
    typedef boost::tuple<boost::tuple<std::string, std::string>, std::vector<std::string>, std::string> DataT;
#elif OPTION == 3 // adapted struct, without expectations
    struct DataT
    {
        std::string identifier, operation;
        std::vector<std::string> values;
        std::string destination;
    };

    BOOST_FUSION_ADAPT_STRUCT(DataT, (std::string, identifier)(std::string, operation)(std::vector<std::string>, values)(std::string, destination));
#elif OPTION == 4 // adapted struct, with expectations
    struct IdOpT
    {
        std::string identifier, operation;
    };
    struct DataT
    {
        IdOpT idop;
        std::vector<std::string> values;
        std::string destination;
    };

    BOOST_FUSION_ADAPT_STRUCT(IdOpT, (std::string, identifier)(std::string, operation));
    BOOST_FUSION_ADAPT_STRUCT(DataT, (IdOpT, idop)(std::vector<std::string>, values)(std::string, destination));
#endif

template <typename Iterator>
struct test_parser : qi::grammar<Iterator, DataT(), qi::space_type, qi::locals<char> >
{
    test_parser() : test_parser::base_type(test, "test")
    {
        using namespace qi;

        quoted_string = 
               omit    [ char_("'\"") [_a =_1] ]             
            >> no_skip [ *(char_ - char_(_a))  ]
             > lit(_a); 

        any_string = quoted_string | +qi::alnum;

        identifier = lexeme [ alnum >> *graph ];

        operationRule = string("add") | "sub";
        arrow = "->";

#if OPTION == 1 || OPTION == 3   // without expectations
        test = identifier >> operationRule >> repeat(1,3)[any_string] >> arrow >> any_string;
#elif OPTION == 2 || OPTION == 4 // with expectations
        test = identifier >> operationRule  > repeat(1,3)[any_string]  > arrow  > any_string;
#elif OPTION == 5                // lookahead hack
        test = &(identifier >> operationRule) > identifier > operationRule > repeat(1,3)[any_string] > arrow > any_string;
#endif
    }

    qi::rule<Iterator, qi::space_type/*, qi::locals<char> */> arrow;
    qi::rule<Iterator, std::string(), qi::space_type/*, qi::locals<char> */> operationRule;
    qi::rule<Iterator, std::string(), qi::space_type/*, qi::locals<char> */> identifier;
    qi::rule<Iterator, std::string(), qi::space_type, qi::locals<char> > quoted_string, any_string;
    qi::rule<Iterator, DataT(),       qi::space_type, qi::locals<char> > test;
};

int main()
{
    std::string str("addx001 add 'str1'   \"str2\"       ->  \"str3\"");
    test_parser<std::string::const_iterator> grammar;
    std::string::const_iterator iter = str.begin();
    std::string::const_iterator end  = str.end();

    DataT data;
    bool r = phrase_parse(iter, end, grammar, qi::space, data);

    if (r)
    {
        using namespace karma;
        std::cout << "OPTION " << OPTION << ": " << str << " --> ";
#if OPTION == 1 || OPTION == 3 || OPTION == 5 // without expectations (OR lookahead hack)
        std::cout << format(delimit[auto_ << auto_ << '[' << auto_ << ']' << " --> " << auto_], data) << "\n";
#elif OPTION == 2 || OPTION == 4 // with expectations
        std::cout << format(delimit[auto_ << '[' << auto_ << ']' << " --> " << auto_], data) << "\n";
#endif
    }
    if (iter!=end)
        std::cout << "Remaining: " << std::string(iter,end) << "\n";
}

所有选项的输出：

for a in 1 2 3 4 5; do g++ -DOPTION=$a -I ~/custom/boost/ test.cpp -o test$a && ./test$a; done
OPTION 1: addx001 add 'str1'   "str2"       ->  "str3" --> addx001 add [ str1 str2 ]  -->  str3 
OPTION 2: addx001 add 'str1'   "str2"       ->  "str3" --> addx001 add [ str1 str2 ]  -->  str3 
OPTION 3: addx001 add 'str1'   "str2"       ->  "str3" --> addx001 add [ str1 str2 ]  -->  str3 
OPTION 4: addx001 add 'str1'   "str2"       ->  "str3" --> addx001 add [ str1 str2 ]  -->  str3 
OPTION 5: addx001 add 'str1'   "str2"       ->  "str3" --> addx001 add [ str1 str2 ]  -->  str3