boost::spirit如何从语义动作中访问位置迭代器

11

假设我有如下代码(行号仅供参考):

1:
2:function FuncName_1 {
3:    var Var_1 = 3;
4:    var  Var_2 = 4;
5:    ...

我想编写一个语法分析器,可以解析这样的文本,将所有标识符(函数和变量名)信息放入一棵树(utree?)中。 每个节点应保留:行号、列号和符号值。例如:

root: FuncName_1 (line:2,col:10)
  children[0]: Var_1 (line:3, col:8)
  children[1]: Var_1 (line:4, col:9)

我想将它放入树中,因为我计划遍历该树,并且对于每个节点,我必须知道上下文:(当前节点的所有父节点)。

例如,在处理具有Var_1的节点时,我必须知道这是函数FuncName_1的局部变量名称(当前正在作为节点处理,但在更早的一个级别)

我无法弄清楚一些事情

  1. 使用Spirit和语义动作以及utree是否可以完成此操作?或者应该使用variant <> trees?
  2. 如何同时传递这三个信息(列、行、符号名称)给节点?我知道我必须使用pos_iterator作为语法的迭代器类型,但要如何在语义动作中访问这些信息呢?

我是Boost的新手,所以我一遍又一遍地阅读Spirit文档,尝试在Google上搜索我的问题,但我似乎无法将所有碎片拼凑在一起以找到解决方案。看起来似乎没有人像我这样使用的用例(或者我只是找不到)。 似乎唯一具有位置迭代器的解决方案都是与解析错误处理有关的解决方案,但这不是我感兴趣的情况。 下面是仅解析我所说的代码的代码,但我不知道如何继续进行。

  #include <boost/spirit/include/qi.hpp>
  #include <boost/spirit/include/support_line_pos_iterator.hpp>

  namespace qi = boost::spirit::qi;
  typedef boost::spirit::line_pos_iterator<std::string::const_iterator> pos_iterator_t;

  template<typename Iterator=pos_iterator_t, typename Skipper=qi::space_type>
  struct ParseGrammar: public qi::grammar<Iterator, Skipper>
  {
        ParseGrammar():ParseGrammar::base_type(SourceCode)
        {
           using namespace qi;
           KeywordFunction = lit("function");
           KeywordVar    = lit("var");
           SemiColon     = lit(';');

           Identifier = lexeme [alpha >> *(alnum | '_')];
           VarAssignemnt = KeywordVar >> Identifier >> char_('=') >> int_ >> SemiColon;
           SourceCode = KeywordFunction >> Identifier >> '{' >> *VarAssignemnt >> '}';
        }

        qi::rule<Iterator, Skipper> SourceCode;
        qi::rule<Iterator > KeywordFunction;
        qi::rule<Iterator,  Skipper> VarAssignemnt;
        qi::rule<Iterator> KeywordVar;
        qi::rule<Iterator> SemiColon;
        qi::rule<Iterator > Identifier;
  };

  int main()
  {
     std::string const content = "function FuncName_1 {\n var Var_1 = 3;\n var  Var_2 = 4; }";

     pos_iterator_t first(content.begin()), iter = first, last(content.end());
     ParseGrammar<pos_iterator_t> resolver;    //  Our parser
     bool ok = phrase_parse(iter,
                            last,
                            resolver,
                            qi::space);

     std::cout << std::boolalpha;
     std::cout << "\nok : " << ok << std::endl;
     std::cout << "full   : " << (iter == last) << std::endl;
     if(ok && iter == last)
     {
        std::cout << "OK: Parsing fully succeeded\n\n";
     }
     else
     {
        int line   = get_line(iter);
        int column = get_column(first, iter);
        std::cout << "-------------------------\n";
        std::cout << "ERROR: Parsing failed or not complete\n";
        std::cout << "stopped at: " << line  << ":" << column << "\n";
        std::cout << "remaining: '" << std::string(iter, last) << "'\n";
        std::cout << "-------------------------\n";
     }
     return 0;
  }

关于 utree,它被广泛认为是一个不错的实验,但已经不再得到精神开发者的支持。引用一位开发者的话:“吸引人的地方在于只有一种AST类型需要处理,但这也是它的弱点:一种大小适合所有的方法几乎总是不好的。”(更多上下文请参见 [spirit-general] 邮件列表) - sehe
关于在解析器中携带上下文(例如用于语义分析)的“问题”,我的常规方法是在解析之后将其移动到AST的单独一遍中(以防止用语义操作“拖累”语法定义)。如果必须这样做,您可以使用phx::ref()和/或qi::localsSO搜索可能会找到使用示例)。 - sehe
1个回答

15
这是一个有趣的练习,我最终编写了一个可行的on_success[1]演示来注释AST节点。
假设我们想要一个类似这样的AST:
namespace ast
{
struct LocationInfo {
    unsigned line, column, length;
};

struct Identifier     : LocationInfo {
    std::string name;
};

struct VarAssignment  : LocationInfo {
    Identifier id;
    int value;
};

struct SourceCode     : LocationInfo {
    Identifier function;
    std::vector<VarAssignment> assignments;
};
}

我知道,“位置信息”对于SourceCode节点来说可能有点过头,但你懂的...总之,为了使其易于为这些节点分配属性而无需使用语义动作或大量特别制作的构造函数:

#include <boost/fusion/adapted/struct.hpp>
BOOST_FUSION_ADAPT_STRUCT(ast::Identifier,    (std::string, name))
BOOST_FUSION_ADAPT_STRUCT(ast::VarAssignment, (ast::Identifier, id)(int, value))
BOOST_FUSION_ADAPT_STRUCT(ast::SourceCode,    (ast::Identifier, function)(std::vector<ast::VarAssignment>, assignments))

好的,现在我们可以声明规则来公开这些属性:

qi::rule<Iterator, ast::SourceCode(),    Skipper> SourceCode;
qi::rule<Iterator, ast::VarAssignment(), Skipper> VarAssignment;
qi::rule<Iterator, ast::Identifier()>         Identifier;
// no skipper, no attributes:
qi::rule<Iterator> KeywordFunction, KeywordVar, SemiColon;

我们基本上不修改语法: 属性传播“只是自动的”。[2]
KeywordFunction = lit("function");
KeywordVar      = lit("var");
SemiColon       = lit(';');

Identifier      = as_string [ alpha >> *(alnum | char_("_")) ];
VarAssignment   = KeywordVar >> Identifier >> '=' >> int_ >> SemiColon; 
SourceCode      = KeywordFunction >> Identifier >> '{' >> *VarAssignment >> '}';

神奇之处

我们如何获取与节点相关联的源位置信息?

auto set_location_info = annotate(_val, _1, _3);
on_success(Identifier,    set_location_info);
on_success(VarAssignment, set_location_info);
on_success(SourceCode,    set_location_info);

现在,“annotate”只是一个可调用对象的简化版本,定义如下:
template<typename It>
struct annotation_f {
    typedef void result_type;

    annotation_f(It first) : first(first) {}
    It const first;

    template<typename Val, typename First, typename Last>
    void operator()(Val& v, First f, Last l) const {
        do_annotate(v, f, l, first);
    }
  private:
    void static do_annotate(ast::LocationInfo& li, It f, It l, It first) {
        using std::distance;
        li.line   = get_line(f);
        li.column = get_column(first, f);
        li.length = distance(f, l);
    }
    static void do_annotate(...) { }
};

由于 get_column 的工作方式,这个函数对象是有状态的(因为它记住了开始迭代器)[3]。正如你所看到的,do_annotate 只接受从 LocationInfo 派生的任何内容。

现在,来证明一下:

std::string const content = "function FuncName_1 {\n var Var_1 = 3;\n var  Var_2 = 4; }";

pos_iterator_t first(content.begin()), iter = first, last(content.end());
ParseGrammar<pos_iterator_t> resolver(first);    //  Our parser

ast::SourceCode program;
bool ok = phrase_parse(iter,
        last,
        resolver,
        qi::space,
        program);

std::cout << std::boolalpha;
std::cout << "ok  : " << ok << std::endl;
std::cout << "full: " << (iter == last) << std::endl;
if(ok && iter == last)
{
    std::cout << "OK: Parsing fully succeeded\n\n";

    std::cout << "Function name: " << program.function.name << " (see L" << program.printLoc() << ")\n";
    for (auto const& va : program.assignments)
        std::cout << "variable " << va.id.name << " assigned value " << va.value << " at L" << va.printLoc() << "\n";
}
else
{
    int line   = get_line(iter);
    int column = get_column(first, iter);
    std::cout << "-------------------------\n";
    std::cout << "ERROR: Parsing failed or not complete\n";
    std::cout << "stopped at: " << line  << ":" << column << "\n";
    std::cout << "remaining: '" << std::string(iter, last) << "'\n";
    std::cout << "-------------------------\n";
}

这会打印出:
ok  : true
full: true
OK: Parsing fully succeeded

Function name: FuncName_1 (see L1:1:56)
variable Var_1 assigned value 3 at L2:3:14
variable Var_2 assigned value 4 at L3:3:15

完整演示程序

Coliru上实时查看

同时还展示了:

  • error handling, e.g.:

    Error: expecting "=" in line 3: 
    
    var  Var_2 - 4; }
               ^---- here
    ok  : false
    full: false
    -------------------------
    ERROR: Parsing failed or not complete
    stopped at: 1:1
    remaining: 'function FuncName_1 {
    var Var_1 = 3;
    var  Var_2 - 4; }'
    -------------------------
    
  • BOOST_SPIRIT_DEBUG macros

  • A bit of a hacky way to conveniently stream the LocationInfo part of any AST node, sorry :)
//#define BOOST_SPIRIT_DEBUG
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/support_line_pos_iterator.hpp>
#include <iomanip>

namespace qi = boost::spirit::qi;
namespace phx= boost::phoenix;

typedef boost::spirit::line_pos_iterator<std::string::const_iterator> pos_iterator_t;

namespace ast
{
    namespace manip { struct LocationInfoPrinter; }

    struct LocationInfo {
        unsigned line, column, length;
        manip::LocationInfoPrinter printLoc() const;
    };

    struct Identifier     : LocationInfo {
        std::string name;
    };

    struct VarAssignment  : LocationInfo {
        Identifier id;
        int value;
    };

    struct SourceCode     : LocationInfo {
        Identifier function;
        std::vector<VarAssignment> assignments;
    };

    ///////////////////////////////////////////////////////////////////////////
    // Completely unnecessary tweak to get a "poor man's" io manipulator going
    // so we can do `std::cout << x.printLoc()` on types of `x` deriving from
    // LocationInfo
    namespace manip {
        struct LocationInfoPrinter {
            LocationInfoPrinter(LocationInfo const& ref) : ref(ref) {}
            LocationInfo const& ref;
            friend std::ostream& operator<<(std::ostream& os, LocationInfoPrinter const& lip) {
                return os << lip.ref.line << ':' << lip.ref.column << ':' << lip.ref.length;
            }
        };
    }

    manip::LocationInfoPrinter LocationInfo::printLoc() const { return { *this }; }
    // feel free to disregard this hack
    ///////////////////////////////////////////////////////////////////////////
}

BOOST_FUSION_ADAPT_STRUCT(ast::Identifier,    (std::string, name))
BOOST_FUSION_ADAPT_STRUCT(ast::VarAssignment, (ast::Identifier, id)(int, value))
BOOST_FUSION_ADAPT_STRUCT(ast::SourceCode,    (ast::Identifier, function)(std::vector<ast::VarAssignment>, assignments))

struct error_handler_f {
    typedef qi::error_handler_result result_type;
    template<typename T1, typename T2, typename T3, typename T4>
        qi::error_handler_result operator()(T1 b, T2 e, T3 where, T4 const& what) const {
            std::cerr << "Error: expecting " << what << " in line " << get_line(where) << ": \n" 
                << std::string(b,e) << "\n"
                << std::setw(std::distance(b, where)) << '^' << "---- here\n";
            return qi::fail;
        }
};

template<typename It>
struct annotation_f {
    typedef void result_type;

    annotation_f(It first) : first(first) {}
    It const first;

    template<typename Val, typename First, typename Last>
    void operator()(Val& v, First f, Last l) const {
        do_annotate(v, f, l, first);
    }
  private:
    void static do_annotate(ast::LocationInfo& li, It f, It l, It first) {
        using std::distance;
        li.line   = get_line(f);
        li.column = get_column(first, f);
        li.length = distance(f, l);
    }
    static void do_annotate(...) {}
};

template<typename Iterator=pos_iterator_t, typename Skipper=qi::space_type>
struct ParseGrammar: public qi::grammar<Iterator, ast::SourceCode(), Skipper>
{
    ParseGrammar(Iterator first) : 
        ParseGrammar::base_type(SourceCode),
        annotate(first)
    {
        using namespace qi;
        KeywordFunction = lit("function");
        KeywordVar      = lit("var");
        SemiColon       = lit(';');

        Identifier      = as_string [ alpha >> *(alnum | char_("_")) ];
        VarAssignment   = KeywordVar > Identifier > '=' > int_ > SemiColon; // note: expectation points
        SourceCode      = KeywordFunction >> Identifier >> '{' >> *VarAssignment >> '}';

        on_error<fail>(VarAssignment, handler(_1, _2, _3, _4));
        on_error<fail>(SourceCode, handler(_1, _2, _3, _4));

        auto set_location_info = annotate(_val, _1, _3);
        on_success(Identifier,    set_location_info);
        on_success(VarAssignment, set_location_info);
        on_success(SourceCode,    set_location_info);

        BOOST_SPIRIT_DEBUG_NODES((KeywordFunction)(KeywordVar)(SemiColon)(Identifier)(VarAssignment)(SourceCode))
    }

    phx::function<error_handler_f> handler;
    phx::function<annotation_f<Iterator>> annotate;

    qi::rule<Iterator, ast::SourceCode(),    Skipper> SourceCode;
    qi::rule<Iterator, ast::VarAssignment(), Skipper> VarAssignment;
    qi::rule<Iterator, ast::Identifier()>             Identifier;
    // no skipper, no attributes:
    qi::rule<Iterator> KeywordFunction, KeywordVar, SemiColon;
};

int main()
{
    std::string const content = "function FuncName_1 {\n var Var_1 = 3;\n var  Var_2 - 4; }";

    pos_iterator_t first(content.begin()), iter = first, last(content.end());
    ParseGrammar<pos_iterator_t> resolver(first);    //  Our parser

    ast::SourceCode program;
    bool ok = phrase_parse(iter,
            last,
            resolver,
            qi::space,
            program);

    std::cout << std::boolalpha;
    std::cout << "ok  : " << ok << std::endl;
    std::cout << "full: " << (iter == last) << std::endl;
    if(ok && iter == last)
    {
        std::cout << "OK: Parsing fully succeeded\n\n";

        std::cout << "Function name: " << program.function.name << " (see L" << program.printLoc() << ")\n";
        for (auto const& va : program.assignments)
            std::cout << "variable " << va.id.name << " assigned value " << va.value << " at L" << va.printLoc() << "\n";
    }
    else
    {
        int line   = get_line(iter);
        int column = get_column(first, iter);
        std::cout << "-------------------------\n";
        std::cout << "ERROR: Parsing failed or not complete\n";
        std::cout << "stopped at: " << line  << ":" << column << "\n";
        std::cout << "remaining: '" << std::string(iter, last) << "'\n";
        std::cout << "-------------------------\n";
    }
    return 0;
}

[1] 遗憾的是,除了魔术范例外,这些内容并未得到充分记录。

[2] 好吧,我使用as_string来获得适当的Identifier赋值,而不需要太多的工作。

[3] 在性能方面可能有更智能的方式,但现在让我们保持简单。


@Gregory81 如果没有使用on_success,这样做还可以,但在我看来,对于更大的语法规则,它并不是很“可扩展”。您需要考虑使用_语义动作_(请参见例如此答案,或者从此较大示例中获取的text_node_t构建器)。... - sehe
@Gregory81 我更喜欢将我的“咨询”限制在Stack Overflow上。这是因为我相信它对于寻找类似信息的未来用户最有用。此外,该系统是开放的,我经常会惊讶于其他人提出的优秀解决方案。每个人都受益。 - sehe
@sehe,1.好的,我理解你不想公开联系方式。2.我尝试编译了你提供的代码,似乎它使用了一些C++11特性,我的编译器(Visual Studio 2010)对某些结构不太满意。3.在你的代码中有几个我由于缺乏boost-spirit知识而不理解的东西。3.1什么是“expectation points”?你能指引我一些文档/教程/信息让我熟悉一下吗? - Grzegorz Wolszczak
@Gregory81 这是一个版本,可以在clang和g++中编译而无需使用-std=c++11,希望它也能在你的vs中工作。您可以在此处找到expectation parser的文档。它会在左侧参数成功且右侧参数失败时(例如,如果var后面没有标识符),引发异常。on_error允许您处理异常。 - llonesmiz
1
@sehe 感谢您的回答,它完美地概括了如何使用on_success,这在Conjure示例中可能有些难以理解,因为涉及到大量文件。 - llonesmiz
显示剩余5条评论

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接