Converting AST nodes into vectors/numbers


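Hello, I have a collection of abstract syntax trees (ASTs) for some Python 3 code. I have spent several days trying to work out the best way to convert the nodes into a usable vector/number representation.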

For example, here is a dump of one AST (without field annotations):

Module([Import([alias('time', None), alias('sys', None), alias('pygame', None)]), Import([alias('random', None)]), Import([alias('sequence', None)]), Assign([Name('S_SIZE', Store()), Tuple([Name('S_WID', Store()), Name('S_HGT', Store())], Store())], Tuple([Num(600), Num(400)], Load())), Assign([Name('screen', Store())], Call(Attribute(Attribute(Name('pygame', Load()), 'display', Load()), 'set_mode', Load()), [Name('S_SIZE', Load())], [])), Assign([Name('NUMB_COUNT', Store())], Num(200)), Assign([Name('nlist', Store())], ListComp(BinOp(Call(Attribute(Name('random', Load()), 'random', Load()), [], []), Mult(), Name('S_HGT', Load())), [comprehension(Name('_', Store()), Call(Name('range', Load()), [Name('NUMB_COUNT', Load())], []), [], 0)])), Assign([Name('num', Store())], Call(Attribute(Name('sequence', Load()), 'NumGroup', Load()), [Name('nlist', Load()), Name('S_WID', Load()), Name('S_HGT', Load())], [])), FunctionDef('draw_all', arguments([], None, [], [], None, []), [Expr(Call(Attribute(Name('screen', Load()), 'fill', Load()), [Tuple([Num(0), Num(0), Num(0)], Load())], [])), Expr(Call(Attribute(Name('num', Load()), 'draw', Load()), [Name('screen', Load())], [])), Expr(Call(Attribute(Attribute(Name('pygame', Load()), 'display', Load()), 'flip', Load()), [], []))], [], None), FunctionDef('bubble_sort', arguments([arg('nlist', None), arg('i', None), arg('end_ind', None)], None, [], [], None, []), [If(Compare(Name('i', Load()), [Eq()], [Name('end_ind', Load())]), [Return(Tuple([Num(0), BinOp(Name('end_ind', Load()), Sub(), Num(1))], Load()))], []), If(Compare(Attribute(Subscript(Name('nlist', Load()), Index(Name('i', Load())), Load()), 'val', Load()), [Gt()], [Attribute(Subscript(Name('nlist', Load()), Index(BinOp(Name('i', Load()), Add(), Num(1))), Load()), 'val', Load())]), [Expr(Call(Attribute(Name('nlist', Load()), 'swap', Load()), [Name('i', Load()), BinOp(Name('i', Load()), Add(), Num(1))], []))], []), Return(Tuple([BinOp(Name('i', Load()), Add(), Num(1)), 
Name('end_ind', Load())], Load()))], [], None), If(Compare(Name('__name__', Load()), [Eq()], [Str('__main__')]), [Expr(Call(Attribute(Name('pygame', Load()), 'init', Load()), [], [])), Assign([Name('i', Store())], Num(0)), Assign([Name('end_ind', Store())], BinOp(Call(Name('len', Load()), [Name('num', Load())], []), Sub(), Num(1))), While(Num(1), [For(Name('event', Store()), Call(Attribute(Attribute(Name('pygame', Load()), 'event', Load()), 'get', Load()), [], []), [If(Compare(Attribute(Name('event', Load()), 'type', Load()), [Eq()], [Attribute(Name('pygame', Load()), 'QUIT', Load())]), [Expr(Call(Attribute(Name('sys', Load()), 'exit', Load()), [], []))], [])], []), For(Name('n', Store()), Name('num', Load()), [Expr(Call(Attribute(Name('n', Load()), 'set_color', Load()), [Tuple([Num(255), Num(255), Num(255)], Load())], []))], []), If(Compare(Name('end_ind', Load()), [NotEq()], [Num(0)]), [Expr(Call(Attribute(Subscript(Name('num', Load()), Index(Name('i', Load())), Load()), 'set_color', Load()), [Tuple([Num(0), Num(255), Num(0)], Load())], []))], []), If(Compare(Name('end_ind', Load()), [NotEq()], [Num(0)]), [Assign([Tuple([Name('i', Store()), Name('end_ind', Store())], Store())], Call(Name('bubble_sort', Load()), [Name('num', Load()), Name('i', Load()), Name('end_ind', Load())], []))], []), Expr(Call(Attribute(Name('num', Load()), 'update', Load()), [], [])), Expr(Call(Name('draw_all', Load()), [], []))], [])], [])])

I would like to turn it into a format like this, so that I can feed it into TensorFlow:

[1,2,3,1,2,13,56,12,53,41,31...etc]

I found a list of all the node types, which I converted into a dictionary:
NODE_LIST = [
'Module','Interactive','Expression','FunctionDef','ClassDef','Return',
'Delete','Assign','AugAssign','Print','For','While','If','With','Raise',
'TryExcept','TryFinally','Assert','Import','ImportFrom','Exec','Global',
'Expr','Pass','Break','Continue','attributes','BoolOp','BinOp','UnaryOp',
'Lambda','IfExp','Dict','Set','ListComp','SetComp','DictComp',
'GeneratorExp','Yield','Compare','Call','Repr','Num','Str','Attribute',
'Subscript','Name','List','Tuple','Load','Store','Del',
'AugLoad','AugStore','Param','Ellipsis','Slice','ExtSlice','Index','And','Or',
'Add','Sub','Mult','Div','Mod','Pow','LShift','RShift','BitOr','BitXor',
'BitAnd','FloorDiv','Invert','Not','UAdd','USub','Eq','NotEq','Lt',
'LtE','Gt','GtE','Is','IsNot','In','NotIn','comprehension','ExceptHandler',
'arguments','keyword','alias']

NODE_MAP = {x: i for (i, x) in enumerate(NODE_LIST)}

For example:
{'Module': 0, 'Interactive': 1, ...etc}
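One thing worth checking: the list above appears to follow the Python 2 grammar (it contains `Print`, `Exec` and `Repr`), while the dump contains `arg`, which is not in the list, so looking it up would raise a `KeyError`. A minimal sketch of one alternative is to derive the list from the `ast` module of the running interpreter. This assumes it is acceptable for the numbering to change between Python versions, which may not hold if the numbers must match an existing dataset.

```python
import ast
import inspect

# Collect every public AST node class exposed by the running interpreter.
# Assumption: a version-dependent numbering is acceptable for the experiment.
NODE_LIST = sorted(
    name
    for name, obj in vars(ast).items()
    if not name.startswith("_")
    and inspect.isclass(obj)
    and issubclass(obj, ast.AST)
)
NODE_MAP = {name: i for i, name in enumerate(NODE_LIST)}

print(NODE_MAP["Module"], NODE_MAP["arg"])
```

Sorting the names keeps the numbering stable across runs on the same interpreter.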

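I have tried using AST walkers and generators, but I still cannot find a good way to do this. Any help would be greatly appreciated :)
Edit: I think I may be looking for the visit_Name method of ast.NodeVisitor.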
class ToInteger(ast.NodeVisitor):

    def visit_Name(self, node):
        print(node.id)
        print(NODE_MAP[node.id])

This is exactly what I need (output snippet):

Module
0
Import
18
alias
91
alias
91
alias
91
Import
18
alias

My main problem now is getting NODE_MAP[node.id] out of the visitor, because return can only be used when returning the modified tree.
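One way to sidestep the question of return values is to skip the visitor and build the list directly with `ast.walk`. The sketch below is only an illustration: `node_ids` and `SAMPLE_MAP` are hypothetical names, and the mapping is cut down to the three entries the sample needs. Note that the documentation for `ast.walk` does not promise a particular traversal order (CPython currently walks breadth-first), so this fits best when the order of the numbers is not significant.

```python
import ast

def node_ids(source, node_map):
    """Return one integer per AST node, in the order ast.walk yields them."""
    tree = ast.parse(source)
    return [node_map[type(node).__name__] for node in ast.walk(tree)]

# Hypothetical mapping, just large enough for the sample statement below.
SAMPLE_MAP = {"Module": 0, "Import": 18, "alias": 91}

ids = node_ids("import time, sys, pygame", SAMPLE_MAP)
print(ids)
```

For this one-statement sample the values line up with the start of the output snippet above: one `Module`, one `Import`, then three `alias` nodes.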

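I may have found some problems in visit_Name(self, node): and am looking into it. - Saddy
I am not particularly worried about nested nodes at the moment, although I believe generic_visit could handle that later on. - Saddy
You're right. I am now trying NodeVisitor's visit_Name, and it seems to be working? (I updated my question to show this.) - Saddy
1 Answer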


ast.dump shows node.__class__.__name__, so I assume that string is what you want to map to a number, with the number given by its index in NODE_LIST.

import ast

class CustomNodeVisitor(ast.NodeVisitor):
    def visit(self, node):
        # Print the class name of every node, then dispatch as usual.
        print(node.__class__.__name__)
        return ast.NodeVisitor.visit(self, node)
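Building on that idea, the visitor can store the numbers on itself instead of printing them, which avoids needing a return value. The following is a rough sketch rather than a definitive implementation. `NodeIdCollector` is a hypothetical name, and it assigns IDs in order of first appearance instead of using NODE_LIST, which is an assumption made only to keep the example self-contained.

```python
import ast

class NodeIdCollector(ast.NodeVisitor):
    """Record an integer ID for every node, in depth-first pre-order."""

    def __init__(self):
        self.vocab = {}  # class name -> ID, assigned on first sight (assumption)
        self.ids = []    # one ID per visited node

    def visit(self, node):
        name = type(node).__name__
        self.ids.append(self.vocab.setdefault(name, len(self.vocab)))
        # The default dispatch ends in generic_visit, which visits the children.
        return super().visit(node)

collector = NodeIdCollector()
collector.visit(ast.parse("x = 1\ny = x + 2"))
print(collector.ids)
print(collector.vocab)
```

Reusing one collector across several files would keep the vocabulary consistent between them. Swapping `self.vocab.setdefault(...)` for a lookup in a fixed NODE_MAP gives the index-based numbering described above.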

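Yes, I am running an experiment based on a research paper. The paper was written a while ago, though, so I am trying to update it to Python 3 and TF 2.0 (and maybe add some features). - Saddy
Could you tell me which paper you are referring to? - Kalana Mihiranga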
