Recommended C language front end that preserves preprocessor directives

4
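I want to start a project that involves transforming C code, but I want to include the preprocessor directives. I don't want to reinvent the wheel by writing my own C parser. Does anyone know of a front end that parses both C preprocessor directives and C code, and produces an AST that can be used to regenerate (or pretty-print) the original source? For example: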
#define FILENAME "filename"
#include <stdio.h>

FILE *f=0;
...
if (file_is_open) {
#ifdef CAN_OPEN_IT
    f = fopen(FILENAME, "r");
#else
    printf("Unable to open file.\n");
#endif
}

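The code above should be parsed into some in-memory representation from which the source can be regenerated. In other words, it should not go through the usual two-phase C processing, in which the PP directives are handled first and the pure C code is parsed afterwards. Instead, it should represent the entire compile-time logic, including preprocessor variables.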
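To make the requirement concrete, here is a minimal sketch of such a representation. It is not taken from any existing front end: the names Node, NodeKind, append and unparse are hypothetical, and C code is stored as raw text lines rather than parsed. The point is only that an #ifdef becomes a tree node holding both branches, so the source can be regenerated with its directives.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical node kinds: ordinary source lines plus #ifdef conditionals. */
typedef enum { NODE_TEXT, NODE_IFDEF } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *text;               /* NODE_TEXT: a source line; NODE_IFDEF: the macro name */
    const struct Node *then_branch; /* NODE_IFDEF: nodes used when the macro is defined */
    const struct Node *else_branch; /* NODE_IFDEF: nodes used otherwise (may be NULL) */
    const struct Node *next;        /* next sibling in the enclosing sequence */
} Node;

/* Append s to the NUL-terminated string in buf, truncating instead of overflowing. */
static void append(char *buf, size_t cap, const char *s) {
    assert(buf != NULL && cap > 0);
    size_t used = strlen(buf);
    if (used + 1 < cap) {
        strncat(buf, s, cap - used - 1);
    }
}

/* Regenerate source text from the tree, keeping both branches of every #ifdef.
   The caller must pass buf already holding a (possibly empty) string. */
void unparse(const Node *n, char *buf, size_t cap) {
    for (; n != NULL; n = n->next) {
        if (n->kind == NODE_TEXT) {
            append(buf, cap, n->text);
            append(buf, cap, "\n");
        } else {
            append(buf, cap, "#ifdef ");
            append(buf, cap, n->text);
            append(buf, cap, "\n");
            unparse(n->then_branch, buf, cap);
            if (n->else_branch != NULL) {
                append(buf, cap, "#else\n");
                unparse(n->else_branch, buf, cap);
            }
            append(buf, cap, "#endif\n");
        }
    }
}
```

Building the if-block from the example as five nodes (opening line, one NODE_IFDEF with a then and an else child, closing brace) and calling unparse on an empty buffer gives back the original lines with #ifdef, #else and #endif intact. A real front end would parse the C text into proper subtrees instead of keeping it as strings.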

4 Answers

3

1
I don't believe Clang captures preprocessor directives in its AST. - Ira Baxter

1

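Our DMS Software Reengineering Toolkit C front end (and C++ front end) can:

  • Parse (compilable) C source code in a variety of dialects into ASTs
  • Preserve preprocessor directives as AST nodes in most cases
  • Regenerate compilable C code (with comments and preprocessor directives) from the ASTs
  • Collect thousands of files into a single image to allow cross-file analysis and transformation
  • Build a complete symbol table and provide access to it
  • Provide procedural access to the ASTs, with a large AST manipulation library including navigate, inspect, insert, delete, replace, match, ...
  • Apply source-to-source transformations using patterns written in C notation, which are matched against the ASTs

For C (not yet for C++), DMS additionally provides:

  • Control and data flow analysis
  • Local and global points-to analysis
  • Global call graph construction

DMS has been used to process extremely large C applications, to extract facts and to generate new derived code from the original source.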

(Edit: February 2016)

It can handle OP's example (slightly corrected to make it valid). Here is the slightly modified source:

#define FILENAME "filename"
#include <stdio.h>

FILE *f;
main() {
  f=0;
if (file_is_open) {
#ifdef CAN_OPEN_IT
f = fopen(FILENAME, "r");
#else
printf("Unable to open file.\n");
#endif
}

}

Here is the resulting AST:
C~GCC4 Domain Parser Version 3.0.1(28449)
Copyright (C) 1996-2013 Semantic Designs, Inc; All Rights Reserved; SD Confidential
Powered by DMS (R) Software Reengineering Toolkit
AST Optimizations: remove constant tokens, remove unary productions, compact sequences
Using encoding Unicode-UTF-8?ANSI +CRLF +1 /^I
(translation_unit@C~GCC4=2#4a7e0e0^0 Line 1 Column 1 File C:/temp/test.c
 (declaration_seq@C~GCC4=605#4a77580^1#4a7e0e0:1 {4} Line 1 Column 1 File C:/temp/test.c
  (control_line@C~GCC4=1094#4a775c0^1#4a77580:1 Line 1 Column 1 File C:/temp/test.c
   ('#'@C~GCC4=1548#4a771c0^1#4a775c0:1[Keyword:0] Line 1 Column 1 File C:/temp/test.c)'#'
   (IDENTIFIER@C~GCC4=1531#4a77200^1#4a775c0:2[`FILENAME'] Line 1 Column 9 File C:/temp/test.c)IDENTIFIER
   (<!MacroDefinition>@C~GCC4=1603#4a77180^2#4a775c0:3#4a7f300:1[`FILENAME'] Line 1 Column 18 File C:/temp/test.c
$VOID$ [Child 1]
   |(STRING_LITERAL@C~GCC4=1525#4a77160^2#4a77180:2#4a7f300:2[`filename'] Line 1 Column 18 File C:/temp/test.c)STRING_LITERAL
$VOID$ [Child 3]
   )<!MacroDefinition>#4a77180
   (new_line@C~GCC4=1578#4a77260^1#4a775c0:4[Keyword:0] Line 1 Column 28 File C:/temp/test.c)new_line
  )control_line#4a775c0
  (control_line@C~GCC4=1104#4a77460^1#4a77580:2 Line 2 Column 1 File C:/temp/test.c
   ('#'@C~GCC4=1548#4a77340^1#4a77460:1[Keyword:0] Line 2 Column 1 File C:/temp/test.c)'#'
   (ANGLED_HEADER_NAME@C~GCC4=1589#4a77380^1#4a77460:2[`stdio.h'] Line 2 Column 10 File C:/temp/test.c)ANGLED_HEADER_NAME
   (new_line@C~GCC4=1578#4a773c0^1#4a77460:3[Keyword:0] Line 2 Column 19 File C:/temp/test.c)new_line
  )control_line#4a77460
  (simple_declaration@C~GCC4=631#4a774c0^1#4a77580:3 Line 4 Column 1 File C:/temp/test.c
   (IDENTIFIER@C~GCC4=1531#4a77360^1#4a774c0:1[`FILE'] Line 4 Column 1 File C:/temp/test.c)IDENTIFIER
   (declarator@C~GCC4=850#4a77520^1#4a774c0:2 Line 4 Column 6 File C:/temp/test.c
   |(ptr_operator@C~GCC4=866#4a77560^1#4a77520:1 Line 4 Column 6 File C:/temp/test.c)ptr_operator
   |(IDENTIFIER@C~GCC4=1531#4a77480^1#4a77520:2[`f'] Line 4 Column 7 File C:/temp/test.c)IDENTIFIER
   )declarator#4a77520
  )simple_declaration#4a774c0
  (function_definition@C~GCC4=966#4a77be0^1#4a77580:4 Line 5 Column 1 File C:/temp/test.c
   (direct_declarator@C~GCC4=852#4a77440^1#4a77be0:1 Line 5 Column 1 File C:/temp/test.c
   |(IDENTIFIER@C~GCC4=1531#4a774e0^1#4a77440:1[`main'] Line 5 Column 1 File C:/temp/test.c)IDENTIFIER
   |(parameter_declaration_clause@C~GCC4=900#4a77220^1#4a77440:2 Line 5 Column 6 File C:/temp/test.c)parameter_declaration_clause
   )direct_declarator#4a77440
   (compound_statement@C~GCC4=507#4a77b20^1#4a77be0:2 Line 5 Column 8 File C:/temp/test.c
   |(statement_seq@C~GCC4=511#4a77d20^1#4a77b20:1 {2} Line 6 Column 3 File C:/temp/test.c
   | (AMBIGUITY<statement=358>@C~GCC4=1602#4a77680^1#4a77d20:1{2} Line 6 Column 3 File C:/temp/test.c
   |  (expression_statement@C~GCC4=503#4a7e040^1#4a77680:1 Line 6 Column 3 File C:/temp/test.c
   |   (assignment_expression@C~GCC4=457#4a77f00^1#4a7e040:1 Line 6 Column 3 File C:/temp/test.c
   |   |(assignment_target@C~GCC4=470#4a77a00^1#4a77f00:1 Line 6 Column 3 File C:/temp/test.c
   |   | (IDENTIFIER@C~GCC4=1531#4a77400^2#4a77a00:1#4a77fc0:1[`f'] Line 6 Column 3 File C:/temp/test.c)IDENTIFIER
   |   |)assignment_target#4a77a00
   |   |(INT_LITERAL@C~GCC4=1471#4a77a60^2#4a77f00:2#4a77f60:1[0] Line 6 Column 5 File C:/temp/test.c)INT_LITERAL
   |   )assignment_expression#4a77f00
   |  )expression_statement#4a7e040
   |  (simple_declaration@C~GCC4=630#4a7e060^1#4a77680:2 Line 6 Column 3 File C:/temp/test.c
   |   (init_declarator@C~GCC4=835#4a77fc0^1#4a7e060:1 Line 6 Column 3 File C:/temp/test.c
   |   |(IDENTIFIER@C~GCC4=1531#4a77400^2... [ALREADY PRINTED] ...)
   |   |(initializer@C~GCC4=983#4a77f60^1#4a77fc0:2 Line 6 Column 4 File C:/temp/test.c
   |   | (INT_LITERAL@C~GCC4=1471#4a77a60^2... [ALREADY PRINTED] ...)
   |   |)initializer#4a77f60
   |   )init_declarator#4a77fc0
   |  )simple_declaration#4a7e060
   | )AMBIGUITY#4a77680
   | (selection_statement@C~GCC4=527#4a77b40^1#4a77d20:2 Line 7 Column 1 File C:/temp/test.c
   |  (IDENTIFIER@C~GCC4=1531#4a7e0c0^1#4a77b40:1[`file_is_open'] Line 7 Column 5 File C:/temp/test.c)IDENTIFIER
   |  (compound_statement@C~GCC4=507#4a77ae0^1#4a77b40:2 Line 7 Column 19 File C:/temp/test.c
   |   (statement@C~GCC4=490#4a7f840^1#4a77ae0:1 Line 8 Column 1 File C:/temp/test.c
   |   |(if_directive@C~GCC4=1088#4a7f1c0^1#4a7f840:1 Line 8 Column 1 File C:/temp/test.c
   |   | ('#'@C~GCC4=1548#4a7f240^1#4a7f1c0:1[Keyword:0] Line 8 Column 1 File C:/temp/test.c)'#'
   |   | (IDENTIFIER@C~GCC4=1531#4a7ee60^1#4a7f1c0:2[`CAN_OPEN_IT'] Line 8 Column 8 File C:/temp/test.c)IDENTIFIER
   |   | (new_line@C~GCC4=1578#4a7f1e0^1#4a7f1c0:3[Keyword:0] Line 8 Column 19 File C:/temp/test.c)new_line
   |   |)if_directive#4a7f1c0
   |   |(AMBIGUITY<statement=358>@C~GCC4=1602#4a77d40^1#4a7f840:2{2} Line 9 Column 5 File C:/temp/test.c
   |   | (expression_statement@C~GCC4=503#4a7f4a0^1#4a77d40:1 Line 9 Column 5 File C:/temp/test.c
   |   |  (assignment_expression@C~GCC4=457#4a7f3c0^1#4a7f4a0:1 Line 9 Column 5 File C:/temp/test.c
   |   |   (assignment_target@C~GCC4=470#4a7eec0^1#4a7f3c0:1 Line 9 Column 5 File C:/temp/test.c
   |   |   |(IDENTIFIER@C~GCC4=1531#4a7eee0^2#4a7eec0:1#4a7f400:1[`f'] Line 9 Column 5 File C:/temp/test.c)IDENTIFIER
   |   |   )assignment_target#4a7eec0
   |   |   (postfix_expression@C~GCC4=201#4a7f2e0^1#4a7f3c0:2 Line 9 Column 9 File C:/temp/test.c
   |   |   |(IDENTIFIER@C~GCC4=1531#4a7f120^2#4a7f2e0:1#4a7f160:1[`fopen'] Line 9 Column 9 File C:/temp/test.c)IDENTIFIER
   |   |   |(expression_list@C~GCC4=228#4a7f260^2#4a7f2e0:2#4a7f160:2 Line 9 Column 15 File C:/temp/test.c
   |   |   | (<!MacroCall>@C~GCC4=1607#4a7f300^1#4a7f260:1[`FILENAME'] Line 9 Column 15 File C:/temp/test.c
   |   |   |  (<!MacroDefinition>@C~GCC4=1603#4a77180^2... [ALREADY PRINTED] ...)
   |   |   |  (STRING_LITERAL@C~GCC4=1525#4a77160^2... [ALREADY PRINTED] ...)
   |   |   |  $VOID$ [Child 3]
   |   |   |  (STRING_LITERAL@C~GCC4=1525#4a7f2c0^1#4a7f300:4[`filename'] Line 1 Column 18 File C:/temp/test.c)STRING_LITERAL
   |   |   |  $VOID$ [Child 5]
   |   |   | )<!MacroCall>#4a7f300
   |   |   | (STRING_LITERAL@C~GCC4=1525#4a7f140^1#4a7f260:2[`r'] Line 9 Column 25 File C:/temp/test.c)STRING_LITERAL
   |   |   |)expression_list#4a7f260
   |   |   )postfix_expression#4a7f2e0
   |   |  )assignment_expression#4a7f3c0
   |   | )expression_statement#4a7f4a0
   |   | (simple_declaration@C~GCC4=630#4a7f480^1#4a77d40:2 Line 9 Column 5 File C:/temp/test.c
   |   |  (init_declarator@C~GCC4=835#4a7f400^1#4a7f480:1 Line 9 Column 5 File C:/temp/test.c
   |   |   (IDENTIFIER@C~GCC4=1531#4a7eee0^2... [ALREADY PRINTED] ...)
   |   |   (initializer@C~GCC4=983#4a7f3e0^1#4a7f400:2 Line 9 Column 7 File C:/temp/test.c
   |   |   |(postfix_expression@C~GCC4=201#4a7f160^1#4a7f3e0:1 Line 9 Column 9 File C:/temp/test.c
   |   |   | (IDENTIFIER@C~GCC4=1531#4a7f120^2... [ALREADY PRINTED] ...)
   |   |   | (expression_list@C~GCC4=228#4a7f260^2... [ALREADY PRINTED] ...)
   |   |   |)postfix_expression#4a7f160
   |   |   )initializer#4a7f3e0
   |   |  )init_declarator#4a7f400
   |   | )simple_declaration#4a7f480
   |   |)AMBIGUITY#4a77d40
   |   |(else_directive@C~GCC4=1091#4a7f4c0^1#4a7f840:3 Line 10 Column 1 File C:/temp/test.c
   |   | ('#'@C~GCC4=1548#4a7f500^1#4a7f4c0:1[Keyword:0] Line 10 Column 1 File C:/temp/test.c)'#'
   |   | (new_line@C~GCC4=1578#4a7f4e0^1#4a7f4c0:2[Keyword:0] Line 10 Column 6 File C:/temp/test.c)new_line
   |   |)else_directive#4a7f4c0
   |   |(expression_statement@C~GCC4=503#4a7f7c0^1#4a7f840:4 Line 11 Column 5 File C:/temp/test.c
   |   | (postfix_expression@C~GCC4=201#4a77ba0^1#4a7f7c0:1 Line 11 Column 5 File C:/temp/test.c
   |   |  (IDENTIFIER@C~GCC4=1531#4a7f640^1#4a77ba0:1[`printf'] Line 11 Column 5 File C:/temp/test.c)IDENTIFIER
   |   |  (STRING_LITERAL@C~GCC4=1525#4a77c20^1#4a77ba0:2[`Unable to open file.
'] Line 11 Column 12 File C:/temp/test.c)STRING_LITERAL
   |   | )postfix_expression#4a77ba0
   |   |)expression_statement#4a7f7c0
   |   |(endif_directive@C~GCC4=1092#4a7f7e0^1#4a7f840:5 Line 12 Column 1 File C:/temp/test.c
   |   | ('#'@C~GCC4=1548#4a7f720^1#4a7f7e0:1[Keyword:0] Line 12 Column 1 File C:/temp/test.c)'#'
   |   | (new_line@C~GCC4=1578#4a7f700^1#4a7f7e0:2[Keyword:0] Line 12 Column 7 File C:/temp/test.c)new_line
   |   |)endif_directive#4a7f7e0
   |   )statement#4a7f840
   |  )compound_statement#4a77ae0
   | )selection_statement#4a77b40
   |)statement_seq#4a77d20
   )compound_statement#4a77b20
  )function_definition#4a77be0
 )declaration_seq#4a77580
)translation_unit#4a7e0e0

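You can see the preprocessor directive node "if_directive" at Line 8 of the source.

And yes, DMS can also pretty-print this tree. The following command runs the parser to build the AST, and then runs the DMS pretty-printer to regenerate the source from the tree alone. Round-trip fidelity is high; you can recompile the output and get the same result. Comments are preserved as well.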

C:\DMS\Domains\C\GCC4\Tools\PrettyPrinter>run domainprettyprinter \temp\test.c
C~GCC4 PrettyPrinter Version 1.2.13
Copyright (C) 2004-2013 Semantic Designs, Inc; All Rights Reserved; SD Confidential
Powered by DMS (R) Software Reengineering Toolkit

#define FILENAME "filename"
#include <stdio.h>
FILE *f;

main()
{
  f = 0;
  if (file_is_open)
    {
      #ifdef CAN_OPEN_IT
        f = fopen(FILENAME, "r");
      #else
        printf("Unable to open file.\n");
      #endif
    }
}

You can see how DMS handles C++ as well. At present, it can handle all of C++14 in the GCC and MS dialects.


1

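Take the GNU gcc compiler: the flag needed to preprocess the source is gcc -E mysource.c; see here for details. As for beautifying the code, there is a tool called indent, and you can read about its usage here. That documentation is a bit old, but it is still worth mentioning. There is also a tool called cflow, which can generate a map of the source code.

My apologies if I have misunderstood what you need...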
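For context on how this relates to the question: conventional preprocessing resolves each conditional before the C parser runs, so only one branch of an #ifdef reaches the compiler. The sketch below illustrates that behaviour; the function name surviving_branch is hypothetical, and CAN_OPEN_IT is the macro from the question.

```c
#include <stdio.h>

/* Reports which branch of the conditional was left after preprocessing.
   The other branch is discarded before the C parser ever sees it. */
const char *surviving_branch(void) {
#ifdef CAN_OPEN_IT
    return "fopen branch";
#else
    return "printf branch";
#endif
}
```

Compiled as is, the function returns "printf branch"; compiled with -DCAN_OPEN_IT, it returns "fopen branch". Running gcc -E on this file likewise emits only one of the two return statements, which is why preprocessed output cannot be used to regenerate the original directives.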


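Why the downvote? I mentioned indent and cflow... but the question was not clear about why an AST is needed when its context includes "pretty-printing". It would be in the spirit of SO to leave a comment explaining why when downvoting, rather than staying silent. - t0mm13b
Downvotes happen, and they are a nuisance. As a rule, they do not do irreparable damage to your reputation. - Jonathan Leffler
@Jonathan: Quick question: earlier I had 3 upvotes on https://dev59.com/EUvSa4cB1Zd3GeqPfZTI#2142845, but it now shows as 5 rather than 30. Why? - t0mm13b
1
Sorry if I was unclear. I am looking for a tool that can parse both C and preprocessor code; it does not have to be a pretty-printer. I mentioned pretty-printing only because a pretty-printer might parse CPP code. What I want is a tool that produces an AST (abstract syntax tree) that includes the CPP logic. I do not care about pretty-printing as such. - Steve
@Steve: OK, the best answer I can give is to look at the grammars for Antlr, listed here... http://www.antlr.org/grammar/list ... Antlr can generate an AST and has interfaces for several languages, e.g. C#, C, CPP and Java can all use the Antlr library for parsing, if that is what you need... :) - t0mm13b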

-1

1
This seems to be about the (ANTLR) parser generator producing parsers implemented in C. OP wants to parse the C language. Did I miss something? - Ira Baxter

Content provided by Stack Overflow.