Remove comments from C/C++ code

Question

c++ c comments

87

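Is there an easy way to remove comments from a C/C++ source file without doing any preprocessing? (That is, I think gcc -E would do it, but it would also expand macros.) I just want the source code with the comments stripped; nothing else should change.

EDIT:

An existing tool is preferred. I don't want to write this myself with regexes, because I foresee too many pitfalls.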

- Mike

5

This is actually a good exercise in using a simple lexer and parser! - Greg Hewgill

64

This is actually a good exercise in using a very complicated lexer and parser. - anon

4

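@Pascal: I don't believe Dr. Dobbs, and gcc agrees: "error: pasting "/" and "/" does not give a valid preprocessing token", which is to be expected, since comment removal happens before preprocessing. - Christoph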

2

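@Neil: Sorry, but no. A parser deals with the structure of statements. From the language's point of view, a comment is a single token that does not participate in any larger structure. It is no different from a whitespace character (in fact, in phase three of translation, each comment is replaced by a single space character). As for building the preprocessor into the compiler, the explanation is much simpler: the preprocessor often produces very large output, so passing it to the compiler efficiently improves compilation speed considerably. - Jerry Coffin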

7

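@Neil: Perhaps that's for the best; you seem to be repeating the same assertion with no supporting evidence. You have never pointed out what semantic analysis is needed to parse comments correctly, only insisted that it is needed (which the standard not only doesn't require, but doesn't even allow). You substitute trigraphs, splice lines, then break the source into tokens and sequences of white space (including comments). If you try to take more semantics than that into account, you're doing it wrong... - Jerry Coffin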
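
To make the phase ordering above concrete, here is a minimal sketch of the line-splicing step (translation phase 2): every backslash that is immediately followed by a newline is deleted together with that newline, before any comment is recognized. The name splice_lines is a hypothetical helper for illustration, not part of any compiler, and the sketch assumes "\n" line endings and ignores trigraphs.

```c
#include <stddef.h>

/* Hypothetical helper: a sketch of translation phase 2 (line splicing).
 * Copies src to dst, dropping each backslash that is immediately followed
 * by a newline, together with that newline. dst must have room for
 * strlen(src) + 1 bytes. Returns the number of characters written. */
size_t splice_lines(const char *src, char *dst)
{
    size_t n = 0;
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n') {
            src += 2;               /* splice: skip backslash and newline */
        } else {
            dst[n++] = *src++;      /* ordinary character: copy through */
        }
    }
    dst[n] = '\0';
    return n;
}
```

For instance, a "/" and a "*" separated only by a backslash-newline come out as an ordinary "/*", which is why a comment delimiter can legally be split across lines.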

12 Answers

18

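It depends on how perverse your comments are. I have a program, scc, that strips C and C++ comments. I also have a test file for it, and I tried GCC (4.2.1 on MacOS X) with the options from the currently selected answer, but GCC doesn't seem to do a perfect job on some of the horribly butchered comments in the test case.

NB: This isn't a real-life problem; people don't write such ghastly code.

Consider this subset of the test case (36 of 135 lines in total):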

/\
*\
Regular
comment
*\
/
The regular C comment number 1 has finished.

/\
\/ This is not a C++/C99 comment!

This is followed by C++/C99 comment number 3.
/\
\
\
/ But this is a C++/C99 comment!
The C++/C99 comment number 3 has finished.

/\
\* This is not a C or C++ comment!

This is followed by regular C comment number 2.
/\
*/ This is a regular C comment *\
but this is just a routine continuation *\
and that was not the end either - but this is *\
\
/
The regular C comment number 2 has finished.

This is followed by regular C comment number 3.
/\
\
\
\
* C comment */

On my Mac, the output from GCC (gcc -fpreprocessed -dD -E subset.c) is:

/\
*\
Regular
comment
*\
/
The regular C comment number 1 has finished.

/\
\/ This is not a C++/C99 comment!

This is followed by C++/C99 comment number 3.
/\
\
\
/ But this is a C++/C99 comment!
The C++/C99 comment number 3 has finished.

/\
\* This is not a C or C++ comment!

This is followed by regular C comment number 2.
/\
*/ This is a regular C comment *\
but this is just a routine continuation *\
and that was not the end either - but this is *\
\
/
The regular C comment number 2 has finished.

This is followed by regular C comment number 3.
/\
\
\
\
* C comment */

The output from 'scc' is:

The regular C comment number 1 has finished.

/\
\/ This is not a C++/C99 comment!

This is followed by C++/C99 comment number 3.
/\
\
\
/ But this is a C++/C99 comment!
The C++/C99 comment number 3 has finished.

/\
\* This is not a C or C++ comment!

This is followed by regular C comment number 2.

The regular C comment number 2 has finished.

This is followed by regular C comment number 3.

The output from 'scc -C' (which recognizes double-slash comments) is:

The regular C comment number 1 has finished.

/\
\/ This is not a C++/C99 comment!

This is followed by C++/C99 comment number 3.

The C++/C99 comment number 3 has finished.

/\
\* This is not a C or C++ comment!

This is followed by regular C comment number 2.

The regular C comment number 2 has finished.

This is followed by regular C comment number 3.

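SCC source now available on GitHub

The current version of SCC is 6.60 (dated 2016-06-12), though the Git versions were created on 2017-01-18 (in the US/Pacific time zone). The code is available from GitHub at https://github.com/jleffler/scc-snapshots. You can also find snapshots of the previous releases (4.03, 4.04, 5.05) and two pre-releases (6.16, 6.50); these are all tagged release/x.yz.

The code is still primarily developed under RCS. I'm still working out how to use sub-modules or a similar mechanism to handle common library files such as stderr.c and stderr.h (which can also be found at https://github.com/jleffler/soq).

SCC version 6.60 attempts to understand C++11, C++14 and C++17 constructs such as binary constants, numeric punctuation, raw strings, and hexadecimal floats. It defaults to C11 mode. (Note that the meaning of the -C flag mentioned above changed between version 4.0x, described in the main body of this answer, and version 6.60, which is currently the latest release.)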

- Jonathan Leffler

5

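Believe me, Jonathan, they do. I cleaned up some code and found 2000 lines that had been commented out. I couldn't believe a human being could write such messy code. - Halil

Could you publish this program and give the link here, please? (If it is free/libre software.) - Totor

@Totor: It is free/libre (GPL v3 by default) software. Send me email and I'll send it to you (my email address is in my profile). I just don't have anywhere that I publish code like that on a regular basis (pathetic, isn't it!). - Jonathan Leffler

@JonathanLeffler Why not publish your code on something like GitHub? - Mads Hansen

@JonathanLeffler, can you put it on gists.github.com? I need it. - noɥʇʎԀʎzɐɹƆ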

8

gcc -fpreprocessed -dD -E did not work for me, but this program does it:

#include <stdio.h>

static void process(FILE *f)
{
 int c;
 while ( (c=getc(f)) != EOF )
 {
  if (c=='\'' || c=='"')            /* literal */
  {
   int q=c;
   do
   {
    putchar(c);
    if (c=='\\')                    /* copy the escaped character too */
    {
     c=getc(f);
     if (c==EOF) break;
     putchar(c);
    }
    c=getc(f);
   } while (c!=q && c!=EOF);        /* stop at closing quote or end of input */
   if (c!=EOF) putchar(c);
  }
  else if (c=='/')              /* opening comment ? */
  {
   c=getc(f);
   if (c!='*')                  /* no, recover */
   {
    putchar('/');
    ungetc(c,f);
   }
   else
   {
    int p;
    c = 0;
    putchar(' ');               /* replace comment with space */
    do
    {
     p=c;
     c=getc(f);
    } while (c!=EOF && (c!='/' || p!='*'));   /* unterminated comment ends at EOF */
   }
  }
  else
  {
   putchar(c);
  }
 }
}

int main(void)
{
 process(stdin);
 return 0;
}

- lhf

7

Does not handle trigraphs. - OmnipotentEntity
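
For readers unfamiliar with the objection: in C up to C17 (and C++ before C++17), nine three-character sequences beginning with two question marks are replaced by single characters in translation phase 1, before comments are recognized. One of them produces a backslash, so it can also take part in a line continuation. The following is only a sketch of that replacement step; replace_trigraphs is a hypothetical helper name, and a stripper that wanted to leave the rest of the source untouched would have to track the mapping rather than rewrite the text.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a sketch of translation phase 1 (trigraph
 * replacement) as defined through C17; C23 and C++17 removed trigraphs.
 * Rewrites s in place and returns the new length. */
size_t replace_trigraphs(char *s)
{
    static const char from[] = "=(/)'<!>-";   /* third character of a trigraph */
    static const char to[]   = "#[\\]^{|}~";  /* the character it stands for   */
    size_t r = 0, w = 0;

    while (s[r] != '\0') {
        const char *hit = NULL;
        if (s[r] == '?' && s[r + 1] == '?' && s[r + 2] != '\0')
            hit = strchr(from, s[r + 2]);
        if (hit != NULL) {
            s[w++] = to[hit - from];  /* emit the single replacement char */
            r += 3;
        } else {
            s[w++] = s[r++];          /* not a trigraph: copy unchanged */
        }
    }
    s[w] = '\0';
    return w;
}
```

Running this before line splicing and comment removal mirrors the order the standard prescribes, at the cost of changing the trigraphs in the output.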

7

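There is a program called stripcmt that can do this:

StripCmt is a simple utility written in C to remove comments from C, C++, and Java source files. In the tradition of Unix text-processing programs, it can function either as a FIFO (First In - First Out) filter or accept arguments on the command line.

(Taken from hlovdal's answer to this question.)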

- che

1

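The code still has some bugs. For example, it cannot handle code like int /* comment // */ main(). - pynexj

It also fails on comments like // comment out next line \ - sleepsort

It can handle these cases. It works perfectly as long as /*, // and */ are not split across two lines. - qeatzy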

4

This is a Perl script to remove // one-line comments and /* multi-line comments */.

  #!/usr/bin/perl

  undef $/;
  $text = <>;

  $text =~ s/\/\/[^\n\r]*(\n\r)?//g;
  $text =~ s/\/\*+([^*]|\*(?!\/))*\*+\///g;

  print $text;

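It requires your source file as a command-line argument. Save the script to a file, say remove_comments.pl, and call it with the following command: perl -w remove_comments.pl [your source file]

Hope this helps.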

- Vladimir

2

Doesn't seem to handle strings containing "/*", "//", etc. Go down that rabbit hole at your own risk. - akavel

3

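I had this problem as well. I found this tool (Cpp-Decomment), which worked for me. However, it does not recognize a line comment that continues onto the next line. For example: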

// this is my comment \
comment continues ...

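I couldn't find a way to handle this case in the program, so I searched for the lines it missed and fixed them by hand. I believe there should be an option for it, or you could change the program's source to do so.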

- Halil

2

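Because you are using C, you might want to use something that is "natural" to C. You can use the C preprocessor to remove only the comments. The examples below work with the C preprocessor from GCC. They should work the same way, or similarly, with other C preprocessors.

For C, use: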

cpp -dD -fpreprocessed -o output.c input.c

It can also be used to remove comments from JSON, for example like this:

cpp -P -o - - <input.json >output.json

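If your C preprocessor is not directly accessible, you can try replacing cpp with cc -E, which calls the C compiler and tells it to stop after the preprocessing stage. If your C compiler binary is not called cc, replace cc with its name, for example clang. Note that not all preprocessors support -fpreprocessed.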

- Christian Hujer

1

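I wrote a C program of about 200 lines, using only the standard C library, that removes comments from C source files. qeatzy/removeccomments

Behavior

C-style comments that span multiple lines or occupy an entire line are zeroed out.
C-style comments in the middle of a line are left unchanged. For example: void init(/* do initialization */) {...}
C++-style comments that occupy an entire line are zeroed out.
C string literals are respected, by checking for " and \".
Line continuation is handled. If the previous line ends with \, the current line is part of the previous line.
Line numbers stay the same. Zeroed-out lines, or parts of lines, become empty.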
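
The last point, keeping line numbers stable, is a design choice worth illustrating. The sketch below is not the removeccomments implementation; blank_block_comments is a hypothetical helper that takes a slightly different route to the same goal, overwriting comment text with spaces and leaving every newline where it was. It ignores string literals, // comments, and line continuations for brevity.

```c
#include <stddef.h>

/* Hypothetical helper, not the removeccomments implementation: overwrites
 * every block comment in s with spaces but keeps the newlines inside it,
 * so each remaining line keeps its original line number. */
void blank_block_comments(char *s)
{
    int in_comment = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (!in_comment && s[i] == '/' && s[i + 1] == '*') {
            in_comment = 1;
            s[i] = s[i + 1] = ' ';      /* blank the opening delimiter */
            i++;
        } else if (in_comment && s[i] == '*' && s[i + 1] == '/') {
            in_comment = 0;
            s[i] = s[i + 1] = ' ';      /* blank the closing delimiter */
            i++;
        } else if (in_comment && s[i] != '\n') {
            s[i] = ' ';                 /* blank comment text, keep '\n' */
        }
    }
}
```

Since the text length and the newline positions never change, compiler diagnostics on the stripped file still point at the right line and column of the original.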

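Testing and profiling

I tested it on the largest cpython source file that contains many comments. In this case it does the job correctly and quickly, 2-5 times faster than gcc.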

time gcc -fpreprocessed -dD -E Modules/unicodeobject.c > res.c 2>/dev/null
time ./removeccomments < Modules/unicodeobject.c > result.c

Usage

/path/to/removeccomments < input_file > output_file

- qeatzy

0

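I recently wrote some Ruby code to solve this problem. I considered the following special cases:

comments inside strings
a multi-line-style comment on a single line (fixing the greedy match)
multi-line comments spanning several lines

Here is the code:

It uses the following placeholders to preprocess each line, in case comment markers appear inside strings. If the placeholders themselves appear in your code, well, bad luck. You can replace them with more complex strings.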

MUL_REPLACE_LEFT = "MUL_REPLACE_LEFT"
MUL_REPLACE_RIGHT = "MUL_REPLACE_RIGHT"
SIG_REPLACE = "SIG_REPLACE"

Usage: ruby -w inputfile outputfile

- chunyang.wen

0

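I believe you can easily remove comments from C with a single statement:

perl -i -pe 's{/\*.*?\*/}{}g' file.c    # removes /* C-style */ comments that start and end on the same line
perl -i -pe 's{//.*}{}' file.cpp        # removes // C++-style comments

The only problem with these commands is that they cannot remove comments that span more than one line. However, you can build on these regexes to implement the logic for removing multi-line comments.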

- Poseidon_Geek

- Josh Lee · Accepted Answer

Run the following command on your source file:

gcc -fpreprocessed -dD -E test.c

Thanks to KennyTM for finding the right flags. Here is the result, for completeness:

test.c:

#define foo bar
foo foo foo
#ifdef foo
#undef foo
#define foo baz
#endif
foo foo
/* comments? comments. */
// c++ style comments

gcc -fpreprocessed -dD -E test.c:

#define foo bar
foo foo foo
#ifdef foo
#undef foo
#define foo baz
#endif
foo foo