Java regex: removing two different types of comments from a string

3
I have a text that contains two types of comments. One type is delimited by %, the other starts with /* and ends with */. For example:

Input 1: Sarah was going out. % Remember she usually doesn't go out % It was very cold.
Expected output 1: Sarah was going out. It was very cold.

Input 2: Sarah was going out. /* Remember she usually doesn't go out */ It was very cold.
Expected output 2: Sarah was going out. It was very cold.

Input 3: Charles knocked on the door and a woman opened it. % Hmm, is this good... /* Not sure */ Perhaps this should happen in chapter 10 instead? % She looked at him. - Yes?, she said.
Expected output 3: Charles knocked on the door and a woman opened it. She looked at him. - Yes?, she said.

Input 4: Charles knocked on the door and a woman opened it. % Hmm, is this good... /* Not sure to 100% */ Perhaps this should happen in chapter 10 instead? % She looked at him. - Yes?, she said.
Expected output 4: Charles knocked on the door and a woman opened it. */ Perhaps this should happen in chapter 10 instead?

Basically, when an opening comment marker is encountered, everything up to its corresponding closing marker should be removed (even if that means removing a marker of the other comment type).
If a comment is opened, whether with % or /*, but never closed, it is assumed to continue to the end of the text. However, if only a closing marker */ is present (because its opener was inside another comment and was therefore removed), it should be left in the text.

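I think input 4 should result in "Charles knocked on the door and a woman opened it. */ Perhaps this should happen in chapter 10 instead? % She looked at him. - Yes?, she said." It seems you want .replaceAll("%[^%]*%|/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/",""). The last % may need to be optional, though, as in "%[^%]*%?|/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/" - Wiktor Stribiżew
OK, thanks for pointing that out. I forgot to mention that if there is no closing marker (or there is only a closing marker, because its opener was inside another comment and got removed, as in this case), the marker should stay in place. So input/output 4 is correct. - dadadima
So, according to what you say ("the marker should stay in place"), the "% She looked at him. - Yes?, she said." part of input 4 should not be removed. - Wiktor Stribiżew
You are right again. If a comment is opened, whether with % or /*, but never closed, the comment continues to the end of the text. - dadadima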
1
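Aha, then you probably want to use .replaceAll("%[^%]*%?|/\\*[^*]*(?:\\*(?!/)[^*]*)*(?:\\*/)?","") - Wiktor Stribiżew
Thanks a lot, that's it! Could you briefly explain the logic behind it? I'm not very good at regex, but I want to learn. Also, I would like to give you credit for the answer, but I don't think I can do that in a comment. If you don't have time or don't want to answer, I can post an answer myself, quoting you and giving you credit. You won't get the points, though! :) - dadadima
1 Answer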

3

You can use

.replaceAll("%[^%]*%?|/\\*[^*]*(?:\\*(?!/)[^*]*)*(?:\\*/)?","")

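See the regex demo.

Details

  • %[^%]*%? - matches a comment of the %...% type with an optional trailing delimiter:
    • % - a % character
    • [^%]* - 0 or more characters other than %
    • %? - an optional % character
  • | - or
  • /\*[^*]*(?:\*(?!/)[^*]*)*(?:\*/)? - matches a comment of the /*...*/ type with an optional trailing delimiter:
    • /\* - the /* string
    • [^*]* - 0 or more characters other than *
    • (?:\*(?!/)[^*]*)* - 0 or more repetitions of:
      • \*(?!/) - a * character that is not immediately followed by /
      • [^*]* - 0 or more characters other than *
    • (?:\*/)? - an optional */ substring.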
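
The breakdown above can be tried out with a minimal sketch along the following lines. The class name CommentStripper and the helpers stripComments and tidy are hypothetical names introduced only for illustration. The whitespace-collapsing step in tidy is an assumption that is not part of the answer: the regex on its own keeps the spaces on both sides of a removed comment, so a double space (or a trailing space) is left behind.

```java
import java.util.regex.Pattern;

final class CommentStripper {
    // The pattern from the answer: a %...% comment or a /*...*/ comment,
    // each with an optional closing delimiter so that an unclosed comment
    // runs to the end of the text.
    private static final Pattern COMMENT = Pattern.compile(
            "%[^%]*%?|/\\*[^*]*(?:\\*(?!/)[^*]*)*(?:\\*/)?");

    // Hypothetical helper: removes every comment match and nothing else.
    static String stripComments(String text) {
        return COMMENT.matcher(text).replaceAll("");
    }

    // Assumption, not part of the answer: collapse runs of whitespace into
    // a single space and trim the ends, to match the spacing of the
    // expected outputs in the question.
    static String tidy(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String input4 = "Charles knocked on the door and a woman opened it. "
                + "% Hmm, is this good... /* Not sure to 100% */ Perhaps this "
                + "should happen in chapter 10 instead? % She looked at him. "
                + "- Yes?, she said.";
        // The first %...% pair swallows the /* opener, so the lone */ stays;
        // the final % is never closed, so everything after it is removed.
        System.out.println(tidy(stripComments(input4)));
    }
}
```

Because the regex engine scans left to right and takes the first alternative that matches at the earliest position, whichever comment opens first wins, and any marker of the other type inside it is consumed along with the rest of the comment.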
