Regex to remove /**/ comments in JavaScript

Question



How can I use a regular expression in JavaScript to remove content like this from a string?

/*
    multi-line comment
*/

Here's what I've tried:

var regex = /(\/\*(^\*\/)*\*\/)/g;
string = string.replace(regex, '');
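A likely reason the attempt above does not work (my reading, not something stated in the thread): outside a character class, ^ is a start-of-input anchor, not a negation. Without the m flag it can only match at index 0, so the group (^\*\/)* can never match right after /*, and the whole pattern effectively reduces to /\/\*\*\//g, which removes only the empty comment /**/. A quick check:

```javascript
// The pattern from the question.
var attempt = /(\/\*(^\*\/)*\*\/)/g;

var withContent = "a /* multi-line\n comment */ b";
var empty = "a /**/ b";

// A comment with any content survives untouched...
console.log(withContent.replace(attempt, "") === withContent); // true
// ...while the empty comment "/**/" is removed.
console.log(empty.replace(attempt, "")); // "a  b"
```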

- user2981107


Wrong tool. Use esprima and escodegen; JavaScript is not a regular language. - Benjamin Gruenbaum

I'd say this looks a lot like C-style comments. If not, then there are no rules; just use /\/\*.*\*\// for a greedy match and /\/\*.*?\*\// for a non-greedy match. - user557597
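One caveat about the patterns in this comment: in JavaScript, . does not match line terminators, so neither pattern removes the multi-line comment from the question. The sketch below compares the lazy . version with [\s\S] (any character, including newlines) and with the s (dotAll) flag; the last variant assumes an ES2018 or later runtime.

```javascript
var input = "x /*\n    multi-line comment\n*/ y";

var lazyDot = /\/\*.*?\*\//g;      // "." stops at "\n"
var lazyAny = /\/\*[\s\S]*?\*\//g; // [\s\S] matches every character
var lazyDotAll = /\/\*.*?\*\//gs;  // "s" makes "." match "\n" too (ES2018+)

console.log(input.replace(lazyDot, "") === input); // true: nothing removed
console.log(input.replace(lazyAny, ""));           // "x  y"
console.log(input.replace(lazyDotAll, ""));        // "x  y"
```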


Regular expressions cannot correctly remove multi-line comments from JavaScript. You can get close, but there will always be edge cases that break. - zzzzBov

But if you follow C's rules, comments and quoted strings have to be matched first. - user557597

3 Answers


All of the regex-based answers here fail completely in several cases:

var myString = '/*Hello World!*/'; // inside a string
var a = "/*b", c = /.*/g; // inside a string partially, and inside a regex literal

// /*
alert("This will not fire with the regular expressions, but works in JS");
// */
var/**/b = 5; // perfectly valid, replacing a comment with nothing is simply incorrect
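To make these failures concrete, here is a sketch that applies a typical pattern, the non-greedy /\/\*[\s\S]*?\*\//g (the one proposed in the accepted answer, with g added so every match is replaced), to three of the lines above:

```javascript
var blockComment = /\/\*[\s\S]*?\*\//g;

// The "comment" inside the string literal is deleted, changing the string's value.
var inString = "var myString = '/*Hello World!*/';";
console.log(inString.replace(blockComment, "")); // "var myString = '';"

// The "/*" after "//" starts a match, so the alert line is deleted along with it.
var hidden = '// /*\nalert("fires");\n// */';
console.log(hidden.replace(blockComment, "")); // "// "

// Deleting the comment outright glues two tokens together.
var glued = "var/**/b = 5;";
console.log(glued.replace(blockComment, "")); // "varb = 5;"
```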

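Even in fairly obvious cases like these, a regular expression is not enough to find comments correctly; you need to know the language's grammar.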

So if regular expressions fail, what's left? A parser. Is that hard? Not really.

Let's look at the JavaScript grammar ourselves! The part about comments says:

MultiLineComment ::
    /* MultiLineCommentChars_opt */

That's good: it means that once we are inside a multi-line comment, we stay in it until we reach */, and then we leave it immediately.
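Because the comment ends at the first */, block comments do not nest. A minimal sketch of that rule, using a hypothetical helper endOfBlockComment (not part of the answer) that is given the position of an opening /*:

```javascript
// Hypothetical helper: returns the index just past the "*/" that closes the
// block comment starting at `start` (which must point at "/*"), or -1 if the
// comment is never closed.
function endOfBlockComment(code, start) {
  // Search from start + 2 so the "*" of the opening "/*" cannot pair with a
  // following "/": "/*/" is not a complete comment.
  var close = code.indexOf("*/", start + 2);
  return close === -1 ? -1 : close + 2;
}

// The first "*/" closes the comment, so " c */" is left over.
var src = "/* a /* b */ c */";
console.log(src.slice(endOfBlockComment(src, 0))); // " c */"
console.log(endOfBlockComment("/*/", 0));          // -1
```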

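But where can comments appear? Almost anywhere except inside literals. Of the five kinds of literals we have, the multi-line comment markers can only appear inside string literals and regular expression literals.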

function parse(code){
    // state
    var isInRegExp = false;
    var isInString = false;
    var terminator = null; // the quote character that opened the current string
    var escape = false;    // the previous char was an unescaped backslash
    var isInComment = false;

    var c = code.split(""); // input, one character per element

    var o = []; // output
    for(var i = 0; i < c.length; i++){
        if(isInString) { // string literal: keep every character, backslashes included
            o.push(c[i]);
            if(escape){
                escape = false; // an escaped char (e.g. \' or \\) never ends the string
            } else if (c[i] === "\\") {
                escape = true;
            } else if (c[i] === terminator) {
                isInString = false;
            }
        } else if(isInRegExp) { // regular expression literal: keep every character
            o.push(c[i]);
            if(escape){
                escape = false;
            } else if (c[i] === "\\") {
                escape = true;
            } else if (c[i] === "/") {
                isInRegExp = false;
            }
            // Limitation: a "/" inside a character class, as in /[/]/, ends the literal too early
        } else if (isInComment) { // block comment: drop everything up to "*/"
            if(c[i] === "*" && c[i+1] === "/"){
                isInComment = false;
                i++; // also skip the closing "/"
                // Note - not pushing comments to output
            }
        } else { // not in a literal or comment
            if(c[i] === "/" && c[i+1] === "/") { // single-line comment
                // skip to the end of the line, but keep the line break itself
                while(i + 1 < c.length && c[i+1] !== "\n"){
                    i++;
                }
            } else if(c[i] === "/" && c[i+1] === "*"){ // start of a block comment
                isInComment = true;
                o.push(" "); // per spec a comment separates tokens, so var/**/b stays two tokens
                i++; // skip the "*" so that "/*/" is not treated as a complete comment
            } else if(c[i] === "/"){ // start of a regexp literal
                // Limitation: a division operator (a / b) is also taken for a regexp here
                isInRegExp = true;
                o.push(c[i]);
            } else if(c[i] === "'" || c[i] === '"'){ // start of a string literal
                isInString = true;
                terminator = c[i]; // the same quote character closes it
                o.push(c[i]);
            } else { // plain ol' code
                o.push(c[i]);
            }
        }
    }
    return o.join("");
}

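I just wrote this in the console. It's fairly long, but can you see how simple it is? Conceptually it's very simple: it just keeps track of where it is in the code and consumes characters accordingly.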

Let's try it:

parse("var a = 'hello world'"); // var a = 'hello world' 
parse("var/**/a = 'hello world'"); // var a = 'hello world' 
parse("var myString = '/*Hello World!*/';"); // var myString = '/*Hello World!*/';
parse('var a = "/*b", c = /.*/g;'); // var a = "/*b", c = /.*/g;
parse("var a; /* remove me please! */"); // "var a;  " (the comment is replaced by a space)
parse("var x = /* \n \n Hello World Multiline String \n \n */ 5"); // var x =   5
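One gap in a character-by-character approach like this is that, outside a literal, every / that is not part of // or /* gets treated as the start of a regular expression literal, although it may be the division operator. A common workaround (my assumption, not part of the answer, and still only a heuristic: it misjudges cases such as return /x/) is to look at the previous non-whitespace character. A sketch with a hypothetical helper slashStartsRegExp:

```javascript
// Hypothetical helper: guesses whether the "/" at index i begins a regex
// literal, based only on the previous non-whitespace character.
// A real tokenizer also needs keyword context (e.g. `return /x/`).
function slashStartsRegExp(code, i) {
  var j = i - 1;
  while (j >= 0 && /\s/.test(code[j])) {
    j--;
  }
  if (j < 0) {
    return true; // "/" at the very start of the input
  }
  // After an operator, an opening bracket, or a statement boundary such as
  // ";", "{" or "}", a value is expected, so "/" starts a regex literal.
  // After an identifier, a number, ")" or "]", "/" is division.
  return "(,=:[!&|?{};+-*%<>~^".indexOf(code[j]) !== -1;
}

var division = "var half = total / 2;";
var literal = "var re = /ab+c/;";
console.log(slashStartsRegExp(division, division.indexOf("/"))); // false
console.log(slashStartsRegExp(literal, literal.indexOf("/")));   // true
```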

- Benjamin Gruenbaum

The parser is fairly simple; let me know if you need help with it. - Benjamin Gruenbaum

A few of your quote characters were mixed up. I think I've fixed them, but please double-check, because nested quotes are easy to get wrong. - zzzzBov

@zzzzBov Oh, cool, thanks. Hopefully it was only the examples that were off. I tested it on JSFiddle and it seems to work. - Benjamin Gruenbaum


The following code will remove commands and junk from JavaScript.

 var regex = /^(\s*[^\s]*\s*)$/g;
 string = string.replace(regex, '');

Hope this helps...

- PAC
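A quick check (my own, not from the thread) suggests this pattern does not remove block comments: because of the ^ and $ anchors, it only matches an entire string made of at most one run of non-whitespace characters, so it either deletes the whole input or leaves it unchanged.

```javascript
var pattern = /^(\s*[^\s]*\s*)$/g;

// More than one whitespace-separated token: no match, so the comment stays.
var code = "var a; /* comment */";
console.log(code.replace(pattern, "") === code); // true

// A single token is deleted entirely, whether or not it is a comment.
console.log("alert(1);".replace(pattern, "")); // ""
```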


- zzzzBov · Accepted Answer

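If you want to match text that starts with /*, followed by any amount of text not containing */, followed by */, you can do that with a regular expression, but it will not correctly remove block comments from JavaScript.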

A simple example where the pattern fails:

var a = '/*'; /* block comment */

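Note that the first /* is matched even though it is inside a string. If you can guarantee that the content you are searching does not contain edge cases like this, or you are only using the regex to find places to change by hand, it should be relatively safe. Otherwise, don't use regular expressions, because they are the wrong tool for this job; you have been warned.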

To build the regular expression, just break my first sentence down into its component parts:

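/ begins the regular expression literal
\/\* matches a literal /*
[\s\S]*? matches any characters, non-greedily
\*\/ matches a literal */
/ ends the regular expression literal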

Putting it all together gives:

/\/\*[\s\S]*?\*\//
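A quick check of this pattern on the question's input and on the failure case above. The g flag is my addition so that every comment is replaced; without it, String.prototype.replace only replaces the first match.

```javascript
var blockComment = /\/\*[\s\S]*?\*\//g;

// The multi-line comment from the question is removed.
var ok = "before\n/*\n    multi-line comment\n*/\nafter";
console.log(ok.replace(blockComment, "")); // "before\n\nafter"

// The failure case: the "/*" inside the string literal starts the match,
// which then runs to the real comment's "*/".
var broken = "var a = '/*'; /* block comment */";
console.log(broken.replace(blockComment, "")); // "var a = '"
```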

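The non-greedy match is needed so that, when a file contains more than one block comment, the match does not run past the first closing */: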

/* foo */
var foo = 'bar';
/* fizz */
var fizz = 'buzz';

With the non-greedy match,

/* foo */

and

/* fizz */

are matched separately. Without the non-greedy match, the quantifier takes the longest possible match, so

/* foo */
var foo = 'bar';
/* fizz */

is matched as a single comment, including the code between the two comments.
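The difference can be checked directly with String.prototype.match on the example above; the only change between the two patterns is the ? after [\s\S]*.

```javascript
var text = "/* foo */\nvar foo = 'bar';\n/* fizz */\nvar fizz = 'buzz';";

var lazy = /\/\*[\s\S]*?\*\//g;  // stops at the first "*/"
var greedy = /\/\*[\s\S]*\*\//g; // runs to the last "*/"

var lazyMatches = text.match(lazy);
var greedyMatches = text.match(greedy);

console.log(lazyMatches.length);   // 2: "/* foo */" and "/* fizz */"
console.log(greedyMatches.length); // 1: from "/* foo */" through "/* fizz */"
```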