JavaCC quoted string with escape characters

3

What is the usual way to tokenize a quoted string that may contain escape characters? Here are some examples:

1) "this is good"
2) "this is\"good\""
3) "this \is good"
4) "this is bad\"
5) "this is \\"bad"
6) "this is bad
7)  this is bad"
8)  this is bad

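Below is a sample parser that does not quite work. It gives the expected result for every example except 4 and 5: it accepts those two inputs, although they should be rejected.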
options
{
  LOOKAHEAD = 3;
  CHOICE_AMBIGUITY_CHECK = 2;
  OTHER_AMBIGUITY_CHECK = 1;
  STATIC = false;
  DEBUG_PARSER = false;
  DEBUG_LOOKAHEAD = false;
  DEBUG_TOKEN_MANAGER = true;
  ERROR_REPORTING = true;
  JAVA_UNICODE_ESCAPE = false;
  UNICODE_INPUT = false;
  IGNORE_CASE = false;
  USER_TOKEN_MANAGER = false;
  USER_CHAR_STREAM = false;
  BUILD_PARSER = true;
  BUILD_TOKEN_MANAGER = true;
  SANITY_CHECK = true;
  FORCE_LA_CHECK = true;
}

PARSER_BEGIN(MyParser)
import java.io.ByteArrayInputStream;
import java.io.UnsupportedEncodingException;
public class MyParser {
    public static void main(String[] args) throws UnsupportedEncodingException, ParseException{
        //note that this conversion to an input stream is only good for small strings
        MyParser parser = new MyParser(new ByteArrayInputStream(args[0].getBytes("UTF-8")));
        parser.enable_tracing();
        parser.myProduction();
        System.out.println("Must have worked!");
    }
}
PARSER_END(MyParser)

TOKEN:
{
<QUOTED: 
    "\"" 
    (
        "\\" ~[]    //any escaped character
        |           //or
        ~["\""]      //any non-quote character
    )* 
    "\""
>
}


void myProduction() :
{}
{
    <QUOTED>
    <EOF>
}

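You can run MyParser with the input to parse as its first argument. It prints "Must have worked!" if parsing succeeds and throws an error if it fails.
How can I change this parser so that it correctly fails on examples 4 and 5?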
1 Answer

16

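To fix your regular expression, exclude the backslash from the ordinary-character alternative, so that a backslash can only be consumed as the start of an escape: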

TOKEN: {
<QUOTED: 
    "\"" 
    (
         "\\" ~[]     //any escaped character
    |                 //or
        ~["\"","\\"]  //any character except quote or backslash
    )* 
    "\"" > 
}

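To see why excluding the backslash matters, here is a rough sketch that restates both token definitions as java.util.regex patterns and runs them over the eight examples from the question. The class QuotedCheck and the names LOOSE and STRICT are made up for illustration; this is not code that JavaCC generates, and whole-input matching with matches() only approximates the <QUOTED> <EOF> production.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the generated parser: it restates the two
// QUOTED definitions as java.util.regex patterns so they can be compared.
class QuotedCheck {
    // Question's version: "\"" ( "\\" ~[] | ~["\""] )* "\""
    // A backslash also matches the second alternative as an ordinary character.
    static final Pattern LOOSE =
        Pattern.compile("\"(\\\\.|[^\"])*\"", Pattern.DOTALL);

    // Answer's version: "\"" ( "\\" ~[] | ~["\"","\\"] )* "\""
    // A backslash can only be consumed together with the character after it.
    static final Pattern STRICT =
        Pattern.compile("\"(\\\\.|[^\"\\\\])*\"", Pattern.DOTALL);

    static boolean loose(String s)  { return LOOSE.matcher(s).matches(); }
    static boolean strict(String s) { return STRICT.matcher(s).matches(); }

    public static void main(String[] args) {
        // The eight inputs from the question, written as Java string literals.
        String[] inputs = {
            "\"this is good\"",          // 1
            "\"this is\\\"good\\\"\"",   // 2
            "\"this \\is good\"",        // 3
            "\"this is bad\\\"",         // 4
            "\"this is \\\\\"bad\"",     // 5
            "\"this is bad",             // 6
            "this is bad\"",             // 7
            "this is bad"                // 8
        };
        for (int i = 0; i < inputs.length; i++) {
            System.out.printf("%d) %-20s loose=%-5b strict=%b%n",
                i + 1, inputs[i], loose(inputs[i]), strict(inputs[i]));
        }
        // Examples 4 and 5 are the only lines where loose and strict differ:
        // loose accepts them, strict rejects them.
    }
}
```

In example 4 the loose pattern treats the trailing backslash as an ordinary character, which leaves the final quote free to close the string. In example 5 it does the same with the first backslash, then reads \" as an escape. With the strict pattern both routes are closed off. In the actual parser, example 5 would presumably be tokenized as "this is \\" followed by a lexical error on bad", which also makes the parse fail.
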
1
Thanks, that works great. I also think this will be a useful Google search result. - Nate Glenn
4
When you come back to an answer and think, "Oh yeah, upvote!", and then find you already upvoted it the last time you were here. ^_^ - Ajax
