Extracting data within nested curly braces

3
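I want to extract the contents of the first inner pair of braces and of the second inner pair of braces separately. I'm completely stuck. Can anyone help? My file read.txt contains the following data, which I read line by line into the string s:
import java.io.BufferedReader;
import java.io.FileReader;

try (BufferedReader br = new BufferedReader(new FileReader("read.txt"))) {
    String s;
    // readLine() returns null at end of file, which is the usual loop condition
    while ((s = br.readLine()) != null) {
        System.out.println(s);
    }
}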

Output:

{ { "John", "ran" },                { "NOUN", "VERB" } },
{ { "The", "dog", "jumped"},        { "DET", "NOUN", "VERB" } },
{ {  "Mike","lives","in","Poland"}, {"NOUN","VERB","DET","NOUN"} },

My output should look like this:
  "John", "ran"    
  "NOUN", "VERB" 
  "The", "dog", "jumped"  
  "DET", "NOUN", "VERB" 
  "Mike","lives","in","Poland" 
  "NOUN","VERB","DET","NOUN"

1
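Regular expressions are not the right tool for this task. You need to write (or find) a proper parser. That isn't hard, since the format is more or less fixed. - The Paramagnetic Croissant
@Rod_Algonquin I have edited my question because it wasn't clear enough. It's sorted out now. - Rose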
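The first comment suggests a small hand-written parser instead of a regex. A minimal sketch of that idea follows. It assumes the input is well-formed and that braces never appear inside the quoted words; the names BraceScanner and innerGroups are hypothetical. It walks the characters, tracks the brace depth, and collects the text of every group at depth 2.

```java
import java.util.ArrayList;
import java.util.List;

public class BraceScanner {

    // Collects the text of every brace group at nesting depth 2,
    // i.e. the two inner groups of "{ { ... }, { ... } }".
    // Assumption: input is balanced and quoted words contain no braces.
    static List<String> innerGroups(String input) {
        List<String> groups = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '{') {
                depth++;
                if (depth == 2) {
                    current.setLength(0); // a new inner group starts here
                }
            } else if (c == '}') {
                if (depth == 2) {
                    groups.add(current.toString().trim()); // inner group closed
                }
                depth--;
            } else if (depth == 2) {
                current.append(c); // character inside an inner group
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String line = "{ { \"John\", \"ran\" },                { \"NOUN\", \"VERB\" } },";
        for (String group : innerGroups(line)) {
            System.out.println(group);
        }
    }
}
```

For the first sample line this prints "John", "ran" and "NOUN", "VERB" on separate lines. The depth counter makes the nesting rule explicit, which is the commenter's point. The trade-off is that this sketch does not report unbalanced braces.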
4 Answers

7
Use this regex:
(?<=\{)(?!\s*\{)[^{}]+

See the matches in the regex demo.
In Java:
Pattern regex = Pattern.compile("(?<=\\{)(?!\\s*\\{)[^{}]+");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    // matched text: regexMatcher.group()
}

Explanation

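  • The lookbehind (?<=\{) asserts that the character before the current position is an opening brace {
  • The negative lookahead (?!\s*\{) asserts that what follows is not optional whitespace followed by an opening brace {
  • [^{}]+ matches one or more characters that are not braces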
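A self-contained version of the snippet above is sketched below, assuming the three sample lines are already joined into one string (the class name LookaroundDemo and method extract are illustrative only). The matches include the spaces next to the braces, so this sketch trims each one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {

    // Lookbehind: preceded by '{'; negative lookahead: not followed by
    // optional whitespace and another '{'; then one or more non-brace chars.
    private static final Pattern INNER = Pattern.compile("(?<=\\{)(?!\\s*\\{)[^{}]+");

    static List<String> extract(String subject) {
        List<String> result = new ArrayList<>();
        Matcher m = INNER.matcher(subject);
        while (m.find()) {
            result.add(m.group().trim()); // drop the spaces next to the braces
        }
        return result;
    }

    public static void main(String[] args) {
        String subject = String.join("\n",
                "{ { \"John\", \"ran\" },                { \"NOUN\", \"VERB\" } },",
                "{ { \"The\", \"dog\", \"jumped\"},        { \"DET\", \"NOUN\", \"VERB\" } },",
                "{ {  \"Mike\",\"lives\",\"in\",\"Poland\"}, {\"NOUN\",\"VERB\",\"DET\",\"NOUN\"} },");
        extract(subject).forEach(System.out::println);
    }
}
```

Running main should print the six groups from the question, one per line. The outer { never starts a match because the negative lookahead sees the inner { after the space.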

2
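This confuses me. - Scary Wombat
FYI: added an explanation. :) - zx81
@TheParamagneticCroissant Why do you say that? - zx81
@zx81, (?<=\{)(?!\s*\{)[^{}]+ is supposed to do/mean something sensible and easy to reason about... - The Paramagnetic Croissant
@zx81 It looks like you think I don't understand regular expressions. I do; I just think they're awful. - The Paramagnetic Croissant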

3
If you split on "},", each group of words ends up in its own string; then you just remove the braces.
Based on your code:
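import java.io.BufferedReader;
import java.io.FileReader;

try (BufferedReader br = new BufferedReader(new FileReader("read.txt"))) {
    String s;
    while ((s = br.readLine()) != null) {
        // Each inner group ends with "},", so splitting on it separates the groups
        String[] words = s.split("},");

        for (int x = 0; x < words.length; x++) {
            // Remove the remaining braces and surrounding spaces, then print
            String printme = words[x].replace("{", "").replace("}", "").trim();
            System.out.println(printme);
        }
    }
}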

Small modification, Scary: String printme = words[x].replace("{","").replace("}",""); - Rose

1
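You can always remove the opening braces first and then split on "},", which gives you the list of strings you need (assuming everything is in one string).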
String s = input.replace("{","");
String[] splitString = s.split("},");

This would first remove the opening braces:

"John", "ran" },                "NOUN", "VERB" } },
"The", "dog", "jumped"},        "DET", "NOUN", "VERB" } },
"Mike","lives","in","Poland"},"NOUN","VERB","DET","NOUN"} },

Then it would split on "},":
"John", "ran"
"NOUN", "VERB" }
"The", "dog", "jumped"
"DET", "NOUN", "VERB" }
"Mike","lives","in","Poland"
"NOUN","VERB","DET","NOUN"}

Then you just need one more replace to tidy them up!

Small fix, since the closing braces are still there: System.out.println(splitString[i].replace("}", "")); - chopss
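Putting this answer's steps together with the extra replace from the comment, a self-contained sketch might look like the following (SplitDemo and extract are hypothetical names). It assumes the whole input is in one string and that none of the quoted words contain braces or the sequence "},". The trim() calls are an addition that strips the newlines and padding left between groups.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitDemo {

    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        // Step 1: drop every opening brace.
        String s = input.replace("{", "");
        // Step 2: every group now ends with "},", so split on that.
        for (String part : s.split("},")) {
            // Step 3: the second group on each line still carries a "}", so remove it.
            String cleaned = part.replace("}", "").trim();
            if (!cleaned.isEmpty()) { // skip whitespace-only pieces
                result.add(cleaned);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = String.join("\n",
                "{ { \"John\", \"ran\" },                { \"NOUN\", \"VERB\" } },",
                "{ { \"The\", \"dog\", \"jumped\"},        { \"DET\", \"NOUN\", \"VERB\" } },",
                "{ {  \"Mike\",\"lives\",\"in\",\"Poland\"}, {\"NOUN\",\"VERB\",\"DET\",\"NOUN\"} },");
        extract(input).forEach(System.out::println);
    }
}
```

String.split drops the trailing empty string after the final "},", so no extra blank entry appears at the end.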

1
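Another approach is to search for {...} substrings that contain no inner { or } characters, and take only the part between the braces.

A regex describing such a substring could look like this: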

\\{(?<content>[^{}]+)\\}

Explanation:

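  • \\{ is the escaped { character, so it matches a literal { (unescaped, { would start a quantifier such as {x,y}, which is why it is escaped)
  • (?<content>...) is a named capturing group; it stores only the part between { and }, so later we can use just that part (instead of the whole match, which also includes the { and })
  • [^{}]+ means one or more characters other than { and }
  • \\} is the escaped }, matching a literal }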

Demo:

String input = "{ { \"John\", \"ran\" },                { \"NOUN\", \"VERB\" } },\r\n" + 
        "{ { \"The\", \"dog\", \"jumped\"},        { \"DET\", \"NOUN\", \"VERB\" } },\r\n" + 
        "{ {  \"Mike\",\"lives\",\"in\",\"Poland\"}, {\"NOUN\",\"VERB\",\"DET\",\"NOUN\"} },";

Pattern p = Pattern.compile("\\{(?<content>[^{}]+)\\}");
Matcher m = p.matcher(input);
while(m.find()){
    System.out.println(m.group("content").trim());
}

Output:

"John", "ran"
"NOUN", "VERB"
"The", "dog", "jumped"
"DET", "NOUN", "VERB"
"Mike","lives","in","Poland"
"NOUN","VERB","DET","NOUN"

Page content provided by Stack Overflow.