Extracting data inside nested curly braces

Question

3

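I want to extract the contents of the first and the second inner pair of braces separately. I'm completely stuck at the moment; can anyone help? My file read.txt contains the data below. I'm simply reading it into a string "s".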

BufferedReader br = new BufferedReader(new FileReader("read.txt"));
while (br.ready())
{
    String s = br.readLine();
    System.out.println(s);
}

Output

{ { "John", "ran" },                { "NOUN", "VERB" } },
{ { "The", "dog", "jumped"},        { "DET", "NOUN", "VERB" } },
{ {  "Mike","lives","in","Poland"}, {"NOUN","VERB","DET","NOUN"} },

My output should look like this:

  "John", "ran"    
  "NOUN", "VERB" 
  "The", "dog", "jumped"  
  "DET", "NOUN", "VERB" 
  "Mike","lives","in","Poland" 
  "NOUN","VERB","DET","NOUN"

- Rose

1

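Regular expressions are not a good fit for this task. You need to write (or find) a proper parser. It isn't hard, since the format is more or less fixed. - The Paramagnetic Croissant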

@Rod_Algonquin I have edited my question because it wasn't clear enough. It is solved now. - Rose
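The hand-written parser suggested in the comment above could be sketched roughly as follows. This is only a minimal single-pass scanner, not code from the thread: the class name BraceParser and the method innermostGroups are made up for illustration, and it assumes that braces never appear inside the quoted words.

```java
import java.util.ArrayList;
import java.util.List;

public class BraceParser {
    // Collects the text of every innermost {...} group, i.e. a group that
    // contains no further braces. Assumption: quoted words never contain
    // '{' or '}', so quotes are not handled specially.
    static List<String> innermostGroups(String text) {
        List<String> groups = new ArrayList<>();
        int start = -1; // index just after the most recent '{', or -1 if none is pending
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '{') {
                // A new '{' always supersedes the previous one, so only the
                // innermost opening brace is remembered.
                start = i + 1;
            } else if (c == '}') {
                if (start >= 0) {
                    groups.add(text.substring(start, i).trim());
                }
                start = -1; // the following '}' closes an outer group; ignore it
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String input = "{ { \"John\", \"ran\" }, { \"NOUN\", \"VERB\" } },";
        for (String group : innermostGroups(input)) {
            System.out.println(group);
        }
    }
}
```

For the sample line in main this prints "John", "ran" and then "NOUN", "VERB", each on its own line.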

4 Answers

3

If you split on "}," you get each group of words as a single string; after that you only need to strip the braces.

Based on your code:

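BufferedReader br = new BufferedReader(new FileReader("read.txt"));
while (br.ready())
{
    String s = br.readLine();
    // every "}," ends one group of words
    String[] words = s.split("},");

    for (int x = 0; x < words.length; x++) {
        // strip the remaining braces and the surrounding whitespace
        String printme = words[x].replace("{", "").replace("}", "").trim();
        System.out.println(printme);
    }
}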

- Scary Wombat

One small modification, Scary: String printme = words[x].replace("{","").replace("}",""); - Rose

1

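You could always remove the opening braces and then split on "},", which gives you the list of strings you need (assuming everything is in a single string).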

String s = input.replace("{","");
String[] splitString = s.split("},");

This would first remove the opening braces:

"John", "ran" },                "NOUN", "VERB" } },
"The", "dog", "jumped"},        "DET", "NOUN", "VERB" } },
"Mike","lives","in","Poland"},"NOUN","VERB","DET","NOUN"} },

It would then split on "},":

"John", "ran"
"NOUN", "VERB" }
"The", "dog", "jumped"
"DET", "NOUN", "VERB" }
"Mike","lives","in","Poland"
"NOUN","VERB","DET","NOUN"}

Then you just need one more replace to tidy them up!
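Put together, the three steps above (remove "{", split on "},", replace the leftover "}") might look like the sketch below. The class name SplitOnBraces and the method groups are hypothetical names chosen for this example, and the code assumes that no word itself contains "}," since that would split a group in the wrong place.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitOnBraces {
    // Sketch of the replace/split/replace idea.
    // Assumption: the sequence "}," never occurs inside a quoted word.
    static List<String> groups(String input) {
        List<String> result = new ArrayList<>();
        String withoutOpen = input.replace("{", "");      // step 1: drop every '{'
        for (String piece : withoutOpen.split("},")) {    // step 2: split on "},"
            String cleaned = piece.replace("}", "").trim(); // step 3: drop leftover '}'
            if (!cleaned.isEmpty()) {
                result.add(cleaned);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "{ { \"John\", \"ran\" },                { \"NOUN\", \"VERB\" } },\n"
                     + "{ { \"The\", \"dog\", \"jumped\"},        { \"DET\", \"NOUN\", \"VERB\" } },";
        for (String group : groups(input)) {
            System.out.println(group);
        }
    }
}
```

The trim() call also removes the line break that ends up at the start of a piece when the input spans several lines.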

- James Hunt

There is a small mistake, since the closing braces are still left in: System.out.println(splitString[i].replace("}", "")); - chopss

1

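Another approach is to search for {...} substrings that contain no inner { or } characters, and to take only the part between the braces.

A regex describing such a substring could look like this: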

\\{(?<content>[^{}]+)\\}

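Explanation:

\\{ is the escaped { character, which now stands for a literal { (normally { starts a quantifier such as {x,y}, so it has to be escaped). The doubled backslash is how a single regex backslash is written in a Java string literal.
(?<content>...) is a named capturing group; it stores only the part between { and }, so we can use that part later (instead of the whole match, which also includes the { and })
[^{}]+ means one or more characters that are neither { nor }
\\} is the escaped } character, so it stands for a literal }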

Demo:

String input = "{ { \"John\", \"ran\" },                { \"NOUN\", \"VERB\" } },\r\n" + 
        "{ { \"The\", \"dog\", \"jumped\"},        { \"DET\", \"NOUN\", \"VERB\" } },\r\n" + 
        "{ {  \"Mike\",\"lives\",\"in\",\"Poland\"}, {\"NOUN\",\"VERB\",\"DET\",\"NOUN\"} },";

Pattern p = Pattern.compile("\\{(?<content>[^{}]+)\\}");
Matcher m = p.matcher(input);
while(m.find()){
    System.out.println(m.group("content").trim());
}

Output:

"John", "ran"
"NOUN", "VERB"
"The", "dog", "jumped"
"DET", "NOUN", "VERB"
"Mike","lives","in","Poland"
"NOUN","VERB","DET","NOUN"
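To tie this back to the question, where the data comes from read.txt, the same pattern could be applied line by line while reading. The sketch below is an assumption-laden illustration rather than part of the original answer: GroupReader and readGroups are invented names, and main uses a StringReader with sample data so that it runs without the file (swap in new FileReader("read.txt") for the real thing).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupReader {
    // Same pattern as in the answer: an innermost {...} with a named group.
    private static final Pattern GROUP = Pattern.compile("\\{(?<content>[^{}]+)\\}");

    // Reads all lines from the reader and collects the trimmed content
    // of every innermost brace group, in order of appearance.
    static List<String> readGroups(BufferedReader reader) {
        List<String> groups = new ArrayList<>();
        reader.lines().forEach(line -> {
            Matcher m = GROUP.matcher(line);
            while (m.find()) {
                groups.add(m.group("content").trim());
            }
        });
        return groups;
    }

    public static void main(String[] args) throws IOException {
        String sample = "{ { \"John\", \"ran\" }, { \"NOUN\", \"VERB\" } },\n"
                      + "{ { \"The\", \"dog\", \"jumped\"}, { \"DET\", \"NOUN\", \"VERB\" } },";
        // Stand-in for: new BufferedReader(new FileReader("read.txt"))
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            for (String group : readGroups(reader)) {
                System.out.println(group);
            }
        }
    }
}
```

Processing one line at a time works here because no brace group in the sample data spans a line break.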

- Pshemo

- zx81 · Accepted Answer

Use this regex:

(?<=\{)(?!\s*\{)[^{}]+

See the matches in the regex demo.

In Java:

Pattern regex = Pattern.compile("(?<=\\{)(?!\\s*\\{)[^{}]+");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    // matched text: regexMatcher.group()
}

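Explanation

The lookbehind (?<=\{) asserts that what immediately precedes the current position is an opening brace {
The negative lookahead (?!\s*\{) asserts that what follows is not optional whitespace followed by an opening brace {
[^{}]+ matches one or more characters that are not braces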
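A runnable version of the Java snippet above, fed with the sample data from the question, could look like this. The class name LookbehindDemo and the method extract are hypothetical, and the trim() call is an addition: the regex itself also matches the spaces around each group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {
    // Lookbehind for '{', then refuse positions followed by another '{',
    // then take everything up to the next brace.
    private static final Pattern REGEX = Pattern.compile("(?<=\\{)(?!\\s*\\{)[^{}]+");

    static List<String> extract(String subject) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = REGEX.matcher(subject);
        while (matcher.find()) {
            // The match includes surrounding spaces, so trim them here.
            matches.add(matcher.group().trim());
        }
        return matches;
    }

    public static void main(String[] args) {
        String subject = "{ { \"John\", \"ran\" },                { \"NOUN\", \"VERB\" } },\n"
                       + "{ {  \"Mike\",\"lives\",\"in\",\"Poland\"}, {\"NOUN\",\"VERB\",\"DET\",\"NOUN\"} },";
        for (String match : extract(subject)) {
            System.out.println(match);
        }
    }
}
```

Unlike the capture-group variant, this pattern never consumes the braces themselves, so matcher.group() can be used directly without a named group.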