Java: splitting a string that contains quotes while preserving the words in the middle

3

I need some help with Java code that splits the following inputs:

word1 key="value with space" word3 -> [ "word1", "key=\"value with space\"", "word3" ]
word1 "word2 with space" word3 -> [ "word1", "word2 with space", "word3" ]
word1 word2 word3 -> [ "word1", "word2", "word3" ]

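The first sample input is the difficult one: its second word has quotes in the middle of the token rather than at the start. I have found several ways to handle the middle example, such as those described in "Split string on spaces in Java, except if between quotes (i.e. treat \"hello world\" as one token)".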

3 Answers

1
Rather than using a regular expression, you can simply iterate over the string:
import java.util.ArrayList;
import java.util.List;

public static String[] splitWords(String str) {
    List<String> array = new ArrayList<>();
    boolean inQuote = false; // Marker telling us if we are between quotes
    int previousStart = -1;  // Index where the current word begins, or -1 if none
    for (int i = 0; i < str.length(); i++) {
        char c = str.charAt(i);
        if (Character.isWhitespace(c)) {
            if (previousStart != -1 && !inQuote) {
                // Whitespace outside quotes ends the current word
                array.add(str.substring(previousStart, i));
                previousStart = -1;
            }
        } else {
            // Possibly the start of a new word
            if (previousStart == -1) previousStart = i;
            // Toggle the quote state on every double quote
            if (c == '"')
                inQuote = !inQuote;
        }
    }
    // Add the last segment if there is one
    if (previousStart != -1)
        array.add(str.substring(previousStart));
    return array.toArray(new String[0]);
}

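The advantage of this approach is that it correctly handles quotes that are not adjacent to whitespace, even when they occur several times. For example, the following is treated as a single token: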
a"b c"d"e f"g
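To check that claim, here is a small self-contained sketch that repeats the loop above inside a hypothetical demo class (the class name and the extra test inputs are assumptions made for illustration) and runs it on the question's inputs, on the string above, and on an input with an unterminated quote:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; only the splitting loop comes from the answer.
class SplitWordsDemo {

    static String[] splitWords(String str) {
        List<String> words = new ArrayList<>();
        boolean inQuote = false; // true while we are between double quotes
        int start = -1;          // start index of the current word, or -1
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (Character.isWhitespace(c)) {
                if (start != -1 && !inQuote) {
                    words.add(str.substring(start, i));
                    start = -1;
                }
            } else {
                if (start == -1) start = i;
                if (c == '"') inQuote = !inQuote;
            }
        }
        if (start != -1) words.add(str.substring(start));
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "word1 key=\"value with space\" word3",
            "word1 \"word2 with space\" word3",
            "word1 word2 word3",
            "a\"b c\"d\"e f\"g", // quotes in the middle, twice
            "a \"b c"            // unterminated quote
        };
        for (String input : inputs) {
            System.out.println(Arrays.toString(splitWords(input)));
        }
    }
}
```

With an unterminated quote, everything from the opening quote to the end of the input ends up in the last token, so the final input yields the two tokens a and "b c.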

This solved my problem! Please see my edit to your code, which fixes a few typos. Thanks, Mad Physicist! - Mike Cooper

0

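This can be done with a mix of regular expressions and replacement. First find the text enclosed in quotes and replace the spaces inside it with a non-space placeholder. Then split the string on spaces and swap the placeholder back for a space in each piece.

    // Needs java.util.ArrayList, java.util.List, java.util.regex.Matcher and java.util.regex.Pattern.
    String s1 = "word1 key=\"value with space\" word3";

    List<String> list = new ArrayList<String>();
    Matcher m = Pattern.compile("\"[^\"]*\"").matcher(s1);
    StringBuffer masked = new StringBuffer();
    while (m.find()) {
        // Replace the spaces between quotes with ||, touching only the matched quoted section
        m.appendReplacement(masked, Matcher.quoteReplacement(m.group().replace(" ", "||")));
    }
    m.appendTail(masked);

    for (String s : masked.toString().split(" ")) {
        list.add(s.replace("||", " ")); // switch the placeholder back to a space
        System.out.println(s.replace("||", " ")); // just to see the output
    }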
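One thing to watch with the placeholder approach: any || already present in the input is turned into a space when the placeholder is swapped back. The following is a minimal sketch of the same idea wrapped in a method, under the assumption that the input never contains the NUL character, which is used as the placeholder instead. The class and method names are hypothetical, and Matcher.replaceAll with a function argument requires Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class illustrating the placeholder technique.
class PlaceholderSplitDemo {
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");
    // Assumption: the input never contains the NUL character.
    private static final char PLACEHOLDER = '\u0000';

    static List<String> split(String input) {
        // Hide the spaces inside each quoted section behind the placeholder.
        String masked = QUOTED.matcher(input).replaceAll(
                r -> Matcher.quoteReplacement(r.group().replace(' ', PLACEHOLDER)));
        List<String> result = new ArrayList<>();
        for (String token : masked.split(" +")) {
            if (!token.isEmpty()) {
                // Restore the hidden spaces in each token.
                result.add(token.replace(PLACEHOLDER, ' '));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(split("word1 key=\"value with space\" word3"));
        System.out.println(split("a||b \"c d\""));
    }
}
```

Here the token a||b survives unchanged, because || is no longer treated specially.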

0
The split can be done with a lookahead in the regular expression:
String[] words = input.split(" +(?=(([^\"]*\"){2})*[^\"]*$)");

Here is some test code:

String[] inputs = {
    "word1 key=\"value with space\" word3",
    "word1 \"word2 with space\" word3",
    "word1 word2 word3"
};
for (String input : inputs) {
    String[] words = input.split(" +(?=(([^\"]*\"){2})*[^\"]*$)");
    System.out.println(Arrays.toString(words));
}

Output:

[word1, key="value with space", word3]
[word1, "word2 with space", word3]
[word1, word2, word3]
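The lookahead accepts a run of spaces as a separator only when the rest of the input contains an even number of double quotes, which is how it recognises that the spaces sit outside a quoted section. A consequence is that an unbalanced quote shifts the result. The sketch below precompiles the same expression in a hypothetical demo class (the class, constant, and method names are assumptions) and shows both cases:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; the regular expression is the one from the answer.
class LookaheadSplitDemo {
    // One or more spaces, followed (up to the end of the input) by an even
    // number of double quotes, i.e. spaces that are outside any quoted section.
    static final Pattern OUTSIDE_QUOTES =
            Pattern.compile(" +(?=(([^\"]*\"){2})*[^\"]*$)");

    static String[] split(String input) {
        return OUTSIDE_QUOTES.split(input);
    }

    public static void main(String[] args) {
        // Balanced quotes: prints [word1, key="value with space", word3]
        System.out.println(Arrays.toString(split("word1 key=\"value with space\" word3")));
        // Unbalanced quote: prints [a "b, c]
        System.out.println(Arrays.toString(split("a \"b c")));
    }
}
```

In the unbalanced case the space before the quote is followed by an odd number of quotes and is not a separator, while the space after it is, so the input is split into a "b and c.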
