How to split a string on spaces or quotes into an array in JavaScript?

Question

38

var str = 'single words "fixed string of words"';
var astr = str.split(" "); // need fix

I would like the array to look like this:

var astr = ["single", "words", "fixed string of words"];

- Remi

9 Answers

33

str.match(/\w+|"[^"]+"/g)

//single, words, "fixed string of words"

- YOU

8

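This also seems to split on '.', '-' and spaces. It should probably be str.match(/\S+|"[^"]+"/g). - Awalias

There is another problem if you want to handle escaped quotes. For example: 'single words "fixed string of \"quoted\" words"'. Even with Awalias's correction, this still yields: ["single", "words", ""fixed", "string", ""of", "words""]. You need to handle escaped quotes, but not trip up and grab escaped backslashes. I think that ends up being more complicated than you would want to handle with a regexp. - jep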

2

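@Awalias, I have a better answer below. Your regex example should actually be /[^\s"]+|"([^"]*)"/g. Yours still splits on spaces inside quoted regions. I added an answer that fixes this and also removes the quotes from the results, as the OP asked. - dallin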

1

If you want to allow escaped quotes, see this other SO question. - bitinerant
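
The difference between the three patterns discussed in these comments can be checked with a small sketch. The sample string below is a made-up input (not from the question), chosen because it contains '-', '.' and a quoted phrase:

```javascript
// Hypothetical sample input containing '-', '.' and a quoted phrase.
const sample = 'foo-bar a.b "fixed string"';

// \w+ only matches word characters, so '-' and '.' act as separators and are dropped.
const byWord = sample.match(/\w+|"[^"]+"/g);

// \S+ also matches the quote character, so the quoted phrase is split at its space.
const byNonSpace = sample.match(/\S+|"[^"]+"/g);

// [^\s"]+ stops at quotes, so the second alternative gets to match the whole phrase.
const byNonSpaceNonQuote = sample.match(/[^\s"]+|"([^"]*)"/g);

console.log(byWord);             // [ 'foo', 'bar', 'a', 'b', '"fixed string"' ]
console.log(byNonSpace);         // [ 'foo-bar', 'a.b', '"fixed', 'string"' ]
console.log(byNonSpaceNonQuote); // [ 'foo-bar', 'a.b', '"fixed string"' ]
```

Note that str.match with the g flag returns whole matches only, so the quotes stay in the result even when the pattern contains a capture group.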

12

This uses a mix of split and regexp matching.

var str = 'single words "fixed string of words"';
var matches = /".+?"/.exec(str);
str = str.replace(/".+?"/, "").replace(/^\s+|\s+$/g, "");
var astr = str.split(" ");
if (matches) {
    for (var i = 0; i < matches.length; i++) {
        astr.push(matches[i].replace(/"/g, ""));
    }
}

This returns the expected result, although a single regexp should be able to do it all.

// ["single", "words", "fixed string of words"]

Update: this is an improved version of the method proposed by S.Mark.

var str = 'single words "fixed string of words"';
var aStr = str.match(/\w+|"[^"]+"/g), i = aStr.length;
while(i--){
    aStr[i] = aStr[i].replace(/"/g,"");
}
// ["single", "words", "fixed string of words"]

- Sean Kinsey

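The improved version has a problem: non-word characters such as "#" disappear. - Tuhis

This is a good answer, but if you want to do it all with a regex and remove the quotes, I've added a new answer that does that without having to loop over every result afterwards to strip the quotes. - dallin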

5

This might be a complete solution: https://github.com/elgs/splitargs

- user1663023

3

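An ES6 solution that supports:

Splitting on spaces, except inside quotes
Removing quotes, but not backslash-escaped quotes
Turning escaped quotes into plain quotes
Quotes placed anywhere in the string

Code: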

str.match(/\\?.|^$/g).reduce((p, c) => {
        if(c === '"'){
            p.quote ^= 1;
        }else if(!p.quote && c === ' '){
            p.a.push('');
        }else{
            p.a[p.a.length-1] += c.replace(/\\(.)/,"$1");
        }
        return  p;
    }, {a: ['']}).a

Output:

[ 'single', 'words', 'fixed string of words' ]

- Tsuneo Yoshioka
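
The same rules can also be written as an explicit loop, which some readers may find easier to follow than the reduce. This is only a sketch of the idea, not the answer's original code: the function name splitQuoted is made up for illustration, and it assumes that a backslash escapes whatever single character follows it.

```javascript
// Hypothetical helper: splits on spaces outside double quotes,
// drops unescaped quotes, and turns \" into a literal quote.
function splitQuoted(input) {
  const tokens = [''];
  let inQuotes = false;
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === '\\' && i + 1 < input.length) {
      // Escaped character: keep the next character, drop the backslash.
      tokens[tokens.length - 1] += input[++i];
    } else if (ch === '"') {
      // Unescaped quote: toggle quoting and do not keep the quote itself.
      inQuotes = !inQuotes;
    } else if (ch === ' ' && !inQuotes) {
      // Space outside quotes: start a new token.
      tokens.push('');
    } else {
      tokens[tokens.length - 1] += ch;
    }
  }
  return tokens;
}

console.log(splitQuoted('single words "fixed string of words"'));
// [ 'single', 'words', 'fixed string of words' ]
console.log(splitQuoted('say "hello \\"big\\" world" now'));
// [ 'say', 'hello "big" world', 'now' ]
```

As with the reduce version, consecutive spaces outside quotes produce empty strings in the result; filter them out if that is not wanted.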

2

This splits it into an array and strips the surrounding quotes from any remaining strings.

const parseWords = (words = '') =>
    (words.match(/[^\s"]+|"([^"]*)"/gi) || []).map((word) => 
        word.replace(/^"(.+(?="$))"$/, '$1'))

- tim.breeding

0

This solution works with both double quotes (") and single quotes ('):

Code:

str.match(/[^\s"']+|"([^"]*)"/gmi)

// ["single", "words", "\"fixed string of words\""]

Here you can see how this regular expression works: https://regex101.com/r/qa3KxQ/2

- julianYaman
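
If single-quoted phrases should be kept together in the same way as double-quoted ones, one possible variant adds a third alternative for '...' and reads the capture groups via matchAll. This is a sketch under that assumption; the function name splitOnQuotes is hypothetical and the pattern is not the one from the answer above.

```javascript
// Hypothetical helper: keeps "..." and '...' phrases together and strips the quotes.
function splitOnQuotes(text) {
  const pattern = /[^\s"']+|"([^"]*)"|'([^']*)'/g;
  // m[1] is set for "..." matches, m[2] for '...' matches, otherwise use the whole match.
  return Array.from(text.matchAll(pattern), (m) => m[1] ?? m[2] ?? m[0]);
}

console.log(splitOnQuotes(`single 'two words' "fixed string of words"`));
// [ 'single', 'two words', 'fixed string of words' ]
```

One caveat of this design: an apostrophe inside a word, as in don't, is treated as a quote character and will break the word apart.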

0

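Before I found @dallin's answer (this thread: https://dev59.com/yXE85IYBdhLWcg3wXCIv#18647776), I was struggling with processing strings that contain unquoted and quoted terms/phrases in JavaScript.

While researching the issue, I ran a number of tests.

Because I found this information hard to come by, I have collated it below; it may be useful to others looking for answers on processing strings that contain quoted words in JavaScript.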

let q = 'apple banana "nova scotia" "british columbia"';

Extract [only] the quoted words and phrases:

// https://dev59.com/iGct5IYBdhLWcg3wFpi1
const r = q.match(/"([^']+)"/g);
console.log('r:', r)
// r: Array [ "\"nova scotia\" \"british columbia\"" ]
console.log('r:', r.toString())
// r: "nova scotia" "british columbia"

// ----------------------------------------

// [alternate regex] https://www.regextester.com/97161
const s = q.match(/"(.*?)"/g);
console.log('s:', s)
// s: Array [ "\"nova scotia\"", "\"british columbia\"" ]
console.log('s:', s.toString())
// s: "nova scotia","british columbia"

Extract [all] unquoted and quoted words and phrases:

// https://dev59.com/yXE85IYBdhLWcg3wXCIv
const t = q.match(/\w+|"[^"]+"/g);
console.log('t:', t)
// t: Array(4) [ "apple", "banana", "\"nova scotia\"", "\"british columbia\"" ]
console.log('t:', t.toString())
// t: apple,banana,"nova scotia","british columbia"

// ----------------------------------------------------------------------------

// https://dev59.com/yXE85IYBdhLWcg3wXCIv
// [@dallin's answer (this thread)] https://dev59.com/yXE85IYBdhLWcg3wXCIv#18647776

var myRegexp = /[^\s"]+|"([^"]*)"/gi;
var myArray = [];

do {
    /* Each call to exec returns the next regex match as an array. */
    var match = myRegexp.exec(q);    // << "q" = my query (string)
    if (match != null)
    {
        /* Index 1 in the array is the captured group if it exists.
         * Index 0 is the matched text, which we use if no captured group exists. */
        myArray.push(match[1] ? match[1] : match[0]);
    }
} while (match != null);

console.log('myArray:', myArray, '| type:', typeof(myArray))
// myArray: Array(4) [ "apple", "banana", "nova scotia", "british columbia" ] | type: object
console.log(myArray.toString())
// apple,banana,nova scotia,british columbia

Using a Set (rather than an array):

// https://dev59.com/5F4b5IYBdhLWcg3wiiV4
var mySet = new Set(myArray);
console.log('mySet:', mySet, '| type:', typeof(mySet))
// mySet: Set(4) [ "apple", "banana", "nova scotia", "british columbia" ] | type: object

Iterating over the Set's elements:

mySet.forEach(x => console.log(x));
/* apple
 * banana
 * nova scotia
 * british columbia
 */

// https://dev59.com/2WQo5IYBdhLWcg3wBLUo
const myArrayFromSet = Array.from(mySet);

for (let i=0; i < myArrayFromSet.length; i++) {
    console.log(i + ':', myArrayFromSet[i])
}
/*
 0: apple
 1: banana
 2: nova scotia
 3: british columbia 
 */

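Asides

The JavaScript output above is from the Firefox Developer Tools (F12, from a web page). I created a blank HTML file that calls a .js file, which I edit with Vim as my IDE. See: Simple JavaScript IDE
Based on my tests, a cloned Set appears to be a deep copy. See: Shallow copy of an ES6 Map or Set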

- Victoria Stuart

-1

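I also noticed the characters disappearing. I think you can include them: for example, to have "+" count as part of a word, use something like "[\w\+]" instead of just "\w".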

- user655489
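
A quick way to see the effect of widening the character class is to compare both patterns on an input that contains "+". The sample string is made up for illustration; inside a character class the "+" does not need to be escaped, so [\w+] is equivalent to the [\w\+] suggested above.

```javascript
// Hypothetical sample input with '+' inside a word.
const plusSample = 'c++ rocks "a b"';

// With \w+ alone the '+' characters are skipped.
const withoutPlus = plusSample.match(/\w+|"[^"]+"/g);

// With [\w+]+ the '+' characters are kept as part of the word.
const withPlus = plusSample.match(/[\w+]+|"[^"]+"/g);

console.log(withoutPlus); // [ 'c', 'rocks', '"a b"' ]
console.log(withPlus);    // [ 'c++', 'rocks', '"a b"' ]
```

Every additional character has to be listed explicitly this way, which is why matching "anything except whitespace and quotes" tends to be the simpler choice.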

- dallin · Accepted Answer

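The accepted answer is not entirely correct. It separates on non-space characters such as . and -, and it leaves the quotes in the results. A better way, which also excludes the quotes, is to use a capture group, like this: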

//The parentheses in the regex create a capture group for the text inside the quotes
var myRegexp = /[^\s"]+|"([^"]*)"/gi;
var myString = 'single words "fixed string of words"';
var myArray = [];

do {
    //Each call to exec returns the next regex match as an array
    var match = myRegexp.exec(myString);
    if (match != null)
    {
        //Index 1 in the array is the captured group if it exists
        //Index 0 is the matched text, which we use if no captured group exists
        myArray.push(match[1] ? match[1] : match[0]);
    }
} while (match != null);

Now myArray will contain exactly what the OP asked for:

single,words,fixed string of words
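
On newer engines the same pattern can be driven with String.prototype.matchAll instead of a do/while loop around exec. The sketch below is one possible rewrite, not part of the original answer; the function name tokenize is made up. It also compares the capture group against undefined, because an empty quoted string ("") captures an empty string, which is falsy and would otherwise fall back to the raw match including its quotes.

```javascript
// Hypothetical helper: same pattern as above, iterated with matchAll.
function tokenize(text) {
  const pattern = /[^\s"]+|"([^"]*)"/g;
  const tokens = [];
  for (const match of text.matchAll(pattern)) {
    // match[1] is undefined for unquoted words and a string (possibly empty) for quoted ones.
    tokens.push(match[1] !== undefined ? match[1] : match[0]);
  }
  return tokens;
}

console.log(tokenize('single words "fixed string of words"'));
// [ 'single', 'words', 'fixed string of words' ]
console.log(tokenize('a "" b'));
// [ 'a', '', 'b' ]
```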