Split an array of strings into an array of arrays of strings

Question

4

I'm looking for a way to split this array of strings:

["this", "is", "a", "test", ".", "I", "wonder", "if", "I", "can", "parse", "this",
"text", "?", "Without", "any", "errors", "!"]

so that the text is grouped by punctuation:

[
  ["this", "is", "a", "test", "."],
  ["I", "wonder", "if", "I", "can", "parse", "this", "text", "?"],
  ["Without", "any", "errors", "!"]
]

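Is there a simple way to do this? Is the most sensible approach to iterate over the array, add each element to a temporary array, and append that temporary array to a container array whenever a punctuation mark is found?

I was thinking of using slice or map, but I can't figure out whether that is possible.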

- randy newfield

2 Answers

2

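@ndn has given the best answer to this question, but I'd like to suggest another approach that may be useful for other problems. An array like the one you've given is usually obtained by splitting a string on whitespace or punctuation. For example: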

s = "this is a test. I wonder if I can parse this text? Without any errors!"
s.scan /\w+|[.?!]/
  #=> ["this", "is", "a", "test", ".", "I", "wonder", "if", "I", "can",
  #    "parse", "this", "text", "?", "Without", "any", "errors", "!"]

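When that is the case, you may find it more convenient to manipulate the string directly in some other way. Here, for example, you could first use String#split with a regular expression to break the string s into sentences: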

r1 = /
     (?<=[.?!]) # positive lookbehind: preceded by one of the punctuation characters
     \s*   # match >= 0 whitespace characters to remove spaces
     /x    # extended/free-spacing regex definition mode

a = s.split(r1)
  #=> ["this is a test.", "I wonder if I can parse this text?",
  #    "Without any errors!"]

Then split each sentence into its parts:

r2 = /
     \s+       # match >= 1 whitespace characters
     |         # or
     (?=[.?!]) # use a positive lookahead to match a zero-width string
               # followed by one of the punctuation characters
     /x

b = a.map { |s| s.split(r2) }
  #=> [["this", "is", "a", "test", "."],
  #    ["I", "wonder", "if", "I", "can", "parse", "this", "text", "?"],
  #    ["Without", "any", "errors", "!"]]
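As a rough sketch, the two steps above could be wrapped in a single helper. The names `SENTENCE_BREAK`, `TOKEN_BREAK` and `tokenize_sentences` are made up for illustration, and the sketch assumes sentences end only in `.`, `?` or `!`:

```ruby
# Hypothetical constants: the same patterns as r1 and r2, written on one line.
SENTENCE_BREAK = /(?<=[.?!])\s*/   # split after a punctuation mark, dropping following spaces
TOKEN_BREAK    = /\s+|(?=[.?!])/   # split on whitespace or just before a punctuation mark

# Hypothetical helper: sentence-split first, then split each sentence into tokens.
def tokenize_sentences(text)
  text.split(SENTENCE_BREAK).map { |sentence| sentence.split(TOKEN_BREAK) }
end

s = "this is a test. I wonder if I can parse this text? Without any errors!"
result = tokenize_sentences(s)
p result
```

Because both patterns use zero-width assertions around the punctuation, the punctuation marks themselves are not consumed by the split and stay in the output.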

- Cary Swoveland

Unfortunately, this solution seems to lose the punctuation in the output. - Wand Maker

- ndnenkov · Accepted Answer

Take a look at Enumerable#slice_after:

x.slice_after { |e| '.?!'.include?(e) }.to_a
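For a complete picture, here is a minimal sketch that applies `slice_after` to the array from the question and compares it with the manual temporary-array loop the question describes. `TERMINATORS` and `group_by_terminator` are hypothetical names introduced only for this example, and `slice_after` assumes Ruby 2.2 or later:

```ruby
words = ["this", "is", "a", "test", ".", "I", "wonder", "if", "I", "can",
         "parse", "this", "text", "?", "Without", "any", "errors", "!"]

# Hypothetical constant: the tokens that close a group.
TERMINATORS = %w[. ? !].freeze

# slice_after ends a slice right after each element for which the block is truthy.
grouped = words.slice_after { |w| TERMINATORS.include?(w) }.to_a

# Manual version of the loop described in the question (hypothetical helper).
def group_by_terminator(tokens, terminators)
  groups = []
  current = []
  tokens.each do |token|
    current << token
    if terminators.include?(token)
      groups << current   # close the current group at a punctuation mark
      current = []
    end
  end
  groups << current unless current.empty?  # keep trailing tokens with no terminator
  groups
end

manual = group_by_terminator(words, TERMINATORS)
p grouped
p grouped == manual
```

Comparing against an array of whole tokens, rather than calling `'.?!'.include?(e)`, avoids treating an empty string as a terminator, since `String#include?` returns true for `""`.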