Split text into words using BufferedReader

Question

3

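I'm having trouble solving a task. I have to use a BufferedReader to add only the words to a TreeSet (and print the size of the TreeSet), but my solution does not pass the speed test (time limit). The text contains only letters and whitespace, and it may contain empty lines. I need a faster solution than this one: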

BufferedReader read = new BufferedReader(new InputStreamReader(System.in));
Set<String> text = new TreeSet<String>();
String words[], line;
while ((line = read.readLine()) != null) {
    words = line.split("\\s+");
    for (int i = 0; i < words.length && words[0].length() > 0; i++) {
        text.add(words[i]);
    }
}
System.out.println(text.size());

Is there another "split" approach I could use so that the program needs less "thinking time"?

- chris09trt

Could you use the Scanner class instead of BufferedReader? - Pavindu

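I'm not sure you want the "words[0].length() > 0" condition in the loop guard: it prevents anything from being added when the line starts with whitespace, even if words follow. Make it a conditional statement inside the loop instead. (And just use a for-each loop; there is no need for the array index.) - Andy Turner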
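A minimal sketch of what that comment suggests, not code from the thread: the emptiness check moves inside a for-each loop, so a leading empty token is skipped without ending the loop. The class name `WordCount` and the helper `countDistinctWords` are hypothetical, and the `IOException` is wrapped only to keep the helper easy to call.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.TreeSet;

class WordCount {
    // Hypothetical helper: counts the distinct words read from the given reader.
    static int countDistinctWords(BufferedReader read) {
        Set<String> text = new TreeSet<>();
        try {
            String line;
            while ((line = read.readLine()) != null) {
                for (String word : line.split("\\s+")) {
                    // The check lives inside the loop, so an empty leading
                    // token is skipped and the remaining words are still added.
                    if (!word.isEmpty()) {
                        text.add(word);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.size();
    }

    public static void main(String[] args) {
        // First line starts with a space, second line is empty.
        BufferedReader read = new BufferedReader(new StringReader(" hello world\n\nhello"));
        System.out.println(countDistinctWords(read)); // prints 2
    }
}
```

With the original loop guard, the first line of this input would contribute nothing, because `words[0]` is the empty string.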

3 Answers

0

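Based on the assumptions you have given, I would add everything to the set and remove the unwanted values at the end. This should reduce the time spent checking the condition (although in practice not by much).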

BufferedReader read = new BufferedReader(new InputStreamReader(System.in));
Set<String> text = new TreeSet<String>();
String words[], line;
while ((line = read.readLine()) != null) {
  words = line.split("\\s+");
  for(String value: words) {
    text.add(value);
  }
}
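// split("\\s+") never yields " " or null; only an empty string can appear
// (for an empty line or a line with leading whitespace).
// Note: text.remove(null) would throw NullPointerException on this TreeSet.
text.remove("");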
System.out.println(text.size());

- nafas

0

Sure, you can stream the BufferedReader into a TreeSet:

Collection<String> c = read.lines()
        .flatMap(line -> Stream.of(line.split("\\s+"))
                .filter(word -> word.length() > 0))
        .collect(Collectors.toCollection(TreeSet::new));

- g00se

- vszholobov · Accepted Answer

In the line

words = line.split("\\s+");

you split using a regular expression, which is much slower than splitting on a single character (about 5 times slower on my machine). See: Java split String performances

If the words are separated by exactly one space, the solution is simple.

words = line.split(" ");

Just replace the original line with this one and your program will run faster.

If the words can be separated by multiple spaces, add this line after the loop.

text.remove("");
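A small sketch of why the empty string shows up, assuming plain spaces as separators: splitting on a single space produces an empty token between two adjacent spaces, while the regex treats the whole run as one delimiter. The class name `SplitDemo` and the helper `distinctTokens` are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class SplitDemo {
    // Hypothetical helper: collects the tokens of one line into a sorted set.
    static Set<String> distinctTokens(String line) {
        Set<String> text = new TreeSet<>(Arrays.asList(line.split(" ")));
        text.remove(""); // drop the empty token produced by adjacent spaces
        return text;
    }

    public static void main(String[] args) {
        String line = "1  2 1"; // two spaces between "1" and "2"
        System.out.println(Arrays.toString(line.split(" ")));    // [1, , 2, 1]
        System.out.println(Arrays.toString(line.split("\\s+"))); // [1, 2, 1]
        System.out.println(distinctTokens(line));                // [1, 2]
    }
}
```

Because the set stores each value once, a single `remove("")` after the loop is enough, however many empty tokens were produced.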

This way you can still replace the regex delimiter with a single-character delimiter.

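import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Set;
import java.util.TreeSet;

public class Test {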
    public static void main(String[] args) throws IOException {
        // string contains 1, 2 and two spaces between 1 and 2. text size should be 2
        String txt = "1  2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
            "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
            "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
            "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
            "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
            "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1";

        InputStream inpstr = new ByteArrayInputStream(txt.getBytes());

        BufferedReader read = new BufferedReader(new InputStreamReader(inpstr));
        Set<String> text = new TreeSet<>();
        String[] words;
        String line;
        long startTime = System.nanoTime();
        while ((line = read.readLine()) != null) {
            //words = line.split("\\s+"); -- runs 5 times slower
            words = line.split(" ");
            for (int i = 0; i < words.length; i++) {
                text.add(words[i]);
            }
        }
        text.remove("");  // add only if words can be separated with multiple spaces

        long endTime = System.nanoTime();
        System.out.println((endTime - startTime) + " " + text.size());
    }
}

Also, you can replace your for loop with the following:

text.addAll(Arrays.asList(words));
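Since the question asks about alternatives to split, here is a hedged sketch of a different technique that none of the answers use: `java.util.StringTokenizer`, which avoids regular expressions and never returns empty tokens. The class name `TokenizerDemo` and the helper `countDistinctWords` are hypothetical, and whether this is actually faster than `split(" ")` on a given input is an assumption that would need measuring.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Set;
import java.util.StringTokenizer;
import java.util.TreeSet;

class TokenizerDemo {
    // Hypothetical helper: counts distinct words without using String.split.
    static int countDistinctWords(BufferedReader read) {
        Set<String> text = new TreeSet<>();
        read.lines().forEach(line -> {
            // Default delimiters are space, tab, newline, carriage return and
            // form feed; runs of delimiters never produce empty tokens.
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                text.add(tokens.nextToken());
            }
        });
        return text.size();
    }

    public static void main(String[] args) {
        BufferedReader read = new BufferedReader(new StringReader("1  2 1\n\n 3 2"));
        System.out.println(countDistinctWords(read)); // prints 3
    }
}
```

With this approach no `text.remove("")` is needed, because empty lines and repeated spaces yield no tokens at all.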