How do I read individual words (or lines) from a text file in Java?

Question

5

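I'm trying to write a program that reads individual words from a text file and stores them in String variables. I know how to read one char at a time with FileReader or FileInputStream, but that doesn't work for what I want. Once I've read the words in, I'll compare them against other String variables in my program using .equals, so it would be best to read them in as strings. Reading a whole line of the file as a string would also be fine, since I could just put one word on each line of the file. How do I read words from a text file and store them in String variables?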

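EDIT: Okay, that duplicate helps somewhat. It may work for me, but my question is slightly different, because that one only covers how to read a single line. I'm trying to read the individual words within a line. So basically, I need to split the line string.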

- Ashwin Gupta

1

This may be a duplicate of the question about reading the next word in Java. - Eric Leibenguth

@EricLeibenguth, sort of; see the edit above. - Ashwin Gupta

1

No, don't miss the details of the answer there: use Scanner.nextLine() to get the next line, and Scanner.next() to get the next word. - Eric Leibenguth

4 Answers

9

To read lines from a text file, you can use the following code (with try-with-resources):

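import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

String line;

// All three resources are closed automatically, in reverse order, when the block exits
try (
    InputStream fis = new FileInputStream("the_file_name");
    InputStreamReader isr = new InputStreamReader(fis, Charset.forName("UTF-8"));
    BufferedReader br = new BufferedReader(isr);
) {
    while ((line = br.readLine()) != null) {
        // Do your thing with line
    }
}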

The same thing in a more compact, less readable form:

String line;

try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("the_file_name"), Charset.forName("UTF-8")))) {
    while ((line = br.readLine()) != null) {
        // Do your thing with line
    }
}

To split a line of text into individual words, you can use String.split:

while ((line = br.readLine()) != null) {
    String[] words = line.split(" ");
    // Now you have a String array containing each word in the current line
}
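
As a hedged sketch of how the two pieces above might fit together: the hypothetical class `LineWords` below reads a file line by line and splits each line on runs of whitespace (`\\s+`) instead of a single space, so that consecutive spaces or tabs do not produce empty strings. The class name, the method names, and the choice of `\\s+` are assumptions made for illustration and are not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class LineWords {

    // Splits one line into words on runs of whitespace.
    static List<String> splitWords(String line) {
        List<String> words = new ArrayList<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return words; // blank line: no words
        }
        for (String word : trimmed.split("\\s+")) {
            words.add(word);
        }
        return words;
    }

    // Reads every word of a UTF-8 text file, line by line.
    static List<String> readWords(Path file) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                words.addAll(splitWords(line));
            }
        }
        return words;
    }
}
```

Calling `LineWords.readWords(Paths.get("the_file_name"))` would then give a list whose elements can be compared against other strings with `.equals`.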

- spork

OK, thanks! That's exactly what I was looking for. - Ashwin Gupta

2

You have to use StringTokenizer! Here is an example; also read up on String Tokenizer.

private BufferedReader innerReader;

public void loadFile(Reader reader) throws IOException {
    if (reader == null) {
        throw new IllegalArgumentException("Reader not valid!");
    }
    this.innerReader = new BufferedReader(reader);
    String line;
    try {
        while ((line = innerReader.readLine()) != null) {
            if (line.trim().isEmpty())
                throw new IllegalArgumentException("line empty");
            // StringTokenizer splits the string on the given delimiter
            StringTokenizer tokenizer = new StringTokenizer(line, ","); // delimiter is ","
            if (tokenizer.countTokens() < 4)
                throw new IllegalArgumentException("Token number not valid (< 4)");
            // You can change the delimiter per call if necessary. Example line:
            /*
            Hello / bye , hi
            */
            // reads up to "/"
            String hello = tokenizer.nextToken("/").trim();
            // reads up to ","; the "/" just reached is not consumed, so it begins this token
            String bye = tokenizer.nextToken(",").trim();
            // reads up to the end of the line, again starting at the unconsumed ","
            String hi = tokenizer.nextToken("\n\r").trim();
            // if you do not know whether there will be a next token, loop like this
            while (tokenizer.hasMoreTokens()) {
                String mayBe = tokenizer.nextToken(".");
            }
        }
    } catch (Exception e) {
        throw new IllegalArgumentException(e);
    }
}
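
For comparison, a smaller sketch of `StringTokenizer` with one fixed set of delimiters might look like the following. The class and method names are hypothetical, and the delimiter set (space, tab, comma, slash) is an assumption chosen to suit the sample line `Hello / bye , hi`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class, for illustration only.
class TokenizerWords {

    // Splits a line into tokens; any of space, tab, ',' or '/' separates tokens.
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line, " \t,/");
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }
}
```

Because the delimiters are not returned as tokens here, consecutive delimiters collapse and no empty tokens are produced, so no trimming is needed afterwards.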

- Michele Lacorte

OK, thanks. I'll probably need to do some research on this StringTokenizer, since I've never seen it before. I'll come back to this in a few minutes. - Ashwin Gupta

I've made some more changes; I hope they help. - Michele Lacorte

1

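Thanks for the answer, @MicheleLacorte. This is great and I'll definitely look into it, but for now spork's answer fits my needs better and is easier for me to understand (I'm not very good at this yet, haha). - Ashwin Gupta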

1

In Java 8 you can do the following:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class Foo {
    public List<String> readFileIntoListOfWords() {
        try {
            return Files.readAllLines(Paths.get("somefile.txt"))
                .stream()
                .map(l -> l.split(" "))
                .flatMap(Arrays::stream)
                .collect(Collectors.toList());
        }
        catch (IOException e) {
            e.printStackTrace();
        }
        return Collections.emptyList();
    }
}

Though I suspect the argument to split will need changing, for example to strip punctuation from the ends of words.
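
One possible way to handle the punctuation concern, sketched under assumptions: split on runs of characters that are neither letters nor digits, then drop any empty strings. The class name `PunctuationSplit` and the regex `[^\\p{L}\\p{N}]+` are illustrative choices, not something the answer prescribes. Note that this also breaks apart words containing apostrophes or hyphens, which may or may not be what you want.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
class PunctuationSplit {

    // Splits on anything that is not a letter or digit and discards empty pieces.
    static List<String> words(String line) {
        return Arrays.stream(line.split("[^\\p{L}\\p{N}]+"))
                .filter(w -> !w.isEmpty()) // a leading separator yields one empty string
                .collect(Collectors.toList());
    }
}
```

In the stream pipeline above, this could replace the `map`/`flatMap` pair with something like `.flatMap(l -> PunctuationSplit.words(l).stream())`.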

- beresfordt


- Misha · Accepted Answer

These are all very complicated answers, and I'm sure they're all useful. But I prefer the simple, elegant Scanner:

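import java.io.File;
import java.util.Scanner;

public static void main(String[] args) throws Exception {
    // try-with-resources closes the Scanner (and the underlying file) when done
    try (Scanner sc = new Scanner(new File("fileName.txt"))) {
        while (sc.hasNext()) {
            String s = sc.next(); // next whitespace-delimited word
            //.....
        }
    }
}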
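
A slightly fuller sketch of the same idea, which also shows the `nextLine()` variant mentioned in the comments on the question. The class `ScannerWords` and its method names are hypothetical; taking a `Scanner` as a parameter is only a convenience so that the same methods work on a file or on an in-memory string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class, for illustration only.
class ScannerWords {

    // Collects every whitespace-delimited word the Scanner can still produce.
    static List<String> readWords(Scanner sc) {
        List<String> words = new ArrayList<>();
        while (sc.hasNext()) {
            words.add(sc.next());
        }
        return words;
    }

    // Collects every remaining line, without the line terminators.
    static List<String> readLines(Scanner sc) {
        List<String> lines = new ArrayList<>();
        while (sc.hasNextLine()) {
            lines.add(sc.nextLine());
        }
        return lines;
    }
}
```

With a file, the call would be wrapped as `try (Scanner sc = new Scanner(new File("fileName.txt"))) { List<String> words = ScannerWords.readWords(sc); }`, and each element can then be compared with `.equals`.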