How do I read an entire file into a std::string in C++?

Question


How do I read a file into a std::string, i.e., read the whole file at once?

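Text or binary mode should be specified by the caller. The solution should be standard-compliant, portable and efficient. It should not needlessly copy the string's data, and it should avoid reallocations of memory while reading the string.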

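One way to do this would be to stat the file size, resize the std::string and fread() into the std::string's const_cast<char*>()'ed data(). This requires the std::string's data to be contiguous, which is not required by the standard, but it appears to be the case for all known implementations. What is worse, if the file is read in text mode, the std::string's size may not equal the file's size.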
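For comparison, here is a sketch of the same size-first idea in standard C++17, assuming std::filesystem::file_size is an acceptable stand-in for stat() and that the file is opened in binary mode so the byte count matches what read() delivers. The name read_binary is illustrative, not from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>

// Sketch: size the string once from the file's size on disk, then read
// straight into its buffer. Binary mode keeps the byte count exact.
std::string read_binary(const std::filesystem::path& p)
{
    std::ifstream in(p, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + p.string());

    // file_size() reports bytes and throws filesystem_error on failure.
    const auto size = std::filesystem::file_size(p);

    // One allocation of exactly the right size.
    std::string data(static_cast<std::size_t>(size), '\0');

    // Since C++11 the storage is contiguous, so &data[0] is writable.
    in.read(&data[0], static_cast<std::streamsize>(data.size()));

    // Keep only what was actually read (e.g. if the file shrank meanwhile).
    data.resize(static_cast<std::size_t>(in.gcount()));
    return data;
}
```

No const_cast is needed: &data[0] already yields a writable char*.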

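A fully correct, standard-compliant and portable solution could be constructed by reading std::ifstream's rdbuf() into a std::ostringstream and from there into a std::string. However, this could copy the string data and/or needlessly reallocate memory.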
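The rdbuf() route described here looks roughly like the minimal sketch below (read_via_rdbuf is a hypothetical name). It is short and fully standard, but the ostringstream grows incrementally and str() returns a copy of its buffer, which is exactly the overhead the question is worried about.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Sketch of the rdbuf() approach: pump the whole file buffer into an
// ostringstream, then take the accumulated string (a copy before C++20).
std::string read_via_rdbuf(const std::string& path, bool binary)
{
    std::ifstream in(path, binary ? std::ios::in | std::ios::binary
                                  : std::ios::in);
    if (!in)
        throw std::runtime_error("cannot open " + path);

    std::ostringstream out;
    out << in.rdbuf();   // copies every character; the buffer grows as needed
    return out.str();
}
```

Since C++20, std::move(out).str() can move the buffer out instead of copying it.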

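Are all relevant standard library implementations smart enough to avoid all unnecessary overhead?
Is there another way to do it?
Did I miss some hidden Boost function that already provides the desired functionality?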

void slurp(std::string& data, bool is_binary)
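One possible shape for the requested interface, as a sketch only: the question's signature has no way to name the file, so the path parameter below is an added assumption. It allocates once using the size reported by tellg() (a byte offset in practice on common implementations, though the standard does not promise that) and then shrinks the string to gcount(), which covers text mode delivering fewer characters than there are bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>

// Hypothetical variant of slurp(): the path parameter is an addition.
void slurp(std::string& data, bool is_binary, const std::string& path)
{
    std::ios::openmode mode = std::ios::in;
    if (is_binary)
        mode |= std::ios::binary;

    std::ifstream in(path, mode);
    if (!in)
        throw std::runtime_error("cannot open " + path);

    // Measure the size. In text mode this is only an upper bound, because
    // CR/LF pairs may collapse to '\n' on Windows.
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size < 0)
        throw std::runtime_error("cannot determine size of " + path);
    in.seekg(0, std::ios::beg);

    data.resize(static_cast<std::size_t>(size));          // single allocation
    in.read(&data[0], static_cast<std::streamsize>(size));
    data.resize(static_cast<std::size_t>(in.gcount()));   // shrink, never grow
}
```

Shrinking with resize() does not reallocate, so the only allocation is the first resize().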

- wilbur_m


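Text and binary mode are MSDOS- and Windows-specific hacks that try to get around the fact that newlines are represented by two characters in Windows (CR/LF). In text mode, they are treated as one character ('\n'). - Ferruccio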


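While not exactly a duplicate, this is closely related to: How to pre-allocate memory for a std::string object? (which, contrary to Konrad's statement above, includes code to do this, reading the file directly into the destination without making an extra copy). - Jerry Coffin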


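"Contiguity is not required by the standard" - yes it is, in a roundabout way. As soon as you use op[] on the string, it must be coalesced into a contiguous writable buffer, so it is guaranteed safe to write to &str[0] if you .resize() it large enough first. And in C++11, strings are simply always contiguous. - Tino Didriksen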


Related: How to read a file in C++? -- benchmarks and discusses the different approaches. And rdbuf, as in the accepted answer, is not the fastest; read is. - legends2k


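All of these solutions will produce malformed strings if the file's encoding/interpretation is wrong. I kept hitting a really weird issue when serializing a JSON file into a string until I manually converted it to UTF-8; I was only ever getting the first character no matter which solution I tried! Just a gotcha to watch out for! :) - kayleeFrye_onDeck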
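A plausible cause of the "only the first character" symptom is a UTF-16 file: for ASCII text its second byte is zero, so anything treating the data as a C string stops there. As a minimal sketch, assuming the file begins with a byte order mark (which is optional, so its absence proves nothing), the raw bytes can be inspected before the string is treated as UTF-8. The Bom enum and detect_bom are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Sketch: classify the leading byte order mark of raw file contents.
// UTF-32 BOMs are not distinguished (UTF-32LE would report Utf16LE).
Bom detect_bom(const std::string& bytes)
{
    auto at = [&](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    if (bytes.size() >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF)
        return Bom::Utf8;
    if (bytes.size() >= 2 && at(0) == 0xFF && at(1) == 0xFE)
        return Bom::Utf16LE;
    if (bytes.size() >= 2 && at(0) == 0xFE && at(1) == 0xFF)
        return Bom::Utf16BE;
    return Bom::None;
}
```

A Utf16LE or Utf16BE result means the bytes need converting before they can be handled as narrow text.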


24 Answers


- Roflcopter4 · Answer 1

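I know this is an ancient question with plenty of answers, but not one of them mentions what I would consider the most obvious way to do this. Yes, I know this is C++, and using libc is evil and wrong or whatever, but never mind that. Using libc is fine for something as simple as this.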

Essentially: just open the file, get its size (not necessarily in that order), and read it.

#include <string>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <sys/stat.h>

static constexpr char filename[] = "foo.bar";

int main(void)
{
    FILE *fp = ::fopen(filename, "rb");
    if (!fp) {
        ::perror("fopen");
        ::exit(1);
    }

    // Stat isn't strictly part of the standard C library, 
    // but it's in every libc I've ever seen for a hosted system.
    struct stat st;
    if (::fstat(::fileno(fp), &st) == (-1)) {
        ::perror("fstat");
        ::exit(1);
    }

    // You could simply allocate a buffer here and use std::string_view, or
    // even allocate a buffer and copy it to a std::string. Creating a
    // std::string and setting its size is simplest, but will pointlessly
    // initialize the buffer to 0. You can't win sometimes.
    // std::string already maintains its own terminating '\0', so no extra
    // reserve() or manual terminator is needed.
    std::string str;
    str.resize(st.st_size);
    // fread() may return fewer bytes than requested; keep only what was read.
    str.resize(::fread(str.data(), 1, str.size(), fp));
    ::fclose(fp);
}

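This doesn't seem any worse than some of the other solutions, and it is (in practice) completely portable. One could also throw an exception instead of exiting immediately, of course. It really irritates me that resizing a std::string always zero-initializes it, but it can't be helped.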

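Note that this only works as written in C++17 and later: before C++17, std::string::data() had only a const overload returning const char*, so it cannot be passed to fread(). If you are using an earlier version, consider replacing str.data() with &str[0].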
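The exception-throwing alternative mentioned above might look like the sketch below. It keeps the answer's fopen/fstat/fread sequence, assumes a POSIX system for fstat() and fileno(), and uses a unique_ptr with a small deleter so the FILE* is closed on every path. read_file_posix and FileCloser are illustrative names.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>
#include <system_error>
#include <sys/stat.h>   // fstat() is POSIX, not ISO C++

// Closes the FILE* when the owning unique_ptr goes out of scope.
struct FileCloser {
    void operator()(std::FILE* f) const { std::fclose(f); }
};

std::string read_file_posix(const char* filename)
{
    std::unique_ptr<std::FILE, FileCloser> fp(std::fopen(filename, "rb"));
    if (!fp)
        throw std::system_error(errno, std::generic_category(), "fopen");

    struct stat st;
    if (::fstat(::fileno(fp.get()), &st) == -1)
        throw std::system_error(errno, std::generic_category(), "fstat");

    // C++17: non-const data() lets fread() write into the string directly.
    std::string str(static_cast<std::size_t>(st.st_size), '\0');
    const std::size_t got = std::fread(str.data(), 1, str.size(), fp.get());
    if (got != str.size() && std::ferror(fp.get()))
        throw std::runtime_error("fread failed");
    str.resize(got);
    return str;
}
```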

- Martin Cote · Answer 2

You could use the std::getline function and specify eof as the delimiter. The resulting code is a little obscure, though:

std::string data;
std::ifstream in( "test.txt" );
std::getline( in, data, std::string::traits_type::to_char_type( 
                  std::string::traits_type::eof() ) );
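One caveat: for char, to_char_type(eof()) evaluates to the byte 0xFF, so this actually uses 0xFF as the delimiter and stops at the first 0xFF byte in the file. That is harmless for ASCII or UTF-8 text (0xFF never appears in valid UTF-8) but truncates arbitrary binary data. A small sketch wrapping the same call (getline_to_eof is a hypothetical name) makes this easy to check against an in-memory stream:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Same getline-with-eof trick as above, applied to any input stream.
std::string getline_to_eof(std::istream& in)
{
    std::string data;
    std::getline(in, data, std::string::traits_type::to_char_type(
                               std::string::traits_type::eof()));
    return data;
}
```

For example, a stream holding std::string("ab\xFF" "cd", 5) yields only "ab".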

- Andrew · Answer 3

Taking info from several places... this should be the fastest and best way:

#include <filesystem>
#include <fstream>
#include <iostream>
#include <string>

//Returns true if successful.
bool readInFile(std::string pathString)
{
  //Make sure the file exists and is an actual file.
  if (!std::filesystem::is_regular_file(pathString))
  {
    return false;
  }
  //Convert relative path to absolute path.
  pathString = std::filesystem::weakly_canonical(pathString).string();
  //Open the file for reading (binary is fastest).
  std::wifstream in(pathString, std::ios::binary);
  //Make sure the file opened.
  if (!in)
  {
    return false;
  }
  //Wide string to store the file's contents.
  std::wstring fileContents;
  //Jump to the end of the file to determine the file size.
  in.seekg(0, std::ios::end);
  //Resize the wide string to be able to fit the entire file (Note: Do not use reserve()!).
  fileContents.resize(in.tellg());
  //Go back to the beginning of the file to start reading.
  in.seekg(0, std::ios::beg);
  //Read the entire file's contents into the wide string.
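  in.read(fileContents.data(), fileContents.size());
  //tellg() counted bytes, which is only an upper bound on the wide characters read, so drop any unused tail.
  fileContents.resize(in.gcount());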
  //Close the file.
  in.close();
  //Do whatever you want with the file contents.
  std::wcout << fileContents << L" " << fileContents.size();
  return true;
}

This reads wide characters into a std::wstring, but you can easily adapt it if you just want regular characters and a std::string.
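A sketch of that adaptation, assuming narrow std::ifstream/std::string and returning the contents through an out-parameter rather than printing them (both changes are assumptions, not part of the answer). It also shrinks the string to gcount() in case fewer characters arrive than tellg() suggested.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>

// Narrow-character adaptation; returns true if successful.
bool readInFile(const std::filesystem::path& path, std::string& fileContents)
{
    // Make sure the file exists and is an actual file.
    if (!std::filesystem::is_regular_file(path))
        return false;

    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false;

    // Jump to the end to get the size, then back to the start.
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size < 0)
        return false;
    in.seekg(0, std::ios::beg);

    // resize() (not reserve()) so the characters actually exist.
    fileContents.resize(static_cast<std::size_t>(size));
    in.read(&fileContents[0], static_cast<std::streamsize>(size));
    fileContents.resize(static_cast<std::size_t>(in.gcount()));
    return true;
}
```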

- Xavier · Answer 4

For performance, I haven't found anything faster than the code below.

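#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

std::string readAllText(std::string const &path)
{
    assert(path.c_str() != NULL);
    FILE *stream = fopen(path.c_str(), "r");
    assert(stream != NULL);
    fseek(stream, 0, SEEK_END);
    long stream_size = ftell(stream);
    assert(stream_size >= 0);
    fseek(stream, 0, SEEK_SET);
    // malloc(0) may return NULL, so always ask for at least one byte.
    void *buffer = malloc(stream_size > 0 ? stream_size : 1);
    assert(buffer != NULL);
    // In text mode fread() may return fewer bytes than ftell() reported
    // (CRLF translation), so use its return value as the length.
    size_t bytes_read = fread(buffer, 1, stream_size, stream);
    assert(ferror(stream) == 0);
    fclose(stream);
    std::string text((const char *)buffer, bytes_read);
    free(buffer);
    return text;
}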
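The extra malloc() buffer and the copy into std::string can be skipped by reading straight into the string, which has been contiguous since C++11. Below is a sketch of that variant (readAllTextDirect is a hypothetical name), which also reports errors by throwing, since assert() is compiled out under NDEBUG. Like the original, it relies on fseek(SEEK_END) working on binary streams, which ISO C does not strictly guarantee but common platforms support.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Same fseek/ftell/fread sequence, but fread() targets the string itself.
std::string readAllTextDirect(const std::string& path)
{
    std::FILE* stream = std::fopen(path.c_str(), "rb");
    if (!stream)
        throw std::runtime_error("cannot open " + path);

    long size = -1;
    if (std::fseek(stream, 0, SEEK_END) == 0)
        size = std::ftell(stream);
    if (size < 0 || std::fseek(stream, 0, SEEK_SET) != 0) {
        std::fclose(stream);
        throw std::runtime_error("cannot determine size of " + path);
    }

    std::string text(static_cast<std::size_t>(size), '\0');
    const std::size_t got = std::fread(&text[0], 1, text.size(), stream);
    std::fclose(stream);
    text.resize(got);   // fread() may deliver fewer bytes than ftell() reported
    return text;
}
```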

- Sergey Abbakumov · Answer 5

You can do this with the rst C++ library that I developed:

#include "rst/files/file_utils.h"

std::filesystem::path path = ...;  // Path to a file.
rst::StatusOr<std::string> content = rst::ReadFile(path);
if (content.err()) {
  // Handle error.
}

std::cout << *content << ", " << content->size() << std::endl;

- Ritesh Saha · Answer 6

#include <string>
#include <fstream>

int main()
{
    std::string fileLocation = "C:\\Users\\User\\Desktop\\file.txt";
    std::ifstream file(fileLocation, std::ios::in | std::ios::binary);

    std::string data;

    if(file.is_open())
    {
        // Reads up to the first '\0' byte: the whole file only if it contains no NUL bytes.
        std::getline(file, data, '\0');

        file.close();
    }
}

- hanshenrik · Answer 7

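This is the function I use. When dealing with large files (1GB+), std::ifstream::read() is, for some reason, faster than std::ifstream::rdbuf() when you know the file size, so the whole "check the file size first" step is actually a speed optimization.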

#include <string>
#include <fstream>
#include <sstream>
std::string file_get_contents(const std::string &filename)
{
    std::ifstream file(filename, std::ifstream::binary);
    file.exceptions(std::ifstream::failbit | std::ifstream::badbit);
    file.seekg(0, std::istream::end);
    const std::streampos ssize = file.tellg();
    if (ssize < 0)
    {
        // can't get size for some reason, fallback to slower "just read everything"
        // because i dont trust that we could seek back/forth in the original stream,
        // im creating a new stream.
        std::ifstream file(filename, std::ifstream::binary);
        file.exceptions(std::ifstream::failbit | std::ifstream::badbit);
        std::ostringstream ss;
        ss << file.rdbuf();
        return ss.str();
    }
    file.seekg(0, std::istream::beg);
    std::string result(size_t(ssize), 0);
    file.read(&result[0], std::streamsize(ssize));
    return result;
}

- Paul Sumpner · Answer 8

#include <fstream>
#include <iostream>
#include <string>
#include <sstream>
#include <system_error>

using namespace std;

string GetStreamAsString(const istream& in)
{
    stringstream out;
    out << in.rdbuf();
    return out.str();
}

string GetFileAsString(const string& filePath)
{
    ifstream stream;
    try
    {
        // Set to throw on failure
        stream.exceptions(fstream::failbit | fstream::badbit);
        stream.open(filePath);
    }
    catch (system_error& error)
    {
        cerr << "Failed to open '" << filePath << "'\n" << error.code().message() << endl;
        return "Open fail";
    }

    return GetStreamAsString(stream);
}

Usage:

const string logAsString = GetFileAsString(logFilePath);

- kiroma · Answer 9

An updated function that builds on CTT's solution:

#include <string>
#include <fstream>
#include <limits>
#include <string_view>
std::string readfile(const std::string_view path, bool binaryMode = true)
{
    std::ios::openmode openmode = std::ios::in;
    if(binaryMode)
    {
        openmode |= std::ios::binary;
    }
    // Copy into a std::string: string_view::data() need not be null-terminated.
    std::ifstream ifs(std::string(path), openmode);
    ifs.ignore(std::numeric_limits<std::streamsize>::max());
    std::string data(ifs.gcount(), 0);
    ifs.seekg(0);
    ifs.read(data.data(), data.size());
    return data;
}

There are two important differences:

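tellg() is not guaranteed to return the byte offset from the beginning of the file. As Puzomor Croatia pointed out, it is more of a token that can be used within fstream calls. gcount(), however, does return the number of unformatted bytes last extracted. We therefore open the file, extract and discard all of its contents with ignore() to get the file size, and construct the output string based on that.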

Secondly, we avoid copying the file's data from a std::vector<char> to a std::string by writing to the string directly.

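In terms of performance, this should be the absolute fastest: it allocates an appropriately sized string ahead of time and calls read() once. Interestingly, using ignore() and gcount() instead of ate and tellg() compiles down to almost exactly the same thing on gcc, bit for bit.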
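The ignore()/gcount() measurement can be isolated into a small helper that works on any seekable std::istream. This is a sketch (remaining_chars is a hypothetical name). The clear() call resets the flags ignore() may leave behind; since C++11 seekg() also clears eofbit on its own, so it is mainly defensive.

```cpp
#include <cassert>
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Counts the characters left in a seekable stream, then rewinds to where
// it started, so a following read() can fill a pre-sized buffer.
std::streamsize remaining_chars(std::istream& in)
{
    const std::streampos start = in.tellg();
    in.ignore(std::numeric_limits<std::streamsize>::max());  // read to EOF
    const std::streamsize n = in.gcount();
    in.clear();          // reset eofbit set by ignore()
    in.seekg(start);     // rewind to the original position
    return n;
}
```

With std::istringstream in("hello world"), remaining_chars(in) returns 11, and a subsequent in >> s still reads "hello".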

- Dumbo · Answer 10

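For small to medium-sized files I use the following methods, which are quite fast. The one returning a string can be used to "convert" the byte array into a string.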

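#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <string_view>
#include <vector>

auto read_file_bytes(std::string_view filepath) -> std::vector<std::byte> {
    // Copy into a std::string: string_view::data() need not be null-terminated.
    std::ifstream ifs(std::string(filepath), std::ios::binary | std::ios::ate);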

    if (!ifs)
        throw std::ios_base::failure("File does not exist");

    auto end = ifs.tellg();
    ifs.seekg(0, std::ios::beg);

    auto size = std::size_t(end - ifs.tellg());

    if (size == 0) // avoid undefined behavior
        return {};

    std::vector<std::byte> buffer(size);

    if (!ifs.read((char *) buffer.data(), buffer.size()))
        throw std::ios_base::failure("Read error");

    return buffer;
}

auto read_file_string(std::string_view filepath) -> std::string {
    auto bytes = read_file_bytes(filepath);
    // data() is portable; calling base() on a vector iterator is a libstdc++ extension.
    return std::string(reinterpret_cast<const char *>(bytes.data()), bytes.size());
}