Reading a text file into an array of lines in C

Question

c arrays file text

6

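Using C, I would like to read the contents of a text file so that I end up with an array of strings in which the nth string represents the nth line of the file. The lines of the file can be arbitrarily long.

Is there an elegant way to do this? I know of some neat tricks for reading a text file directly into a single, appropriately sized buffer, but breaking it down into lines makes it trickier (at least as far as I can tell).

Thanks very much!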

- Zach Conn

6 Answers

1

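You can count the lines in the file by looping over fgets, then create a 2D array whose first dimension is the line count + 1. Then read the file again, this time into the array.

You will need to define the length of the elements, though. Alternatively, compute the size of the longest line.

Example code: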

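#include <stdio.h>
#include <string.h>

#define FILENAME "input.txt"   /* assumed file name */
#define MAX_LINE 256           /* assumed maximum line length, including '\0' */

int main(void)
{
    char buf[MAX_LINE];
    int i, lineCount = 1;      /* start at 1 so that names[n] holds line n */
    FILE *inFile = fopen(FILENAME, "r");

    if (inFile == NULL) {
        perror(FILENAME);
        return 1;
    }
    /* First pass: count lines. A line longer than MAX_LINE - 1 counts as several. */
    while (fgets(buf, sizeof buf, inFile) != NULL)
        lineCount++;

    char names[lineCount][MAX_LINE];   /* C99 VLA; names[0] is unused */

    rewind(inFile);                    /* second pass: store each line */
    for (i = 1; i < lineCount && fgets(names[i], MAX_LINE, inFile) != NULL; i++)
        names[i][strcspn(names[i], "\n")] = '\0';    /* strip the newline */
    fclose(inFile);
    return 0;
}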

- Hyppy

0

You can do it this way:

#include <stdlib.h> /* exit, malloc, realloc, free */
#include <stdio.h>  /* fopen, fgetc, fputs, fwrite */

struct line_reader {
    /* All members are private. */
    FILE    *f;
    char    *buf;
    size_t   siz;
};

/*
 * Initializes a line reader _lr_ for the stream _f_.
 */
void
lr_init(struct line_reader *lr, FILE *f)
{
    lr->f = f;
    lr->buf = NULL;
    lr->siz = 0;
}

/*
 * Reads the next line. If successful, returns a pointer to the line,
 * and sets *len to the number of characters, at least 1. The result is
 * _not_ a C string; it has no terminating '\0'. The returned pointer
 * remains valid until the next call to next_line() or lr_free() with
 * the same _lr_.
 *
 * next_line() returns NULL at end of file, or if there is an error (on
 * the stream, or with memory allocation).
 */
char *
next_line(struct line_reader *lr, size_t *len)
{
    size_t newsiz;
    int c;
    char *newbuf;

    *len = 0;           /* Start with empty line. */
    for (;;) {
        c = fgetc(lr->f);   /* Read next character. */
        if (ferror(lr->f))
            return NULL;

        if (c == EOF) {
            /*
             * End of file is also end of last line,
             * unless this last line would be empty.
             */
            if (*len == 0)
                return NULL;
            else
                return lr->buf;
        } else {
            /* Append c to the buffer. */
            if (*len == lr->siz) {
                /* Need a bigger buffer! */
                newsiz = lr->siz + 4096;
                newbuf = realloc(lr->buf, newsiz);
                if (newbuf == NULL)
                    return NULL;
                lr->buf = newbuf;
                lr->siz = newsiz;
            }
            lr->buf[(*len)++] = c;

            /* '\n' is end of line. */
            if (c == '\n')
                return lr->buf;
        }
    }
}

/*
 * Frees internal memory used by _lr_.
 */
void
lr_free(struct line_reader *lr)
{
    free(lr->buf);
    lr->buf = NULL;
    lr->siz = 0;
}

/*
 * Read a file line by line.
 * http://rosettacode.org/wiki/Read_a_file_line_by_line
 */
int
main(void)
{
    struct line_reader lr;
    FILE *f;
    size_t len;
    char *line;

    f = fopen("foobar.txt", "r");
    if (f == NULL) {
        perror("foobar.txt");
        exit(1);
    }

    /*
     * This loop reads each line.
     * Remember that line is not a C string.
     * There is no terminating '\0'.
     */
    lr_init(&lr, f);
    while ((line = next_line(&lr, &len)) != NULL) {
        /*
         * Do something with line.
         */
        fputs("LINE: ", stdout);
        fwrite(line, len, 1, stdout);
    }
    if (!feof(f)) {
        perror("next_line");
        exit(1);
    }
    lr_free(&lr);

    return 0;
}

- Jeegar Patel

0

For C (as opposed to C++), you will probably end up using fgets(). However, you may run into problems because your lines can be arbitrarily long.
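
One common way around the fixed-size limit of fgets() is to keep calling it into a buffer that grows until a newline shows up. The following is a minimal sketch of that idea, not a definitive implementation; read_line is a hypothetical helper name and the starting capacity of 128 is an arbitrary assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Reads one line of any length from fp using fgets() and a growing buffer.
 * Returns a malloc'd string without the trailing '\n' (the caller frees it),
 * or NULL at end of file or on allocation failure.
 * read_line is a hypothetical name, not a standard function.
 */
char *read_line(FILE *fp)
{
    size_t cap = 128, len = 0;      /* 128 is an arbitrary starting size */
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {   /* got the whole line */
            buf[len - 1] = '\0';
            return buf;
        }
        if (cap - len < 2) {        /* buffer is full: double it, keep reading */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (len == 0) {                 /* nothing was read: end of file */
        free(buf);
        return NULL;
    }
    return buf;                     /* last line had no trailing newline */
}
```

Calling read_line(fp) in a loop until it returns NULL yields the lines in order; each returned string can be stored in an array of char * and freed later.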

- Amber

0

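Perhaps a linked list would be the best way to do this? The compiler does not like an array that has no idea how big it should be. With a linked list you can handle a very large text file without worrying about allocating enough memory for the array.

Unfortunately, I haven't learned how to use linked lists yet, but maybe someone else can help you.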

- jonescb

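The arbitrary length of a linked list is an attractive feature, but the cost is that you give up random access. For example, you can't get line 5 directly; you have to walk through lines 0-4 first. Using a linked list as an intermediate structure is a good idea, though, because you can easily build the array from it. - Dan Olson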
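
The "linked list as an intermediate structure" idea from the comment above could be sketched roughly as follows. All names (line_node, line_list, list_append, list_to_array) are hypothetical, and the sketch assumes each line has already been read into a C string by some line reader.

```c
#include <stdlib.h>
#include <string.h>

/* One node per line; all names in this sketch are hypothetical. */
struct line_node {
    char *text;
    struct line_node *next;
};

struct line_list {
    struct line_node *head, *tail;
    size_t count;
};

/* Appends a copy of text to the list. Returns 0 on success, -1 on failure. */
int list_append(struct line_list *list, const char *text)
{
    struct line_node *node = malloc(sizeof *node);

    if (node == NULL)
        return -1;
    node->text = malloc(strlen(text) + 1);
    if (node->text == NULL) {
        free(node);
        return -1;
    }
    strcpy(node->text, text);
    node->next = NULL;
    if (list->tail != NULL)
        list->tail->next = node;
    else
        list->head = node;
    list->tail = node;
    list->count++;
    return 0;
}

/*
 * Moves the strings into a newly allocated, NULL-terminated array and frees
 * the nodes. The array takes ownership of the strings. Returns NULL on
 * allocation failure, in which case the list is left untouched.
 */
char **list_to_array(struct line_list *list)
{
    char **lines = malloc((list->count + 1) * sizeof *lines);
    struct line_node *node, *next;
    size_t i = 0;

    if (lines == NULL)
        return NULL;
    for (node = list->head; node != NULL; node = next) {
        next = node->next;
        lines[i++] = node->text;    /* hand the string over to the array */
        free(node);
    }
    lines[i] = NULL;
    list->head = list->tail = NULL;
    list->count = 0;
    return lines;
}
```

After the conversion, lines[n] gives random access to line n (counting from 0), at the cost of one extra allocation per line while the list is being built.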

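Unfortunately, because of some details I left out of the question (in short, I need random access), a linked list isn't a very good fit in this case. Of course, I could read everything into a linked list and then copy it into an array, but I was hoping for something more elegant. - Zach Conn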

0

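If you have a good way of reading the whole file into memory, you are almost there. Once you have done that, you can scan the file twice: the first time to count the lines, the second time to set the line pointers and replace '\n' (and possibly '\r', if the file was read in Windows binary mode) with '\0'. Between the two scans, allocate an array of pointers, now that you know how many you need.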
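
The two scans described above might look something like this. It is a sketch under stated assumptions: split_lines is a hypothetical name, and the caller is assumed to have already loaded the file into buf with one spare byte at buf[len] for the terminator.

```c
#include <stdlib.h>

/*
 * Splits buf (len bytes of file data, with one spare byte at buf[len])
 * into lines in place. Scan 1 counts the lines; scan 2 records where each
 * line starts and overwrites '\n' (and a preceding '\r') with '\0'.
 * Returns a malloc'd array of *count pointers into buf, or NULL on failure.
 * split_lines is a hypothetical name.
 */
char **split_lines(char *buf, size_t len, size_t *count)
{
    size_t i, n = 0;
    char **lines, *start;

    buf[len] = '\0';                    /* terminate the final line */

    /* Scan 1: count the lines. */
    for (i = 0; i < len; i++)
        if (buf[i] == '\n')
            n++;
    if (len > 0 && buf[len - 1] != '\n')
        n++;                            /* last line lacks a trailing newline */

    /* Between the scans: allocate exactly as many pointers as needed. */
    lines = malloc((n ? n : 1) * sizeof *lines);
    if (lines == NULL)
        return NULL;

    /* Scan 2: set the line pointers and replace the line endings. */
    n = 0;
    start = buf;
    for (i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            buf[i] = '\0';
            if (i > 0 && buf[i - 1] == '\r')
                buf[i - 1] = '\0';
            lines[n++] = start;
            start = buf + i + 1;
        }
    }
    if (start < buf + len)
        lines[n++] = start;             /* unterminated last line */
    *count = n;
    return lines;
}
```

Because the pointers refer into buf, only the pointer array and buf itself need to be freed, and buf must outlive the array.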

- Bill Forster

- Dan Cristoloveanu · Accepted Answer

7

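Breaking it into lines means parsing the text and replacing all the EOL characters (by which I mean \n and \r) with 0. That way you can actually reuse your buffer and simply store the start of each line in a separate char * array (all of it in only 2 passes).

This way you do a single read for the whole file plus 2 parsing passes, which will probably improve performance.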

- Dan Cristoloveanu

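This is definitely the best way to do it, although it may take more than one pass over the whole file. You need to count the lines (so you can allocate an array of the right size), replace \n with 0, and then assign the start of each line to the right slot in the array. Of course, you can do this in two passes. - Dan Olson

A very good idea. I'm going to give it a try. - Zach Conn

+1. Not counting the initial copy from the file into the buffer, you can do it in a single pass with realloc() and strtok(). - pmg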
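
A rough sketch of the realloc() plus strtok() variant mentioned in the comment above. split_lines_strtok is a hypothetical name and the initial capacity of 16 is arbitrary. One caveat worth stating loudly: strtok() treats a run of delimiters as a single delimiter, so empty lines are skipped and the nth string is no longer guaranteed to be the nth line of the file.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Splits the '\0'-terminated buf in place with strtok(), growing the pointer
 * array with realloc() as lines are found. Returns a malloc'd array of *count
 * pointers into buf, or NULL on allocation failure.
 * Caveat: strtok() collapses consecutive delimiters, so empty lines vanish.
 * split_lines_strtok is a hypothetical name.
 */
char **split_lines_strtok(char *buf, size_t *count)
{
    size_t cap = 16, n = 0;             /* 16 is an arbitrary starting size */
    char **lines = malloc(cap * sizeof *lines), **tmp, *tok;

    if (lines == NULL)
        return NULL;
    for (tok = strtok(buf, "\r\n"); tok != NULL; tok = strtok(NULL, "\r\n")) {
        if (n == cap) {                 /* array is full: double it */
            tmp = realloc(lines, cap * 2 * sizeof *lines);
            if (tmp == NULL) {
                free(lines);
                return NULL;
            }
            lines = tmp;
            cap *= 2;
        }
        lines[n++] = tok;               /* tok points into buf */
    }
    *count = n;
    return lines;
}
```

If empty lines have to be preserved, scanning for '\n' by hand is the safer choice.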

1

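Why two passes? Use malloc() to allocate space for the array. Start at the beginning of the buffer. For each '\n', replace it with 0 and put the address of the next character into the array. Keep track of the array size; if it is about to overflow, use realloc() to grow it. - David Thornley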
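
The single-pass approach from the comment above, sketched under a few assumptions: split_lines_once is a hypothetical name, buf is already '\0'-terminated, the array starts at an arbitrary 16 slots, and only '\n' is handled (a '\r' before it would be left in the line).

```c
#include <stdlib.h>

/*
 * Single pass over the '\0'-terminated buf: each '\n' is replaced with '\0'
 * and the address of the following character is appended to the array, which
 * is grown with realloc() when it is about to overflow. Empty lines are kept.
 * Returns a malloc'd array of *count pointers into buf, or NULL on failure.
 * split_lines_once is a hypothetical name.
 */
char **split_lines_once(char *buf, size_t *count)
{
    size_t cap = 16, n = 0;             /* 16 is an arbitrary starting size */
    char **lines = malloc(cap * sizeof *lines), **tmp, *p;

    if (lines == NULL)
        return NULL;
    if (*buf != '\0')
        lines[n++] = buf;               /* first line starts at the buffer */
    for (p = buf; *p != '\0'; p++) {
        if (*p != '\n')
            continue;
        *p = '\0';                      /* terminate the current line */
        if (p[1] == '\0')
            break;                      /* trailing newline: no further line */
        if (n == cap) {                 /* about to overflow: grow the array */
            tmp = realloc(lines, cap * 2 * sizeof *lines);
            if (tmp == NULL) {
                free(lines);
                return NULL;
            }
            lines = tmp;
            cap *= 2;
        }
        lines[n++] = p + 1;             /* next line starts after the '\n' */
    }
    *count = n;
    return lines;
}
```

Doubling the capacity keeps the number of realloc() calls logarithmic in the number of lines, which is the usual trade-off against the exact-size allocation of the two-pass version.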

1

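It's one pass. Use fseek/ftell to find the size of the file. Dynamically allocate the memory and read the file in one go. Make one pass, putting a NUL wherever there is a newline so that each line becomes a string. As you walk through the file, push_back the start of each line. - EvilTeach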
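
The fseek/ftell step from the comment above could be sketched like this. slurp is a hypothetical name, and the approach assumes a regular, seekable file (it does not work on pipes or terminals). In text mode on some platforms fread() may return fewer bytes than ftell() reported, so the sketch trusts the fread() count.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Reads the whole of an open, seekable stream into a malloc'd buffer that is
 * '\0'-terminated, and stores the number of bytes read in *len.
 * Returns NULL on failure. slurp is a hypothetical name.
 */
char *slurp(FILE *fp, size_t *len)
{
    long size;
    size_t got;
    char *buf;

    /* Find the file size by seeking to the end. */
    if (fseek(fp, 0L, SEEK_END) != 0 || (size = ftell(fp)) < 0)
        return NULL;
    rewind(fp);

    buf = malloc((size_t)size + 1);     /* one extra byte for the terminator */
    if (buf == NULL)
        return NULL;

    got = fread(buf, 1, (size_t)size, fp);   /* read everything in one go */
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[got] = '\0';
    *len = got;
    return buf;
}
```

The returned buffer is then ready for a single in-place pass that turns each newline into a NUL and records where each line starts.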
