Using fgets() and realloc()

Question


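I'm trying to read a line from a text file with fgets() and store it in a char* allocated dynamically with malloc(), but I'm not sure how to use realloc(), since I don't know how long the line is and I don't want to guess a maximum possible size.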

#include <stdio.h>
#include <stdlib.h>
#define INIT_SIZE 50

void get_line (char* filename)
{
    char* text;
    FILE* file = fopen(filename,"r");

    text = malloc(sizeof(char) * INIT_SIZE);

    fgets(text, INIT_SIZE, file);

    //How do I realloc memory here if the text array is full but fgets
    //has not reached an EOF or \n yet.

    printf("The text was %s\n", text);

    free(text);
}

int main(int argc, char *argv[]) {
    get_line(argv[1]);
}

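I plan to do other things with this line of text, but to keep it simple I just print it and free the memory.

Also: the program is started with the filename as the first command-line argument.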

- Sphero

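Judging by the comment in your code, you're interested in the case where the initial buffer is too short, right?

Take a look at this SO question. I'd guess it's basically what you want.

The downside of reading lines of unlimited length is that it lets a hostile agent exhaust system resources; don't make yourself a target for hackers. Setting a sane upper bound is defensive programming.

Do you wish for a language where this is as simple as string line = Console.ReadLine()?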

3 Answers

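One possible solution is to use two buffers: a temporary one for the fgets call, and a second one that is reallocated, with the temporary buffer appended to its end.

Perhaps something like this: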

char temp[INIT_SIZE];  // Temporary string for fgets call
char *text = NULL;     // The actual and full string
size_t length = 0;     // Current length of the full string, needed for reallocation

while (fgets(temp, sizeof temp, file) != NULL)
{
    // Reallocate
    char *t = realloc(text, length + strlen(temp) + 1);  // +1 for terminator
    if (t == NULL)
    {
        // TODO: Handle error
        break;
    }

    if (text == NULL)
    {
        // First allocation, make sure string is properly terminated for concatenation
        t[0] = '\0';
    }

    text = t;

    // Append the newly read string
    strcat(text, temp);

    // Get current length of the string
    length = strlen(text);

    // If the last character just read is a newline, we have the whole line
    if (length > 0 && text[length - 1] == '\n')
    {
        break;
    }
}

[Disclaimer: the code above is untested and may contain bugs]

- Some programmer dude

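With the declaration void get_line (char* filename), you cannot use the line that was read and stored anywhere outside get_line, because you neither return a pointer to the line nor pass the address of a pointer through which the allocation and the data read would be visible in the calling function.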
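The point above can be sketched as a function that returns the allocated line to its caller. This is only an illustration under assumptions: the name read_line, the starting capacity of 16, and the doubling growth policy are hypothetical choices, not taken from the question or the answer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one line from fp into a heap buffer and
 * returns it without the trailing '\n'. Returns NULL if nothing could be
 * read (end-of-file), on a read error, or if allocation fails.
 * The caller owns the returned buffer and must free() it. */
char *read_line(FILE *fp)
{
    size_t cap = 16;            /* current capacity (assumed starting size) */
    size_t len = 0;             /* characters stored so far */
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    /* Each call fills the unused tail of the buffer. */
    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';        /* whole line read: strip newline */
            return buf;
        }
        if (len + 1 == cap) {           /* buffer full, line continues */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {          /* buf is still valid here */
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    if (ferror(fp) || len == 0) {       /* read error, or EOF with no data */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';                    /* last line had no trailing newline */
    return buf;
}
```

A caller would write char *line = read_line(file);, check the result for NULL, use the text, and finally call free(line). Passing a char ** out-parameter instead, as POSIX getline does, is the other option the paragraph mentions.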

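For any function that reads an unknown number of characters into a single buffer, POSIX getline is always a good model (it shows the return type and the useful parameters). You can implement your own with either fgetc or fgets and a fixed buffer. In terms of efficiency, fgets only has an advantage in that it minimizes the number of realloc calls needed. (Both functions share the same low-level input buffer size; see, for example, the IO_BUFSIZ constant in the gcc source, which, if I recall correctly, is now LIO_BUFSIZE after a recent rename, but it basically boils down to an 8192-byte IO buffer on Linux and 512 bytes on Windows.)

As long as you allocate the original buffer dynamically with malloc, calloc, or realloc, you can read repeatedly into a fixed buffer with fgets and append the characters read to your allocated line, checking whether the last character is '\n' or whether EOF was reached to know when you are done. Each iteration reads one fixed-size chunk with fgets and reallocates your line with realloc as you go, appending the new characters to the end.

When reallocating, always realloc through a temporary pointer. That way, if you run out of memory and realloc returns NULL (or it fails for any other reason), you don't overwrite the pointer to your currently allocated block with NULL and create a memory leak.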
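As a minimal sketch of that rule, the temporary-pointer pattern can be wrapped in a small helper. The name grow_buffer and its 1/0 return convention are assumptions made for this illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: resizes *buf to new_size bytes.
 * Returns 1 on success and 0 on failure. On failure *buf still points
 * to the original, valid block, so nothing is leaked or lost. */
int grow_buffer(char **buf, size_t new_size)
{
    char *tmp = realloc(*buf, new_size);    /* never assign straight to *buf */
    if (tmp == NULL)
        return 0;       /* old block untouched: caller may keep using or free it */
    *buf = tmp;         /* commit only after realloc has succeeded */
    return 1;
}
```

Writing buf = realloc(buf, new_size) instead would replace the only pointer to the old block with NULL when the call fails, which is exactly the leak described above.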

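A flexible implementation sizes the fixed buffer as a VLA, using either a defined SZINIT (if the user passes 0) or the size the user provides, allocates storage for line (passed as a pointer to pointer) and reallocates it as needed, returning the number of characters read on success or -1 on failure (the same as POSIX getline):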

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>  /* ssize_t */

/** fgetline, a getline replacement with fgets, using fixed buffer.
 *  fgetline reads from 'fp' up to including a newline (or EOF)
 *  allocating for 'line' as required, initially allocating 'n' bytes.
 *  on success, the number of characters in 'line' is returned, -1
 *  otherwise
 */
ssize_t fgetline (char **line, size_t *n, FILE *fp)
{
    if (!line || !n || !fp) return -1;

#ifdef SZINIT
    size_t szinit = SZINIT > 0 ? SZINIT : 120;
#else
    size_t szinit = 120;
#endif

    size_t idx = 0,                 /* index for *line */
        maxc = *n ? *n : szinit,    /* fixed buffer size */
        eol = 0,                    /* end-of-line flag */
        nc = 0;                     /* number of characters read */
    char buf[maxc];     /* VLA to use a fixed buffer (or allocate ) */

    clearerr (fp);                  /* prepare fp for reading */
    while (fgets (buf, maxc, fp)) { /* continually read maxc chunks */
        nc = strlen (buf);          /* number of characters read */
        if (idx && *buf == '\n')    /* if index & '\n' 1st char */
            break;
        if (nc && (buf[nc - 1] == '\n')) {  /* test '\n' in buf */
            buf[--nc] = 0;          /* trim and set eol flag */
            eol = 1;
        }
        /* always realloc with a temporary pointer */
        void *tmp = realloc (*line, idx + nc + 1);
        if (!tmp)       /* on failure previous data remains in *line */
            return idx ? (ssize_t)idx : -1;
        *line = tmp;    /* assign realloced block to *line */
        memcpy (*line + idx, buf, nc + 1);  /* append buf to line */
        idx += nc;                  /* update index */
        if (eol)                    /* if '\n' (eol flag set) done */
            break;
    }
    /* if eol alone, or stream error, return -1, else length of buf */
    return (feof (fp) && !nc) || ferror (fp) ? -1 : (ssize_t)idx;
}

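(Note: since nc already holds the number of characters currently in buf, memcpy can be used to append the contents of buf to *line without scanning for the terminating nul-character again.) Look things over and let me know if you have further questions.

Essentially, you can use it as a drop-in replacement for POSIX getline (it won't be quite as efficient, but it isn't bad either).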

- David C. Rankin

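On a rare input error, e.g. on the second iteration of while (fgets ..., fgetline() returns a value other than -1 when -1 is expected. Perhaps add clearerr() before the loop and change (feof(fp) && !nc) to (feof(fp) && !nc) || ferror(fp).

Hmm... clearerr() is a good addition to make sure the stream is in a state where a read can be attempted. But I'm still puzzling over the corner case of an input error on the second iteration. That is, we hit a non-EOF stream error, so feof (fp) tests false, and either no characters were read on the first iteration, or characters were read and then a non-EOF stream error occurred. I'm still sorting that out. I tested empty files, empty strings, empty strings plus a newline, and so on... but still haven't hit that corner. Either way, both are good additions.

A rare return due to input error can happen on any read. When an input error occurs, standard C library functions return NULL/0/EOF even if some earlier input succeeded. Some of these functions (all?) also return the same value if the error flag was already set before the call. Following that model, if the first fgets() successfully reads a buffer without a '\n' but the second fgets() hits an input error, the expected return value is -1. All this "input error" handling is niche code and may be beyond where the OP is right now, but I'm glad to hear you're interested.

Oh yes. There's no better time to wrestle with the nuances of user input. If you dig deep, you end up reading through various libc sources just to figure out how it's handled, and whether it's handled there at all.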
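The distinction the comments above revolve around, end-of-file versus a read error after fgets() returns NULL, can be sketched as follows. The enum and the function name read_chunk are hypothetical, introduced only for this illustration.

```c
#include <stdio.h>

/* Hypothetical classification of the outcome of one fgets() call. */
enum read_result { READ_OK, READ_EOF, READ_ERROR };

enum read_result read_chunk(char *buf, int size, FILE *fp)
{
    if (fgets(buf, size, fp) != NULL)
        return READ_OK;
    /* NULL means either end-of-file or a read error;
     * only the stream's error flag tells the two apart. */
    return ferror(fp) ? READ_ERROR : READ_EOF;
}
```

A line reader built on this could report failure for READ_ERROR even when earlier chunks of the same line were read successfully, which is the behaviour the comment above expects.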


- Mathieu · Accepted Answer

What you need is the getline function.

Use it like this:

char *line = NULL;
size_t n;
getline(&line, &n, stdin);
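A slightly fuller sketch of the same call, reading every line of a stream and releasing the buffer afterwards, might look like this. The function longest_line is a hypothetical example; getline itself is POSIX and is not part of standard C.

```c
#define _POSIX_C_SOURCE 200809L   /* expose getline() in <stdio.h> */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>            /* ssize_t */

/* Hypothetical example: returns the length of the longest line in fp,
 * not counting the trailing newline. */
size_t longest_line(FILE *fp)
{
    char *line = NULL;   /* getline allocates and grows this buffer */
    size_t cap = 0;      /* capacity of line, maintained by getline */
    ssize_t len;         /* characters read, or -1 at EOF or on error */
    size_t longest = 0;

    while ((len = getline(&line, &cap, fp)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            len--;                       /* ignore the newline itself */
        if ((size_t)len > longest)
            longest = (size_t)len;
    }
    free(line);          /* the same buffer is reused; free it once */
    return longest;
}
```

The buffer is reused across calls, so a single free after the loop is enough, and it should be freed even when getline fails.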

If you really want to implement this yourself, you could write something like the following:

#include <stdlib.h>
#include <stdio.h>

char *get_line()
{
    int c;
    /* what is the buffer current size? */
    size_t size = 5;
    /* How much is the buffer filled? */
    size_t read_size = 0;
    /* first allocation, its result should be tested... */
    char *line = malloc(size);
    if (!line) 
    {
        perror("malloc");
        return line;
    }

    line[0] = '\0';

    c = fgetc(stdin);
    while (c != EOF && c!= '\n')
    {            
        line[read_size] = c;            
        ++read_size;
        if (read_size == size)
        {
            size += 5;
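            char *test = realloc(line, size);
            if (!test)
            {
                perror("realloc");
                /* buffer is full: drop the last character so the
                   returned string is still nul-terminated */
                line[read_size - 1] = '\0';
                return line;
            }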
            line = test;
        }
        c = fgetc(stdin);
    }
    line[read_size] = '\0';
    return line;
}