Am I right in thinking that this code introduces undefined behavior?
#include <stdio.h>
#include <stdlib.h>
FILE *f = fopen("textfile.txt", "rb");
fseek(f, 0, SEEK_END);
long fsize = ftell(f);
fseek(f, 0, SEEK_SET); //same as rewind(f);
char *string = malloc(fsize + 1);
fread(string, fsize, 1, f);
fclose(f);
string[fsize] = 0;
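The reason I ask is that this code was posted as an accepted and highly upvoted answer to the following question: C Programming: How to read the whole file contents into a buffer
However, according to the following article: How to read an entire file into memory in C++ (which, despite its title, also deals with C, so bear with me):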
Suppose you were writing C, and you had a FILE* (that you know points to a file stream, or at least a seekable stream), and you wanted to determine how many characters to allocate in a buffer to store the entire contents of the stream. Your first instinct would probably be to write code like this:
// Bad code; undefined behaviour
fseek(p_file, 0, SEEK_END);
long file_size = ftell(p_file);
Seems legit. But then you start getting weirdness. Sometimes the reported size is bigger than the actual file size on disk. Sometimes it’s the same as the actual file size, but the number of characters you read in is different. What the hell is going on?
There are two answers, because it depends on whether the file has been opened in text mode or binary mode.
Just in case you don't know the difference: in the default mode – text mode – on certain platforms, certain characters get translated in various ways during reading. The most well-known is that on Windows, newlines get translated to \r\n when written to a file, and translated the other way when read. In other words, if the file contains "Hello\r\nWorld", it will be read as "Hello\nWorld"; the file size is 12 characters, the string size is 11. Less well-known is that 0x1A (or Ctrl-Z) is interpreted as the end of the file, so if the file contains "Hello\x1AWorld", it will be read as "Hello". Also, if the string in memory is "Hello\x1AWorld" and you write it to a file in text mode, the file will be "Hello". In binary mode, no translations are done – whatever is in the file gets read in to your program, and vice versa.
Immediately you can guess that text mode is going to be a headache – on Windows, at least. More generally, according to the C standard:
The ftell function obtains the current value of the file position indicator for the stream pointed to by stream. For a binary stream, the value is the number of characters from the beginning of the file. For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.
In other words, when you’re dealing with a file opened in text mode, the value that ftell() returns is useless… except in calls to fseek(). In particular, it doesn’t necessarily tell you how many characters are in the stream up to the current point.
So you can’t use the return value from ftell() to tell you the size of the file, the number of characters in the file, or for anything (except in a later call to fseek()). So you can’t get the file size that way.
Okay, so to hell with text mode. What say we work in binary mode only? As the C standard says: "For a binary stream, the value is the number of characters from the beginning of the file." That sounds promising.
And, indeed, it is. If you are at the end of the file, and you call ftell(), you will find the number of bytes in the file. Huzzah! Success! All we need to do now is get to the end of the file. And to do that, all you need to do is fseek() with SEEK_END, right?
Wrong.
Once again, from the C standard:
Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state.
To understand why this is the case: Some platforms store files as fixed-size records. If the file is shorter than the record size, the rest of the block is padded. When you seek to the “end”, for efficiency’s sake it just jumps you right to the end of the last block… possibly long after the actual end of the data, after a bunch of padding.
So, here’s the situation in C:
- You can’t get the number of characters with ftell() in text mode.
- You can get the number of characters with ftell() in binary mode… but you can’t seek to the end of the file with fseek(p_file, 0, SEEK_END).
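One way to sidestep both limitations summarized above is to not ask for the size at all: read the stream in chunks until fread() reports a short count, growing the buffer as needed. The following is a minimal sketch under that assumption, not the approach taken by the accepted answer or by the article. The helper name read_all and the 4096-byte starting capacity are hypothetical choices made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything remaining in `f` into a
 * malloc'd, NUL-terminated buffer and stores the byte count in *out_len.
 * Returns NULL on allocation failure or read error. Caller must free(). */
char *read_all(FILE *f, size_t *out_len)
{
    size_t cap = 4096; /* assumed starting capacity, not a tuned value */
    size_t len = 0;
    char *buf;

    assert(f != NULL && out_len != NULL);

    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        size_t want, got;

        /* Grow when only the byte reserved for the terminator is left. */
        if (len + 1 == cap) {
            char *tmp;
            if (cap > (size_t)-1 / 2) { /* doubling would overflow */
                free(buf);
                return NULL;
            }
            cap *= 2;
            tmp = realloc(buf, cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
        }

        want = cap - len - 1;
        got = fread(buf + len, 1, want, f);
        len += got;
        if (got < want) /* short count: end-of-file or read error */
            break;
    }

    if (ferror(f)) {
        free(buf);
        return NULL;
    }

    buf[len] = '\0';
    *out_len = len;
    return buf;
}
```

A caller would open the file with fopen(path, "rb"), pass the stream to read_all, and free() the result when done. The trade-off is a few reallocations in exchange for never calling fseek() or ftell(), so the sketch also works on streams that are not seekable.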
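I don't know enough to judge who is right here, so I am asking: does the accepted answer mentioned above conflict with this article? Please clarify this for me.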
... the return value of malloc(); if it fails, you will have undefined behavior. - Sourav Ghosh
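For illustration, here is a sketch of the code from the question with each return value checked, as the comment suggests. The helper name slurp_checked is hypothetical. The sketch deliberately keeps the fseek(f, 0, SEEK_END) and ftell() approach that the question asks about, so it does not settle whether that approach is portable; it only handles the failure paths that the original leaves unchecked.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: same fseek/ftell approach as the code in the
 * question, but every call that can fail is checked.
 * Returns a malloc'd, NUL-terminated buffer, or NULL on any failure. */
char *slurp_checked(const char *path)
{
    FILE *f;
    long fsize = -1;
    char *string = NULL;

    assert(path != NULL);

    f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    if (fseek(f, 0, SEEK_END) == 0 &&
        (fsize = ftell(f)) >= 0 &&
        fseek(f, 0, SEEK_SET) == 0) {
        string = malloc((size_t)fsize + 1);
        if (string != NULL) {
            /* Request fsize items of 1 byte so a short read is detectable. */
            if (fread(string, 1, (size_t)fsize, f) == (size_t)fsize) {
                string[fsize] = '\0';
            } else {
                free(string);
                string = NULL;
            }
        }
    }

    fclose(f);
    return string;
}
```

With this shape, a failed fopen(), fseek(), ftell(), malloc(), or fread() results in a NULL return rather than a write through an invalid pointer.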