How do the source and execution character encodings in GCC 4.7 affect string literals?

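Does GCC 4.7 on Linux/x86_64 have a default character encoding that it uses to validate and decode the contents of string literals in C source files? Is this configurable?

Also, when the string data from string literals is placed into the data section of the output, is there a default execution character encoding? Is this configurable?

In any configuration, can the source character encoding differ from the execution character encoding? (That is, does gcc ever convert between character encodings?)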

1 Answer

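I don't know how well these options actually work (I'm not using them at the moment; I still prefer to treat string literals as "ASCII only", because localized strings come from external files, so literals are mostly things like format strings or file names), but options have been added for this, for example: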

-fexec-charset=charset
Set the execution character set, used for string and character constants. The default
is UTF-8. charset can be any encoding supported by the system's iconv library routine. 

-fwide-exec-charset=charset
Set the wide execution character set, used for wide string and character constants.
The default is UTF-32 or UTF-16, whichever corresponds to the width of wchar_t. As
with -fexec-charset, charset can be any encoding supported by the system's iconv
library routine; however, you will have problems with encodings that do not fit
exactly in wchar_t.

-finput-charset=charset
Set the input character set, used for translation from the character set of the
input file to the source character set used by GCC. If the locale does not specify,
or GCC cannot get this information from the locale, the default is UTF-8. This can
be overridden by either the locale or this command line option. Currently the command
line option takes precedence if there's a conflict. charset can be any encoding
supported by the system's iconv library routine. 

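I'd like to know whether, when the source and execution encodings are the default UTF-8, it validates string literals as well-formed UTF-8 and raises an error if they contain invalid byte sequences, or just lets the invalid bytes through. - Andrew Tomazos
@AndrewTomazos I'm very interested in this too. Did you ever find out whether it performs this validation? - alrav
@alrav, since this isn't something the standard defines, it is free to change in the next compiler version. I wouldn't trust an answer someone gave 10 years ago. - Mark Ransom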