"ANSI" is the encoding Notepad++ uses; what does Ruby call it?

Question

ruby character-encoding notepad++ diacritics codepages

30

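I have a bunch of .txt files that Notepad++ reports as "ANSI" in its "Encoding" drop-down menu.

They contain German characters [äöüß], which display correctly in Notepad++.

But when I open them in irb with File.read 'this is a German text example.txt', those characters do not display correctly.

So, does anyone know what argument I should pass to Encoding.default_external=?

(I'm assuming that is the solution, right?)

When it is set to 'utf-8' or 'cp850', an "ANSI" file containing "äöüß" is read as "\xE4\xF6\xFC\xDF" ...

(Please don't hesitate to state the obvious in your answers; I have almost no experience and know just enough to ask this question.)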

- Owen_AR

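It depends on your OS locale. For German or English, it is Windows-1252. That said, Notepad++ may not follow this rule and may simply use "ANSI" as an alias for Windows-1252. It is definitely not any ISO encoding. See http://en.wikipedia.org/wiki/Windows_ANSI_code_page#ANSI_code_page. - Esailija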

Thanks, I think it is cp1252, yes. - Owen_AR

3 Answers

9

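I found the answer to this question on the Notepad++ forum, given in 2010 by CChris, who seems to be an authority.

Question: Encoding ANSI?

Answer:

That will be your computer's system code page (code page 0).

More info:

Show your current code page.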

>help chcp
Displays or sets the active code page number.

CHCP [nnn]

  nnn   Specifies a code page number.

Type CHCP without a parameter to display the active code page number.

>chcp
Active code page: 437

Code Page Identifiers

Identifier  .NET Name  Additional information
437         IBM437     OEM United States

- Love and peace - Joe Codeswell

4

I think it is 'cp1252', also known as 'windows-1252'.
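A minimal sketch of what that means in practice, assuming the file really is Windows-1252. The temporary file and its name are made up for the demonstration. The `encoding:` option of File.read accepts an "external:internal" pair, so the bytes are decoded as Windows-1252 and returned as UTF-8 without changing Encoding.default_external globally.

```ruby
require 'tmpdir'

# In Ruby, "cp1252" and "windows-1252" resolve to the same Encoding object.
same = Encoding.find('cp1252') == Encoding.find('windows-1252')

# The four bytes from the question, as they sit in an "ANSI" file.
ansi_bytes = "\xE4\xF6\xFC\xDF".b

text = Dir.mktmpdir do |dir|
  # Hypothetical sample file, created only for this demonstration.
  path = File.join(dir, 'german_example.txt')
  File.binwrite(path, ansi_bytes)
  # "external:internal": decode as Windows-1252, hand back a UTF-8 string.
  File.read(path, encoding: 'Windows-1252:UTF-8')
end

puts same
puts text
puts text.encoding
```

Passing the encoding per call keeps the change local to the files that need it, which is usually less surprising than reassigning Encoding.default_external for the whole process.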

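After reading Jörg's answer, I went back to the Encoding page on ruby-doc.org to look for references to the specific encodings he mentioned, and that is when I discovered the Encoding.aliases method.

So I wrote the method at the end of this answer.

Then I looked at the output in Notepad++, viewing it both as 'ANSI' and as UTF-8, and compared that with the output in irb...

I could find only two places in the irb output where the UTF-8 file was garbled in exactly the same way as when Notepad++ displayed it as 'ANSI': cp1252 and cp1254.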
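The match with both cp1252 and cp1254 can be reproduced on a small scale. This is only a sketch of the idea, not the script from this answer: `misread` is a hypothetical helper that takes the UTF-8 bytes of "äöüß", reinterprets them under a single-byte encoding, and converts the result back to UTF-8 so the garbled strings can be compared.

```ruby
# UTF-8 bytes for "äöüß", written with escapes so the encoding of this
# source file does not matter.
utf8_bytes = "\u00E4\u00F6\u00FC\u00DF".b

# Hypothetical helper: pretend the bytes are in `encoding_name`, then
# convert the result to UTF-8 so it can be printed and compared.
def misread(bytes, encoding_name)
  bytes.dup.force_encoding(encoding_name).encode('UTF-8')
end

as_1252   = misread(utf8_bytes, 'Windows-1252')
as_1254   = misread(utf8_bytes, 'Windows-1254')
as_latin1 = misread(utf8_bytes, 'ISO-8859-1')

puts as_1252
puts as_1252 == as_1254    # true: the two code pages agree on these bytes
puts as_1252 == as_latin1  # false: byte 0x9F is a control code in ISO-8859-1
```

Windows-1252 and Windows-1254 differ in only a handful of positions, none of which occur in these bytes, so German text alone cannot tell them apart.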

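cp1252 is apparently my 'filesystem' encoding, so I went with that.

I wrote a script that makes UTF-8 copies of all the files, trying to pick between 1252 and 1254. The UTF-8 regexes now seem to work on both sets of files.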
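The conversion script itself is not shown here, so the following is only a rough sketch of how such a copy step could look. The names `to_utf8` and `convert_dir`, the `source_encoding:` parameter, and the directory layout are all assumptions. Files that are already valid UTF-8 are copied unchanged; everything else is decoded with the chosen code page.

```ruby
require 'fileutils'

# Hypothetical helper: decode raw bytes as `source_encoding`, return UTF-8.
# Raises Encoding::UndefinedConversionError if a byte has no mapping
# in that encoding.
def to_utf8(bytes, source_encoding)
  bytes.dup.force_encoding(source_encoding).encode('UTF-8')
end

# Copy every .txt file from src_dir into dest_dir as UTF-8.
def convert_dir(src_dir, dest_dir, source_encoding: 'Windows-1252')
  FileUtils.mkdir_p(dest_dir)
  Dir.glob(File.join(src_dir, '*.txt')).sort.each do |path|
    bytes = File.binread(path)
    candidate = bytes.dup.force_encoding('UTF-8')
    # Keep files that already are valid UTF-8; transcode the rest.
    utf8 = candidate.valid_encoding? ? candidate : to_utf8(bytes, source_encoding)
    File.binwrite(File.join(dest_dir, File.basename(path)), utf8)
  end
end
```

A call might look like convert_dir('ansi_texts', 'utf8_texts'), with both directory names being placeholders. Writing to a separate directory leaves the originals untouched in case the guessed code page turns out to be wrong.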

Now I have to try to remember what I was actually trying to accomplish before I ran into all these encoding problems. xD

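# Read both files once per encoding alias that Ruby knows and print the
# results side by side; each line is also written to encoding_test_output.txt.
def compare_encodings file1, file2
    file1_probs = []  # encodings for which the file1 step raised
    file2_probs = []  # same, for file2

    txt = File.open('encoding_test_output.txt','w')

    # Encoding.aliases maps each alias (k) to its canonical name (v).
    Encoding.aliases.sort.each do |k,v|
        Encoding.default_external=k
        ename = [k.downcase, v.downcase].join "  ---  "
        s = ""
        begin
            s << "#{File.read(file1)}"
        rescue
            s << "nope nope nope"
            file1_probs << ename
        end
        s << "\t| #{ename} |\t"
        begin
            s << "#{File.read(file2)}"
        rescue
            s << "nope nope nope"
            file2_probs << ename
        end
        # Switch back to UTF-8 before writing the line out.
        Encoding.default_external= 'utf-8'
        txt.puts s.center(58)
        puts s.center(58)
    end
    # Summarize which encodings raised for each file.
    puts
    puts "file1, \"#{file1}\" exceptions from trying to convert to:\n\n"
    puts file1_probs
    puts
    puts "file2, \"#{file2}\" exceptions from trying to convert to:\n\n"
    puts file2_probs
    txt.close
end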

compare_encodings "utf-8.txt", "np++'ANSI'.txt"

- Owen_AR


- Jörg W Mittag · Accepted Answer

They probably mean ISO/IEC 8859-1 (aka Latin-1), ISO-8859-1, ISO/IEC 8859-15 (aka Latin-9), or Windows-1252 (aka CP 1252). All 4 of them have ä at position 0xE4.
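A quick way to check the 0xE4 claim in irb, as a sketch covering the three encodings that Ruby exposes under these names. The euro sign comparison at the end is an added illustration, not part of the original answer: it shows one place where the candidates disagree, which is why the choice still matters for text beyond äöüß.

```ruby
# Byte 0xE4 decodes to U+00E4 (ä) in each of these encodings.
codepoints = %w[ISO-8859-1 ISO-8859-15 Windows-1252].map do |name|
  char = "\xE4".b.force_encoding(name).encode('UTF-8')
  puts format('%-12s 0xE4 -> U+%04X', name, char.ord)
  char.ord
end

# The encodings differ elsewhere: the euro sign is byte 0x80 in
# Windows-1252 but byte 0xA4 in ISO-8859-15.
euro_1252    = "\x80".b.force_encoding('Windows-1252').encode('UTF-8')
euro_8859_15 = "\xA4".b.force_encoding('ISO-8859-15').encode('UTF-8')
puts euro_1252 == euro_8859_15
```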