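How can I configure Emacs so that, when a file is saved, all characters are projected onto a single encoding (e.g. utf-8), for corrupted files that mix encodings (e.g. utf-8 and latin-1)?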
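I wrote the following function to automate part of the cleanup. However, I suspect that the information needed to map the character "é" in one encoding to "é" in utf-8 exists somewhere, and could be used to improve this function (or that someone has already written such a function).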
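```elisp
(defun jyby/cleanToUTF ()
  "Delete known stray characters from the whole buffer."
  (interactive)
  (save-excursion
    ;; Start each search at the beginning of the buffer: `replace-regexp'
    ;; only acts from point onward and is meant for interactive use.
    (dolist (junk '("अ" "आ" "ॆ"))
      (goto-char (point-min))
      (while (search-forward junk nil t)
        ;; FIXEDCASE and LITERAL are t: insert the empty string as-is.
        (replace-match "" t t)))))

(global-set-key [f11] 'jyby/cleanToUTF)
```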
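Rather than hard-coding each stray character, one could let Emacs map the bytes themselves. When a file is visited as UTF-8, bytes that are not valid UTF-8 (for example a Latin-1 "é", byte #xE9) stay in the buffer as "raw byte" characters in the range #x3FFF80 to #x3FFFFF, where the byte value is the character code minus #x3FFF00. The following is a minimal sketch, assuming the stray characters are such raw bytes and that they come from a single-byte encoding (Latin-1 by default); the name `my/recode-raw-bytes` is hypothetical.

```elisp
(defun my/recode-raw-bytes (&optional coding)
  "Re-decode every raw byte in the buffer with the single-byte CODING.
CODING defaults to `latin-1'.  Returns the number of bytes processed.
Hypothetical helper: assumes the raw bytes came from a single-byte
encoding, so that each byte can be decoded on its own."
  (interactive)
  (let ((coding (or coding 'latin-1))
        (n 0))
    (save-excursion
      (goto-char (point-min))
      (while (not (eobp))
        (let ((c (char-after)))
          (if (>= c #x3FFF80)            ; raw bytes: #x3FFF80 .. #x3FFFFF
              (progn
                (delete-char 1)
                ;; Recover the byte value and decode it as a 1-byte string.
                (insert (decode-coding-string
                         (unibyte-string (- c #x3FFF00)) coding))
                (setq n (1+ n)))
            (forward-char 1)))))
    (message "Recoded %d raw byte(s) as %s" n coding)
    n))
```

`M-x my/recode-raw-bytes` decodes the bytes as Latin-1; for text pasted from Windows, `(my/recode-raw-bytes 'windows-1252)` may be the better guess, since there bytes #x80 to #x9F are punctuation such as "…". Decoding byte by byte is only correct for single-byte encodings. Once no raw bytes remain, the buffer should save as utf-8-unix without the prompt.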
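I have many files that are "corrupted" by mixed encodings (the result of copy-pasting from a browser with a bad font configuration), and they produce the error below. Sometimes I clean them up by finding and replacing each offending character by hand, with "" or the appropriate character, or, faster, by specifying "utf-8-unix" as the encoding (after which the same message comes up the next time I edit and save the file). In any such damaged file, every accented character is replaced by a sequence that doubles with each save, eventually doubling the size of the file. I am using GNU Emacs 24.2.1.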
```
These default coding systems were tried to encode text
in the buffer `test_accents.org':
  (utf-8-unix (30 . 4194182) (33 . 4194182) (34 . 4194182) (37
  . 4194182) (40 . 4194181) (41 . 4194182) (42 . 4194182) (45
  . 4194182) (48 . 4194182) (49 . 4194182) (52 . 4194182))
However, each of them encountered characters it couldn't encode:
  utf-8-unix cannot encode these: ...

Click on a character (or switch to this window by `C-x o'
and select the characters by RET) to jump to the place it appears,
where `C-u C-x =' will give information about it.

Select one of the safe coding systems listed below,
or cancel the writing with C-g and edit the buffer
   to remove or modify the problematic characters,
or specify any other coding system (and risk losing
   the problematic characters).

  raw-text emacs-mule no-conversion
```
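In the list above, each pair appears to be (POSITION . CHARACTER). 4194182 is #x3FFF86 and 4194181 is #x3FFF85, i.e. Emacs's "raw byte" characters for the bytes #x86 and #x85: bytes that could not be decoded when the file was read, and which utf-8-unix reports it cannot encode. A small sketch (the name `my/list-raw-bytes` is hypothetical) that collects the same information for the whole buffer as (POSITION . BYTE) pairs, which may help identify the encoding the stray bytes came from:

```elisp
(defun my/list-raw-bytes ()
  "Return a list of (POSITION . BYTE) for each raw byte in the buffer.
Raw bytes are the characters #x3FFF80 .. #x3FFFFF; BYTE is the
original byte value (#x80 .. #xFF)."
  (let (result)
    (save-excursion
      (goto-char (point-min))
      (while (not (eobp))
        (let ((c (char-after)))
          (when (>= c #x3FFF80)
            (push (cons (point) (- c #x3FFF00)) result)))
        (forward-char 1)))
    (nreverse result)))
```

For instance, isolated bytes in #xC0 to #xFF suggest Latin-1 letters, whereas bytes in #x80 to #x9F (as in the message above) are control characters in Latin-1 but punctuation in windows-1252.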