Replacing some diacritics with a Perl regular expression

Question

4

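I want to replace some of the characters with diacritics in a file with their ASCII equivalents. Note that I only want to replace the diacritics that appear before the first "@" on each line, not strip all of them.

In the sample file below (a.glo), the four "é" that were bold in the original post need to be replaced with "e". The regex I am using is probably a bit ugly: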

(\\glossaryentry\{(\w|\s|\.)*)(é|è|ê|ë|É|È|Ê|Ë|ē)+

It works in online testers such as www.regex101.com/ and in Notepad++!

However, when I run it from the Windows command line, nothing changes:

perl -pi -i.bak -e "s/(\\glossaryentry\{(\w|\s|\.)*)(é|è|ê|ë|É|È|Ê|Ë|ē)+/$1e/g" a.glo

(For what it's worth, the Perl version on my system is 5.20.2.)

a.glo:

\glossaryentry{AHRF@ {\memgloterm{AHRF}}{\memglodesc{Historical Annals of the French Revolution}} {\memgloref{}}|memjustarg}{1}

\glossaryentry{Ass. plén.@ {\memgloterm{Ass. plén.}}{\memglodesc{Plenary assembly}} {\memgloref{}}|memjustarg}{1}

\glossaryentry{Ch. réun.@ {\memgloterm{Ch. réun.}}{\memglodesc{Joint session}} {\memgloref{}}|memjustarg}{1}

\glossaryentry{chron.@ {\memgloterm{chron.}}{\memglodesc{Chronicle}} {\memgloref{}}|memjustarg}{1}

\glossaryentry{Circ. min.@ {\memgloterm{Circ. min.}}{\memglodesc{Ministerial circular}} {\memgloref{}}|memjustarg}{1}

\glossaryentry{éd.@ {\memgloterm{éd.}}{\memglodesc{editor, edited by}} {\memgloref{}}|memjustarg}{1}

\glossaryentry{Int J Semiot Law@ {\memgloterm{Int J Semiot Law}}{\memglodesc{International Journal for the Semiotics of Law}} {\memgloref{}}|memjustarg}{1}

\glossaryentry{Oxford J Legal Studies@ {\memgloterm{Oxford J Legal Studies}}{\memglodesc{Oxford Journal of Legal Studies}} {\memgloref{}}|memjustarg}{1}

\glossaryentry{préc.@ {\memgloterm{préc.}}{\memglodesc{see above}} {\memgloref{}}|memjustarg}{1}

\glossaryentry{Rev. adm.@ {\memgloterm{Rev. adm.}}{\memglodesc{Administrative review journal}} {\memgloref{}}|memjustarg}{1}

- Carg

1

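See "How to convert letters with accents, umlauts, etc. to their ASCII counterparts in Perl?". Have you tried using single quotes instead of double quotes on the command line? - Håkon Hægland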
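One way to do this kind of conversion, while honouring the "only before the first @" requirement, is Unicode decomposition. The following is a minimal sketch, not the method from the linked question: it assumes the line has already been decoded into a Perl character string, and the helper name `strip_marks_before_at` is hypothetical. It uses the core module Unicode::Normalize to split each accented letter into a base letter plus a combining mark, then drops the marks.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: remove combining marks from the text before the
# first "@" and leave the rest of the line untouched.
# Expects a decoded (character) string, not raw UTF-8 bytes.
sub strip_marks_before_at {
    my ($line) = @_;

    # $head is everything up to (not including) the first "@";
    # $tail is the "@" and whatever follows (possibly empty).
    my ($head, $tail) = $line =~ /\A([^@]*)(.*)\z/s;

    $head = NFD($head);        # "é" becomes "e" followed by U+0301
    $head =~ s/\p{Mn}+//g;     # drop the nonspacing combining marks

    return $head . $tail;
}

binmode STDOUT, ':encoding(UTF-8)';
my $sample = "\\glossaryentry{Ch. r\x{E9}un.\@ {\\memgloterm{Ch. r\x{E9}un.}}";
print strip_marks_before_at($sample), "\n";
# prints: \glossaryentry{Ch. reun.@ {\memgloterm{Ch. réun.}}
```

Letters that have no canonical decomposition (for example "ø" or "ß") pass through this sketch unchanged, so it is not a full transliteration.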

With single quotes I get an error ("'\s' is not recognized as an internal or external command"). And yes, a.glo is UTF-8 encoded. - Carg

Yes, double quotes may be required on Windows. Have you tried the utf8 pragma? Add the -Mutf8 option on the command line. - Håkon Hægland

4

Try reducing the file to just the two letters 'eé', then run the command 'perl -Mutf8 -pe "s/(é|è|ê|ë|É|È|Ê|Ë|ē)+/$1e/g" a.glo'. What result do you get? - Håkon Hægland

Malformed UTF-8 character (unexpected non-continuation byte 0x7c, immediately after start byte 0xcb) at -e line 1. ee├®

- Carg
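The byte values in that error message are consistent with the `-e` script reaching perl in a Windows ANSI code page rather than in UTF-8: 0xCB is "Ë" in cp1252 and 0x7C is "|", which is not a valid UTF-8 sequence. The trailing "├®" is likewise what the two UTF-8 bytes of "é" look like when shown in OEM code page 850. The sketch below reproduces both observations with the core Encode module; cp1252 and cp850 are assumptions about the poster's system, not something stated in the thread.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

binmode STDOUT, ':encoding(UTF-8)';

# Render a byte string as space-separated hex pairs.
sub hex_bytes { join ' ', map { sprintf '%02X', ord } split //, $_[0] }

my $e_acute    = "\x{E9}";                      # "é" as a code point
my $utf8_bytes = encode('UTF-8',  $e_acute);    # how a.glo stores it
my $ansi_bytes = encode('cp1252', $e_acute);    # assumed command-line encoding

print "UTF-8 : ", hex_bytes($utf8_bytes), "\n"; # C3 A9
print "cp1252: ", hex_bytes($ansi_bytes), "\n"; # E9

# "Ë|" from the alternation, as cp1252 bytes: CB 7C.
my $cmd_bytes = encode('cp1252', "\x{CB}|");

# Strict UTF-8 decoding of those bytes fails, which is what -Mutf8 ran into.
my $copy     = $cmd_bytes;                      # decode may modify its input
my $is_valid = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
print hex_bytes($cmd_bytes), " is ", ($is_valid ? "valid" : "not valid"), " UTF-8\n";

# The UTF-8 bytes of "é" viewed through cp850 (assumed console code page).
my $mojibake = decode('cp850', $utf8_bytes);
print "C3 A9 shown in cp850: $mojibake\n";      # ├®
```

Under these assumptions the regex written with literal accented letters never sees the same bytes that are stored in the UTF-8 file, which would explain why nothing was replaced.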


1 Answer

Content provided by Stack Overflow.

- user557597 · Accepted Answer

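I tried this on a Windows machine and it works.
But I think the file has to be opened with the correct encoding.
I saved your sample text as ANSI text.
The following perl command replaces the accented characters in the \glossaryentry entries of a.glo with 'e':
perl -pi -i.bak -e "s/(\\glossaryentry\{[\w\s.]*)[\x{E9}\x{E8}\x{EA}\x{EB}\x{C9}\x{C8}\x{CA}\x{CB}\x{113}]+/$1e/g" a.glo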

 # (\\glossaryentry\{[\w\s.]*)[\x{E9}\x{E8}\x{EA}\x{EB}\x{C9}\x{C8}\x{CA}\x{CB}\x{113}]+

 (                             # (1 start)
      \\ glossaryentry \{
      [\w\s.]* 
 )                             # (1 end)
 [\x{E9}\x{E8}\x{EA}\x{EB}\x{C9}\x{C8}\x{CA}\x{CB}\x{113}]+
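For a file that stays UTF-8 encoded, the same pattern can be applied to decoded text instead of raw bytes. The following is a minimal sketch under that assumption, not part of the answer itself: `fix_entry` is a hypothetical helper name, and the code points are written as `\x{...}` escapes so the script does not depend on the encoding of its own source.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The accented "e" variants listed in the answer, as code points.
my $accented_e = qr/[\x{E9}\x{E8}\x{EA}\x{EB}\x{C9}\x{C8}\x{CA}\x{CB}\x{113}]/;

# Hypothetical helper: apply the answer's substitution to one decoded line.
sub fix_entry {
    my ($line) = @_;
    $line =~ s/
        ( \\glossaryentry \{ [\w\s.]* )   # (1) entry prefix before the accent
        $accented_e+                      # one run of accented letters
    /${1}e/x;
    return $line;
}

# Simulate one line of a UTF-8 file: "é" is stored as the bytes C3 A9.
my $e_utf8 = "\xC3\xA9";
my $raw    = "\\glossaryentry{${e_utf8}d.\@ {\\memgloterm{${e_utf8}d.}}";

# Decode bytes to characters, substitute, encode back to UTF-8 bytes.
my $fixed = encode('UTF-8', fix_entry(decode('UTF-8', $raw)));

binmode STDOUT;                            # write the bytes unchanged
print $fixed, "\n";
```

Because the pattern has to start at `\glossaryentry{`, it rewrites only the first run of accented letters in each entry, and a run such as "éé" collapses to a single "e". That is enough for the sample file, where each key contains at most one "é". For a real file, opening the handles with the `:encoding(UTF-8)` layer would take the place of the explicit `decode` and `encode` calls.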