Why do my Perl tests fail with use encoding 'utf8'?

Question

9

I'm puzzled by this test script:

#!perl

use strict;
use warnings;
use encoding 'utf8';
use Test::More 'no_plan';

ok('áá' =~ m/á/, 'ok direct match');

my $re = qr{á};
ok('áá' =~ m/$re/, 'ok qr-based match');

like('áá', $re, 'like qr-based match');

All three tests fail, but I expected use encoding 'utf8' to upgrade both the literal áá and the qr-based regexps to utf8 strings, so that the tests would pass.

If I remove the use encoding line, the tests pass as expected, but I can't figure out why they fail in utf8 mode.

I am using Perl 5.8.8 on Mac OS X (the system version).

- melo

3 Answers

2

On my computer (using perl 5.10) it works fine. Maybe you should try replacing use encoding 'utf8' with use utf8.

Which version of perl are you using? I think older versions had bugs with UTF-8 in regexps.

- Leon Timmermans

I also changed use encoding 'utf8' to use utf8, and that worked for me with 5.8.8 on Linux. - mpeters
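
For reference, here is a sketch of the question's script with only the pragma swapped, as suggested above. It assumes the file itself is saved as UTF-8 and that á is stored as the single code point U+00E1; behaviour on very old perls may differ.

```perl
#!perl

use strict;
use warnings;
use utf8;                       # tells perl the source file is UTF-8 encoded
use Test::More 'no_plan';

# With use utf8 in effect, both literals below are character strings.
ok('áá' =~ m/á/, 'ok direct match');

my $re = qr{á};
ok('áá' =~ m/$re/, 'ok qr-based match');

like('áá', $re, 'like qr-based match');
```

The test names are plain ASCII, so nothing wide is printed unless a test fails and Test::More dumps the string in its diagnostics.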

2

The Test::More documentation contains a fix for this problem, which I only discovered today (and this post ranks high in Google search results).

utf8 / "Wide character in print"

If you use utf8 or other non-ASCII characters with Test::More you might get a "Wide character in print" warning. Using binmode STDOUT, ":utf8" will not fix it. Test::Builder (which powers Test::More) duplicates STDOUT and STDERR. So any changes to them, including changing their output disciplines, will not be seen by Test::More. The workaround is to change the filehandles used by Test::Builder directly.
my $builder = Test::More->builder;
binmode $builder->output,         ":utf8";
binmode $builder->failure_output, ":utf8";
binmode $builder->todo_output,    ":utf8";

I added this boilerplate to my test code and it worked very well.

- Michael


- Aristotle Pagaltzis · Accepted Answer

Don't use the encoding pragma. It is broken. (Juerd Waalboer mentioned this in a talk at YAPC::EU 2k8.)

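It does at least two things at once that do not belong together:

1. It specifies the encoding of your source file.
2. It specifies the encoding of your file input/output.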

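To add injury to insult, it also does #1 in a broken fashion: it reinterprets \xNN sequences as undecoded octets instead of treating them as code points, and then decodes them. That prevents you from expressing characters outside the encoding you specified and makes your source code mean different things depending on the encoding. That is just astonishingly wrong.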

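Write your source code in ASCII or UTF-8 only. In the latter case, the utf8 pragma is the correct thing to use. If you don't want to use UTF-8 but do want to include non-ASCII characters, escape or decode them explicitly.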
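
As a rough illustration of the ASCII-only option, the sketch below writes á as an explicit code point escape and, separately, decodes the same text from raw UTF-8 octets with the core Encode module. The variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# ASCII-only source: U+00E1 (á) written as an explicit code point escape.
my $escaped = "\x{E1}\x{E1}";

# The same text as raw UTF-8 octets (0xC3 0xA1 per character),
# decoded explicitly into a character string.
my $decoded = decode('UTF-8', "\xC3\xA1\xC3\xA1");

print length($escaped), "\n";                                # 2
print length($decoded), "\n";                                # 2
print $escaped eq $decoded ? "same\n" : "different\n";       # same
print $decoded =~ /\x{E1}/ ? "match\n" : "no match\n";       # match
```

Either way the source file stays pure ASCII, so its meaning does not depend on any encoding declaration.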

And use I/O layers explicitly, or set them with the open pragma, to have I/O transcoded properly and automatically.
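
A minimal sketch of explicit I/O layers follows. To keep it self-contained it writes to an in-memory buffer instead of a real file, which is an assumption of the example; the same layer string works in an ordinary open call.

```perl
use strict;
use warnings;

# Alternatively, the open pragma can set default layers for the lexical scope:
#   use open ':std', ':encoding(UTF-8)';

my $text = "\x{E1}\x{E1}";      # two characters, each U+00E1 (á)

# Encode on output: the layer turns characters into UTF-8 octets.
my $octets = '';
open my $out, '>:encoding(UTF-8)', \$octets or die "open for write failed: $!";
print {$out} $text;
close $out or die "close failed: $!";

print length($octets), "\n";    # 4 (two octets per character)

# Decode on input: the layer turns the octets back into characters.
open my $in, '<:encoding(UTF-8)', \$octets or die "open for read failed: $!";
my $round = <$in>;
close $in;

print $round eq $text ? "round trip ok\n" : "mismatch\n";
```

The program logic only ever sees character strings; encoding and decoding happen at the filehandle boundary.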