How can I skip reserved Unicode characters in an automatically generated Unicode character range in Perl?

Question

3

I have written a Perl program that automatically generates a range of Unicode characters.

#!/bin/perl -w

use strict;
use open qw/:std :encoding(UTF-8)/;

my ($beg, $end, $start, $finish, @chars);

print "Enter the beginning Unicode value of your Language's script: ";
chomp( $beg = <> );

print "Enter the last Unicode value of your Language's script: ";
chomp( $end = <> );

$beg =~ s/U\+(.*)/$1/;
$end =~ s/U\+(.*)/$1/;

$start  = hex($beg);
$finish = hex($end);

@chars = ( $start .. $finish );

foreach (@chars) {

    my $char = chr($_);

    next unless ($char);

    print "$char\n";
}

When I run this script with the values U+0B80 and U+0BFF, my output is:

஀ ஁ ஂ ஃ ஄ அ ஆ இ ஈ உ ஊ ஋ ஌ ஍ எ ஏ ஐ ஑ ஒ ஓ ஔ க ஖ ஗ ஘ ங ச ஛ ஜ ஝ ஞ ட ஠ ஡ ஢ ண த ஥ ஦ ஧ ந ன ப ஫ ஬ ஭ ம ய ர ற ல ள ழ வ ஶ ஷ ஸ ஹ ஺ ஻ ஼ ஽ ா ி ீ ு ூ ௃ ௄ ௅ ெ ே ை ௉ ொ ோ ௌ ் ௎ ௏ ௐ ௑ ௒ ௓ ௔ ௕ ௖ ௗ ௘ ௙ ௚ ௛ ௜ ௝ ௞ ௟ ௠ ௡ ௢ ௣ ௤ ௥ ௦ ௧ ௨ ௩ ௪ ௫ ௬ ௭ ௮ ௯ ௰ ௱ ௲ ௳ ௴ ௵ ௶ ௷ ௸ ௹ ௺ ௻ ௼ ௽ ௾ ௿

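All of the box characters are reserved (unassigned) code points in the Unicode block.

I want to remove all such reserved code points. Is there a way to do this in Perl?

The line next unless ($char) does not seem to work, because even a reserved code point appears to have a value (the box character).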
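
To illustrate why the truthiness test cannot filter anything here, the following is a minimal sketch (the three code points are chosen only for illustration). chr() returns a one-character string for any code point, assigned or not, and a non-empty string is true in Perl unless it is exactly "0".

```perl
use strict;
use warnings;

# U+0B84 is a reserved code point in the Tamil block, U+0B85 is
# TAMIL LETTER A, and U+0030 is the digit zero.
for my $code (0x0B84, 0x0B85, 0x30) {
    my $char = chr($code);

    # length() is 1 in every case; only the string "0" is false.
    printf "U+%04X length=%d true=%s\n",
        $code, length($char), ($char ? 'yes' : 'no');
}
```

As a side effect, next unless ($char) would skip the digit "0" if the range ever included U+0030, while still letting every reserved code point through.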

- One Face

3 Answers

4

It seems you need the "Unassigned" category:

next if $char =~ /\p{Unassigned}/;
# Or shorter:
next if $char =~ /\p{Cn}/;
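
As a sketch of how this test might fit into a whole script, the helper below (assigned_chars is a hypothetical name, not part of the original answer) returns only the assigned characters in a range. Which code points count as assigned depends on the Unicode version bundled with your Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: return the characters between two code points
# (inclusive), dropping those in the Unassigned (Cn) category.
sub assigned_chars {
    my ($start, $finish) = @_;
    return grep { !/\p{Cn}/ } map { chr($_) } $start .. $finish;
}

# Encode wide characters as UTF-8 on output.
binmode(STDOUT, ':encoding(UTF-8)');

my @chars = assigned_chars(0x0B80, 0x0B8A);
print join(' ', @chars), "\n";
```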

- choroba

4

You can also use the charnames pragma.

use charnames ();
use open qw/:std :encoding(UTF-8)/;

foreach (hex 'B80' .. hex 'B83' ) {
    next unless charnames::viacode($_);
    print chr $_;
}

Output:

ஂஃ

When you remove the next line, the output becomes:

஀஁ஂஃ
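
The filter works because charnames::viacode returns undef when a code point has no known name. A minimal sketch that prints the name next to each character it keeps (the range is chosen only for illustration):

```perl
use strict;
use warnings;
use charnames ();

# Encode wide characters as UTF-8 on output.
binmode(STDOUT, ':encoding(UTF-8)');

for my $code (0x0B82 .. 0x0B86) {
    # viacode() returns undef for code points without a known name.
    my $name = charnames::viacode($code);
    next unless defined $name;
    printf "U+%04X %s %s\n", $code, chr($code), $name;
}
```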

Update: I benchmarked the three techniques used in Arunesh's answer, choroba's answer, and mine. charnames clearly performs poorly.

use charnames ();
use open qw/:std :encoding(UTF-8)/;
use Benchmark ':all';

cmpthese(
    '-2',
    {
        'charnames' => sub {
            foreach ( hex 'B80' .. hex 'BFF' ) {
                next unless charnames::viacode($_);
            }
        },
        'posix' => sub {
            foreach ( hex 'B80' .. hex 'BFF' ) {
                next unless ( chr($_) =~ /[[:print:]]/ );
            }
        },
        'unassigned' => sub {
            foreach ( hex 'B80' .. hex 'BFF' ) {
                next if ( chr($_) =~ /\p{Cn}/ );
            }
        },
    }
);

__END__
              Rate  charnames      posix unassigned
charnames   28.4/s         --      -100%      -100%
posix      27115/s     95239%         --       -14%
unassigned 31656/s    111205%        17%         --

- simbabque

1

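I strongly advise against using this technique. The benchmark is very clear. I will leave the answer up, though, because this is a feature I did not know about, and corelist says it has been available since Perl 5.6.0. - simbabque

Note that if you run the benchmark several times, the results for posix and unassigned are almost identical. - simbabque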

2

If you want to use charnames::viacode or charnames::vianame, write use charnames (); with an empty import list. - ysth

- Arunesh Singh · Accepted Answer

You want to print only the visible characters:

next unless ($char=~/[[:print:]]/);
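
The two regex tests are not quite the same thing. As a rough sketch (the code points are picked only for illustration), [[:print:]] also rejects control characters such as U+0007 (BEL), which are assigned and therefore do not match \p{Cn}:

```perl
use strict;
use warnings;

# U+0007 is a control character, U+0B84 is a reserved code point in
# the Tamil block, and U+0B85 is TAMIL LETTER A.
for my $code (0x07, 0x0B84, 0x0B85) {
    my $char = chr($code);
    printf "U+%04X print=%s unassigned=%s\n", $code,
        ($char =~ /[[:print:]]/ ? 'yes' : 'no'),
        ($char =~ /\p{Cn}/      ? 'yes' : 'no');
}
```

For a script block such as Tamil the two filters should keep the same characters, so the choice mostly matters when the range can include control characters.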