How to convert non-printable ASCII characters to readable text with Perl


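I am trying to test some probes connected via USB to a Linux device, using Perl 5.28 on Linux (Debian 8). When I read the probes' large file buffers, unreadable ASCII characters such as \0 and \x02 often show up. I would like to convert these characters into readable, tagged text. I have written a small subroutine, but it seems clunky for a long translation list in which every entry has to be tested in turn. Is there a better way to do this?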

Example script

#!/usr/bin/env perl

# test-escape.pl --- test non-readable chars

use strict;
use warnings;

# Replace control characters with readable tags; also count NULs and CRs
sub escBuf {
    my $buf = shift;
    my @numNul = $buf =~ /\0/g;
    my @numCR  = $buf =~ /\r/g;
    $buf =~ s/\r/\n/g;
    $buf =~ s/\x00/<NUL>/g;
    $buf =~ s/\x01/<SOH>/g;
    $buf =~ s/\x02/<STX>/g;
    $buf =~ s/\x03/<ETX>/g;
    $buf =~ s/\x04/<EOT>/g;
    $buf =~ s/\x05/<ENQ>/g;
    $buf =~ s/\x06/<ACK>/g;
    $buf =~ s/\x07/<BEL>/g;
    $buf =~ s/\x08/<BS>/g;
    $buf =~ s/\x0B/<VT>/g;
    $buf =~ s/\x0C/<FF>/g;
    $buf =~ s/\x0E/<SO>/g;
    $buf =~ s/\x0F/<SI>/g;
    my $numNUL = @numNul;
    my $numCR  = @numCR;
    return ($buf, $numNUL, $numCR);
}

# Buffer example
my $buffer = "\x01\r\x02This is a test with\r\n ".
    "sometimes qiurks \0 inside \x0C stuff \0 and regular \x03\r\x04";

# Translate output 
my ($out, $numNUL, $numCR) = &escBuf($buffer); 

# Not printed correctly due to \0
# print "ORG.TEXT: '$buffer' \n\n";

# Result of the translation
print "ESC.TEXT: '$out' \n\n";
print "NUM.NUL:  $numNUL\n";
print "NUM.CR:   $numCR\n\n";

Result

/usr/bin/env perl -w "test-escape.pl"
ESC.TEXT: '<SOH>
<STX>This is a test with

 sometimes qiurks <NUL> inside <FF> stuff <NUL> and regular <ETX>
<EOT>' 

NUM.NUL:  2
NUM.CR:   3

Edit: code adopting the solution proposed by ikegami

#!/usr/bin/env perl
# test-escape.pl --- test non-readable chars

use strict;
use warnings;

# Dictionary of non-printable characters
my %NONE_ASC_DICT = (
    "\x00" => "NUL", "\x01" => "SOH", "\x02" => "STX", "\x03" => "ETX",
    "\x04" => "EOT", "\x05" => "ENQ", "\x06" => "ACK", "\x07" => "BEL",
    "\x08" => "BS",
    # "\x09" (TAB) and "\x0a" (LF) are deliberately left out: they are essential for parsing
    "\x0b" => "VT",  "\x0c" => "FF", "\x0d" => "CR",
    "\x0e" => "SO",  "\x0f" => "SI",
    "\x10" => "DLE",
    "\x11" => "DC1", "\x12" => "DC2", "\x13" => "DC3", "\x14" => "DC4",
    "\x15" => "NAK", "\x16" => "SYN", "\x17" => "ETB", "\x18" => "CAN",
    "\x19" => "EM",  "\x1A" => "SUB", "\x1B" => "ESC", "\x1C" => "FS",
    "\x1D" => "GS",  "\x1E" => "RS",  "\x1F" => "US",  "\x7F" => "DEL",
);

# Character class built from the dictionary keys, and the precompiled regex
my $NONE_ASC_CLASS = join "", map quotemeta, keys(%NONE_ASC_DICT);
my $NONE_ASC_REGEX = qr/([$NONE_ASC_CLASS])/;

# Translator subroutine
sub escBuffer {
    my ($buf, $dict, $regex, $prefix, $suffix) = @_;

    # Set default prefix and suffix strings if not given
    $prefix //= '<';  $suffix //= '>';

    # Count the real quirks
    my @numNUL = $buf =~ /\0/g;
    my $numNUL = @numNUL;

    # Clean up mixed UNIX / DOS context
    $buf =~ s/\r\n/\n/g; 
    $buf =~ s/\r/\n/g;   # translate all remaining \r to \n 
    
    # Calc resulting number of lines
    my @numLF  = $buf =~ /\n/g; 
    my $numLF  = @numLF;

    # Translate the remaining non printables
    $buf =~ s/$regex/ $prefix.$dict->{$1}.$suffix /eg;

    # Result: translated buffer, NUL count, line count
    return ($buf, $numNUL, $numLF);
}

# Buffer example
my $buffer = "\x01\r\x02This is a test with\r\n ".
    "sometimes qiurks \0 inside \x0C stuff \0 and regular \x03\r\x04";

# Translate output 
my ($out, $numNUL, $numLF) = &escBuffer
                               ($buffer, \%NONE_ASC_DICT, $NONE_ASC_REGEX); 

# Result of the translation
print "ESC.TEXT: '$out' \n\n";
print "NUM.NUL:  $numNUL\n";
print "NUM.LF:   $numLF\n\n";

s/([^[:print:]]|\\])/ sprintf("\\x%02x",ord($1)) /eg converts non-printable characters to hex, i.e. \xHH. Not sure whether that is enough for your needs. - Steffen Ullrich
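
The hex-escape idea from the comment can be tried out with a short script. This is a minimal sketch rather than the commenter's exact code: the helper name `hex_escape` and the sample buffer are hypothetical, and it assumes the alternation was meant to cover the backslash itself, so that escaped output stays unambiguous.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: replace every non-printable character, and the
# backslash itself (an assumption about the intent), with a \xHH escape.
sub hex_escape {
    my ($buf) = @_;
    $buf =~ s/([^[:print:]]|\\)/ sprintf("\\x%02x", ord($1)) /eg;
    return $buf;
}

# Sample buffer (made up for illustration)
my $buffer = "\x01\r\x02quirks \0 inside \x0C stuff";
print hex_escape($buffer), "\n";
```

Unlike a lookup table, this needs no per-character entries, but it also escapes \t and \n, which may or may not be wanted.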
1 Answer


Use a table.

Setup:

my %map = (
   "\x00" => "<NUL>",
   ...,
);

my $class = join "", map quotemeta, keys(%map);
my $re = qr/([$class])/;

Substitution:

s/$re/$map{$1}/g
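
Put together, the setup and the substitution might look like the following self-contained sketch. The map entries beyond "\x00", the sample buffer, and the helper name `escape_controls` are illustrative assumptions, not part of the answer; the real table would list every character that should be tagged.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Illustrative subset of the lookup table (assumed entries); extend as needed.
my %map = (
    "\x00" => "<NUL>",
    "\x01" => "<SOH>",
    "\x02" => "<STX>",
    "\x03" => "<ETX>",
    "\x04" => "<EOT>",
    "\x0C" => "<FF>",
);

# Build one character class from all keys and compile the regex once.
my $class = join "", map quotemeta, keys(%map);
my $re    = qr/([$class])/;

# Hypothetical helper: a single pass over the buffer replaces every
# mapped character with its tag.
sub escape_controls {
    my ($buf) = @_;
    $buf =~ s/$re/$map{$1}/g;
    return $buf;
}

print escape_controls("\x02quirks \0 inside \x0C stuff\x03"), "\n";
```

Because the regex is built from the hash keys, adding a new character only means adding one entry to the table; the buffer is still scanned once instead of once per character.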

Content provided by Stack Overflow.