How do I remove an element from a Perl array after I've processed it?

Question


perl arrays parsing


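I'm reading a postfix mail log file into an array and then looping through it to extract messages. On the first pass, I check for lines matching "to=" and grab the message ID. Once the array of MSGIDs is built, I loop through that array again to extract the information from the to=, from=, and client= lines.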

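What I'd like to do is delete a line from the array as soon as I've extracted its data, to make processing faster (i.e., one fewer line to check).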

Any suggestions? This is written in Perl.

Edit: gbacon's answer below was enough to get me started on a solid solution. Here's the core of it:

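use strict;
use warnings;

my %msg;
while (<>) {
    my $line = $_;    # keep an unmodified copy; the s!!! below edits $_

    # Strip the syslog/postfix prefix, capture the queue ID, then collect
    # every to/from/client/size/nrcpt field on the line.
    if (s!^.*postfix/\w+\[.+?\]: (\w+):\s*!!) {
        my $key = $1;
        push @{ $msg{$key}{$1} } => $2
            while /\b(to|from|client|size|nrcpt)=<?(.+?)(?:>|,|\[|$)/g;
    }

    # On the final "removed" line, record the timestamp and server name.
    # \s+ between month and day, because syslog pads single-digit days ("Apr  8").
    if ($line =~ m!^(\w+\s+\d+ \d+:\d+:\d+)\s(\w+.*)\s+postfix/\w+\[.+?\]: (\w+):\s*removed!) {
        my $key = $3;
        push @{ $msg{$key}{date} }   => $1;
        push @{ $msg{$key}{server} } => $2;
    }
}

use Data::Dumper;
$Data::Dumper::Indent = 1;
print Dumper \%msg;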

I'm sure the second regex could be better, but it does what I need. Now I can hash all the messages and pull out the ones I'm interested in.

Thanks to everyone who answered.

- Justin ᚅᚔᚈᚄᚒᚔ

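It seems to me that a hash might be a better way to handle this? That way you don't have to explicitly check for matches while iterating. You could simply use the "to=" line as the key. - Vivin Paliath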

6 Answers


Removing an element from the middle of an array is an expensive operation, so this won't actually speed up processing.

Better options:

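* Do everything in a single pass.
* While building the array of IDs, store pointers (really, indices) into the main array, so you can quickly reach the elements for a given ID.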
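A minimal sketch of the second suggestion, assuming the log has already been read into @lines. The sample lines, the %index_of name, and the queue-ID regex are illustrative assumptions, not part of the answer. One pass records the index of every line under its queue ID; afterwards you can visit a message's lines directly, without deleting anything:

```perl
use strict;
use warnings;

# Made-up sample input; in practice these would be the lines of your log file.
my @lines = (
    'Apr  8 14:22:02 MailSecure03 postfix/smtpd[32388]: BA1CE38965: client=mail.example.com[x.x.x.x]',
    'Apr  8 14:22:04 MailSecure03 postfix/smtpd[32589]: 62D8438973: client=localhost.localdomain[127.0.0.1]',
    'Apr  8 14:22:04 MailSecure03 postfix/smtp[32608]: BA1CE38965: to=<us...@test.com>, relay=127.0.0.1[127.0.0.1]:10025',
);

# Single pass: map each queue ID to the indices of the lines that mention it.
my %index_of;
for my $i (0 .. $#lines) {
    if ($lines[$i] =~ m{postfix/\w+\[\d+\]: (\w+):}) {
        push @{ $index_of{$1} }, $i;
    }
}

# Later: jump straight to the lines for one ID; @lines is left untouched.
for my $i (@{ $index_of{'BA1CE38965'} }) {
    print "$lines[$i]\n";
}
```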

- Eli Bendersky


Why not do this:

my @extracted = map  extract_data($_), 
                grep msg_rcpt_to( $rcpt, $_ ), @log_data;

When you're done, you'll have an array of extracted data in the order it appeared in the log.
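To make that one-liner runnable, here is a sketch with hypothetical implementations of msg_rcpt_to() and extract_data(). The answer leaves both helpers to you, so their bodies, the sample log lines, and the recipient address are all assumptions:

```perl
use strict;
use warnings;

# Hypothetical helper: true if this log line records delivery to $rcpt.
sub msg_rcpt_to {
    my ($rcpt, $line) = @_;
    return index($line, "to=<$rcpt>") >= 0;
}

# Hypothetical helper: pull the queue ID and delivery status out of a line.
sub extract_data {
    my ($line) = @_;
    my ($id)     = $line =~ /\]: (\w+):/;
    my ($status) = $line =~ /status=(\w+)/;
    return { id => $id, status => $status };
}

my $rcpt = 'user@test.com';
my @log_data = (
    'Apr  8 14:22:04 host postfix/smtp[32608]: BA1CE38965: to=<user@test.com>, status=sent (250 OK)',
    'Apr  8 14:22:04 host postfix/qmgr[19685]: BA1CE38965: removed',
    'Apr  8 14:22:05 host postfix/smtp[32417]: 62D8438973: to=<user@test.com>, status=deferred (timeout)',
);

# grep keeps the matching lines; map turns each one into a record, in log order.
my @extracted = map  { extract_data($_) }
                grep { msg_rcpt_to($rcpt, $_) } @log_data;

print "$_->{id} $_->{status}\n" for @extracted;
```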

- daotoad


Common ways to manipulate array contents:

# start over with this list for each example:
my @list = qw(a b c d);

splice:

splice @list, 2, 1, qw(e);
# @list now contains: qw(a b e d)

pop and shift:

pop @list;
# @list now contains: qw(a b c)

shift @list;
# @list now contains: qw(b c d)

map:

@list = map { $_ eq 'b' ? () : $_ } @list;
# list now contains: qw(a c d);

Array slices:

@list[3..4] = qw(e f);
# @list now contains: qw(a b c e f)

for and foreach loops:

foreach (@list)
{
    # $_ is aliased to each element of the list in turn;
    # assignments will be propagated back to the original structure
    $_ = uc if m/[a-c]/;
}
# list now contains: qw(A B C d);

Read about all of these functions in perldoc perlfunc, slices in perldoc perldata, and for loops in perldoc perlsyn.
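For the original goal of discarding each line once it has been handled, shift can consume the array from the front. A minimal sketch with made-up data; note that every line is still examined exactly once, so this does not reduce the number of checks, it only means processed lines no longer sit in the array:

```perl
use strict;
use warnings;

# Made-up input standing in for the log lines read from the file.
my @lines = ('to=<a@example.com>', 'from=<b@example.com>', 'removed');
my @processed;

# shift removes and returns the first element, so @lines shrinks by one
# on every pass and the loop ends once it is empty.
while (@lines) {
    my $line = shift @lines;
    push @processed, $line;    # placeholder for real processing
}

print scalar(@lines), " left, ", scalar(@processed), " processed\n";
```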

- Ether


In Perl, you can remove elements from an array with the splice() function.

Be careful about removing elements from an array while you are looping over it, because the indices of the remaining elements change.
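One common way to sidestep that problem (a sketch, not taken from the answer; the array contents are made up) is to walk the indices from last to first, so a splice only moves elements that have already been checked:

```perl
use strict;
use warnings;

my @array = qw(keep drop keep drop drop keep);

# Visit indices from last to first. splice(@array, $i, 1) only shifts the
# elements after $i, and those have already been checked.
for my $i (reverse 0 .. $#array) {
    splice(@array, $i, 1) if $array[$i] eq 'drop';
}

print "@array\n";    # prints: keep keep keep
```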

- Ken Aspeslagh


Assuming you have the index at hand, use splice:

splice(@array, $indextoremove, 1);

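But be careful: once you remove an element, every element after it moves down by one, so any indices you saved earlier become invalid.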
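If you collect the indices first and remove them afterwards, removing from the highest index down keeps the remaining ones valid. A minimal sketch; @to_remove is a hypothetical stand-in for whatever indices your processing produced:

```perl
use strict;
use warnings;

my @array     = qw(a b c d e);
my @to_remove = (1, 3);    # hypothetical: indices found while processing

# Remove from the highest index down, so the lower indices still point
# at the elements they referred to originally.
for my $i (sort { $b <=> $a } @to_remove) {
    splice(@array, $i, 1);
}

print "@array\n";    # prints: a c e
```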

- Vivin Paliath

Content provided by Stack Overflow (translated from the original English post).

- Greg Bacon · Accepted Answer

Do it all in one pass:

#! /usr/bin/perl

use warnings;
use strict;

# for demo only
*ARGV = *DATA;

my %msg;
while (<>) {
  if (s!^.*postfix/\w+\[.+?\]: (\w+):\s*!!) {
    my $key = $1;
    push @{ $msg{$key}{$1} } => $2
      while /\b(to|from|client)=(.+?)(?:,|$)/g;
  }
}

use Data::Dumper;
$Data::Dumper::Indent = 1;
print Dumper \%msg;
__DATA__
Apr  8 14:22:02 MailSecure03 postfix/smtpd[32388]: BA1CE38965: client=mail.example.com[x.x.x.x]
Apr  8 14:22:03 MailSecure03 postfix/cleanup[32070]: BA1CE38965: message-id=<49dc4d9a.6020...@example.com>
Apr  8 14:22:03 MailSecure03 postfix/qmgr[19685]: BA1CE38965: from=<mailt...@example.com>, size=1087, nrcpt=2 (queue active)
Apr  8 14:22:04 MailSecure03 postfix/smtp[32608]: BA1CE38965: to=<us...@test.com>, relay=127.0.0.1[127.0.0.1]:10025, delay=1.7, delays=1/0/0/0.68, dsn=2.0.0, status=sent (250 OK, sent 49DC509B_360_15637_162D8438973)
Apr  8 14:22:04 MailSecure03 postfix/smtp[32608]: BA1CE38965: to=<us...@test.com>, relay=127.0.0.1[127.0.0.1]:10025, delay=1.7, delays=1/0/0/0.68, dsn=2.0.0, status=sent (250 OK, sent 49DC509B_360_15637_162D8438973)
Apr  8 14:22:04 MailSecure03 postfix/qmgr[19685]: BA1CE38965: removed
Apr  8 14:22:04 MailSecure03 postfix/smtpd[32589]: 62D8438973: client=localhost.localdomain[127.0.0.1]
Apr  8 14:22:04 MailSecure03 postfix/cleanup[32080]: 62D8438973: message-id=<49dc4d9a.6020...@example.com>
Apr  8 14:22:04 MailSecure03 postfix/qmgr[19685]: 62D8438973: from=<mailt...@example.com>, size=1636, nrcpt=2 (queue active)
Apr  8 14:22:04 MailSecure03 postfix/smtp[32417]: 62D8438973: to=<us...@test.com>, relay=y.y.y.y[y.y.y.y]:25, delay=0.19, delays=0.04/0/0.04/0.1, dsn=2.6.0, status=sent (250 2.6.0  <49dc4d9a.6020...@example.com> Queued mail for delivery)
Apr  8 14:22:04 MailSecure03 postfix/smtp[32417]: 62D8438973: to=<us...@test.com>, relay=y.y.y.y[y.y.y.y]:25, delay=0.19, delays=0.04/0/0.04/0.1, dsn=2.6.0, status=sent (250 2.6.0  <49dc4d9a.6020...@example.com> Queued mail for delivery)
Apr  8 14:22:04 MailSecure03 postfix/qmgr[19685]: 62D8438973: removed

The code first looks for the queue ID (BA1CE38965 and 62D8438973 in the sample above), which we store in $key.

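Next, we look for every match on the current line (thanks to the /g switch) that looks like to=<...>, client=mail.example.com, and so on, sometimes followed by a separating comma and sometimes not.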
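To see the /g loop in isolation, this sketch runs the answer's pattern over a single made-up line carrying two fields (with the queue-ID prefix already stripped) and collects the pairs in the order they are found:

```perl
use strict;
use warnings;

# Made-up line with two fields, after the queue-ID prefix has been stripped.
my $line = 'from=<a@example.com>, client=mail.example.com[x.x.x.x]';

my @pairs;
# In scalar context, m//g resumes where the previous match ended,
# so each iteration finds the next field=value pair.
while ($line =~ /\b(to|from|client)=(.+?)(?:,|$)/g) {
    push @pairs, "$1 => $2";
}

print "$_\n" for @pairs;
```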

Notable parts of the pattern:

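\b - match only at a word boundary (prevents matching xxxto=<...>)
(to|from|client) - match to, from, or client
(.+?) - match the field's value using a non-greedy quantifier
(?:,|$) - match a comma or the end of the string, without capturing it into $3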

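The non-greedy quantifier (.+?) makes the match stop at the first comma it reaches instead of running on to the end of the line. Otherwise, on a line such as: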

to=<foo@example.com>, other=123

you'd get <foo@example.com>, other=123 as the recipient!
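A quick way to check this is to run both quantifiers against that same line. A minimal sketch reusing the example line from the answer:

```perl
use strict;
use warnings;

my $line = 'to=<foo@example.com>, other=123';

my ($lazy)   = $line =~ /\bto=(.+?)(?:,|$)/;    # stops at the first comma
my ($greedy) = $line =~ /\bto=(.+)(?:,|$)/;     # runs to the end of the line

print "non-greedy: $lazy\n";
print "greedy:     $greedy\n";
```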

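Then, for each matching field, we push the value onto the end of an array (because a message can have several recipients) stored under both the queue ID and the field name. Take a look at the result: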

$VAR1 = {
  '62D8438973' => {
    'client' => [
      'localhost.localdomain[127.0.0.1]'
    ],
    'to' => [
      '<us...@test.com>',
      '<us...@test.com>'
    ],
    'from' => [
      '<mailt...@example.com>'
    ]
  },
  'BA1CE38965' => {
    'client' => [
      'mail.example.com[x.x.x.x]'
    ],
    'to' => [
      '<us...@test.com>',
      '<us...@test.com>'
    ],
    'from' => [
      '<mailt...@example.com>'
    ]
  }
};

Now suppose you want to print all the recipients of the message with queue ID BA1CE38965:

my $queueid = "BA1CE38965";
foreach my $recip (@{ $msg{$queueid}{to} }) {
  print $recip, "\n";
}

Or perhaps you just want to know how many recipients there are:

print scalar @{ $msg{$queueid}{to} }, "\n";

If you're willing to assume each message has only one client, access it with:

print $msg{$queueid}{client}[0], "\n";
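Putting these accessors together, this sketch walks every queue ID and prints a one-line summary. The %msg hash is hand-built to mirror the Data::Dumper output above rather than parsed from the log, and the summary format is an arbitrary choice:

```perl
use strict;
use warnings;

# Hand-built %msg with the same shape as the Data::Dumper output above.
my %msg = (
    'BA1CE38965' => {
        client => ['mail.example.com[x.x.x.x]'],
        from   => ['<mailt...@example.com>'],
        to     => ['<us...@test.com>', '<us...@test.com>'],
    },
    '62D8438973' => {
        client => ['localhost.localdomain[127.0.0.1]'],
        from   => ['<mailt...@example.com>'],
        to     => ['<us...@test.com>', '<us...@test.com>'],
    },
);

my @report;
# Sort the queue IDs so the output order is stable (hash order is not).
for my $queueid (sort keys %msg) {
    my $m = $msg{$queueid};
    push @report, sprintf "%s from %s via %s: %d recipient(s)",
        $queueid, $m->{from}[0], $m->{client}[0], scalar @{ $m->{to} };
}

print "$_\n" for @report;
```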