Is it safe to combine select(2) with buffered I/O on file handles?

Question

perl io buffer

5

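I am using IO::Select to keep track of a variable number of file handles for reading. The documentation I have come across strongly advises against combining select with <> (readline) when reading from the file handles.

My situation:

I will only use each file handle once: when select hands me a file handle, it will be read to completion and then removed from the select set. I will receive a hash followed by a variable number of files. I don't mind if this blocks for a while.

For context, I am a client sending information to be processed by servers. Each file handle is my connection to a different server. Once a server is done, it sends me a hash of results. The hash contains a number indicating how many files will follow.

I would like to use readline so that this integrates with existing project code for transferring Perl objects and files.

Example code: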

my $read_set = IO::Select->new;
my $count = @agents_to_run; # array comes as an argument

for my $agent ( @agents_to_run ) {
    my ( $sock, $peerhost, $peerport )
        = server( $config_settings{ $agent }->{ 'Host' },
                  $config_settings{ $agent }->{ 'Port' } );
    $read_set->add( $sock );
}

while ( $count > 0 ) {
    my @rh_set = $read_set->can_read();

    for my $rh ( @rh_set ) {
        my %results = <$rh>;
        my $num_files = $results{'numFiles'};
        my @files = ();
        for ( my $i = 0; $i < $num_files; $i++ ) {
            $files[$i] = <$rh>;
        }
        # process results, close fh, decrement count, etc
    }
}

- Lomky

Do you have some sample code that shows what you are doing? - TLP

Added an example of what I am trying to do. - Lomky

1

{ my $oldfh = select $rh; $| = 1; select $oldfh; } is useless for a read handle. That's a good thing, because if Perl emptied the buffer after every read as you were hoping, you would lose data! - ikegami

2 Answers

1

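After a discussion with @ikegami, we determined that in my very specific case readline is not actually a problem. I am still leaving ikegami's answer as the accepted answer, because it is the best way to handle the general case and it is very well written.

readline (aka <>) is acceptable in my case because of the following facts:

Each handle is returned from select only once, and is then closed/removed
I only send one message through each file handle
I don't care if reading from the handle blocks
I am accounting for timeouts and closed handles returned by select (error checking was not included in the example code above)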

- Lomky

- ikegami · Accepted Answer

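Using readline (aka <>) has two problems: it is buffered, and it is blocking.

Buffering is bad

More precisely, buffering with buffers that cannot be inspected is bad.

The system can do all the buffering it wants, since you can peek into its buffers using select.

Perl's IO system must not be allowed to do any buffering, because you cannot peek into its buffers.

Let's look at an example of what can happen when readline is used in a select loop.

"abc\ndef\n" arrives on the handle.
select notifies you that there is data to read.
readline will try to read a chunk from the handle.
"abc\ndef\n" will be placed in Perl's buffer for the handle.
readline will return "abc\n".

At this point, you call select again, and you want it to let you know that there is more to read ("def\n"). However, select will report that there is nothing to read, because select is a system call and the data has already been read from the system. This means you will have to wait for more data to arrive before you can read "def\n".

The following program illustrates this: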

use IO::Select qw( );
use IO::Handle qw( );

sub producer {
    my ($fh) = @_;
    for (;;) {
        print($fh time(), "\n") or die;
        print($fh time(), "\n") or die;
        sleep(3);
    }
}

sub consumer {
    my ($fh) = @_;
    my $sel = IO::Select->new($fh);
    while ($sel->can_read()) {
        my $got = <$fh>;
        last if !defined($got);
        chomp $got;
        print("It took ", (time()-$got), " seconds to get the msg\n");
    }
}

pipe(my $rfh, my $wfh) or die;
$wfh->autoflush(1);
fork() ? producer($wfh) : consumer($rfh);

Output:

It took 0 seconds to get the msg
It took 3 seconds to get the msg
It took 0 seconds to get the msg
It took 3 seconds to get the msg
It took 0 seconds to get the msg
...

This can be fixed by using non-buffered IO:

sub consumer {
    my ($fh) = @_;
    my $sel = IO::Select->new($fh);
    my $buf = '';
    while ($sel->can_read()) {
        sysread($fh, $buf, 64*1024, length($buf)) or last;
        # s/// returns a success flag, not the captures, so read the line from $1.
        while ( $buf =~ s/^(.*)\n// ) {
            my $got = $1;
            print("It took ", (time()-$got), " seconds to get the msg\n");
        }
    }
}

Output:

It took 0 seconds to get the msg
It took 0 seconds to get the msg
It took 0 seconds to get the msg
It took 0 seconds to get the msg
It took 0 seconds to get the msg
It took 0 seconds to get the msg
...

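Blocking is bad

Let's look at an example of what can happen when readline is used in a select loop.

"abcdef" arrives on the handle.
select notifies you that there is data to read.
readline will try to read a chunk from the socket.
"abcdef" will be placed in Perl's buffer for the handle.
readline hasn't received a newline yet, so it tries to read another chunk from the socket.
No more data is currently available, so it blocks.

This defeats the purpose of using select.

[ Demo code forthcoming ]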
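
Not the answer author's own demo, but a minimal sketch of the scenario just described, assuming a pipe between a forked child and its parent. The helper name measure_readline_block and the delays are made up for illustration. The child sends a partial line, waits, then sends the newline; the parent measures how long readline stays blocked after select has already reported the handle as readable.

```perl
use strict;
use warnings;
use IO::Select qw( );
use IO::Handle qw( );
use POSIX      qw( );
use Time::HiRes qw( time sleep );

# Hypothetical helper: returns the line read and the number of seconds
# readline stayed blocked after select reported the handle as readable.
sub measure_readline_block {
    my ($delay) = @_;

    pipe(my $rfh, my $wfh) or die "pipe: $!";

    my $pid = fork();
    die "fork: $!" if !defined($pid);

    if (!$pid) {
        # Child: send a partial line, wait, then send the missing newline.
        close($rfh);
        $wfh->autoflush(1);
        print($wfh "abcdef") or die;
        sleep($delay);
        print($wfh "\n") or die;
        close($wfh);
        POSIX::_exit(0);    # Leave without running the parent's cleanup code.
    }

    close($wfh);
    my $sel = IO::Select->new($rfh);
    $sel->can_read();       # Returns as soon as "abcdef" arrives.

    my $start   = time();
    my $line    = <$rfh>;   # Blocks until the "\n" arrives.
    my $elapsed = time() - $start;

    waitpid($pid, 0);
    close($rfh);
    return ($line, $elapsed);
}

my ($line, $elapsed) = measure_readline_block(2);
chomp($line);
printf("Got '%s' after readline blocked for about %.1f seconds\n", $line, $elapsed);
```

With other handles in the same select set, nothing else would be serviced during that wait.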

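Solution

You must implement a version of readline that doesn't block and that only uses buffers you can inspect. The second part is easy, because you can inspect the buffers you create yourself.

Create a buffer for each handle.
When data arrives from a handle, read it, but no more. When data is waiting (as we know from select), sysread will return what is available without waiting for more to arrive. That makes sysread well suited to this task.
Append the data read to the appropriate buffer.
For each complete message in the buffer, extract it and process it.

Adding a handle: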

$select->add($fh);
$clients{fileno($fh)} = {
    buf  => '',
    ...
};

The select loop:

use experimental qw( refaliasing declared_refs );

while (my @ready = $select->can_read) {
    for my $fh (@ready) {
        my $client = $clients{fileno($fh)};
        my \$buf = \($client->{buf});  # Make $buf an alias for $client->{buf}

        my $rv = sysread($fh, $buf, 64*1024, length($buf));
        if (!$rv) {
            delete $clients{fileno($fh)};
            $select->remove($fh);

            if (!defined($rv)) {
                ... # Handle error
            }
            elsif (length($buf)) {
                ... # Handle eof with partial message
            }
            else {
                ... # Handle eof
            }

            next;
        }

        # s/// returns a success flag, not the captures, so read the message from $1.
        while ( $buf =~ s/^(.*)\n// ) {
            my $msg = $1;
            ... # Process message.
        }
    }
}
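
The extraction step can be tried in isolation. This is only a sketch: extract_messages is a hypothetical helper, and the chunks are hard-coded stand-ins for what successive sysread calls might append to the buffer.

```perl
use strict;
use warnings;

# Hypothetical helper: removes every complete newline-terminated message
# from the buffer and returns them. An incomplete trailing message is left
# in the buffer until more data arrives.
sub extract_messages {
    my ($buf_ref) = @_;
    my @msgs;
    while ($$buf_ref =~ s/^(.*)\n//) {
        push @msgs, $1;
    }
    return @msgs;
}

my $buf = '';
for my $chunk ("abc\nde", "f\nghi") {    # Simulated sysread results.
    $buf .= $chunk;
    print("message: $_\n") for extract_messages(\$buf);
}
print("left in buffer: '$buf'\n");
```

This prints the messages abc and def and leaves 'ghi' in the buffer, since no newline has arrived for it yet.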

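By the way, this is much easier to do using threads, and this doesn't even handle writers!

Note that IPC::Run can do all the hard work for you if you are communicating with a child process, and that asynchronous IO can be used as an alternative to select.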