并发编程：如何安全地共享缓存哈希？

Question

并发编程：如何安全地共享缓存哈希？

6

在print-columns中，仅保护写访问是否足够？还是必须保护读访问？

#!/usr/bin/env raku

my @w_list_items;
len_items_list( <car tree house mouse dog mountain snow roof> );
@w_list_items.say;

sub len_items_list ( @list ) {
   my Int $threads = Kernel.cpu-cores;
   while $threads > @list.elems {
       last if $threads < 2;
       $threads = $threads div 2;
   }

   my Int $size = @list.elems div $threads;
   my Array @portions = ( ^$threads ).map: { [ $size * $_, $size * ( $_ + 1 ) ] };
   @portions[*-1][1] = @list.elems;

   my Promise @promise;
   for @portions -> $range {
       my Int %cache;
       @promise.push: start {
           do for $range[0] ..^ $range[1] -> $i {
               my $len = print-columns( @list[$i], %cache );
               $i, $len;
           }
       };
   }

   @w_list_items = ();
   for await @promise -> @portion {
       for @portion {
           @w_list_items[.[0]] := .[1];
       }
   }
}

sub print-columns( $str, %cache? ) returns Int {
   my Int $width = 0;
   for $str.Str.NFC {
       if %cache.EXISTS-KEY( $_ ) {
           $width = $width + %cache.AT-KEY( $_ );
       }
       else {
           $width = $width + %cache.BIND-KEY( $_, char_width( $_ ) );
       }
   }
   $width;
}

sub char_width ( $cp ) {
   # dummy code:
   return 1;
}

- sid_com

1

主要问题在于列表太小，无法从并行化中获得有意义的结果。这可能比它的价值更昂贵。其次，如果您正在处理列表，最好查看hyper参数，这些参数正是与列表一起使用的。但第三个也是最重要的，最好不要在线程之间共享任何类型的东西；您正在共享％cache。如果需要共享信息，则最好设置通道，并通过无状态函数进行并行工作。 - jjmerelo

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Jonathan Worthington · Accepted Answer

如果哈希可以被直接写入（就像这里的情况一样），那么读取也必须受到保护。

考虑到示例中具体的情况，我可能会考虑几种解决方案（并根据我偏好的权衡来选择）。

每个线程缓存

如果不希望缓存增长得很大（以键数和计算值大小为标准），我可能让每个线程都建立自己的本地缓存。这意味着线程之间没有同步成本，因此它们不会相互阻塞。权衡是更多的内存使用量，因此对CPU缓存施加更大压力。

这是一个简单的更改；只需将声明移动到一个范围内即可：

       @promise.push: start {
           my Int %cache;
           do for $range[0] ..^ $range[1] -> $i {
               my $len = print-columns( @list[$i], %cache );
               $i, $len;
           }
       };

由于没有共享，因此很容易就可以推断出这种方法的正确性！

保护哈希的访问

使用OO::Monitors模块将缓存功能封装在一个类中，并使其在每个方法调用周围自动获取锁：

use OO::Monitors;

monitor Cache {
    has Int %!cache{Int};

    method lookup(Int $key --> Int) {
        %!cache{$key}
    }

    method add(Int $key, Int $count --> Nil) {
        %!cache{$key} := $count;
    }
}

创建它的一个实例：

       my Cache $cache .= new;
       @promise.push: start {
           do for $range[0] ..^ $range[1] -> $i {
               my $len = print-columns( @list[$i], $cache );
               $i, $len;
           }
       };

并使用它：

sub print-columns( $str, $cache ) returns Int {
   my Int $width = 0;
   for $str.Str.NFC -> $char {
       my $char-width;
       with $cache.lookup($char) {
           $char-width = $_;
       }
       else {
           $char-width = char_width($char);
           $cache.add($char, $char-width)
       }
       $width += $char-width;
   }
   $width;
}

这种方法不需要为每个线程分配缓存，从而减少了额外的内存开销，但代价是需要进行锁定和释放。 OO::Monitors 是一种结构化的使用 Lock 的方式 - 它在其实现中使用了 Lock。你也可以直接使用 Lock 构建解决方案，但这会增加难度。

使用不可变的 Map 并在缓存条目添加时复制它

另一种方法是，在与查找相比，缓存添加较少且缓存大小不太可能变得非常大时，使用不可变的 Map，并在缓存添加时创建一个带有新条目的新地图。不可变性意味着读取时不需要锁定。

使用像上一种方法中封装的缓存功能，可以通过将 Cache monitor 替换为以下类来尝试该方法：

class Cache {
    has $!current-cache = Map.new;

    method lookup(Int $key --> Int) {
        $!current-cache{$key}
    }

    method add(Int $key, Int $count --> Nil) {
        my $current = $!current-cache;
        unless $current{$key}:exists { # Another thread won
            $!current-cache = Map.new( (|$current, $key => $count) );
        }
    }
}

当然，这意味着会有更多的复制，但是胜利在于一旦缓存内容稳定，那么所有线程（以及CPU核心）正在处理的是相同的内存，而没有任何锁争用。甚至可以使用ASCII范围来预热缓存。

请注意，上述内容不是通常安全的并发安全基于复制的哈希的方法；至少它容易受到丢失更新的影响（但由于它是缓存，所以无所谓）。

使用无锁哈希

不幸的是，目前还没有人在Raku中实现过无锁哈希数据结构，但这将是在这种情况下使用的理想选择。我已经完成了Concurrent::Stack（无锁堆栈，大约是最简单的无锁数据结构）、Concurrent::Trie（无锁字典树，也很容易）、和Concurrent::Queue（已经足够难了，我从文献中选择了一个解决方案）。有关无锁哈希的论文，但它们不是好实现的东西！