Suppose I have a piece of text:
$body = 'the quick brown fox jumps over the lazy dog';
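I want to turn this sentence into a hash of "keywords", but I also want to allow keywords made up of several words. This is the code I use to get the single-word keywords: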
$words{$_}++ for $body =~ m/(\w+)/g;
Once that runs, I end up with a hash like this:
'the' => 2,
'quick' => 1,
'brown' => 1,
'fox' => 1,
'jumps' => 1,
'over' => 1,
'lazy' => 1,
'dog' => 1
The next step, to get the two-word keywords, is this:
$words{$_}++ for $body =~ m/(\w+ \w+)/g;
But this only picks up every "other" pair, so the result looks like this:
'the quick' => 1,
'brown fox' => 1,
'jumps over' => 1,
'the lazy' => 1
I also need the pairs that are offset by one word:
'quick brown' => 1,
'fox jumps' => 1,
'over the' => 1
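One way the overlapping pairs might be collected in a single pass is to move the capture into a lookahead, so that a match consumes no text and the next attempt can start at the second word of the previous pair. This is only a sketch against the sample sentence above; like the original pattern, it assumes the words are separated by exactly one space.

```perl
use strict;
use warnings;

my $body = 'the quick brown fox jumps over the lazy dog';
my %words;

# \b lets a match start only at a word boundary; the lookahead captures
# the pair without consuming it, so consecutive pairs can overlap.
$words{$_}++ for $body =~ m/\b(?=(\w+ \w+))/g;

print "$_ => $words{$_}\n" for sort keys %words;
```

With this pattern the pairs from both lists above are counted in one pass, along with the final pair of the sentence.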
Is there a simpler way to do this than the following?
my $orig_body = $body;
# single word keywords
$words{$_}++ for $body =~ m/(\w+)/g;
# double word keywords
$words{$_}++ for $body =~ m/(\w+ \w+)/g;
$body =~ s/^(\w+)//;
$words{$_}++ for $body =~ m/(\w+ \w+)/g;
$body = $orig_body;
# triple word keywords
$words{$_}++ for $body =~ m/(\w+ \w+ \w+)/g;
$body =~ s/^(\w+)//;
$words{$_}++ for $body =~ m/(\w+ \w+ \w+)/g;
$body = $orig_body;
$body =~ s/^(\w+ \w+)//;
$words{$_}++ for $body =~ m/(\w+ \w+ \w+)/g;
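For comparison, a possible simpler shape is to split the text into words once and then slide a window of each length over that list, instead of re-running the regex at every offset. This is a minimal sketch, not a definitive answer: `@tokens` and `$max_len` are names made up for illustration, and unlike the `\w+ \w+` patterns above it treats any run of non-word characters as a separator, so it would also join words across punctuation.

```perl
use strict;
use warnings;

my $body = 'the quick brown fox jumps over the lazy dog';
my %words;

my @tokens  = $body =~ m/(\w+)/g;   # hypothetical name: the words, in order
my $max_len = 3;                    # assumption: longest keyword, in words

for my $n (1 .. $max_len) {
    # Slide an $n-word window over the list; the range is empty when the
    # text has fewer than $n words.
    for my $start (0 .. @tokens - $n) {
        $words{ join ' ', @tokens[$start .. $start + $n - 1] }++;
    }
}

print "$_ => $words{$_}\n" for sort keys %words;
```

Raising `$max_len` extends this to longer keywords without adding another block of substitutions and restores of `$body`.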