Parsing Apache logs using Perl

Question

Updated May 10, 2013

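OK, so now I can easily pick out the IP addresses. There are three things I want to do next. I thought sort($keys) would take care of the ordering, but I was wrong, and the slightly more complicated attempt below does not seem to be the solution either. The next thing I need to do is collect the date and the browser version. My log file format and current code are below.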

APACHE LOG

24.235.131.196 - - [10/Mar/2004:00:57:48 -0500] "GET http://www.google.com/iframe.php HTTP/1.0" 500 414 "http://www.google.com/iframe.php" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"

MY CODE

#!usr/bin/perl -w
use strict;

my %seen = ();
open(FILE, "< access_log") or die "unable to open file  $!";    

while( my $line = <FILE>) {
    chomp $line;

    # regex for ip address.
    if( $line =~ /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/ ) {  
        $seen{$1}++;
    }

    #regex for date an example is [09\Mar\2009:05:30:23]
    if( $line =~ /\[[\d]{2}\\.*[\d]{4}\:[\d]{2}\:[\d]{2}\]*/) {
        print "\n\n $line matched : $_\n";
    }

}
close FILE;
my $i = 0;

# program bugs out if I uncomment the below line, 
# but to my understanding this is essentially what I'm trying to do.
# for my $key ( keys %seen ) (keys %date) {
for my $key ( keys %seen ) {
    my ($ip) = sort {$a cmp $b}($key); 
    # also I'd like to be able to sort the IP addresses and if 
    # I do it the proper numeric way it generates errors saying contents are not numeric. 
    print @$ip->[$i] . "\n";
    # print "The IPv4 address is : $key and has accessed the server $seen{$key} times. \n";
    $i++;
}

- user1739860

1 Answer

- chrsblck · Accepted Answer

You're close. Yes, I would use a hash. This pattern is usually called a "seen hash".

#!/usr/bin/perl

use warnings;
use strict;

my $log = "web.log";
my %seen = ();

open (my $fh, "<", $log) or die "unable to open $log: $!"; 

while( my $line = <$fh> ) {
    chomp $line;

    if( $line =~ /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/ ){
        $seen{$1}++;
    }
}
close $fh;

for my $key ( keys %seen ) {
    print "$key: $seen{$key}\n";
}

Here is a sample log file along with the output:

$ cat web.log 
[Mon Sep 21 02:35:24 1999] some msg blah blah
[Mon Sep 21 02:35:24 1999] 192.1.1.1
[Mon Sep 21 02:35:24 1999] 1.1.1.1
[Mon Sep 21 02:35:24 1999] 10.1.1.9
[Mon Sep 21 02:35:24 1999] 192.1.1.1
[Mon Sep 21 02:35:24 1999] 10.1.1.5
[Mon Sep 21 02:35:24 1999] 10.1.1.9
[Mon Sep 21 02:35:24 1999] 192.1.1.1
$ test.pl
1.1.1.1: 1
192.1.1.1: 3
10.1.1.9: 2
10.1.1.5: 1

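A few things to note:

my @array = <FH>; This pulls the whole file into memory, which is not a good idea, especially for log files, which can grow very large when they are not rotated properly. Using for or foreach on the filehandle has the same problem. The best practice for reading from a file is a while loop, which reads one line at a time.

You should get in the habit of using the 3-argument open with a lexically scoped filehandle, as in my example above.

Your die message shouldn't be too "precise" about the cause. Look at my die message: it prints $! instead of guessing, because the real reason could be permissions, a missing file, a lock, and so on.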

Update

This will work for your dates.

my $line = '[09\Mar\2009:05:30:23]: plus some message';

#example is [09\Mar\2009:05:30:23]
if( $line =~ /(\[[\d]{2}\\.*\\[\d]{4}:[\d]{2}:[\d]{2}:[\d]{2}\])/ ){
   print "$line matched: $1\n"; 
}
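
The pattern above matches the backslash-separated example from the question's comment. The log line the question actually shows uses forward slashes and carries a timezone offset, so here is a minimal sketch for that format, which also captures the browser string the question asks about. The helper name parse_line is made up for illustration, and the sketch assumes the user agent is the last quoted field on the line, as in the sample.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample line copied from the question's log.
my $line = '24.235.131.196 - - [10/Mar/2004:00:57:48 -0500] '
         . '"GET http://www.google.com/iframe.php HTTP/1.0" 500 414 '
         . '"http://www.google.com/iframe.php" '
         . '"Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"';

# Hypothetical helper: returns (ip, date, user agent),
# or an empty list when the line does not match.
sub parse_line {
    my ($entry) = @_;
    return unless $entry =~ m{
        ^(\d{1,3}(?:\.\d{1,3}){3})        # client IP at the start of the line
        .*?
        \[(\d{2}/\w{3}/\d{4})             # date part: dd/Mon/yyyy
        :\d{2}:\d{2}:\d{2}\s[+-]\d{4}\]   # time and timezone offset
        .*
        "([^"]*)"\s*$                     # last quoted field: user agent
    }x;
    return ($1, $2, $3);
}

my ($ip, $date, $agent) = parse_line($line);
print "$ip | $date | $agent\n" if defined $ip;
```

Capturing only the dd/Mon/yyyy part makes it easy to count hits per day with a second hash, in the same way %seen counts hits per IP.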

Update 2

You're doing a few things wrong.

I don't see you storing anything in a date hash.

print "\n\n $line matched : $_\n";

It should look like your seen hash, but there is not much point in doing that as written. What are you trying to do with the stored date data?

$data{$1} = "some value, which is up to you";

You can't loop over two hashes with two separate lists in a single for loop like this:

for my $foo (keys %h)(keys %h2) { # do stuff }
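
Two forms that do compile are sketched below: one loop per hash, or a single loop over both key lists joined into one list. The contents of %seen and %date here are made-up sample data, not values from the question's log.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the question's hashes.
my %seen = ('192.1.1.1' => 3, '10.1.1.9' => 2);
my %date = ('10/Mar/2004' => 5);

# Option 1: one loop per hash.
for my $ip (sort keys %seen) {
    print "ip $ip: $seen{$ip}\n";
}
for my $day (sort keys %date) {
    print "date $day: $date{$day}\n";
}

# Option 2: a single loop over both key lists joined into one list.
my @all_keys = (sort(keys %seen), sort(keys %date));
for my $key (@all_keys) {
    # Look the key up in whichever hash it came from.
    my $count = exists $seen{$key} ? $seen{$key} : $date{$key};
    print "$key: $count\n";
}
```

Option 1 is usually the clearer choice when the two hashes hold different kinds of data, since each loop body can stay specific to its hash.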

For the final sorting part, you only need to sort the keys.

for my $key (sort keys %seen ) {