Parsing NGINX error logs with PHP/regular expressions

5
An error entry looks like this:
2011/06/10 13:30:10 [error] 23263#0: *1 directory index of "/var/www/ssl/" is forbidden, client: 86.186.86.232, server: hotelpublisher.com, request: "GET / HTTP/1.1", host: "hotelpublisher.com"

I need to parse out:

date/time
error type
error message
client
server
request
host

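The first step (parsing the date) is easy with substr. My regex skills aren't great, though, so I'd like to hear a better solution. Simply splitting on , won't work either, because the error message may itself contain commas.

What is the most efficient way to do this?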

5 Answers

4

How about:

$str = '2011/06/10 13:30:10 [error] 23263#0: *1 directory index of "/var/www/ssl/" is forbidden, client: 86.186.86.232, server: hotelpublisher.com, request: "GET / HTTP/1.1", host: "hotelpublisher.com"';
preg_match('~^(?P<datetime>[\d+/ :]+) \[(?P<errortype>.+)\] .*?: (?P<errormessage>.+), client: (?P<client>.+), server: (?P<server>.+), request: (?P<request>.+), host: (?P<host>.+)$~', $str, $matches);
print_r($matches);

Output:

Array
(
    [0] => 2011/06/10 13:30:10 [error] 23263#0: *1 directory index of "/var/www/ssl/" is forbidden, client: 86.186.86.232, server: hotelpublisher.com, request: "GET / HTTP/1.1", host: "hotelpublisher.com"
    [datetime] => 2011/06/10 13:30:10
    [1] => 2011/06/10 13:30:10
    [errortype] => error
    [2] => error
    [errormessage] => *1 directory index of "/var/www/ssl/" is forbidden
    [3] => *1 directory index of "/var/www/ssl/" is forbidden
    [client] => 86.186.86.232
    [4] => 86.186.86.232
    [server] => hotelpublisher.com
    [5] => hotelpublisher.com
    [request] => "GET / HTTP/1.1"
    [6] => "GET / HTTP/1.1"
    [host] => "hotelpublisher.com"
    [7] => "hotelpublisher.com"
)
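
If the duplicated numeric keys in that output are unwanted, they can be dropped after the match. A minimal sketch, assuming PHP 5.6+ (for ARRAY_FILTER_USE_KEY). The function name parseNginxErrorLine is made up for illustration, and the pattern is the one above with two small changes of mine: the stray + is removed from the date character class and the bracketed level uses [^\]]+ instead of .+.

```php
<?php
// Hypothetical helper (not from the original answer): returns only the
// named fields, or null when the line does not match the pattern.
function parseNginxErrorLine($line)
{
    $pattern = '~^(?P<datetime>[\d/ :]+) \[(?P<errortype>[^\]]+)\] .*?: (?P<errormessage>.+), client: (?P<client>.+), server: (?P<server>.+), request: (?P<request>.+), host: (?P<host>.+)$~';

    if (preg_match($pattern, $line, $matches) !== 1) {
        return null;
    }

    // preg_match stores each named group twice (by name and by index);
    // keep only the entries whose key is a string.
    return array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
}

$str = '2011/06/10 13:30:10 [error] 23263#0: *1 directory index of "/var/www/ssl/" is forbidden, client: 86.186.86.232, server: hotelpublisher.com, request: "GET / HTTP/1.1", host: "hotelpublisher.com"';
$fields = parseNginxErrorLine($str);
print_r($fields);
```

Returning null for non-matching lines is one possible design; lines without the client/server/request/host tail will not match this pattern and would need separate handling.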

There's a small bug, look at errortype, but I guess that works too. - Gajus
@Guy: What should the error type be? Just the number before the "#"? - Toto
It can be error/notice/warning etc.; it's the text inside the square brackets []. - Gajus
I made a few small improvements to the regex: const nginxError = "^(?P<time>[\d+/ :]+) \[(?P<severity>.+)\] .*?: (?P<message>.+), client: (?P<client>.+), server: (?P<server>.+), request: "(?P<method>\S+) (?P<path>\S+) (?P<version>.+?)", host: "(?P<host>.+)"$" - mcuadros

2
Here's how I did it.
$error      = array();

$error['date']          = strtotime(substr($line, 0, 19));

$line                   = substr($line, 20);
$error_str              = explode(': ', strstr($line, ', client:', TRUE), 2);

$error['message']       = $error_str[1];

preg_match("|\[([a-z]+)\] (\d+)#(\d+)|", $error_str[0], $matches);

$error['error_type']    = $matches[1];


$args_str   = explode(', ', substr(strstr($line, ', client:'), 2));
$args       = array();

foreach($args_str as $a)
{
    $name_value = explode(': ', $a, 2);

    $args[$name_value[0]]   = trim($name_value[1], '"');
}

$error  = array_merge($error, $args);

die(var_dump( $error ));

This will produce:

array(7) {
  ["date"]=>
  int(1307709010)
  ["message"]=>
  string(50) "*1 directory index of "/var/www/ssl/" is forbidden"
  ["error_type"]=>
  string(5) "error"
  ["client"]=>
  string(13) "86.186.86.232"
  ["server"]=>
  string(18) "hotelpublisher.com"
  ["request"]=>
  string(14) "GET / HTTP/1.1"
  ["host"]=>
  string(18) "hotelpublisher.com"
}

I'd like to see some votes to get a sense of which option is preferred in terms of performance/reliability.
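
One thing to watch with strtotime here: it interprets the timestamp in PHP's default timezone, so the resulting integer depends on server configuration. The int(1307709010) in the dump corresponds to 13:30:10 at UTC+1. A small sketch that pins the timezone explicitly; Europe/London is my assumption about where the log was written, not something the log line states.

```php
<?php
// Assumption: the machine that wrote the log was on Europe/London time
// (UTC+1 in June). Replace with the zone your server actually uses.
$zone = new DateTimeZone('Europe/London');
$when = DateTimeImmutable::createFromFormat('Y/m/d H:i:s', '2011/06/10 13:30:10', $zone);

if ($when === false) {
    throw new RuntimeException('Unexpected timestamp format');
}

$timestamp = $when->getTimestamp();
echo $timestamp, PHP_EOL; // 1307709010, the same value as in the dump
```

createFromFormat also returns false on a malformed timestamp, which is easier to detect than a silently wrong strtotime result.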


2
Try this code:
$str = '2011/06/10 13:30:10 [error] 23263#0: *1 directory index of "/var/www/ssl/" is forbidden, client: 86.186.86.232, server: hotelpublisher.com, request: "GET / HTTP/1.1", host: "hotelpublisher.com"';
preg_match('~^(\d{4}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2})\s\[([^]]*)\]\s[^:]*:\s(.*?)\sclient:\s([^,]*),\sserver:\s([^,]*),\srequest:\s"([^"]*)",\shost:\s"([^"]*)"~', $str, $m );
list($line, $dateTime, $type, $msg, $client, $server, $request, $host ) = $m;

var_dump($dateTime);
var_dump($type);
var_dump($msg);
var_dump($client);
var_dump($server);
var_dump($request);
var_dump($host);

Output:

string(19) "2011/06/10 13:30:10"
string(5) "error"
string(51) "*1 directory index of "/var/www/ssl/" is forbidden,"
string(13) "86.186.86.232"
string(18) "hotelpublisher.com"
string(14) "GET / HTTP/1.1"
string(18) "hotelpublisher.com"
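
Since the question also asks about efficiency: for a large log, reading line by line keeps memory use flat. A rough sketch using a generator (PHP 5.5+). The function name readNginxErrors and the field names are hypothetical, and the pattern is the one from this answer with a comma added before client: so the message does not keep its trailing comma. Lines that do not match are simply skipped.

```php
<?php
// Hypothetical helper: lazily yields one associative array per matching line.
function readNginxErrors($path)
{
    $pattern = '~^(\d{4}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2})\s\[([^]]*)\]\s[^:]*:\s(.*?),\sclient:\s([^,]*),\sserver:\s([^,]*),\srequest:\s"([^"]*)",\shost:\s"([^"]*)"~';
    $keys = array('datetime', 'type', 'message', 'client', 'server', 'request', 'host');

    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    try {
        while (($line = fgets($handle)) !== false) {
            if (preg_match($pattern, $line, $m) === 1) {
                // Drop the full match at index 0, then pair groups with names.
                yield array_combine($keys, array_slice($m, 1));
            }
        }
    } finally {
        fclose($handle);
    }
}
```

Usage would be along the lines of foreach (readNginxErrors($path) as $entry) { ... }, where $path points at your error log.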

1
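If you cannot change the format of the log file, you can do it this way:
// Dots in the IP address are escaped so they match a literal "."
$regex = '~(\d{4}/\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) \[(\w+)\] (.*?), client: (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}), server: (.*?), request: "(.*?)", host: "(.*?)"~';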
preg_match($regex, $line, $matches);
list($all,$date,$time,$type,$message,$client,$server,$request,$host) = $matches;

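If you do control the log format, put the message at the end instead of in the middle; then you can do this:
$log_arr = explode(', ', $line, 8);
list($date,$time,$type,$client,$server,$request,$host,$message) = $log_arr;

The trick is that explode takes an optional third argument that limits the number of elements returned. Set it to 8 and the remainder of the line is stored as the last element of the returned array. See the manual for details.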
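
To see the limit argument in isolation, here is a tiny sketch with a made-up line (not a real nginx entry); the commas inside the trailing text survive because splitting stops once the limit is reached.

```php
<?php
// Made-up input in a hypothetical "message last" layout, for illustration only.
$line = 'a, b, c, text with, commas, inside';

// At most 4 elements are returned; the last one holds the rest of the string.
$parts = explode(', ', $line, 4);
print_r($parts);
```

The fourth element comes back as "text with, commas, inside", untouched.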

0
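Take a look at Nginx error log reader; it is a PHP reader/parser for Nginx error log files. The script can read error logs recursively and display them in a user-friendly table. Its configuration includes the number of bytes to read per page and allows paging through the error log.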
