Parsing multiple log files to find strings

Question

3

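I'm trying to parse multiple log files in a log directory, searching for any number of strings from a list along with server names. I feel like I've tried a million different options, and it works fine with just one log file... but when I try to loop over all the log files in the directory, I get nowhere.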

if args.f:
    logs = args.f
else:
    try:
        logs = glob("/var/opt/cray/log/p0-current/*")
    except IndexError:
        print "Something is wrong. p0-current is not available."
        sys.exit(1)

valid_errors = ["error", "nmi", "CATERR"]

logList = []
for log in logs:
    logList.append(log)



#theLog = open("logList")
#logFile = log.readlines()
#logFile.close()
#printList = []

#for line in logFile:
#    if (valid_errors in line):
#        printList.append(line)
#
#for item in printList:
#    print item


#    with open("log", "r") as tmp_log:

#       open_log = tmp_log.readlines()
#           for line in open_log:
#               for down_nodes in open_log:
#                   if valid_errors in open_log:
#                       print valid_errors

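down_nodes is a pre-populated list, built further up in the script, that contains the servers that have been marked as down.

Some of the different approaches I have been trying are commented out.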

logList = []
for log in logs:
    logList.append(log)

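I think putting each individual log file into a list, then looping over that list and using open() followed by readlines(), might be the way forward, but I'm missing some piece of logic here... maybe I'm just not thinking about it correctly.

I could really use some pointers, please help.

Thanks.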

- jonnybinthemix

1

logs, as returned by glob, is already a list. - Thierry Lathuille
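
To illustrate the comment above, here is a minimal sketch showing that glob.glob returns a plain list of path strings, and that a pattern matching nothing gives an empty list rather than raising IndexError. The temporary directory and the file names are made up for illustration; they are not the asker's real log paths.

```python
import glob
import os
import tempfile

# Hypothetical directory with two made-up log files, for illustration only.
log_dir = tempfile.mkdtemp()
for name in ("console.log", "messages.log"):
    with open(os.path.join(log_dir, name), "w") as fh:
        fh.write("placeholder\n")

# glob.glob already returns a list of path strings; no copying loop is needed.
logs = glob.glob(os.path.join(log_dir, "*"))
print(type(logs).__name__, sorted(os.path.basename(p) for p in logs))

# A pattern that matches nothing yields an empty list and raises no exception,
# so an emptiness check is one way to detect a missing directory.
missing = glob.glob(os.path.join(log_dir, "no-such-subdir", "*"))
if not missing:
    print("no log files found")
```

If that holds for the asker's setup, the try/except IndexError around glob would never trigger, and a test such as "if not logs" is probably the more reliable check.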

2 Answers

1

First, you need to find all the logs:

import os
import fnmatch

def find_files(pattern, top_level_dir):
    for path, dirlist, filelist in os.walk(top_level_dir):
        for name in fnmatch.filter(filelist, pattern):
            yield os.path.join(path, name)

For example, to find all *.txt files under the current directory:

txtfiles = find_files('*.txt', '.')

Then get file objects from those names:

def open_files(filenames):
    for name in filenames:
        yield open(name, 'r', encoding='utf-8')

And finally, extract the individual lines from the files:

def lines_from_files(files):
    for f in files:
        for line in f:
            yield line

Since you want to find certain errors, the check might look like this:

import re

def find_errors(lines):
    pattern = re.compile('(error|nmi|CATERR)')
    for line in lines:
        if pattern.search(line):
            print(line)

Now you can process the stream of lines generated from a given directory:

txt_file_names = find_files('*.txt', '.')
txt_files = open_files(txt_file_names)
txt_lines = lines_from_files(txt_files)
find_errors(txt_lines)

The idea of processing logs as a stream of data comes from a talk by David Beazley.
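
The generator pipeline above can be tried end to end with something like the following sketch. It is a variation, not the answer's exact code: it merges the open and read stages so each file is closed once its lines are exhausted, and it tags each line with its file name. The sample files and their contents are invented for illustration.

```python
import fnmatch
import os
import re
import tempfile

def find_files(pattern, top_level_dir):
    # Walk the tree lazily and yield every path whose name matches the pattern.
    for path, _dirs, filelist in os.walk(top_level_dir):
        for name in fnmatch.filter(filelist, pattern):
            yield os.path.join(path, name)

def lines_from_files(filenames):
    # Open each file in turn; the with-block closes it after its last line.
    for name in filenames:
        with open(name, "r", encoding="utf-8") as f:
            for line in f:
                yield name, line.rstrip("\n")

def matching_lines(tagged_lines, pattern):
    # Pass through only the lines in which the pattern occurs somewhere.
    for name, line in tagged_lines:
        if pattern.search(line):
            yield name, line

# Made-up sample data in a temporary directory tree.
top = tempfile.mkdtemp()
with open(os.path.join(top, "a.txt"), "w", encoding="utf-8") as f:
    f.write("boot ok\nnode c0-0 error: link down\n")
os.mkdir(os.path.join(top, "sub"))
with open(os.path.join(top, "sub", "b.txt"), "w", encoding="utf-8") as f:
    f.write("CATERR asserted\nall quiet\n")

pattern = re.compile(r"error|nmi|CATERR")
hits = list(matching_lines(lines_from_files(find_files("*.txt", top)), pattern))
for name, line in hits:
    print(os.path.basename(name), line)
```

Because every stage is a generator, no file is opened until the final loop asks for lines, which keeps memory use flat even for a large log directory.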

- JanHak

1

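Ah, that's interesting, thank you. I've used re.compile, re.match and so on in another script, and I wondered whether it would be useful here. But since I want to show the whole log line rather than part of it, I figured it would be easier just to dump the line out. Basically, all I'm trying to do is have the script find the servers that are down, then automatically check a few things to see whether they went down for a common reason... and then show the obvious problems on screen. - jonnybinthemix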


- wpercy · Accepted Answer

So your last for loop is redundant, because logs is already a list of strings. With that in mind, we can iterate over logs and do something with each log.

for log in logs:
    with open(log) as f:
        for line in f.readlines():
            if any(error in line for error in valid_errors):
                pass  # do stuff with the matching line

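The line if any(error in line for error in valid_errors): checks whether line contains any of the errors in valid_errors. The syntax is a generator expression that yields the result of error in line once for each error in valid_errors, and any() returns True as soon as one of those results is true.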
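
As a small illustration of the any() check described above, the following sketch wraps it in a helper and runs it against a few invented log lines. The helper name has_error and the sample lines are assumptions made for this example.

```python
valid_errors = ["error", "nmi", "CATERR"]

def has_error(line):
    # any() stops at the first error string that occurs in the line.
    return any(error in line for error in valid_errors)

print(has_error("c0-0c0s1n2 CATERR detected"))  # True
print(has_error("heartbeat ok"))                # False
# Substring tests are case-sensitive, so "ERROR" does not match "error".
print(has_error("ERROR: disk full"))            # False
```

If case-insensitive matching is wanted, one option is to compare lowercased copies of both the line and the error strings.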

As for the part of the question involving down_nodes, I don't think you should put it in the same any(). You could try something like this:

if any(error in line for error in valid_errors) and \
        any(node in line for node in down_nodes):
    pass  # do stuff: the line mentions both an error and a down node
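
Putting the two checks together, a minimal sketch of the whole loop might look like this. The function name find_problem_lines, the node names, and the sample log contents are all hypothetical; only valid_errors comes from the question.

```python
import os
import tempfile

valid_errors = ["error", "nmi", "CATERR"]
down_nodes = ["c0-0c0s1n2", "c0-0c0s3n0"]  # hypothetical node names

def find_problem_lines(logs, errors, nodes):
    """Yield (path, line) for lines that name a down node and a known error."""
    for log in logs:
        with open(log) as f:
            for line in f:
                if any(e in line for e in errors) and any(n in line for n in nodes):
                    yield log, line.rstrip("\n")

# Made-up log file so the sketch runs without the real log directory.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "console.log")
with open(log_path, "w") as f:
    f.write("c0-0c0s1n2 CATERR detected\n")
    f.write("c0-0c0s9n9 error: timeout\n")  # error, but node is not in down_nodes
    f.write("c0-0c0s3n0 heartbeat ok\n")    # down node, but no error string

results = list(find_problem_lines([log_path], valid_errors, down_nodes))
for path, line in results:
    print(os.path.basename(path), line)
```

In the real script, the list passed as logs would presumably be the result of the glob call from the question.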