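I want to take the ldd dependency lists from several (3+) computers, compare them with each other, and highlight the differences. For example, given the following dictionary: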
my_ldd_outputs = {
    "01": "<ldd_output>",
    "02": "<ldd_output>",
    ...
    "09": "<ldd_output>",
    "10": "<ldd_output>"
}
I would like the output to look like this:
<identical line 1>
<identical line 2>
<identical line 3>
<differing line 4> (computer 01 02)
<differing line 4> (computer 04 05 06 07)
<differing line 4> (computer 08 09 10)
<identical line 5>
<identical line 6>
...
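My first attempt used Python's difflib. The idea was to first build a data structure in which all the ldd_output lists (just the result of splitting on \n) from the my_ldd_outputs dictionary above have the same length, with every line that is missing from one ldd_output string but present in another replaced by a placeholder string. So if two files look like this: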
ldd_1 = """
<identical line 1>
<identical line 2>
<differing line 3>
<identical line 4>
<extra line 5>
<identical line 6>
"""
ldd_2 = """
<identical line 1>
<identical line 2>
<differing line 3>
<identical line 4>
<identical line 6>
"""
my goal would be to store these files as
ldd_1 = """
<identical line 1>
<identical line 2>
<differing line 3>
<identical line 4>
<extra line 5>
<identical line 6>
"""
ldd_2 = """
<identical line 1>
<identical line 2>
<differing line 3>
<identical line 4>
<None>
<identical line 6>
"""
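In the end I would only need to iterate over each line of the converted files (which now all have the same length) and compare them with each other, ignoring any <None> entries, so that the differences can be printed consecutively.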
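The final iteration step described above could be sketched as follows. This is only an illustration under the assumption that the aligned lists already exist; the names `render_aligned` and `PLACEHOLDER` are made up for this example and are not part of my actual code.

```python
from collections import defaultdict

PLACEHOLDER = "<None>"  # hypothetical marker for a line missing on one computer


def render_aligned(aligned):
    """Render aligned ldd outputs.

    `aligned` maps a computer id to a list of lines; all lists are assumed
    to have the same length.
    """
    if not aligned:
        return []
    computers = sorted(aligned)
    n_rows = len(aligned[computers[0]])
    rendered = []
    for row in range(n_rows):
        # Group the computers by the text they have on this row.
        by_text = defaultdict(list)
        for computer in computers:
            text = aligned[computer][row]
            if text != PLACEHOLDER:
                by_text[text].append(computer)
        for text, owners in by_text.items():
            if len(owners) == len(computers):
                # Identical on every computer: print the line once.
                rendered.append(text)
            else:
                rendered.append(f"{text} (computer {' '.join(owners)})")
    return rendered
```

Rows where every computer agrees are emitted once; all other rows are emitted once per variant, followed by the computers that share that variant.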
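I wrote a function that uses Python's difflib to fill the lines that are missing from the other file with the string <None>. However, I am not sure how to extend this function to handle an arbitrary number of diffs.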
import difflib
import re

def generate_diff(file_1, file_2):
    # Differing hash values from ldd can be ignored; we only care about version and path.
    def remove_hashvalues(text):
        return re.sub(r"([a-zA-Z0-9_.-]{32}/|\([a-zA-Z0-9_.-]*\))", "<>", text)

    # Keep the two-character ndiff prefix intact; only drop the trailing newline.
    diff = [
        line.rstrip("\n")
        for line in difflib.ndiff(
            remove_hashvalues(file_1).splitlines(keepends=True),
            remove_hashvalues(file_2).splitlines(keepends=True),
        )
    ]
    list_1 = []
    list_2 = []
    i = 0
    while i < len(diff):
        prefix = diff[i][0:2]
        if prefix == "- ":
            lost = []
            gained = []
            # Collect a run of removed lines, then the run of added lines that follows it.
            while i < len(diff) and diff[i][0:2] in ("- ", "? "):
                if diff[i][0:2] == "- ":
                    lost.append(diff[i][2:].strip())
                i += 1
            while i < len(diff) and diff[i][0:2] in ("+ ", "? "):
                if diff[i][0:2] == "+ ":
                    gained.append(diff[i][2:].strip())
                i += 1
            # Pad the shorter side with "<None>" so both lists stay aligned.
            while len(lost) != len(gained):
                if len(lost) < len(gained):
                    lost.append("<None>")
                else:
                    gained.insert(0, "<None>")
            list_1 += lost
            list_2 += gained
            continue  # i already points at the first line after the block
        if prefix == "+ ":
            list_1.append("<None>")
            list_2.append(diff[i][2:].strip())
        elif prefix != "? " and diff[i].strip():
            # Line common to both files.
            list_1.append(diff[i].strip())
            list_2.append(diff[i].strip())
        i += 1
    return list_1, list_2
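For context, the function above depends on the two-character prefixes that difflib.ndiff puts in front of every line: "  " for a line common to both inputs, "- " for a line only in the first, "+ " for a line only in the second, and "? " for hint lines that exist in neither. A minimal sketch of that classification, where the helper name `classify` is made up for this example:

```python
import difflib


def classify(old_lines, new_lines):
    """Split ndiff output into (common, only_in_old, only_in_new)."""
    common, only_old, only_new = [], [], []
    for entry in difflib.ndiff(old_lines, new_lines):
        prefix, text = entry[:2], entry[2:]
        if prefix == "  ":
            common.append(text)
        elif prefix == "- ":
            only_old.append(text)
        elif prefix == "+ ":
            only_new.append(text)
        # "? " hint lines only mark intraline changes and are skipped here.
    return common, only_old, only_new
```

Because ndiff only ever compares two sequences, anything built on these prefixes is inherently pairwise, which is why extending it to many files is not obvious.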
I also found this tool, which can compare multiple files, but unfortunately it is not suited to comparing code.
Edit: I adapted the solution suggested by @AyoubKaanich into a more simplified version that does what I want:
from collections import defaultdict
import re

def transform(text):
    # Differing hash values can be ignored; we only care about version and path.
    text = re.sub(r"([a-zA-Z0-9_.-]{32}/|\([a-zA-Z0-9_.-]*\))", "<>", text)
    return sorted(text.splitlines())

def generate_diff(outputs: dict):
    # Map every normalized line to the set of targets whose output contains it.
    mapping = defaultdict(set)
    for target, output in outputs.items():
        for line in transform(output):
            mapping[line.strip()].add(target)
    result = []
    current_line = None
    color_index = 0
    for line in sorted(mapping.keys()):
        if len(outputs) == len(mapping[line]):
            # Present on every target: emit the line unchanged.
            current_line = None
            result.append(line)
        else:
            # Lines sharing the same first field get the same ANSI color.
            if current_line != line.split(" ")[0]:
                current_line = line.split(" ")[0]
                color_index += 1
            result.append((f"\033[3{color_index % 6 + 1}m{line}\033[0m", mapping[line]))
    return result
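The only drawback is that this does not handle strings that vary in an arbitrary part of the line, which is the kind of change difflib is good at detecting; it relies on the first field of each line staying the same. For ldd, however, the dependency is always listed first, so sorting alphabetically and taking the first section of the string is enough.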
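To make the reliance on the first field concrete, here is a small sketch that groups lines by the dependency name and reports which dependencies are not identical everywhere. The helper names `group_by_first_field` and `differing_libraries` are made up for this example, and it assumes the lines have already been normalized.

```python
from collections import defaultdict


def group_by_first_field(outputs):
    """Map dependency name -> {full line -> set of machines that have it}."""
    groups = defaultdict(lambda: defaultdict(set))
    for machine, output in outputs.items():
        for line in output.splitlines():
            line = line.strip()
            if line:
                # The dependency name is the first space-separated field.
                groups[line.split(" ")[0]][line].add(machine)
    return groups


def differing_libraries(outputs):
    """Return the dependency names whose line is not identical on every machine."""
    groups = group_by_first_field(outputs)
    return sorted(
        name
        for name, variants in groups.items()
        if any(len(machines) != len(outputs) for machines in variants.values())
    )
```

A dependency that is missing on one machine is reported as differing too, because its line is then present on fewer machines than there are outputs.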