Comparing two .txt files using difflib in Python

24
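I'm trying to compare two text files and output the first string that doesn't match, but I'm new to Python and I'm struggling. Could someone show me an example of how to use this module?
When I try something like this: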
result = difflib.SequenceMatcher(None, testFile, comparisonFile)

I get an error saying that the object of type 'file' has no len().

6 Answers

35

First, you need to pass strings to difflib.SequenceMatcher, not file objects:

# Like so
difflib.SequenceMatcher(None, str1, str2)

# Or just read the files in
difflib.SequenceMatcher(None, file1.read(), file2.read())

That will fix your error.

To get the first non-matching string, see the difflib documentation.
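
As a rough illustration of what the documentation describes, here is a minimal sketch that walks the opcodes from SequenceMatcher.get_opcodes() and returns the first region that is not tagged "equal". The helper name first_mismatch is made up for this example, and autojunk=False is my own choice to keep long inputs from being treated heuristically; it is not something the answer above prescribes.

```python
import difflib

def first_mismatch(a, b):
    """Return (fragment_of_a, fragment_of_b) for the first differing region, or None.

    Works on two strings (character level) or two lists of lines (line level).
    """
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # tag is one of 'equal', 'replace', 'delete', 'insert'
        if tag != "equal":
            return a[i1:i2], b[j1:j2]
    return None  # the two sequences are identical
```

To use it on files, read them first, for example first_mismatch(file1.read(), file2.read()) for a character-level result or first_mismatch(file1.readlines(), file2.readlines()) for a line-level one.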


12
In addition to the documentation, you can check out Doug Hellmann's Python Module of the Week entry on difflib: http://blog.doughellmann.com/2007/10/pymotw-difflib.html - mechanical_meat
2
@BlackVegetable Web archive link for the Python Module of the Week entry - BarathVutukuri

10

Here's a quick example of comparing the contents of two files using Python's difflib...

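import difflib

file1 = "myFile1.txt"
file2 = "myFile2.txt"

# Read both files as lists of lines; ndiff yields a line-by-line delta
with open(file1) as f1, open(file2) as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())
    print(''.join(diff), end='')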

4
How can I avoid showing the lines that are the same? I only want to print the lines that differ. - JahMyst
2
@OlivierCervello
import difflib, sys
with open("a") as a:
    a_content = a.readlines()
with open("b") as b:
    b_content = b.readlines()
diff = difflib.unified_diff(a_content, b_content)
print("***** Unified diff ************")
print("Line no" + '\t' + 'file1' + '\t' + 'file2')
for i, line in enumerate(diff):
    if line.startswith("-"):
        print(i, '\t\t' + line)
    elif line.startswith("+"):
        print(i, '\t\t\t\t\t\t' + line)
- kishorebjv
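
For the "only the differing lines" question, one possible sketch is to ask unified_diff for zero lines of context (n=0) and then drop the two file-header lines and the "@@" hunk markers. The function name only_changes is hypothetical, and skipping exactly two header lines relies on unified_diff emitting the '---' and '+++' headers once at the start of a non-empty diff.

```python
import difflib
from itertools import islice

def only_changes(a_lines, b_lines):
    """Return just the removed ('-') and added ('+') lines between two line lists."""
    diff = difflib.unified_diff(a_lines, b_lines, n=0)  # n=0: no context lines
    # Skip the '---' and '+++' headers, then filter out the '@@ ... @@' hunk markers.
    return [line for line in islice(diff, 2, None) if not line.startswith("@@")]
```

Skipping the headers by position rather than by prefix avoids misreading a removed line whose own text starts with "--".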

5

Do both files exist?

I just tested it and it works perfectly.

You can get the results with something like this:

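import difflib

# testFile and comparisonFile are the paths of the two files
with open(testFile) as f1, open(comparisonFile) as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())

# ndiff returns a generator, so iterate over it directly
for line in diff:
    print(line, end='')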

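Each output line starts with a two-character code that shows whether it differs: for example, '+ ' marks a line that was added, '- ' a line that was removed, and two spaces a line common to both files.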
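
Building on those prefix codes, a minimal sketch for finding the first difference is to scan the ndiff output and stop at the first line that is not marked as common. The name first_difference is invented for this example.

```python
import difflib

def first_difference(a_lines, b_lines):
    """Return the first ndiff output line that is not common to both inputs, or None."""
    for line in difflib.ndiff(a_lines, b_lines):
        # Two leading spaces mean the line appears unchanged in both inputs.
        if not line.startswith("  "):
            return line
    return None  # no differences at all
```

A return value of None means the two inputs are identical; otherwise the prefix of the returned line tells you which side it came from.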


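Oops, you're right, that was a silly mistake. But I'm still not sure how to get the data I need out of the result. How do I know whether they differ? How do I get the first string that differs? Sorry for all the questions :( - 101010110101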

3

It sounds like you may not need difflib at all. If you're comparing line by line, try something like this:

with open("test.txt") as test_file, open("correct.txt") as correct_file:
    test_lines = test_file.readlines()
    correct_lines = correct_file.readlines()

for test, correct in zip(test_lines, correct_lines):
    if test != correct:
        print("Oh no! Expected %r; got %r." % (correct, test))
        break
else:
    # Reached only if no line differed: check for leftover lines
    len_diff = len(test_lines) - len(correct_lines)
    if len_diff > 0:
        print("Test file had too much data.")
    elif len_diff < 0:
        print("Test file had too little data.")
    else:
        print("Everything was correct!")

You don't need readlines; zip works with file handles too. - SilentGhost
Wouldn't this fail if the files have the same number of lines but different content? - 101010110101
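
To illustrate the point about iterating over file handles directly, here is a small sketch that pairs the lines lazily. It swaps zip for itertools.zip_longest (my substitution, not part of the answer) so that a missing line in the shorter file shows up as None instead of being silently dropped. The function name first_mismatched_line is hypothetical.

```python
from itertools import zip_longest

def first_mismatched_line(path_a, path_b):
    """Return (line_number, line_a, line_b) for the first differing line, or None.

    A line is None when that file ran out of lines before the other one.
    """
    with open(path_a) as fa, open(path_b) as fb:
        # File objects are iterators over lines, so nothing is read up front.
        for lineno, (a, b) in enumerate(zip_longest(fa, fb), start=1):
            if a != b:
                return lineno, a, b
    return None  # same lines, same length
```

Files with the same number of lines but different content are handled by the a != b check; files of different length are caught when one side becomes None.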

0

Another, simpler approach is to check line by line whether the two text files are the same. Give it a try.

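fname1 = 'text1.txt'
fname2 = 'text2.txt'

# Read both files; the with block closes them even if an error occurs
with open(fname1) as f1, open(fname2) as f2:
    lines1 = f1.readlines()
    lines2 = f2.readlines()

# Compare line by line and print the first line of file 1 that differs
for line1, line2 in zip(lines1, lines2):
    if line1 != line2:
        print(line1)
        break
else:
    # No differing line in the common part; lengths must match as well
    if len(lines1) == len(lines2):
        print("both are equal")
    else:
        print("files differ in length")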

Alternatively, Python's filecmp module provides a ready-made function for this:
import filecmp

fname1 = 'text1.txt'
fname2 = 'text2.txt'

print(filecmp.cmp(fname1, fname2))

:)
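
One caveat worth sketching: by default filecmp.cmp uses shallow=True, which treats files with identical os.stat() signatures (type, size, modification time) as equal without reading them. Passing shallow=False forces a comparison of the contents. The wrapper name same_contents is made up for this example.

```python
import filecmp

def same_contents(path_a, path_b):
    """Return True only if the two files have identical contents."""
    # shallow=False compares the file contents instead of trusting os.stat() signatures.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Note that this only answers whether the files are equal; it does not say where they differ.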


-1
# -*- coding: utf-8 -*-
"""
   

"""

def compare_lines_in_files(file1_path, file2_path):
    try:
        with open(file1_path, 'r', encoding='utf-8') as file1, open(file2_path, 'r', encoding='utf-8') as file2:
            lines_file1 = file1.readlines()
            lines_file2 = file2.readlines()

            mismatched_lines = []

            # Compare each line in file1 to all lines in file2
            for line_num, line1 in enumerate(lines_file1, start=1):
                line1 = line1.strip()  # Remove leading/trailing whitespace
                found_match = False

                for line_num2, line2 in enumerate(lines_file2, start=1):
                    line2 = line2.strip()  # Remove leading/trailing whitespace

                    # Perform a case-insensitive comparison
                    if line1.lower() == line2.lower():
                        found_match = True
                        break

                if not found_match:
                    mismatched_lines.append(f"Line {line_num} in File 1: '{line1}' has no match in File 2")

            # Compare each line in file2 to all lines in file1 (vice versa)
            for line_num2, line2 in enumerate(lines_file2, start=1):
                line2 = line2.strip()  # Remove leading/trailing whitespace
                found_match = False

                for line_num, line1 in enumerate(lines_file1, start=1):
                    line1 = line1.strip()  # Remove leading/trailing whitespace

                    # Perform a case-insensitive comparison
                    if line2.lower() == line1.lower():
                        found_match = True
                        break

                if not found_match:
                    mismatched_lines.append(f"Line {line_num2} in File 2: '{line2}' has no match in File 1")

            return mismatched_lines

    except FileNotFoundError:
        print("One or both files not found.")
        return []

# Paths to the two text files you want to compare
file1_path = r'C:\Python Space\T1.txt'
file2_path = r'C:\Python Space\T2.txt'

mismatched_lines = compare_lines_in_files(file1_path, file2_path)

if mismatched_lines:
    print("Differences between the files:")
    for line in mismatched_lines:
        print(line)
else:
    print("No differences found between the files.")

2
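Perhaps you could help us by describing how this code works and how it helps achieve the result. - undefined
As it is currently written, your answer is unclear. Please edit it to add more details that will help others understand how it addresses the question asked. You can find more information on how to write good answers in the help center. - undefined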

Content provided by Stack Overflow.