Take a look at this script.
sdiff.py @ hungrysnake.net
http://hungrysnake.net/doc/software__sdiff_py.html
Perl's sdiff (Algorithm::Diff) does not take the "matching rate" into account, but Python's sdiff.py does. =)
I have two text files.
$ cat text1.txt
aaaaaa
bbbbbb
cccccc
dddddd
eeeeee
ffffff
$ cat text2.txt
aaaaaa
bbbbbb
xxxxxxx
ccccccy
zzzzzzz
eeeeee
ffffff
$ sdiff text1.txt text2.txt
aaaaaa        aaaaaa
bbbbbb        bbbbbb
cccccc    |   xxxxxxx
dddddd    |   ccccccy
          >   zzzzzzz
eeeeee        eeeeee
ffffff        ffffff
sdiff does not take the "matching rate" into account =(
This is what I got with sdiff.py.
$ sdiff.py text1.txt text2.txt
--- text1.txt (utf-8)
+++ text2.txt (utf-8)
1|aaaaaa           1|aaaaaa
2|bbbbbb           2|bbbbbb
 |           >     3|xxxxxxx
3|cccccc     |     4|ccccccy
4|dddddd     <      |
 |           >     5|zzzzzzz
5|eeeeee           6|eeeeee
6|ffffff           7|ffffff
[    ]   |      +
[ <- ]  3|cccccc
[ -> ]  4|ccccccy
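sdiff.py does take the "matching rate" into account =)
I want the result that sdiff.py gives. Don't you?
difflib has no direct 'c'-like code for marking changed lines, but you can easily make one. In difflib's delta, "changed lines" are also tagged with '- ', but unlike lines that were actually deleted, they are followed shortly afterwards by a line tagged with '? ' (either directly, or after the matching '+ ' line). That '? ' line means the line before it was "changed" rather than deleted. Its other purpose is to act as a "guide" that shows where in the line the changes are.
If a line in the delta is tagged with '- ', there are four different cases, depending on the next few lines of the delta: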
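Case 1: the line is modified by inserting some characters:
- The good bad
+ The good the bad
?          ++++
Case 2: the line is modified by deleting some characters:
- The good the bad
?          ----
+ The good bad
Case 3: the line is modified by deleting, inserting, and/or replacing some characters:
- The good the bad and ugly
?      ^^ ----
+ The g00d bad and the ugly
?      ^^          ++++
Case 4: the line is deleted:
- The good the bad and the ugly
+ Our ratio is less than 0.75!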
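The four cases can be reproduced with a short script. This is only an illustrative sketch: the helper names `delta_prefixes` and `line_ratio` are made up here, and the score is computed with `difflib.SequenceMatcher` on the bare lines, which is assumed to approximate the similarity score difflib uses internally when it decides whether two lines are close enough to pair up.

```python
import difflib

def delta_prefixes(old, new):
    """Return the two-character tags of the Differ delta for one pair of lines."""
    delta = difflib.Differ().compare([old + "\n"], [new + "\n"])
    return [entry[:2] for entry in delta]

def line_ratio(old, new):
    """Character-level similarity in [0, 1], as reported by SequenceMatcher."""
    return difflib.SequenceMatcher(None, old, new).ratio()

pairs = [
    ("The good bad", "The good the bad"),                                # case 1
    ("The good the bad", "The good bad"),                                # case 2
    ("The good the bad and ugly", "The g00d bad and the ugly"),          # case 3
    ("The good the bad and the ugly", "Our ratio is less than 0.75!"),   # case 4
]

for old, new in pairs:
    # Print the tag sequence next to the similarity score of the two lines.
    print(delta_prefixes(old, new), round(line_ratio(old, new), 3))
```

Only the tag sequence matters for telling the cases apart; the text of the '? ' guide line is not needed for that.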
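As you can see, the lines tagged with '? ' show exactly what kind of modification was made and where. Note that difflib treats a line as deleted when the ratio() of the two lines being compared is below 0.75. That is a value I arrived at through some testing.
To flag a line as changed, you can do this:
import difflib

def sdiffer(s1, s2):
    differ = difflib.Differ()
    diffs = list(differ.compare(s1, s2))
    i = 0
    sdiffs = []
    length = len(diffs)
    while i < length:
        line = diffs[i][2:]
        if diffs[i].startswith('  '):
            sdiffs.append(('u', line))  # unchanged
        elif diffs[i].startswith('+ '):
            sdiffs.append(('+', line))  # inserted
        elif diffs[i].startswith('- '):
            if i+1 < length and diffs[i+1].startswith('? '):  # then diffs[i+2] starts with '+ '
                # cases 2 and 3: skip the guide, the '+ ' line, and a second guide if present
                sdiffs.append(('c', line))
                i += 3 if i + 3 < length and diffs[i + 3].startswith('? ') else 2
            elif i+2 < length and diffs[i+1].startswith('+ ') and diffs[i+2].startswith('? '):
                # case 1: skip the '+ ' line and its guide
                sdiffs.append(('c', line))
                i += 2
            else:
                sdiffs.append(('-', line))  # case 4: really deleted
        i += 1
    return sdiffs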
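To see what raw delta a function like sdiffer has to walk through, the sketch below feeds the two files from the question into `difflib.Differ`. The file contents are inlined as lists so the snippet runs on its own, and `raw_delta` is a hypothetical helper name, not part of difflib.

```python
import difflib

# Contents of text1.txt and text2.txt from the question, inlined for convenience.
text1 = ["aaaaaa", "bbbbbb", "cccccc", "dddddd", "eeeeee", "ffffff"]
text2 = ["aaaaaa", "bbbbbb", "xxxxxxx", "ccccccy", "zzzzzzz", "eeeeee", "ffffff"]

def raw_delta(a, b):
    """Return the Differ delta as a list of tagged lines."""
    return list(difflib.Differ().compare(a, b))

for entry in raw_delta(text1, text2):
    # Guide lines carry their own trailing newline, so strip it before printing.
    print(entry.rstrip("\n"))
```

In this delta, the only '? ' guide follows the `cccccc` / `ccccccy` pair, so that is the only pair that counts as changed. `dddddd` and `zzzzzzz` are too dissimilar to pair up and come out as a plain deletion followed by a plain insertion.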
I'm not quite sure what Perl's "Change" operation is. If it resembles the PHP DIFF output, then I can solve my problem with this code:
import difflib

def sdiffer(s1, s2):
    differ = difflib.Differ()
    diffs = list(differ.compare(s1, s2))
    i = 0
    sdiffs = []
    length = len(diffs)
    sequence = 0
    while i < length:
        line = diffs[i][2:]
        if diffs[i].startswith('  '):
            sequence += 1
            sdiffs.append((sequence, 'u', line))
        elif diffs[i].startswith('+ '):
            sequence += 1
            sdiffs.append((sequence, '+', line))
        elif diffs[i].startswith('- '):
            sequence += 1
            sdiffs.append((sequence, '-', line))
            if i+1 < length and diffs[i+1].startswith('? '):
                # cases 2 and 3: '- ', '? ', '+ ', optionally followed by another '? '
                sequence += 1
                sdiffs.append((sequence, '+', diffs[i+2][2:]))
                i += 3 if i+3 < length and diffs[i+3].startswith('? ') else 2
            elif i+2 < length and diffs[i+1].startswith('+ ') and diffs[i+2].startswith('? '):
                # case 1: '- ', '+ ', '? '
                sequence += 1
                sdiffs.append((sequence, '+', diffs[i+1][2:]))
                i += 2
            # case 4: the line is deleted; a following '+ ' line, if any,
            # is picked up by the next loop iteration
        i += 1
    return sdiffs
Thanks to @Sнаđошƒаӽ for the code.
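Since the goal in the question is output that looks like sdiff.py's, here is a minimal sketch of how the same Differ-based pairing might be turned into side-by-side rows. The function name `side_by_side`, the row layout `(left, marker, right)`, and the column width used for printing are all assumptions made for illustration. This is not how sdiff.py itself is implemented.

```python
import difflib

def side_by_side(a, b):
    """Pair up Differ output into (left, marker, right) rows, sdiff-style."""
    diffs = [d.rstrip("\n") for d in difflib.Differ().compare(a, b)]
    rows, i, n = [], 0, len(diffs)
    while i < n:
        tag, text = diffs[i][:2], diffs[i][2:]
        if tag == "  ":
            rows.append((text, " ", text))       # unchanged on both sides
        elif tag == "+ ":
            rows.append(("", ">", text))         # only in the second file
        elif tag == "- ":
            # j is where the matching '+ ' line would sit, after an optional guide.
            j = i + 2 if i + 1 < n and diffs[i + 1].startswith("? ") else i + 1
            # A guide directly after the '- ' line or after the '+ ' line means "changed".
            guided = j == i + 2 or (j + 1 < n and diffs[j + 1].startswith("? "))
            if guided and j < n and diffs[j].startswith("+ "):
                rows.append((text, "|", diffs[j][2:]))
                # Skip the guide after the '+ ' line too, if there is one.
                i = j + 1 if j + 1 < n and diffs[j + 1].startswith("? ") else j
            else:
                rows.append((text, "<", ""))     # only in the first file
        i += 1
    return rows

text1 = ["aaaaaa", "bbbbbb", "cccccc", "dddddd", "eeeeee", "ffffff"]
text2 = ["aaaaaa", "bbbbbb", "xxxxxxx", "ccccccy", "zzzzzzz", "eeeeee", "ffffff"]

for left, marker, right in side_by_side(text1, text2):
    print(f"{left:<10} {marker} {right}")
```

The rows keep both the old and the new text of a changed line, which the ('c', line) tuples above do not, so they can be printed in two columns directly.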