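I don't care about changed comments, so I'd like git diff to ignore all lines matching ^\s*\*.*$, since these are all comments (part of /* */ blocks).
I can't find any way to tell git diff to ignore specific lines.
I've already tried setting a textconv attribute so that Git passes the files through sed before diffing them, letting sed strip the offending lines. The problem is that git diff --name-status doesn't actually diff the files; it only compares hashes, and of course all the hashes have changed.
Is there a way to do this?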
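The textconv approach from the question can be sketched as a small filter script. This is a minimal sketch assuming Python 3. The script name strip_comments.py and the diff driver name nocomment are hypothetical. Git runs a textconv command with the file's path as its argument and diffs whatever the command prints. As the question notes, --name-status and --name-only do not go through textconv, so a filter like this only changes the patch output.

```python
#!/usr/bin/env python3
# strip_comments.py (hypothetical name): a textconv filter that prints a file
# with the lines matching ^\s*\*.*$ (the body of /* ... */ comments) removed.
import re
import sys

# The regex from the question: optional leading whitespace, then a '*'.
COMMENT_LINE = re.compile(r"^\s*\*.*$")


def strip_comment_lines(text: str) -> str:
    """Return text without the lines that match COMMENT_LINE."""
    kept = [line for line in text.splitlines(keepends=True)
            if not COMMENT_LINE.match(line)]
    return "".join(kept)


def main(argv: list) -> None:
    # Git passes the path of the file to convert as the only argument.
    with open(argv[0], encoding="utf-8", errors="replace") as fh:
        sys.stdout.write(strip_comment_lines(fh.read()))


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1:])
```

Wiring it up might look like this (the names are again hypothetical):
echo '*.c diff=nocomment' >> .gitattributes
git config diff.nocomment.textconv 'python3 strip_comments.py'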
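A solution using git (log|diff) -G<regex>, plus some documentation for that option that is otherwise missing.
It targets comments that start with * or #, sometimes with a space before the *... but it still has to allow changes to #ifdef, #include, etc.
Not everything seems to be supported by the -G option (? in general does not seem to be), and I had problems using * as well. + seems to work well, though.
git diff -w -G'(^[^\*# /])|(^#\w)|(^\s+[^\*#/])'
-w: ignore whitespace
-G: only show files whose added/removed lines match the following regex
(^[^\*# /]): any line that does not start with *, #, a space or /
(^#\w): any line that starts with # followed by a letter (more precisely, a word character)
(^\s+[^\*#/]): any line that starts with whitespace followed by a character other than *, # or /
Basically, an SVN hook currently modifies every file on the way in and out, rewriting the multi-line comment block in each one. Now I can diff my changes against SVN without the FYI information that SVN adds to the comments getting in the way.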
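To see which changed lines the three alternatives let through, you can try the pattern on a few sample lines. This sketch uses Python's re module as a stand-in for Git's POSIX regex engine. That is an assumption, though for these simple classes the two should agree. The sample lines are made up.

One caveat shows up in the last sample. \s+ can give back a whitespace character, so any line starting with two or more spaces matches (^\s+[^\*#/]) through its second space. Only comment lines indented by a single space, like the usual " * " lines, are reliably left out.

```python
import re

# The -G pattern from the answer above.
PATTERN = re.compile(r"(^[^\*# /])|(^#\w)|(^\s+[^\*#/])")


def matches(line: str) -> bool:
    """True if a changed line (without its +/- marker) matches the pattern."""
    return PATTERN.search(line) is not None


SAMPLES = [
    "int x = 1;",            # code at column 0
    "#include <stdio.h>",    # '#' followed by a word character
    "    return x;",         # indented code
    " * FYI added by hook",  # comment body with one leading space
    "/* start of block",     # starts with '/'
    "# shell comment",       # '#' followed by a space
    "  * deeper comment",    # two spaces: matched via the second space
]

for line in SAMPLES:
    print(f"{str(matches(line)):5}  {line!r}")
```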
Technically, this lets Python and Bash comments such as #TODO show up in the diff, and a C++ division operator that starts a new line could be ignored:
a = b
/ c;
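Also, Git's documentation of -G seems rather thin, so the information here should help:
git diff -G<regex>
This shows the diffs of only those files that have at least one added or removed line matching the regular expression.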
-G<regex>
Look for differences whose patch text contains added/removed lines that match <regex>.
To illustrate the difference between -S<regex> --pickaxe-regex and -G<regex>, consider a commit with the following diff in the same file:
+    return !regexec(regexp, two->ptr, 1, &regmatch, 0);
...
-    hit = !regexec(regexp, mf2.ptr, 1, &regmatch, 0);
While git log -G"regexec\(regexp" will show this commit, git log -S"regexec\(regexp" --pickaxe-regex will not (because the number of occurrences of that string did not change).
See the pickaxe entry in gitdiffcore(7) for more information.
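The difference between the two options can be modelled in a few lines. This is a simplified sketch, not Git's implementation. It treats -S<regex> --pickaxe-regex as "the number of matches in the file changed" and -G<regex> as "some added or removed line matches". It uses Python's re, and the two lines from the documentation example stand in for the entire file contents.

```python
import re


def count_matches(pattern: str, text: str) -> int:
    return sum(1 for _ in re.finditer(pattern, text))


def pickaxe_s(pattern: str, old: str, new: str) -> bool:
    """-S model: selected only if the number of occurrences changed."""
    return count_matches(pattern, old) != count_matches(pattern, new)


def pickaxe_g(pattern: str, removed: list, added: list) -> bool:
    """-G model: selected if any removed or added line matches."""
    return any(re.search(pattern, line) for line in removed + added)


old_line = "hit = !regexec(regexp, mf2.ptr, 1, &regmatch, 0);"
new_line = "return !regexec(regexp, two->ptr, 1, &regmatch, 0);"
pattern = r"regexec\(regexp"

print("-S:", pickaxe_s(pattern, old_line, new_line))  # one match before, one after
print("-G:", pickaxe_g(pattern, [old_line], [new_line]))
```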
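(Note: tested on Git v2.7.0)
-G appeared to use basic regular expressions: ?, *, !, { and } did not seem to work as regex syntax. (The source excerpt below, however, compiles the pattern with REG_EXTENDED, i.e. POSIX extended syntax.)
Grouping with () and OR-ing with | work.
Shorthand classes such as \s, \W, etc. work.
The line anchors ^ and $ work.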
Note that the -G option filters which files get diffed.
However, once a file is diffed, all of its changed lines are shown, whether or not they match the regex.
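The file-level behaviour can be sketched as a simplified model. It takes one file's unified diff as input. A file is selected when any added or removed line matches, checked without its +/- marker and ignoring the ---/+++ headers. A selected file is then shown in full. The diff texts are invented for illustration.

```python
import re


def g_selects(file_diff: str, pattern: str) -> bool:
    """Model of -G for one file: does any added/removed line match?"""
    regex = re.compile(pattern)
    in_hunk = False
    for line in file_diff.splitlines():
        if line.startswith("@@"):
            in_hunk = True  # headers before the first hunk are skipped
            continue
        if in_hunk and line[:1] in ("+", "-") and regex.search(line[1:]):
            return True
    return False


only_comments = """--- a/run.sh
+++ b/run.sh
@@ -1,2 +1,2 @@
-# old note
+# new note
 echo hi
"""

mixed = """--- a/build.sh
+++ b/build.sh
@@ -1,2 +1,2 @@
-# old note
+# new note
-make all
+make test
"""

for diff in (only_comments, mixed):
    if g_selects(diff, r"^[^#]"):
        print(diff)  # the whole file diff is shown, '#' lines included
```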
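Only show diffs of files in which at least one changed line mentions foo:
git diff -G'foo'
Show diffs of files in which at least one changed line does not start with # (lines starting with # in those files are still shown):
git diff -G'^[^#]'
Show diffs of files whose changes mention FIXME or TODO:
git diff -G'(FIXME)|(TODO)'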
See also git log -G, git grep, git log -S, --pickaxe-regex and --pickaxe-all.
In Git's source (diffcore-pickaxe.c), the -G pattern is compiled like this:
if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
    int cflags = REG_EXTENDED | REG_NEWLINE;
    if (DIFF_OPT_TST(o, PICKAXE_IGNORE_CASE))
        cflags |= REG_ICASE;
    regcomp_or_die(&regex, needle, cflags);
    regexp = &regex;
}
// and in the regcomp_or_die function
regcomp(regex, needle, cflags);
The relevant flags, from the man page at http://man7.org/linux/man-pages/man3/regexec.3.html:
REG_EXTENDED
Use POSIX Extended Regular Expression syntax when interpreting
regex. If not set, POSIX Basic Regular Expression syntax is
used.
// ...
REG_NEWLINE
Match-any-character operators don't match a newline.
A nonmatching list ([^...]) not containing a newline does not
match a newline.
Match-beginning-of-line operator (^) matches the empty string
immediately after a newline, regardless of whether eflags, the
execution flags of regexec(), contains REG_NOTBOL.
Match-end-of-line operator ($) matches the empty string
immediately before a newline, regardless of whether eflags
contains REG_NOTEOL.
… + (I just tested it). - Emadpres
With git diff -G'^[^#]', it still shows lines starting with #. - Martin Vegter
Use
git diff -G <regex>
and specify a regex that does not match your version number line.
I found it easiest to launch an external diff tool with git difftool:
git difftool -y -x "diff -I '<regex>'"
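GNU diff's -I<regex> ignores changes in which every inserted or deleted line matches the regex. Below is a rough Python approximation using difflib, for illustration only. difflib groups changes differently from GNU diff, which also prints ignorable changes that sit next to a kept one. The sample lines are invented.

```python
import difflib
import re


def changes_ignoring(old: list, new: list, pattern: str) -> list:
    """Return (tag, old_lines, new_lines) for each change that is not ignorable."""
    regex = re.compile(pattern)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    kept = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        changed = old[i1:i2] + new[j1:j2]
        if all(regex.search(line) for line in changed):
            continue  # every changed line matches: treat the change as ignorable
        kept.append((tag, old[i1:i2], new[j1:j2]))
    return kept


old = ["int x;", " * old note", "int y;", "int w;"]
new = ["int x;", " * new note", "int y;", "int v;"]
print(changes_ignoring(old, new, r"^\s*\*"))
```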
git diff --numstat --minimal <commit> <commit> | sed '/^[1-]\s\+[1-]\s\+.*/d'
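The sed expression deletes every --numstat line whose added and deleted counts are each exactly 1 or -. This drops two kinds of file:

1.) Files with one line added and one line deleted, typically a single modified line such as a version number.
2.) Binary files, whose counts are shown as -.

The same filter in Python, sketched on invented numstat output:

```python
import re

# Equivalent of sed '/^[1-]\s\+[1-]\s\+.*/d': both counts are "1" or "-".
TRIVIAL = re.compile(r"^[1-]\s+[1-]\s+")


def drop_trivial(numstat: str) -> list:
    """Keep only the numstat lines that the sed command would not delete."""
    return [line for line in numstat.splitlines() if not TRIVIAL.match(line)]


sample = "1\t1\tversion.h\n12\t3\tmain.c\n-\t-\tlogo.png\n1\t0\tnew.c\n"
for line in drop_trivial(sample):
    print(line)
```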
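Use grep to count specific content in the git diff output:
git diff -w | grep -c -E "(^[+-]\s*(\/)?\*)|(^[+-]\s*\/\/)"
This counts only the changed comment lines. (A)
Using the git diff --stat output,
git diff -w --stat
gives all changed lines (insertions plus deletions). (B)
To get the number of changed non-comment source lines (NCSL), subtract (A) from (B).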
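The same arithmetic can be done in one pass over the diff text. This is a minimal sketch that assumes the input is git diff -w output. It counts every added or removed line inside a hunk as (B), counts the ones matching the grep pattern as (A), and also returns B - A. The sample diff is invented.

```python
import re

# The grep -E pattern from above: '+' or '-', optional blanks, then "/*", "*" or "//".
COMMENT = re.compile(r"(^[+-]\s*(/)?\*)|(^[+-]\s*//)")


def count_changes(diff_text: str) -> tuple:
    """Return (all changed lines B, comment lines A, non-comment lines B - A)."""
    changed = comments = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff "):
            in_hunk = False  # a new file header starts
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line[:1] in ("+", "-"):
            changed += 1
            if COMMENT.match(line):
                comments += 1
    return changed, comments, changed - comments


sample = """diff --git a/x.c b/x.c
--- a/x.c
+++ b/x.c
@@ -1 +1,4 @@
-int a = 1;
+int a = 2;
+/* note */
+ * more
+call(x);
"""
print(count_changes(sample))
```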
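Explanation:
In the git diff output (ignoring whitespace changes), the grep pattern counts added or removed lines whose first non-blank characters are /*, * or //.
Note: because of the following assumptions, the comment-line count may contain small errors, so treat the result as an approximation.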
1.) Source files are assumed to be C. Makefiles and shell scripts use a different convention ('#') for comment lines; if they are part of the diff, their comment lines won't be counted.
2.) Git's convention for line changes: a modified line is treated as the old line being deleted and a new line being inserted, so one modified line shows up as two changed lines.
In the below example, the new definition of 'FOO' looks like a two-line change.
$ git diff --stat -w abc.h
...
-#define FOO 7
+#define FOO 105
...
1 files changed, 1 insertions(+), 1 deletions(-)
$
3.) Valid comment lines that don't match the pattern, or valid source lines that do, cause errors in the count. Below, "+ blah blah" is a comment line that is not counted as one, while "+ *ptr);" is code that is counted as a comment:
+ /*
+ blah blah
+ *
+ */
+ printf("\n %p",
+ *ptr);
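For most programming languages, doing this properly requires parsing the original source files (or their AST) and excluding comments that way.
One reason is that the start of a multi-line comment may not be covered by the diff. Another is that parsing a language is not trivial, and there are often constructs that trip up a naive parser.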
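For Python source, the standard library's tokenize module is enough to drop # comments without being fooled by a # inside a string literal. The following is a minimal sketch of the parse-then-diff idea. It compares two versions of a file after removing comments and blank lines. It does not handle docstrings, and the sample code is invented.

```python
import difflib
import io
import tokenize


def code_lines(source: str) -> list:
    """Return the non-blank source lines with '#' comments cut off."""
    lines = source.splitlines()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            row, col = tok.start  # row is 1-based, col is 0-based
            lines[row - 1] = lines[row - 1][:col].rstrip()
    return [line for line in lines if line.strip()]


def diff_without_comments(old: str, new: str) -> list:
    """Unified diff of the two sources after stripping comments."""
    return list(difflib.unified_diff(code_lines(old), code_lines(new), lineterm=""))


old = "x = 1  # one\ns = '# not a comment'\n"
new = "x = 1  # uno\n# added note\ns = '# not a comment'\n"
print(diff_without_comments(old, new))  # comment-only changes give an empty diff
```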
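I was going to do this for Python, but string processing was good enough for my needs.
For Python, you can use a custom filter that ignores comments and tries to ignore docstrings, for example: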
#!/usr/bin/env python
import sys
import re
import configparser
from fnmatch import fnmatch

from unidiff import PatchSet

EXTS = ["py"]


class Opts:  # pylint: disable=too-few-public-methods
    debug = False
    exclude = []


def filtered_hunks(fil):
    path_re = ".*[.](%s)$" % "|".join(EXTS)
    for patch in PatchSet(fil):
        if not re.match(path_re, patch.path):
            continue
        excluded = False
        if Opts.exclude:
            if Opts.debug:
                print(">", patch.path, "=~", Opts.exclude)
            for ex in Opts.exclude:
                if fnmatch(patch.path, ex):
                    excluded = True
        if excluded:
            continue
        for hunk in patch:
            yield hunk


class Typ:  # pylint: disable=too-few-public-methods
    LINE = "."
    COMMENT = "#"
    DOCSTRING = "d"
    WHITE = "w"


def classify_lines(fil):
    for hunk in filtered_hunks(fil):
        yield from classify_hunk(hunk)


def classify_line(lval):
    """Classify a single python line, noting comments, best efforts at docstring start/stop and pure-whitespace."""
    lval = lval.rstrip("\n\r")
    remaining_lval = lval
    typ = Typ.LINE
    if re.match(r"^ *$", lval):
        return Typ.WHITE, None, ""
    if re.match(r"^ *#", lval):
        typ = Typ.COMMENT
        remaining_lval = ""
    else:
        slug = re.match(r"^ *(\"\"\"|''')(.*)", lval)
        if slug:
            remaining_lval = slug[2]
            slug = slug[1]
            return Typ.DOCSTRING, slug, remaining_lval
    return typ, None, remaining_lval


def classify_hunk(hunk):
    """Classify lines of a python diff-hunk, attempting to note comments and docstrings.

    Ignores context lines.
    Docstring detection is not guaranteed (changes in the middle of large docstrings won't have starts.)
    Using ast would fix, but seems like overkill, and cannot be done on a diff-only.
    """
    p = ""
    prev_typ = 0
    pslug = None
    for line in hunk:
        lval = line.value
        lval = lval.rstrip("\n\r")
        typ = Typ.LINE
        naive_typ, slug, remaining_lval = classify_line(lval)
        if p and p[-1] == "\\":
            typ = prev_typ
        else:
            if prev_typ != Typ.DOCSTRING and naive_typ == Typ.COMMENT:
                typ = naive_typ
            elif naive_typ == Typ.DOCSTRING:
                if prev_typ == Typ.DOCSTRING and pslug == slug:
                    # remainder of line could have stuff on it
                    typ, _, _ = classify_line(remaining_lval)
                else:
                    typ = Typ.DOCSTRING
                pslug = slug
            elif prev_typ == Typ.DOCSTRING:
                # continue docstring found in this context/hunk
                typ = Typ.DOCSTRING
        p = lval
        prev_typ = typ
        if typ == Typ.DOCSTRING:
            if re.match(r"(%s) *$" % pslug, remaining_lval):
                prev_typ = Typ.LINE
        if line.is_context:
            continue
        yield typ, lval


def count_lines(fil):
    """Totals changed lines of python code, attempting to strip comments and docstrings.

    Deletes/adds are counted equally.
    Could miss some things, don't rely on exact counts.
    """
    count = 0
    for (typ, line) in classify_lines(fil):
        if Opts.debug:
            print(typ, line)
        if typ == Typ.LINE:
            count += 1
    return count


def main():
    Opts.debug = "--debug" in sys.argv
    Opts.exclude = []
    use_covrc = "--covrc" in sys.argv
    if use_covrc:
        config = configparser.ConfigParser()
        config.read(".coveragerc")
        cfg = {s: dict(config.items(s)) for s in config.sections()}
        # Default to "" (not []) so .split() works when there is no "omit" setting.
        exclude = cfg.get("report", {}).get("omit", "")
        Opts.exclude = [f.strip() for f in exclude.split("\n") if f.strip()]
    for i in range(len(sys.argv)):
        if sys.argv[i] == "--exclude":
            Opts.exclude.append(sys.argv[i + 1])
    if Opts.debug and Opts.exclude:
        print("--exclude", Opts.exclude)
    print(count_lines(sys.stdin))


example = '''
diff --git a/cryptvfs.py b/cryptvfs.py
index c68429cf6..ee90ecea8 100755
--- a/cryptvfs.py
+++ b/cryptvfs.py
@@ -2,5 +2,17 @@
 from src.main import proc_entry


-if __name__ == "__main__":
-    proc_entry()
+
+
+class Foo:
+    """some docstring
+    """
+    # some comment
+    pass
+
+class Bar:
+    """some docstring
+    """
+    # some comment
+    def method():
+        line1 + 1
'''


def strio(s):
    import io
    return io.StringIO(s)


def test_basic():
    assert count_lines(strio(example)) == 10


def test_main(capsys):
    sys.argv = []
    sys.stdin = strio(example)
    main()
    cap = capsys.readouterr()
    print(cap.out)
    assert cap.out == "10\n"


def test_debug(capsys):
    sys.argv = ["--debug"]
    sys.stdin = strio(example)
    main()
    cap = capsys.readouterr()
    print(cap.out)
    assert Typ.DOCSTRING + '     """some docstring' in cap.out


def test_exclude(capsys):
    sys.argv = ["--exclude", "cryptvfs.py"]
    sys.stdin = strio(example)
    main()
    cap = capsys.readouterr()
    print(cap.out)
    assert cap.out == "0\n"


def test_covrc(capsys):
    sys.argv = ["--covrc"]
    sys.stdin = strio(example)
    main()
    cap = capsys.readouterr()
    print(cap.out)
    assert cap.out == "10\n"


if __name__ == "__main__":
    main()
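That code can easily be modified to print file names instead of a count.
Of course, it can also misclassify part of a docstring as "code" (which, for purposes such as coverage, it is not).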
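Perhaps a Bash script like this:
#!/bin/bash
# For each changed file, count the lines of its textconv'd diff that survive
# the sed filter, and list the files sorted by that count.
git diff --name-only "$@" | while read -r FPATH ; do
    LINES_COUNT=$(git diff --textconv "$@" -- "$FPATH" | sed '/^[1-]\s\+[1-]\s\+.*/d' | wc -l)
    if [ "$LINES_COUNT" -gt 0 ] ; then
        echo -e "$LINES_COUNT\t$FPATH"
    fi
done | sort -n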
I use meld: I set its options to ignore comments and then use it as the diff tool:
git difftool --tool=meld -y
… git diff --name-status --textconv? Or git diff --name-only? - rodrigo
… git diff -I<regex>. - VonC