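We have an svn repository with a large number of directories and files, and our build system needs to find all of the svn:externals properties, recursively, for a branch in the repository before checking it out. Currently we use:
svn propget svn:externals -R http://url.of.repo/Branch
This has proved extremely time-consuming and is a real bandwidth hog. It appears that the client receives the properties for everything in the repository and filters them locally (though I haven't confirmed this with wireshark). Is there a faster way to do this? Preferably some way of getting the server to return only the desired data.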
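As you mentioned, it does consume network bandwidth. However, if you have access to the server hosting these repositories, you can run it via the file:// protocol. That has proved to be faster and consumes no network bandwidth.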
svn propget svn:externals -R file:///path/to/repo/Branch
Also, if you already have the entire working copy in place, you can run it there as well.
svn propget svn:externals -R /path/to/WC
Hope this helps you get the results faster!
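If the recursive propget is fast enough in one of these forms, a build script may find the output easier to consume with the `--xml` flag of `svn propget`. The following is a minimal sketch, assuming Python 3.7+ and `svn` on the PATH; the function names `parse_externals_xml` and `fetch_externals` are hypothetical and not part of any Subversion tooling. It does not reduce the amount of data transferred, it only makes the result easier to parse.

```python
import subprocess
import xml.etree.ElementTree as ET


def parse_externals_xml(xml_data):
    """Map each <target path="..."> to its non-empty svn:externals lines."""
    result = {}
    for target in ET.fromstring(xml_data).iter("target"):
        for prop in target.iter("property"):
            if prop.get("name") != "svn:externals":
                continue
            text = prop.text or ""
            lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
            result[target.get("path")] = lines
    return result


def fetch_externals(target):
    """Run the recursive propget (URL, file:// URL or working copy path)."""
    # Assumption: svn is installed and the target is reachable.
    proc = subprocess.run(
        ["svn", "propget", "svn:externals", "-R", "--xml", target],
        capture_output=True,
        check=True,
    )
    return parse_externals_xml(proc.stdout)
```

For example, `fetch_externals("file:///path/to/repo/Branch")` would return a dictionary keyed by the path of every directory that carries the property.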
Not sure where I found this gem, but it is quite useful for seeing externals that have their own externals:
Windows:
svn status . | findstr /R "^X"
Linux/Unix:
svn status . | grep -E "^X"
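The same filter can be written in a platform-independent way. This is a small sketch, assuming Python 3.7+, `svn` on the PATH and an existing working copy; `external_paths` and `list_externals` are hypothetical helper names. It keeps only the status lines whose first column is `X`, exactly like the two one-liners above.

```python
import subprocess


def external_paths(status_output):
    """Return the paths of `svn status` lines whose first column is 'X'."""
    paths = []
    for line in status_output.splitlines():
        if line.startswith("X"):
            # The remaining status columns are blank for externals,
            # so stripping whitespace leaves just the path.
            paths.append(line[1:].strip())
    return paths


def list_externals(working_copy="."):
    """Run `svn status` on a working copy and list its externals."""
    proc = subprocess.run(
        ["svn", "status", working_copy],
        capture_output=True,
        text=True,
        check=True,
    )
    return external_paths(proc.stdout)
```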
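I finally came up with a solution. I decided to break the request up into multiple small svn requests and make each of them a task run by a thread pool. This puts some load on the svn server, but in our case the server is on the LAN and the query is only made during full builds, so it does not seem to be an issue.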
import os
import threading

# ThreadPool is not part of the standard library; this script assumes a
# thread-pool recipe module that provides add_task() and wait_completion().
import ThreadPool

thread_pool = ThreadPool.ThreadPool(8)
externs_dict = {}
externs_lock = threading.Lock()


def getExternRev(path, url):
    """Record the last changed revision of a single external."""
    cmd = 'svn info "%s"' % url
    pipe = os.popen(cmd, 'r')
    data = pipe.read().splitlines()
    pipe.close()

    # Harvest last changed rev (stays None if svn info printed none)
    revision = None
    for line in data:
        if "Last Changed Rev" in line:
            revision = line.split(":")[1].strip()

    with externs_lock:
        externs_dict[path] = (url, revision)


def getExterns(url, base_dir):
    """Queue a revision lookup for every external set directly on url."""
    cmd = 'svn propget svn:externals "%s"' % url
    pipe = os.popen(cmd, 'r')
    data = pipe.read().splitlines()
    pipe.close()

    for line in data:
        if line:
            # Expects definitions of the form "local_path URL"
            line = line.split()
            path = base_dir + line[0]
            url = line[1]
            thread_pool.add_task(getExternRev, path, url)


def processDir(url, base_dir):
    """Check one directory, then queue each of its subdirectories."""
    thread_pool.add_task(getExterns, url, base_dir)

    cmd = 'svn list "%s"' % url
    pipe = os.popen(cmd, 'r')
    listing = pipe.read().splitlines()
    pipe.close()

    # svn list marks directories with a trailing slash
    dir_list = []
    for node in listing:
        if node.endswith('/'):
            dir_list.append(node)

    for node in dir_list:
        thread_pool.add_task(processDir, url + node, base_dir + node)


def analyzePath(url, base_dir=''):
    thread_pool.add_task(processDir, url, base_dir)
    thread_pool.wait_completion()


analyzePath("http://url/to/repository")
print(externs_dict)
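The same idea can be sketched with only the standard library, using `concurrent.futures` and `subprocess` in place of the external ThreadPool module and `os.popen`. This is an illustrative variant, not the script above: it assumes Python 3.7+ and `svn` on the PATH, walks the tree one level at a time, and the names `run_svn` and `collect_externals` are hypothetical. The `run` parameter exists only so the walk can be exercised without a server.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_svn(*args):
    """Run one svn subcommand and return whatever it wrote to stdout."""
    # check=False: a non-zero exit (e.g. property not set) is treated
    # as "no output" instead of raising.
    proc = subprocess.run(["svn", *args], capture_output=True,
                          text=True, check=False)
    return proc.stdout


def collect_externals(root_url, run=run_svn, workers=8):
    """Return {directory URL: raw svn:externals value} below root_url."""
    results = {}
    pending = [root_url]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            # Issue the small requests for this level concurrently.
            listings = list(pool.map(lambda u: run("list", u), pending))
            props = list(pool.map(
                lambda u: run("propget", "svn:externals", u), pending))
            next_level = []
            for url, listing, prop in zip(pending, listings, props):
                if prop.strip():
                    results[url] = prop.strip()
                for name in listing.splitlines():
                    if name.endswith("/"):  # directories end with a slash
                        next_level.append(url.rstrip("/") + "/" + name)
            pending = next_level
    return results
```

A call such as `collect_externals("http://url/to/repository")` still sends two requests per directory, so the load on the server is comparable to the script above.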
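I ran into problems with os.popen(): when it is executed in a thread, it dies silently. I gave up on running it in threads and removed all of the threading parts from the script. I am sacrificing speed, but this script is more reliable than propget -R, because the latter dies silently when the repository is too big. - ceilfors
It is slow because of the -R switch; all directories under your repository path are searched recursively for the property, which is a lot of work.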
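This is not an ideal solution (it may have side effects) and it does not answer your question, but you could rewrite all externals definitions and add the rewritten definitions in one common, known location. That way the propget no longer needs to recurse after the change.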
If you don't mind using Python and the pysvn library, here is a complete command-line program that I use for SVN externals:
"""
@file
@brief SVN externals utilities.
@author Lukasz Matecki
"""
import sys
import os
import pysvn
import argparse


class External(object):

    def __init__(self, parent, remote_loc, local_loc, revision):
        self.parent = parent
        self.remote_loc = remote_loc
        self.local_loc = local_loc
        self.revision = revision

    def __str__(self):
        if self.revision.kind == pysvn.opt_revision_kind.number:
            return """\
Parent: {0}
Source: {1}@{2}
Local name: {3}""".format(self.parent, self.remote_loc, self.revision.number, self.local_loc)
        else:
            return """\
Parent: {0}
Source: {1}
Local name: {2}""".format(self.parent, self.remote_loc, self.local_loc)


def find_externals(client, repo_path, external_path=None):
    """
    @brief Find SVN externals.
    @param client (pysvn.Client) The client to use.
    @param repo_path (str) The repository path to analyze.
    @param external_path (str) The URL of the external to find; if omitted, all externals will be searched.
    @returns [External] The list of externals descriptors or empty list if none found.
    """
    repo_root = client.root_url_from_path(repo_path)

    def parse(ext_prop):
        for parent in ext_prop:
            external = ext_prop[parent]
            for line in external.splitlines():
                path, name = line.split()
                path = path.replace("^", repo_root)
                parts = path.split("@")
                if len(parts) > 1:
                    url = parts[0]
                    rev = pysvn.Revision(pysvn.opt_revision_kind.number, int(parts[1]))
                else:
                    url = parts[0]
                    rev = pysvn.Revision(pysvn.opt_revision_kind.head)
                retval = External(parent, url, name, rev)
                if external_path and not external_path == url:
                    continue
                else:
                    yield retval

    for entry in client.ls(repo_path, recurse=True):
        if entry["kind"] == pysvn.node_kind.dir and entry["has_props"] == True:
            externals = client.propget("svn:externals", entry["name"])
            if externals:
                for e in parse(externals):
                    yield e


def check_externals(client, externals_list):
    for i, e in enumerate(externals_list):
        url = e.remote_loc
        rev = e.revision
        try:
            info = client.info2(url, revision=rev, recurse=False)
            props = info[0][1]
            url = props.URL
            print("[{0}] Existing:\n{1}".format(i + 1, "\n".join([" {0}".format(line) for line in str(e).splitlines()])))
        except Exception:
            # Any failure to query the external is reported as "Not found"
            print("[{0}] Not found:\n{1}".format(i + 1, "\n".join([" {0}".format(line) for line in str(e).splitlines()])))


def main(cmdargs):
    parser = argparse.ArgumentParser(description="SVN externals processing.",
                                     formatter_class=argparse.RawDescriptionHelpFormatter,
                                     prefix_chars='-+')
    SUPPORTED_COMMANDS = ("check", "references")
    parser.add_argument(
        "action",
        type=str,
        default="check",
        choices=SUPPORTED_COMMANDS,
        help="""\
the operation to execute:
'check' to validate all externals in a given location;
'references' to print all references to a given location""")
    parser.add_argument(
        "url",
        type=str,
        help="the URL to operate on")
    parser.add_argument(
        "--repo", "-r",
        dest="repo",
        type=str,
        default=None,
        help="the repository (or path within) to perform the operation on, if omitted is inferred from url parameter")
    # Parse the arguments that were passed in, skipping the program name
    args = parser.parse_args(cmdargs[1:])
    client = pysvn.Client()
    if args.action == "check":
        externals = find_externals(client, args.url)
        check_externals(client, externals)
    elif args.action == "references":
        if args.repo:
            repo_root = args.repo
        else:
            repo_root = client.root_url_from_path(args.url)
        for i, e in enumerate(find_externals(client, repo_root, args.url)):
            print("[{0}] Reference:\n{1}".format(i + 1, "\n".join([" {0}".format(line) for line in str(e).splitlines()])))


if __name__ == "__main__":
    sys.exit(main(sys.argv))
This should work with both Python 2 and Python 3. You can use it like this (actual addresses removed):
python svn_externals.py references https://~~~~~~~~~~~~~~/cmd_utils.py
[1] Reference:
Parent: https://~~~~~~~~~~~~~~/BEFORE_MK2/scripts/utils
Source: https://~~~~~~~~~~~~~~/tools/python/cmd_utils.py
Local name: cmd_utils.py
[2] Reference:
Parent: https://~~~~~~~~~~~~~~/VTB-1425_PCU/scripts/utils
Source: https://~~~~~~~~~~~~~~/tools/python/cmd_utils.py
Local name: cmd_utils.py
[3] Reference:
Parent: https://~~~~~~~~~~~~~~/scripts/utils
Source: https://~~~~~~~~~~~~~~/tools/python/cmd_utils.py
Local name: cmd_utils.py
As for performance, this runs quite fast (although my repository is quite small). You will have to check it for yourself.
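One caveat: the `parse` helper in the program expects every definition to be exactly `URL[@REV] local_name`. Subversion also accepts the older `local_name [-r REV] URL` form and an optional `-r REV` in the newer form, and such lines would make `path, name = line.split()` fail. Below is a rough, standalone sketch of a more tolerant line parser; it is not part of the program above, the name `parse_external_line` and the returned dictionary layout are assumptions, and it does not resolve relative URLs.

```python
import shlex

# Prefixes Subversion uses for relative external URLs.
URL_PREFIXES = ("^/", "../", "//", "/")


def looks_like_url(token):
    return "://" in token or token.startswith(URL_PREFIXES)


def parse_external_line(line):
    """Parse one svn:externals line; return None for blanks and comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    revision = None
    rest = []
    tokens = iter(shlex.split(stripped))
    for tok in tokens:
        if tok == "-r":                      # "-r 42"
            revision = next(tokens, None)
        elif tok.startswith("-r") and len(tok) > 2:
            revision = tok[2:]               # "-r42"
        else:
            rest.append(tok)
    if len(rest) != 2:
        raise ValueError("unrecognized externals line: %r" % line)
    if looks_like_url(rest[0]):
        url, local = rest                    # newer form: URL first
    else:
        local, url = rest                    # older form: local path first
    peg = None
    base, sep, suffix = url.rpartition("@")
    if sep and suffix.isdigit():             # numeric peg revision only
        url, peg = base, suffix
    return {"url": url, "local": local, "revision": revision, "peg": peg}
```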
file:// gives me "Aborted" after a few seconds. Similar to http://svn.haxx.se/users/archive-2007-04/0500.shtml. - ceilfors