What is the best way to make a Python script copy itself?

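I'm using Python for a scientific application. I run simulations with various parameters, and my script writes the data to the corresponding directories. I use that data later. However, I sometimes edit my script, and to be able to reproduce my results when needed, I'd like to keep a copy of the exact version of the script that generated the data directly in the data directory. So basically I want my Python script to copy itself into the data directory. What is the best way to do this?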
Thanks!

Ah, I see. You want to build a virus? :) - sshashank124
Why not use a version control system (VCS) and store the script's version identifier together with the data? - Martijn Pieters
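A minimal sketch of that suggestion, assuming the script lives in a git working tree and the `git` executable is on the PATH: it writes the current commit hash, flagged "(dirty)" if there are modified or untracked files, into a text file in the data directory. The function names `_git` and `record_git_revision` and the file name `git_revision.txt` are hypothetical.

```python
import os
import subprocess


def _git(repo_dir, *args):
    """Run a git command in repo_dir; return its stdout, or None on any failure."""
    try:
        result = subprocess.run(
            ["git"] + list(args),
            cwd=repo_dir,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git not installed, repo_dir missing, or not inside a repository
        return None
    return result.stdout


def record_git_revision(data_dir, repo_dir):
    """Write the HEAD commit hash used for this run to data_dir/git_revision.txt
    and return the recorded line ("unknown" if no revision could be found)."""
    rev = _git(repo_dir, "rev-parse", "HEAD")
    status = _git(repo_dir, "status", "--porcelain")
    if rev is None:
        line = "unknown"
    else:
        line = rev.strip()
        if status:  # non-empty porcelain output: modified or untracked files
            line += " (dirty)"
    os.makedirs(data_dir, exist_ok=True)
    with open(os.path.join(data_dir, "git_revision.txt"), "w") as f:
        f.write(line + "\n")
    return line
```

The "(dirty)" flag matters because a commit hash alone does not identify the code if the script was edited but not committed before the run. Note that a data directory inside the repository also counts as untracked files unless it is listed in `.gitignore`.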
But if you insist: a Python script has a __file__ global variable; you can use it together with shutil.copy() to copy the file somewhere else. - Martijn Pieters
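I once had the same problem. The solution I came up with was to store the parameters in a JSON file that is kept in sync with a parameter class in the Python script. Every time a simulation runs, the parameter class writes a JSON file that is stored together with the simulation data. - jrsm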
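A minimal sketch of that approach, assuming the simulation parameters fit in a flat dataclass of JSON-serializable values. The class `SimulationParameters`, its fields, and the file name `parameters.json` are hypothetical placeholders.

```python
import json
import os
from dataclasses import asdict, dataclass


@dataclass
class SimulationParameters:
    # Hypothetical example fields; replace with the real simulation inputs.
    n_steps: int = 1000
    dt: float = 0.01
    seed: int = 42

    def save(self, data_dir):
        """Write the parameters as JSON next to the simulation output."""
        os.makedirs(data_dir, exist_ok=True)
        path = os.path.join(data_dir, "parameters.json")
        with open(path, "w") as f:
            json.dump(asdict(self), f, indent=2, sort_keys=True)
        return path

    @classmethod
    def load(cls, data_dir):
        """Rebuild the parameter object from a previously saved run."""
        with open(os.path.join(data_dir, "parameters.json")) as f:
            return cls(**json.load(f))
```

Usage: `SimulationParameters(dt=0.005).save("run_001")` when the run starts, and `SimulationParameters.load("run_001")` to rerun it later. This records the inputs but not the code, so it pairs well with a version identifier as suggested above.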
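Git is just one example (because it's what I use every day now), but there are other options. Try them and see what your workplace uses or supports. I used to work in scientific research, and I can tell you it is well worth sorting these things out early. In the long run it really makes your life easier and leaves you more time to do good science. - juanchopanza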
3 Answers

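I stumbled upon this question because I wanted to do the same thing. Although I agree that git or another VCS, with revisions and everything, is the cleanest solution, sometimes you just want something quick and dirty. So, in case anyone is still interested: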

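__file__ gives you the filename (including its path) of the running script, and, as already suggested, you can use a high-level file-operations module such as shutil to copy it somewhere. In one command: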

shutil.copy(__file__, 'experiment_folder_path/copied_script_name.py') 

With the corresponding imports and some bells and whistles:

import shutil
import os     # optional: for extracting basename / creating new filepath
import time   # optional: for appending time string to copied script

# generate filename with timestring
copied_script_name = time.strftime("%Y-%m-%d_%H%M") + '_' + os.path.basename(__file__)

# copy script
shutil.copy(__file__, os.path.join('my_experiment_folder_path', copied_script_name))
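A slightly more defensive variant of the snippet above, as a sketch: it resolves the script path to an absolute path (__file__ can be relative, depending on how the script was started), creates the target directory if it does not exist, and uses shutil.copy2 so the copy keeps the original's modification time. The function name `save_script_copy` is hypothetical, and since __file__ is not defined in an interactive session or a Jupyter notebook, this only works when running a script file.

```python
import os
import shutil
import time


def save_script_copy(dest_dir, script_path):
    """Copy script_path into dest_dir under a timestamped name; return the new path."""
    script_path = os.path.abspath(script_path)
    os.makedirs(dest_dir, exist_ok=True)
    stamp = time.strftime("%Y-%m-%d_%H%M%S")
    target = os.path.join(dest_dir, stamp + "_" + os.path.basename(script_path))
    shutil.copy2(script_path, target)  # copy2 also preserves file metadata such as mtime
    return target
```

In the script itself you would call `save_script_copy('my_experiment_folder_path', __file__)` near the start of the run.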

You can copy the script with shutil.copy().

But you should consider keeping your scripts under version control. That way you keep a revision history.

For example, I keep my scripts under version control with git. In Python files I tend to keep a version string like this:

__version__ = '$Revision: a42ef58 $'[11:-2]

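This version string is updated with the git short hash whenever the file in question changes. (This is done by running a script called update-modified-keywords.py from git's post-commit hook.)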
If you have a version string like this, you can embed it in the output, so you always know which version produced it.
Edit:
Here is the update-modified-keywords script as an example:
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
#
# Author: R.F. Smith <rsmith@xs4all.nl>
# $Date: 2013-11-24 22:20:54 +0100 $
# $Revision: 3d4f750 $
#
# To the extent possible under law, Roland Smith has waived all copyright and
# related or neighboring rights to update-modified-keywords.py. This work is
# published from the Netherlands.
# See http://creativecommons.org/publicdomain/zero/1.0/

"""Remove and check out those files in the current working directory that
contain keywords and have changed in the last commit."""

from __future__ import print_function, division
import os
import mmap
import sys
import subprocess


def checkfor(args):
    """Make sure that a program necessary for using this script is
    available.

    Arguments:
    args -- string or list of strings of commands. A single string may
            not contain spaces.
    """
    if isinstance(args, str):
        if ' ' in args:
            raise ValueError('No spaces in single command allowed.')
        args = [args]
    try:
        with open(os.devnull, 'w') as bb:
            subprocess.check_call(args, stdout=bb, stderr=bb)
    except (OSError, subprocess.CalledProcessError):
        print("Required program '{}' not found! exiting.".format(args[0]))
        sys.exit(1)


def modifiedfiles():
    """Find files that have been modified in the last commit.

    :returns: A list of filenames.
    """
    fnl = []
    try:
        args = ['git', 'diff-tree', 'HEAD~1', 'HEAD', '--name-only', '-r',
                '--diff-filter=ACMRT']
        with open(os.devnull, 'w') as bb:
            fnl = subprocess.check_output(args, stderr=bb).splitlines()
            # Deal with unmodified repositories
            if len(fnl) == 1 and fnl[0] == 'clean':
                return []
    except subprocess.CalledProcessError as e:
        if e.returncode == 128:  # new repository
            args = ['git', 'ls-files']
            with open(os.devnull, 'w') as bb:
                fnl = subprocess.check_output(args, stderr=bb).splitlines()
    # Only return regular files.
    fnl = [i for i in fnl if os.path.isfile(i)]
    return fnl


def keywordfiles(fns):
    """Filter those files that have keywords in them

    :fns: A list of filenames
    :returns: A list of filenames for files that contain keywords.
    """
    # These lines are encoded otherwise they would be mangled if this file
    # is checked in my git repo!
    datekw = 'JERhdGU='.decode('base64')
    revkw = 'JFJldmlzaW9u'.decode('base64')
    rv = []
    for fn in fns:
        with open(fn, 'rb') as f:
            try:
                mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
                if mm.find(datekw) > -1 or mm.find(revkw) > -1:
                    rv.append(fn)
                mm.close()
            except ValueError:
                pass
    return rv


def main(args):
    """Main program.

    :args: command line arguments
    """
    # Check if git is available.
    checkfor(['git', '--version'])
    # Check if .git exists
    if not os.access('.git', os.F_OK):
        print('No .git directory found!')
        sys.exit(1)
    print('{}: Updating modified files.'.format(args[0]))
    # Get modified files
    files = modifiedfiles()
    if not files:
        print('{}: Nothing to do.'.format(args[0]))
        sys.exit(0)
    files.sort()
    # Find files that have keywords in them
    kwfn = keywordfiles(files)
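    if not kwfn:
        # Without paths, 'git checkout -f' would discard all local changes.
        print('{}: No files with keywords.'.format(args[0]))
        sys.exit(0)
    for fn in kwfn:
        os.remove(fn)
    args = ['git', 'checkout', '-f'] + kwfn
    subprocess.call(args)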


if __name__ == '__main__':
    main(sys.argv)

If you don't want keyword expansion to clutter your git history, you can use smudge and clean filters. I have the following set in my ~/.gitconfig:

[filter "kw"]
    clean = kwclean
    smudge = kwset

kwclean and kwset are both Python scripts.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Author: R.F. Smith <rsmith@xs4all.nl>
# $Date: 2013-11-24 22:20:54 +0100 $
#
# To the extent possible under law, Roland Smith has waived all copyright and
# related or neighboring rights to kwset.py. This work is published from
# the Netherlands. See http://creativecommons.org/publicdomain/zero/1.0/

"""Fill in the Date and Revision keywords in the standard input from the
   latest git commit and tag."""

import os
import sys
import subprocess
import re


def gitdate():
    """Get the date from the latest commit in ISO8601 format.
    """
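    args = ['git', 'log', '-1', '--date=iso']
    # universal_newlines=True returns text (not bytes) on Python 3 as well.
    out = subprocess.check_output(args, universal_newlines=True)
    dline = [l for l in out.splitlines() if l.startswith('Date')]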
    try:
        dat = dline[0][5:].strip()
        return ''.join(['$', 'Date: ', dat, ' $'])
    except IndexError:
        raise ValueError('Date not found in git output')


def gitrev():
    """Get the latest tag and use it as the revision number. This presumes the
    habit of using numerical tags. Use the short hash if no tag available.
    """
    args = ['git', 'describe',  '--tags', '--always']
    try:
        with open(os.devnull, 'w') as bb:
            r = subprocess.check_output(args, stderr=bb,
                                        universal_newlines=True).strip()
    except subprocess.CalledProcessError:
        return ''.join(['$', 'Revision', '$'])
    return ''.join(['$', 'Revision: ', r, ' $'])


def main():
    """Main program.
    """
    dre = re.compile(''.join([r'\$', r'Date:?\$']))
    rre = re.compile(''.join([r'\$', r'Revision:?\$']))
    currp = os.getcwd()
    if not os.path.exists(currp+'/.git'):
        sys.stderr.write('This directory is not controlled by git!\n')
        sys.exit(1)
    date = gitdate()
    rev = gitrev()
    for line in sys.stdin:
        line = dre.sub(date, line)
        sys.stdout.write(rre.sub(rev, line))


if __name__ == '__main__':
    main()

And

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Author: R.F. Smith <rsmith@xs4all.nl>
# $Date: 2013-11-24 22:20:54 +0100 $
#
# To the extent possible under law, Roland Smith has waived all copyright and
# related or neighboring rights to kwclean.py. This work is published from the
# Netherlands. See http://creativecommons.org/publicdomain/zero/1.0/

"""Remove the Date and Revision keyword contents from the standard input."""

import sys
import re

## This is the main program ##
if __name__ == '__main__':
    dre = re.compile(''.join([r'\$', r'Date.*\$']))
    drep = ''.join(['$', 'Date', '$'])
    rre = re.compile(''.join([r'\$', r'Revision.*\$']))
    rrep = ''.join(['$', 'Revision', '$'])
    for line in sys.stdin:
        line = dre.sub(drep, line)
        sys.stdout.write(rre.sub(rrep, line))

Both scripts are installed in my directory (as is usual for executables, without a filename extension) and have their executable bit set.

In my repository's .gitattributes file I select which files need keyword expansion. For example, for Python files:

*.py filter=kw
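To see what the two filters do to a single line, here is a sketch that reuses the Revision regular expressions from kwclean and kwset as plain functions. A fixed revision string stands in for the `git describe` output (an assumption, so the example runs without a repository), and the function names `clean` and `smudge` are only illustrative.

```python
import re

# Same pattern as kwclean: match any $Revision...$ keyword, expanded or not.
CLEAN_RE = re.compile(r"\$Revision.*\$")
# Same pattern as kwset: only match the empty placeholder ($Revision$ or $Revision:$).
SMUDGE_RE = re.compile(r"\$Revision:?\$")


def clean(line):
    """What git stores in the repository: the keyword with its contents removed."""
    return CLEAN_RE.sub("$Revision$", line)


def smudge(line, rev):
    """What appears in the working tree: the placeholder filled with rev."""
    return SMUDGE_RE.sub("$Revision: {} $".format(rev), line)


line = "__version__ = '$Revision: a42ef58 $'[11:-2]\n"
stored = clean(line)               # "__version__ = '$Revision$'[11:-2]\n"
restored = smudge(stored, "a42ef58")
```

Because the repository only ever sees the cleaned placeholder, a changed revision string in the working tree does not show up as a content change in the committed history.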

I implemented your suggestion. So if you run, e.g., git status, do you just have to accept that your script appears modified? - Kai Sikorski
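@KaiSikorski Not necessarily. If you use the kwset and kwclean filters as smudge and clean filters, as shown in the updated answer, you can have up-to-date keywords in the working directory without cluttering the commit history. - Roland Smith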

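If you are on Linux, you can call cp through the shell (replace scriptname and data_dir with the actual script name and target directory):
import os
os.system("cp ./scriptname ./data_dir/")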

Simple and sweet! - smerllo
@Cs20 Looking back, this seems a lot like a fork bomb. Oh well. - Beta Decay

Content provided by Stack Overflow.