Running Snakemake with a rule in a loop

Question

10

I want to put a Snakemake rule inside a loop so that the rule can take the output of the previous iteration as its input. Is that possible, and if so, how?

Here is my example:

Set up the test data

mkdir -p test
echo "SampleA" > test/SampleA.txt
echo "SampleB" > test/SampleB.txt

Snakemake

SAMPLES = ["SampleA", "SampleB"]

rule all:
    input:
        # Output of the final loop
        expand("loop3/{sample}.txt", sample = SAMPLES)


#### LOOP ####
for i in list(range(1, 4)):
    # Setup prefix for input
    if i == 1:
        prefix = "test"
    else:
        prefix = "loop%s" % str(i-1)

    # Setup prefix for output
    opref =  "loop%s" % str(i)

    # Rule
    rule loop_rule:
        input:
            prefix+"/{sample}.txt"
        output:
            prefix+"/{sample}.txt"
            #expand("loop{i}/{sample}.txt", i = i, sample = wildcards.sample)
        params:
            add=prefix
        shell:
            "awk '{{print $0, {params.add}}}' {input} > {output}"

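Trying to run the example yields the error: CreateRuleException in line 26 of /Users/fabiangrammes/Desktop/Projects/snake_loop/Snakefile: The name loop_rule is already used by another rule. If anyone spots a way to get this working, please let me know. Thanks!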

- Fabian_G

2 Answers

6

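My understanding is that your rules are converted to Python code before running, and that all the raw Python code in your Snakefile is run sequentially during this process. Think of it as the snakemake rules being evaluated as Python functions.

But there is a constraint: any rule can be evaluated to a function only once.

You can use if/else expressions to evaluate a rule differently (once) depending on config values, but you can't evaluate the same rule multiple times.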
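
To make the constraint concrete, here is a minimal toy model in plain Python. It is not Snakemake's real internals: `ToyWorkflow`, `add_rule` and `define_rules_in_loop` are hypothetical names invented for this sketch. It only imitates the one behaviour described above, that a rule name can be registered once, and replays the question's for-loop against it.

```python
# Toy model (NOT Snakemake's actual implementation): a registry that,
# like Snakemake, refuses to register the same rule name twice.
class ToyWorkflow:
    def __init__(self):
        self.rules = {}

    def add_rule(self, name, input_pattern, output_pattern):
        if name in self.rules:
            raise ValueError("The name %s is already used by another rule" % name)
        self.rules[name] = (input_pattern, output_pattern)


def define_rules_in_loop(workflow, n_iterations):
    """Imitate the question's for-loop: the same rule name on every pass."""
    for i in range(1, n_iterations + 1):
        prefix = "test" if i == 1 else "loop%d" % (i - 1)
        workflow.add_rule("loop_rule", prefix + "/{sample}.txt", "loop%d/{sample}.txt" % i)


wf = ToyWorkflow()
try:
    define_rules_in_loop(wf, 3)
    error = None
except ValueError as exc:
    # The second pass through the loop fails; only the first rule exists.
    error = str(exc)
print(error)
```

In this sketch the first pass succeeds and the second pass raises, which mirrors the shape of the CreateRuleException in the question.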

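I'm not really sure how to rewrite your Snakefile to achieve what you want. Could you give a real example where a looping construct is needed?

--- Edit

For a fixed number of iterations, it may be possible to use an input function to run the rule several times. (I'd advise against doing this, and be very careful to rule out infinite loops.)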

SAMPLES = ["SampleA", "SampleB"]

rule all:
    input:
        # Output of the final loop
        expand("loop3/{sample}.txt", sample = SAMPLES)

def looper_input(wildcards):
    # could be written more cleanly with a dictionary
    if wildcards.prefix == "loop0":
        input = "test/{}.txt".format(wildcards.sample)
    elif wildcards.prefix == "loop1":
        input = "loop0/{}.txt".format(wildcards.sample)
    ...
    return input


rule looper:
    input:
        looper_input
    output:
        "{prefix}/{sample}.txt"
    params:
        # the {prefix} wildcard is filled in for each job
        add="{prefix}"
    shell:
        # pass the value in with -v so awk prints it as a string
        "awk -v add={params.add} '{{print $0, add}}' {input} > {output}"

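The dictionary variant hinted at in the comment above could look roughly like this. It is only a sketch under assumptions: `N_ITERATIONS = 3` is inferred from the `loop3/` target in `rule all`, the `loop2` and `loop3` entries extrapolate the pattern of the two branches shown, and `SimpleNamespace` stands in for Snakemake's wildcards object so the function can be tried outside a workflow.

```python
from types import SimpleNamespace

# Assumption: three iterations after loop0, matching the loop3/ target in rule all.
N_ITERATIONS = 3

# loop0 reads from test/; loopK reads from loop(K-1)/.
PREVIOUS_DIR = {"loop0": "test"}
PREVIOUS_DIR.update(
    {"loop%d" % k: "loop%d" % (k - 1) for k in range(1, N_ITERATIONS + 1)}
)


def looper_input(wildcards):
    """Return the input path for one (prefix, sample) pair."""
    try:
        previous = PREVIOUS_DIR[wildcards.prefix]
    except KeyError:
        # Failing loudly on unknown prefixes keeps the recursion bounded.
        raise ValueError("unexpected prefix: %s" % wildcards.prefix)
    return "%s/%s.txt" % (previous, wildcards.sample)


# Stand-in for the wildcards object Snakemake would pass in.
wc = SimpleNamespace(prefix="loop1", sample="SampleA")
print(looper_input(wc))
```

Because every allowed prefix is listed explicitly, any directory name outside the table raises an error instead of recursing further.
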
- Russ Hyde

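Thanks for your input, Russ. My real-world example is an iterative estimation of SNP effects, so I have to iterate. Does anyone know whether rule names can be assigned by a function? That might be a possible solution to my problem. - Fabian_G

Is there a reason you can't define the loop inside the run/shell of a rule? - Russ Hyde

Maybe that could work, but I'm not sure. In reality I have 4-5 separate rules. I'll try it tonight. Travelling right now. - Fabian_G

Is this a fixed number of iterations, or iteration until convergence? If it's the former, you could use input functions to allow the iteration. I'll try to add some code. - Russ Hyde

Thanks for your effort, Russ! I'll accept merv's answer because it is more elegant, but I really appreciate your help. - Fabian_G

No problem, but be really sure you've got the recursion right. Infinite loops will mess up your hard drive. - Russ Hyde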

- merv · Accepted Answer

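I think this is a good opportunity to use recursive programming. Rather than explicitly including a conditional for every iteration, write a single rule that transitions from iteration (n-1) to n. Something along these lines: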

SAMPLES = ["SampleA", "SampleB"]

rule all:
    input:
        expand("loop3/{sample}.txt", sample=SAMPLES)

def recurse_sample(wcs):
    n = int(wcs.n)
    if n == 1:
        return "test/%s.txt" % wcs.sample
    elif n > 1:
        return "loop%d/%s.txt" % (n-1, wcs.sample)
    else:
        raise ValueError("loop numbers must be 1 or greater: received %s" % wcs.n)

rule loop_n:
    input: recurse_sample
    output: "loop{n}/{sample}.txt"
    wildcard_constraints:
        sample="[^/]+",
        n="[0-9]+"
    shell:
        """
        awk -v loop='loop{wildcards.n}' '{{print $0, loop}}' {input} > {output}
        """

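As @RussHyde said, you need to be proactive about making sure an infinite loop doesn't get triggered. To that end, we make sure all cases are covered in recurse_sample, and use wildcard_constraints to make sure the matching is precise.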
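
One way to convince yourself that the recursion terminates is to trace it outside Snakemake. The sketch below is plain Python and makes some assumptions: `OUTPUT_RE` is a hand-written approximation of the rule's output pattern combined with its wildcard_constraints, `trace_chain` is a hypothetical helper, and `SimpleNamespace` stands in for the wildcards object. Only `recurse_sample` is taken from the answer.

```python
import re
from types import SimpleNamespace


def recurse_sample(wcs):
    n = int(wcs.n)
    if n == 1:
        return "test/%s.txt" % wcs.sample
    elif n > 1:
        return "loop%d/%s.txt" % (n - 1, wcs.sample)
    else:
        raise ValueError("loop numbers must be 1 or greater: received %s" % wcs.n)


# Approximation of "loop{n}/{sample}.txt" with n="[0-9]+" and sample="[^/]+".
OUTPUT_RE = re.compile(r"loop(?P<n>[0-9]+)/(?P<sample>[^/]+)\.txt")


def trace_chain(target):
    """Follow inputs back from target until no loop_n-style output matches."""
    chain = [target]
    match = OUTPUT_RE.fullmatch(chain[-1])
    while match:
        wildcards = SimpleNamespace(**match.groupdict())
        chain.append(recurse_sample(wildcards))
        match = OUTPUT_RE.fullmatch(chain[-1])
    return chain


print(trace_chain("loop3/SampleA.txt"))
```

The chain ends at `test/SampleA.txt` because that path no longer matches the `loop{n}` pattern, and a request such as `loop0/SampleA.txt` raises the ValueError instead of recursing.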