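I have a Snakemake pipeline in which I need to apply one small processing step to the data (a rolling average over a DataFrame).
I would like to write something like this: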
rule average_df:
    input:
        # script = ,
        df_raw = "{sample}_raw.csv"
    params:
        window = 83
    output:
        df_avg = "{sample}_avg.csv"
    shell:
        """
        python
        import pandas as pd
        df=pd.read_csv("{input.df_raw}")
        df=df.rolling(window={params.window}, center=True, min_periods=1).mean()
        df.to_csv("{output.df_avg}")
        """
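However, it does not work.
Do I need to create a Python file containing these four lines of code? The alternative I came up with is somewhat cumbersome. It would be: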
average_df.py
import pandas as pd

def average_df(i_path, o_path, window):
    # Read the raw data, apply a centered rolling mean, and write the result.
    df = pd.read_csv(i_path)
    df = df.rolling(window=window, center=True, min_periods=1).mean()
    df.to_csv(o_path)

if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(description='Description of your program')
    parser.add_argument('-i_path', '--input_path', help='csv file', required=True)
    parser.add_argument('-o_path', '--output_path', help='csv file', required=True)
    # rolling() needs an integer window, so convert the command-line string.
    parser.add_argument('-w', '--window', help='window for averaging', type=int, required=True)
    args = vars(parser.parse_args())
    i_path = args['input_path']
    o_path = args['output_path']
    window = args['window']
    average_df(i_path, o_path, window)
Then the Snakemake rule would look like this:
rule average_df:
    input:
        script = "average_df.py",
        df_raw = "{sample}_raw.csv"
    params:
        window = 83
    output:
        df_avg = "{sample}_avg.csv"
    shell:
        """
        python {input.script} --input_path {input.df_raw} --output_path {output.df_avg} --window {params.window}
        """
Is there a smarter or more efficient way to do this? That would be great! I look forward to your input.
Use run: instead of shell:
See: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#snakefiles-and-rules - Alex
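To make the run: suggestion concrete, here is a minimal sketch of how it might look. It assumes pandas is installed in the environment Snakemake runs in, and that the helper function is defined at the top of the Snakefile (a Snakefile accepts ordinary Python). The rule itself is shown as comments so that the block remains plain Python. The rule and file names are taken from the question.

```python
import pandas as pd


def average_df(i_path, o_path, window):
    """Apply a centered rolling mean to a CSV file and write the result."""
    df = pd.read_csv(i_path)
    df = df.rolling(window=window, center=True, min_periods=1).mean()
    df.to_csv(o_path)


# In the Snakefile, the rule could then call the function directly.
# Inside run:, input/output/params are Python objects, so no braces
# and no string formatting are needed:
#
# rule average_df:
#     input:
#         df_raw = "{sample}_raw.csv"
#     params:
#         window = 83
#     output:
#         df_avg = "{sample}_avg.csv"
#     run:
#         average_df(input.df_raw, output.df_avg, params.window)
```

With this layout there is no separate script file and no argparse boilerplate, and params.window stays an integer rather than being converted to a command-line string. If the code grows beyond a few lines, Snakemake's script: directive is another option to consider; it runs an external Python file and exposes the same values through a snakemake object.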