matthew_stone | 2 years ago
Here's a simple scatter-gather example. Let's say you want to count the number of lines in each file for a list of samples, and report a table of counts collected from each sample. Define a rule to process each input file, and a rule to collect the results.
I find this much less complex than an equivalent bash workflow. These rules are also easy to containerize, the workflow parallelizes across samples, it survives interruption, and new samples can be added without redoing finished work. Snakemake checks which output files already exist and runs only the rules needed to create the missing ones, logic that is much more finicky to implement by hand in bash.
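To see why that up-to-date logic gets finicky in bash, here is a minimal sketch of the freshness check Snakemake performs for every rule automatically: rebuild an output only if it is missing or older than its input. The file names (data.txt, counts.txt) are made up for illustration.

```shell
# Hypothetical input file
printf 'a\nb\n' > data.txt

out=counts.txt
# Rebuild only if the output is missing, or the input is newer than it
if [ ! -e "$out" ] || [ data.txt -nt "$out" ]; then
    wc -l < data.txt > "$out"
fi
cat "$out"
```

And this covers just one input/output pair; Snakemake applies the same check across every edge of the workflow graph.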
with open('data/samples.txt') as slist:
    SAMPLES = [l.strip() for l in slist.readlines()]

rule all:
    input:
        "results/line_counts.txt"

rule count_lines:
    input:
        "data/lines/{sample}.txt"
    output:
        "processed/count_lines/{sample}.txt"
    shell:
        """
        cat {input} |
        wc -l |
        paste <(echo -e {wildcards.sample}) - > {output}
        """

rule collect_counts:
    input:
        expand("processed/count_lines/{sample}.txt", sample=SAMPLES)
    output:
        "results/line_counts.txt"
    shell:
        """
        cat <(echo -e "sample\tn_lines") {input} > {output}
        """
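As a sanity check, the one-liner inside count_lines can be tried on its own outside Snakemake. This sketch uses a made-up sample name (s1) and a throwaway three-line input file; note that <(...) is bash process substitution, so it needs bash rather than plain sh.

```shell
# Hypothetical three-line input file for sample "s1"
printf 'line1\nline2\nline3\n' > s1.txt

# Count its lines and prefix the count with the sample name, tab-separated,
# mirroring what count_lines does with {input}, {wildcards.sample}, {output}
cat s1.txt | wc -l | paste <(echo -e s1) - > s1.count.txt
cat s1.count.txt
```

The result is one tab-separated row per sample, which is exactly what collect_counts concatenates under the "sample\tn_lines" header.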