Snakemake

💡 Please have a look at our tutorial for beginners: tutorials/snakemake

How to deal with the dependencies?

Overview

Snakemake can work with several dependency management systems:

  • Conda
  • Module (environment modules)
  • Singularity
  • Docker (not available on the IFB Core HPC Cluster)

Example

Requirements

The data are available in a Zenodo record: doi.org/10.5281/zenodo.3997236

The file structure:

./
├── data
│   ├── SRR3099585_chr18.fastq.gz
│   ├── SRR3099586_chr18.fastq.gz
│   └── SRR3099587_chr18.fastq.gz
├── envs
│   ├── fastqc-0.11.9.yml
│   └── multiqc-1.9.yml
├── multiqc.smk
├── [...]
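
Before launching the workflow, it can help to confirm that this layout is in place. The following is a minimal sketch in POSIX shell, assuming the file names shown in the tree above; `check_layout` is a hypothetical helper introduced for illustration, not part of Snakemake.

```shell
# Hypothetical helper: report which of the expected files are missing.
# The function body runs in a subshell, so its variables do not leak.
check_layout() (
  root="${1:-.}"   # directory to check, defaults to the current one
  missing=0
  for f in \
    data/SRR3099585_chr18.fastq.gz \
    data/SRR3099586_chr18.fastq.gz \
    data/SRR3099587_chr18.fastq.gz \
    envs/fastqc-0.11.9.yml \
    envs/multiqc-1.9.yml \
    multiqc.smk
  do
    if [ ! -e "$root/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  # Exit status is 0 only when every expected file was found.
  [ "$missing" -eq 0 ]
)
```

Calling `check_layout .` from the project directory prints one `missing: ...` line per absent file and returns a non-zero status if any file is absent. The files under envs/ are only needed when running with Conda.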

The snakefile

  • multiqc.smk
SAMPLES = ["SRR3099585_chr18","SRR3099586_chr18","SRR3099587_chr18"]

rule all:
  input:
    expand("FastQC/{sample}_fastqc.html", sample=SAMPLES),
    "multiqc_report.html"

rule multiqc:
  output:
    "multiqc_report.html"
  input:
    expand("FastQC/{sample}_fastqc.zip", sample = SAMPLES)
  log:
    std="Logs/multiqc.std",
    err="Logs/multiqc.err"
  conda:
    "envs/multiqc-1.9.yml"
  container:
    "https://depot.galaxyproject.org/singularity/multiqc:1.10.1--pyhdfd78af_1"
  envmodules:
    "multiqc/1.9"
  shell: "multiqc {input} 1>{log.std} 2>{log.err}"

rule fastqc:
  output:
    "FastQC/{sample}_fastqc.zip",
    "FastQC/{sample}_fastqc.html"
  input:
    "data/{sample}.fastq.gz"
  log:
    std="Logs/{sample}_fastqc.std",
    err="Logs/{sample}_fastqc.err"
  conda:
    "envs/fastqc-0.11.9.yml"
  container:
    "docker://biocontainers/fastqc:v0.11.9_cv8"
  envmodules:
    "fastqc/0.11.9"
  shell: "fastqc --outdir FastQC/ {input} 1>{log.std} 2>{log.err}"
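
To make explicit which files `rule all` requests, the `expand()` call can be mimicked in plain shell. This is only an illustrative sketch: `list_targets` is a hypothetical helper, and the sample names and path pattern are copied from the Snakefile above.

```shell
# Sample names copied from the SAMPLES list in the Snakefile.
SAMPLES="SRR3099585_chr18 SRR3099586_chr18 SRR3099587_chr18"

# Hypothetical helper: print the targets that rule all asks for,
# i.e. one FastQC HTML report per sample plus the MultiQC report.
list_targets() {
  for sample in $SAMPLES; do
    echo "FastQC/${sample}_fastqc.html"
  done
  echo "multiqc_report.html"
}
```

Snakemake works backwards from these targets: each `FastQC/{sample}_fastqc.html` is matched by the output of `rule fastqc`, and `multiqc_report.html` by `rule multiqc`. A dry run with `snakemake -n -s multiqc.smk` lists the jobs Snakemake would schedule without executing them.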

[optional] The conda env files

  • envs/fastqc-0.11.9.yml
channels:
 - conda-forge
 - bioconda
 - defaults
dependencies:
 - bioconda::fastqc=0.11.9
  • envs/multiqc-1.9.yml
channels:
 - conda-forge
 - bioconda
 - defaults
dependencies:
 - bioconda::multiqc=1.9
    

Run

With the same Snakefile, you can test three different dependency management systems.

💡 Environment modules are the quickest solution if you are using the IFB Core HPC Cluster, but probably not the most portable one.

Use Module

module purge; module load snakemake slurm-drmaa
# cleanup
snakemake -c 1 -s multiqc.smk --delete-all-output; rm -rf multiqc_*

snakemake --drmaa --jobs=3 -s multiqc.smk --use-envmodules

Use Conda

module purge; module load snakemake slurm-drmaa conda
# cleanup
snakemake -c 1 -s multiqc.smk --delete-all-output; rm -rf multiqc_*

snakemake --drmaa --jobs=3 -s multiqc.smk --use-conda

Use Singularity

module purge; module load snakemake slurm-drmaa singularity
# cleanup
snakemake -c 1 -s multiqc.smk --delete-all-output; rm -rf multiqc_*

snakemake --drmaa --jobs=3 -s multiqc.smk --use-singularity
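
The three runs differ only in the modules loaded and in one Snakemake option, which selects the directive (`envmodules:`, `conda:` or `container:`) that Snakemake uses for each rule. A minimal sketch of that mapping, assuming the option names as documented by Snakemake; `deploy_flag` is a hypothetical helper, not a Snakemake command.

```shell
# Hypothetical helper: map a dependency system to the Snakemake option
# that activates the matching directive in the Snakefile.
deploy_flag() {
  case "$1" in
    module)      echo "--use-envmodules" ;;   # uses the envmodules: directive
    conda)       echo "--use-conda" ;;        # uses the conda: directive
    singularity) echo "--use-singularity" ;;  # uses the container: directive
    *)           echo "unknown dependency system: $1" >&2; return 1 ;;
  esac
}
```

For instance, `snakemake --drmaa --jobs=3 -s multiqc.smk "$(deploy_flag conda)"` would be equivalent to the Conda run, provided the corresponding modules have been loaded first. When none of these options is given, Snakemake ignores the three directives and expects `fastqc` and `multiqc` to be available on the PATH.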