Joint variant calling with GATK4 HaplotypeCaller, Google DeepVariant 1.0.0 and Strelka2, coordinated via Snakemake.
git clone https://github.com/NCI-CGR/GEMSCAN.git
This clones the GEMSCAN GitHub repository into your current directory.
Create config/samples.txt and config/config.yaml as described in the Inputs and outputs section.
Go to the workflow directory:
cd <cloned repo dir>/workflow
You might want to create a submission script that looks like this:
#!/bin/bash
#SBATCH --time=2-00:00:00
#SBATCH -o <cloned repo dir>/workflow/snakemake.out
#SBATCH -e <cloned repo dir>/workflow/snakemake.err
module load singularity
export SINGULARITY_CACHEDIR=/data/<yourusername>/
eval "$(conda shell.bash hook)"
conda activate snakemake
snakemake --profile profiles/biowulf --use-singularity --singularity-args "--bind /data/$USER,/fdb,/lscratch" --jobs 100
If you have many samples, the main Snakemake process has to keep running until all tasks finish, which may exceed the maximum wall time of the norm partition on Biowulf. In that case, you may want to use the unlimited partition instead:
#!/bin/bash
#SBATCH --partition=unlimited
#SBATCH -o <cloned repo dir>/workflow/snakemake.out
#SBATCH -e <cloned repo dir>/workflow/snakemake.err
module load singularity
export SINGULARITY_CACHEDIR=/data/<yourusername>/
eval "$(conda shell.bash hook)"
conda activate snakemake
snakemake --profile profiles/biowulf --use-singularity --singularity-args "--bind /data/$USER,/fdb,/lscratch" --jobs 100
Note that if any data you use lives outside the directories listed in the example scripts above, you must add those directories to --singularity-args after --bind so that Singularity binds them as well.
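As a minimal sketch of how an extra directory could be added to the bind list, the helper below joins any number of paths into a single --bind argument. The function name build_bind_arg and the directory /data/shared_refs are hypothetical and used only for illustration; substitute the directories that actually hold your data.

```shell
# Join directories into a single "--bind a,b,c" argument for Singularity.
# build_bind_arg is a hypothetical helper, not part of GEMSCAN.
build_bind_arg() {
    local IFS=,
    echo "--bind $*"
}

# /data/shared_refs stands in for any extra directory holding your inputs.
BIND_ARG=$(build_bind_arg "/data/$USER" /fdb /lscratch /data/shared_refs)
echo "$BIND_ARG"
```

The resulting string can then replace the quoted value in the submission script, e.g. --singularity-args "$BIND_ARG".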
Make sure the Snakemake in your conda environment is reasonably recent (>=5.26.1). Then activate the snakemake conda environment created in the installation step:
conda activate snakemake
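With the environment active, the version requirement can be checked along these lines. This is only a sketch: the helper version_ge is a hypothetical name, and it assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

REQUIRED="5.26.1"
if command -v snakemake >/dev/null 2>&1; then
    INSTALLED=$(snakemake --version)
    if version_ge "$INSTALLED" "$REQUIRED"; then
        echo "Snakemake $INSTALLED is recent enough"
    else
        echo "Snakemake $INSTALLED is older than $REQUIRED; consider updating" >&2
    fi
else
    echo "snakemake not found; is the conda environment active?" >&2
fi
```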
Submit the script:
sbatch --time=48:00:00 <script>
Monitor the jobs:
squeue -u <your_username>
The log files are in <cloned repo dir>/workflow/logs/. It is also useful to look at <cloned repo dir>/workflow/snakemake.err to follow Snakemake's progress and spot potential errors.
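One way to scan these files for problems is sketched below. The helper find_errors is a hypothetical name, and the search pattern is an assumption about what failure messages look like; adjust it to whatever your runs actually print.

```shell
# List lines mentioning errors or exceptions in a log file (case-insensitive),
# prefixed with their line numbers. Prints nothing if there are none.
# find_errors is a hypothetical helper, not part of GEMSCAN.
find_errors() {
    grep -n -i -E 'error|exception' "$1" || true
}

# Possible usage from <cloned repo dir>/workflow:
#   find_errors snakemake.err
#   tail -f snakemake.err        # follow progress live
#   ls -lt logs/ | head          # most recently written log files
```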