GwasQcPipeline Documentation
The CGR GwasQcPipeline generates a number of sample-level QC metrics and then applies several filtering criteria. Samples are then split into ancestral populations and examined for relatedness, population structure, and genotyping errors. The full workflow is pictured in Fig. 1.
CGR GwasQcPipeline is designed to be easy to deploy on multiple cluster systems. It is built on top of Snakemake but packaged as an easy-to-install Python utility. This lets us take advantage of Snakemake’s amazing workflow management system while adding a custom library of helper tools.
CGR GwasQcPipeline is divided into six sub-workflows, each of which can be run independently, provided that all previous sub-workflows have completed. For example, the Sample QC sub-workflow requires the Entry Points sub-workflow (and optionally the Contamination sub-workflow) to be complete. Here we describe the steps that each sub-workflow runs, the config options, and a summary of the generated outputs.
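The dependency rule between sub-workflows can be pictured with a small sketch. This is an illustration only, not the pipeline's implementation: the dictionary keys, the `can_run` helper, and the assumption that Contamination requires Entry Points are all hypothetical, and only the three sub-workflows named in the paragraph above are included.

```python
# Hypothetical illustration: none of these names belong to the GwasQcPipeline API.
# Only the three sub-workflows named in the text are modeled here.

# Upstream sub-workflows that must be complete before each one can run.
# Assumption: Contamination depends on Entry Points because it comes after it.
REQUIRED = {
    "entry_points": [],
    "contamination": ["entry_points"],
    "sample_qc": ["entry_points"],
}

# Upstream sub-workflows whose outputs are used if present but are not required.
OPTIONAL = {
    "sample_qc": ["contamination"],
}


def can_run(sub_workflow, completed):
    """Return True if every required upstream sub-workflow has completed."""
    return all(dep in completed for dep in REQUIRED[sub_workflow])


def optional_inputs(sub_workflow, completed):
    """Return the optional upstream sub-workflows that have completed."""
    return [dep for dep in OPTIONAL.get(sub_workflow, []) if dep in completed]
```

With this sketch, `can_run("sample_qc", {"entry_points"})` is true even when Contamination has not been run, which mirrors the "optionally" in the description above.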