GwasQcPipeline Documentation

The CGR GwasQcPipeline generates a set of sample-level QC metrics and then applies several filtering criteria. Samples are then split into ancestral populations and examined for relatedness, population structure, and genotyping errors. The full workflow is pictured in Fig. 1.


Fig. 1 The CGR GwasQcPipeline.
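The overall flow described above (compute per-sample QC metrics, filter samples, then split them into ancestral populations for downstream analyses) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline's implementation: the `Sample` record, its field names, the use of call rate as the only filter, the 0.95 cutoff, and the population labels are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical per-sample record; field names are illustrative only.
    sample_id: str
    call_rate: float  # fraction of SNPs successfully genotyped for this sample
    ancestry: str     # ancestral population label assigned upstream


def filter_and_split(samples, min_call_rate=0.95):
    """Drop samples below a hypothetical call-rate cutoff, then group the
    remaining sample IDs by ancestral population so that per-population
    analyses (e.g., relatedness, population structure) can run on each group."""
    populations = defaultdict(list)
    for s in samples:
        if s.call_rate >= min_call_rate:
            populations[s.ancestry].append(s.sample_id)
    return dict(populations)


if __name__ == "__main__":
    samples = [
        Sample("S1", 0.99, "EUR"),
        Sample("S2", 0.90, "EUR"),  # below the cutoff, so it is filtered out
        Sample("S3", 0.98, "AFR"),
        Sample("S4", 0.97, "EUR"),
    ]
    print(filter_and_split(samples))  # {'EUR': ['S1', 'S4'], 'AFR': ['S3']}
```

Filtering before splitting keeps low-quality samples from influencing the per-population steps; the real pipeline applies more criteria than this single threshold.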

CGR GwasQcPipeline is designed to be easy to deploy on multiple cluster systems. It is built on top of Snakemake but packaged as an easy-to-install Python utility. This lets us take advantage of Snakemake's workflow management system while adding a custom library of helper tools.

CGR GwasQcPipeline is broken into five sub-workflows. Each sub-workflow can be run independently, as long as all previous sub-workflows are complete. For example, the Sample QC sub-workflow requires the Entry Points sub-workflow (and, optionally, the Contamination sub-workflow) to be complete. Here we describe the steps each sub-workflow runs, its config options, and a summary of the generated outputs.
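The ordering rule above (a sub-workflow may run once every earlier required sub-workflow is complete, with Contamination optional) can be expressed as a small prerequisite check. This is a hypothetical sketch, not part of the pipeline's API: the identifiers `entry_points`, `contamination`, and `sample_qc` mirror the sub-workflows named in the text, while `subject_qc` and `delivery` are assumed names for the remaining two, and treating Contamination as optional for every later step is an assumption generalized from the Sample QC example.

```python
# Sub-workflows in run order. The first three correspond to the sub-workflows
# named in the text; "subject_qc" and "delivery" are assumed placeholder names.
SUB_WORKFLOWS = ["entry_points", "contamination", "sample_qc", "subject_qc", "delivery"]

# Sub-workflows that may be skipped without blocking later ones (assumption).
OPTIONAL = {"contamination"}


def missing_prerequisites(target, completed):
    """Return the required earlier sub-workflows that have not completed yet."""
    if target not in SUB_WORKFLOWS:
        raise ValueError(f"unknown sub-workflow: {target}")
    previous = SUB_WORKFLOWS[: SUB_WORKFLOWS.index(target)]
    return [name for name in previous if name not in OPTIONAL and name not in completed]


def can_run(target, completed):
    """True when every required earlier sub-workflow is in `completed`."""
    return not missing_prerequisites(target, completed)


if __name__ == "__main__":
    print(can_run("sample_qc", {"entry_points"}))  # True
    print(missing_prerequisites("subject_qc", {"contamination"}))  # ['entry_points', 'sample_qc']
```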

User Reference

API Reference

To Do