Contamination Sub-workflow
- Workflow File:
Config Options: see The config.yml for more details
reference_files.thousand_genome_vcf
reference_files.thousand_genome_tbi
user_files.gtc_pattern
user_files.idat_pattern
software_params.contam_population
Major Outputs:
sample_level/<BPM Prefix>.<software_params.contam_population>.abf.txt
B-allele frequencies pulled from the 1000 Genomes VCF.
sample_level/contamination/median_idat_intensity.csv
Aggregated table of median IDAT intensities.
sample_level/contamination/verifyIDintensity.csv
Aggregated table of contamination scores.
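
As a rough illustration of how the first output path above is composed from its two placeholders, here is a minimal sketch. The function name `abf_path` and the example values `ExampleArray_A1` and `AF` are hypothetical and are not taken from the workflow itself.

```python
from pathlib import PurePosixPath


def abf_path(bpm_prefix: str, contam_population: str) -> str:
    """Build the B-allele frequency output path from its two placeholders.

    bpm_prefix stands in for <BPM Prefix> and contam_population for
    software_params.contam_population (both values below are hypothetical).
    """
    file_name = f"{bpm_prefix}.{contam_population}.abf.txt"
    return str(PurePosixPath("sample_level") / file_name)


# Hypothetical example values, for illustration only.
example = abf_path("ExampleArray_A1", "AF")
print(example)
```

Changing `software_params.contam_population` therefore changes the name of the B-allele frequency file, while the two aggregated tables keep fixed names.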
![../_images/contamination.png](../_images/contamination.png)
Fig. 3 The contamination sub-workflow. This workflow estimates contamination using verifyIDintensity on each sample individually. It requires that you have GTC/IDAT files. It first pulls B-allele frequencies from the 1000 Genomes VCF file. It then estimates contamination for each sample and aggregates the results. Finally, it estimates the per-sample median IDAT intensity, which is used to filter contamination results in the Sample QC Sub-workflow.
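
The last step, using median IDAT intensity to filter contamination results, can be sketched as below. This is only an illustration under stated assumptions, not the workflow's actual code: the column names `Sample_ID`, `pct_mix`, and `median_intensity`, the threshold of 6000, and the function `mask_low_intensity` are all hypothetical.

```python
import csv
import io


def mask_low_intensity(contam_rows, intensity_rows, min_intensity):
    """Blank out contamination scores for samples with low median intensity.

    Column names are hypothetical. Samples missing from the intensity
    table are treated as below the threshold.
    """
    intensity = {
        row["Sample_ID"]: float(row["median_intensity"]) for row in intensity_rows
    }
    result = []
    for row in contam_rows:
        sample = row["Sample_ID"]
        score = float(row["pct_mix"])
        if intensity.get(sample, 0.0) < min_intensity:
            score = None  # intensity too low to trust the estimate
        result.append({"Sample_ID": sample, "pct_mix": score})
    return result


# Tiny in-memory stand-ins for the two aggregated tables (made-up values).
contam_csv = "Sample_ID,pct_mix\nS1,0.02\nS2,0.31\n"
intensity_csv = "Sample_ID,median_intensity\nS1,9000\nS2,4000\n"

contam = list(csv.DictReader(io.StringIO(contam_csv)))
intensity = list(csv.DictReader(io.StringIO(intensity_csv)))

# The threshold is an assumption chosen only for this example.
filtered = mask_low_intensity(contam, intensity, min_intensity=6000)
print(filtered)
```

In this sketch a masked score is kept as a missing value rather than dropped, so the sample still appears in the table and downstream steps can tell "not estimated" apart from "not contaminated".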