Pipeline Overview
Genomics Workflow
The PsychCore NGS pipeline is a serverless, automated, easy-to-use genomics pipeline for calling germline variants, as well as de novo variants, on large cohorts of human sequencing samples. The workflow of the pipeline is as follows:
The pipeline takes gzipped paired-end sequencing FASTQ files (R1|R2.fastq.gz)
for each sample and aligns them to a specified reference build using BWA MEM.
Several BAM processing steps (Picard and GATK) follow to produce a final
processed BAM, which is haplotyped using Sentieon’s* Haplotyper module. This
BAM can optionally undergo a quality analysis using Picard’s CollectWgsMetrics
module. These steps run in parallel for each sample in the cohort.
After Haplotyper has run, joint genotyping is performed across the entire
cohort with Sentieon’s Genotyper module, producing a VCF that is passed to VQSR (GATK),
which outputs the final VCF.
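The ordering described above can be sketched as a small plan builder. This is only an illustration of the per-sample versus cohort-level split: the stage labels, the `build_plan` function, and the sample names are hypothetical and are not the pipeline's actual job names or code.

```python
from typing import Dict, List

# Per-sample stages named in the overview. The labels are illustrative
# assumptions, not identifiers used by the pipeline itself.
PER_SAMPLE_STAGES = [
    "bwa_mem_alignment",
    "bam_processing_picard_gatk",
    "sentieon_haplotyper",
]
# Optional quality analysis on the final processed BAM.
OPTIONAL_QC_STAGE = "picard_collect_wgs_metrics"
# Stages that run once across the whole cohort, after every sample is haplotyped.
COHORT_STAGES = ["sentieon_joint_genotyping", "gatk_vqsr"]


def build_plan(samples: List[str], run_qc: bool = False) -> Dict[str, object]:
    """Return per-sample stage lists, optional QC jobs, and cohort-level stages."""
    # Each sample gets its own copy of the stage list; samples are independent
    # of one another, so these lists could be executed in parallel.
    per_sample = {sample: list(PER_SAMPLE_STAGES) for sample in samples}
    # QC is kept separate because it only depends on the processed BAM.
    qc = {sample: OPTIONAL_QC_STAGE for sample in samples} if run_qc else {}
    return {"per_sample": per_sample, "qc": qc, "cohort": list(COHORT_STAGES)}


plan = build_plan(["sampleA", "sampleB"], run_qc=True)
```

Keeping the cohort stages in a separate list reflects that joint genotyping cannot start until Haplotyper has finished for every sample.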
Pipeline Infrastructure
Several AWS services are involved in the infrastructure of the pipeline: AWS CloudFormation, Step Functions, Lambda, Batch, EC2, S3, and CloudWatch. The pipeline also makes use of Docker containers, as well as Google Cloud Platform’s (GCP) Google Cloud Storage (GCS) and Dataproc services for downstream processes (in development).
The pipeline’s main bioinformatics tools are BWA MEM, Picard, GATK, and Sentieon. Each of these tools has been containerized with Docker and is deployed using AWS Batch. AWS Lambda and AWS Step Functions manage the submission of jobs to the Batch cluster and handle user input. The entire system architecture of the pipeline is managed by AWS CloudFormation, while logging during a pipeline run is handled by AWS CloudWatch.
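As a rough illustration of what a Lambda function might assemble before submitting a containerized tool to Batch, the sketch below builds the request parameters for the AWS Batch SubmitJob operation. The parameter keys (`jobName`, `jobQueue`, `jobDefinition`, `containerOverrides`) follow the Batch API, but the `build_batch_request` helper, the `SAMPLE_ID` environment variable, and the queue and job definition names are assumptions made for this example, not values taken from the pipeline.

```python
from typing import Dict


def build_batch_request(
    sample: str,
    stage: str,
    job_queue: str,
    job_definition: str,
) -> Dict[str, object]:
    """Build keyword arguments for an AWS Batch SubmitJob call (hypothetical helper)."""
    return {
        # Batch job names may contain letters, numbers, hyphens, and underscores.
        "jobName": f"{sample}-{stage}",
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            # SAMPLE_ID is an assumed variable name used only for illustration.
            "environment": [{"name": "SAMPLE_ID", "value": sample}],
        },
    }


# Placeholder queue and job definition names; real values would come from
# the resources created by the CloudFormation stack.
request = build_batch_request(
    sample="sampleA",
    stage="bwa-mem",
    job_queue="example-queue",
    job_definition="example-bwa-mem:1",
)
# Inside a Lambda function this dictionary could be passed on as
# boto3.client("batch").submit_job(**request).
```

Building the request as plain data keeps the example runnable without AWS credentials; how the pipeline itself passes sample information to its containers is not described in this section.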
A Note about Sentieon
Sentieon develops and supplies a suite of bioinformatics analysis tools for processing genomics data. In 2016, Sentieon won the PrecisionFDA Truth and Consistency challenges. Sentieon also won first place in the ICGC-TCGA DREAM Mutation Calling Challenge. For more information, see Sentieon’s homepage.