nf-core/chipseq   
 ChIP-seq peak-calling, QC and differential analysis pipeline.
Version 1.0.0. The latest stable release is 2.1.0.
  Introduction
nf-core/chipseq is a bioinformatics analysis pipeline for Chromatin ImmunoPrecipitation sequencing (ChIP-seq) data.
The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It comes with Docker containers, which make installation trivial and results highly reproducible.
Pipeline summary
- Raw read QC (FastQC)
- Adapter trimming (Trim Galore!)
- Alignment (BWA)
- Mark duplicates (picard)
- Merge alignments from multiple libraries of the same sample (picard)
- Re-mark duplicates (picard)
- Filtering to remove:
  - reads mapping to blacklisted regions (SAMtools, BEDTools)
  - reads that are marked as duplicates (SAMtools)
  - reads that aren't marked as primary alignments (SAMtools)
  - reads that are unmapped (SAMtools)
  - reads that map to multiple locations (SAMtools)
  - reads containing > 4 mismatches (BAMTools)
  - reads that have an insert size > 2kb (BAMTools; paired-end only)
  - reads that map to different chromosomes (Pysam; paired-end only)
  - reads that aren't in FR orientation (Pysam; paired-end only)
  - reads where only one read of the pair fails the above criteria (Pysam; paired-end only)
- Alignment-level QC and estimation of library complexity (picard, Preseq)
- Create normalised bigWig files scaled to 1 million mapped reads (BEDTools, bedGraphToBigWig)
- Generate gene-body meta-profile from bigWig files (deepTools)
- Calculate genome-wide IP enrichment relative to control (deepTools)
- Calculate strand cross-correlation peak and ChIP-seq quality measures including NSC and RSC (phantompeakqualtools)
- Call broad/narrow peaks (MACS2)
- Annotate peaks relative to gene features (HOMER)
- Create consensus peakset across all samples and create tabular file to aid in the filtering of the data (BEDTools)
- Count reads in consensus peaks (featureCounts)
- Differential binding analysis, PCA and clustering (R, DESeq2)
- Create IGV session file containing bigWig tracks, peaks and differential sites for data visualisation (IGV)
- Present QC for raw read, alignment, peak-calling and differential binding results (MultiQC, R)
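To make the read-filtering criteria and the bigWig scaling in the summary above more concrete, here is a minimal Python sketch. It is not the pipeline's own code: the pipeline delegates these steps to SAMtools, BAMTools, BEDTools and Pysam. The `Read` class, its field names, and the helper functions are hypothetical and exist only for illustration. The scale factor assumes that "scaled to 1 million mapped reads" means multiplying coverage by 1,000,000 divided by the number of mapped reads.

```python
from dataclasses import dataclass
from typing import Optional

MAX_MISMATCHES = 4       # summary: remove reads containing > 4 mismatches
MAX_INSERT_SIZE = 2000   # summary: remove reads with insert size > 2kb


@dataclass
class Read:
    """Hypothetical, simplified view of one aligned read (not a pysam object)."""
    is_unmapped: bool = False
    is_duplicate: bool = False
    is_primary: bool = True
    is_multimapped: bool = False
    in_blacklist: bool = False
    mismatches: int = 0
    # Paired-end only fields; left as None for single-end reads.
    insert_size: Optional[int] = None
    mate_same_chromosome: Optional[bool] = None
    is_fr_orientation: Optional[bool] = None


def passes_read_filters(read: Read, paired_end: bool) -> bool:
    """Return True if the read survives every filter listed in the summary."""
    if read.is_unmapped or read.is_duplicate or not read.is_primary:
        return False
    if read.is_multimapped or read.in_blacklist:
        return False
    if read.mismatches > MAX_MISMATCHES:
        return False
    if paired_end:
        if read.insert_size is not None and abs(read.insert_size) > MAX_INSERT_SIZE:
            return False
        if read.mate_same_chromosome is False:
            return False
        if read.is_fr_orientation is False:
            return False
    return True


def keep_pair(read1: Read, read2: Read) -> bool:
    """Keep a pair only if both mates pass, so pairs where one mate fails are dropped."""
    return passes_read_filters(read1, True) and passes_read_filters(read2, True)


def bigwig_scale_factor(mapped_reads: int) -> float:
    """Assumed scaling: coverage is multiplied by 1e6 / mapped reads."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return 1_000_000 / mapped_reads
```

For example, under these assumptions a library with 2,000,000 mapped reads would have its coverage multiplied by 0.5, and a read pair in which one mate is flagged as a duplicate would be removed as a whole.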
Documentation
The nf-core/chipseq pipeline comes with documentation about the pipeline, found in the docs/ directory:
- Installation
- Pipeline configuration
- Running the pipeline
- Output and how to interpret the results
- Troubleshooting
Credits
These scripts were originally written by Chuan Wang (@chuan-wang) and Phil Ewels (@ewels) for use at the National Genomics Infrastructure at SciLifeLab in Stockholm, Sweden. The pipeline has since been re-implemented by Harshil Patel (@drpatelh) from The Bioinformatics & Biostatistics Group at The Francis Crick Institute, London.
Many thanks to others who have helped out along the way too, including (but not limited to): @apeltzer, @bc2zb, @drejom, @KevinMenden, @pditommaso.
Citation
You can cite the nf-core pre-print as follows:
Ewels PA, Peltzer A, Fillinger S, Alneberg JA, Patel H, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. nf-core: Community curated bioinformatics pipelines. bioRxiv. 2019. p. 610741. doi: 10.1101/610741.