nf-core/viralmetagenome   
 Detect iSNV and construct whole viral genomes from metagenomic samples
Introduction
Viralgenie is a bioinformatics best-practice analysis pipeline for reconstructing consensus genomes and identifying intra-host variants from metagenomic sequencing data or enrichment-based sequencing data such as hybrid capture.
Pipeline summary
- Read QC (FastQC)
- Performs optional read pre-processing
- Metagenomic diversity mapping
- De novo assembly (SPAdes, Trinity, MEGAHIT), combine contigs.
- [Optional] Extend the contigs with sspace_basic and filter with prinseq++
- [Optional] Map reads to contigs for coverage estimation (Bowtie2, BWA-MEM2 and BWA)
- Contig reference identification (blastn)
  - Identify top 5 blast hits
  - Merge blast hits and all contigs of a sample
- [Optional] Precluster contigs based on taxonomy
- Cluster contigs (or every taxonomic bin) of samples, options are:
- [Optional] Remove clusters with low read coverage. bin/extract_clusters.py
- Scaffolding of contigs to centroid (Minimap2, iVar-consensus)
- [Optional] Annotate 0-depth regions with external reference bin/nocov_to_reference.py.
- [Optional] Select best reference from --mapping_constraints:
- Map filtered reads to supercontig and mapping constraints (Bowtie2, BWA-MEM2 and BWA)
- [Optional] Deduplicate reads (Picard, or UMI-tools if UMIs are used)
- Variant calling and filtering (BCFtools, iVar)
- Create consensus genome (BCFtools, iVar)
- Repeat steps 12-15 multiple times for the de novo contig route
- Consensus evaluation and annotation (QUAST, CheckV, blastn, Prokka, mmseqs-search, MAFFT: alignment of contigs vs iterations & consensus)
- Result summary visualisation for raw read, alignment, assembly, variant calling and consensus calling results (MultiQC)
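To make the "top 5 blast hits" step in the summary above more concrete, here is a minimal sketch of one way such a selection could be done on BLAST tabular output. It is not taken from the pipeline's code: the file names `hits.tsv` and `top5.tsv`, the made-up hit table, and the choice of bitscore as the ranking key are all assumptions for illustration.

```shell
# Made-up hits in BLAST tabular layout (-outfmt 6): 12 tab-separated columns,
# query id in column 1, subject id in column 2, bitscore in column 12.
awk 'BEGIN {
  OFS = "\t"
  for (i = 1; i <= 7; i++)   # seven hits for contig_1, bitscores 110..170
    print "contig_1", "ref_" i, 90 + i, 500, 10, 0, 1, 500, 1, 500, "1e-50", 100 + 10 * i
  for (i = 1; i <= 2; i++)   # two hits for contig_2, bitscores 201 and 202
    print "contig_2", "ref_" i, 95, 300, 5, 0, 1, 300, 1, 300, "1e-40", 200 + i
}' > hits.tsv

# Sort by query id, then by descending bitscore, and keep at most 5 rows per query.
sort -t "$(printf '\t')" -k1,1 -k12,12nr hits.tsv |
  awk -F'\t' '$1 != query { query = $1; n = 0 } ++n <= 5' > top5.tsv

# Show query, subject and bitscore of the retained hits.
cut -f1,2,12 top5.tsv
```

With this toy input, `contig_1` keeps its five highest-scoring hits and `contig_2` keeps both of its hits.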
Usage
If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
First, prepare a samplesheet with your input data that looks as follows:
samplesheet.csv:
sample,fastq_1,fastq_2
sample1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
sample2,AEG588A5_S5_L003_R1_001.fastq.gz,
sample3,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz
Each row represents a FASTQ file (single-end) or a pair of FASTQ files (paired-end).
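As a quick sanity check before launching, the samplesheet layout described above can be verified from the shell. The sketch below writes the example samplesheet and counts single-end and paired-end rows; the `awk` check is an illustrative helper, not something the pipeline provides, and it only tests the column layout, not whether the FASTQ files exist.

```shell
# Write the example samplesheet (file names are the README's placeholders).
cat > samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2
sample1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
sample2,AEG588A5_S5_L003_R1_001.fastq.gz,
sample3,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz
EOF

# Illustrative check (not part of the pipeline): every data row needs exactly
# three comma-separated fields, with non-empty sample and fastq_1 columns.
awk -F',' '
  NR == 1 { next }                                   # skip the header row
  NF != 3 || $1 == "" || $2 == "" {                  # malformed row
    bad++; print "line " NR ": malformed row"; next
  }
  $3 == "" { single++; next }                        # empty fastq_2: single-end
  { paired++ }                                       # both FASTQ columns set
  END {
    printf "single-end: %d, paired-end: %d, malformed: %d\n", single, paired, bad
    exit (bad > 0)                                   # non-zero exit on errors
  }
' samplesheet.csv
```

The non-zero exit status on malformed rows makes the check easy to chain in front of the launch command with `&&`.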
Now, you can run the pipeline using:
nextflow run Joon-Klaps/viralgenie \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR>
Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see docs.
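For example, the two parameters used above could be moved into a params file. This is a minimal sketch: the file name `params.yaml` and the output directory `results` are placeholders, and `docker` is just one of the profiles mentioned above. The snippet prints the launch command instead of running it, so it can be tried without a configured container engine.

```shell
# Collect pipeline parameters in a YAML file (values are placeholders).
cat > params.yaml <<'EOF'
input: samplesheet.csv
outdir: results
EOF

# Print the launch command for review; remove the leading `echo` to run it.
echo nextflow run Joon-Klaps/viralgenie -profile docker -params-file params.yaml
```

Keeping parameters in a file makes a run easier to repeat and to share than a long command line.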
For more details and further functionality, please refer to the usage documentation and the parameter documentation.
Pipeline output
To see the results of an example test run with a full-size dataset, refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.
Credits
Viralgenie was originally written by Joon-Klaps.
We thank the following people for their extensive assistance in the development of this pipeline:
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines.
Citations
Viralgenie is currently not published. Please cite the GitHub repository: https://github.com/Joon-Klaps/viralgenie
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.