# metagWGS

## Introduction

**metagWGS** is a [Nextflow](https://www.nextflow.io/docs/latest/index.html#) bioinformatics analysis pipeline for **metag**enomic **W**hole **G**enome **S**hotgun sequencing data (Illumina HiSeq3000 or NovaSeq, paired-end, 2\*150 bp).

### Pipeline graphical representation
The workflow processes raw data from `.fastq` or `.fastq.gz` inputs and runs the modules shown in this figure:
![metagWGS pipeline](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/raw/dev/docs/Pipeline.png)

### metagWGS steps

metagWGS is split into several steps, each corresponding to a part of the bioinformatics analysis:

* `01_clean_qc` (can be skipped)
   * trims adapter sequences and removes low-quality reads ([Cutadapt](https://cutadapt.readthedocs.io/en/stable/#), [Sickle](https://github.com/najoshi/sickle))
   * removes host contaminant reads ([BWA](http://bio-bwa.sourceforge.net/) + [Samtools](http://www.htslib.org/) + [Bedtools](https://bedtools.readthedocs.io/en/latest/))
   * controls the quality of raw and cleaned data ([FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
   * makes a taxonomic classification of cleaned reads ([Kaiju MEM](https://github.com/bioinformatics-centre/kaiju) + [kronaTools](https://github.com/marbl/Krona/wiki/KronaTools) + [Generate_barplot_kaiju.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/Generate_barplot_kaiju.py) + [merge_kaiju_results.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/merge_kaiju_results.py))
* `02_assembly`
   * assembles cleaned reads (when the `01_clean_qc` step is run) or raw reads (when the `--skip_01_clean_qc` parameter is set) ([metaSPAdes](https://github.com/ablab/spades) or [Megahit](https://github.com/voutcn/megahit))
   * assesses the quality of the assembly ([metaQUAST](http://quast.sourceforge.net/metaquast))
   * deduplicates cleaned reads (when the `01_clean_qc` step is run) or raw reads (when the `--skip_01_clean_qc` parameter is set) ([BWA](http://bio-bwa.sourceforge.net/) + [Samtools](http://www.htslib.org/) + [Bedtools](https://bedtools.readthedocs.io/en/latest/))
* `03_filtering` (can be skipped)
   * filters out contigs with a low CPM (counts per million) value ([Filter_contig_per_cpm.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/Filter_contig_per_cpm.py) + [metaQUAST](http://quast.sourceforge.net/metaquast))
* `04_structural_annot`
   * makes a structural annotation of genes ([Prokka](https://github.com/tseemann/prokka) + [Rename_contigs_and_genes.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/Rename_contigs_and_genes.py))
* `05_alignment`
   * aligns reads to the contigs ([BWA](http://bio-bwa.sourceforge.net/) + [Samtools](http://www.htslib.org/))
   * aligns the protein sequence of genes against a protein database ([DIAMOND](https://github.com/bbuchfink/diamond))
* `06_func_annot`
   * clusters genes per sample and across all samples ([cd-hit-est](http://weizhongli-lab.org/cd-hit/) + [cd_hit_produce_table_clstr.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/cd_hit_produce_table_clstr.py))
   * quantifies reads that align with the genes ([featureCounts](http://subread.sourceforge.net/) + [Quantification_clusters.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/Quantification_clusters.py))
   * makes a functional annotation of genes and a quantification of reads by function ([eggNOG-mapper](http://eggnog-mapper.embl.de/) + [best_bitscore_diamond.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/best_bitscore_diamond.py) + [merge_abundance_and_functional_annotations.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/merge_abundance_and_functional_annotations.py) + [quantification_by_functional_annotation.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/quantification_by_functional_annotation.py))
* `07_taxo_affi`
   * taxonomically affiliates the genes ([Samtools](http://www.htslib.org/) + [aln2taxaffi.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/aln2taxaffi.py))
   * taxonomically affiliates the contigs ([Samtools](http://www.htslib.org/) + [aln2taxaffi.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/aln2taxaffi.py))
   * counts the number of reads and contigs, for each taxonomic affiliation, per taxonomic level ([Samtools](http://www.htslib.org/) + [merge_contig_quantif_perlineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/merge_contig_quantif_perlineage.py) + [quantification_by_contig_lineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/quantification_by_contig_lineage.py))
* `08_binning` from [nf-core/mag 1.0.0](https://github.com/nf-core/mag/releases/tag/1.0.0)
   * bins contigs ([MetaBAT2](https://bitbucket.org/berkeleylab/metabat/src/master/))
   * assesses bins ([BUSCO](https://busco.ezlab.org/) + [metaQUAST](http://quast.sourceforge.net/metaquast) + [summary_busco.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/summary_busco.py) and [combine_tables.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/combine_tables.py) from [nf-core/mag](https://github.com/nf-core/mag))
   * taxonomically affiliates the bins ([BAT](https://github.com/dutilh/CAT))
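
The CPM filter of `03_filtering` can be illustrated with a small Python sketch. It is not the code of `Filter_contig_per_cpm.py`: it assumes that the CPM of a contig is the number of reads mapped to it divided by the total number of mapped reads in the sample, times one million, and that contigs below a chosen threshold are dropped. The helper names `cpm` and `filter_contigs_by_cpm`, the read counts and the threshold are all hypothetical.

```python
from typing import Dict, List


def cpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Counts per million: reads mapped to a contig / total mapped reads * 1e6."""
    total = sum(counts.values())
    if total == 0:
        # No mapped reads at all: every contig gets a CPM of 0.
        return {contig: 0.0 for contig in counts}
    return {contig: n / total * 1_000_000 for contig, n in counts.items()}


def filter_contigs_by_cpm(counts: Dict[str, int], min_cpm: float) -> List[str]:
    """Hypothetical helper: sorted names of contigs whose CPM reaches min_cpm."""
    values = cpm(counts)
    return sorted(contig for contig, value in values.items() if value >= min_cpm)


if __name__ == "__main__":
    # Hypothetical per-contig mapped read counts for one sample
    # (counts of this kind can be read from `samtools idxstats` output).
    mapped_reads = {"contig_1": 900_000, "contig_2": 99_995, "contig_3": 5}
    print(filter_contigs_by_cpm(mapped_reads, min_cpm=10.0))
    # prints ['contig_1', 'contig_2']
```

Normalizing by the total number of mapped reads makes the threshold independent of sequencing depth, so the same cut-off can be applied to samples of different sizes.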
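
Selecting the best DIAMOND hits before the functional annotation of `06_func_annot` can be sketched in the same way. This is an assumption-based illustration, not the logic of `best_bitscore_diamond.py`: it supposes DIAMOND tabular output in the default 12-column format (`--outfmt 6`, bitscore in the last column) and keeps, for each query gene, every subject whose bitscore equals the best one for that query. The function name `best_hits` and the example rows are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO

# 0-based index of the bitscore column in the default 12-column tabular format.
BITSCORE_COL = 11


def best_hits(handle: TextIO) -> Dict[str, List[str]]:
    """Hypothetical helper: for each query, the subjects tied for the top bitscore."""
    best: Dict[str, float] = {}
    hits: Dict[str, List[str]] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, subject = row[0], row[1]
        score = float(row[BITSCORE_COL])
        if query not in best or score > best[query]:
            # New best score for this query: restart its hit list.
            best[query] = score
            hits[query] = [subject]
        elif score == best[query]:
            # Tie with the current best: keep this subject as well.
            hits[query].append(subject)
    return hits


if __name__ == "__main__":
    example = io.StringIO(
        "gene1\tprotA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200.0\n"
        "gene1\tprotB\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-45\t180.0\n"
        "gene2\tprotC\t70.0\t80\t24\t0\t1\t80\t1\t80\t1e-20\t95.5\n"
        "gene2\tprotD\t72.0\t80\t22\t0\t1\t80\t1\t80\t1e-20\t95.5\n"
    )
    print(best_hits(example))
    # prints {'gene1': ['protA'], 'gene2': ['protC', 'protD']}
```

Keeping ties instead of one arbitrary best hit avoids silently discarding equally good annotations for a gene.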

An HTML report is generated at the end of the workflow with [MultiQC](https://multiqc.info/).

The pipeline is built using [Nextflow](https://www.nextflow.io/docs/latest/index.html#), a bioinformatics workflow tool to run tasks across multiple compute infrastructures in a very portable manner.

Two [Singularity](https://sylabs.io/docs/) containers are available, making installation straightforward and results reproducible.

## Documentation

metagWGS documentation is available [here](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/tree/dev/docs).

## License
metagWGS is distributed under the GNU General Public License v3.

## Copyright
2021 INRAE

## Funded by
Anti-Selfish (Labex ECOFECT – N° 00002455-CT15000562)

France Génomique National Infrastructure (funded as part of Investissement d’avenir program managed by Agence Nationale de la Recherche, contract ANR-10-INBS-09)

With participation of SeqOccIn members financed by FEDER-FSE MIDI-PYRENEES ET GARONNE 2014-2020.

## Citation
metagWGS has been presented at JOBIM 2020:

Poster "Whole metagenome analysis with metagWGS", J. Fourquet, C. Noirot, C. Klopp, P. Pinton, S. Combes, C. Hoede, G. Pascal.

https://www.sfbi.fr/sites/sfbi.fr/files/jobim/jobim2020/posters/compressed/jobim2020_poster_9.pdf

metagWGS has been presented at JOBIM 2019 and at the Genotoul Biostat Bioinfo day:

Poster "Whole metagenome analysis with metagWGS", J. Fourquet, A. Chaubet, H. Chiapello, C. Gaspin, M. Haenni, C. Klopp, A. Lupo, J. Mainguy, C. Noirot, T. Rochegue, M. Zytnicki, T. Ferry, C. Hoede.