@@ -35,7 +35,7 @@ metagWGS is split into different steps that correspond to different parts of
*`07_taxo_affi`
* taxonomically affiliates the genes ([Samtools](http://www.htslib.org/) + [aln2taxaffi.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/aln2taxaffi.py))
* taxonomically affiliates the contigs ([Samtools](http://www.htslib.org/) + [aln2taxaffi.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/aln2taxaffi.py))
* counts the number of reads and contigs, for each taxonomic affiliation, per taxonomic level ([Samtools](http://www.htslib.org/) + [merge_idxstats_percontig_lineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/merge_idxstats_percontig_lineage.py) + [quantification_by_contig_lineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/quantification_by_contig_lineage.py))
* counts the number of reads and contigs, for each taxonomic affiliation, per taxonomic level ([Samtools](http://www.htslib.org/) + [merge_contig_quantif_perlineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/merge_contig_quantif_perlineage.py) + [quantification_by_contig_lineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/quantification_by_contig_lineage.py))
*`08_binning` from [nf-core/mag 1.0.0](https://github.com/nf-core/mag/releases/tag/1.0.0)
* bins the contigs ([MetaBAT2](https://bitbucket.org/berkeleylab/metabat/src/master/))
* assesses bins ([BUSCO](https://busco.ezlab.org/) + [metaQUAST](http://quast.sourceforge.net/metaquast) + [summary_busco.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/summary_busco.py) and [combine_tables.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/combine_tables.py) from [nf-core/mag](https://github.com/nf-core/mag))
@@ -132,13 +132,13 @@ The `results/` directory contains a sub-directory for each step launched:
| `SAMPLE_NAME/SAMPLE_NAME.pergene.tsv` | Taxonomic affiliation of genes. One line corresponds to a gene (1st column), its corresponding taxon id (2nd column), its corresponding lineage (3rd column) and the tax ids of each level of this lineage (4th column). |
| `SAMPLE_NAME/SAMPLE_NAME.warn.tsv` | List of genes with a hit without corresponding taxonomic affiliation. Each line corresponds to a gene (1st column), the reason why the gene is in this list (2nd column) and match ids into the database used during `05_alignment/05_2_database_alignment/` (3rd column). |
| `SAMPLE_NAME/SAMPLE_NAME.percontig.tsv` | Taxonomic affiliation of contigs. One line corresponds to a contig (1st column), its corresponding taxon id (2nd column), its corresponding lineage (3rd column) and the tax ids of each level of this lineage (4th column). |
| `SAMPLE_NAME/SAMPLE_NAME_idxstats_percontig.tsv` | Quantification table of reads aligned on contigs affiliated to each lineage of the first column. One line = one taxonomic affiliation (1st column, `lineage_by_level`), the corresponding taxon id (2nd column, `consensus_tax_id`), the tax ids of each level of this taxonomic affiliation (3rd column, `tax_id_by_level`), the name of contigs affiliated to this lineage (4th column, `name_contigs`), the number of contigs affiliated to this lineage (5th column, `nb_contigs`) and the sum of the number of reads aligned to these contigs (6th column, `nb_reads`). |
| `SAMPLE_NAME/SAMPLE_NAME_idxstats_percontig_by_[taxonomic_level].tsv` | One file by taxonomic level (superkingdom, phylum, order, class, family, genus, species) for the sample `SAMPLE_NAME`. Quantification table of reads aligned on contigs affiliated to each lineage of the corresponding [taxonomic level]. One line = one taxonomic affiliation at this [taxonomic level] with its taxon id (1st column, `tax_id_by_level`), its lineage (2nd column, `lineage_by_level`), the name of contigs affiliated to this lineage (3rd column, `name_contigs`), the number of contigs affiliated to this lineage (4th column, `nb_contigs`) and the sum of the number of reads aligned to these contigs (5th column, `nb_reads`). |
| `SAMPLE_NAME/SAMPLE_NAME_quantif_percontig.tsv` | Quantification table of reads aligned on contigs affiliated to each lineage of the first column. One line = one taxonomic affiliation (1st column, `lineage_by_level`), the corresponding taxon id (2nd column, `consensus_tax_id`), the tax ids of each level of this taxonomic affiliation (3rd column, `tax_id_by_level`), the name of contigs affiliated to this lineage (4th column, `name_contigs`), the number of contigs affiliated to this lineage (5th column, `nb_contigs`) and the sum of the number of reads aligned to these contigs (6th column, `nb_reads`). |
| `SAMPLE_NAME/SAMPLE_NAME_quantif_percontig_by_[taxonomic_level].tsv` | One file by taxonomic level (superkingdom, phylum, order, class, family, genus, species) for the sample `SAMPLE_NAME`. Quantification table of reads aligned on contigs affiliated to each lineage of the corresponding [taxonomic level]. One line = one taxonomic affiliation at this [taxonomic level] with its taxon id (1st column, `tax_id_by_level`), its lineage (2nd column, `lineage_by_level`), the name of contigs affiliated to this lineage (3rd column, `name_contigs`), the number of contigs affiliated to this lineage (4th column, `nb_contigs`) and the sum of the number of reads aligned to these contigs (5th column, `nb_reads`). |
| `SAMPLE_NAME/graphs/SAMPLE_NAME_aln_diamond.m8_contig_taxonomy_level.pdf` | Figure representing the number of contigs (y-axis) affiliated to each taxonomy level (x-axis). |
| `SAMPLE_NAME/graphs/SAMPLE_NAME_aln_diamond.m8_prot_taxonomy_level.pdf` | Figure representing the number of proteins (y-axis) affiliated to each taxonomy level (x-axis). |
| `SAMPLE_NAME/graphs/SAMPLE_NAME_aln_diamond.m8_nb_prot_annotated_and_assigned.pdf` | Figure representing the number of proteins (y-axis) in our contigs (`Total` bar), the number of proteins with a match into the database (`Annotated` bar) and the number of proteins with a match into the database which is found into the taxonomy (`Assigned` bar) (x-axis). |
| `quantification_by_contig_lineage_all.tsv` | Quantification table of reads aligned on contigs affiliated to each lineage. One line = one taxonomic affiliation with its lineage (1st column, `lineage_by_level`), the taxon id at each level of this lineage (2nd column, `tax_id_by_level`), and then all next 3-columns blocks correspond to one sample. Each 3-column block corresponds to the name of contigs affiliated to this lineage (1st column, `name_contigs_SAMPLE_NAME_idxstats_percontig.tsv`), the number of contigs affiliated to this lineage (2nd column, `nb_contigs_SAMPLE_NAME_idxstats_percontig.tsv`) and the sum of the number of reads aligned to these contigs (3rd column, `nb_reads_SAMPLE_NAME_idxstats_percontig.tsv`). |
| `quantification_by_contig_lineage_[taxonomic_level].tsv` | One file by taxonomic level (superkingdom, phylum, order, class, family, genus, species). Quantification table of reads aligned on contigs affiliated to each lineage of the corresponding [taxonomic level]. One line = one taxonomic affiliation at this [taxonomic level] with its taxon id (1st column, `tax_id_by_level`), its lineage (2nd column, `lineage_by_level`), and then all next 3-columns blocks correspond to one sample. Each 3-column block corresponds to the name of contigs affiliated to this lineage (1st column, `name_contigs_SAMPLE_NAME_idxstats_percontig_by_[taxonomic_level].tsv`), the number of contigs affiliated to this lineage (2nd column, `nb_contigs_SAMPLE_NAME_idxstats_percontig_by_[taxonomic_level].tsv`) and the sum of the number of reads aligned to these contigs (3rd column, `nb_reads_SAMPLE_NAME_idxstats_percontig_by_[taxonomic_level].tsv`). |
| `quantification_by_contig_lineage_all.tsv` | Quantification table of reads aligned on contigs affiliated to each lineage. One line = one taxonomic affiliation with its lineage (1st column, `lineage_by_level`), the taxon id at each level of this lineage (2nd column, `tax_id_by_level`), and then all next 3-columns blocks correspond to one sample. Each 3-column block corresponds to the name of contigs affiliated to this lineage (1st column, `name_contigs_SAMPLE_NAME_quantif_percontig.tsv`), the number of contigs affiliated to this lineage (2nd column, `nb_contigs_SAMPLE_NAME_quantif_percontig.tsv`) and the sum of the number of reads aligned to these contigs (3rd column, `nb_reads_SAMPLE_NAME_quantif_percontig.tsv`). |
| `quantification_by_contig_lineage_[taxonomic_level].tsv` | One file by taxonomic level (superkingdom, phylum, order, class, family, genus, species). Quantification table of reads aligned on contigs affiliated to each lineage of the corresponding [taxonomic level]. One line = one taxonomic affiliation at this [taxonomic level] with its taxon id (1st column, `tax_id_by_level`), its lineage (2nd column, `lineage_by_level`), and then all next 3-columns blocks correspond to one sample. Each 3-column block corresponds to the name of contigs affiliated to this lineage (1st column, `name_contigs_SAMPLE_NAME_quantif_percontig_by_[taxonomic_level].tsv`), the number of contigs affiliated to this lineage (2nd column, `nb_contigs_SAMPLE_NAME_quantif_percontig_by_[taxonomic_level].tsv`) and the sum of the number of reads aligned to these contigs (3rd column, `nb_reads_SAMPLE_NAME_quantif_percontig_by_[taxonomic_level].tsv`). |
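
The 3-column block layout of `quantification_by_contig_lineage_all.tsv` described above can be read back with a few lines of Python. This is only a minimal sketch, not part of metagWGS: it assumes the file is tab-separated with a header row, that the column names follow the `name_contigs_SAMPLE_NAME_quantif_percontig.tsv` pattern exactly, and that `nb_reads` cells hold numbers (an empty cell is counted as 0). The helper names `sample_blocks` and `total_reads_per_sample`, the sample names `S1` and `S2`, and every value in `TOY_TABLE` are invented for illustration.

```python
import csv
import io

# Column-name pattern taken from the table description above.
PREFIXES = ("name_contigs_", "nb_contigs_", "nb_reads_")
SUFFIX = "_quantif_percontig.tsv"

# Invented toy data (hypothetical samples S1 and S2), NOT real pipeline output.
TOY_TABLE = (
    "lineage_by_level\ttax_id_by_level\t"
    "name_contigs_S1_quantif_percontig.tsv\t"
    "nb_contigs_S1_quantif_percontig.tsv\t"
    "nb_reads_S1_quantif_percontig.tsv\t"
    "name_contigs_S2_quantif_percontig.tsv\t"
    "nb_contigs_S2_quantif_percontig.tsv\t"
    "nb_reads_S2_quantif_percontig.tsv\n"
    "Bacteria;Firmicutes\t2;1239\tc1;c2\t2\t150\tc7\t1\t40\n"
    "Bacteria;Proteobacteria\t2;1224\tc3\t1\t60\t\t0\t0\n"
)


def sample_blocks(header):
    """Map each sample name to the column index of each of its three fields."""
    blocks = {}
    for index, column in enumerate(header):
        for prefix in PREFIXES:
            if column.startswith(prefix) and column.endswith(SUFFIX):
                # Strip the field prefix and the file suffix to get the sample.
                sample = column[len(prefix):-len(SUFFIX)]
                field = prefix.rstrip("_")
                blocks.setdefault(sample, {})[field] = index
    return blocks


def total_reads_per_sample(handle):
    """Sum the nb_reads column of every sample block over all lineages."""
    reader = csv.reader(handle, delimiter="\t")
    blocks = sample_blocks(next(reader))
    totals = {sample: 0 for sample in blocks}
    for row in reader:
        for sample, fields in blocks.items():
            value = row[fields["nb_reads"]]
            # Assumption: an empty cell means no read for this lineage.
            totals[sample] += int(float(value)) if value else 0
    return totals


if __name__ == "__main__":
    print(total_reads_per_sample(io.StringIO(TOY_TABLE)))
```

To try it on a real result, replace `io.StringIO(TOY_TABLE)` with an open file handle, for example `open("quantification_by_contig_lineage_all.tsv", newline="")`. For the per-level files, the suffix would presumably become `_quantif_percontig_by_[taxonomic_level].tsv`, following the column names listed above.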
@@ -1029,13 +1029,13 @@ In this directory you have results per sample of taxonomic affiliation of genes
#### 2. `07_taxo_affi/`
You can find in this directory two types of files:
-`quantification_by_contig_lineage_all.tsv`: the quantification table of reads aligned on contigs affiliated to each lineage. One line = one taxonomic affiliation with its lineage (1st column, `lineage_by_level`), the taxon id at each level of this lineage (2nd column, `tax_id_by_level`), and then all next 3-columns blocks correspond to one sample. Each 3-column block corresponds to the name of contigs affiliated to this lineage (1st column, `name_contigs_SAMPLE_NAME_idxstats_percontig.tsv`), the number of contigs affiliated to this lineage (2nd column, `nb_contigs_SAMPLE_NAME_idxstats_percontig.tsv`) and the sum of the number of reads aligned to these contigs (3rd column, `nb_reads_SAMPLE_NAME_idxstats_percontig.tsv`). We cannot display this table here because even the first lines are too long.
-`quantification_by_contig_lineage_[taxonomic_level].tsv`: one file by taxonomic level (superkingdom, phylum, order, class, family, genus, species). Quantification table of reads aligned on contigs affiliated to each lineage of the corresponding [taxonomic level]. One line = one taxonomic affiliation at this [taxonomic level] with its taxon id (1st column, `tax_id_by_level`), its lineage (2nd column, `lineage_by_level`), and then all next 3-columns blocks correspond to one sample. Each 3-column block corresponds to the name of contigs affiliated to this lineage (1st column, `name_contigs_SAMPLE_NAME_idxstats_percontig_by_[taxonomic_level].tsv`), the number of contigs affiliated to this lineage (2nd column, `nb_contigs_SAMPLE_NAME_idxstats_percontig_by_[taxonomic_level].tsv`) and the sum of the number of reads aligned to these contigs (3rd column, `nb_reads_SAMPLE_NAME_idxstats_percontig_by_[taxonomic_level].tsv`).
-`quantification_by_contig_lineage_all.tsv`: the quantification table of reads aligned on contigs affiliated to each lineage. One line = one taxonomic affiliation with its lineage (1st column, `lineage_by_level`), the taxon id at each level of this lineage (2nd column, `tax_id_by_level`), and then all next 3-columns blocks correspond to one sample. Each 3-column block corresponds to the name of contigs affiliated to this lineage (1st column, `name_contigs_SAMPLE_NAME_quantif_percontig.tsv`), the number of contigs affiliated to this lineage (2nd column, `nb_contigs_SAMPLE_NAME_quantif_percontig.tsv`) and the sum of the number of reads aligned to these contigs (3rd column, `nb_reads_SAMPLE_NAME_quantif_percontig.tsv`). We cannot display this table here because even the first lines are too long.
-`quantification_by_contig_lineage_[taxonomic_level].tsv`: one file by taxonomic level (superkingdom, phylum, order, class, family, genus, species). Quantification table of reads aligned on contigs affiliated to each lineage of the corresponding [taxonomic level]. One line = one taxonomic affiliation at this [taxonomic level] with its taxon id (1st column, `tax_id_by_level`), its lineage (2nd column, `lineage_by_level`), and then all next 3-columns blocks correspond to one sample. Each 3-column block corresponds to the name of contigs affiliated to this lineage (1st column, `name_contigs_SAMPLE_NAME_quantif_percontig_by_[taxonomic_level].tsv`), the number of contigs affiliated to this lineage (2nd column, `nb_contigs_SAMPLE_NAME_quantif_percontig_by_[taxonomic_level].tsv`) and the sum of the number of reads aligned to these contigs (3rd column, `nb_reads_SAMPLE_NAME_quantif_percontig_by_[taxonomic_level].tsv`).
The first lines of the table `quantification_by_contig_lineage_species.tsv` are: