Commit 64bce452 authored by Celine Noirot

rename script merge_idxstats_percontig_lineage.py

Change doc
parent eb9b1188
......@@ -35,7 +35,7 @@ metagWGS is splitted into different steps that correspond to different parts of
* `07_taxo_affi`
* taxonomically affiliates the genes ([Samtools](http://www.htslib.org/) + [aln2taxaffi.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/aln2taxaffi.py))
* taxonomically affiliates the contigs ([Samtools](http://www.htslib.org/) + [aln2taxaffi.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/aln2taxaffi.py))
* counts the number of reads and contigs, for each taxonomic affiliation, per taxonomic level ([Samtools](http://www.htslib.org/) + [merge_idxstats_percontig_lineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/merge_idxstats_percontig_lineage.py) + [quantification_by_contig_lineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/quantification_by_contig_lineage.py))
* counts the number of reads and contigs, for each taxonomic affiliation, per taxonomic level ([Samtools](http://www.htslib.org/) + [merge_contig_quantif_perlineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/merge_contig_quantif_perlineage.py) + [quantification_by_contig_lineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/quantification_by_contig_lineage.py))
* `08_binning` from [nf-core/mag 1.0.0](https://github.com/nf-core/mag/releases/tag/1.0.0)
* makes binning of contigs ([MetaBAT2](https://bitbucket.org/berkeleylab/metabat/src/master/))
* assesses bins ([BUSCO](https://busco.ezlab.org/) + [metaQUAST](http://quast.sourceforge.net/metaquast) + [summary_busco.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/summary_busco.py) and [combine_tables.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/combine_tables.py) from [nf-core/mag](https://github.com/nf-core/mag))
......
#!/usr/bin/env python
"""--------------------------------------------------------------------
-Script Name: merge_idxstats_percontig_lineage.py
-Description: merge idstats and .percontig.tsv files for one sample.
-Input files: idxstats file and percontig.tsv file.
+Script Name: merge_contig_quantif_perlineage.py
+Description: merge quantifications and lineage into one matrix for one sample.
+Input files: idxstats file, depth from mosdepth (bed.gz) and lineage percontig.tsv file.
Created By: Joanna Fourquet
Date: 2021-01-19
-----------------------------------------------------------------------
......@@ -37,7 +37,7 @@ print(str(datetime.now()))
# Manage parameters.
parser = argparse.ArgumentParser(description = 'Script which \
-merge idstats and .percontig.tsv files for one sample.')
+merge quantifications and lineage into one matrix for one sample.')
parser.add_argument('-i', '--idxstats_file', required = True, \
help = 'idxstats file.')
......
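As a rough illustration of what a merge of this kind involves, the sketch below groups contigs by lineage and sums the mapped reads reported by `samtools idxstats`. It is not the code of `merge_contig_quantif_perlineage.py`: the function name, the assumption that the lineage file has no header line, and the toy records are all hypothetical, and the mosdepth depth input is left out for brevity.

```python
import csv
import io
from collections import defaultdict


def merge_quantif_and_lineage(idxstats_lines, percontig_lines):
    """Group contigs by lineage and sum their mapped reads.

    idxstats_lines: tab-separated rows as written by `samtools idxstats`
        (reference name, sequence length, mapped reads, unmapped reads).
    percontig_lines: tab-separated rows (contig, taxon id, lineage,
        tax ids by level); assumed here to have no header line.
    """
    # contig name -> number of mapped reads
    reads = {}
    for row in csv.reader(idxstats_lines, delimiter="\t"):
        if row and row[0] != "*":  # "*" is the idxstats line for reads with no reference
            reads[row[0]] = int(row[2])

    # (lineage, consensus tax id, tax ids by level) -> contig names
    groups = defaultdict(list)
    for contig, tax_id, lineage, tax_ids in csv.reader(percontig_lines, delimiter="\t"):
        groups[(lineage, tax_id, tax_ids)].append(contig)

    table = []
    for (lineage, tax_id, tax_ids), contigs in sorted(groups.items()):
        table.append({
            "lineage_by_level": lineage,
            "consensus_tax_id": tax_id,
            "tax_id_by_level": tax_ids,
            "name_contigs": ";".join(contigs),
            "nb_contigs": len(contigs),
            "nb_reads": sum(reads.get(c, 0) for c in contigs),
        })
    return table


# Toy inputs (hypothetical values, not real records).
idxstats = io.StringIO("c1\t1000\t10\t0\nc2\t500\t5\t0\nc3\t800\t7\t0\n*\t0\t0\t3\n")
percontig = io.StringIO(
    "c1\t816\tBacteria;Bacteroides\t2;816\n"
    "c2\t816\tBacteria;Bacteroides\t2;816\n"
    "c3\t40544\tBacteria;Sutterella\t2;40544\n"
)
result = merge_quantif_and_lineage(idxstats, percontig)
for line in result:
    print(line)
```

The column names follow the `SAMPLE_NAME_quantif_percontig.tsv` description in the documentation part of this commit; joining contig names with `;` matches the example output shown there.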
......@@ -12,8 +12,9 @@ process {
cpus = { 1 * task.attempt }
memory = { 2.GB * task.attempt }
-errorStrategy = { task.exitStatus in [1,143,137,104,134,139] ? 'retry' : 'finish' }
-maxRetries = 4
+errorStrategy = 'finish'
+//{ task.exitStatus in [1,143,137,104,134,139] ? 'retry' : 'finish' }
+maxRetries = 1
maxErrors = '-1'
container = 'file://metagwgs/env/metagwgs.sif'
withName: cutadapt {
......
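The effect of the `errorStrategy` change can be sketched in Python. This only mirrors the values visible in the hunk above and is not how Nextflow schedules tasks; the function names are hypothetical.

```python
# Exit statuses that triggered a retry in the previous configuration.
RETRY_EXIT_CODES = {1, 143, 137, 104, 134, 139}


def previous_strategy(exit_status):
    """Old setting: retry on the listed exit statuses, otherwise finish."""
    return "retry" if exit_status in RETRY_EXIT_CODES else "finish"


def current_strategy(exit_status):
    """New setting: always finish, whatever the exit status."""
    return "finish"


def resources(attempt):
    """Mirror of `cpus = 1 * task.attempt` and `memory = 2.GB * task.attempt`."""
    return {"cpus": 1 * attempt, "memory_gb": 2 * attempt}


print(previous_strategy(137), current_strategy(137))
print(resources(1), resources(3))
```

Since a task is no longer retried under `'finish'`, `task.attempt` should stay at 1, so the default resources would remain at 1 CPU and 2 GB in this sketch.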
......@@ -35,7 +35,7 @@ metagWGS is splitted into different steps that correspond to different parts of
* `07_taxo_affi`
* taxonomically affiliates the genes ([Samtools](http://www.htslib.org/) + [aln2taxaffi.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/aln2taxaffi.py))
* taxonomically affiliates the contigs ([Samtools](http://www.htslib.org/) + [aln2taxaffi.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/aln2taxaffi.py))
* counts the number of reads and contigs, for each taxonomic affiliation, per taxonomic level ([Samtools](http://www.htslib.org/) + [merge_idxstats_percontig_lineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/merge_idxstats_percontig_lineage.py) + [quantification_by_contig_lineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/quantification_by_contig_lineage.py))
* counts the number of reads and contigs, for each taxonomic affiliation, per taxonomic level ([Samtools](http://www.htslib.org/) + [merge_contig_quantif_perlineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/merge_contig_quantif_perlineage.py) + [quantification_by_contig_lineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/quantification_by_contig_lineage.py))
* `08_binning` from [nf-core/mag 1.0.0](https://github.com/nf-core/mag/releases/tag/1.0.0)
* makes binning of contigs ([MetaBAT2](https://bitbucket.org/berkeleylab/metabat/src/master/))
* assesses bins ([BUSCO](https://busco.ezlab.org/) + [metaQUAST](http://quast.sourceforge.net/metaquast) + [summary_busco.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/summary_busco.py) and [combine_tables.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/combine_tables.py) from [nf-core/mag](https://github.com/nf-core/mag))
......
......@@ -132,13 +132,13 @@ The `results/` directory contains a sub-directory for each step launched:
| `SAMPLE_NAME/SAMPLE_NAME.pergene.tsv` | Taxonomic affiliation of genes. One line corresponds to a gene (1st column), its corresponding taxon id (2nd column), its corresponding lineage (3rd column) and the tax ids of each level of this lineage (4th column). |
| `SAMPLE_NAME/SAMPLE_NAME.warn.tsv` | List of genes with a hit without corresponding taxonomic affiliation. Each line corresponds to a gene (1st column), the reason why the gene is in this list (2nd column) and match ids into the database used during `05_alignment/05_2_database_alignment/` (3rd column). |
| `SAMPLE_NAME/SAMPLE_NAME.percontig.tsv` | Taxonomic affiliation of contigs. One line corresponds to a contig (1st column), its corresponding taxon id (2nd column), its corresponding lineage (3rd column) and the tax ids of each level of this lineage (4th column). |
| `SAMPLE_NAME/SAMPLE_NAME_idxstats_percontig.tsv` | Quantification table of reads aligned on contigs affiliated to each lineage of the first column. One line = one taxonomic affiliation (1st column, `lineage_by_level`), the corresponding taxon id (2nd column, `consensus_tax_id`), the tax ids of each level of this taxonomic affiliation (3rd column, `tax_id_by_level`), the name of contigs affiliated to this lineage (4th column, `name_contigs`), the number of contigs affiliated to this lineage (5th column, `nb_contigs`) and the sum of the number of reads aligned to these contigs (6th column, `nb_reads`). |
| `SAMPLE_NAME/SAMPLE_NAME_idxstats_percontig_by_[taxonomic_level].tsv` | One file by taxonomic level (superkingdom, phylum, order, class, family, genus, species) for the sample `SAMPLE_NAME`. Quantification table of reads aligned on contigs affiliated to each lineage of the corresponding [taxonomic level]. One line = one taxonomic affiliation at this [taxonomic level] with is taxon id (1st column, `tax_id_by_level`), its lineage (2nd column, `lineage_by_level`), the name of contigs affiliated to this lineage (3rd column, `name_contigs`), the number of contigs affiliated to this lineage (4th column, `nb_contigs`) and the sum of the number of reads aligned to these contigs (5th column, `nb_reads`). |
| `SAMPLE_NAME/SAMPLE_NAME_quantif_percontig.tsv` | Quantification table of reads aligned on contigs affiliated to each lineage of the first column. One line = one taxonomic affiliation (1st column, `lineage_by_level`), the corresponding taxon id (2nd column, `consensus_tax_id`), the tax ids of each level of this taxonomic affiliation (3rd column, `tax_id_by_level`), the name of contigs affiliated to this lineage (4th column, `name_contigs`), the number of contigs affiliated to this lineage (5th column, `nb_contigs`) and the sum of the number of reads aligned to these contigs (6th column, `nb_reads`). |
| `SAMPLE_NAME/SAMPLE_NAME_quantif_percontig_by_[taxonomic_level].tsv` | One file by taxonomic level (superkingdom, phylum, order, class, family, genus, species) for the sample `SAMPLE_NAME`. Quantification table of reads aligned on contigs affiliated to each lineage of the corresponding [taxonomic level]. One line = one taxonomic affiliation at this [taxonomic level] with its taxon id (1st column, `tax_id_by_level`), its lineage (2nd column, `lineage_by_level`), the name of contigs affiliated to this lineage (3rd column, `name_contigs`), the number of contigs affiliated to this lineage (4th column, `nb_contigs`) and the sum of the number of reads aligned to these contigs (5th column, `nb_reads`). |
| `SAMPLE_NAME/graphs/SAMPLE_NAME_aln_diamond.m8_contig_taxonomy_level.pdf` | Figure representing the number of contigs (y-axis) affiliated to each taxonomy levels (x-axis). |
| `SAMPLE_NAME/graphs/SAMPLE_NAME_aln_diamond.m8_prot_taxonomy_level.pdf` | Figure representing the number of proteins (y-axis) affiliated to each taxonomy levels (x-axis). |
| `SAMPLE_NAME/graphs/SAMPLE_NAME_aln_diamond.m8_nb_prot_annotated_and_assigned.pdf` | Figure representing the number of proteins (y-axis) in our contigs (`Total` bar), the number of proteins with a match into the database (`Annotated` bar) and the number of proteins with a match into the database which is found into the taxonomy (`Assigned` bar) (x-axis). |
| `quantification_by_contig_lineage_all.tsv` | Quantification table of reads aligned on contigs affiliated to each lineage. One line = one taxonomic affiliation with its lineage (1st column, `lineage_by_level`), the taxon id at each level of this lineage (2nd column, `tax_id_by_level`), and then all next 3-columns blocks correspond to one sample. Each 3-column block corresponds to the name of contigs affiliated to this lineage (1st column, `name_contigs_SAMPLE_NAME_idxstats_percontig.tsv`), the number of contigs affiliated to this lineage (2nd column, `nb_contigs_SAMPLE_NAME_idxstats_percontig.tsv`) and the sum of the number of reads aligned to these contigs (3rd column, `nb_reads_SAMPLE_NAME_idxstats_percontig.tsv`). |
| `quantification_by_contig_lineage_[taxonomic_level].tsv` | One file by taxonomic level (superkingdom, phylum, order, class, family, genus, species). Quantification table of reads aligned on contigs affiliated to each lineage of the corresponding [taxonomic level]. One line = one taxonomic affiliation at this [taxonomic level] with its taxon id (1st column, `tax_id_by_level`), its lineage (2nd column, `lineage_by_level`), and then all next 3-columns blocks correspond to one sample. Each 3-column block corresponds to the name of contigs affiliated to this lineage (1st column, `name_contigs_SAMPLE_NAME_idxstats_percontig_by_[taxonomic_level].tsv`), the number of contigs affiliated to this lineage (2nd column, `nb_contigs_SAMPLE_NAME_idxstats_percontig_by_[taxonomic_level].tsv`) and the sum of the number of reads aligned to these contigs (3rd column, `nb_reads_SAMPLE_NAME_idxstats_percontig_by_[taxonomic_level].tsv`). |
| `quantification_by_contig_lineage_all.tsv` | Quantification table of reads aligned on contigs affiliated to each lineage. One line = one taxonomic affiliation with its lineage (1st column, `lineage_by_level`), the taxon id at each level of this lineage (2nd column, `tax_id_by_level`), and then all next 3-columns blocks correspond to one sample. Each 3-column block corresponds to the name of contigs affiliated to this lineage (1st column, `name_contigs_SAMPLE_NAME_quantif_percontig.tsv`), the number of contigs affiliated to this lineage (2nd column, `nb_contigs_SAMPLE_NAME_quantif_percontig.tsv`) and the sum of the number of reads aligned to these contigs (3rd column, `nb_reads_SAMPLE_NAME_quantif_percontig.tsv`). |
| `quantification_by_contig_lineage_[taxonomic_level].tsv` | One file by taxonomic level (superkingdom, phylum, order, class, family, genus, species). Quantification table of reads aligned on contigs affiliated to each lineage of the corresponding [taxonomic level]. One line = one taxonomic affiliation at this [taxonomic level] with its taxon id (1st column, `tax_id_by_level`), its lineage (2nd column, `lineage_by_level`), and then all next 3-columns blocks correspond to one sample. Each 3-column block corresponds to the name of contigs affiliated to this lineage (1st column, `name_contigs_SAMPLE_NAME_quantif_percontig_by_[taxonomic_level].tsv`), the number of contigs affiliated to this lineage (2nd column, `nb_contigs_SAMPLE_NAME_quantif_percontig_by_[taxonomic_level].tsv`) and the sum of the number of reads aligned to these contigs (3rd column, `nb_reads_SAMPLE_NAME_quantif_percontig_by_[taxonomic_level].tsv`). |
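
The layout of `quantification_by_contig_lineage_all.tsv` described above (two key columns followed by one 3-column block per sample) can be sketched as a merge of per-sample tables. This is an illustration under assumptions, not the code of `quantification_by_contig_lineage.py`: the function name and toy rows are hypothetical, the `consensus_tax_id` column is dropped for simplicity, and lineages missing from a sample are filled with zeros, as in the example output shown in the use-case documentation.

```python
def merge_samples(per_sample):
    """Merge per-sample rows into one table with a 3-column block per sample.

    per_sample: dict mapping a per-sample file name to its rows; each row is
    (lineage_by_level, tax_id_by_level, name_contigs, nb_contigs, nb_reads).
    """
    files = list(per_sample)
    header = ["lineage_by_level", "tax_id_by_level"]
    for name in files:
        header += [f"name_contigs_{name}", f"nb_contigs_{name}", f"nb_reads_{name}"]

    merged = {}
    for index, name in enumerate(files):
        for lineage, tax_ids, contigs, nb_contigs, nb_reads in per_sample[name]:
            # Every lineage starts with an all-zero block for each sample.
            row = merged.setdefault((lineage, tax_ids), [0] * (3 * len(files)))
            row[3 * index:3 * index + 3] = [contigs, nb_contigs, nb_reads]

    rows = [list(key) + values for key, values in sorted(merged.items())]
    return header, rows


# Toy per-sample tables (hypothetical values).
header, rows = merge_samples({
    "S1_quantif_percontig.tsv": [("Bacteria;A", "2;10", "S1_c1;S1_c2", 2, 30)],
    "S2_quantif_percontig.tsv": [
        ("Bacteria;A", "2;10", "S2_c5", 1, 4),
        ("Bacteria;B", "2;20", "S2_c7", 1, 9),
    ],
})
print("\t".join(header))
for row in rows:
    print("\t".join(str(value) for value in row))
```
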
#### **08_binning/08_1_binning/**
......
......@@ -1029,13 +1029,13 @@ In this directory you have results per sample of taxonomic affiliation of genes
#### 2. `07_taxo_affi/`
You can find in this directory two types of files:
- `quantification_by_contig_lineage_all.tsv`: the quantification table of reads aligned on contigs affiliated to each lineage. One line = one taxonomic affiliation with its lineage (1st column, `lineage_by_level`), the taxon id at each level of this lineage (2nd column, `tax_id_by_level`), and then all next 3-columns blocks correspond to one sample. Each 3-column block corresponds to the name of contigs affiliated to this lineage (1st column, `name_contigs_SAMPLE_NAME_idxstats_percontig.tsv`), the number of contigs affiliated to this lineage (2nd column, `nb_contigs_SAMPLE_NAME_idxstats_percontig.tsv`) and the sum of the number of reads aligned to these contigs (3rd column, `nb_reads_SAMPLE_NAME_idxstats_percontig.tsv`). We cannot display this table here because even the first lines are too long.
- `quantification_by_contig_lineage_[taxonomic_level].tsv`: one file by taxonomic level (superkingdom, phylum, order, class, family, genus, species). Quantification table of reads aligned on contigs affiliated to each lineage of the corresponding [taxonomic level]. One line = one taxonomic affiliation at this [taxonomic level] with its taxon id (1st column, `tax_id_by_level`), its lineage (2nd column, `lineage_by_level`), and then all next 3-columns blocks correspond to one sample. Each 3-column block corresponds to the name of contigs affiliated to this lineage (1st column, `name_contigs_SAMPLE_NAME_idxstats_percontig_by_[taxonomic_level].tsv`), the number of contigs affiliated to this lineage (2nd column, `nb_contigs_SAMPLE_NAME_idxstats_percontig_by_[taxonomic_level].tsv`) and the sum of the number of reads aligned to these contigs (3rd column, `nb_reads_SAMPLE_NAME_idxstats_percontig_by_[taxonomic_level].tsv`).
- `quantification_by_contig_lineage_all.tsv`: the quantification table of reads aligned on contigs affiliated to each lineage. One line = one taxonomic affiliation with its lineage (1st column, `lineage_by_level`), the taxon id at each level of this lineage (2nd column, `tax_id_by_level`), and then all next 3-columns blocks correspond to one sample. Each 3-column block corresponds to the name of contigs affiliated to this lineage (1st column, `name_contigs_SAMPLE_NAME_quantif_percontig.tsv`), the number of contigs affiliated to this lineage (2nd column, `nb_contigs_SAMPLE_NAME_quantif_percontig.tsv`) and the sum of the number of reads aligned to these contigs (3rd column, `nb_reads_SAMPLE_NAME_quantif_percontig.tsv`). We cannot display this table here because even the first lines are too long.
- `quantification_by_contig_lineage_[taxonomic_level].tsv`: one file by taxonomic level (superkingdom, phylum, order, class, family, genus, species). Quantification table of reads aligned on contigs affiliated to each lineage of the corresponding [taxonomic level]. One line = one taxonomic affiliation at this [taxonomic level] with its taxon id (1st column, `tax_id_by_level`), its lineage (2nd column, `lineage_by_level`), and then all next 3-columns blocks correspond to one sample. Each 3-column block corresponds to the name of contigs affiliated to this lineage (1st column, `name_contigs_SAMPLE_NAME_quantif_percontig_by_[taxonomic_level].tsv`), the number of contigs affiliated to this lineage (2nd column, `nb_contigs_SAMPLE_NAME_quantif_percontig_by_[taxonomic_level].tsv`) and the sum of the number of reads aligned to these contigs (3rd column, `nb_reads_SAMPLE_NAME_quantif_percontig_by_[taxonomic_level].tsv`).
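
Because each sample occupies a 3-column block whose headers embed the sample name, the blocks can be recovered from the header alone. The sketch below assumes the renamed `_quantif_percontig` column suffix introduced by this commit and tab-separated fields already split into lists; the regular expression and function names are hypothetical helpers, not part of metagWGS.

```python
import re

# Matches e.g. nb_reads_ERR3201928_quantif_percontig_by_species.tsv
BLOCK_COLUMN = re.compile(
    r"^(name_contigs|nb_contigs|nb_reads)_(.+)_quantif_percontig(?:_by_\w+)?\.tsv$"
)


def sample_blocks(header):
    """Map each sample name to the column indexes of its 3-column block."""
    blocks = {}
    for index, column in enumerate(header):
        match = BLOCK_COLUMN.match(column)
        if match:
            field, sample = match.groups()
            blocks.setdefault(sample, {})[field] = index
    return blocks


def reads_per_sample(header, row):
    """Return the nb_reads value of one table row for every sample."""
    return {
        sample: int(row[columns["nb_reads"]])
        for sample, columns in sample_blocks(header).items()
    }


# Header rebuilt from the sample names used in the example table.
samples = ["ERR3201928", "ERR3201914", "ERR3201918"]
header = ["tax_id_by_level", "lineage_by_level"]
for sample in samples:
    header += [
        f"{field}_{sample}_quantif_percontig_by_species.tsv"
        for field in ("name_contigs", "nb_contigs", "nb_reads")
    ]

row = ["1262976", "Sutterella sp. CAG:397", "ERR3201928_c2641", "1", "542",
       "0", "0", "0", "0", "0", "0"]
counts = reads_per_sample(header, row)
print(counts)
```
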
The first lines of the table `quantification_by_contig_lineage_species.tsv` are:
```bash
head quantification_by_contig_lineage_species.tsv
-tax_id_by_level lineage_by_level name_contigs_ERR3201928_idxstats_percontig_by_species.tsv nb_contigs_ERR3201928_idxstats_percontig_by_species.tsv nb_reads_ERR3201928_idxstats_percontig_by_species.tsv name_contigs_ERR3201914_idxstats_percontig_by_species.tsv nb_contigs_ERR3201914_idxstats_percontig_by_species.tsv nb_reads_ERR3201914_idxstats_percontig_by_species.tsv name_contigs_ERR3201918_idxstats_percontig_by_species.tsv nb_contigs_ERR3201918_idxstats_percontig_by_species.tsv nb_reads_ERR3201918_idxstats_percontig_by_species.tsv
+tax_id_by_level lineage_by_level name_contigs_ERR3201928_quantif_percontig_by_species.tsv nb_contigs_ERR3201928_quantif_percontig_by_species.tsv nb_reads_ERR3201928_quantif_percontig_by_species.tsv name_contigs_ERR3201914_quantif_percontig_by_species.tsv nb_contigs_ERR3201914_quantif_percontig_by_species.tsv nb_reads_ERR3201914_quantif_percontig_by_species.tsv name_contigs_ERR3201918_quantif_percontig_by_species.tsv nb_contigs_ERR3201918_quantif_percontig_by_species.tsv nb_reads_ERR3201918_quantif_percontig_by_species.tsv
1262740 Bacteroides sp. CAG:462 ERR3201928_c2847 1 1756 0 0 0 0 0 0
1262976 Sutterella sp. CAG:397 ERR3201928_c2641 1 542 0 0 0 0 0 0
1262986 Proteobacteria bacterium CAG:139 ERR3201928_c233;ERR3201928_c325;ERR3201928_c422;ERR3201928_c485;ERR3201928_c512;ERR3201928_c521;ERR3201928_c577;ERR3201928_c607;ERR3201928_c630;ERR3201928_c700;ERR3201928_c707;ERR3201928_c713;ERR3201928_c745;ERR3201928_c779;ERR3201928_c871;ERR3201928_c890;ERR3201928_c892;ERR3201928_c914;ERR3201928_c931;ERR3201928_c940;ERR3201928_c964;ERR3201928_c994;ERR3201928_c995;ERR3201928_c997;ERR3201928_c1008;ERR3201928_c1030;ERR3201928_c1041;ERR3201928_c1072;ERR3201928_c1101;ERR3201928_c1110;ERR3201928_c1145;ERR3201928_c1159;ERR3201928_c1178;ERR3201928_c1187;ERR3201928_c1196;ERR3201928_c1311;ERR3201928_c1315;ERR3201928_c1318;ERR3201928_c1341;ERR3201928_c1366;ERR3201928_c1386;ERR3201928_c1394;ERR3201928_c1401;ERR3201928_c1458;ERR3201928_c1519;ERR3201928_c1543;ERR3201928_c1553;ERR3201928_c1572;ERR3201928_c1630;ERR3201928_c1639;ERR3201928_c1662;ERR3201928_c1689;ERR3201928_c1716;ERR3201928_c1787;ERR3201928_c1811;ERR3201928_c1849;ERR3201928_c1927;ERR3201928_c2035;ERR3201928_c2059;ERR3201928_c2069;ERR3201928_c2072;ERR3201928_c2131;ERR3201928_c2190;ERR3201928_c2193;ERR3201928_c2207;ERR3201928_c2235;ERR3201928_c2236;ERR3201928_c2262;ERR3201928_c2281;ERR3201928_c2316;ERR3201928_c2321;ERR3201928_c2428;ERR3201928_c2468;ERR3201928_c2477;ERR3201928_c2486;ERR3201928_c2495;ERR3201928_c2497;ERR3201928_c2501;ERR3201928_c2511;ERR3201928_c2545;ERR3201928_c2606;ERR3201928_c2687;ERR3201928_c2715;ERR3201928_c2758;ERR3201928_c2761;ERR3201928_c2776;ERR3201928_c2801;ERR3201928_c3032;ERR3201928_c3429 89 122433 0 0 0 0 0 0
......