metagWGS issues
https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/issues
2024-02-15T21:14:28+01:00

https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/issues/241
Use the new version of GTDB-Tk because it fixes the pplacer bug
2024-02-15T21:14:28+01:00
Claire Hoede
pplacer randomly skips some MAGs. The new version of GTDB-Tk fixes this problem.
Issues en vrac

https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/issues/240
Test metaMDBG
2024-02-09T08:46:32+01:00
Claire Hoede
New assembler for HiFi reads: https://www.nature.com/articles/s41587-023-01983-6
Issues en vrac

https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/issues/239
Test https://github.com/GaetanBenoitDev/metaMDBG for assembling HiFi reads
2023-11-28T15:36:42+01:00
Claire Hoede
On paper, it appears more efficient and faster while using less RAM.
To be tested.
Issues en vrac

https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/issues/235
Improvement suggested by Maria
2023-12-26T17:22:39+01:00
Claire Hoede
- add the identity and coverage (pident and qcov) of the seed.ortholog to the eggNOG-mapper outputs,
- filter the reads mapped to the bins at MAPQ 60,
- report the coverage of the genomes in each sample.
Issues en vrac

https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/issues/228
Binning: benchmark of other tools / methods
2024-01-17T17:40:40+01:00
Claire Hoede
PacBio has improved its binning strategy:
Test it: https://github.com/PacificBiosciences/pb-metagenomics-tools/blob/master/docs/Tutorial-HiFi-MAG-Pipeline.md
Also test SemiBin2 and COMEBin (deep learning), added to our approach (or in place of another tool)
Issues en vrac

https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/issues/227
Check whether the assembly filter works correctly when co-assembly is configured, and add the option to filter on contig length rather than on CPM
2023-04-19T14:45:27+02:00
Claire Hoede
After some experiments, we observed that, for many datasets, the CPM filter can be too stringent. A minimum contig length threshold is more relevant. Moreover, a colleague has observed odd behaviour of the CPM filter in the co-assembly case. We need to check that.
Issues en vrac

https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/issues/224
Improve checkM2 database management
2023-03-24T13:30:10+01:00
Claire Hoede
The new version of checkM2 allows the database to be specified on the command line. So we can manage this database like the others, and not download it into the Singularity image.
Issues en vrac
DARBOT Vincent

https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/issues/221
Allow the metaflye circular-contig information to be provided as input along with the assembly
2023-01-26T09:54:44+01:00
VIENNE MAINA
HiFi binning uses the information about circular contigs. For the metaflye assembler, this information is contained in a file separate from the assembly.
When a user provides a metaflye assembly as input, they should be able to supply the circular-contig information file, either in the samplesheet or as a parameter.
Issues en vrac

https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/issues/220
Duplicate clustering when all samples are co-assembled
2023-01-16T17:07:53+01:00
VIENNE MAINA
If all samples are co-assembled together, the clustering in step 6 is technically done twice (individual and global).
Issues en vrac

https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/issues/215
Make binning results compatible with anvio
2022-12-02T15:19:02+01:00
Ghost User
Need to try anvio and check which files are required.
Issues en vrac
VIENNE MAINA

https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/issues/214
Refactoring
2023-01-13T12:02:43+01:00
Claire Hoede
- [x] Quantification_clusters.py: replace the prints with logging, then remove unused code, add a main, and test with the required input and output files extracted from the tests.
- [x] cd_hit_produce_table_produce.py: check how cd_hit.nf calls this script and recode a Python script that reads the fasta.clstr and does the rest. OR do everything in awk or in bash, but do not mix the two.
- [x] merge_kaiju_result.py: same as the first item: add a main, replace the prints with logging, remove unused code, add the header, etc., and test before and after refactoring that nothing is broken.
- [ ] merge_abundance_and_functionnal_annotations.py: same as above. Tests, then refactoring with a main and logging where necessary.
- [ ] Search the code to check that we always have an object df_name = pd.read_csv; otherwise rename the object.
- [ ] Run a PEP 8 check over all the code to see whether everything is fine.
- [ ] Use pyfastx instead of biopython to parse FASTQ and FASTA
- [x] merge_contig_quantif_perlineage.py: the two final Krona blocks could be replaced by one function that takes nb_reads or depth as an argument.
- [ ] plots_contigs_taxonomic_affiliation.py: improve the header, rename df to df_affi
- [ ] quantification_by_contig_lineage.py: improve the header with an example, replace the prints with logging, add a main
- [ ] quantification_by_functional_annotation: remove the function in the middle of the script, add a main, remove the prints, remove the code duplication by writing a function, use column names instead of their indexes.
- [ ] db_versions: add a header; see whether the size of a directory can be retrieved in Python
- [ ] add pyfastx everywhere, even in binning (Singularity image)
Issues en vrac

https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/issues/213
Add info on whether a bin is RNA complete or not
2023-01-20T10:42:25+01:00
Ghost User
Bin quality is also evaluated according to the number of tRNAs and whether all rRNA genes are found.
Check: https://ena-docs.readthedocs.io/en/latest/faq/metagenomes.html#how-is-the-quality-of-a-metagenomic-assembly-defined
Add the number of tRNAs, and whether all rRNA genes are found in the bins, to the bin stats table.
Issues en vrac

https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/issues/212
Minimap2 requires an extra parameter for large references (>4GB)
2023-01-20T10:55:15+01:00
Ghost User
See minimap2 FAQ: https://github.com/lh3/minimap2/blob/master/FAQ.md#3-the-output-sam-doesnt-have-a-header
And this issue: https://github.com/lh3/minimap2/issues/301
The minimap2 process would break with a large reference genome; it requires adding --split-prefix or increasing the -I parameter.
Issues en vrac

https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/issues/204
CDHIT on amino acid protein sequences rather than on nucleotide sequences
2022-10-12T09:57:24+02:00
Ghost User
The goal of the clustering is to have a shared table gathering functions and their abundance across samples.
The clustering is done on nucleotide gene sequences. It could be interesting to do it on amino acid sequences. This would be faster and would allow clustering of proteins that have similar aa sequences, and so similar functions, even if their nucleotide sequences have diverged.
We would still use a strict identity threshold (>95%?) to cluster aa sequences, as the main goal is to have a shared function table between samples and not to build protein families.
Issues en vrac

https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/issues/201
Improve HiFi binning
2022-12-02T15:09:30+01:00
Ghost User
Different possibilities to improve HiFi binning:
* Implement a circular-aware strategy as in the [Binning Pacbio Pipeline](https://github.com/PacificBiosciences/pb-metagenomics-tools/blob/master/docs/Tutorial-HiFi-MAG-Pipeline.md). Check on a project whether it would really improve the results
* Use of the assembly graph: information from the assembly graph could be used to bin contigs. Check [RepBin](https://github.com/xuehansheng/RepBin), [GraphBin2](https://github.com/metagentools/GraphBin2), [GraphMB](https://github.com/MicrobialDarkMatter/GraphMB)
* Use of methylation marks found in HiFi reads: check [nanodisco](https://github.com/fanglab/nanodisco), which is used to do that with ONT reads. The method was originally made for PacBio RS II data with the tool [mbin](https://github.com/fanglab/mbin). It is now possible to call methylation in HiFi reads, so we could theoretically apply this method to our HiFi reads.
Issues en vrac

https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/issues/199
Re-activate the workflow installation with conda
2022-09-09T11:28:42+02:00
Claire Hoede
Installation with conda doesn't work anymore.
We need to fix this and remember to test it regularly.
Issues en vrac

https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/issues/198
Add taxonomic affiliation plots to multiqc
2022-09-01T16:46:56+02:00
Ghost User
The taxonomic affiliation plots (#136) should be part of the final multiqc report.
These plots combine the taxonomic affiliation of contigs and the abundances from samtools coverage, so they would not fit any official multiqc module. However, it is still possible to code a custom module for that.
On top of that, the plotly HTML plots are quite slow to render.
Issues en vrac

https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/issues/193
Functional test: check process and image, nextflow stub and gitlab CI/CD
2023-01-25T10:07:39+01:00
VIENNE MAINA
Make the functional tests for the processes easier to maintain; maybe use:
https://www.nextflow.io/docs/edge/process.html#stub
but it is still experimental
Reduce the size of the test databases, notably kaiju
Issues en vrac
VIENNE MAINA

https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/issues/168
Databases saving folder
2022-12-02T15:20:55+01:00
MARTIN Pierre
Add databases saving option?
Test downloading the databases with the new publishdir:
-> [publishdir "move" ?]
-> [publishdir symbolic link?]
-> [option: --copy_databases with "false" by default?]
-> [DL all DB to use in separate workflow (as option of metag)]
Issues en vrac
VIENNE MAINA

https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/issues/164
Eggnog mapper chunks
2023-12-18T14:20:06+01:00
MARTIN Pierre
As for the assembly filter, or maybe diamond, run eggnog mapper by chunks
Issues en vrac
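
The chunking idea in the last issue could be sketched as follows. This is only a minimal illustration in Python, not the pipeline's implementation: `read_fasta` and `chunk_records` are hypothetical helper names, and the chunk size of 2 is an arbitrary value chosen for the demo. Each chunk could then be written to its own file and annotated separately.

```python
from typing import Iterable, Iterator, List, Tuple

# A record is (header without the leading '>', full sequence).
Record = Tuple[str, str]


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    seq_parts: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                yield header, "".join(seq_parts)
            header = line[1:]
            seq_parts = []
        elif header is not None:
            seq_parts.append(line)
    if header is not None:
        yield header, "".join(seq_parts)


def chunk_records(records: Iterable[Record], chunk_size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most chunk_size records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    chunk: List[Record] = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # last, possibly smaller, chunk


# Hypothetical protein sequences, used only for the demo.
demo_lines = [">p1", "MKV", "LLA", ">p2", "MST", ">p3", "MAA"]
chunks = list(chunk_records(read_fasta(demo_lines), chunk_size=2))
for index, chunk in enumerate(chunks):
    print(index, [header for header, _ in chunk])
```

In a Nextflow workflow, the built-in `splitFasta` operator may serve the same purpose without a custom script; whether it fits the existing eggnog mapper process would need to be checked.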