Error in the GenBank data access script
The script reads the GenBank database and runs for a long time (over four hours in the log below, from 03:12 to 07:34) before failing with KeyError: 'db_xref'.
See the scripts:
- softwares/scripts/extractGB.py
- https://forgemia.inra.fr/omnicrobe/text-mining-workflow/-/blob/dev/process_GenBank_corpus.snakefile#L18
(snakemake-5.13.0-env) mba@front:/work_projet/omnicrobe_data/tm_workflow/text-mining-workflow$ cat log/snakejob.extract_genbank_data.2.sh.e322424
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 64
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 extract_genbank_data
1
[Wed Apr 13 03:12:47 2022]
rule extract_genbank_data:
input: ancillaries/extended-microorganisms-taxonomy/taxa+id_microorganisms.txt, /db/genbank/current/flat
output: corpora/genbank/GenBank_extraction_20210127.tsv
jobid: 0
python3 softwares/scripts/extractGB.py --taxoref ancillaries/extended-microorganisms-taxonomy/taxa+id_microorganisms.txt --dbpath /db/genbank/current/flat --fout corpora/genbank/GenBank_extraction_20210127.tsv
Activating conda environment: /work_projet/omnicrobe_data/tm_workflow/text-mining-workflow/.snakemake/conda/e6e60c83
Traceback (most recent call last):
File "/work_projet/omnicrobe_data/tm_workflow/text-mining-workflow/softwares/scripts/extractGB.py", line 126, in <module>
accession, length, species, strain, taxID, journal, source, host, country = get_values(record)
File "/work_projet/omnicrobe_data/tm_workflow/text-mining-workflow/softwares/scripts/extractGB.py", line 48, in get_values
taxID = feature.qualifiers['db_xref'][0].replace('taxon:', '')
KeyError: 'db_xref'
[Wed Apr 13 07:34:02 2022]
Error in rule extract_genbank_data:
jobid: 0
output: corpora/genbank/GenBank_extraction_20210127.tsv
conda-env: /work_projet/omnicrobe_data/tm_workflow/text-mining-workflow/.snakemake/conda/e6e60c83
shell:
python3 softwares/scripts/extractGB.py --taxoref ancillaries/extended-microorganisms-taxonomy/taxa+id_microorganisms.txt --dbpath /db/genbank/current/flat --fout corpora/genbank/GenBank_extraction_20210127.tsv
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job extract_genbank_data since they might be corrupted:
corpora/genbank/GenBank_extraction_20210127.tsv
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
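
The traceback shows that line 48 of extractGB.py indexes feature.qualifiers['db_xref'] directly, so a single record whose feature has no db_xref qualifier aborts the whole job after hours of work. A possible fix is to look the qualifier up defensively. The sketch below is only an illustration: the helper name taxon_id_from_qualifiers and its default parameter are hypothetical, the rest of get_values is not shown in the log, and the code assumes only that feature.qualifiers behaves like a dict mapping qualifier names to lists of strings, as the failing line suggests. Unlike the original line, it scans every db_xref entry for the taxon: prefix instead of taking the first one.

```python
def taxon_id_from_qualifiers(qualifiers, default=None):
    """Return the taxon ID found in a feature's qualifiers, or `default`.

    `qualifiers` is assumed to be a dict-like mapping of qualifier names
    to lists of strings (hypothetical helper, not part of extractGB.py).
    """
    # .get() avoids the KeyError when the record has no db_xref qualifier.
    for xref in qualifiers.get('db_xref', []):
        # A feature may carry several cross-references; keep the taxon one.
        if xref.startswith('taxon:'):
            return xref[len('taxon:'):]
    return default


# Plain dicts standing in for feature.qualifiers (illustrative values).
with_taxon = {'organism': ['Escherichia coli'], 'db_xref': ['taxon:562']}
without_xref = {'organism': ['unidentified']}

print(taxon_id_from_qualifiers(with_taxon))
print(taxon_id_from_qualifiers(without_xref, default=''))
```

In get_values, the failing line could then become taxID = taxon_id_from_qualifiers(feature.qualifiers, default=''), leaving it to the caller to decide whether records without a taxon ID are skipped or written with an empty field.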