epmc-crawler
Get articles (full-text XML) from https://europepmc.org/ using DOIs, PMIDs, and/or PMCIDs.
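To make the retrieval step concrete, here is a minimal sketch of how full-text XML can be requested from the public Europe PMC REST service. It is not taken from this repository: the function names (`search_url`, `fulltext_url`, `fetch_fulltext_xml`) are hypothetical, and the snakefiles may resolve identifiers differently. The sketch assumes a DOI or PMID is first resolved to a PMCID through the search endpoint, since full text is addressed by PMCID.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the public Europe PMC RESTful web service.
EPMC_REST = "https://www.ebi.ac.uk/europepmc/webservices/rest"


def search_url(identifier: str, kind: str) -> str:
    """Build a search URL for one article (hypothetical helper).

    kind is one of "doi", "pmid" or "pmcid".
    """
    queries = {
        "doi": f'DOI:"{identifier}"',
        "pmid": f"EXT_ID:{identifier} AND SRC:MED",
        "pmcid": f"PMCID:{identifier}",
    }
    if kind not in queries:
        raise ValueError(f"unknown identifier kind: {kind}")
    params = {"query": queries[kind], "format": "json", "resultType": "lite"}
    return f"{EPMC_REST}/search?{urlencode(params)}"


def fulltext_url(pmcid: str) -> str:
    """Build the full-text XML URL for a PMCID such as 'PMC1234567'."""
    if not pmcid.startswith("PMC"):
        raise ValueError(f"not a PMCID: {pmcid}")
    return f"{EPMC_REST}/{pmcid}/fullTextXML"


def fetch_fulltext_xml(pmcid: str, timeout: float = 30.0) -> bytes:
    """Download the full-text XML (needs network access).

    Raises urllib.error.HTTPError when Europe PMC has no full text
    for this PMCID.
    """
    with urlopen(fulltext_url(pmcid), timeout=timeout) as response:
        return response.read()
```

The JSON returned by the search URL lists matching records; when a record has an open full text, its `pmcid` field can be passed to `fetch_fulltext_xml`.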
install
git clone https://forgemia.inra.fr/mandiayba/epmc-crawler.git
cd epmc-crawler
conda env create -f softwares/envs/snakemake-5.13.0-env.yaml
usage (on migale)
- corpus from a list of dois
conda activate snakemake-5.13.0-env
snakemake --nolock --verbose --printshellcmds --use-singularity --use-conda --reason --latency-wait 60 --jobs 2 --snakefile get_corpus_from_doiss.snakefile all --cluster "qsub -v PYTHONPATH='' -l mem_free=4G -V -cwd -e log/ -o log/ -q short.q -pe thread 2" --config DOIS_FILE=data/pmids.txt CORPUS_FOLDER=data/corpus_1
- corpus from a list of pmids
conda activate snakemake-5.13.0-env
snakemake --nolock --verbose --printshellcmds --use-singularity --use-conda --reason --latency-wait 60 --jobs 2 --snakefile get_corpus_from_pmids.snakefile all --cluster "qsub -v PYTHONPATH='' -l mem_free=4G -V -cwd -e log/ -o log/ -q short.q -pe thread 2" --config PMID_FILE=data/pmids.txt CORPUS_FOLDER=data/corpus_2
- corpus from a list of pmcids
conda activate snakemake-5.13.0-env
snakemake --nolock --verbose --printshellcmds --use-singularity --use-conda --reason --latency-wait 60 --jobs 2 --snakefile get_corpus_from_pmcids.snakefile all --cluster "qsub -v PYTHONPATH='' -l mem_free=4G -V -cwd -e log/ -o log/ -q short.q -pe thread 2" --config PMCID_FILE=data/pmcids.txt CORPUS_FOLDER=data/corpus_3
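Before launching one of the runs above, it can help to check the identifier file. The sketch below assumes the input files hold one identifier per line, which this README does not state; the patterns and the `clean_ids` helper are hypothetical and only illustrate a basic sanity check.

```python
import re

# Hypothetical patterns, for illustration only; real DOIs can be more varied.
PATTERNS = {
    "doi": re.compile(r"^10\.\d{4,9}/\S+$"),
    "pmid": re.compile(r"^\d+$"),
    "pmcid": re.compile(r"^PMC\d+$"),
}


def clean_ids(lines, kind):
    """Return (valid, rejected) identifiers from an iterable of lines.

    Blank lines are skipped, surrounding whitespace is removed, and
    duplicates are dropped while keeping the first occurrence.
    """
    pattern = PATTERNS[kind]
    valid, rejected, seen = [], [], set()
    for line in lines:
        value = line.strip()
        if not value:
            continue
        if not pattern.match(value):
            rejected.append(value)
        elif value not in seen:
            seen.add(value)
            valid.append(value)
    return valid, rejected


# Example use with a file (assumed layout: one identifier per line):
#     with open("data/pmcids.txt") as handle:
#         valid, rejected = clean_ids(handle, "pmcid")
```

Rejected values are returned instead of being silently dropped, so they can be reviewed before the crawl starts.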
other usages
- join annotation results with other metadata using the PMCID
snakemake --nolock --verbose --printshellcmds --use-singularity --use-conda --reason --latency-wait 60 --jobs 2 --snakefile joint_results.snakefile all --cluster "qsub -v PYTHONPATH='' -l mem_free=4G -V -cwd -e log/ -o log/ -q short.q -pe thread 2"
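The join itself can be pictured as a left join keyed on the PMCID. The sketch below is not the logic of joint_results.snakefile, which is not shown here; the `join_on_pmcid` helper and the column names in the usage lines are hypothetical.

```python
def join_on_pmcid(annotations, metadata, key="PMCID"):
    """Left-join annotation rows with metadata rows on a shared key.

    Both arguments are lists of dicts (for example rows read with
    csv.DictReader). Every annotation row is kept; metadata columns are
    added when a row with the same key exists. On a column name clash
    the annotation value wins.
    """
    by_id = {row[key]: row for row in metadata}
    joined = []
    for row in annotations:
        extra = by_id.get(row[key], {})
        joined.append({**extra, **row})
    return joined


# Hypothetical rows, for illustration only.
annotations = [
    {"PMCID": "PMC1", "taxon": "Escherichia coli"},
    {"PMCID": "PMC2", "taxon": "Bacillus subtilis"},
]
metadata = [{"PMCID": "PMC1", "year": "2020"}]
rows = join_on_pmcid(annotations, metadata)
```

A left join keeps annotations whose article has no metadata record, which makes missing metadata visible in the output instead of dropping rows.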
todo
- apply to the lists of DOIs, PMIDs, and PMCIDs collected by Open16S project members
- integrate the pipelines into the omnicrobe workflow
- handle exceptions (DOIs, PMIDs, or full texts missing from Europe PMC) and add constraints to improve article extraction