assert current_query_id not in query_ids_processed, f"Queries are not sorted in blast result. Query {current_query_id} is found in a different part of the file."
logging.info(f'{query_count_with_low_hit} queries ({100*query_count_with_low_hit/(query_i+1):.2f}%) have only hits that do not pass the identity ({min_identity}%) or coverage ({min_coverage}%) thresholds')
logging.info(f'{best_hit_count} best hits of {query_i+1-query_count_with_low_hit} queries have been written to {outfile}.')
A report HTML file is generated at the end of the workflow with [MultiQC](https://multiqc.info/).
The pipeline is built using [Nextflow](https://www.nextflow.io/docs/latest/index.html#), a bioinformatics workflow tool to run tasks across multiple compute infrastructures in a very portable manner.
Three [Singularity](https://sylabs.io/docs/) containers are available, making installation trivial and results highly reproducible.
Three files (`metagwgs.sif`, `eggnog_mapper.sif` and `mosdepth.sif`) must have been downloaded.
### Solution 2: build the three containers.
**In the directory where you want to run the workflow**, where you have downloaded the metagWGS source files, go to the `metagwgs/env/` directory and follow [these explanations](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/wikis/Singularity%20container) to build the three containers. You need three files per container to build them. These files are in the `metagwgs/env/` folder and you can read them here:
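As a sketch only, assuming each container has a recipe file named `<image>.def` in `metagwgs/env/` (the actual filenames may differ; check that folder), the three build commands would look like this. They are printed with `echo` rather than executed, since `singularity build` requires root and a Singularity installation:

```bash
# Sketch only: print the three hypothetical build commands.
# Recipe filenames (<image>.def) are an assumption; check metagwgs/env/ for the real names.
for img in metagwgs eggnog_mapper mosdepth; do
    echo "sudo singularity build ${img}.sif ${img}.def"
done > build_commands.txt
cat build_commands.txt
```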
1. See [Installation page](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/installation.md) to install metagWGS. Make sure you are in the directory where you downloaded the `metagwgs` source files and added the three Singularity images `metagwgs.sif`, `eggnog_mapper.sif` and `mosdepth.sif` into `metagwgs/env`.
2. metagWGS is still under development: you need to use the `dev` branch of the metagwgs repository.
...
...
It allows you to choose the configuration profile among:
These profiles are associated with different configuration files developed [in this directory](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/tree/dev/conf). The `base.config` file available in this directory is the base configuration: it is loaded first and then overridden by the settings of the profile you use. See [here](https://genotoul-bioinfo.pages.mia.inra.fr/use-nextflow-nfcore-course/nfcore/profiles.html) for more explanations.
### 2. Useful options
#### `-resume`
...
...
No parameter available for this substep.
* `--kaiju_db_dir "PATH/directory"`: if you have already downloaded the kaiju database, indicate its directory. **WARNING:** you will not be able to use a kaiju database built with the `kaiju-makedb` command line. Default: `--kaiju_db_dir false`. See **WARNING 2**.
* `--kaiju_db "http://kaiju.binf.ku.dk/database/CHOSEN_DATABASE.tgz"`: allows metagWGS to download the kaiju database of your choice. The list of kaiju databases is available on the [kaiju website](http://kaiju.binf.ku.dk/server), in the blue insert on the left side. Default: `--kaiju_db https://kaiju.binf.ku.dk/database/kaiju_db_refseq_2021-02-26.tgz`. See **WARNING 2**.
* `--skip_kaiju`: allows skipping the taxonomic affiliation of reads with kaiju. Krona files will not be generated. Use: `--skip_kaiju`. See **WARNING 2**.
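Combined, a run that reuses a locally pre-downloaded kaiju database might be invoked as below. The command is only echoed, not executed; `<your_profile>` and the database path are placeholders, not values from this documentation:

```bash
# Illustrative command line only (echoed, not executed):
# reuse a pre-downloaded kaiju database instead of fetching it.
cmd='nextflow run main.nf -profile <your_profile> --kaiju_db_dir "PATH/to/kaiju_db"'
echo "$cmd"
```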
...
...
No parameter available for this substep.
**WARNING 4:** the user has the choice between `metaspades` and `megahit` for the `--assembly` parameter. The choice can be based on CPU and memory availability: `metaspades` needs more CPUs and memory than `megahit`, but our tests showed that assembly metrics are better for `metaspades` than for `megahit`.
* `--metaspades_mem [memory_value]`: memory (in GB) used by the `metaspades` process. Default: `440`.
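As a sketch, selecting `metaspades` with the default memory might look like this. The command is only echoed, not executed; the profile name is a placeholder:

```bash
# Illustrative command line only (echoed, not executed):
# select metaspades as assembler and give it 440 GB of memory.
asm_cmd='nextflow run main.nf -profile <your_profile> --assembly "metaspades" --metaspades_mem 440'
echo "$asm_cmd"
```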
#### **`03_filtering` step:**
...
...
No parameters.
* `--taxdump "FTP_PATH_TO_taxdump.tar.gz"`: indicates the FTP address of the NCBI file `taxdump.tar.gz`. Default: `"ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz"`.
* `--taxonomy_dir "PATH/directory"`: if you have already downloaded the accession2taxid and taxdump databases, indicate their parent directory. Default: `--taxonomy_dir false`.
#### **`08_binning` step:**
**WARNING 13:** the `08_binning` step depends on the `01_clean_qc`, `02_assembly`, `03_filtering` (if you use it), `04_structural_annot` and `05_alignment` steps. You need to use the mandatory files of these five steps to run `08_binning`. See [II. Input files](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/usage.md#ii-input-files) and WARNINGS 1 to 9.
1. metagWGS is still under development: you need to use the `dev-test` branch of the metagwgs repository.
Run:
```bash
cd metagwgs
git checkout dev-test
git pull
cd functional_tests
```
2. Make sure you are in the directory where you downloaded the `metagwgs` source files and added the three mandatory Singularity images in `metagwgs/env`.
3. Make sure you have downloaded all the required data files for metagwgs; otherwise, the pipeline will download them each time you run it in a new project.
4. Download the test datasets (expected results + test fastq) from [link-to-test-datasets].
## II. Functional tests
Each step of metagwgs produces a series of files. We want to be able to determine if the modifications we perform on metagwgs have an impact on any of these files (presence, contents, format, ...).
Two datasets are currently available for these functional tests: test (from [metagwgs/test](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/tree/master/test)) and MAG (from [nf-core/test-datasets](https://github.com/nf-core/test-datasets/tree/mag/test_data)).
When launching the functional test script, the files contained in *exp_dir* (in ./test_expected_logs) are scanned and, for each possible file extension, a test is performed comparing an expected file against its observed version (in ./results).
### Test methods
Five simple test methods are used:

* `diff`: simple bash difference between two files:
  `diff exp_path obs_path`
* `zdiff`: simple bash difference between two gzipped files:
  `zdiff exp_path obs_path`
* `no_header_diff`: removes the headers of `.annotations` and `.seed_orthologs` files
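A minimal runnable sketch of the three methods on synthetic files. Two details are assumptions: header lines are taken to start with `#` (check the real `.annotations` / `.seed_orthologs` headers), and `zdiff` is replaced by the equivalent `zcat` + `diff` for portability:

```bash
set -e
# Synthetic expected/observed files standing in for exp_path / obs_path.
printf 'row1\nrow2\n' > exp.txt
cp exp.txt obs.txt

# diff: plain difference between two files (exit code 0 when identical).
diff exp.txt obs.txt

# zdiff: difference between two gzipped files (zcat + diff is equivalent).
gzip -c exp.txt > exp.txt.gz
gzip -c obs.txt > obs.txt.gz
zcat exp.txt.gz > exp.unzipped
zcat obs.txt.gz > obs.unzipped
diff exp.unzipped obs.unzipped

# no_header_diff: drop header lines (assumed to start with '#') before diffing.
printf '# header A\nrow1\n' > exp.annotations
printf '# header B\nrow1\n' > obs.annotations
grep -v '^#' exp.annotations > exp.body
grep -v '^#' obs.annotations > obs.body
diff exp.body obs.body
echo "all three comparisons passed"
```

Note that the `no_header_diff` pair differs only in its header line, so the comparison still succeeds once headers are stripped.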
The metagwgs Nextflow pipeline can be launched on any cluster manager (sge, slurm, ...). The functional test script can use a provided script containing the command to launch Nextflow on a cluster.
The examples below use the slurm job manager and launch all 7 steps of metagwgs to ensure all parts of `main.nf` work as intended.
### Launch with script
Create a new directory (project-directory) containing a shell script to be used by functional tests:
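For example, a hypothetical wrapper (not part of metagwgs: the script name, the profile placeholder and the sbatch options are assumptions to adapt to your cluster):

```bash
# Create the project directory and a hypothetical slurm launcher script.
mkdir -p project-directory
cat > project-directory/launch_nextflow.sh <<'EOF'
#!/bin/bash
# Submit the Nextflow head job to slurm and wait for it to finish.
# <your_profile> is a placeholder; extra arguments are passed through to nextflow.
sbatch -W --wrap="nextflow run -profile <your_profile> main.nf $*"
EOF
chmod +x project-directory/launch_nextflow.sh
```

The functional test script can then be pointed at `project-directory/launch_nextflow.sh` as its cluster launch command.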