Commit e30f00a2 authored by MARTIN Pierre's avatar MARTIN Pierre

multiple fixes

parent 628aee35
@@ -73,27 +73,13 @@ At the end of the build, two files (`metagwgs.sif` and `eggnog_mapper.sif`) must
**WARNING:** to ensure Nextflow can find the _.sif_ files, we encourage you to change the _nextflow.config_ file in metagWGS to contain these lines:
```
process {
    container = '$SING_IMG_FOLDER/metagwgs.sif'
    withLabel: EGGNOG {
        container = '$SING_IMG_FOLDER/eggnog_mapper.sif'
    }
}
```
Where $SING_IMG_FOLDER leads to the directory where the singularity images are built/downloaded.
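For example, one way to make that directory known to the shell from which Nextflow is launched is to export the variable beforehand (a sketch; the path below is a placeholder to adapt to your setup):

```shell
# Placeholder path: adjust to wherever the images were built/downloaded.
export SING_IMG_FOLDER="$HOME/singularity_images"

# Sanity check: both images should exist before launching the pipeline.
for img in metagwgs.sif eggnog_mapper.sif; do
    [ -f "$SING_IMG_FOLDER/$img" ] || echo "missing image: $img" >&2
done
```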
## V. Use metagWGS
......
@@ -3,15 +3,24 @@
## I. Pre-requisites
1. Install metagwgs as described here: [installation doc](../docs/installation.md)
2. Get datasets: two datasets are currently available for these functional tests at `https://forgemia.inra.fr/genotoul-bioinfo/metagwgs-test-datasets.git`
Replace "\<dataset\>" with either "small" or "mag":
```
git clone --branch <dataset> git@forgemia.inra.fr:genotoul-bioinfo/metagwgs-test-datasets.git
or
wget https://forgemia.inra.fr/genotoul-bioinfo/metagwgs-test-datasets/-/archive/<dataset>/metagwgs-test-datasets-<dataset>.tar.gz
```
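For instance, substituting \<dataset\> with "small" yields the archive URL used by the wget variant:

```shell
# Build the archive URL for a given dataset ("small" or "mag").
dataset=small
archive="https://forgemia.inra.fr/genotoul-bioinfo/metagwgs-test-datasets/-/archive/${dataset}/metagwgs-test-datasets-${dataset}.tar.gz"
echo "$archive"
```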
3. Get data banks: download [this archive](http://genoweb.toulouse.inra.fr/~choede/FT_banks_2021-10-19.tar.gz) and decompress its contents in any folder. This archive contains data banks for:
- **Kaiju** (_kaijudb_refseq_2020-05-25_)
- **Diamond** (_refseq_bacteria_2021-05-20_)
- **NCBI Taxonomy** (_taxonomy_2021-08-23_)
- **Eggnog Mapper** (_eggnog-mapper-2.0.4-rf1_)
> Use those banks to reproduce the outputs of functional tests.
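A possible way to fetch and unpack the archive (a sketch; using `FT_banks` as the target folder is an arbitrary choice, since any folder works):

```shell
# Download the data banks archive (may take a while; ensure enough disk space).
wget http://genoweb.toulouse.inra.fr/~choede/FT_banks_2021-10-19.tar.gz

# Decompress its contents into a dedicated folder.
mkdir -p FT_banks
tar -xzf FT_banks_2021-10-19.tar.gz -C FT_banks
```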
## II. Run functional tests
@@ -23,7 +32,7 @@ To launch functional tests, you need to be located at the root of the folder whe
cd test_folder
python <metagwgs-src>/functional_tests/main.py -step 07_taxo_affi -exp_dir metagwgs-test-datasets/small/output -obs_dir ./results
```
- by providing a script which will launch the nextflow pipeline [see example](./launch_example.sh) (this example is designed for the "small" dataset with --min_contigs_cpm>1000, using slurm)
```
mkdir test_folder
cd test_folder
@@ -31,6 +40,8 @@ cp <metagwgs-src>/functional_tests/launch_example.sh ./
python <metagwgs-src>/functional_tests/main.py -step 07_taxo_affi -exp_dir metagwgs-test-datasets/small/output -obs_dir ./results --script launch_example.sh
```
>**NOTE: more information on the commands used to produce each dataset is available in the [small](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs-test-datasets/-/tree/small) and [mag](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs-test-datasets/-/tree/mag) READMEs**
## III. Output
A ft_\[step\].log file is created for each step of metagwgs. It contains information about each test performed on given files.
@@ -38,9 +49,9 @@ A ft_\[step\].log file is created for each step of metagwgs. It contains informa
Example with ft_01_clean_qc.log:
```
Expected directory: metagwgs-test-datasets/output/01_clean_qc
vs
Observed directory: results/01_clean_qc
------------------------------------------------------------------------------
......
@@ -112,8 +112,9 @@ def check_files(exp_dir, obs_dir, step, methods, verbose):
expected_path = path.join(expected_prefix, file_path)
observed_path = path.join(observed_prefix, file_path)
if verbose:
    print("exp:\t", expected_path)
    print("obs:\t", observed_path)
file_name = path.basename(file_path)
file_extension = path.splitext(file_name)[1]
......
@@ -51,7 +51,7 @@ include { MULTIQC } from './modules/multiqc'
S03_FILTERING options:
--stop_at_filtering Stop the pipeline at this step
--skip_filtering Skip this step
--min_contigs_cpm [cutoff] CPM cutoff (Count Per Million) to filter contigs with a low number of reads. Default: 1.
S04_STRUCTURAL_ANNOT options:
--stop_at_structural_annot Stop the pipeline at this step
@@ -69,7 +69,7 @@ include { MULTIQC } from './modules/multiqc'
S07_TAXO_AFFI options:
--skip_taxo_affi Skip this step
--accession2taxid FTP address of the file prot.accession2taxid.gz. Default: "ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.FULL.gz".
--taxdump FTP address of the file taxdump.tar.gz. Default: "ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz".
--taxonomy_dir Directory where taxdump and accession2taxid are already downloaded ("PATH/directory").
@@ -224,7 +224,6 @@ workflow {
if ( params.type.toUpperCase() == "SR" ) {
ch_multiqc_config = file(params.sr_multiqc_config, checkIfExists: true)
ch_inputs
.map { item -> [ item.sample, item.fastq_1, item.fastq_2 ] }
.set { ch_reads }
@@ -259,7 +258,6 @@ workflow {
else if ( params.type.toUpperCase() == "HIFI" ) {
ch_multiqc_config = file(params.hifi_multiqc_config, checkIfExists: true)
ch_inputs.map { item -> [ item.sample, item.assembly ] } // [sample, assembly]
.set { ch_assembly }
......
process EGGNOG_MAPPER {
publishDir "${params.outdir}/06_func_annot/06_3_functional_annotation", mode: 'copy'
tag "${sampleId}"
label 'EGGNOG'
input:
......
@@ -20,7 +20,6 @@ process FEATURE_COUNTS {
// Create table with sum of reads for each global cluster of genes in each sample.
process QUANTIFICATION_TABLE {
label 'PYTHON'
input:
......
@@ -7,7 +7,6 @@ workflow DATABASES {
ch_host_fasta = Channel.empty()
ch_host_index = Channel.empty()
if ( !skip_clean && !params.skip_host_filter ) {
ch_host_fasta = Channel.value(file(params.host_fasta))
if ( !params.host_index ) {
INDEX_HOST(ch_host_fasta)
@@ -20,7 +19,6 @@ workflow DATABASES {
ch_kaiju_db = Channel.empty()
if ( !skip_clean && !params.skip_kaiju ) { //kaiju_db
if ( !params.kaiju_db_dir && params.kaiju_db_url ) {
INDEX_KAIJU(params.kaiju_db_url)
ch_kaiju_db = INDEX_KAIJU.out.kaiju_db
@@ -44,7 +42,6 @@ workflow DATABASES {
ch_eggnog = Channel.empty()
if ( !params.stop_at_clean && !params.stop_at_filtering && !params.stop_at_assembly && !params.stop_at_structural_annot && !params.skip_func_annot ) { //eggnog_mapper_db
if( params.eggnog_mapper_db_dir != "" ) {
ch_eggnog = Channel.fromPath(params.eggnog_mapper_db_dir, checkIfExists: true).first()
}
@@ -59,7 +56,6 @@ workflow DATABASES {
ch_taxonomy = Channel.empty()
if ( !params.stop_at_clean && !params.stop_at_filtering && !params.stop_at_assembly && !params.stop_at_structural_annot && !params.skip_taxo_affi ) {
if( !params.taxonomy_dir ) {
ch_accession2taxid = Channel.value(params.accession2taxid)
ch_taxdump = Channel.value(params.taxdump)
......
@@ -21,7 +21,6 @@ workflow SHARED {
ch_prot_length = Channel.empty()
if ( !params.stop_at_clean && !params.stop_at_assembly && !params.stop_at_filtering ) {
S04_STRUCTURAL_ANNOT ( assembly )
ch_prokka_ffn = S04_STRUCTURAL_ANNOT.out.ffn
ch_prokka_faa = S04_STRUCTURAL_ANNOT.out.faa
@@ -39,7 +38,6 @@ workflow SHARED {
ch_m8 = Channel.empty()
ch_sam_coverage = Channel.empty()
if ( !params.stop_at_clean && !params.stop_at_assembly && !params.stop_at_filtering && !params.stop_at_structural_annot ) {
S05_ALIGNMENT ( ch_contigs_and_reads, ch_prokka_faa )
ch_bam = S05_ALIGNMENT.out.bam
ch_m8 = S05_ALIGNMENT.out.m8
@@ -49,14 +47,12 @@ workflow SHARED {
ch_quant_report = Channel.empty()
ch_v_eggnogmapper = Channel.empty()
if ( !params.stop_at_clean && !params.stop_at_assembly && !params.stop_at_filtering && !params.stop_at_structural_annot && !params.skip_func_annot ) {
S06_FUNC_ANNOT ( ch_prokka_ffn, ch_prokka_faa, ch_prokka_gff, ch_bam, ch_m8, eggnog_db )
ch_quant_report = S06_FUNC_ANNOT.out.quant_report
ch_v_eggnogmapper = S06_FUNC_ANNOT.out.v_eggnogmapper
}
if ( !params.stop_at_clean && !params.stop_at_assembly && !params.stop_at_filtering && !params.stop_at_structural_annot && !params.skip_taxo_affi ) {
S07_TAXO_AFFI ( taxonomy, ch_m8, ch_sam_coverage, ch_prot_length)
}
......
@@ -28,7 +28,6 @@ workflow SHORT_READS {
ch_filtered_report = Channel.empty()
if ( !params.skip_clean ) {
S01_CLEAN_QC (
reads,
paired,
@@ -50,7 +49,6 @@ workflow SHORT_READS {
ch_dedup = Channel.empty()
if ( !params.stop_at_clean ) {
S02_ASSEMBLY ( ch_preprocessed_reads )
ch_assembly = S02_ASSEMBLY.out.assembly
ch_dedup = S02_ASSEMBLY.out.dedup
@@ -60,7 +58,6 @@ workflow SHORT_READS {
}
if ( !params.stop_at_clean && !params.stop_at_assembly && !params.skip_filtering ) {
ch_min_contigs_cpm = Channel.value(params.min_contigs_cpm)
ch_assembly
......