Commit 39f4c22c authored by MARTIN Pierre's avatar MARTIN Pierre

Functional tests merge request

parent 16c765f7
......@@ -106,4 +106,4 @@ merge.drop('#query_name', inplace=True, axis=1)
merge.drop("qseqid", inplace=True, axis=1)
# Write merge data frame in output file.
merge.to_csv(output_file, sep="\t", index=False)
merge.to_csv(args.output_file, sep="\t", index=False)
......@@ -57,7 +57,7 @@ with open(args.list_of_input_files) as finput_list:
sample_files = finput_list.read().split()
# Merge results for all samples by lineage.
for (sample_idx,sample_path) in enumerate(sample_files):
for (sample_idx,sample_path) in enumerate(sorted(sample_files)):
print(sample_idx)
if(sample_idx==0):
merge = pd.read_csv(sample_path, delimiter='\t', dtype=str)
......
......@@ -45,7 +45,7 @@ A report html file is generated at the end of the workflow with [MultiQC](https:
The pipeline is built using [Nextflow](https://www.nextflow.io/docs/latest/index.html#), a bioinformatics workflow tool to run tasks across multiple compute infrastructures in a very portable manner.
Two [Singularity](https://sylabs.io/docs/) containers are available making installation trivial and results highly reproducible.
Three [Singularity](https://sylabs.io/docs/) containers are available, making installation trivial and results highly reproducible.
## Documentation
......
......@@ -14,16 +14,111 @@
2. Make sure you are in the directory where you downloaded `metagwgs` source files and added the three mandatory Singularity images in `metagwgs/env`
3. Make sure you downloaded all the required data files for metagwgs. If not, they will be downloaded by the pipeline each time you run it in a different folder.
3. Make sure you downloaded all the required data files for metagwgs. If not, they will be downloaded by the pipeline each time you run it in a new project.
4. Download the expected results directory for test files from [link-to-exp-dir]
4. Download the test datasets (expected results + test fastq) from [link-to-test-datasets].
## II. Launch test
## II. Functional tests
The functional test script can be used alongside a homemade script containing the command that launches metagwgs on the computational cluster of your choice.
Each step of metagwgs produces a series of files. We want to be able to determine if the modifications we perform on metagwgs have an impact on any of these files (presence, contents, format, ...).
If you want to, you can
Two datasets are currently available for these functional tests: test (from [metagwgs/test](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/tree/master/test)) and MAG (from [nf-core/test-datasets](https://github.com/nf-core/test-datasets/tree/mag/test_data)).
When launching the functional test script, the files contained in *exp_dir* (in ./test_expected_logs) are scanned and, for each possible file extension, a test is performed on an expected file against its observed version (in ./results).
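The pairing of expected and observed files, and the choice of a test method per file, could look roughly like the sketch below (the `collect_test_pairs` helper and most of the regex patterns are illustrative assumptions; only the `cut_diff` pattern is taken from the `methods` dictionary visible further down in this diff):

```python
import os
import re
from collections import OrderedDict

# Regex-to-method mapping in the spirit of the `methods` OrderedDict in
# functional_tests/main.py; only the cut_diff pattern is taken from the diff,
# the other patterns are illustrative placeholders.
methods = OrderedDict([
    ("cut_diff", r".*_cutadapt\.log"),
    ("zdiff", r".*\.(fastq|fq)\.gz"),
    ("diff", r".*"),  # fallback: plain diff on anything else
])

def collect_test_pairs(exp_dir, obs_dir):
    """Pair every expected file with its observed counterpart and a test method."""
    pairs = []
    for root, _, files in os.walk(exp_dir):
        for name in files:
            exp_path = os.path.join(root, name)
            rel_path = os.path.relpath(exp_path, exp_dir)
            obs_path = os.path.join(obs_dir, rel_path)
            # The first pattern matching the file name decides the test method.
            method = next(m for m, pattern in methods.items()
                          if re.fullmatch(pattern, name))
            pairs.append((rel_path, exp_path, obs_path, method))
    return pairs
```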
### Test methods
Five simple test methods are used (a minimal dispatch sketch follows the list):

- `diff`: simple bash difference between two files
  `diff exp_path obs_path`
- `zdiff`: simple bash difference between two gzipped files
  `zdiff exp_path obs_path`
- `no_header_diff`: compare .annotations and .seed_orthologs files after removing their header lines
  `diff <(grep -v "^#" exp_path) <(grep -v "^#" obs_path)`
- `cut_diff`: exception for the cutadapt.log file, whose first 5 lines are skipped
  `diff <(tail -n+6 exp_path) <(tail -n+6 obs_path)`
- `not_empty`: in python, check that the file is not empty
  `test = path.getsize(obs_path) > 0`
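A minimal sketch of how these five methods could be dispatched, reusing the bash process-substitution trick visible in the `test_file` function further down in this diff (the `run_test` name and the pass/fail convention are assumptions):

```python
import subprocess
from os import path

def run_test(exp_path, obs_path, method):
    """Return True if the expected/observed pair passes the given test method."""
    if method == "not_empty":
        # Pure-python check: the observed file only has to be non-empty.
        return path.getsize(obs_path) > 0
    if method == "diff":
        command = "diff {} {}".format(exp_path, obs_path)
    elif method == "zdiff":
        command = "zdiff {} {}".format(exp_path, obs_path)
    elif method == "no_header_diff":
        # Compare the files after stripping their '#' header lines.
        command = 'diff <(grep -v "^#" {}) <(grep -v "^#" {})'.format(exp_path, obs_path)
    elif method == "cut_diff":
        # cutadapt.log starts with run-specific lines, so skip the first 5 lines.
        command = "diff <(tail -n+6 {}) <(tail -n+6 {})".format(exp_path, obs_path)
    else:
        raise ValueError("unknown test method: {}".format(method))
    # Process substitution <(...) requires bash, not the default /bin/sh.
    process = subprocess.Popen(command, stdout=subprocess.PIPE,
                               shell=True, executable="/bin/bash")
    diff_out, _ = process.communicate()
    return process.returncode == 0 and not diff_out
```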
## III. Launch test
Nextflow metagwgs can be launched with any cluster manager (SGE, Slurm, ...). The script for functional tests can use a provided script containing the command to launch Nextflow on a cluster.
Examples below use the Slurm job manager and launch all 7 steps of metagwgs to ensure all parts of main.nf work as intended.
### Launch with script
### Launch without script
\ No newline at end of file
Create a new directory (project-directory) containing a shell script to be used by functional tests:
```
#!/bin/bash
sbatch -W -p workq -J metagwgs --mem=6G \
--wrap="module load bioinfo/Nextflow-v21.04.1 ; module load system/singularity-3.7.3 ; nextflow run -profile test_genotoul_workq [work_dir]/metaG/metagwgs/main.nf --step '01_clean_qc,02_assembly,03_filtering,04_structural_annot,05_alignment,06_func_annot,07_taxo_affi' --reads '../metagwgs/test/*_{R1,R2}.fastq.gz' --host_fasta '[work_dir]/human_ref/Homo_sapiens.GRCh38_chr21.fa' --host_bwa_index '[work_dir]/human_ref/Homo_sapiens.GRCh38_chr21.fa.{amb,ann,bwt,pac,sa}' --kaiju_db_dir '/bank/kaijudb/kaijudb_refseq_2020-05-25' --taxonomy_dir '[work_dir]/taxonomy' --eggnog_mapper_db_dir '/bank/eggnog-mapper/eggnog-mapper-2.0.4-rf1/data' --assembly metaspades --diamond_bank "[work_dir]/refseq_bacteria_2021-05-20/refseq_bacteria.dmnd" -with-report -with-timeline -with-trace -with-dag -resume"
```
Then launch this command:
```
cd project-directory
python [work_dir]/metaG/metagwgs/functional_tests/main.py -step 07_taxo_affi -exp_dir [work_dir]/test_expected_logs -obs_dir ./results --script launch_07_taxo_affi.sh
```
### Launch without script
If you have already launched metagwgs on the test data (see the metagwgs README and usage documentation):
```
cd project-directory
python [work_dir]/metaG/metagwgs/functional_tests/main.py -step 07_taxo_affi -exp_dir [work_dir]/test_expected_logs -obs_dir ./results
```
## Output
A ft_\[step\].log file is created for each step of metagwgs. It contains information about each test performed on the given files.
Example with ft_01_clean_qc.log:
```
Expected directory: /work/pmartin2/metaG/test_expected_logs/01_clean_qc
vs
Observed directory: /work/pmartin2/metaG/refac_09_13.2/results/01_clean_qc
------------------------------------------------------------------------------
File: 01_1_cleaned_reads/cleaned_a_R1.fastq.gz
Test method: zdiff
Test result: Passed
------------------------------------------------------------------------------
File: 01_1_cleaned_reads/cleaned_a_R2.fastq.gz
Test method: zdiff
Test result: Passed
...
=========================================
-----------------------------------------
Testing the 01_clean_qc step of metagWGS:
Total: 36
Passed: 36 (100.0%)
Missed: 0 (0.0%)
Not tested: 0
-----------------------------------------
=========================================
```
If a test resulted in 'Failed' instead of 'Passed', the stdout of the test command is printed in the log.
Sometimes files are not tested because they are present in exp_dir but not in obs_dir. In that case a ft_\[step\].not_tested log is created containing the names of the missing files. In 02_assembly, for instance, two assembly programs can be used (metaspades and megahit); since only one of them is run, the expected files of the other end up in this .not_tested log file. Files that were not tested are not counted in the missed count.
\ No newline at end of file
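For completeness, a small sketch of how the ft_\[step\].not_tested file could be produced (the `report_not_tested` name and the exact output layout are assumptions, not necessarily what functional_tests/main.py writes):

```python
import os

def report_not_tested(exp_dir, obs_dir, step):
    """List expected files with no observed counterpart and write ft_[step].not_tested."""
    missing = []
    for root, _, files in os.walk(exp_dir):
        for name in files:
            rel_path = os.path.relpath(os.path.join(root, name), exp_dir)
            if not os.path.exists(os.path.join(obs_dir, rel_path)):
                missing.append(rel_path)
    if missing:
        with open("ft_{}.not_tested".format(step), "w") as log:
            log.write("\n".join(missing) + "\n")
    # Files listed here are reported separately and excluded from the 'Missed' count.
    return missing
```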
......@@ -238,8 +238,6 @@ def test_file(exp_path, obs_path, method):
elif method == 'taxo_diff':
command = 'diff {} {}'.format(exp_path, obs_path)
# command = 'diff <(sort {}) <(sort {})'.format(exp_path, obs_path)
# command = 'diff <(cut -f1 {} | sort) <(cut -f1 {} | sort)'.format(exp_path, obs_path)
process = subprocess.Popen(command, stdout = subprocess.PIPE, shell = True, executable = '/bin/bash')
diff_out, error = process.communicate()
......
......@@ -34,6 +34,7 @@ steps_list = OrderedDict([
("07_taxo_affi", 7)
])
# Dictionary of test methods to use on files found in exp_dir (with exceptions, e.g. cut_diff)
global methods
methods = OrderedDict([
("cut_diff", r".*_cutadapt\.log"),
......
......@@ -543,21 +543,21 @@ process assembly {
val spades_mem from metaspades_mem_ch
output:
set sampleId, file("${sampleId}_assembly/${sampleId}.contigs.fa") into assembly_for_quast_ch, assembly_for_dedup_ch, assembly_for_filter_ch, assembly_no_filter_ch
set sampleId, file("${sampleId}_assembly/${sampleId}.log"), file("${sampleId}_assembly/params.txt") into logs_assembly_ch
set sampleId, file("${params.assembly}/${sampleId}.contigs.fa") into assembly_for_quast_ch, assembly_for_dedup_ch, assembly_for_filter_ch, assembly_no_filter_ch
set sampleId, file("${params.assembly}/${sampleId}.log"), file("${params.assembly}/params.txt") into logs_assembly_ch
when: ('02_assembly' in step || '03_filtering' in step || '04_structural_annot' in step || '05_alignment' in step || '06_func_annot' in step || '07_taxo_affi' in step || '08_binning' in step)
script:
if(params.assembly=='metaspades')
"""
metaspades.py -t ${task.cpus} -m ${spades_mem} -1 ${preprocessed_reads_R1} -2 ${preprocessed_reads_R2} -o ${sampleId}_assembly
mv ${sampleId}_assembly/scaffolds.fasta ${sampleId}_assembly/${sampleId}.contigs.fa
mv ${sampleId}_assembly/spades.log ${sampleId}_assembly/${sampleId}.log
metaspades.py -t ${task.cpus} -m ${spades_mem} -1 ${preprocessed_reads_R1} -2 ${preprocessed_reads_R2} -o ${params.assembly}
mv ${params.assembly}/scaffolds.fasta ${params.assembly}/${sampleId}.contigs.fa
mv ${params.assembly}/spades.log ${params.assembly}/${sampleId}.log
"""
else if(params.assembly=='megahit')
"""
megahit -t ${task.cpus} -1 ${preprocessed_reads_R1} -2 ${preprocessed_reads_R2} -o ${sampleId}_assembly --out-prefix "${sampleId}"
megahit -t ${task.cpus} -1 ${preprocessed_reads_R1} -2 ${preprocessed_reads_R2} -o ${params.assembly} --out-prefix "${sampleId}"
"""
else
error "Invalid parameter: ${params.assembly}"
......