# Functional tests: Usage

## I. Pre-requisites

1. Install metagwgs as described here: [installation doc](../docs/installation.md)
2. Get datasets: two datasets are currently available for these functional tests at `https://forgemia.inra.fr/genotoul-bioinfo/metagwgs-test-datasets.git`

    ```
    git clone git@forgemia.inra.fr:genotoul-bioinfo/metagwgs-test-datasets.git
    ```

    or, over HTTPS:

    ```
    git clone https://forgemia.inra.fr/genotoul-bioinfo/metagwgs-test-datasets.git
    ```
3. Get data banks: download [this archive](http://genoweb.toulouse.inra.fr/~choede/FT_banks_2021-12-16.tar.gz) and extract it into any folder. This archive contains data banks for:
    - **Kaiju** (_kaijudb_refseq_2020-05-25_)
    - **Diamond** (_refseq_bacteria_2021-05-20_)
    - **NCBI Taxonomy** (_taxonomy_2021-12-7_)
    - **Eggnog Mapper** (_eggnog-mapper-2.0.4-rf1_)

    > Use these banks to reproduce the outputs of the functional tests.

## II. Run functional tests

Each step of metagwgs produces a series of files. The functional tests check whether the modifications made to metagwgs have an impact on any of these files (presence, contents, format, ...). How each file is compared is described in the _Test methods_ section at the end of this page.

To launch the functional tests, move to the root of the folder in which you want to run them. There are two ways to launch the functional tests (here, testing all steps up to `07_taxo_affi`):
- by providing the results folder of a pipeline that has already been executed:
    ```
    cd test_folder
    export METAG_PATH="/path/to/sources"
    export DATASET="/path/to/metagwgs-test-datasets"
    python $METAG_PATH/functional_tests/main.py -step 07_taxo_affi -exp_dir $DATASET/small/output -obs_dir ./results
    ```
- by providing a script that launches the Nextflow pipeline ([see example](./launch_example.sh)). This example is designed for the "small" dataset with `--min_contigs_cpm>1000` and uses Slurm:

    1. Create a working directory:

        ```
        mkdir test_folder
        cd test_folder
        ```

    2. Set environment variables and load the Python module:

        ```
        export METAG_PATH="/path/to/sources"
        export DATASET="/path/to/metagwgs-test-datasets"
        export DATABANK="/path/to/FT_banks_2021-10-19"
        export EGGNOG_DB="$DATABANK/eggnog-mapper-2.0.4-rf1/data"
        module load system/Python-3.7.4
        ```

    3. Launch the functional tests:

        ```
        cp $METAG_PATH/functional_tests/launch_example.sh ./
        python $METAG_PATH/functional_tests/main.py -step 07_taxo_affi -exp_dir $DATASET/small/output -obs_dir ./results --script launch_example.sh
        ```

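If you launch the functional tests often, the environment checks and the `main.py` call from the steps above can be wrapped in a small Python helper. This is a minimal sketch, not part of metagwgs: `check_env` and `build_command` are hypothetical names, the variable list comes from the `export` lines above, and only the `main.py` options shown in this README (`-step`, `-exp_dir`, `-obs_dir`, `--script`) are used. It only builds the argument list; running it is left to `subprocess.run`.

```python
# Hypothetical helper (not shipped with metagwgs): validate the environment
# variables exported above, then build the functional_tests/main.py call.
import os
from pathlib import Path

# Variables exported in the steps above. DATABANK and EGGNOG_DB are only
# exported in the launch-script variant, so they are checked only then.
REQUIRED_VARS = ["METAG_PATH", "DATASET"]
SCRIPT_VARS = ["DATABANK", "EGGNOG_DB"]


def check_env(env, with_script):
    """Return the variables that are unset or do not point to a directory."""
    names = REQUIRED_VARS + (SCRIPT_VARS if with_script else [])
    return [name for name in names if not env.get(name) or not Path(env[name]).is_dir()]


def build_command(env, step="07_taxo_affi", obs_dir="./results", script=None):
    """Build the main.py argument list used in this README (small dataset)."""
    cmd = [
        "python",
        os.path.join(env["METAG_PATH"], "functional_tests", "main.py"),
        "-step", step,
        "-exp_dir", os.path.join(env["DATASET"], "small", "output"),
        "-obs_dir", obs_dir,
    ]
    if script is not None:
        # Second variant: main.py launches the pipeline through this script.
        cmd += ["--script", script]
    return cmd


# Example (run from test_folder once check_env(os.environ, True) == []):
#   import subprocess
#   subprocess.run(build_command(os.environ, script="launch_example.sh"), check=True)
```
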
> **NOTE:** more information on the commands used to produce each dataset is available in the [small](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs-test-datasets/-/tree/small) and [mag](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs-test-datasets/-/tree/mag) READMEs.

## III. Output

A `ft_[STEP].log` file is created for each tested step of metagwgs. For each file of that step, it reports the test method used and its result, followed by a summary of all tests.

Example with `ft_01_clean_qc.log`:

```
Expected directory: metagwgs-test-datasets/output/01_clean_qc
vs
Observed directory: results/01_clean_qc

------------------------------------------------------------------------------

File:           01_1_cleaned_reads/cleaned_c_R1.fastq.gz
Test method:    zdiff
Test result:    Passed

------------------------------------------------------------------------------

File:           01_1_cleaned_reads/cleaned_c_R2.fastq.gz
Test method:    zdiff
Test result:    Passed


...


=========================================
-----------------------------------------

Testing the 01_clean_qc step of metagWGS:

Total:      36
Passed:     36 (100.0%)
Missed:     0 (0.0%)
Not tested: 0

-----------------------------------------
=========================================
```

If a test fails (`Failed` instead of `Passed`), the stdout of the test is printed in the log.

Sometimes, files are not tested because they are present in _exp_dir_ but not in _obs_dir_. Their names are then written to a `ft_[STEP].not_tested` file. For example, **02_assembly** can use either of two assembly programs, _metaspades_ or _megahit_, so files produced by only one of them can end up in this `.not_tested` file. Files that are not tested are not included in the missed count.

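The "not tested" rule above can be sketched in Python as follows. This is an illustration under assumptions, not the suite's code: the function names are hypothetical, the suite may build its list of expected files differently (here every file under the expected step directory is a candidate), and the `.not_tested` layout (one name per line) is assumed.

```python
# Hypothetical sketch of the "not tested" rule: a file expected by the step
# but absent from the observed directory is listed rather than compared.
# Assumes every file under the expected step directory is a candidate.
from pathlib import Path


def split_tested(exp_dir, obs_dir):
    """Return (to_test, not_tested) relative paths for the files of exp_dir."""
    exp_root, obs_root = Path(exp_dir), Path(obs_dir)
    to_test, not_tested = [], []
    for exp_file in sorted(p for p in exp_root.rglob("*") if p.is_file()):
        rel = exp_file.relative_to(exp_root).as_posix()
        # Present in both directories: compared with the step's test method.
        # Missing from obs_dir: reported, but not counted as "Missed".
        (to_test if (obs_root / rel).is_file() else not_tested).append(rel)
    return to_test, not_tested


def write_not_tested(log_path, names):
    """Write one missing file name per line (assumed .not_tested layout)."""
    Path(log_path).write_text("".join(f"{name}\n" for name in names))
```
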

### Test methods

6 simple test methods are used:

- **sort_diff**: bash difference between two files after sorting their lines

    `diff <(sort exp_path) <(sort obs_path)`

- **diff**: simple bash difference between two files

    `diff exp_path obs_path`

- **zdiff**: simple bash difference between two gzipped files

    `zdiff exp_path obs_path`

- **no_header_diff**: difference between .annotations and .seed_orthologs files, with their headers removed

    `diff <(grep -w "^?#" exp_path) <(grep -w "^?#" obs_path)`

- **cut_diff**: exception for the cutadapt.log file, compared from its 6th line onward (its first 5 lines are skipped)

    `diff <(tail -n+6 exp_path) <(tail -n+6 obs_path)`

- **not_empty**: in Python, check that the observed file is not empty

    `test = path.getsize(obs_path) > 0`

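To reproduce a comparison outside the suite, the six methods can be approximated with the Python standard library. This is a sketch, not the functional tests' implementation: the function names mirror the method names above, lines are compared after `splitlines()` (so a missing final newline is ignored, unlike `diff`), and `no_header_diff` is approximated by dropping lines that start with `#`, which is an assumption about what "header" means here; the suite's own command is the one listed above.

```python
# Standard-library approximations of the six test methods (a sketch, not the
# suite's code). Each function returns True when the files are considered equal.
import gzip
import os


def read_lines(path, gzipped=False):
    """Read a text file (optionally gzip-compressed) as a list of lines."""
    opener = gzip.open if gzipped else open
    with opener(path, "rt") as handle:
        return handle.read().splitlines()


def diff(exp_path, obs_path):
    return read_lines(exp_path) == read_lines(obs_path)


def zdiff(exp_path, obs_path):
    # Compare decompressed contents, as zdiff does.
    return read_lines(exp_path, gzipped=True) == read_lines(obs_path, gzipped=True)


def sort_diff(exp_path, obs_path):
    # Line order is ignored; only the multiset of lines matters.
    return sorted(read_lines(exp_path)) == sorted(read_lines(obs_path))


def no_header_diff(exp_path, obs_path):
    # Assumption: header lines are the ones starting with "#".
    def body(path):
        return [line for line in read_lines(path) if not line.startswith("#")]
    return body(exp_path) == body(obs_path)


def cut_diff(exp_path, obs_path):
    # Same as `tail -n+6`: skip the first 5 lines of each cutadapt.log.
    return read_lines(exp_path)[5:] == read_lines(obs_path)[5:]


def not_empty(exp_path, obs_path):
    # Only the observed file is checked; exp_path is unused.
    return os.path.getsize(obs_path) > 0


METHODS = {
    "diff": diff,
    "zdiff": zdiff,
    "sort_diff": sort_diff,
    "no_header_diff": no_header_diff,
    "cut_diff": cut_diff,
    "not_empty": not_empty,
}
```
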

# Test skips and check processes

The script `test_parameters_and_processes.py` checks that runs launched with the parameters listed in an expected-processes file (`expected_processes_sr.tsv` or `expected_processes_hifi.tsv`) execute the expected processes.

To use it:

1. Retrieve the data banks and the _small_ dataset as described in the functional tests section above.
2. Set up the environment:
    - load the modules:

        ```
        module load bioinfo/Nextflow-v21.04.1
        module load system/singularity-3.7.3
        ```

    - set environment variables:

        ```
        export OUTDIR="/path/to/out"
        export METAG_PATH="/path/to/sources"
        export DATABANK="/path/to/FT_banks_2021-10-19"
        export DATASET="/path/to/metagwgs-test-datasets"
        export EGGNOG_DB="/bank/eggnog-mapper/eggnog-mapper-2.0.4-rf1/data"
        ```

    - create the command file:

        ```
        cut -f 1 $METAG_PATH/functional_tests/expected_processes_sr.tsv | tail -n +2 > $OUTDIR/cmd_sr.sh
        ```

        > The commands use the profile `test,genotoul`.

    - replace the `$DATASET` placeholder with the actual path in the samplesheet:

        ```
        sed -i -e "s,\$DATASET,$DATASET,g" $DATASET/small/input/samplesheet.csv
        ```

3. Launch the commands on the cluster:

    ```
    cd $OUTDIR
    sarray cmd_sr.sh
    ```

4. Launch `test_parameters_and_processes.py`:

    ```
    $METAG_PATH/functional_tests/test_parameters_and_processes.py --file $METAG_PATH/functional_tests/expected_processes_sr.tsv
    ```

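The two preparation commands above (`cut | tail` and `sed -i`) can also be written in Python, for instance when preparing the runs from a script. This is a hedged sketch: the function names are hypothetical, and it assumes, as those commands imply, that the TSV has one header line with the command in its first tab-separated column, and that the samplesheet contains the literal placeholder `$DATASET`. The same functions apply to the HiFi files in the example below.

```python
# Hypothetical Python equivalent of the `cut | tail` and `sed -i` preparation
# steps above, under the assumptions stated in the lead-in.
from pathlib import Path


def write_command_file(tsv_path, cmd_path):
    """Like `cut -f 1 TSV | tail -n +2 > CMD`: first column, header skipped."""
    lines = Path(tsv_path).read_text().splitlines()[1:]
    # A line without a tab is kept whole, as `cut -f 1` does.
    commands = [line.split("\t", 1)[0] for line in lines]
    Path(cmd_path).write_text("".join(f"{command}\n" for command in commands))
    return commands


def fill_dataset_placeholder(samplesheet, dataset_dir):
    """Like the `sed -i` call: replace every literal "$DATASET" in place."""
    path = Path(samplesheet)
    path.write_text(path.read_text().replace("$DATASET", dataset_dir))


# Example (with the environment variables exported above):
#   import os
#   src = os.environ["METAG_PATH"] + "/functional_tests/expected_processes_sr.tsv"
#   write_command_file(src, os.environ["OUTDIR"] + "/cmd_sr.sh")
#   fill_dataset_placeholder(os.environ["DATASET"] + "/small/input/samplesheet.csv",
#                            os.environ["DATASET"])
```
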
## Example with the HiFi dataset on genotoul
```
module load bioinfo/Nextflow-v21.04.1
module load system/singularity-3.7.3

export OUTDIR="$HOME/work/metagenomic/test_processes/"
export METAG_PATH="$HOME/work/metagenomic/metagwgs/"
export DATABANK="/home/pmartin2/work/FT_banks_2021-10-19"
export DATASET="$HOME/work/metagenomic/metagwgs-test-datasets"
export EGGNOG_DB="/bank/eggnog-mapper/eggnog-mapper-2.0.4-rf1/data"
```

File `$DATASET/hifi/input/samplesHiFi.csv`:

```
cut -f 1 $METAG_PATH/functional_tests/expected_processes_hifi.tsv | tail -n +2 > $OUTDIR/cmd_hifi.sh
sed -i -e "s,\$DATASET,$DATASET,g" $DATASET/hifi/input/samplesheet.csv
sarray $OUTDIR/cmd_hifi.sh
```