# metagWGS: Usage

## I. Basic usage

1. See the [Installation page](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/installation.md) to install metagWGS. Make sure you are in the directory where you downloaded the `metagwgs` source files, and that you have added the two Singularity images `metagwgs.sif` and `eggnog_mapper.sif` to `metagwgs/env`.

2. metagWGS is still under development: you need to use the `dev` branch of the metagwgs repository.

    Run:
    ```bash
    cd metagwgs
    git checkout dev
    git pull
    cd ..
    ```

3. Run a basic script:

   > The following script works on the **genologin Slurm cluster**. It runs the default [step](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/README.md#metagwgs-steps) `01_clean_qc` of the pipeline (without host read removal and without taxonomic affiliation of reads).
   
   **WARNING:** You must adapt it to run it on your own cluster: install or load Nextflow and Singularity, and define a configuration specific to your cluster.

 * Write in a file `Script.sh`:

   > ```bash
   > #!/bin/bash
   > #SBATCH -p workq
   > #SBATCH --mem=6G
   > module purge
   > module load bioinfo/Nextflow-v20.01.0
   > module load system/singularity-3.5.3
   > nextflow run -profile test_genotoul_workq metagwgs/main.nf --reads "metagwgs/test/*_{R1,R2}.fastq.gz" --skip_removal_host --skip_kaiju
   > ```

   > **NOTE:** you can replace the Nextflow and Singularity versions with others available on the cluster (list them with `search_module ToolName`). The Nextflow version must be >= v20 and the Singularity version must be >= v3.

 * Run `Script.sh` with this command line:
   > ```bash
   > sbatch Script.sh
   > ```

    See the description of output files in [this part](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/output.md) of the documentation.

    `Script.sh` is a basic script that only requires the small test dataset (available in [`metagwgs/test`](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/tree/dev/test)) and no other files. To analyze real data, in addition to your metagenomic whole genome shotgun `.fastq` files, you need to download several files, which are described in the next chapter.

> **WARNING:** if you run metagWGS to **analyze real metagenomics data on the genologin cluster**, you have to use the `unlimitq` queue to run your Nextflow script. To do this, replace the second line of your script, `#SBATCH -p workq`, with `#SBATCH -p unlimitq`.
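
Switching queues is a one-line change in the script header. The sketch below assumes the `Script.sh` written above; the helper name `use_unlimitq` and the output file name `Script_unlimitq.sh` are hypothetical and not part of metagWGS. It writes a modified copy instead of editing the script in place.

```bash
#!/bin/bash
# Hypothetical helper (not part of metagWGS): print a copy of a Slurm script
# with the partition line switched from workq to unlimitq.
use_unlimitq() {
    # $1: path to the original script
    sed 's/^#SBATCH -p workq$/#SBATCH -p unlimitq/' "$1"
}

# Example: create Script_unlimitq.sh from Script.sh if it exists.
if [ -f Script.sh ]; then
    use_unlimitq Script.sh > Script_unlimitq.sh
fi
```

The copy can then be submitted with `sbatch Script_unlimitq.sh`, exactly like the original script.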

## II. Input files

### 1. General mandatory files

Running metagWGS requires the following mandatory files:
* The **metagenomic whole genome shotgun data** you want to analyze: `.fastq` or `.fastq.gz` R1 and R2 files (Illumina HiSeq3000 or NovaSeq sequencing, 2*150bp). For a cleaner MultiQC html report at the end of the pipeline, raw data with extensions `_R1` and `_R2` are preferred to those with extensions `_1` and `_2`.
* The **metagWGS.sif** and **eggnog_mapper.sif** Singularity images (in the `metagwgs/env` folder).
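
Before launching the pipeline, it can help to confirm that every R1 file has its R2 mate. The sketch below assumes the `_R1.fastq.gz` / `_R2.fastq.gz` naming recommended above; the helper name `check_pairs` is hypothetical and not part of metagWGS.

```bash
#!/bin/bash
# Hypothetical helper (not part of metagWGS): list R1 files in a directory
# that have no matching R2 file. Returns non-zero if any mate is missing.
check_pairs() {
    local dir="$1" r1 r2 status=0
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue          # the glob matched nothing
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "missing mate for: $r1"
            status=1
        fi
    done
    return "$status"
}

# Example on the test data shipped with the pipeline, if present.
if [ -d metagwgs/test ]; then
    check_pairs metagwgs/test && echo "all R1 files have an R2 mate"
fi
```

Files named with `_1` / `_2`, or uncompressed `.fastq` files, would need the two patterns adapted accordingly.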

### 2. Mandatory files for certain steps

In addition to the general mandatory files, if you wish to launch certain steps of the pipeline, you will need previously generated or downloaded files:

* Step `01_clean_qc`, **only if you want to remove host reads**: you need a fasta file of the host genome.

* Step `05_alignment` **(against a protein database)**: download the protein database you want to use, for example the NR database.

* Step `08_binning`, **taxonomic affiliation of bins**: you need to download the CAT/BAT database with `wget tbb.bio.uu.nl/bastiaan/CAT_prepare/CAT_prepare_20210107.tar.gz`.

**WARNINGS:**
- if you use step `02_assembly`, `03_filtering`, `04_structural_annot`, `05_alignment`, `06_func_annot`, `07_taxo_affi` or `08_binning` without skipping `01_clean_qc` or host reads removal, you need the mandatory file of step `01_clean_qc`.

**AND**

- if you use step `06_func_annot`, `07_taxo_affi` or `08_binning`, you need the mandatory file of step `05_alignment`.

### 3. Other files for certain steps

In addition to the `general mandatory files` and `mandatory files for certain steps`, you can download some files before running metagWGS if you wish to launch certain steps of the pipeline. This is not mandatory, but it avoids unnecessary downloads.

* Step `01_clean_qc`:

    * **Only if you want to remove host reads**: if you already have the BWA index (`.amb`, `.ann`, `.bwt`, `.pac` and `.sa` files) of the host genome fasta file, you can specify it with a metagWGS parameter.

    * **Only if you want to have the taxonomic affiliation of reads**: you can download the kaiju database index beforehand [here](http://kaiju.binf.ku.dk/server) (blue insert on the left side: right-click the desired database and copy the link address). For example, download `refseq 2020-05-25 (17GB)` with `wget http://kaiju.binf.ku.dk/database/kaiju_db_refseq_2020-05-25.tgz` and unpack it with `tar -zxvf kaiju_db_refseq_2020-05-25.tgz`. This file is not mandatory: a metagWGS parameter can automatically download the database of your choice among those available on the [kaiju website](http://kaiju.binf.ku.dk/server). **WARNING:** you cannot use a kaiju database built with the `kaiju-makedb` command line.
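
For the BWA index mentioned above, a quick check that all five index files are present can prevent a failed run. This is a minimal sketch: the helper name `has_bwa_index` is hypothetical (not part of metagWGS), and `PATH/name_genome` is a placeholder prefix to replace with your own.

```bash
#!/bin/bash
# Hypothetical helper (not part of metagWGS): check that the five BWA index
# files (.amb, .ann, .bwt, .pac, .sa) exist and are non-empty for a prefix.
has_bwa_index() {
    local prefix="$1" ext
    for ext in amb ann bwt pac sa; do
        if [ ! -s "$prefix.$ext" ]; then
            echo "missing or empty: $prefix.$ext"
            return 1
        fi
    done
    return 0
}

# Example with a placeholder prefix; replace it with your own path.
if has_bwa_index "PATH/name_genome"; then
    echo "index found: it can be given to metagWGS instead of being rebuilt"
fi
```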

Analyzing your metagenomic data with metagWGS lets you use all **`nextflow run` options** in your `nextflow run` command line, as well as **metagWGS specific parameters**. Some of these specific parameters are used to indicate the `PATH` to the input files described above. The next chapters explain these options and parameters.

## III. Nextflow options

**NOTE:** all `nextflow run` options available [here](https://www.nextflow.io/docs/latest/cli.html?highlight=trace#run) can be used when you run metagWGS. They are prefixed with a single `-` on the command line.

### 1. Mandatory option

#### `-profile`

It allows you to choose the configuration profile among:
   * `singularity` to analyze **your files** with metagWGS using **Singularity containers**. You must have installed Singularity and downloaded the two Singularity containers associated with metagWGS (see the [Installation page](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/installation.md)). Your results will thus be reproducible. **NOTE:** this profile does not define the type of cluster (SGE, Slurm, etc.) you use.
   * `conda` to analyze **your files** with metagWGS using predefined **conda environments**. You must have installed Miniconda (see [here](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html)).
      * **NOTE 1:** the `conda` profile is easier to use than the `singularity` profile, but your results will be less reproducible.
      * **NOTE 2:** this profile does not define the type of cluster (SGE, Slurm, etc.) you use. You can specify it in a `nextflow.config` file placed in your working directory. For example, if you are working on a Slurm cluster, add this line to your `nextflow.config`:
        ```bash
        process.executor = 'slurm'
        ```
      * **NOTE 3:** on the [genologin cluster](http://bioinfo.genotoul.fr/), Miniconda is already installed. You can search for the Miniconda module with `search_module Miniconda` and load it with `module load chosen_miniconda_module`.
   * `genotoul` to analyze **your files** with metagWGS **on genologin cluster** with Singularity images `metagWGS.sif` and `eggnog_mapper.sif`. 
   * `test_genotoul_workq` to analyze **small test data files** (used in [I. Basic Usage](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/usage.md#i-basic-usage)) with metagWGS **on genologin cluster** on the **`workq`** queue with Singularity images `metagWGS.sif` and `eggnog_mapper.sif`.
   * `test_genotoul_testq` to analyze **small test data files** (used in [I. Basic Usage](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/usage.md#i-basic-usage)) with metagWGS **on genologin cluster** on the **`testq`** queue with Singularity images `metagWGS.sif` and `eggnog_mapper.sif`.
   * `big_test_genotoul` to analyze **big test data files** (used in [Use case documentation page](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/use_case.md)) with metagWGS **on genologin cluster** (on the **`workq`** queue) with Singularity images `metagWGS.sif` and `eggnog_mapper.sif`.
   * `test_local` to analyze **small test data files** (used in [I. Basic Usage](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/usage.md#i-basic-usage)) with metagWGS **on your computer** with Singularity images `metagWGS.sif` and `eggnog_mapper.sif`.
   * `debug` to **debug** metagWGS pipeline.

These profiles correspond to the configuration files available [in this directory](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/tree/dev/conf). The `base.config` file in this directory is the base configuration: it is loaded first, then overridden by the settings of the profile you use. See [here](https://genotoul-bioinfo.pages.mia.inra.fr/use-nextflow-nfcore-course/nfcore/profiles.html) for more explanations.

### 2. Useful options

#### `-resume`

Reruns metagWGS from the last process that did not complete successfully, or from the process whose input or output files have changed.

#### `-with-report`

Generates a `report.html` file describing the memory and CPU usage of each process.

#### `-with-timeline`

Generates a `timeline.html` file describing the duration of each process.

#### `-with-trace`

Generates a `trace.txt` file describing the location of the cache directory and the metrics of each process.

#### `-with-dag`

Generates a `dag.dot` file, a graph representing the pipeline.

#### `-w working_directory_name`

Sets the name of the cache directory. Default: `-w work`.
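
The options above can be combined in a single command line. The sketch below reuses the test command from section I and adds each option once; the `run` wrapper is a hypothetical dry-run helper that only prints the command, so nothing is launched until its body is replaced.

```bash
#!/bin/bash
# Dry-run wrapper: prints the command it receives instead of executing it.
# Replace the body with  "$@"  to really launch the pipeline.
run() { printf '%s\n' "$*"; }

# Nextflow options take a single dash: -profile, -resume, -with-report
# (report.html), -with-timeline (timeline.html), -with-trace (trace.txt),
# -with-dag (dag.dot) and -w (cache directory, "work" by default).
# metagWGS parameters take two dashes: --reads, --skip_removal_host, --skip_kaiju.
run nextflow run -profile test_genotoul_workq metagwgs/main.nf \
    -resume -with-report -with-timeline -with-trace -with-dag -w work \
    --reads "metagwgs/test/*_{R1,R2}.fastq.gz" \
    --skip_removal_host --skip_kaiju
```

Keeping the `--reads` pattern in double quotes matters: it prevents the shell from expanding the glob before Nextflow receives it.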

## IV. metagWGS parameters

The following parameters can be used when you run metagWGS.

**NOTE:** the specific parameters of the pipeline are prefixed with `--` on the command line.

### 1. Mandatory parameter: `--reads`

`--reads "PATH/*_{R1,R2}.fastq.gz"`: indicates the location of the `.fastq` or `.fastq.gz` input files. For example, `--reads "PATH/*_{R1,R2}.fastq.gz"` runs the pipeline on all the `R1.fastq.gz` and `R2.fastq.gz` files available in the indicated `PATH`. For a cleaner MultiQC html report at the end of the pipeline, raw data with extensions `_R1` and `_R2` are preferred to those with extensions `_1` and `_2`.

### 2. `--step`

`--step "your_step"`: indicate the step of the pipeline you want to run. The steps available are described in the [`README`](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/tree/dev#metagwgs-steps) (`01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot`, `05_alignment`, `06_func_annot`, `07_taxo_affi` and `08_binning`).

**NOTES:**

**i. You can directly indicate the final step that is important to you. For example, if you are interested in binning (and the taxonomic affiliation of bins), just use `--step "08_binning"`. It runs the previous steps automatically (except `03_filtering`, see ii).**

**ii. `03_filtering` is automatically skipped for the next steps `04_structural_annot`, `05_alignment`, `06_func_annot`, `07_taxo_affi` and `08_binning`. If you want to filter your assembly before doing one of these steps, you must use `--step "03_filtering,the_step"`, for example `--step "03_filtering,04_structural_annot"`.**

**iii. If you first run one of the three steps `06_func_annot`, `07_taxo_affi` or `08_binning`, and later run metagWGS again to get the result of another of these steps, you have to indicate `--step "the_first_step,the_second_step"`. The final MultiQC html report will then take into account the metrics of both analyses. If you later run metagWGS again for the third of these steps, you have to indicate `--step "the_first_step,the_second_step,the_third_step"` for the same reason.**

When you want to run a particular step, you just need to specify its name:

   * `01_clean_qc` step: `--step "01_clean_qc"`. This step is automatically done in all other steps.
   If you want to skip this step when running the other steps, add the parameter `--skip_01_clean_qc` to your command line. This is useful when you have already checked and cleaned your `.fastq` files: you can give your cleaned `.fastq` files as input data (`--reads` parameter) and directly run the `02_assembly` step or other steps.

   * `02_assembly` step: `--step "02_assembly"`: assembly is done on reads cleaned with `01_clean_qc` step.

   * `03_filtering` step: `--step "03_filtering"`. **WARNING:** by default, if you run one of the following steps (`04_structural_annot`, `05_alignment`, `06_func_annot`, `07_taxo_affi` and `08_binning`), `03_filtering` is **not done**. If you want to do the `03_filtering` step when you run these steps, you must indicate `--step "03_filtering,the_step"`, where `the_step` is one of `04_structural_annot`, `05_alignment`, `06_func_annot`, `07_taxo_affi` and `08_binning`.

   * `04_structural_annot` step: `--step "04_structural_annot"`.

   * `05_alignment` step: `--step "05_alignment"`.

   * `06_func_annot` step: `--step "06_func_annot"`.

   * `07_taxo_affi` step: `--step "07_taxo_affi"`.

   * `08_binning` step: `--step "08_binning"`.

Default: `01_clean_qc`.
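
Since `--step` accepts a comma-separated list, the value can be assembled and checked against the step names listed above before launching a long run. The helper `step_value` and the variable `VALID_STEPS` are hypothetical names used for illustration only; they are not part of metagWGS.

```bash
#!/bin/bash
# Step names taken from the list above.
VALID_STEPS="01_clean_qc 02_assembly 03_filtering 04_structural_annot 05_alignment 06_func_annot 07_taxo_affi 08_binning"

# Hypothetical helper (not part of metagWGS): join step names with commas
# and reject any name that is not a known step.
step_value() {
    local s out=""
    for s in "$@"; do
        case " $VALID_STEPS " in
            *" $s "*) out="${out:+$out,}$s" ;;
            *) echo "unknown step: $s" >&2; return 1 ;;
        esac
    done
    printf '%s\n' "$out"
}

# Filter the assembly before structural annotation (note ii).
echo "--step \"$(step_value 03_filtering 04_structural_annot)\""

# Second run after a first 06_func_annot analysis (note iii).
echo "--step \"$(step_value 06_func_annot 08_binning)\""
```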

For each [step](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/README.md#metagwgs-steps), specific parameters are available. You can add them to the command line when you run the pipeline. They are described in the next section: [other parameters step by step](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/usage.md#other-parameters-step-by-step).

### 3. Other parameters step by step

#### **`01_clean_qc` step:**

**NOTE:** this step can be skipped with the `--skip_01_clean_qc` parameter. See **WARNING 1**.

There are 5 substeps in this step, each with specific parameters:

**1. Remove adapter sequences and low quality reads with cutadapt**

* `--adapter1 "adapter_sequence"`: nucleotide sequence of the read 1 adapter (cutadapt -a option). Default `"AGATCGGAAGAGC"`: [GeT-PlaGe](https://ng6.toulouse.inra.fr/index.php?id=57) Illumina adapters. The adapters depend on the library kit used to prepare the sequencing data.

* `--adapter2 "adapter_sequence"`: nucleotide sequence of the read 2 adapter (cutadapt -A option). Default `"AGATCGGAAGAGC"`: [GeT-PlaGe](https://ng6.toulouse.inra.fr/index.php?id=57) Illumina adapters. The adapters depend on the library kit used to prepare the sequencing data.

**2. Remove low quality reads with sickle**

* `--skip_sickle`: skips the sickle substep. Use: `--skip_sickle`.

* `--quality_type "solexa" or "illumina" or "sanger"`: sickle -t quality type parameter. Default: `"sanger"`.

**3. Remove host reads with bwa, samtools and bedtools**

* `--skip_removal_host`: skips the removal of host reads. Use: `--skip_removal_host`. See **WARNING 1**.

* `--host_fasta "PATH/name_genome.fasta"`: indicates the nucleotide sequence of the host genome. Default: `""`. See **WARNING 1**. Depending on the size of your file, you may need to modify the memory and CPU settings of the Nextflow process that filters the host reads. If this is the case, create a `nextflow.config` file in your working directory and modify these parameters, for example:
```bash
process {
    // Settings applied only to the host_filter process
    withName: host_filter {
        memory = { 200.GB * task.attempt }
        time = '48h'
        cpus = 8
    }
}
```
* `--host_bwa_index "PATH/name_genome.{amb,ann,bwt,pac,sa}"`: indicates the bwa index files if they are already built. Default: `""`, in which case metagWGS builds the bwa index files. See **WARNING 1**.

**WARNING 1:** you need to use `--skip_removal_host`, `--host_fasta` or `--skip_01_clean_qc`. Otherwise, an error message is displayed.

**4. Quality control of raw data and cleaned data with fastQC**

No parameter available for this substep.

**5. Taxonomic classification of reads with kaiju**

* `--kaiju_db_dir "PATH/directory"`: if you have already downloaded the kaiju database, indicate its directory. **WARNING:** you will not be able to use kaiju database built with `kaiju-makedb` command line. Default: `--kaiju_db_dir false`. See **WARNING 2**.

* `--kaiju_db "http://kaiju.binf.ku.dk/database/CHOSEN_DATABASE.tgz"`: lets metagWGS download the kaiju database of your choice. The list of kaiju databases is available on the [kaiju website](http://kaiju.binf.ku.dk/server), in the blue insert on the left side. Default: `--kaiju_db https://kaiju.binf.ku.dk/database/kaiju_db_refseq_2021-02-26.tgz`. See **WARNING 2**.

* `--skip_kaiju`: skips the taxonomic affiliation of reads with kaiju. Krona files will not be generated. Use: `--skip_kaiju`. See **WARNING 2**.

**WARNING 2:** you need to use `--kaiju_db_dir`, `--kaiju_db` or `--skip_kaiju`. Otherwise, an error message is displayed.
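
WARNING 1 and WARNING 2 can be checked on a command line before submitting a job. The sketch below is a hypothetical pre-flight helper, not part of metagWGS, and it applies the two rules literally as they are worded above; the pipeline's own checks may differ.

```bash
#!/bin/bash
# Hypothetical pre-flight check (not part of metagWGS) for WARNING 1 and
# WARNING 2: returns non-zero and names the rule that is not satisfied.
check_clean_qc_args() {
    local a host=0 kaiju=0
    for a in "$@"; do
        case "$a" in
            --skip_removal_host|--host_fasta|--skip_01_clean_qc) host=1 ;;
            --kaiju_db_dir|--kaiju_db|--skip_kaiju) kaiju=1 ;;
        esac
    done
    [ "$host" -eq 1 ]  || { echo "WARNING 1 not satisfied"; return 1; }
    [ "$kaiju" -eq 1 ] || { echo "WARNING 2 not satisfied"; return 1; }
}

# Example: the parameters used by the test script in section I.
check_clean_qc_args --reads "metagwgs/test/*_{R1,R2}.fastq.gz" \
    --skip_removal_host --skip_kaiju && echo "arguments look consistent"
```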

#### **`02_assembly` step:**

**WARNING 3:** the `02_assembly` step depends on the `01_clean_qc` step. You need the mandatory files of these two steps to run `02_assembly`. See [II. Input files](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/usage.md#ii-input-files) and WARNINGS 1 and 2.

* `--assembly ["metaspades" or "megahit"]`: selects the assembly tool. Default: `metaspades`.

**WARNING 4:** you can choose between `metaspades` and `megahit` for the `--assembly` parameter. The choice can be based on CPU and memory availability: `metaspades` needs more CPUs and memory than `megahit`, but our tests showed that assembly metrics are better with `metaspades` than with `megahit`.

* `--metaspades_mem [memory_value]`: memory (in G) used by `metaspades` process. Default: `440`.

#### **`03_filtering` step:**

**WARNING 5:** the `03_filtering` step depends on the `01_clean_qc` and `02_assembly` steps. You need the mandatory files of these three steps to run `03_filtering`. See [II. Input files](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/usage.md#ii-input-files) and WARNINGS 1, 2, 3 and 4.

**WARNING 6:** this step is not done by default when you launch the following steps.

* `--min_contigs_cpm [cutoff_value]`: CPM (Count Per Million) cutoff to filter contigs with low number of reads. [cutoff_value] can be a decimal number (example: `0.5`). Default: `10`.
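
As a worked example of the cutoff, the sketch below computes a CPM value with `awk`. It assumes the usual definition, CPM = reads mapped to the contig / total mapped reads × 1,000,000; the exact formula used by the pipeline is not stated here, and the function name `cpm` is hypothetical.

```bash
#!/bin/bash
# Counts per million for one contig, assuming
# CPM = contig_reads / total_reads * 1e6 (usual definition; an assumption here).
cpm() {
    # $1: reads mapped to the contig, $2: total number of mapped reads
    awk -v c="$1" -v t="$2" 'BEGIN { printf "%.2f\n", c / t * 1000000 }'
}

# A contig with 25 reads out of 5,000,000 mapped reads:
cpm 25 5000000    # prints 5.00
```

Under this assumption, such a contig falls below the default cutoff of `10` and would be kept only with a lower value, for example `--min_contigs_cpm 5`.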

#### **`04_structural_annot` step:**

No parameters.

**WARNING 7:** the `04_structural_annot` step depends on the `01_clean_qc`, `02_assembly` and `03_filtering` (if you use it) steps. You need the mandatory files of these four steps to run `04_structural_annot`. See [II. Input files](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/usage.md#ii-input-files) and WARNINGS from 1 to 6.

**WARNING 8:** if you haven't associated this step with `03_filtering`, the calculation time of `04_structural_annot` can be long. Some cluster queues have a limited calculation time, so you need to adapt the queue you use to your data.
> For example, if you are on the [genologin cluster](http://bioinfo.genotoul.fr/) and you haven't done the `03_filtering` step, you can create in your working directory a `nextflow.config` file containing:
> ```bash
> process {
>     // Send only the prokka process to the unlimitq queue
>     withName: prokka {
>         queue = 'unlimitq'
>     }
> }
> ```
> This will launch the `Prokka` command line of step `04_structural_annot` on a calculation queue (`unlimitq`) where the job can last more than 4 days (which is not the case for the usual `workq` queue).

#### **`05_alignment` step:**

**WARNING 9:** the `05_alignment` step depends on the `01_clean_qc`, `02_assembly`, `03_filtering` (if you use it) and `04_structural_annot` steps. You need the mandatory files of these five steps to run `05_alignment`. See [II. Input files](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/usage.md#ii-input-files) and WARNINGS from 1 to 8.

* `--diamond_bank "PATH/bank.dmnd"`: path to diamond bank used to align protein sequence of genes. This bank must be previously built with [diamond makedb](https://github.com/bbuchfink/diamond/wiki). Default `""`.

#### **`06_func_annot` step:**

**WARNING 10:** the `06_func_annot` step depends on the `01_clean_qc`, `02_assembly`, `03_filtering` (if you use it), `04_structural_annot` and `05_alignment` steps. You need the mandatory files of these six steps to run `06_func_annot`. See [II. Input files](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/usage.md#ii-input-files) and WARNINGS from 1 to 9.

* `--percentage_identity [number]`: sequence identity threshold used to cluster genes, passed to the `-c` option of cd-hit-est. Default: `0.95`, which corresponds to 95% sequence identity. Use: `number` must be between 0 and 1, with `.` as the decimal separator.

* `--eggnogmapper_db`: downloads the eggNOG-mapper database. If you do not use this parameter, metagWGS does not download the database and you must use `--eggnog_mapper_db_dir` instead. If you run the `06_func_annot` step in several metagenomics projects, downloading the eggNOG-mapper database only once before running metagWGS avoids storing several copies of it and saves disk space. Use: `--eggnogmapper_db`. See **WARNING 6**.

* `--eggnog_mapper_db_dir "PATH/database_directory/"`: path to the eggNOG-mapper database if you have already downloaded it. If you run the `06_func_annot` step in several metagenomics projects, downloading the eggNOG-mapper database only once before running metagWGS avoids storing several copies of it and saves disk space. See **WARNING 6**.

**WARNING 11**: you must use either `--eggnogmapper_db` or `--eggnog_mapper_db_dir`. Otherwise, an error message is displayed.
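
The rule in WARNING 11 can be checked before launching a run. The sketch below is not part of metagWGS: `has_eggnog_option` is a hypothetical helper that only scans a list of command-line arguments for one of the two options.

```bash
# Hypothetical helper (not part of metagWGS): returns 0 when the argument
# list contains one of the two eggNOG-mapper database options.
has_eggnog_option() {
    for arg in "$@"; do
        case "$arg" in
            --eggnogmapper_db|--eggnog_mapper_db_dir) return 0 ;;
        esac
    done
    return 1
}

# Example with the arguments planned for a 06_func_annot run.
if has_eggnog_option --eggnog_mapper_db_dir "PATH/database_directory/"; then
    echo "eggNOG-mapper database option found"
else
    echo "add --eggnogmapper_db or --eggnog_mapper_db_dir" >&2
fi
```

Calling the helper with the same arguments you plan to give to metagWGS catches a missing option before any job is submitted.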

#### **`07_taxo_affi` step:**

**WARNING 12:** the `07_taxo_affi` step depends on the `01_clean_qc`, `02_assembly`, `03_filtering` (if you use it), `04_structural_annot` and `05_alignment` steps. You need the mandatory files of these five steps to run `07_taxo_affi`. See [II. Input files](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/usage.md#ii-input-files) and WARNINGS 1 to 9.

* `--accession2taxid "FTP_PATH_TO_prot.accession2taxid.gz"`: FTP address of the NCBI file `prot.accession2taxid.gz`. Default: `"ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz"`.

* `--taxdump "FTP_PATH_TO_taxdump.tar.gz"`: FTP address of the NCBI file `taxdump.tar.gz`. Default: `"ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz"`.

* `--taxonomy_dir "PATH/directory"`: if you have already downloaded the accession2taxid and taxdump databases, indicate their parent directory. Default: `--taxonomy_dir false`.
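
Preparing such a directory could look like the sketch below. It uses the two default URLs listed above. The directory name and the `fetch_taxonomy` function are placeholders for illustration, and keeping the files compressed under their original names is an assumption, not something this page states. The download call is commented out because both files are large.

```bash
taxonomy_dir="PATH/directory"   # placeholder (assumption): choose your own directory
accession2taxid_url="ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz"
taxdump_url="ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz"

# Hypothetical helper: download both NCBI files into the taxonomy directory.
fetch_taxonomy() {
    mkdir -p "$taxonomy_dir" || return 1
    wget --directory-prefix="$taxonomy_dir" "$accession2taxid_url" "$taxdump_url"
}

# Files expected in the directory after the download (names taken from the URLs).
accession2taxid_file="$taxonomy_dir/${accession2taxid_url##*/}"
taxdump_file="$taxonomy_dir/${taxdump_url##*/}"
echo "$accession2taxid_file"
echo "$taxdump_file"

# fetch_taxonomy   # uncomment to start the download
```

Once the files are in place, the same directory can be given to `--taxonomy_dir` in every project that runs `07_taxo_affi`.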

#### **`08_binning` step:**

**WARNING 13:** the `08_binning` step depends on the `01_clean_qc`, `02_assembly`, `03_filtering` (if you use it), `04_structural_annot` and `05_alignment` steps. You need the mandatory files of these five steps to run `08_binning`. See [II. Input files](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/usage.md#ii-input-files) and WARNINGS 1 to 9.

* `--min_contig_size [cutoff_length]`: contig length cutoff used to filter contigs before binning. Must be at least `1500`. Default: `1500`.

* `--busco_reference "PATH/file_db"`: path to the BUSCO database. Default: `"https://busco-archive.ezlab.org/v3/datasets/bacteria_odb9.tar.gz"`. **WARNING 14:** metagWGS uses BUSCO v3 from the `metagWGS.sif` Singularity container. Do not use a BUSCO reference built for another BUSCO version.

* `--cat_db "PATH/CAT_prepare_20190108.tar.gz"`: path to the CAT/BAT database. Default: `false`. **WARNING 15:** you must download this database before running the metagWGS `08_binning` step. Download it with: `wget tbb.bio.uu.nl/bastiaan/CAT_prepare/CAT_prepare_20210107.tar.gz`.
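
Because the archive name changes with each CAT/BAT release, it can help to derive the `--cat_db` value from the download URL instead of typing it twice. The sketch below assumes the archive is stored in a placeholder directory `PATH`. `fetch_cat_db` is a hypothetical helper and is left commented out because the archive is large.

```bash
cat_url="tbb.bio.uu.nl/bastiaan/CAT_prepare/CAT_prepare_20210107.tar.gz"
download_dir="PATH"   # placeholder (assumption): directory that will hold the archive

# Hypothetical helper: download the CAT/BAT archive into download_dir.
fetch_cat_db() {
    mkdir -p "$download_dir" || return 1
    wget --directory-prefix="$download_dir" "$cat_url"
}

# The archive keeps the file name found at the end of the URL.
cat_db="$download_dir/${cat_url##*/}"
echo "--cat_db \"$cat_db\""

# fetch_cat_db   # uncomment to start the download
```

The printed option always matches the file that `wget` writes, whichever release the URL points to.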

#### Other parameters

* `--multiqc_config "PATH/multiqc.yaml"`: path to a custom configuration file, if you want to change the configuration of the MultiQC report. Default: `"$baseDir/assets/multiqc_config.yaml"`.

* `--outdir "dir_name"`: changes the name of the output directory. Default: `"results"`.

* `--help`: prints the metagWGS help message. Default: `false`. Use: `--help`.

## V. Description of output files

See the description of output files in [this part](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/output.md) of the documentation.

## VI. Analyze a big test dataset with metagWGS on the genologin cluster

> If you have an account on the [genologin cluster](http://bioinfo.genotoul.fr/) and would like to familiarise yourself with metagWGS, see the tutorial in the [use case documentation page](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/use_case.md). It shows how to analyze a big test dataset with metagWGS.

**WARNING:** the test dataset in the `metagwgs/test` directory used in [I. Basic Usage](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/usage.md#i-basic-usage) is a small dataset that cannot be used to test all steps (`08_binning` does not work with it).