Commit a936a696 authored by Joanna Fourquet

Update use_case.md

parent d6933e91
......@@ -580,6 +580,8 @@ The `raw` directory contains others files which are not main files. For more inf
With the next script, we want to run metagWGS on the test dataset in order to get the results of the **`06_func_annot` step**. This new script is the same as `Script_filtering_binning.sh`, except that we have changed `--step "03_filtering,08_binning"` to `--step "03_filtering,08_binning,06_func_annot"` and added the `--eggnogmapper_db` parameter to build the eggNOG-mapper database for functional annotation. All previous choices have been kept: we don't know the real host genome for this dataset but we want to test host filtering, so we decided to use **Sus scrofa** as the host genome. We also want to **filter contigs** after assembly with the default cpm value (10). This is the assembly that will be used in the following steps that require the assembly files. The assembly tool used in this script is `metaspades`.
**NOTE:** keeping `08_binning` in the `--step` parameter keeps the binning metrics in the MultiQC HTML report file.
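As a minimal sketch of the change described above, the new `--step` value can be seen as the previous value with `06_func_annot` appended. The variable names below are hypothetical and only illustrate the string; the sketch does not launch the pipeline and omits every other parameter of the script.

```shell
#!/usr/bin/env bash
# Hypothetical variable names, used only to illustrate the --step value.

# Value used in Script_filtering_binning.sh.
previous_steps="03_filtering,08_binning"

# Step added by Script_filtering_functional.sh.
new_step="06_func_annot"

# The new value keeps the previous steps and appends the new one.
steps="${previous_steps},${new_step}"

# Print the option as it would appear in the script.
echo "--step \"${steps}\""
```

Keeping the previous steps at the front of the list is what preserves `08_binning` in the value, as the note above requires.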
### B. Write the script `Script_filtering_functional.sh`
1. Go to the `launch_test` directory.
......@@ -664,7 +666,7 @@ Cached : 46
### D. Output files
With `Script_filtering_functional.sh` you have run all the steps needed to run the `06_func_annot` step, including the `03_filtering` step: `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot`, `05_alignment` and `06_func_annot`. However, the previous script (`Script_filtering_binning.sh`) had already launched the steps `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot` and `05_alignment`. This is why the jobs associated with these steps are indicated as "`cached`" in the previous slurm file. None of the pipeline output files related to these steps have changed, because they have not been regenerated. They are presented in chapter [IV.D](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/use_case.md#d-output-files).
With `Script_filtering_functional.sh` you have run all the steps needed to run the `06_func_annot` step, including the `03_filtering` step: `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot`, `05_alignment` and `06_func_annot`. However, the previous script (`Script_filtering_binning.sh`) had already launched the steps `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot` and `05_alignment`. This is why the jobs associated with these steps are indicated as "`cached`" in the previous slurm file. Moreover, we keep `08_binning` in the `--step` parameter, so the jobs associated with this step are also indicated as "`cached`" in the previous slurm file. Keeping `08_binning` gives a new MultiQC report file updated with the metrics of all steps launched in the two scripts. None of the pipeline output files related to these cached steps have changed, because they have not been regenerated. They are presented in chapter [IV.D](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/use_case.md#d-output-files).
In the following sections, we will only present the main numerical output files in the subdirectory added to `results/` by this second script: `06_func_annot`.
#### 1. `06_func_annot/06_1_clustering`
......@@ -933,6 +935,8 @@ PFAMs ERR3201914.featureCounts.tsv ERR3201918.featureCounts.tsv ERR3201928.featu
With this last script, we want to run metagWGS on the test dataset in order to get the results of the **`07_taxo_affi` step**. This new script is the same as `Script_filtering_functional.sh` (and so close to `Script_filtering_binning.sh`), except that we have added `07_taxo_affi` to the `--step` parameter: `--step "03_filtering,08_binning,06_func_annot,07_taxo_affi"`. All previous choices have been kept: we don't know the real host genome for this dataset but we want to test host filtering, so we decided to use **Sus scrofa** as the host genome. We also want to **filter contigs** after assembly with the default cpm value (10). This is the assembly that will be used in the following steps that require the assembly files. The assembly tool used in this script is `metaspades`.
**NOTE:** keeping `08_binning` and `06_func_annot` in the `--step` parameter keeps the binning and functional annotation metrics in the MultiQC HTML report file.
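To make the relationship between the three scripts easier to see, the sketch below compares their `--step` values and prints the steps that a script requests in addition to the previous one. The `new_steps` helper and the variable names are hypothetical; the step lists are the ones quoted in this use case, and the sketch only manipulates strings without running metagWGS.

```shell
#!/usr/bin/env bash
# --step values quoted in this use case, one per script.
binning_steps="03_filtering,08_binning"
functional_steps="03_filtering,08_binning,06_func_annot"
taxo_steps="03_filtering,08_binning,06_func_annot,07_taxo_affi"

# Hypothetical helper: print the steps listed in $2 that are absent from $1.
new_steps() {
    local previous=",$1," step
    local -a requested
    # Split the comma-separated list of requested steps into an array.
    IFS=',' read -r -a requested <<< "$2"
    for step in "${requested[@]}"; do
        case "$previous" in
            *",$step,"*) ;;        # already requested by the previous script
            *) echo "$step" ;;     # explicitly requested for the first time
        esac
    done
}

# Steps that Script_filtering_taxo.sh adds to Script_filtering_functional.sh.
new_steps "$functional_steps" "$taxo_steps"
```

The comparison only covers the steps written explicitly in `--step`; it says nothing about the intermediate steps the pipeline runs to reach them.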
### B. Write the script `Script_filtering_taxo.sh`
1. Go to the `launch_test` directory.
......@@ -1015,7 +1019,7 @@ Cached : 45
### D. Output files
With `Script_filtering_taxo.sh` you have run all the steps needed to run the `07_taxo_affi` step, including the `03_filtering` step: `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot`, `05_alignment` and `07_taxo_affi`. The first script (`Script_filtering_binning.sh`) had already launched the steps `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot` and `05_alignment`. This is why the jobs associated with these steps are indicated as "`cached`" in the previous slurm file. None of the pipeline output files related to these steps have changed, because they have not been regenerated. They are presented in chapter [IV.D](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/use_case.md#d-output-files).
With `Script_filtering_taxo.sh` you have run all the steps needed to run the `07_taxo_affi` step, including the `03_filtering` step: `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot`, `05_alignment` and `07_taxo_affi`. The first script (`Script_filtering_binning.sh`) had already launched the steps `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot` and `05_alignment`. This is why the jobs associated with these steps are indicated as "`cached`" in the previous slurm file. Moreover, we keep `08_binning` and `06_func_annot` in the `--step` parameter, so the jobs associated with these steps are also indicated as "`cached`" in the previous slurm file. Keeping `08_binning` and `06_func_annot` gives a new MultiQC report file updated with the metrics of all steps launched in the three scripts. None of the pipeline output files related to these cached steps have changed, because they have not been regenerated. They are presented in chapters [IV.D](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/use_case.md#d-output-files) and [V.D](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/use_case.md#d-output-files-1).
In the following sections, we will only present the main numerical output files in the subdirectory added to `results/` by this third script: `07_taxo_affi`.
#### 1. `07_taxo_affi/SAMPLE_NAME`
......