Commit fb9bfc35 authored by Celine Noirot

Merge branch 'dev' into 'master'

Merge dev into master for version 2.0.

See merge request !8
parents 95a94cd6 54ae7d6c
Pipeline #26999 passed with stages in 15 minutes and 20 seconds
@@ -7,16 +7,18 @@ extra_fn_clean_trim:
- "cleaned_"
- "raw_"
- '_kept_contigs'
- '.count_reads_on_contigs.flagstat'
- '.no_filter.flagstat'
- '.host_filter.flagstat'
- '.count_reads_on_contigs'
- '.no_filter'
- '.host_filter'
- '_scaffolds'
- '.txt'
- '_R1'
- '.contigs'
- '.sort'
- '_select_contigs'
- '_kaiju_MEM_verbose'
- '_sickle'
extra_fn_clean_exts:
- "_select_contigs_cpm"
module_order:
- fastqc:
@@ -27,11 +29,21 @@ module_order:
- sickle:
path_filters:
- '*_sickle.log'
- samtools:
name : 'Reads before host reads filter'
info: 'This section reports the reads alignment against the host genome with bwa.'
path_filters:
- '*.no_filter.flagstat'
- samtools:
name : 'Reads aln on host genome'
info: 'This section reports the cleaned reads alignment against the host genome with bwa.'
path_filters:
- '*host_filter/*'
- samtools:
name : 'Reads after host reads filter'
info: 'This section reports the cleaned reads alignment against the host genome with bwa.'
path_filters:
- '*.host_filter.flagstat'
- fastqc:
name: 'FastQC (cleaned)'
info: 'This section of the report shows FastQC results after adapter trimming and cleaning.'
@@ -43,26 +55,16 @@ module_order:
info: 'This section of the report shows quast results after assembly'
path_filters:
- '*_all_contigs_QC/*'
- samtools:
name : 'Reads before host reads filter'
info: 'This section reports the reads alignment against the host genome with bwa. (not sure to be interesting ????)'
path_filters:
- '*.no_filter.flagstat'
- samtools:
name : 'Reads after host reads filter'
info: 'This section reports the cleaned reads alignment against the host genome with bwa. (not sure to be interesting ????)'
- quast:
name: 'Quast filtered assembly'
info: 'This section of the report shows quast results after filtering of assembly'
path_filters:
- '*.host_filter.flagstat'
- '*_select_contigs_QC/*'
- samtools:
name : 'Reads after deduplication'
info: 'This section reports the deduplicated reads alignment against contigs with bwa.'
path_filters:
- '*.count_reads_on_contigs.flagstat'
- quast:
name: 'Quast filtered assembly'
info: 'This section of the report shows quast results after filtering of assembly'
path_filters:
- '*_select_contigs_QC/*'
- prokka
- featureCounts
@@ -58,8 +58,8 @@ The `results/` directory contains a sub-directory for each step launched:
| File or directory/ | Description |
| ----------------------- | --------------------------------------- |
| `SAMPLE_NAME_select_contigs.[cpm_cutoff].fasta` | Nucleotide sequence of the contigs selected by the filtering step with a CPM cutoff of [cpm_cutoff]. |
| `SAMPLE_NAME_discard_contigs.[cpm_cutoff].fasta` | Nucleotide sequence of the contigs discarded by the filtering step with a CPM cutoff of [cpm_cutoff]. |
| `SAMPLE_NAME_select_contigs_cpm[cpm_cutoff].fasta` | Nucleotide sequence of the contigs selected by the filtering step with a CPM cutoff of [cpm_cutoff]. |
| `SAMPLE_NAME_discard_contigs_cpm[cpm_cutoff].fasta` | Nucleotide sequence of the contigs discarded by the filtering step with a CPM cutoff of [cpm_cutoff]. |
| `SAMPLE_NAME_select_contigs_QC/` | Contains metaQUAST quality control files of filtered contigs. |
#### **04_structural_annot/**
@@ -16,41 +16,41 @@
3. Run a basic script:
The following script works on the **genologin slurm cluster**. It allows you to run the default [step](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/README.md#metagwgs-steps) `01_clean_qc` of the pipeline (without host reads deletion and taxonomic affiliation of reads).
> The following script works on the **genologin slurm cluster**. It allows you to run the default [step](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/README.md#metagwgs-steps) `01_clean_qc` of the pipeline (without host reads deletion and taxonomic affiliation of reads).
**WARNING:** You must adapt it if you want to run it on your cluster. You must install/load Nextflow and Singularity, and define a specific configuration for your cluster.
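For example, such a configuration can be declared in a `nextflow.config` file in your working directory. The snippet below is only a minimal sketch, assuming a slurm cluster and Singularity; the queue name and options are examples to adapt:
```bash
// minimal nextflow.config sketch (slurm cluster and queue name are assumptions to adapt)
process {
    executor = 'slurm'
    queue    = 'workq'
}
singularity.enabled = true
```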
* Write in a file `Script.sh`:
```bash
#!/bin/bash
#SBATCH -p workq
#SBATCH --mem=6G
module purge
module load bioinfo/Nextflow-v20.01.0
module load system/singularity-3.5.3
nextflow run -profile test_genotoul_workq metagwgs/main.nf --reads "metagwgs/test/*_{R1,R2}.fastq.gz" --skip_removal_host --skip_kaiju
```
> ```bash
> #!/bin/bash
> #SBATCH -p workq
> #SBATCH --mem=6G
> module purge
> module load bioinfo/Nextflow-v20.01.0
> module load system/singularity-3.5.3
> nextflow run -profile test_genotoul_workq metagwgs/main.nf --reads "metagwgs/test/*_{R1,R2}.fastq.gz" --skip_removal_host --skip_kaiju
> ```
**NOTE:** you can replace the Nextflow and Singularity versions with other versions available on the cluster (see all versions with `search_module ToolName`). The Nextflow version must be >= v20 and the Singularity version must be >= v3.
> **NOTE:** you can replace the Nextflow and Singularity versions with other versions available on the cluster (see all versions with `search_module ToolName`). The Nextflow version must be >= v20 and the Singularity version must be >= v3.
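For example, to pick another Nextflow version (the module name below is only illustrative; use one actually listed by `search_module`):
```bash
search_module Nextflow                 # list the Nextflow modules available on the cluster
module load bioinfo/Nextflow-v20.04.1  # load one of the listed versions (example name)
```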
* Run `Script.sh` with this command line:
```bash
sbatch Script.sh
```
> ```bash
> sbatch Script.sh
> ```
See the description of output files in [this part](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/output.md) of the documentation.
`Script.sh` is a basic script that requires only the small test data as input (available in [`metagwgs/test`](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/tree/dev/test)) and no other files. To analyze real data, in addition to your metagenomic whole genome shotgun `.fastq` files, you need to download different files, which are described in the next chapter.
**WARNING:** if you run metagWGS to **analyze real metagenomics data on genologin cluster**, you have to use the `unlimitq` queue to run your Nextflow script: instead of writing in the second line of your script `#SBATCH -p workq` you need to write `#SBATCH -p unlimitq`.
> **WARNING:** if you run metagWGS to **analyze real metagenomics data on genologin cluster**, you have to use the `unlimitq` queue to run your Nextflow script. To do this, instead of writing in the second line of your script `#SBATCH -p workq` you need to write `#SBATCH -p unlimitq`.
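For example, the header of the adapted script would be (only the queue line changes compared to `Script.sh` above):
```bash
#!/bin/bash
#SBATCH -p unlimitq
#SBATCH --mem=6G
# ... rest of the script unchanged
```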
## II. Input files
### 1. General mandatory files
Launching metagWGS involves the use of mandatory files:
* The **metagenomic whole genome shotgun data** you want to analyze: `.fastq` or `.fastq.gz` R1 and R2 files (Illumina HiSeq3000 or NovaSeq sequencing, 2*150bp).
* The **metagenomic whole genome shotgun data** you want to analyze: `.fastq` or `.fastq.gz` R1 and R2 files (Illumina HiSeq3000 or NovaSeq sequencing, 2*150bp). For a cleaner MultiQC html report at the end of the pipeline, raw data with extensions `_R1` and `_R2` are preferred to those with extensions `_1` and `_2`.
* The **metagWGS.sif** and **eggnog_mapper.sif** Singularity images (in the `metagwgs/dev` folder).
### 2. Mandatory files for certain steps
@@ -92,8 +92,14 @@ Analyzing your metagenomic data with metagWGS allows you to use all **`nextflow
It allows you to choose the configuration profile among:
* `singularity` to analyze **your files** with metagWGS with **singularity containers**. You must have installed Singularity and downloaded the two Singularity containers associated with metagWGS (see [Installation page](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/installation.md)). Thus, your results will be reproducible. **NOTE:** the type of cluster (SGE, slurm, etc.) you use is not defined in this profile.
* `conda` to analyze **your files** with metagWGS with **conda environments** already defined. You must have installed Miniconda (see [here](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html)). **NOTE 1:** on [genologin cluster](http://bioinfo.genotoul.fr/) Miniconda is already installed: you can search for the Miniconda module with `search_module Miniconda`. Using the `conda` profile is easier than the `singularity` profile, but your results will be less reproducible. **NOTE 2:** the type of cluster (SGE, slurm, etc.) you use is not defined in this profile.
* `genotoul` to analyze **your files** with metagWGS **on genologin cluster** with Singularity images `metagWGS.sif` and `eggnog_mapper.sif`.
* `conda` to analyze **your files** with metagWGS with **conda environments** already defined. You must have installed Miniconda (see [here](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html)).
* **NOTE 1:** Using the `conda` profile is easier than the `singularity` profile, but your results will be less reproducible.
* **NOTE 2:** the type of cluster (SGE, slurm, etc.) you use is not defined in this profile. You can specify it in a `nextflow.config` file added to your working directory. For example, if you are working on a slurm cluster, add this line to your `nextflow.config`:
```bash
process.executor = 'slurm'
```
* > **NOTE 3:** on [genologin cluster](http://bioinfo.genotoul.fr/) Miniconda is already installed. You can search for the Miniconda module with `search_module Miniconda` and load it with `module load chosen_miniconda_module`.
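For example (the exact module name returned by `search_module` may differ):
```bash
search_module Miniconda        # list the Miniconda modules available on the cluster
module load system/Miniconda3  # load the chosen module (example name)
```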
* `genotoul` to analyze **your files** with metagWGS **on genologin cluster** with Singularity images `metagWGS.sif` and `eggnog_mapper.sif`.
* `test_genotoul_workq` to analyze **small test data files** (used in [I. Basic Usage](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/usage.md#i-basic-usage)) with metagWGS **on genologin cluster** on the **`workq`** queue with Singularity images `metagWGS.sif` and `eggnog_mapper.sif`.
* `test_genotoul_testq` to analyze **small test data files** (used in [I. Basic Usage](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/usage.md#i-basic-usage)) with metagWGS **on genologin cluster** on the **`testq`** queue with Singularity images `metagWGS.sif` and `eggnog_mapper.sif`.
* `big_test_genotoul` to analyze **big test data files** (used in [Use case documentation page](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/use_case.md)) with metagWGS **on genologin cluster** (on the **`workq`** queue) with Singularity images `metagWGS.sif` and `eggnog_mapper.sif`.
@@ -136,13 +142,19 @@ The next parameters can be used when you run metagWGS.
### 1. Mandatory parameter: `--reads`
`--reads "SAMPLE_NAME.fastq.gz"`: indicates the location of the `.fastq` or `.fastq.gz` input files. For example, `--reads PATH/*_{R1,R2}.fastq.gz` runs the pipeline with all the `R1.fastq.gz` and `R2.fastq.gz` files available in the indicated `PATH`.
`--reads "PATH/*_{R1,R2}.fastq.gz"`: indicates the location of the `.fastq` or `.fastq.gz` input files. For example, `--reads "PATH/*_{R1,R2}.fastq.gz"` runs the pipeline with all the `R1.fastq.gz` and `R2.fastq.gz` files available in the indicated `PATH`. For a cleaner MultiQC html report at the end of the pipeline, raw data with extensions `_R1` and `_R2` are preferred to those with extensions `_1` and `_2`.
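For example, with two samples following the `_R1`/`_R2` naming (sample names and profile below are illustrative), the quoted glob matches both pairs of files:
```bash
ls PATH/
# sampleA_R1.fastq.gz  sampleA_R2.fastq.gz  sampleB_R1.fastq.gz  sampleB_R2.fastq.gz
nextflow run -profile singularity metagwgs/main.nf --reads "PATH/*_{R1,R2}.fastq.gz"
```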
### 2. `--step`
`--step "your_step"`: indicate the step of the pipeline you want to run. The steps available are described in the [`README`](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/tree/dev#metagwgs-steps) (`01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot`, `05_alignment`, `06_func_annot`, `07_taxo_affi` and `08_binning`).
**NOTE:** you must indicate only **one** step when you run the pipeline; it runs the previous steps automatically. The **only exception** to this rule is the combination of the step `03_filtering` with the next steps `04_structural_annot`, `05_alignment`, `06_func_annot`, `07_taxo_affi` and `08_binning`. `03_filtering` is automatically skipped for these steps; if you want to filter your assembly before doing one of them, you must use `--step "03_filtering,one_of_these_steps"`, for example `--step "03_filtering,04_structural_annot"`.
**NOTES:**
**i. You can directly indicate the final step that is important to you. For example, if you are interested in binning (and the taxonomic affiliation of bins), just use `--step "08_binning"`. It runs the previous steps automatically (except `03_filtering`, see ii).**
**ii. `03_filtering` is automatically skipped for the next steps `04_structural_annot`, `05_alignment`, `06_func_annot`, `07_taxo_affi` and `08_binning`. If you want to filter your assembly before doing one of these steps, you must use `--step "03_filtering,the_step"`, for example `--step "03_filtering,04_structural_annot"`.**
**iii. When you run one of the three steps `06_func_annot`, `07_taxo_affi` or `08_binning` in a first analysis and you then run metagWGS again to get the results of another of these steps, you have to indicate `--step "the_first_step,the_second_step"`. This allows the final MultiQC html report to take into account the metrics of both analyses. If you then run metagWGS again for the third of these steps, you also have to indicate `--step "the_first_step,the_second_step,the_third_step"` for the same reason.**
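A sketch of this incremental use of `--step` over two successive runs (profile and reads pattern are examples; `-resume` lets Nextflow reuse the cached results of the first run):
```bash
# first analysis: functional annotation (previous steps run automatically)
nextflow run -profile singularity metagwgs/main.nf --reads "PATH/*_{R1,R2}.fastq.gz" --step "06_func_annot" -resume
# second analysis: add binning; keeping 06_func_annot updates the MultiQC report with the metrics of both analyses
nextflow run -profile singularity metagwgs/main.nf --reads "PATH/*_{R1,R2}.fastq.gz" --step "06_func_annot,08_binning" -resume
```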
When you want to run a particular step, you just need to specify its name:
@@ -165,7 +177,7 @@ When you want to run a particular step, you just need to specify its name:
Default: `01_clean_qc`.
For each [step](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/README.md#metagwgs-steps), specific parameters are available. You can add them to the command line and run the pipeline with them. They are described in the next section: [Other parameters step by step](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/usage.md#other-parameters-step-by-step).
For each [step](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/README.md#metagwgs-steps), specific parameters are available. You can add them to the command line and run the pipeline with them. They are described in the next section: [other parameters step by step](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/usage.md#other-parameters-step-by-step).
### 3. Other parameters step by step
@@ -241,13 +253,14 @@ No parameters.
**WARNING 7:** `04_structural_annot` step depends on `01_clean_qc`, `02_assembly` and `03_filtering` (if you use it) steps. You need to use mandatory files of these four steps to run `04_structural_annot`. See [II. Input files](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/usage.md#ii-input-files) and WARNINGS from 1 to 6.
**WARNING 8:** if you haven't associated this step with `03_filtering`, the computation time of `04_structural_annot` can be substantial. Some cluster queues have time limits, so you need to adapt the queue you use to your data. For example, if you are on [genologin cluster](http://bioinfo.genotoul.fr/) and you haven't done the `03_filtering` step, you can create in your working directory a file `nextflow.config` containing:
```bash
process {
    withName: prokka {
        queue = 'unlimitq'
    }
}
```
This will launch the `Prokka` command line of step `04_structural_annot` on a queue (`unlimitq`) where the job can last more than 4 days (which is not the case for the usual `workq` queue).
**WARNING 8:** if you haven't associated this step with `03_filtering`, the computation time of `04_structural_annot` can be substantial. Some cluster queues have time limits, so you need to adapt the queue you use to your data.
> For example, if you are on [genologin cluster](http://bioinfo.genotoul.fr/) and you haven't done the `03_filtering` step, you can create in your working directory a file `nextflow.config` containing:
> ```bash
> process {
>     withName: prokka {
>         queue = 'unlimitq'
>     }
> }
> ```
> This will launch the `Prokka` command line of step `04_structural_annot` on a queue (`unlimitq`) where the job can last more than 4 days (which is not the case for the usual `workq` queue).
#### **`05_alignment` step:**
@@ -299,6 +312,6 @@ See the description of output files in [this part](https://forgemia.inra.fr/geno
## VI. Analyze big test dataset with metagWGS in genologin cluster
If you have an account on [genologin cluster](http://bioinfo.genotoul.fr/) and would like to familiarise yourself with metagWGS, see the tutorial available on the [use case documentation page](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/use_case.md). It shows how to analyze the big test dataset with metagWGS.
> If you have an account on [genologin cluster](http://bioinfo.genotoul.fr/) and would like to familiarise yourself with metagWGS, see the tutorial available on the [use case documentation page](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/use_case.md). It shows how to analyze the big test dataset with metagWGS.
**WARNING:** the test dataset in the `metagwgs/test` directory used in [I. Basic Usage](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/usage.md#i-basic-usage) is a small test dataset which does not allow testing all steps (`08_binning` doesn't work with this dataset).
# metagWGS: use case with big data test on genologin cluster
**WARNING:** be careful with the output files presented in this documentation page. The columns corresponding to the metrics of each sample (number of contigs, number of reads, etc.) may be ordered differently in your results than on this page. However, the content of a given sample's column must be the same in your results as in the results presented on this page.
## I. Introduction
This document describes example scripts to run metagWGS on the big test dataset on [genologin cluster](http://bioinfo.genotoul.fr/) and the output files generated by these scripts.
@@ -366,15 +368,15 @@ more ERR3201914_dedup_R2.nb_bases
#### 8. `03_filtering`
For each sample you will obtain two files and one directory:
- `SAMPLE_NAME_discard_contigs.10.fasta`
- `SAMPLE_NAME_select_contigs.10.fasta`
- `SAMPLE_NAME_select_contigs_QC`
- `SAMPLE_NAME_discard_contigs_cpm10.fasta`
- `SAMPLE_NAME_select_contigs_cpm10.fasta`
- `SAMPLE_NAME_select_contigs_QC/`
Now we are going to describe the results for the sample `ERR3201914`.
- `ERR3201914_discard_contigs.10.fasta` contains the contigs discarded by filtering (contigs with a CPM value < CPM_cutoff, which is 10 here because we use the default cutoff value). The first lines of this file are:
- `ERR3201914_discard_contigs_cpm10.fasta` contains the contigs discarded by filtering (contigs with a CPM value < CPM_cutoff, which is 10 here because we use the default cutoff value). The first lines of this file are:
```bash
head ERR3201914_discard_contigs.10.fasta
head ERR3201914_discard_contigs_cpm10.fasta
>NODE_1970_length_3822_cov_2.534908
CTGTGATCCACAGTTTCCTCCTGTTCCTGTCAACATTGGAATAAACGTATTTAAAATCGG
CATCACTGCAAGCGCTGCCTCAAAATGTCCTAAAATCAAACCTGTGATCGTTGCCGAGAG
@@ -387,9 +389,9 @@ ACCTGTTGTCAGAAGATCTTTGACATCCACACTTCCAATCAGTTTTCTCTTTTCTGTCAC
ATAACAAGTATAGATCGTCTCTTTATTGATTCCGACCTGCCTGATTTTCAAAATTGCTTC
```
- `ERR3201914_select_contigs.10.fasta` contains the contigs passing the filter (contigs with a CPM value >= CPM_cutoff, which is 10 here because we use the default cutoff value). The first lines of this file are:
- `ERR3201914_select_contigs_cpm10.fasta` contains the contigs passing the filter (contigs with a CPM value >= CPM_cutoff, which is 10 here because we use the default cutoff value). The first lines of this file are:
```bash
head ERR3201914_select_contigs.10.fasta
head ERR3201914_select_contigs_cpm10.fasta
>NODE_1_length_451056_cov_35.997091
AATTGGTTTAGTAATTAAAAAAGGCGCTATATTATTGGCATGACAACAGGAATTAAAAAA
CCAAAACCATTAATGATTCCGTCACCCCATTCTGAGCTTTTAATATTTTAAAATCTGGAG
@@ -576,7 +578,9 @@ The `raw` directory contains others files which are not main files. For more inf
### A. Explanations
With the next script, we want to run metagWGS on the test dataset in order to obtain the **`06_func_annot` step** results. This new script is the same as `Script_filtering_binning.sh`, where we have changed `08_binning` to `06_func_annot` in the `--step` parameter and added the parameter `--eggnogmapper_db` to build the eggNOG-mapper database for functional annotation. All previous choices have been kept: we don't know the real host genome for this dataset but we want to test host filtering, so we decided to use **Sus scrofa** as host genome. We also want to **filter contigs** after assembly with the default cpm value (10). It is this assembly that will be used in the following steps requiring the assembly files. The assembly tool used in this script is `metaspades`.
With the next script, we want to run metagWGS on the test dataset in order to obtain the **`06_func_annot` step** results. This new script is the same as `Script_filtering_binning.sh`, where we have changed `--step "03_filtering,08_binning"` to `--step "03_filtering,08_binning,06_func_annot"` and added the parameter `--eggnogmapper_db` to build the eggNOG-mapper database for functional annotation. All previous choices have been kept: we don't know the real host genome for this dataset but we want to test host filtering, so we decided to use **Sus scrofa** as host genome. We also want to **filter contigs** after assembly with the default cpm value (10). It is this assembly that will be used in the following steps requiring the assembly files. The assembly tool used in this script is `metaspades`.
**NOTE:** keeping `08_binning` in the `--step` parameter keeps the binning metrics in the MultiQC html report file.
### B. Write the script `Script_filtering_functional.sh`
@@ -594,7 +598,7 @@ module purge
module load bioinfo/Nextflow-v20.01.0
module load system/singularity-3.5.3
nextflow run -profile big_test_genotoul metagwgs/main.nf --reads "test_data/*_{1,2}.fastq.gz" --step "03_filtering,06_func_annot" --host_bwa_index "/work/bank/ebi/ensembl/sus_scrofa_genome/ensembl_sus_scrofa_genome_2020-07-20/bwa/ensembl_sus_scrofa_genome.{amb,ann,bwt,pac,sa}" --host_fasta "/work/bank/ebi/ensembl/sus_scrofa_genome/ensembl_sus_scrofa_genome_2020-07-20/bwa/ensembl_sus_scrofa_genome" --kaiju_db_dir "/bank/kaijudb/kaijudb_refseq_2020-05-25" --assembly metaspades --diamond_bank "/work/bank/diamonddb/nr.dmnd" --cat_db "CAT_prepare_20190108.tar.gz" --eggnogmapper_db -with-report -with-timeline -with-trace -with-dag -resume
nextflow run -profile big_test_genotoul metagwgs/main.nf --reads "test_data/*_{1,2}.fastq.gz" --step "03_filtering,08_binning,06_func_annot" --host_bwa_index "/work/bank/ebi/ensembl/sus_scrofa_genome/ensembl_sus_scrofa_genome_2020-07-20/bwa/ensembl_sus_scrofa_genome.{amb,ann,bwt,pac,sa}" --host_fasta "/work/bank/ebi/ensembl/sus_scrofa_genome/ensembl_sus_scrofa_genome_2020-07-20/bwa/ensembl_sus_scrofa_genome" --kaiju_db_dir "/bank/kaijudb/kaijudb_refseq_2020-05-25" --assembly metaspades --diamond_bank "/work/bank/diamonddb/nr.dmnd" --cat_db "CAT_prepare_20190108.tar.gz" --eggnogmapper_db -with-report -with-timeline -with-trace -with-dag -resume
```
If you want to understand each metagWGS parameter and Nextflow option used in this script, see [usage](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/usage.md) documentation page.
@@ -640,17 +644,17 @@ executor > slurm (2)
[- ] process > download_taxonomy_db -
[- ] process > genome_length -
[- ] process > diamond_parser -
[- ] process > quantif_and_taxonomic_table... -
[- ] process > metabat -
[- ] process > busco_download_db -
[- ] process > busco -
[- ] process > busco_plot -
[- ] process > quast_bins -
[- ] process > merge_quast_and_busco -
[- ] process > cat_db -
[- ] process > cat -
[- ] process > quantif_and_taxonomic_table... -
[d6/26aeb6] process > metabat [100%] 3 of 3, cached: 3 ✔
[fa/d48e80] process > busco_download_db [100%] 1 of 1, cached: 1 ✔
[30/617208] process > busco [100%] 60 of 60, cached: 60 ✔
[db/f91457] process > busco_plot [100%] 1 of 1, cached: 1 ✔
[9e/0fc4c3] process > quast_bins [100%] 3 of 3, cached: 3 ✔
[71/5e95c1] process > merge_quast_and_busco [100%] 1 of 1, cached: 1 ✔
[d6/4314ca] process > cat_db [100%] 1 of 1, cached: 1 ✔
[9c/3cf218] process > cat [100%] 3 of 3, cached: 3 ✔
[a8/7c8a44] process > get_software_versions [100%] 1 of 1, cached: 1 ✔
[3f/4ebe6e] process > multiqc [100%] 1 of 1, cached: 1 ✔
[3f/4ebe6e] process > multiqc [100%] 1 of 1 ✔
Completed at: 16-févr.-2021 21:04:06
Duration : (see below)
CPU hours : (see below)
@@ -662,7 +666,7 @@ Cached : 46
### D. Output files
With `Script_filtering_functional.sh` you have run all the steps needed for the `06_func_annot` step, including the `03_filtering` step: `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot`, `05_alignment` and `06_func_annot`. With the previous script (`Script_filtering_binning.sh`) the steps `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot` and `05_alignment` had already been launched. This is why the jobs associated with these steps in the previous slurm file are indicated as "`cached`". The output files of the pipeline related to these steps haven't changed because they haven't been re-generated. They are presented in chapter [IV.D](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/use_case.md#d-output-files).
With `Script_filtering_functional.sh` you have run all the steps needed for the `06_func_annot` step, including the `03_filtering` step: `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot`, `05_alignment` and `06_func_annot`. With the previous script (`Script_filtering_binning.sh`) the steps `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot` and `05_alignment` had already been launched. This is why the jobs associated with these steps in the previous slurm file are indicated as "`cached`". Moreover, we kept `08_binning` in the `--step` parameter; the jobs related to this step were already launched in the first script, so they are also indicated as "`cached`". Keeping `08_binning` while adding `06_func_annot` gives a new MultiQC report file updated with the metrics of all steps launched in the two scripts. The output files of the pipeline related to these cached steps haven't changed because they haven't been re-generated. They are presented in chapter [IV.D](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/use_case.md#d-output-files).
In the following sections, we will only present the main numerical output files in the subdirectory added to `results/` by this second script: `06_func_annot`.
#### 1. `06_func_annot/06_1_clustering`
@@ -929,7 +933,9 @@ PFAMs ERR3201914.featureCounts.tsv ERR3201918.featureCounts.tsv ERR3201928.featu
### A. Explanations
With this last script, we want to run metagWGS on the test dataset in order to obtain the **`07_taxo_affi` step** results. This new script is the same as `Script_filtering_functional.sh` (and so close to `Script_filtering_binning.sh`), where we have changed `06_func_annot` to `07_taxo_affi` in the `--step` parameter. All previous choices have been kept: we don't know the real host genome for this dataset but we want to test host filtering, so we decided to use **Sus scrofa** as host genome. We also want to **filter contigs** after assembly with the default cpm value (10). It is this assembly that will be used in the following steps requiring the assembly files. The assembly tool used in this script is `metaspades`. We have left the parameter `--eggnogmapper_db`, but it is not really necessary here since it is not linked to the step we wish to launch.
With this last script, we want to run metagWGS on the test dataset in order to obtain the **`07_taxo_affi` step** results. This new script is the same as `Script_filtering_functional.sh` (and so close to `Script_filtering_binning.sh`), where we have added `07_taxo_affi` to the `--step` parameter: `--step "03_filtering,08_binning,06_func_annot,07_taxo_affi"`. All previous choices have been kept: we don't know the real host genome for this dataset but we want to test host filtering, so we decided to use **Sus scrofa** as host genome. We also want to **filter contigs** after assembly with the default cpm value (10). It is this assembly that will be used in the following steps requiring the assembly files. The assembly tool used in this script is `metaspades`.
**NOTE:** keeping `08_binning` and `06_func_annot` in the `--step` parameter keeps the binning and functional annotation metrics in the MultiQC html report file.
### B. Write the script `Script_filtering_taxo.sh`
@@ -947,7 +953,7 @@ module purge
module load bioinfo/Nextflow-v20.01.0
module load system/singularity-3.5.3
nextflow run -profile big_test_genotoul metagwgs/main.nf --reads "test_data/*_{1,2}.fastq.gz" --step "03_filtering,07_taxo_affi" --host_bwa_index "/work/bank/ebi/ensembl/sus_scrofa_genome/ensembl_sus_scrofa_genome_2020-07-20/bwa/ensembl_sus_scrofa_genome.{amb,ann,bwt,pac,sa}" --host_fasta "/work/bank/ebi/ensembl/sus_scrofa_genome/ensembl_sus_scrofa_genome_2020-07-20/bwa/ensembl_sus_scrofa_genome" --kaiju_db_dir "/bank/kaijudb/kaijudb_refseq_2020-05-25" --assembly metaspades --diamond_bank "/work/bank/diamonddb/nr.dmnd" --cat_db "CAT_prepare_20190108.tar.gz" --eggnogmapper_db -with-report -with-timeline -with-trace -with-dag -resume
nextflow run -profile big_test_genotoul metagwgs/main.nf --reads "test_data/*_{1,2}.fastq.gz" --step "03_filtering,08_binning,06_func_annot,07_taxo_affi" --host_bwa_index "/work/bank/ebi/ensembl/sus_scrofa_genome/ensembl_sus_scrofa_genome_2020-07-20/bwa/ensembl_sus_scrofa_genome.{amb,ann,bwt,pac,sa}" --host_fasta "/work/bank/ebi/ensembl/sus_scrofa_genome/ensembl_sus_scrofa_genome_2020-07-20/bwa/ensembl_sus_scrofa_genome" --kaiju_db_dir "/bank/kaijudb/kaijudb_refseq_2020-05-25" --assembly metaspades --diamond_bank "/work/bank/diamonddb/nr.dmnd" --cat_db "CAT_prepare_20190108.tar.gz" --eggnogmapper_db -with-report -with-timeline -with-trace -with-dag -resume
```
If you want to understand each metagWGS parameter and Nextflow option used in this script, see [usage](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/usage.md) documentation page.
@@ -981,27 +987,27 @@ executor > slurm (9)
[fd/3bbb88] process > reads_alignment_on_contigs [100%] 3 of 3, cached: 3 ✔
[87/8562a3] process > diamond [100%] 3 of 3, cached: 3 ✔
[91/3a9bea] process > diamond_header [100%] 1 of 1, cached: 1 ✔
[- ] process > individual_cd_hit -
[- ] process > global_cd_hit -
[- ] process > quantification -
[- ] process > quantification_table -
[- ] process > eggnog_mapper_db -
[- ] process > eggnog_mapper -
[- ] process > best_hits_diamond -
[- ] process > merge_quantif_and_functiona... -
[- ] process > make_functional_annotation_... -
[96/bdcb3a] process > individual_cd_hit [100%] 3 of 3, cached: 3 ✔
[a5/6b6fea] process > global_cd_hit [100%] 1 of 1, cached: 1 ✔
[57/8f1078] process > quantification [100%] 3 of 3, cached: 3 ✔
[fe/d01e78] process > quantification_table [100%] 1 of 1, cached: 1 ✔
[7c/0f9637] process > eggnog_mapper_db [100%] 1 of 1, cached: 1 ✔
[aa/fbb17b] process > eggnog_mapper [100%] 3 of 3, cached: 3 ✔
[e7/8f3dd1] process > best_hits_diamond [100%] 3 of 3, cached: 3 ✔
[a2/80f1d4] process > merge_quantif_and_functiona... [100%] 1 of 1, cached: 1 ✔
[ea/4b93b2] process > make_functional_annotation_... [100%] 1 of 1, cached: 1 ✔
[61/e8c0f3] process > download_taxonomy_db [100%] 1 of 1 ✔
[56/3ff48d] process > genome_length [100%] 3 of 3 ✔
[ea/e69c9e] process > diamond_parser [100%] 3 of 3 ✔
[ee/9eefb2] process > quantif_and_taxonomic_table... [100%] 1 of 1 ✔
[- ] process > metabat -
[- ] process > busco_download_db -
[- ] process > busco -
[- ] process > busco_plot -
[- ] process > quast_bins -
[- ] process > merge_quast_and_busco -
[- ] process > cat_db -
[- ] process > cat -
[d6/26aeb6] process > metabat [100%] 3 of 3, cached: 3 ✔
[fa/d48e80] process > busco_download_db [100%] 1 of 1, cached: 1 ✔
[30/617208] process > busco [100%] 60 of 60, cached: 60 ✔
[db/f91457] process > busco_plot [100%] 1 of 1, cached: 1 ✔
[9e/0fc4c3] process > quast_bins [100%] 3 of 3, cached: 3 ✔
[71/5e95c1] process > merge_quast_and_busco [100%] 1 of 1, cached: 1 ✔
[d6/4314ca] process > cat_db [100%] 1 of 1, cached: 1 ✔
[9c/3cf218] process > cat [100%] 3 of 3, cached: 3 ✔
[a8/7c8a44] process > get_software_versions [100%] 1 of 1, cached: 1 ✔
[38/01c0a5] process > multiqc [100%] 1 of 1 ✔
Completed at: 21-févr.-2021 19:32:59
@@ -1013,7 +1019,7 @@ Cached : 45
### D. Output files
With `Script_filtering_taxo.sh` you have run all the steps needed for the `07_taxo_affi` step, including the `03_filtering` step: `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot`, `05_alignment` and `07_taxo_affi`. With the first script (`Script_filtering_binning.sh`) the steps `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot` and `05_alignment` had already been launched. This is why the jobs associated with these steps in the previous slurm file are indicated as "`cached`". The output files of the pipeline related to these steps haven't changed because they haven't been re-generated. They are presented in chapter [IV.D](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/use_case.md#d-output-files).
With `Script_filtering_taxo.sh` you have run all the steps needed for the `07_taxo_affi` step, including the `03_filtering` step: `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot`, `05_alignment` and `07_taxo_affi`. With the first script (`Script_filtering_binning.sh`) the steps `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot` and `05_alignment` had already been launched. This is why the jobs associated with these steps in the previous slurm file are indicated as "`cached`". Moreover, we kept `08_binning` and `06_func_annot` in the `--step` parameter; the jobs related to these steps were already launched in the first and second scripts, so they are also indicated as "`cached`". Keeping `08_binning` and `06_func_annot` gives a new MultiQC report file updated with the metrics of all steps launched in the three scripts. The output files of the pipeline related to these cached steps haven't changed because they haven't been re-generated. They are presented in chapters [IV.D](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/use_case.md#d-output-files) and [V.D](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/use_case.md#d-output-files-1).
In the following sections, we will only present the main numerical output files in the subdirectory added to `results/` by this third script: `07_taxo_affi`.
#### 1. `07_taxo_affi/SAMPLE_NAME`
@@ -27,9 +27,20 @@
Options:
--step Choose one step among: "01_clean_qc", "02_assembly", "03_filtering", "04_structural_annot", "05_alignment", "06_func_annot", "07_taxo_affi", "08_binning".
Only exception: the "03_filtering" step can be combined, with a comma, with "04_structural_annot", "05_alignment", "06_func_annot", "07_taxo_affi" and "08_binning".
--step Choose step(s) among: "01_clean_qc", "02_assembly", "03_filtering",
"04_structural_annot", "05_alignment", "06_func_annot", "07_taxo_affi", "08_binning".
i. You can directly indicate the final step that is important to you. For example,
if you are interested in binning (and the taxonomic affiliation of bins), just use --step "08_binning".
It runs the previous steps automatically (except "03_filtering", see ii).
ii. "03_filtering" is automatically skipped for the next steps
"04_structural_annot", "05_alignment", "06_func_annot", "07_taxo_affi" and "08_binning".
If you want to filter your assembly before doing one of these steps, you must use --step "03_filtering,the_step",
for example --step "03_filtering,04_structural_annot".
iii. When you run one of the three steps "06_func_annot", "07_taxo_affi" or "08_binning" in a first analysis
and you then run metagWGS again to get the results of another of these steps,
you have to indicate --step "the_first_step,the_second_step". This allows the final MultiQC html report
to take into account the metrics of both analyses. If you then run metagWGS again for the third of these steps,
you also have to indicate --step "the_first_step,the_second_step,the_third_step" for the same reason.
01_clean_qc options:
--skip_01_clean_qc Skip 01_clean_qc step.
--adapter1 Sequence of adapter 1. Default: Illumina TruSeq adapter.
@@ -87,12 +98,14 @@
*/
// Show help message.
if (params.help){
helpMessage()
exit 0
}
// Define list of available steps
// Define list of available steps.
def defineStepList() {
return [
'01_clean_qc',
@@ -106,7 +119,7 @@ def defineStepList() {
]
}
// Check step existence
// Check step existence.
def checkParameterExistence(list_it, list) {
nb_false_step = 0
@@ -120,27 +133,13 @@ def checkParameterExistence(list_it, list) {
else {return true}
}
// Check number of steps
def numberParameter(list_it, list) {
nb_step = 0
filtering_step = false
for(it in list_it) {
nb_step = nb_step + 1
if(it == '03_filtering') {filtering_step = true}
}
if(nb_step > 2) {return false}
else {
if((nb_step == 2) && (!filtering_step)){return false}
else {return true}
}
}
// Check number of steps.
// Set up parameters.
step = params.step.split(",")
stepList = defineStepList()
if (!checkParameterExistence(step, stepList)) exit 1, "Unknown step ${step}, see --help for more information"
if (!numberParameter(step, stepList)) exit 1, "You can choose one step (or two steps only if one of the two is the 03_filtering step)"
if (!checkParameterExistence(step, stepList)) exit 1, "Unknown step(s) in ${step}, see --help for more information"
if (!['metaspades','megahit'].contains(params.assembly)){
exit 1, "Invalid aligner option: ${params.assembly}. Valid options: 'metaspades', 'megahit'"
@@ -326,8 +325,8 @@ if (!params.skip_removal_host && ('01_clean_qc' in step || '02_assembly' in step
output:
set replicateId, file("cleaned_${replicateId}_R1.fastq.gz"), file("cleaned_${replicateId}_R2.fastq.gz") into filter_reads_ch
file("host_filter_flagstat/*.host_filter.flagstat") into host_filter_ch_for_multiqc
file("${replicateId}.no_filter.flagstat") into flagstat_bam_ch_for_multiqc
file("host_filter_flagstat/${replicateId}.host_filter.flagstat") into flagstat_after_host_filter_for_multiqc_ch
file("${replicateId}.no_filter.flagstat") into flagstat_before_filter_for_multiqc_ch
file("${replicateId}_cleaned_R1.nb_bases")
file("${replicateId}_cleaned_R2.nb_bases")
@@ -350,8 +349,8 @@ if (!params.skip_removal_host && ('01_clean_qc' in step || '02_assembly' in step
}
else {
intermediate_cleaned_ch.set{preprocessed_reads_ch}
Channel.empty().set{host_filter_ch_for_multiqc}
Channel.empty().set{flagstat_bam_ch_for_multiqc}
Channel.empty().set{flagstat_after_host_filter_for_multiqc_ch}
Channel.empty().set{flagstat_before_filter_for_multiqc_ch}
}
preprocessed_reads_ch.into{
@@ -371,7 +370,8 @@ process fastqc_raw {
set replicateId, file(read1), file(read2) from raw_reads_fastqc
output:
file("${replicateId}") into fastqc_raw_ch_for_multiqc
file("${replicateId}/*.zip") into fastqc_raw_for_multiqc_ch
file("${replicateId}/*.html") into fastqc_raw_ch
when: ('01_clean_qc' in step || '02_assembly' in step || '03_filtering' in step || '04_structural_annot' in step || '05_alignment' in step || '06_func_annot' in step || '07_taxo_affi' in step || '08_binning' in step) && (!params.skip_01_clean_qc)
@@ -390,7 +390,8 @@ process fastqc_cleaned {
set replicateId, file(read1), file(read2) from clean_reads_for_fastqc_ch
output:
file("cleaned_${replicateId}") into fastqc_cleaned_ch_for_multiqc
file("cleaned_${replicateId}/*.zip") into fastqc_cleaned_for_multiqc_ch
file("cleaned_${replicateId}/*.html") into fastqc_cleaned_ch
when: ('01_clean_qc' in step || '02_assembly' in step || '03_filtering' in step || '04_structural_annot' in step || '05_alignment' in step || '06_func_annot' in step || '07_taxo_affi' in step || '08_binning' in step) && (!params.skip_01_clean_qc)
@@ -589,6 +590,7 @@ process quast {
output:
file("${replicateId}_all_contigs_QC/*") into quast_assembly_ch
file("${replicateId}_all_contigs_QC/report.tsv") into quast_assembly_for_multiqc_ch
when: ('02_assembly' in step || '03_filtering' in step || '04_structural_annot' in step || '05_alignment' in step || '06_func_annot' in step || '07_taxo_affi' in step || '08_binning' in step)
@@ -615,7 +617,7 @@ process reads_deduplication {
set replicateId, file("${replicateId}_R1_dedup.fastq.gz"), file("${replicateId}_R2_dedup.fastq.gz") into deduplicated_reads_ch, deduplicated_reads_copy_ch
set replicateId, file("${replicateId}.count_reads_on_contigs.idxstats") into (idxstats_filter_logs_ch, idxstats_filter_logs_for_multiqc_ch)
set replicateId, file("${replicateId}.count_reads_on_contigs.flagstat") into flagstat_filter_logs_ch
file("${replicateId}.count_reads_on_contigs.flagstat") into flagstat_filter_logs_for_multiqc_ch
file("${replicateId}.count_reads_on_contigs.flagstat") into flagstat_after_dedup_reads_for_multiqc_ch
file("${replicateId}_dedup_R1.nb_bases")
file("${replicateId}_dedup_R2.nb_bases")
@@ -650,16 +652,17 @@ process assembly_filter {
val min_cpm from min_contigs_cpm_ch
output:
set replicateId, file("${replicateId}_select_contigs.${min_cpm}.fasta") into select_assembly_ch
set replicateId, file("${replicateId}_discard_contigs.${min_cpm}.fasta") into discard_assembly_ch
set replicateId, file("${replicateId}_select_contigs_cpm${min_cpm}.fasta") into select_assembly_ch
set replicateId, file("${replicateId}_discard_contigs_cpm${min_cpm}.fasta") into discard_assembly_ch
file("${replicateId}_select_contigs_QC/*") into quast_select_contigs_ch
file("${replicateId}_select_contigs_QC/report.tsv") into quast_select_contigs_for_multiqc_ch
when: ('03_filtering' in step)
script:
"""
Filter_contig_per_cpm.py -i ${idxstats} -f ${assembly_file} -c ${min_cpm} -s ${replicateId}_select_contigs.${min_cpm}.fasta -d ${replicateId}_discard_contigs.${min_cpm}.fasta
metaquast.py --threads "${task.cpus}" --rna-finding --max-ref-number 0 --min-contig 0 "${replicateId}_select_contigs.${min_cpm}.fasta" -o "${replicateId}_select_contigs_QC"
Filter_contig_per_cpm.py -i ${idxstats} -f ${assembly_file} -c ${min_cpm} -s ${replicateId}_select_contigs_cpm${min_cpm}.fasta -d ${replicateId}_discard_contigs_cpm${min_cpm}.fasta
metaquast.py --threads "${task.cpus}" --rna-finding --max-ref-number 0 --min-contig 0 "${replicateId}_select_contigs_cpm${min_cpm}.fasta" -o "${replicateId}_select_contigs_QC"
"""
}
@@ -679,7 +682,8 @@ process prokka {
set replicateId, file(assembly_file) from select_assembly_ch
output:
set replicateId, file("*") into prokka_ch, prokka_for_multiqc_ch
set replicateId, file("*") into prokka_ch
set replicateId, file("PROKKA_${replicateId}/${replicateId}.txt") into prokka_for_multiqc_ch
when: ('04_structural_annot' in step || '05_alignment' in step || '06_func_annot' in step || '07_taxo_affi' in step || '08_binning' in step)
@@ -1415,9 +1419,9 @@ process cat {
COUNT=`ls -1 bins/*.fa 2>/dev/null | wc -l`
if [ \$COUNT != 0 ]
then
CAT bins -b "bins/" -d database/ -t taxonomy/ -n "${task.cpus}" -s .fa --top 6 -o "${replicateId}" --I_know_what_Im_doing
CAT add_names -i "${replicateId}.ORF2LCA.txt" -o "${replicateId}.ORF2LCA.names.txt" -t taxonomy/
CAT add_names -i "${replicateId}.bin2classification.txt" -o "${replicateId}.bin2classification.names.txt" -t taxonomy/
CAT bins -b "bins/" -d database/ -t taxonomy/ -n "${task.cpus}" -s .fa --top 6 -o "${replicateId}" --I_know_what_Im_doing > stdout_CAT.txt
CAT add_names -i "${replicateId}.ORF2LCA.txt" -o "${replicateId}.ORF2LCA.names.txt" -t taxonomy/ >> stdout_CAT.txt
CAT add_names -i "${replicateId}.bin2classification.txt" -o "${replicateId}.bin2classification.names.txt" -t taxonomy/ >> stdout_CAT.txt
else
echo "Sample ${replicateId}: no bins found, it is impossible to make taxonomic affiliation of bins."
fi
@@ -1474,23 +1478,23 @@ process multiqc {
input:
file multiqc_config from multiqc_config_ch
file ('*_cutadapt.log') from cutadapt_log_ch_for_multiqc.collect().ifEmpty([])
file ('*_sickle.log') from sickle_log_ch_for_multiqc.collect().ifEmpty([])
file ('raw_*') from fastqc_raw_ch_for_multiqc.collect().ifEmpty([])
file ('cleaned_*') from fastqc_cleaned_ch_for_multiqc.collect().ifEmpty([])
file("*_select_contigs_QC/*") from quast_select_contigs_ch.collect().ifEmpty([])
file("*_all_contigs_QC/*") from quast_assembly_ch.collect().ifEmpty([])
file ('*') from cutadapt_log_ch_for_multiqc.collect().ifEmpty([])
file ('*') from sickle_log_ch_for_multiqc.collect().ifEmpty([])
file ('*') from fastqc_raw_for_multiqc_ch.collect().ifEmpty([])
file ('*') from fastqc_cleaned_for_multiqc_ch.collect().ifEmpty([])
file("*_select_contigs_QC/*") from quast_select_contigs_for_multiqc_ch.collect().ifEmpty([])
file("*_all_contigs_QC/*") from quast_assembly_for_multiqc_ch.collect().ifEmpty([])
file("*") from prokka_for_multiqc_ch.collect().ifEmpty([])
file("*") from kaiju_summary_for_multiqc_ch.collect().ifEmpty([])
file("*.summary") from featureCounts_out_ch_for_multiqc.collect().ifEmpty([])
file("*") from featureCounts_out_ch_for_multiqc.collect().ifEmpty([])
file ('software_versions/*') from software_versions_yaml.collect().ifEmpty([])
file ('short_summary_*.txt') from busco_summary_to_multiqc.collect().ifEmpty([])
file("host_filter_flagstat/*.host_filter.flagstat") from host_filter_ch_for_multiqc.collect().ifEmpty([])
file("*.no_filter.flagstat") from flagstat_bam_ch_for_multiqc.collect().ifEmpty([])
file("*") from flagstat_filter_logs_for_multiqc_ch.collect().ifEmpty([])
file ('*') from busco_summary_to_multiqc.collect().ifEmpty([])
file("host_filter_flagstat/*") from flagstat_after_host_filter_for_multiqc_ch.collect().ifEmpty([])
file("*") from flagstat_before_filter_for_multiqc_ch.collect().ifEmpty([])
file("*") from flagstat_after_dedup_reads_for_multiqc_ch.collect().ifEmpty([])
output:
file "*multiqc_report.html" into ch_multiqc_report
file "multiqc_report.html" into ch_multiqc_report
script:
"""