Commit 61422432 authored by Celine Noirot's avatar Celine Noirot

add documentation on how to use

parent 7e321f7d
@@ -3,6 +3,57 @@
[![pipeline status](https://forgemia.inra.fr/get-nextflow-ngl-bi/template-nf/badges/master/pipeline.svg)](https://forgemia.inra.fr/get-nextflow-ngl-bi/template-nf//-/commits/master)
# This repository is a template for GeT workflows
This workflow and its different configurations make it possible to:
- run a pipeline from a `samples.csv` file
- use a Singularity image, Conda, or a path to local binaries (see the profiles)
- run MultiQC
- trace the software versions
- send an email at the end of the pipeline with `--email toto@fai.fr`
- automatically build a Singularity image and publish it in the forge's container registry.
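For illustration, a typical invocation combining these options could look like the sketch below (the `--inputdir` and `--samplesheet` parameters are those documented in `doc/usage.md`; paths and the email address are placeholders):
```bash
# Sketch: run the pipeline on a samples.csv with the Singularity profile
# and send a report email when it finishes
nextflow run main.nf \
    --inputdir '/directory/to/data' \
    --samplesheet /path/to/samples.csv \
    --email toto@fai.fr \
    -profile singularity
```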
## How to use this repository?
Clone the repository:
```
git clone git@forgemia.inra.fr:get-nextflow-ngl-bi/template-nf.git
```
Here is the list of files to pick up, with their purpose:
- `assets`: code for the email and the MultiQC configuration
- `conf`: configurations used in `nextflow.config`
    - base: general configuration
    - path: if the `multipath` profile is used, add one block per process that has dependencies
    - test: every pipeline must provide a test profile so that it can be tested
    - genomes: may eventually be centralised elsewhere, so that a single file lists the genomes used by the platform
- `doc/output.md`: this file must be copied and adapted with the description of the pipeline outputs. It is then converted to HTML in the pipeline's results directory.
- `.gitlab-ci.yml`: if you want the Singularity image to be built automatically from the `Singularityfile` and `environment.yml` files, put this file at the root of your project. The image can then be retrieved with the following command:
```
singularity pull template-nf.sif oras://registry.forgemia.inra.fr/get-nextflow-ngl-bi/template-nf/template-nf:latest
```
- the `CHANGELOG.md`, `LICENCE` and `README.md` files, to use and adapt
- `main.nf`: the pipeline itself
- `nextflow.config`: the general configuration of the pipeline
- for reproducibility: `Singularityfile` and `environment.yml` (plus a `Dockerfile` if needed)
## What next?
- set up test data
- when writing a process:
    - use the labels (for memory, CPU, time) defined in `base.config` (see the sketch after this list)
    - add the software it uses to `get_software_versions`
- document the quick start below and remove the section 'This repository is a template for GeT workflows'
- complete `doc/output.md` and `doc/usage.md`
- tag the pipeline as soon as the expected features are implemented
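As a minimal sketch of the two process-level points above (the process name, channels and label are illustrative; use a label actually declared in `conf/base.config`, and also report the tool's version in `get_software_versions`):
```nextflow
// Hypothetical process: 'process_medium' stands for whatever label conf/base.config defines;
// remember to also add `example_tool --version` to the get_software_versions process in main.nf
process example_tool {
    label 'process_medium'

    input:
    set val(name), file(reads) from ch_reads_for_example_tool

    output:
    file "*.out" into ch_example_tool_results

    script:
    """
    example_tool --threads ${task.cpus} ${reads} > ${name}.out
    """
}
```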
# Documentation to complete for the pipelines derived from this template
## Introduction
The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with Docker and Singularity containers, making installation trivial and results highly reproducible.
@@ -11,16 +62,18 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool
i. Install [`nextflow`](https://nf-co.re/usage/installation)
ii. Install one of [`singularity`](https://www.sylabs.io/guides/3.0/user-guide/) or [`conda`](https://conda.io/miniconda.html)
iii. Clone the pipeline and download the Singularity image
```bash
git clone git@forgemia.inra.fr:get-nextflow-ngl-bi/template-nf.git
cd template-nf
singularity pull template-nf.sif oras://registry.forgemia.inra.fr/get-nextflow-ngl-bi/template-nf/template-nf:latest
```
iv. Run the pipeline
```bash
nextflow run pathto/template-nf/main.nf -profile test,singularity
```
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="nf-core/template: get workflow template">
<title>Get/NomWorkflow Pipeline Report</title>
</head>
<body>
<div style="font-family: Helvetica, Arial, sans-serif; padding: 30px; max-width: 800px; margin: 0 auto;">
<img src="cid:nfcorepipelinelogo">
<h1>nf-core/template v${version}</h1>
<h2>Run Name: $runName</h2>
<% if (!success){
out << """
<div style="color: #a94442; background-color: #f2dede; border-color: #ebccd1; padding: 15px; margin-bottom: 20px; border: 1px solid transparent; border-radius: 4px;">
<h4 style="margin-top:0; color: inherit;">GeT/template execution completed unsuccessfully!</h4>
<p>The exit status of the task that caused the workflow execution to fail was: <code>$exitStatus</code>.</p>
<p>The full error message was:</p>
<pre style="white-space: pre-wrap; overflow: visible; margin-bottom: 0;">${errorReport}</pre>
</div>
"""
} else {
out << """
<div style="color: #3c763d; background-color: #dff0d8; border-color: #d6e9c6; padding: 15px; margin-bottom: 20px; border: 1px solid transparent; border-radius: 4px;">
nf-core/template execution completed successfully!
</div>
"""
}
%>
<p>The workflow was completed at <strong>$dateComplete</strong> (duration: <strong>$duration</strong>)</p>
<p>The command used to launch the workflow was as follows:</p>
<pre style="white-space: pre-wrap; overflow: visible; background-color: #ededed; padding: 15px; border-radius: 4px; margin-bottom:30px;">$commandLine</pre>
<h3>Pipeline Configuration:</h3>
<table style="width:100%; max-width:100%; border-spacing: 0; border-collapse: collapse; border:0; margin-bottom: 30px;">
<tbody style="border-bottom: 1px solid #ddd;">
<% out << summary.collect{ k,v -> "<tr><th style='text-align:left; padding: 8px 0; line-height: 1.42857143; vertical-align: top; border-top: 1px solid #ddd;'>$k</th><td style='text-align:left; padding: 8px; line-height: 1.42857143; vertical-align: top; border-top: 1px solid #ddd;'><pre style='white-space: pre-wrap; overflow: visible;'>$v</pre></td></tr>" }.join("\n") %>
</tbody>
</table>
<p>GeT/template</p>
<p><a href="https://forgemia.inra.fr/get-nextflow-ngl-bi/template-nf">https://forgemia.inra.fr/get-nextflow-ngl-bi/template-nf</a></p>
</div>
</body>
</html>
----------------------------------------------------
,--./,-.
___ __ __ __ ___ /,-._.--~\\
|\\ | |__ __ / ` / \\ |__) |__ } {
| \\| | \\__, \\__/ | \\ |___ \\`-._,-`-,
`._,._,'
GeT/template v${version}
----------------------------------------------------
......
To: $email
Subject: $subject
Mime-Version: 1.0
Content-Type: multipart/related;boundary="nfcoremimeboundary"
--nfcoremimeboundary
Content-Type: text/html; charset=utf-8
$email_html
--nfcoremimeboundary
Content-Type: image/png;name="get_logo.png"
Content-Transfer-Encoding: base64
Content-ID: <nfcorepipelinelogo>
Content-Disposition: inline; filename="get_logo.png"
<% out << new File("$baseDir/assets/get_logo.png").
bytes.
encodeBase64().
toString().
tokenize( '\n' )*.
toList()*.
collate( 76 )*.
collect { it.join() }.
flatten().
join( '\n' ) %>
<%
if (mqcFile){
def mqcFileObj = new File("$mqcFile")
if (mqcFileObj.length() < mqcMaxSize){
out << """
--nfcoremimeboundary
Content-Type: text/html; name=\"multiqc_report\"
Content-Transfer-Encoding: base64
Content-ID: <mqcreport>
Content-Disposition: attachment; filename=\"${mqcFileObj.getName()}\"
${mqcFileObj.
bytes.
encodeBase64().
toString().
tokenize( '\n' )*.
toList()*.
collate( 76 )*.
collect { it.join() }.
flatten().
join( '\n' )}
"""
}}
%>
--nfcoremimeboundary--
// Not tested: per-process PATH overrides used by the multipath profile
process {
    withName:fastqc {
        beforeScript = 'export PATH=/path/to/fastqc:$PATH'
    }
    withName:multiqc {
        beforeScript = 'export PATH=/path/to/multiqc:$PATH'
    }
}
\ No newline at end of file
# get-nextflow-ngl-bi/template-nf: Usage
## Inputs
You will need to create a samplesheet file (`samples.csv`) describing the samples in your input directory before running the pipeline; use the `--samplesheet` parameter to specify its location. It has to be a comma-separated file with a header row, as shown in the examples below.
```bash
--inputdir '/directory/to/data'
```
or
```bash
--inputdir '/directory/to/data' --samplesheet /path/to/samples.csv
```
Below is an example containing both paired-end samples and a single-end sample:
```bash
#id,name,fastq_1,fastq_2
1,sample1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
2,control,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz
3,controlsingle,AEG588A3_S3_L002_R1_001.fastq.gz
```
## Running the pipeline
The typical command for running the pipeline is as follows:
```bash
nextflow run path/to/main.nf --inputdir '/directory/to/data' --samplesheet /path/to/samples.csv -profile singularity
```
This will launch the pipeline with the `singularity` configuration profile. See below for more information about profiles.
Note that the pipeline will create the following files in your working directory:
```bash
work # Directory containing the nextflow working files
results # Finished results (configurable, see below)
.nextflow.log # Log file from Nextflow
# Other Nextflow hidden files, e.g. history of pipeline runs and old logs.
```
## Pipeline arguments
### `--contaminant`
Set a value defined in `conf/genomes.config`. Depends on your pipeline's needs.
### `--email myemail@fai.com`
Set this to receive an email report when the pipeline completes.
> Add parameters specific to your pipeline
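For example, combining the two parameters above (a sketch; `<genome_key>` stands for one of the entries defined in `conf/genomes.config`):
```bash
nextflow run path/to/main.nf \
    --inputdir '/directory/to/data' \
    --samplesheet /path/to/samples.csv \
    --contaminant <genome_key> \
    --email myemail@fai.com \
    -profile singularity
```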
## Core Nextflow arguments
> **NB:** These options are part of Nextflow and use a _single_ hyphen (pipeline parameters use a double-hyphen).
### `-profile`
Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments.
Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Conda) - see below.
> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported.
Note that multiple profiles can be loaded, for example: `-profile test,docker` - the order of arguments is important!
They are loaded in sequence, so later profiles can overwrite earlier profiles.
If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended.
* `docker`
* A generic configuration profile to be used with [Docker](https://docker.com/)
* Pulls software from Docker Hub: [`nfcore/rnaseq`](https://hub.docker.com/r/nfcore/rnaseq/)
* `singularity`
* A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/)
* Pulls software from Docker Hub: [`nfcore/rnaseq`](https://hub.docker.com/r/nfcore/rnaseq/)
* `conda`
* Please only use Conda as a last resort, i.e. when it's not possible to run the pipeline with Docker or Singularity.
* A generic configuration profile to be used with [Conda](https://conda.io/docs/)
* Pulls most software from [Bioconda](https://bioconda.github.io/)
* `test`
* A profile with a complete configuration for automated testing
* Includes links to test data so needs no other parameters
* `path`
* A profile configured to use binaries stored in the directory specified with `--globalPath`
* `multipath`
* A profile with a specific configuration for each process
* The user must configure the per-process paths in `conf/path.config`
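For instance, the `path` profile is combined with the `--globalPath` parameter defined in `nextflow.config` (a sketch; the directory is a placeholder):
```bash
# All tools are expected to be available in the given directory,
# which the path profile prepends to PATH
nextflow run path/to/main.nf \
    --inputdir '/directory/to/data' \
    --samplesheet /path/to/samples.csv \
    --globalPath /path/to/binaries \
    -profile path
```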
### `-resume`
Specify this when restarting a pipeline. Nextflow will use cached results from any pipeline steps where the inputs are the same, continuing from where it got to previously.
You can also supply a run name to resume a specific run: `-resume [run-name]`. Use the `nextflow log` command to show previous run names.
#### Custom resource requests
Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with an error code of `143` (exceeded requested resources) it will automatically resubmit with higher requests (2 x original, then 3 x original). If it still fails after three times then the pipeline is stopped.
Whilst these default requirements will hopefully work for most people with most data, you may find that you want to customise the compute resources that the pipeline requests. You can do this by creating a custom config file. For example, to give the workflow process `star` 32GB of memory, you could use the following config:
```nextflow
process {
withName: star {
memory = 32.GB
}
}
```
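Assuming the snippet above is saved in a file such as `custom_resources.config` (the file name is illustrative), it can be passed to Nextflow with the standard `-c` option:
```bash
nextflow run path/to/main.nf -profile singularity -c custom_resources.config
```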
See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for more information.
### Running in the background
Nextflow handles job submissions and supervises the running jobs. The Nextflow process must run until the pipeline is finished.
The Nextflow `-bg` flag launches Nextflow in the background, detached from your terminal so that the workflow does not stop if you log out of your session. The logs are saved to a file.
Alternatively, you can use `screen` / `tmux` or similar tool to create a detached session which you can log back into at a later time.
Some HPC setups also allow you to run Nextflow within a cluster job submitted to your job scheduler (from where it submits more jobs).
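For example, a detached run whose console output is redirected to a file (the log file name is illustrative):
```bash
# -bg detaches Nextflow from the terminal; the run keeps going if you log out
nextflow run path/to/main.nf -profile singularity -bg > pipeline_run.log
```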
#### Nextflow memory requirements
In some cases, the Nextflow Java virtual machines can start to request a large amount of memory.
We recommend adding the following line to your environment to limit this (typically in `~/.bashrc` or `~/.bash_profile`):
```bash
NXF_OPTS='-Xms1g -Xmx4g'
```
@@ -206,7 +206,7 @@ process fastqc {
set val(name), file(reads) from ch_read_files_for_fastqc
output:
file "*_fastqc.{zip,html}" into fastqc_results_for_multiqc
file "*_fastqc.{zip,html}" into ch_fastqc_results_for_multiqc
script:
"""
......
@@ -27,6 +27,9 @@ params {
config_profile_description = false
config_profile_contact = false
config_profile_url = false
// if the -profile path option is used, set the directory where all binaries are stored
globalPath = ""
}
params {
@@ -50,6 +53,8 @@ profiles {
docker { docker.enabled = true }
singularity { singularity.enabled = true }
test { includeConfig 'conf/test.config' }
path { process.beforeScript = "export PATH=${params.globalPath}:$PATH" }
multipath { includeConfig 'conf/path.config' }
}
// Avoid this error:
......