Here is the list of files to retrieve, with the purpose of each:
- `asset`: code for the email and the MultiQC configuration
- `conf`: configurations used in `nextflow.config`
  - base: general configuration
  - path: if the `multipath` profile is used, add a block for each process that has dependencies
  - test: every pipeline must provide a test profile so it can be tested
  - genomes: may eventually be centralized elsewhere, so that a single file lists the genomes used by the platform
- `doc/output.md`: copy and edit this file with a description of the pipeline's outputs. It is then converted to HTML in the pipeline's results directory.
- `.gitlab-ci.yml`: if you want the Singularity image to be built automatically from the `Singularityfile` and `environment.yml` files, put this file at the root of your project. The image can then be retrieved with a pull command; see the sketch after this list.
- the `CHANGELOG.md`, `LICENCE` and `README.md` files, to be reused and adapted
- `main.nf`: the pipeline itself
- `nextflow.config`: the pipeline's general configuration
- for reproducibility: `Singularityfile` and `environment.yml` (plus a `Dockerfile` if needed)
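The exact retrieval command depends on how your CI publishes the image; as a hypothetical sketch only, assuming the job pushes it to the project's GitLab container registry (the registry path below is an assumption, not the template's actual location):

```bash
# Hypothetical: pull the CI-built image into a local .sif file.
# Replace the oras:// path with wherever your .gitlab-ci.yml actually publishes.
singularity pull template-nf.sif oras://registry.forgemia.inra.fr/get-nextflow-ngl-bi/template-nf:latest
```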
## What next?
- set up test data
- when writing a process:
  - use the labels (for memory, CPUs, time) defined in `base.config`; see the sketch after this list
  - add the software it uses to `get_software_versions`
- document the quick start below and delete the paragraph 'Ce repository est un template pour les workflows Get'
- complete `doc/output.md` and `doc/usage.md`
- tag the pipeline as soon as the expected features are implemented
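A minimal sketch of how labels tie a process to the presets in `conf/base.config`; the label name `process_medium` and the resource values are assumptions, use whatever your `base.config` actually defines:

```nextflow
// In conf/base.config (sketch): the label maps to resource presets.
// process {
//   withLabel: process_medium {
//     cpus = 4
//     memory = 8.GB
//     time = 4.h
//   }
// }

// In main.nf: a hypothetical process opting into that preset.
process example_step {
  label 'process_medium'

  script:
  """
  echo "uses the cpus/memory/time attached to 'process_medium'"
  """
}
```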
# Documentation to complete for the following pipelines:
## Introduction
The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with Docker and Singularity containers making installation trivial and results highly reproducible.
...
...
i. Install [`nextflow`](https://nf-co.re/usage/installation)
ii. Install one of [`singularity`](https://www.sylabs.io/guides/3.0/user-guide/) or [`conda`](https://conda.io/miniconda.html)
iii. Clone the pipeline, download the Singularity image, and test it on a minimal dataset with a single command:
```bash
nextflow run https://forgemia.inra.fr/get-nextflow-ngl-bi/template-nf -profile test,<singularity/conda>
```
You will need to create a samplesheet file `samples.csv` with information about the samples in your input directory before running the pipeline. Use the `--samplesheet` parameter to specify its location. It has to be a comma-separated file with 5 columns and a header row, as shown in the examples below.
The typical command for running the pipeline is as follows:
```bash
nextflow run path/to/main.nf --inputdir '/directory/to/data' --samplesheet /path/to/samples.csv -profile singularity
```
This will launch the pipeline with the `singularity` configuration profile. See below for more information about profiles.
Note that the pipeline will create the following files in your working directory:
```bash
work # Directory containing the nextflow working files
results # Finished results (configurable, see below)
.nextflow.log # Log file from Nextflow
# Other nextflow hidden files, eg. history of pipeline runs and old logs.
```
## Pipeline arguments
### `--contaminant`
Set a value defined in `conf/genomes.config`. Depends on your pipeline's needs.
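A hypothetical sketch of the kind of entry in `conf/genomes.config` that this parameter would select; the key name and path are placeholders, not the platform's actual reference list:

```nextflow
// Hypothetical genomes.config entry: --contaminant phiX would select this key.
params {
  genomes {
    'phiX' {
      fasta = '/path/to/references/phiX.fa'
    }
  }
}
```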
### `--email myemail@fai.com`
Set this to receive an email when the pipeline completes.
> Add parameters specific to your pipeline
## Core Nextflow arguments
> **NB:** These options are part of Nextflow and use a _single_ hyphen (pipeline parameters use a double-hyphen).
### `-profile`
Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments.
Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Conda) - see below.
> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported.
Note that multiple profiles can be loaded, for example: `-profile test,docker` - the order of arguments is important!
They are loaded in sequence, so later profiles can overwrite earlier profiles.
If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended.
* `docker`
  * A generic configuration profile to be used with [Docker](https://docker.com/)
  * Pulls software from Docker Hub: [`nfcore/rnaseq`](https://hub.docker.com/r/nfcore/rnaseq/)
* `singularity`
  * A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/)
  * Pulls software from Docker Hub: [`nfcore/rnaseq`](https://hub.docker.com/r/nfcore/rnaseq/)
* `conda`
  * Please only use Conda as a last resort, i.e. when it's not possible to run the pipeline with Docker, Singularity or Podman.
  * A generic configuration profile to be used with [Conda](https://conda.io/docs/)
  * Pulls most software from [Bioconda](https://bioconda.github.io/)
* `test`
  * A profile with a complete configuration for automated testing
  * Includes links to test data so needs no other parameters
* `path`
  * A profile with a configuration to use binaries stored in the directory specified with `--globalPath`; see the usage sketch after this list
* `multipath`
  * A profile with a specific configuration for each process
  * The user must configure the paths in `conf/path.config`
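A minimal usage sketch for the `path` profile; the binaries directory is an assumption:

```bash
# Run with locally installed binaries instead of containers;
# --globalPath points at the directory holding the executables.
nextflow run path/to/main.nf --inputdir '/directory/to/data' --samplesheet /path/to/samples.csv \
  -profile path --globalPath /path/to/binaries
```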
### `-resume`
Specify this when restarting a pipeline. Nextflow will use cached results from any pipeline steps where the inputs are the same, continuing from where it left off previously.
You can also supply a run name to resume a specific run: `-resume [run-name]`. Use the `nextflow log` command to show previous run names.
#### Custom resource requests
Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with an error code of `143` (exceeded requested resources) it will automatically resubmit with higher requests (2 x original, then 3 x original). If it still fails after three times then the pipeline is stopped.
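The usual Nextflow pattern behind this behaviour looks like the following sketch; the exact values live in `conf/base.config` and may differ:

```nextflow
// Retry on exit code 143 with resources scaled by attempt number:
// attempt 1 uses the base request, attempt 2 twice that, attempt 3 three times.
process {
  errorStrategy = { task.exitStatus == 143 ? 'retry' : 'finish' }
  maxRetries = 2
  memory = { 8.GB * task.attempt }
  time = { 4.h * task.attempt }
}
```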
Whilst these default requirements will hopefully work for most people with most data, you may find that you want to customise the compute resources that the pipeline requests. You can do this by creating a custom config file. For example, to give the workflow process `star` 32GB of memory, you could use the following config:
```nextflow
process {
  withName: star {
    memory = 32.GB
  }
}
```
See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for more information.
### Running in the background
Nextflow handles job submissions and supervises the running jobs. The Nextflow process must run until the pipeline is finished.
The Nextflow `-bg` flag launches Nextflow in the background, detached from your terminal so that the workflow does not stop if you log out of your session. The logs are saved to a file.
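For example (the output file name is arbitrary):

```bash
# Start the run detached from the terminal; Nextflow keeps its own
# log in .nextflow.log, and stdout is redirected to pipeline.log here.
nextflow run path/to/main.nf -profile singularity -bg > pipeline.log
```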
Alternatively, you can use `screen` / `tmux` or similar tool to create a detached session which you can log back into at a later time.
Some HPC setups also allow you to run Nextflow within a cluster job submitted to your job scheduler (from where it submits more jobs).
#### Nextflow memory requirements
In some cases, the Nextflow Java virtual machines can start to request a large amount of memory.
We recommend adding the following line to your environment to limit this (typically in `~/.bashrc` or `~/.bash_profile`):
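The usual setting, as recommended for nf-core pipelines, caps the Java heap:

```bash
# Limit the Nextflow JVM to 1 GB initial / 4 GB maximum heap.
NXF_OPTS='-Xms1g -Xmx4g'
```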