Commit 7bf31003 authored by Celine Noirot

WIP : Add workflow

parent ac123a43
Pipeline #37594 passed with stages in 6 minutes and 18 seconds
# get-nextflow-ngl-bi/template: Changelog
## v1.0dev - [date]
Initial release of get-nextflow-ngl-bi/template, created from the [nf-core](http://nf-co.re/) template and then customized.
FROM nfcore/base:1.7
LABEL authors="Céline Noirot" \
description="Docker image containing all requirements for get/template pipeline"
COPY environment.yml /
RUN conda env create -f /environment.yml && conda clean -a
ENV PATH /opt/conda/envs/GeT-template-1.0dev/bin:$PATH
[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A50.32.0-brightgreen.svg)](https://www.nextflow.io/)
[![pipeline status](https://forgemia.inra.fr/get-nextflow-ngl-bi/template-nf/badges/master/pipeline.svg)](https://forgemia.inra.fr/get-nextflow-ngl-bi/template-nf/-/commits/master)
## Introduction
The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It ships with Docker and Singularity containers, which simplify installation and make results reproducible.
## Quick Start
i. Install [`nextflow`](https://nf-co.re/usage/installation)
ii. Install one of [`docker`](https://docs.docker.com/engine/installation/), [`singularity`](https://www.sylabs.io/guides/3.0/user-guide/) or [`conda`](https://conda.io/miniconda.html)
iii. Download the pipeline and test it on a minimal dataset with a single command
```bash
nextflow run https://forgemia.inra.fr/get-nextflow-ngl-bi/template-nf -profile test,<docker/singularity/conda>
```
singularity pull template.sif oras://registry.forgemia.inra.fr/get-nextflow-ngl-bi/template-nf/template-nf:latest
nextflow run main.nf -with-singularity template.sif
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="nf-core/template: get workflow template">
<title>Get/NomWorkflow Pipeline Report</title>
</head>
<body>
<div style="font-family: Helvetica, Arial, sans-serif; padding: 30px; max-width: 800px; margin: 0 auto;">
<img src="cid:nfcorepipelinelogo">
<h1>nf-core/template v${version}</h1>
<h2>Run Name: $runName</h2>
<% if (!success){
out << """
<div style="color: #a94442; background-color: #f2dede; border-color: #ebccd1; padding: 15px; margin-bottom: 20px; border: 1px solid transparent; border-radius: 4px;">
<h4 style="margin-top:0; color: inherit;">GeT/template execution completed unsuccessfully!</h4>
<p>The exit status of the task that caused the workflow execution to fail was: <code>$exitStatus</code>.</p>
<p>The full error message was:</p>
<pre style="white-space: pre-wrap; overflow: visible; margin-bottom: 0;">${errorReport}</pre>
</div>
"""
} else {
out << """
<div style="color: #3c763d; background-color: #dff0d8; border-color: #d6e9c6; padding: 15px; margin-bottom: 20px; border: 1px solid transparent; border-radius: 4px;">
GeT/template execution completed successfully!
</div>
"""
}
%>
<p>The workflow was completed at <strong>$dateComplete</strong> (duration: <strong>$duration</strong>)</p>
<p>The command used to launch the workflow was as follows:</p>
<pre style="white-space: pre-wrap; overflow: visible; background-color: #ededed; padding: 15px; border-radius: 4px; margin-bottom:30px;">$commandLine</pre>
<h3>Pipeline Configuration:</h3>
<table style="width:100%; max-width:100%; border-spacing: 0; border-collapse: collapse; border:0; margin-bottom: 30px;">
<tbody style="border-bottom: 1px solid #ddd;">
<% out << summary.collect{ k,v -> "<tr><th style='text-align:left; padding: 8px 0; line-height: 1.42857143; vertical-align: top; border-top: 1px solid #ddd;'>$k</th><td style='text-align:left; padding: 8px; line-height: 1.42857143; vertical-align: top; border-top: 1px solid #ddd;'><pre style='white-space: pre-wrap; overflow: visible;'>$v</pre></td></tr>" }.join("\n") %>
</tbody>
</table>
<p>GeT/template</p>
<p><a href="https://forgemia.inra.fr/get-nextflow-ngl-bi/template-nf">https://forgemia.inra.fr/get-nextflow-ngl-bi/template-nf</a></p>
</div>
</body>
</html>
----------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~\\
  |\\ | |__  __ /  ` /  \\ |__) |__         }  {
  | \\| |       \\__, \\__/ |  \\ |___     \\`-._,-`-,
                                        `._,._,'
GeT/template v${version}
----------------------------------------------------
Run Name: $runName
<% if (success){
out << "## GeT/template execution completed successfully! ##"
} else {
out << """####################################################
## GeT/template execution completed unsuccessfully! ##
####################################################
The exit status of the task that caused the workflow execution to fail was: $exitStatus.
The full error message was:
${errorReport}
"""
} %>
The workflow was completed at $dateComplete (duration: $duration)
The command used to launch the workflow was as follows:
$commandLine
Pipeline Configuration:
-----------------------
<% out << summary.collect{ k,v -> " - $k: $v" }.join("\n") %>
--
GeT/template
https://forgemia.inra.fr/get-nextflow-ngl-bi/template-nf
report_comment: >
This report has been generated by the <a href="https://forgemia.inra.fr/get-nextflow-ngl-bi/template-nf" target="_blank">nf-core/template</a>
analysis pipeline. For information about how to interpret these results, please see the
<a href="https://forgemia.inra.fr/get-nextflow-ngl-bi/template-nf" target="_blank">documentation</a>.
report_section_order:
nf-core/template-software-versions:
order: -1000
export_plots: true
To: $email
Subject: $subject
Mime-Version: 1.0
Content-Type: multipart/related;boundary="nfcoremimeboundary"
--nfcoremimeboundary
Content-Type: text/html; charset=utf-8
$email_html
--nfcoremimeboundary
Content-Type: image/png;name="get_logo.png"
Content-Transfer-Encoding: base64
Content-ID: <nfcorepipelinelogo>
Content-Disposition: inline; filename="get_logo.png"
<% out << new File("$baseDir/assets/get_logo.png").
bytes.
encodeBase64().
toString().
tokenize( '\n' )*.
toList()*.
collate( 76 )*.
collect { it.join() }.
flatten().
join( '\n' ) %>
<%
if (mqcFile){
def mqcFileObj = new File("$mqcFile")
if (mqcFileObj.length() < mqcMaxSize){
out << """
--nfcoremimeboundary
Content-Type: text/html; name=\"multiqc_report\"
Content-Transfer-Encoding: base64
Content-ID: <mqcreport>
Content-Disposition: attachment; filename=\"${mqcFileObj.getName()}\"
${mqcFileObj.
bytes.
encodeBase64().
toString().
tokenize( '\n' )*.
toList()*.
collate( 76 )*.
collect { it.join() }.
flatten().
join( '\n' )}
"""
}}
%>
--nfcoremimeboundary--
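The attachment blocks in the template above wrap their base64 payload into lines of at most 76 characters, the line length conventionally used for base64 in MIME bodies. The following is a minimal plain-Groovy sketch of that same call chain. The `payload` bytes are a made-up stand-in for the contents of `get_logo.png`, and the checks at the end are illustrative only.

```groovy
// Hypothetical payload standing in for the bytes of assets/get_logo.png.
byte[] payload = ('GeT/template ' * 20).getBytes('UTF-8')

// Same chain as the template: encode, split into single characters,
// group them into chunks of at most 76, and rejoin each chunk as one line.
String wrapped = payload.
    encodeBase64().
    toString().
    tokenize('\n')*.
    toList()*.
    collate(76)*.
    collect { it.join() }.
    flatten().
    join('\n')

List<String> lines = wrapped.readLines()

// No line exceeds 76 characters, and the content survives the round trip.
assert lines.every { it.size() <= 76 }
assert Arrays.equals(lines.join('').decodeBase64(), payload)

println "${lines.size()} lines, longest ${lines*.size().max()} characters"
```

Because `encodeBase64()` is called without chunking, the encoded string contains no newlines of its own, so `tokenize('\n')` yields a single element and `collate(76)` does all of the wrapping.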
/*
* -------------------------------------------------
* nf-core/template Nextflow base config file
* -------------------------------------------------
* A 'blank slate' config file, appropriate for general
* use on most high performance compute environments.
* Assumes that all software is installed and available
* on the PATH. Runs in `local` mode - all jobs will be
* run on the logged in environment.
*/
process {
// TODO nf-core: Check the defaults for all processes
cpus = { check_max( 1 * task.attempt, 'cpus' ) }
memory = { check_max( 7.GB * task.attempt, 'memory' ) }
time = { check_max( 4.h * task.attempt, 'time' ) }
errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' }
maxRetries = 1
maxErrors = '-1'
// Process-specific resource requirements
// NOTE - Only one of the labels below is used in the fastqc process in the main script.
// If possible, keep the same label naming convention when
// adding your own processes.
// TODO nf-core: Customise requirements for specific processes.
// See https://www.nextflow.io/docs/latest/config.html#config-process-selectors
withLabel:process_low {
cpus = { check_max( 2 * task.attempt, 'cpus' ) }
memory = { check_max( 14.GB * task.attempt, 'memory' ) }
time = { check_max( 6.h * task.attempt, 'time' ) }
}
withLabel:process_medium {
cpus = { check_max( 6 * task.attempt, 'cpus' ) }
memory = { check_max( 42.GB * task.attempt, 'memory' ) }
time = { check_max( 8.h * task.attempt, 'time' ) }
}
withLabel:process_high {
cpus = { check_max( 12 * task.attempt, 'cpus' ) }
memory = { check_max( 84.GB * task.attempt, 'memory' ) }
time = { check_max( 10.h * task.attempt, 'time' ) }
}
withLabel:process_long {
time = { check_max( 20.h * task.attempt, 'time' ) }
}
withName:get_software_versions {
cache = false
}
}
params {
// Defaults only, expected to be overridden
max_memory = 12.GB
max_cpus = 8
max_time = 4.h
}
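The base config above combines two mechanisms: a retry decision based on the task exit status, and resource requests that grow with `task.attempt`. The plain-Groovy sketch below imitates both outside of Nextflow. The closure names `errorStrategy` and `requestedMemoryGb` and the integer GB arithmetic are assumptions made for illustration. Real runs use Nextflow's `task` object and memory units, and the requests are additionally capped by `check_max`.

```groovy
// Exit codes that the errorStrategy closure in base.config treats as retryable.
List<Integer> retryableExitCodes = [143, 137, 104, 134, 139]

// Same decision as the config closure, with the exit status passed in
// explicitly instead of being read from task.exitStatus.
Closure<String> errorStrategy = { int exitStatus ->
    exitStatus in retryableExitCodes ? 'retry' : 'finish'
}

// Default memory request in GB before any cap is applied (7.GB * task.attempt).
Closure<Integer> requestedMemoryGb = { int attempt -> 7 * attempt }

// maxRetries = 1 allows one resubmission, so at most two attempts per task.
int maxRetries = 1
List<Integer> requests = (1..(maxRetries + 1)).collect { requestedMemoryGb(it) }

assert errorStrategy(137) == 'retry'
assert errorStrategy(1) == 'finish'

println "Memory requested per attempt (GB): ${requests}"
```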
/*
* -------------------------------------------------
* Nextflow config file for Genomes paths and indexes
* -------------------------------------------------
* Defines reference genomes, using Genome paths
* Can be used by any config that customises the base
*/
params {
genomes {
'GRCh37' {
bed12 = "${params.genomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed"
fasta = "${params.genomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa"
gtf = "${params.genomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf"
star = "${params.genomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/"
bowtie2 = "${params.genomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/"
bwa = "${params.genomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/"
}
'GRCm38' {
bed12 = "${params.genomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.bed"
fasta = "${params.genomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa"
gtf = "${params.genomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf"
star = "${params.genomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex/"
bowtie2 = "${params.genomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index/"
bwa = "${params.genomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/"
}
}
}
/*
* -------------------------------------------------
* Nextflow config file for running tests
* -------------------------------------------------
* Defines bundled input files and everything required
* to run a fast and simple test. Use as follows:
* nextflow run nf-core/template -profile test
*/
params {
config_profile_name = 'Test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'
// Limit resources so that this can run on Travis
max_cpus = 2
max_memory = 6.GB
max_time = 48.h
// Input data
// TODO nf-core: Specify the paths to your test data on nf-core/test-datasets
// TODO nf-core: Give any required params for the test so that command line flags are not needed
singleEnd = false
readPaths = [
['Testdata', ['https://github.com/nf-core/test-datasets/raw/exoseq/testdata/Testdata_R1.tiny.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/exoseq/testdata/Testdata_R2.tiny.fastq.gz']],
['SRR389222', ['https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub1.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub2.fastq.gz']]
]
}
@@ -47,6 +47,7 @@ if (params.help) {
exit 0
}
// NOTE - THIS IS NOT USED IN THIS PIPELINE, EXAMPLE ONLY
// If you want to use the channel below in a process, define the following:
// input:
@@ -64,19 +65,35 @@ if (params.readPaths) {
.from(params.readPaths)
.map { row -> [ row[0], [ file(row[1][0], checkIfExists: true) ] ] }
.ifEmpty { exit 1, "params.readPaths was empty - no input files supplied" }
.into { read_files_fastqc; read_files_trimming }
.into { read_files_fastqc; read_files_trimming ; raw_reads_fastqc; raw_reads_assembly}
} else {
Channel
.from(params.readPaths)
.map { row -> [ row[0], [ file(row[1][0], checkIfExists: true), file(row[1][1], checkIfExists: true) ] ] }
.ifEmpty { exit 1, "params.readPaths was empty - no input files supplied" }
.into { read_files_fastqc; read_files_trimming }
.into { read_files_fastqc; read_files_trimming; raw_reads_fastqc; raw_reads_assembly}
}
} else {
Channel
.fromFilePairs( params.reads, size: params.singleEnd ? 1 : 2 )
.ifEmpty { exit 1, "Cannot find any reads matching: ${params.reads}\nNB: Path needs to be enclosed in quotes!\nIf this is single-end data, please specify --singleEnd on the command line." }
.into { read_files_fastqc; read_files_trimming }
.into { read_files_fastqc; read_files_trimming; raw_reads_fastqc; raw_reads_assembly }
}
/* Step selection for a modular pipeline: comma-separated list of steps to run */
params.step = "qc"
availableStepList =
[
'qc',
'assembly',
'filtering',
'binning'
]
/* Check that every requested step exists in availableStepList */
step = params.step.split(",")
for (String a_step: step) {
assert (a_step in availableStepList)
}
/*
@@ -100,3 +117,40 @@ process fastqc {
"""
}
/*
* STEP 2 - Fake QC
*/
process qc1 {
input:
set replicateId, file(reads) from raw_reads_fastqc
output:
file("${replicateId}.qc1") into fastqc_raw_ch_for_multiqc
when: "qc" in step
script:
"""
echo "mkdir ${replicateId} ; fastqc --nogroup --quiet -o ${replicateId} --threads ${task.cpus} ${reads[0]} ${reads[1]}" > ${replicateId}.qc1
"""
}
/*
* STEP 3 - Fake assembly
*/
process assembly {
input:
file(qc) from fastqc_raw_ch_for_multiqc
set replicateId, file(reads) from raw_reads_assembly
output:
file("${replicateId}.assembly") into assembly_ch
when: "assembly" in step
script:
"""
echo "ASSEMBLY ${replicateId} ; " > ${replicateId}.assembly
"""
}
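The step handling added in this diff splits `params.step` on commas, asserts that each name is in `availableStepList`, and gates each process with a `when:` clause such as `"qc" in step`. The plain-Groovy sketch below shows that logic in isolation. The helper `parseSteps` is hypothetical and not part of the commit. It trims whitespace and raises a descriptive error instead of a bare `assert`, which is one possible design and not necessarily what the author intends.

```groovy
// Steps the pipeline knows about, as declared in main.nf.
List<String> availableStepList = ['qc', 'assembly', 'filtering', 'binning']

// Hypothetical helper (not in the commit): split a comma-separated value
// and reject any name that is not in the list of available steps.
List<String> parseSteps(String value, List<String> available) {
    List<String> requested = value.split(',').collect { it.trim() }
    List<String> unknown = requested.findAll { !(it in available) }
    if (unknown) {
        String message = "Unknown step(s): ${unknown.join(', ')}. Available: ${available.join(', ')}"
        throw new IllegalArgumentException(message)
    }
    return requested
}

// Equivalent of passing a value such as 'qc,assembly' for params.step.
List<String> step = parseSteps('qc,assembly', availableStepList)

// These mirror the `when:` guards of the qc1 and assembly processes.
assert 'qc' in step
assert 'assembly' in step
assert !('binning' in step)
```

Reporting the unknown names together with the allowed ones makes a mistyped step easier to diagnose than an assertion failure on a single value.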
/*
* -------------------------------------------------
* nf-core/template Nextflow config file
* -------------------------------------------------
* Default config options for all environments.
*/
// Global default params, used in configs
params {
// Workflow flags
// TODO nf-core: Specify your pipeline's command line flags
genome = false
reads = "data/*{1,2}.fastq.gz"
singleEnd = false
outdir = './results'
// Boilerplate options
name = false
multiqc_config = "$baseDir/assets/multiqc_config.yaml"
tracedir = "${params.outdir}/pipeline_info"
email = false
email_on_fail = false
maxMultiqcEmailFileSize = 25.MB
plaintext_email = false
monochrome_logs = false
help = false
config_profile_description = false
config_profile_contact = false
config_profile_url = false
}
params {
// Defaults only, expected to be overridden
max_memory = 20.GB
max_cpus = 4
max_time = 40.h
}
// Container slug. Stable releases should specify release tag!
// Developmental code should specify :dev
process.container = "$baseDir/img.sif"
// Load base.config by default for all pipelines
includeConfig 'conf/base.config'
profiles {
conda { process.conda = "$baseDir/environment.yml" }
debug { process.beforeScript = 'echo $HOSTNAME' }
docker { docker.enabled = true }
singularity { singularity.enabled = true }
test { includeConfig 'conf/test.config' }
}
// Avoid this error:
// WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
// This is being tested in nf-core following the discussion at https://github.com/nf-core/tools/pull/351.
// Once it is established and works well, Nextflow might adopt this behavior as the new default.
docker.runOptions = '-u \$(id -u):\$(id -g)'
// Capture exit codes from upstream processes when piping
process.shell = ['/bin/bash', '-euo', 'pipefail']
timeline {
enabled = true
file = "${params.tracedir}/execution_timeline.html"
}
trace {
enabled = true
file = "${params.tracedir}/execution_trace.txt"
fields = 'task_id,name,status,exit,realtime,%cpu,rss,script'
}
report {
enabled = true
file = "${params.tracedir}/execution_report.html"
}
dag {
enabled = true
file = "${params.tracedir}/pipeline_dag.svg"
}
manifest {
name = 'get-nextflow-ngl-bi/template-nf'
author = 'Céline Noirot'
homePage = 'https://forgemia.inra.fr/get-nextflow-ngl-bi/template-nf'
description = 'get workflow template'
mainScript = 'main.nf'
nextflowVersion = '>=0.32.0'
version = '1.0dev'
}
// Function to ensure that resource requirements don't go beyond
// a maximum limit
def check_max(obj, type) {
if (type == 'memory') {
try {
if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1)
return params.max_memory as nextflow.util.MemoryUnit
else
return obj
} catch (all) {
println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj"
return obj
}
} else if (type == 'time') {
try {
if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1)
return params.max_time as nextflow.util.Duration
else
return obj
} catch (all) {
println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj"
return obj
}
} else if (type == 'cpus') {
try {
return Math.min( obj, params.max_cpus as int )
} catch (all) {
println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! Using default value: $obj"
return obj
}
}
}
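The `check_max` function above caps each resource request at the corresponding `params.max_*` value. The sketch below reproduces only that clamping idea in plain Groovy, without Nextflow's `MemoryUnit` and `Duration` classes. The helper `clamp`, the byte arithmetic, and the chosen caps are assumptions for illustration. The caps are taken from the `params` block in `nextflow.config`, and which values are actually in effect at run time depends on config load order, which is not checked here.

```groovy
// Hypothetical helper: return the requested value unless it exceeds the limit.
// It tests compareTo(...) > 0, which holds for any Comparable, whereas the
// config function tests compareTo(...) == 1.
def clamp(Comparable requested, Comparable limit) {
    return requested.compareTo(limit) > 0 ? limit : requested
}

long gb = 1024L * 1024L * 1024L
long maxMemory = 20 * gb   // stands in for params.max_memory = 20.GB
int maxCpus = 4            // stands in for params.max_cpus = 4

// process_high asks for 84.GB and 12 cpus on the first attempt: both are capped.
assert clamp(84 * gb, maxMemory) == maxMemory
assert clamp(12, maxCpus) == 4

// process_low asks for 14.GB and 2 cpus: both are under the caps and pass through.
assert clamp(14 * gb, maxMemory) == 14 * gb
assert clamp(2, maxCpus) == 2
```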