Purpose

The 'Load' pipeline enables users to load their own data (alignments, annotations, variants). If you only have the FASTA file of your contigs and the FASTQ files of your libraries, run the 'Process' pipeline instead.

Each pipeline can be launched in two ways :

with a config file
with parameters

Create an instance

An instance corresponds to a BioMart instance hosting different projects (species or applications).

 usage: ngspipelines_cli.py addinstance [-h] --instance-name STR [--port INT]
                                       [--mem INT] [--url STR]
                                       [--metadata STR]
 optional arguments:
  -h, --help           show this help message and exit
  --instance-name STR  Which is the name of the instance
  --port INT           HTTP deployment port [9000]
  --mem INT            Instance allocated memory in megabytes [1024]
  --url STR            HTTP public url
  --metadata STR       Which metadata should be linked to this workflow

Example :

python ./bin/ngspipelines_cli.py addinstance --instance-name myinstance --port 9090 --url http://myserver.fr

Your instance will be available at http://myserver.fr:9090

Usage

ngspipelines_cli.py load-rnaseqdenovo -h

Load a new project using a config file

All command-line options (described above) can be provided in a configuration file. Example :

python ./bin/ngspipelines_cli.py load-rnaseqdenovo @workflows/rnaseqdenovo/data/rnaseqdenovo.cfg

Load a new project with command line

Minimal command line

Here is a minimal command line (be aware that the resulting web interface will be quite sparse) :

 python ./bin/ngspipelines_cli.py load-rnaseqdenovo --instance-name myinstance --project-name MyProject --species "Latin Name" --species-common-name "common" --project-description "Project description" \
 --assembly file=workflows/rnaseqdenovo/data/contigs.fasta software-name=oases software-parameters="" software-version="0.2.06" comments="Transcript assembly" \
 --library library-name=brain_400 sample-name=Brain replicat=1 tissue=Brain type=pe insert-size=400 remark="100bp to 400bp insert" sequencer=HiSeq2000 files=workflows/rnaseqdenovo/data/brain_400.fastq.gz \
 --alignment file=workflows/rnaseqdenovo/data/brain_400.bam software-name=bwa software-parameters="sampe" software-version="0.9" \
 --assembly-annot file=workflows/rnaseqdenovo/data/best_annotation_file.gff3 software-name=blastall software-parameters="-e 10e-10" software-version="2.2.26" comments="Best annotations against swissprot" is-best

At least one BAM file is required, and each BAM file name must match a library name. If you have computed the expression counts yourself, you can provide the matrix file with contig names; see the --count-matrix option for details.
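The naming rule above can be checked before launching the pipeline. The following is a minimal sketch, not part of ngspipelines: the function name `unmatched_bams` and the second path in the example are hypothetical, and it assumes that "match" means the file name is exactly `library_name.bam`.

```python
from pathlib import Path


def unmatched_bams(library_names, bam_paths):
    """Return the BAM paths whose file name is not <library-name>.bam."""
    known = set(library_names)
    return [
        path
        for path in bam_paths
        # Path.stem is the file name without its last extension.
        if Path(path).suffix != ".bam" or Path(path).stem not in known
    ]


# Hypothetical inputs: one library, one matching and one non-matching BAM file.
libraries = ["brain_400"]
bams = ["workflows/rnaseqdenovo/data/brain_400.bam", "/path/to/liver_200.bam"]
print(unmatched_bams(libraries, bams))  # ['/path/to/liver_200.bam']
```

An empty list means every BAM file can be associated with a declared library.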

## General options

### --library [mandatory]

You can provide FASTQ files, which are copied to the data directory and made available on the download page. If no FASTQ file is provided, you must set the nb-sequence attribute so that the database can be populated and the histograms shown in the user interface can be computed.

List of available attributes (* marks a mandatory attribute) :

library-name* : [string] the internal library name, must be unique
sample-name* : [string] sample name
replicat* : [int] replicate number
tissue : [string] tissue
dev_stage : [string] development stage
type* : [string] library type, available options :
  se : single end
  pe : paired end
  ose : oriented single end
  ope : oriented paired end
  mp : mate pair
insert-size : [int] insert size, for paired-end libraries
remark : [string] any comment
sequencer : [string] sequencer type
public : [int] 0 if the library is private, 1 if it is public
accession : [string] accession number if the library has been published in SRA or ENA
database : [string] database where the library is stored, e.g. SRA, ENA (if available)
nb-sequence : [int] number of sequences in the library, can be provided to save CPU time
files* : [string] FASTQ file path (for paired libraries, space-separated file names)

If you have several libraries, repeat the --library option once per library. Example :

 --library library-name=brain_400 sample-name=Brain replicat=1 tissue=Brain type=pe insert-size=400 remark="100bp to 400bp insert" \
 sequencer=HiSeq2000 files=workflows/rnaseqdenovo/data/brain_400.1.fastq.gz,workflows/rnaseqdenovo/data/brain_400.2.fastq.gz
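When many libraries have to be declared, the option strings can be generated rather than typed. The sketch below is only an illustration: `library_option` is a hypothetical helper, not an ngspipelines function. It uses the mandatory attributes and library types listed above and checks them before formatting.

```python
import shlex

# Mandatory attributes and allowed types, copied from the list above.
MANDATORY = ("library-name", "sample-name", "replicat", "type", "files")
LIBRARY_TYPES = {"se", "pe", "ose", "ope", "mp"}


def library_option(attributes):
    """Format one --library option from a dict of attribute -> value."""
    missing = [name for name in MANDATORY if name not in attributes]
    if missing:
        raise ValueError("missing mandatory attribute(s): " + ", ".join(missing))
    if attributes["type"] not in LIBRARY_TYPES:
        raise ValueError("unknown library type: %s" % attributes["type"])
    # shlex.quote protects values containing spaces (e.g. the remark).
    pairs = [
        "%s=%s" % (name, shlex.quote(str(value)))
        for name, value in attributes.items()
    ]
    return "--library " + " ".join(pairs)


print(library_option({
    "library-name": "brain_400",
    "sample-name": "Brain",
    "replicat": 1,
    "type": "pe",
    "remark": "100bp to 400bp insert",
    "files": "workflows/rnaseqdenovo/data/brain_400.fastq.gz",
}))
```

Calling the helper once per library and joining the results gives the repeated --library options expected on the command line.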

--assembly [mandatory]

The assembly option is mandatory. The possible attributes are :

file* : [string] FASTA file, may be gzip-compressed
software-name* : [string] assembly software name
software-parameters* : [string] assembly software parameters
software-version* : [string] assembly software version
comments : [string] any comments on this analysis

Example :

 --assembly file=workflows/rnaseqdenovo/data/contigs.fasta software-name=oases software-parameters="" software-version="0.2.06" comments="Transcript assembly"

--assembly-annot [mandatory]

The annotation option attributes are :

file* : [string] contig annotation file in GFF3 format with some special attributes
software-name* : [string] name of the software with which the annotation has been produced
software-parameters* : [string] annotation software parameters
software-version* : [string] annotation software version
comments : [string] annotation software comments
is-best : [bool] to define if the file corresponds to the best annotation file [true|false]

If you have computed several annotations, repeat the --assembly-annot option once per annotation file. Example :

 --assembly-annot file=workflows/rnaseqdenovo/data/best_annotation_file.gff3 software-name=blastall software-parameters="-e 10e-10" \
 software-version="2.2.26" comments="Annotations against swissprot"

If you do not provide a best annotation file (one contig per line; see the annotation file format), you can specify which annotation source (the source column of the GFF3 file) the pipeline should use to compute the best annotation of each contig.

Example :

 --assembly-annot file=workflows/rnaseqdenovo/data/annotation_swissprot.gff3 software-name=blastall software-parameters="-e 10e-10" \
 software-version="2.2.26" comments="Annotations against swissprot" --best-annotation-source swissprot
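To decide which value to pass to --best-annotation-source, it can help to list the sources present in the annotation file. In GFF3 the source is the second tab-separated column. The sketch below is an independent helper, not part of ngspipelines, and the example lines are made up for illustration.

```python
from collections import Counter


def gff3_sources(lines):
    """Count features per source (second column) in an iterable of GFF3 lines."""
    counts = Counter()
    for line in lines:
        # Skip blank lines, comments and directives such as ##gff-version.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) >= 2:
            counts[columns[1]] += 1
    return counts


# Hypothetical GFF3 content; a real file would be read with open(path).
example = [
    "##gff-version 3\n",
    "contig_1\tswissprot\tmatch\t1\t300\t.\t+\t.\tID=hit1\n",
    "contig_1\trefseq\tmatch\t10\t280\t.\t+\t.\tID=hit2\n",
    "contig_2\tswissprot\tmatch\t5\t150\t.\t-\t.\tID=hit3\n",
]
print(gff3_sources(example).most_common())  # [('swissprot', 2), ('refseq', 1)]
```

This simplified reader does not stop at a `##FASTA` directive, so it should only be used on files without embedded sequences.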

--alignment [mandatory]

To provide alignment files (BAM) and the associated analysis information, use the --alignment option. The pipeline sorts and indexes the BAM files. If the --count-matrix option is not provided, the pipeline performs the expression measurement itself.

Here is the list of attributes of this option :

file : [string] BAM file; this attribute can be given several times. The BAM file name must match the library name : library_name.bam
software-name : [string] name of alignment software
software-parameters : [string] parameters of alignment software
software-version : [string] version of alignment software
comments : [string] any comments on this analysis

Example :

 --alignment file=/path/to/lib1.bam file=/path/to/lib2.bam software-name=bwa software-parameters=aln/samse software-version="0.7.2-r351" comments="Library alignment against contigs"

--go

A GO (Gene Ontology) file associates GO names, evidence, etc. with each contig.

Example :

 --go go.txt

--keyword

This file contains one line per contig, with the keywords separated by tabs.

Example:

 --keyword keywords.txt

--count-matrix

By default the pipeline performs the expression measurement (see above for more information). You can skip this step if you have built your own matrix, by providing it with the --count-matrix option. Contigs are in rows and library counts in columns. The first line must contain the library names. Example :

 --count-matrix matrix.txt
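The layout described above (contigs in rows, libraries in columns, library names on the first line) can be sketched as follows. Two points are assumptions, since this page does not specify them: the separator (tab here) and the label of the first header cell (`contig` here). The helper name `format_count_matrix` is hypothetical; check the --count-matrix help before relying on this layout.

```python
import csv
import io


def format_count_matrix(libraries, counts):
    """Return the matrix as text.

    libraries: ordered list of library names (the columns).
    counts: dict mapping contig name -> dict mapping library name -> count.
    """
    buffer = io.StringIO()
    # Assumption: tab-separated values with a leading "contig" header cell.
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["contig"] + list(libraries))
    for contig, per_library in counts.items():
        # A library with no recorded count for this contig gets 0.
        writer.writerow([contig] + [per_library.get(lib, 0) for lib in libraries])
    return buffer.getvalue()


# Hypothetical counts for two contigs and two libraries.
text = format_count_matrix(
    ["brain_400", "liver_200"],
    {"contig_1": {"brain_400": 12, "liver_200": 3}, "contig_2": {"brain_400": 7}},
)
print(text, end="")
```

The library names in the header should be the same names used for library-name, so that each column can be linked to a declared library.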

--variant

This file contains the variations found on the contigs : SNPs, insertions and deletions. The expected file format is VCF (Variant Call Format). If the VCF file has been produced with GATK, the allelic counts per library are extracted from the VCF file.

Here is the list of attributes of this option :

file : [string] variant file in VCF format
software-name : [string] detection software name
software-parameters : [string] detection software parameters
software-version : [string] detection software version
comments : [string] comments on analysis

Example :

 --variant file=variant.vcf software-name=GATK software-parameters="realignement/recalibration/glm BOTH" software-version="v2.4-9-g532efad"
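Before loading a VCF file, a quick tally of the variant kinds it contains (SNPs, insertions, deletions) can be obtained with a few lines of Python. This is a simplified sketch, independent of ngspipelines: it compares the lengths of the REF and ALT alleles (columns 4 and 5 of a VCF record) and labels everything else, including symbolic alleles, as "other". The example records are made up.

```python
def classify_variant(ref, alt):
    """Classify one REF/ALT pair by comparing allele lengths."""
    # Missing, overlapping-deletion and symbolic alleles are not classified.
    if alt in (".", "*") or alt.startswith("<"):
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"


def count_variant_types(vcf_lines):
    """Count variant kinds over all ALT alleles of an iterable of VCF lines."""
    counts = {}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header line
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]
        for alt in alts.split(","):
            kind = classify_variant(ref, alt)
            counts[kind] = counts.get(kind, 0) + 1
    return counts


# Hypothetical VCF records.
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "contig_1\t42\t.\tA\tG\t50\tPASS\t.\n",
    "contig_1\t97\t.\tT\tTAC\t60\tPASS\t.\n",
    "contig_2\t15\t.\tGCA\tG\t45\tPASS\t.\n",
]
print(count_variant_types(example))  # {'snp': 1, 'insertion': 1, 'deletion': 1}
```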

--variant-annot

Variant annotation has to be provided in GFF3 format and includes some specific attributes. This option was designed to store SNP annotations. Such annotations are usually produced by aligning the contigs against the genome of a closely related species. The alignment position on the genome makes it possible to extract :

distance to the exon limits
SNP position in the codon
consequence (synonymous, stop gained ...)
amino acid modification
related gene

Here is the list of attributes of this option :

file : [string] annotation file in GFF3 format
software-name : [string] detection software name
software-parameters : [string] detection software parameters
software-version : [string] detection software version
comments : [string] comments on analysis
is-best : [bool] defines whether the file corresponds to the best annotation file [true|false]

If you have several annotation files, repeat this option once per file. Example :

 --variant-annot file=workflows/rnaseqdenovo/data/variant_best_annotation.gff software-name=tSNPannot software-parameters="-p blastall -e 10e-10 --species Danio rerio" \
 software-version="1" comments="Best annotation of snp" is-best=true

As for contig annotations, if you don't have a best annotation file you can use the option --variant-best-annotation-source. Example :

 --variant-annot file=workflows/rnaseqdenovo/data/variant_annotation.gff software-name=tSNPannot software-parameters="-p blastall -e 10e-10 --species Danio rerio" \
 software-version="1" comments="Annotation of snp" --variant-best-annotation-source "tSNPannot"

Delete a project

The deleteproject command removes a project from an instance. Example :

 python ./bin/ngspipelines_cli.py deleteproject --project-name MyProject

Launch web server

Once you have loaded the data into your project, you can give access to the user interface by launching the instance with the runinstance command. This starts the corresponding web server. Example :

 python ./bin/ngspipelines_cli.py runinstance --instance-name myinstance

To stop the web server, use the --command stop option. Example :

 python ./bin/ngspipelines_cli.py runinstance --instance-name myinstance --command stop

Web-server connection

Once the web server is started, you can access it through its URL. The URL must include the port, separated from the host by ':'. Example :

 http://ngspipelines.toulouse.inra.fr:9000/
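The address is simply the public URL given with --url combined with the port given with --port. A minimal sketch of that combination, using only the Python standard library (the helper name `instance_url` is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit


def instance_url(public_url, port):
    """Combine a public URL and a deployment port into the address to open."""
    parts = urlsplit(public_url)
    # Rebuild the host:port part, replacing any port already in the URL.
    netloc = "%s:%d" % (parts.hostname, port)
    return urlunsplit((parts.scheme, netloc, parts.path or "/", "", ""))


print(instance_url("http://myserver.fr", 9090))  # http://myserver.fr:9090/
```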
