Commit 409f873f authored by Jerome Mariette

No commit message
parent 8babbf36
#
# NG6 - Next Generation Sequencing Information System
# Copyright (C) 2009 INRA
# Jflow - JavaScript Workflow Management System
# Copyright (C) 2012 INRA
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@@ -16,21 +16,17 @@
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
1. WHAT IS NG6
1. WHAT IS Jflow
NG6 is a user-friendly information system able to manage large sets of sequencing data. It includes, on the one hand,
a set of pipelines adapted to different input formats (sff, fastq), different sequencers (Roche 454, Illumina HiSeq,
ABI Solid) and various analyses (quality control, variation discovery, RNAseq, diversity studies, ...) and, on the
other hand, a secure website giving access to the results. Users can download raw and processed
data and browse the statistics of the analysis results. The provided workflows are easy to build, modify and extend.
Jflow is ...
The system is based on makeflow and runs the analysis locally or in a cluster environment such as Sun Grid Engine.
2. DEPENDENCIES
NG6 has some dependencies:
Jflow has some dependencies:
- python 2.6 or higher
- python MySQLdb: you can download mysql-python at http://mysql-python.sourceforge.net/
- makeflow
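A quick way to confirm these dependencies before a first run is sketched below. It is a minimal illustration, not part of Jflow: `find_executable` and `check_dependencies` are hypothetical helper names, and the check only looks for the Python version, the `MySQLdb` module and a `makeflow` executable on `PATH`.

```python
import os
import sys


def find_executable(name, path=None):
    """Return the full path of an executable called `name`, or None.

    Searches the directories in `path` (default: the PATH environment
    variable), similar to the shell's `which`.
    """
    search = os.environ.get("PATH", "") if path is None else path
    for directory in search.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def check_dependencies():
    """Return a list of human-readable problems (empty if everything is found)."""
    problems = []
    if sys.version_info < (2, 6):
        problems.append("python 2.6 or higher is required")
    try:
        import MySQLdb  # noqa: F401  (module provided by mysql-python)
    except ImportError:
        problems.append("python MySQLdb module not found")
    if find_executable("makeflow") is None:
        problems.append("makeflow not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_dependencies():
        print("missing dependency: " + problem)
```

An empty list from `check_dependencies()` means all three checks passed; it does not verify that the installed versions work together.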
3. INSTALLATION GUIDELINES
@@ -21,22 +21,12 @@ batch_system_type = local
# add these options to all batch submit files
batch_options =
[database]
host = localhost
user = typo3
passwd = typo3
dbname = typo3
[storage]
# the Typo3 pid to which the data are linked
pid = 5
# where should be written the log file
log_file = /home/jmariett/scratch/work/ng6.log
log_file = /home/jmariett/scratch/work/jflow.log
# Where should the pipelines write results, should be accessible
# by all cluster nodes
work_directory = /home/jmariett/scratch/work
# where should the results be saved
save_directory = /home/jmariett/scratch/save
# Where should the pipelines write temporary files, should be
# accessible by all cluster nodes
tmp_directory = /home/jmariett/scratch/tmp
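The storage directories above have to exist, and be reachable from every cluster node, before a workflow writes to them. The following sketch assumes the keys sit in a `[storage]` section as shown and that the file is read with Python 3's `configparser` (the module is named `ConfigParser` in Python 2); it reads `work_directory` and `tmp_directory` and creates whichever is missing. `prepare_storage` is a hypothetical helper, not a Jflow function.

```python
import configparser
import os


def prepare_storage(config_path, keys=("work_directory", "tmp_directory")):
    """Read directory paths from [storage] and create the ones that do not exist yet."""
    config = configparser.ConfigParser()
    # ConfigParser.read returns the list of files it managed to parse
    if not config.read(config_path):
        raise IOError("cannot read configuration file: " + config_path)
    paths = {}
    for key in keys:
        path = config.get("storage", key)
        if not os.path.isdir(path):
            os.makedirs(path)
        paths[key] = path
    return paths
```

Creating the directories on the submit host does not guarantee the cluster nodes see them; that still depends on the shared filesystem.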
@@ -48,291 +38,4 @@ sfffile = /usr/bin/sfffile
fastqc = /usr/bin/fastqc
runAssembly = /usr/bin/runAssembly
bwa = /usr/bin/bwa
samtools = /usr/bin/samtools
[454_mids]
MID1 = ACGAGTGCGT
MID2 = ACGCTCGACA
MID3 = AGACGCACTC
MID4 = AGCACTGTAG
MID5 = ATCAGACACG
MID6 = ATATCGCGAG
MID7 = CGTGTCTCTA
MID8 = CTCGCGTGTC
MID9 = TAGTATCAGC
MID10 = TCTCTATGCG
MID11 = TGATACGTCT
MID12 = TACTGAGCTA
MID13 = CATAGTAGTG
MID14 = CGAGAGATAC
MID15 = ATACGACGTA
MID16 = TCACGTACTA
MID17 = CGTCTAGTAC
MID18 = TCTACGTAGC
MID19 = TGTACTACTC
MID20 = ACGACTACAG
MID21 = CGTAGACTAG
MID22 = TACGAGTATG
MID23 = TACTCTCGTG
MID24 = TAGAGACGAG
MID25 = TCGTCGCTCG
MID26 = ACATACGCGT
MID27 = ACGCGAGTAT
MID28 = ACTACTATGT
MID29 = ACTGTACAGT
MID30 = AGACTATACT
MID31 = AGCGTCGTCT
MID32 = AGTACGCTAT
MID33 = ATAGAGTACT
MID34 = CACGCTACGT
MID35 = CAGTAGACGT
MID36 = CGACGTGACT
MID37 = TACACACACT
MID38 = TACACGTGAT
MID39 = TACAGATCGT
MID40 = TACGCTGTCT
MID41 = TAGTGTAGAT
MID42 = TCGATCACGT
MID43 = TCGCACTAGT
MID44 = TCTAGCGACT
MID45 = TCTATACTAT
MID46 = TGACGTATGT
MID47 = TGTGAGTAGT
MID48 = ACAGTATATA
MID49 = ACGCGATCGA
MID50 = ACTAGCAGTA
MID51 = AGCTCACGTA
MID52 = AGTATACATA
MID53 = AGTCGAGAGA
MID54 = AGTGCTACGA
MID55 = CGATCGTATA
MID56 = CGCAGTACGA
MID57 = CGCGTATACA
MID58 = CGTACAGTCA
MID59 = CGTACTCAGA
MID60 = CTACGCTCTA
MID61 = CTATAGCGTA
MID62 = TACGTCATCA
MID63 = TAGTCGCATA
MID64 = TATATATACA
MID65 = TATGCTAGTA
MID66 = TCACGCGAGA
MID67 = TCGATAGTGA
MID68 = TCGCTGCGTA
MID69 = TCTGACGTCA
MID70 = TGAGTCAGTA
MID71 = TGTAGTGTGA
MID72 = TGTCACACGA
MID73 = TGTCGTCGCA
MID74 = ACACATACGC
MID75 = ACAGTCGTGC
MID76 = ACATGACGAC
MID77 = ACGACAGCTC
MID78 = ACGTCTCATC
MID79 = ACTCATCTAC
MID80 = ACTCGCGCAC
MID81 = AGAGCGTCAC
MID82 = AGCGACTAGC
MID83 = AGTAGTGATC
MID84 = AGTGACACAC
MID85 = AGTGTATGTC
MID86 = ATAGATAGAC
MID87 = ATATAGTCGC
MID88 = ATCTACTGAC
MID89 = CACGTAGATC
MID90 = CACGTGTCGC
MID91 = CATACTCTAC
MID92 = CGACACTATC
MID93 = CGAGACGCGC
MID94 = CGTATGCGAC
MID95 = CGTCGATCTC
MID96 = CTACGACTGC
MID97 = CTAGTCACTC
MID98 = CTCTACGCTC
MID99 = CTGTACATAC
MID100 = TAGACTGCAC
MID101 = TAGCGCGCGC
MID102 = TAGCTCTATC
MID103 = TATAGACATC
MID104 = TATGATACGC
MID105 = TCACTCATAC
MID106 = TCATCGAGTC
MID107 = TCGAGCTCTC
MID108 = TCGCAGACAC
MID109 = TCTGTCTCGC
MID110 = TGAGTGACGC
MID111 = TGATGTGTAC
MID112 = TGCTATAGAC
MID113 = TGCTCGCTAC
MID114 = ACGTGCAGCG
MID115 = ACTCACAGAG
MID116 = AGACTCAGCG
MID117 = AGAGAGTGTG
MID118 = AGCTATCGCG
MID119 = AGTCTGACTG
MID120 = AGTGAGCTCG
MID121 = ATAGCTCTCG
MID122 = ATCACGTGCG
MID123 = ATCGTAGCAG
MID124 = ATCGTCTGTG
MID125 = ATGTACGATG
MID126 = ATGTGTCTAG
MID127 = CACACGATAG
MID128 = CACTCGCACG
MID129 = CAGACGTCTG
MID130 = CAGTACTGCG
MID131 = CGACAGCGAG
MID132 = CGATCTGTCG
MID133 = CGCGTGCTAG
MID134 = CGCTCGAGTG
MID135 = CGTGATGACG
MID136 = CTATGTACAG
MID137 = CTCGATATAG
MID138 = CTCGCACGCG
MID139 = CTGCGTCACG
MID140 = CTGTGCGTCG
MID141 = TAGCATACTG
MID142 = TATACATGTG
MID143 = TATCACTCAG
MID144 = TATCTGATAG
MID145 = TCGTGACATG
MID146 = TCTGATCGAG
MID147 = TGACATCTCG
MID148 = TGAGCTAGAG
MID149 = TGATAGAGCG
MID150 = TGCGTGTGCG
MID151 = TGCTAGTCAG
MID152 = TGTATCACAG
MID153 = TGTGCGCGTG
RL1 = ACACGACGACT,AGTCGTGGTGT
RL2 = ACACGTAGTAT,ATACTAGGTGT
RL3 = ACACTACTCGT,ACGAGTGGTGT
RL4 = ACGACACGTAT,ATACGTGGCGT
RL5 = ACGAGTAGACT,AGTCTACGCGT
RL6 = ACGCGTCTAGT,ACTAGAGGCGT
RL7 = ACGTACACACT,AGTGTGTGCGT
RL8 = ACGTACTGTGT,ACACAGTGCGT
RL9 = ACGTAGATCGT,ACGATCTGCGT
RL10 = ACTACGTCTCT,AGAGACGGAGT
RL11 = ACTATACGAGT,ACTCGTAGAGT
RL12 = ACTCGCGTCGT,ACGACGGGAGT
RL13 = AGACTCGACGT,ACGTCGGGTCT
RL14 = AGTACGAGAGT,ACTCTCGGACT
RL15 = AGTACTACTAT,ATAGTAGGACT
RL16 = AGTAGACGTCT,AGACGTCGACT
RL17 = AGTCGTACACT,AGTGTAGGACT
RL18 = AGTGTAGTAGT,ACTACTAGACT
RL19 = ATAGTATACGT,ACGTATAGTAT
RL20 = CAGTACGTACT,AGTACGTGCTG
RL21 = CGACGACGCGT,ACGCGTGGTCG
RL22 = CGACGAGTACT,AGTACTGGTCG
RL23 = CGATACTACGT,ACGTAGTGTCG
RL24 = CGTACGTCGAT,ATCGACGGACG
RL25 = CTACTCGTAGT,ACTACGGGTAG
RL26 = GTACAGTACGT,ACGTACGGTAC
RL27 = GTCGTACGTAT,ATACGTAGGAC
RL28 = GTGTACGACGT,ACGTCGTGCAC
RL29 = ACACAGTGAGT,ACTCACGGTGT
RL30 = ACACTCATACT,AGTATGGGTGT
RL31 = ACAGACAGCGT,ACGCTGTGTGT
RL32 = ACAGACTATAT,ATATAGTGTGT
RL33 = ACAGAGACTCT,AGAGTCTGTGT
RL34 = ACAGCTCGTGT,ACACGAGGTGT
RL35 = ACAGTGTCGAT,ATCGACAGTGT
RL36 = ACGAGCGCGCT,AGCGCGCGCGT
RL37 = ACGATGAGTGT,ACACTCAGCGT
RL38 = ACGCGAGAGAT,ATCTCTGGCGT
RL39 = ACGCTCTCTCT,AGAGAGGGCGT
RL40 = ACGTCGCTGAT,ATCAGCGGCGT
RL41 = ACGTCTAGCAT,ATGCTAGGCGT
RL42 = ACTAGTGATAT,ATATCACGAGT
RL43 = ACTCACACTGT,ACAGTGGGAGT
RL44 = ACTCACTAGCT,AGCTAGGGAGT
RL45 = ACTCTATATAT,ATATATGGAGT
RL46 = ACTGATCTCGT,ACGAGATGAGT
RL47 = ACTGCTGTACT,AGTACAGGAGT
RL48 = ACTGTAGCGCT,AGCGCTAGAGT
RL49 = AGACACTCACT,AGTGAGGGTCT
RL50 = AGACATATAGT,ACTATAGGTCT
RL51 = AGACGTGATCT,AGATCAGGTCT
RL52 = AGAGTACAGAT,ATCTGTAGTCT
RL53 = AGAGTATCTCT,AGAGATAGTCT
RL54 = AGATACGCTGT,ACAGCGTGTCT
RL55 = AGATCTAGTCT,AGACTAGGTCT
RL56 = AGCAGCGTAGT,ACTACGCGGCT
RL57 = AGCGCACGAGT,ACTCGTGGGCT
RL58 = AGCGTGTGCGT,ACGCACAGGCT
RL59 = AGCTAGATACT,AGTATCTGGCT
RL60 = AGCTGTCGACT,AGTCGACGGCT
RL61 = AGTATGCACGT,ACGTGCAGACT
RL62 = AGTCGCGCTAT,ATAGCGGGACT
RL63 = AGTCTGTCTGT,ACAGACGGACT
RL64 = ATACACACGAT,ATCGTGGGTAT
RL65 = ATACGCGTGCT,AGCACGGGTAT
RL66 = ATACTAGCACT,AGTGCTGGTAT
RL67 = ATAGAGCTAGT,ACTAGCTGTAT
RL68 = ATATAGAGTAT,ATACTCTGTAT
RL69 = ATCGCTCACGT,ACGTGAGGGAT
RL70 = ATCGTCAGTCT,AGACTGAGGAT
RL71 = ATCTCTCGTAT,ATACGAGGGAT
RL72 = ATCTGAGACGT,ACGTCTCGGAT
RL73 = ATGCTACGTCT,AGACGTGGCAT
RL74 = ATGTGACTACT,AGTAGTCGCAT
RL75 = CACGAGACAGT,ACTGTCTGGTG
RL76 = CACGCGAGTCT,AGACTCGGGTG
RL77 = CACGCTACGAT,ATCGTAGGGTG
RL78 = CACGTGTATAT,ATATACAGGTG
RL79 = CACTACGATGT,ACATCGTGGTG
RL80 = CACTATACTCT,AGAGTATGGTG
RL81 = CAGCGTACTGT,ACAGTAGGCTG
RL82 = CAGTCTCTAGT,ACTAGAGGCTG
RL83 = CATAGTCGCGT,ACGCGACGATG
RL84 = CGAGACACTAT,ATAGTGTGTCG
RL85 = CGAGAGTGTGT,ACACACTGTCG
RL86 = CGAGTCATCGT,ACGATGAGTCG
RL87 = CGATCGTATAT,ATATACGGTCG
RL88 = CGCAGTACGCT,AGCGTACGGCG
RL89 = CGCGATCGTAT,ATACGATGGCG
RL90 = CGCGCTATACT,AGTATAGGGCG
RL91 = CGTACAGATAT,ATATCTGGACG
RL92 = CGTAGCTCTCT,AGAGAGCGACG
RL93 = CGTATAGTGCT,AGCACTAGACG
RL94 = CGTCAGCGACT,AGTCGCGGACG
RL95 = CGTCGCAGTGT,ACACTGGGACG
RL96 = CGTCTCACGAT,ATCGTGGGACG
RL97 = CGTGACTCAGT,ACTGAGTGACG
RL98 = CTACACGCTCT,AGAGCGGGTAG
RL99 = CTACGATATGT,ACATATGGTAG
RL100 = CTAGACAGACT,AGTCTGTGTAG
RL101 = CTAGTACTCAT,ATGAGTAGTAG
RL102 = CTATATGTCGT,ACGACATGTAG
RL103 = CTATCGACACT,AGTGTCGGTAG
RL104 = CTATGTAGAGT,ACTCTACGTAG
RL105 = CTCACGTACAT,ATGTACGGGAG
RL106 = CTCGAGTCTCT,AGAGACTGGAG
RL107 = CTCGTCGAGAT,ATCTCGAGGAG
RL108 = CTCTACAGCGT,ACGCTGTGGAG
RL109 = CTGTCGTGCGT,ACGCACGGCAG
RL110 = CTGTGACGTGT,ACACGTCGCAG
RL111 = GACGCTGTCGT,ACGACAGGGTC
RL112 = GACGTATGACT,AGTCATAGGTC
RL113 = GACTAGCTAGT,ACTAGCTGGTC
RL114 = GAGACGTCGCT,AGCGACGGCTC
RL115 = GAGAGAGACGT,ACGTCTCGCTC
RL116 = GCGTAGACTAT,ATAGTCTGCGC
RL117 = GCGTCGTGTCT,AGACACGGCGC
RL118 = GCTCTCTACGT,ACGTAGGGAGC
RL119 = GTACACTGTAT,ATACAGGGTAC
RL120 = GTACGCGACAT,ATGTCGGGTAC
RL121 = GTACTATAGAT,ATCTATGGTAC
RL122 = GTACTGAGTCT,AGACTCGGTAC
RL123 = GTAGCTAGCGT,ACGCTAGGTAC
RL124 = GTAGTCACTGT,ACAGTGAGTAC
RL125 = GTAGTGTCACT,AGTGACAGTAC
RL126 = GTATACATAGT,ACTATGTGTAC
RL127 = GTCATCGTCGT,ACGACGAGGAC
RL128 = GTCGACACGCT,AGCGTGTGGAC
RL129 = GTCGAGTGAGT,ACTCACTGGAC
RL130 = GTCTACTATCT,AGATAGTGGAC
RL131 = GTGTCTAGACT,AGTCTAGGCAC
RL132 = GTGTGTATCGT,ACGATACGCAC
\ No newline at end of file
samtools = /usr/bin/samtools
\ No newline at end of file
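The `[454_mids]` section removed in this hunk maps each Roche 454 MID name to a barcode sequence; the `RL` entries carry two comma-separated sequences, and the file does not explain the second one. The sketch below, using hypothetical helpers `parse_mids` and `assign_mid`, shows one way such a table could tag a read by exact prefix match on the first sequence only, preferring the longest barcode and refusing ambiguous ties. Unlike a real demultiplexer, it tolerates no sequencing errors.

```python
def parse_mids(items):
    """Turn (name, value) pairs from [454_mids] into {name: [sequence, ...]}.

    Values such as "ACACGACGACT,AGTCGTGGTGT" hold several comma-separated
    sequences; they are kept in the order given.
    """
    return {name: [seq.strip() for seq in value.split(",")] for name, value in items}


def assign_mid(read, mids):
    """Return the MID whose first sequence is the longest exact prefix of `read`.

    Returns None if no MID matches, or if two MIDs of the same length both match.
    """
    best_name, best_len, tie = None, 0, False
    for name, sequences in mids.items():
        tag = sequences[0]  # only the first listed sequence is used for matching
        if read.startswith(tag):
            if len(tag) > best_len:
                best_name, best_len, tie = name, len(tag), False
            elif len(tag) == best_len:
                tie = True
    return None if tie else best_name
```

If the pairs come from `configparser`, note that it lowercases option names by default, so the keys become `mid1`, `rl1`, and so on.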
#
# Copyright (C) 2012 INRA
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
import os
from jflow.component import Component
from jflow.iotypes import OutputFile, OutputFileList, InputFile, InputFileList, Formats
from weaver.function import ShellFunction
from weaver.abstraction import Map
from jflow.abstraction import MultiMap
class BWA (Component):

    def define_parameters(self, reference_genome, read1, read2=None, algorithm="aln"):
        self.read1 = InputFileList(read1, Formats.FASTQ)
        self.read2 = None
        # "aln" writes one intermediate .sai file per read file
        if algorithm == "aln":
            self.sai1 = OutputFileList(self.get_outputs('{basename_woext}.sai', self.read1))
        else:
            self.sai1 = None
        if read2:
            # paired-end: one BAM per (read1, read2) pair
            self.read2 = InputFileList(read2, Formats.FASTQ)
            if algorithm == "aln":
                self.sai2 = OutputFileList(self.get_outputs('{basename_woext}.sai', self.read2))
            else:
                self.sai2 = None
            self.bam_files = OutputFileList(self.get_outputs('{basename_woext}.bam', [self.read1, self.read2]))
        else:
            # single-end: one BAM per read file
            self.sai2 = None
            self.bam_files = OutputFileList(self.get_outputs('{basename_woext}.bam', self.read1))
        self.algorithm = algorithm
        self.reference_genome = reference_genome
        self.stderr = os.path.join(self.output_directory, 'bwa.stderr')

    def process(self):
        if self.algorithm == "bwasw":
            # bwasw aligns in one step; its SAM output is piped into samtools to get BAM
            if self.read2:
                bwa = ShellFunction(self.get_exec_path("bwa") + " " + self.algorithm + " " + self.reference_genome + \
                                    " $1 $2 2>> " + self.stderr + " | " + self.get_exec_path("samtools") + " view -bS - > $3 2>> " + self.stderr, cmd_format='{EXE} {IN} {OUT}')
                bwasw = MultiMap(bwa, inputs=[self.read1, self.read2], outputs=self.bam_files)
            else:
                bwa = ShellFunction(self.get_exec_path("bwa") + " " + self.algorithm + " " + self.reference_genome + \
                                    " $1 2>> " + self.stderr + " | " + self.get_exec_path("samtools") + " view -bS - > $2 2>> " + self.stderr, cmd_format='{EXE} {IN} {OUT}')
                bwasw = Map(bwa, self.read1, self.bam_files)
        else:
            # "aln": first compute a .sai file for every read file...
            reads, sais = [], []
            reads.extend(self.read1)
            sais.extend(self.sai1)
            bwa = ShellFunction(self.get_exec_path("bwa") + " " + self.algorithm + " " + self.reference_genome + \
                                " $1 > $2 2>> " + self.stderr, cmd_format='{EXE} {IN} {OUT}')
            if self.read2:
                reads.extend(self.read2)
                sais.extend(self.sai2)
                bwa_aln = Map(bwa, inputs=reads, outputs=sais)
                # ...then pair the two .sai files with their reads using "bwa sampe"
                bwasampe = ShellFunction(self.get_exec_path("bwa") + " sampe " + self.reference_genome + \
                                         " $1 $2 $3 $4 2>> " + self.stderr + " | " + self.get_exec_path("samtools") + " view -bS - > $5 2>> " + self.stderr, cmd_format='{EXE} {IN} {OUT}')
                bwasampe = MultiMap(bwasampe, inputs=[self.sai1, self.sai2, self.read1, self.read2], outputs=self.bam_files)
            else:
                bwa_aln = Map(bwa, inputs=reads, outputs=sais)
                # ...then convert each .sai file with its reads using "bwa samse"
                bwasamse = ShellFunction(self.get_exec_path("bwa") + " samse " + self.reference_genome + \
                                         " $1 $2 2>> " + self.stderr + " | " + self.get_exec_path("samtools") + " view -bS - > $3 2>> " + self.stderr, cmd_format='{EXE} {IN} {OUT}')
                bwasamse = MultiMap(bwasamse, inputs=[self.sai1, self.read1], outputs=self.bam_files)
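The `BWA` component names its outputs with patterns such as `'{basename_woext}.sai'` and hands `ShellFunction` templates with `$1`, `$2`, ... placeholders to `Map` and `MultiMap`. The sketch below illustrates that idea in plain Python under stated assumptions: `get_outputs` is a hypothetical stand-in for `Component.get_outputs`, and `expand_commands` assumes, from `cmd_format='{EXE} {IN} {OUT}'`, that inputs fill the first positions and the output fills the last. It is not the weaver or jflow implementation.

```python
import os


def basename_woext(path):
    """File name without its directory and without its last extension."""
    return os.path.splitext(os.path.basename(path))[0]


def get_outputs(pattern, inputs, output_directory):
    """Hypothetical stand-in for Component.get_outputs: one output path per input."""
    return [os.path.join(output_directory, pattern.format(basename_woext=basename_woext(path)))
            for path in inputs]


def expand_commands(template, input_lists, outputs):
    """Build one command per output by filling $1..$n: inputs first, then the output.

    input_lists holds one list per positional input (e.g. [sai1, sai2, read1, read2]
    for sampe); element i of every list goes into the i-th command.
    """
    commands = []
    for i, output in enumerate(outputs):
        args = [files[i] for files in input_lists] + [output]
        command = template
        # Replace the highest positions first so "$1" never eats the start of "$10".
        for position in range(len(args), 0, -1):
            command = command.replace("$%d" % position, args[position - 1])
        commands.append(command)
    return commands
```

For example, `expand_commands("bwa aln ref.fa $1 > $2", [reads], sais)` yields one `bwa aln` line per read file, roughly the per-file commands the `aln` branch of `process` is meant to schedule.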
@@ -164,9 +164,6 @@ class Workflow(threading.Thread):
        """
        Only required for threading
        """
        self.execute()
    def execute(self):
        self.start_time = time.time()
        self.status = self.STATUS_STARTED
        self.end_time = None
@@ -24,4 +24,7 @@ class Alignment (Workflow):
        """
        Run the workflow
        """
        pass
        # index the reference genome
        bwaindex = self.add_component("BWAIndex", [self.args["reference_genome"]])
        # align reads against indexed genome
        bwa = self.add_component("BWA", [bwaindex.databank, self.args["read_1"], self.args["read_2"]])
\ No newline at end of file
@@ -17,7 +17,7 @@
[global]
name = alignment
description = add a brand new project
description = align reads against a reference genome
#
# Parameter section
@@ -32,20 +32,20 @@ description = add a brand new project
# .action [store]: the basic type of action to be taken when this argument is encountered at the command line.
#
[parameters]
name.name = project_name
name.flag = --name
name.help = Give a name to your project (has to be unique)
name.required = True
read_1.name = read_1
read_1.flag = --read-1
read_1.help = Which read1 files should be used
read_1.required = True
read_1.action = append
description.name = project_description
description.flag = --description
description.help = Give a description to your project
description.required = True
read_2.name = read_2
read_2.flag = --read-2
read_2.help = Which read2 files should be used (if single end, leave empty)
read_2.action = append
admin_login.name = admin_login
admin_login.flag = --admin-login
admin_login.help = Who is the project administrator
admin_login.required = True
reference_genome.name = reference_genome
reference_genome.flag = --reference-genome
reference_genome.help = Which genome should the reads be aligned on
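The comment at the top of `[parameters]` describes `.action` in `argparse` terms. Assuming Jflow turns each `<param>.name` / `.flag` / `.help` / `.required` / `.action` group into one `argparse` option (the source does not show that code), the mapping could look like the sketch below; `build_parser` is a hypothetical name.

```python
import argparse
import configparser


def build_parser(config_text):
    """Build an argparse parser from "<param>.<attribute> = value" entries in [parameters]."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    # Group "read_1.flag", "read_1.help", ... under their parameter name ("read_1").
    params = {}
    for key, value in config.items("parameters"):
        param, attribute = key.split(".", 1)
        params.setdefault(param, {})[attribute] = value
    parser = argparse.ArgumentParser()
    for attributes in params.values():
        parser.add_argument(
            attributes["flag"],
            dest=attributes["name"],
            help=attributes.get("help", ""),
            action=attributes.get("action", "store"),
            required=attributes.get("required", "False").lower() == "true",
        )
    return parser
```

With `read_1.action = append`, repeating `--read-1` collects every file into a list, while an omitted `--read-2` leaves `read_2` as `None`, which the `BWA` component treats as single-end.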
#
# Workflow-specific sections below