Commit b2c20440 authored by Jerome Mariette

new version of jflow article

parent eb42a80b
\section{Introduction}
While improving NG6 (Mariette et al., 2012), an integrated storage environment
for high-throughput sequencing data, we sought a technical solution enabling
users to load their own data. The solution had to be easy to integrate into the
existing web interface and able to process large datasets on an HPC cluster. It
was also desirable to allow developers to add software modules and to organize
them into workflows.

Providing users with a simple interface to load and process their datasets is
now a common need. Specialized web portals such as MG-RAST (Meyer et al., 2008)
or MetaVir (Roux et al., 2011) provide multiple services and analysis tools in
an integrated manner for specific experiments or data types. These applications
usually hide the processing steps in a back office and implement their own way
of managing job executions. Using Jflow, developers of such tools can easily
integrate WMS features into their applications.

Generic WMS, such as Galaxy (Goecks et al., 2010), provide a user-friendly
graphical interface easing workflow creation and execution. Today, Galaxy is
probably the WMS most used by the biologist community, thanks to its
intuitiveness and large collection of software packages. Unfortunately, such
environments come with their own interface, which complicates their integration
within existing web interfaces. Other tools such as weaver (Bui et al., 2012)
or Cosmos (Gafni et al., 2014) provide a framework or a domain-specific language
to developers wanting to build and run workflows. These software packages offer
the flexibility and power of a high-level programming language, but they
neither provide a user interface, nor support component and workflow
definition, nor check file formats.

To our knowledge, Jflow, presented in this article, is the only WMS designed to
be embedded in any web application, thanks to its organization as five jQuery
(http://jquery.com/) plugins. The remainder of the paper is organized as
follows: Section~\ref{Methods} describes the Jflow web plugins and its Python
API, and Section~\ref{Example} presents a concrete integration. Conclusions and
further work are discussed in the last section.
\section{Methods}\label{Methods}
The Jflow user interface comes with five jQuery plugins providing user-oriented
views:
\begin{itemize}
\item \textit{availablewf} lists all runnable workflows accessible to users,
\item \textit{activewf} monitors all started, completed, failed, aborted and
\ldots
\item \textit{wfform} presents the editable parameters of a workflow in a form,
\item \textit{wfoutputs} displays all outputs produced by the workflow
organized per component,
\item \textit{wfstatus} shows the workflow execution state as a list or as an
execution graph. The graph visualisation uses the Cytoscape Web JavaScript
plugin (Lopes et al., 2010).
\end{itemize}
The plugins give access to multiple communication methods and events. They
interact with the server side through the Jflow REST API, running under a
cherrypy (http://www.cherrypy.org/) web server. The provided server uses the
JSONP communication technique, which enables cross-domain requests.

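JSONP works by wrapping the JSON payload in a call to a callback function named by the client. This lets the browser load the response from another domain as a script. The Python sketch below illustrates only that wrapping step. The helper \texttt{jsonp\_response}, its callback-name check and the example payload are hypothetical and do not reproduce the actual Jflow REST handlers, which run under cherrypy.

```python
import json
import re

# Accept only JavaScript identifiers, optionally dotted (e.g. "jQuery.cb"),
# so that the callback name cannot smuggle arbitrary script into the response.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*")


def jsonp_response(payload, callback=None):
    """Serialize payload as JSON, wrapped in callback(...) when one is given.

    Hypothetical helper illustrating JSONP; not part of the Jflow API.
    """
    body = json.dumps(payload)
    if callback is None:
        return body  # plain JSON for same-origin clients
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid JSONP callback name: %r" % callback)
    # The browser evaluates this text as a script that calls the callback.
    return "%s(%s);" % (callback, body)


if __name__ == "__main__":
    workflows = [{"name": "illumina_qc", "help": "Quality check"}]
    print(jsonp_response(workflows, "jQuery1102_cb"))
```

In a cherrypy application, a helper of this kind would typically be called from an exposed handler that receives the callback name as a query parameter. Restricting that name to identifier characters prevents a client from injecting arbitrary script into the response.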
To be available from the different jQuery plugins, workflows have to be
implemented using the Jflow API. A Jflow component is in charge of an execution
step. Adding a component to the system requires writing a Python
\textit{Component} subclass. Jflow offers different solutions to ease component
creation. To wrap a single command line, the developer can give a position or a
flag for each parameter. Jflow also embeds an XML parser which allows it to run
genuine Mobyle (Neron et al., 2009) components. Finally, to allow developers to
integrate components from other WMS, Jflow provides a class skeleton in which
only the parsing step has to be implemented. A workflow chains several
components. A Jflow workflow is built as a \textit{Workflow} subclass.
Components are added to the workflow as variables and chained by linking
outputs to inputs.
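The component and workflow model described above can be sketched as follows. The classes below are hypothetical and do not reproduce the Jflow API. A \texttt{Component} builds its command line from flagged and positional parameters, whose values may refer to its input and output files. A \texttt{Workflow} chains components by linking an output of one to an input of another. Because Jflow relies on Makeflow for execution (see below), the sketch also emits one Makeflow rule per component. The tool names \texttt{trimmer} and \texttt{read\_stats} are made up for the example.

```python
import shlex


class Component:
    """Hypothetical execution step wrapping one command line (not Jflow's API).

    Parameter values may contain {name} placeholders that are resolved
    against the component's input and output files when the command is built.
    """

    def __init__(self, name, executable, inputs, outputs):
        self.name = name
        self.executable = executable
        self.inputs = dict(inputs)    # input name -> file path
        self.outputs = dict(outputs)  # output name -> file path
        self.flags = []               # (flag, value) pairs, kept in order
        self.positional = []          # (position, value) pairs

    def add_flag(self, flag, value):
        self.flags.append((flag, value))
        return self

    def add_positional(self, position, value):
        self.positional.append((position, value))
        return self

    def command_line(self):
        files = {**self.inputs, **self.outputs}
        argv = [self.executable]
        for flag, value in self.flags:
            argv += [flag, str(value).format(**files)]
        # Positional arguments are emitted in increasing position order.
        for _, value in sorted(self.positional, key=lambda p: p[0]):
            argv.append(str(value).format(**files))
        return " ".join(shlex.quote(arg) for arg in argv)


class Workflow:
    """Hypothetical workflow chaining components by linking outputs to inputs."""

    def __init__(self):
        self.components = []

    def add(self, component):
        self.components.append(component)
        return component

    def link(self, source, output_name, target, input_name):
        # The target component reads the file written by the source component.
        target.inputs[input_name] = source.outputs[output_name]

    def to_makeflow(self):
        # One Makeflow rule per component: "targets: sources", then a
        # tab-indented command line.
        rules = []
        for c in self.components:
            targets = " ".join(c.outputs.values())
            sources = " ".join(c.inputs.values())
            rules.append("%s: %s\n\t%s\n" % (targets, sources, c.command_line()))
        return "\n".join(rules)


if __name__ == "__main__":
    wf = Workflow()
    trim = wf.add(Component("trim", "trimmer", {"reads": "sample.fastq"},
                            {"trimmed": "trimmed.fastq"}))
    trim.add_flag("-q", 20).add_positional(1, "{reads}").add_positional(2, "{trimmed}")
    stats = wf.add(Component("stats", "read_stats", {}, {"report": "stats.txt"}))
    stats.add_flag("-o", "{report}").add_positional(1, "{reads}")
    wf.link(trim, "trimmed", stats, "reads")
    print(wf.to_makeflow())
```

Makeflow infers the execution order from the files shared between rules. The order in which components are added to the workflow therefore does not matter for the generated file.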

To define the parameters presented to the final user, Jflow gives access to
different class methods. Each parameter has at least a name, a user help text
and a data type. For file or directory parameters, it is possible to set the
required file format, a size limit and the location. Jflow handles server-side
files with regular expressions, as well as URLs and client-side files, which it
uploads automatically. Before running the workflow, Jflow checks data type
compliance for every parameter. To manage job submission, status checking and
error handling, it relies on Makeflow (Albrecht et al., 2012) and weaver (Bui et
al., 2012). It thus benefits from the error recovery feature of Makeflow and
from its support of most distributed resource management systems (Condor, SGE,
Work Queue or a single multicore machine, \ldots).
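Parameter checking can be sketched in Python as shown below. The \texttt{Parameter} class, its type names and the helper \texttt{expand\_server\_files} are hypothetical and not part of the Jflow API. The sketch converts form values according to their declared type and checks the file format (here simply by extension) and the size limit. It also expands a regular expression into the matching server-side files. Any non-compliant value raises an error before anything is run.

```python
import datetime
import os
import re


def expand_server_files(directory, pattern):
    """Return sorted paths of files in `directory` whose name fully matches
    the regular expression `pattern` (hypothetical helper)."""
    regex = re.compile(pattern)
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if regex.fullmatch(name) and os.path.isfile(os.path.join(directory, name))
    )


def _parse_bool(value):
    lowered = str(value).strip().lower()
    if lowered not in ("true", "false"):
        raise ValueError(value)
    return lowered == "true"


class Parameter:
    """Hypothetical parameter: name, user help text, data type and, for files,
    an optional format (checked by file extension) and size limit in bytes."""

    CONVERTERS = {
        "int": int,
        "float": float,
        "bool": _parse_bool,
        "date": datetime.date.fromisoformat,  # expects YYYY-MM-DD
    }

    def __init__(self, name, help_text, ptype, file_format=None, size_limit=None):
        self.name = name
        self.help_text = help_text
        self.ptype = ptype
        self.file_format = file_format
        self.size_limit = size_limit

    def check(self, value):
        """Return the converted value, or raise ValueError if it does not comply."""
        if self.ptype == "file":
            return self._check_file(value)
        if self.ptype not in self.CONVERTERS:
            raise ValueError("%s: unknown type %r" % (self.name, self.ptype))
        try:
            return self.CONVERTERS[self.ptype](value)
        except (TypeError, ValueError) as err:
            raise ValueError("%s: invalid %s value %r" % (self.name, self.ptype, value)) from err

    def _check_file(self, path):
        if not os.path.isfile(path):
            raise ValueError("%s: no such file %r" % (self.name, path))
        if self.file_format and not path.endswith("." + self.file_format):
            raise ValueError("%s: expected a .%s file" % (self.name, self.file_format))
        if self.size_limit is not None and os.path.getsize(path) > self.size_limit:
            raise ValueError("%s: file larger than %d bytes" % (self.name, self.size_limit))
        return path


def check_all(parameters, values):
    """Check every parameter before the workflow is allowed to run."""
    return {p.name: p.check(values[p.name]) for p in parameters}
```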
\section{Example}\label{Example}
The Jflow user interface has been designed to allow easy integration into mashup
web applications. Below, we present its integration in NG6, which provides a
user-friendly interface to process, store and download high-throughput
sequencing data. The environment displays sequencing runs as a table. From this
view, the user can add new data by running workflows in charge of loading the
data and checking its quality. Different workflows are available depending on
the data type and the sequencing technology.

Workflows are listed by the \textit{availablewf} plugin built within a
NG6 modal box. The plugin requests the server to get the workflows implemented
by the developer. A \textit{select.availablewf} event thrown by the
\textit{availablewf} plugin is caught to generate the parameter form using the
\textit{wfform} plugin. Depending on the parameter type, Jflow
adapts its display. For example, a date is displayed as a calendar, and a
boolean as a checkbox.
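The type-dependent display can be illustrated by a dispatch table from parameter types to form widgets. In Jflow this rendering is done by the \textit{wfform} jQuery plugin. The Python sketch below only shows the idea, mapping hypothetical type names to HTML5 input types. Browsers usually render the HTML5 date input type with a calendar picker.

```python
import html

# Hypothetical mapping from parameter types to HTML5 <input> types;
# browsers usually display type="date" with a calendar picker.
INPUT_TYPES = {
    "date": "date",
    "bool": "checkbox",
    "int": "number",
    "float": "number",
    "file": "file",
}


def render_field(name, ptype, help_text=""):
    """Return an HTML form field whose widget depends on the parameter type."""
    input_type = INPUT_TYPES.get(ptype, "text")  # unknown types get a text box
    attrs = 'type="%s" name="%s"' % (input_type, html.escape(name, quote=True))
    if ptype == "float":
        attrs += ' step="any"'  # allow decimal values in a number field
    label = html.escape(help_text or name)
    return "<label>%s <input %s></label>" % (label, attrs)


if __name__ == "__main__":
    for name, ptype in [("run_date", "date"), ("paired", "bool"), ("species", "str")]:
        print(render_field(name, ptype))
```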
Being dedicated to biological data, NG6 often takes experimental samples as
inputs, composed of read files and metadata such as name, tissue or development
stage. To help biologists provide such information, Jflow supports structured
data inputs. Such parameter sets are displayed within the \textit{wfform} plugin
as a spreadsheet, allowing multiple lines to be copied and pasted. Iterating
over a set of samples is thus as easy as filling in a spreadsheet.
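Spreadsheet cells copied to the clipboard are usually pasted as tab-separated lines. A minimal sketch, assuming that format and a hypothetical column order (name, read file, tissue, development stage), turns the pasted text into one record per sample. A workflow can then iterate over these records. This illustrates the idea only and is not the Jflow implementation.

```python
import csv
import io


def parse_samples(pasted, columns=("name", "reads", "tissue", "stage")):
    """Turn tab-separated lines pasted from a spreadsheet into sample dicts.

    The column names are hypothetical; blank lines are skipped, missing
    trailing cells become empty strings and extra cells are ignored.
    """
    samples = []
    for row in csv.reader(io.StringIO(pasted), delimiter="\t"):
        if not any(cell.strip() for cell in row):
            continue  # ignore empty lines
        cells = [cell.strip() for cell in row] + [""] * (len(columns) - len(row))
        samples.append(dict(zip(columns, cells)))
    return samples


if __name__ == "__main__":
    pasted = "S1\tS1_R1.fastq\tliver\tadult\nS2\tS2_R1.fastq\tmuscle\n"
    for sample in parse_samples(pasted):
        print(sample["name"], sample["reads"], sample["stage"] or "unknown stage")
```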
\begin{figure}[ht]
\centering
\end{figure}
To monitor running workflows, NG6 provides a table on a dedicated page. The
table is filled by the \textit{activewf} plugin. In the same way as described
above, the \textit{wfstatus} plugin is built within a modal box when a
\textit{select.activewf} event is thrown by the \textit{activewf} plugin, as
presented in Figure~\ref{fig::jflow_example}. This view shows the workflow's
execution graph, where nodes represent components and edges represent links
between outputs and inputs.
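Such a graph can be derived from the components alone. Each component becomes a node, and an edge joins two components when a file written by the first is read by the second. The sketch below builds a generic nodes/edges structure from hypothetical component descriptions and statuses. It does not reproduce the exact data schema expected by Cytoscape Web.

```python
import json


def execution_graph(components, status):
    """Build a generic nodes/edges structure for an execution graph view.

    `components` maps a (hypothetical) component name to a pair
    (input files, output files); `status` maps a name to its state.
    An edge A -> B is added when a file written by A is read by B.
    """
    producer = {}
    for name, (_, outputs) in components.items():
        for path in outputs:
            producer[path] = name
    nodes = [{"id": name, "status": status.get(name, "waiting")}
             for name in components]
    edges = []
    for name, (inputs, _) in components.items():
        for path in inputs:
            source = producer.get(path)
            if source is not None:
                edges.append({"id": "%s->%s:%s" % (source, name, path),
                              "source": source, "target": name, "label": path})
    return {"nodes": nodes, "edges": edges}


if __name__ == "__main__":
    components = {
        "trim": (["sample.fastq"], ["trimmed.fastq"]),
        "stats": (["trimmed.fastq"], ["stats.txt"]),
    }
    graph = execution_graph(components, {"trim": "completed", "stats": "running"})
    print(json.dumps(graph, indent=2))
```

Serialized with \texttt{json.dumps}, such a structure could be returned through the REST API. The client would then convert it into the network format of the graph library.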
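NG6 was first implemented using the Ergatis (Orvis et al., 2010) WMS, which,
like Galaxy, comes with its own user interface. Users therefore had to set up,
run and monitor workflows from Ergatis, and browse sequencing runs and analyses
from NG6. With Jflow, all actions are available from the same application, which
is a real gain for the user.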
\section{Conclusion}
Jflow is a simple and efficient solution to embed WMS features within a web
application. It is, to our knowledge, the only WMS designed for that purpose.
It is already embedded in RNAbrowse (Mariette et al., 2014) and NG6 (Mariette
et al., 2012), where Jflow has been used to process xxx sequencing runs on a
5\,000-core HPC cluster.
\paragraph{Conflict of Interest\textcolon} none declared.