Commit 84b77cd3 authored by Jerome Mariette

first revision of the article

parent b1645c3d
......@@ -25,12 +25,12 @@ Castanet-Tolosan Cedex, France.}
\section{Summary:}
Biologists produce large data sets and need rich yet simple
WEB portals in which they can upload and analyse their files. Providing such
web portals in which they can upload and analyse their files. Providing such
tools requires masking the complexity of the underlying High Performance
Computing (HPC) environment. The connection between interface and computing
infrastructure is usually specific to each portal. With jflow, we introduce
a Workflow Management System (WMS) composed of jQuery plugins that can easily be
embedded in any WEB application and a Python library providing all requested
embedded in any web application and a Python library providing all requested
features to set up, run and monitor workflows.
\section{Availability:}
......@@ -47,7 +47,7 @@ documentation, quickstart and a running test portal.
While improving NG6 (Mariette et al., 2012), an integrated high throughput
generation sequencing storage environment, we sought a technical solution
enabling users to load their own data. The solution had to be easily integrated
in the existing WEB interface and able to process large datasets using an HPC
in the existing web interface and able to process large datasets using an HPC
cluster. It was also desirable to allow developers to add software modules and
organize them into workflows.
......@@ -56,7 +56,7 @@ graphical interface making easier workflow creation and execution, hiding the
complexity of tool installation and tuning. Today, it is probably the most used
WMS due to its intuitiveness and large software package collection.
Unfortunately, such environments are not easy to integrate within
already existing WEB interfaces.
already existing web interfaces.
Other tools such as weaver (Bui et al., 2012) and Cosmos (Gafni et al., 2014)
provide a framework or a domain-specific language. These software packages
......@@ -66,7 +66,7 @@ solutions and adding lacking features such as a user interface, component
and workflow definition or even file format checking seemed relevant.
Providing users with a simple interface to load and process their datasets is
now a common need. Specialized WEB portals such as MG-RAST (Meyer et al., 2008)
now a common need. Specialized web portals such as MG-RAST (Meyer et al., 2008)
or MetaVir (Roux et al., 2011) provide multiple services and analysis tools in
an integrated manner for specific experiments or data types. These
applications usually hide the processing steps in the back office and implement
......@@ -74,16 +74,11 @@ their own way to manage job executions. Using jflow, developers of such tools
could easily integrate WMS features in their applications.
\section{Method}
\section{Methods}
Jflow is a software package composed of a Python library including classes and
functions to build components and workflows, and five jQuery plugins
(http://jquery.com/) providing user-oriented views
(Figure~\ref{fig::jflow_architecture}).
\subsection{User interface}
Jflow user interface implements five plugins:
Jflow is, to our knowledge, the only WMS designed to be embedded in any web
application. Its originality lies in five jQuery plugins (http://jquery.com/)
providing user-oriented views.
\begin{itemize}
\item \textit{availablewf} lists all runnable workflows accessible to users,
\item \textit{activewf} monitors all started, completed, failed, aborted and
......@@ -92,86 +87,97 @@ Jflow user interface implements five plugins:
\item \textit{wfoutputs} displays all outputs produced by the workflow
organized per component,
\item \textit{wfstatus} shows the workflow execution state as a list or an
execution graph. The graph visualisation uses the Cytoscape WEB (Lopes et al.,
execution graph. The graph visualisation uses the Cytoscape Web (Lopes et al.,
2010) JavaScript plugin.
\end{itemize}
Jflow user interface has been designed to allow an easy integration in mashup
WEB applications.
The plugin gives access to multiple communication methods and events. For
example, to generate the parameter form when launching a new workflow the
\textit{click} event of the \textit{availablewf} plugin is listened for and caught
to present the \textit{wfform} plugin in a different location of the WEB page.
The different plugins also communicate with the server side by requesting the
jflow REST API running under a cherrypy (http://www.cherrypy.org/) WEB server.
The provided server uses the JSONP communication technique enabling cross-domain
requests.
All the previously presented features are also available from the jflow Command
Line Interface (CLI).
The plugins give access to multiple communication methods and events and
communicate with the server side by querying the jflow REST API, which runs under
a CherryPy (http://www.cherrypy.org/) web server. The provided server uses the
JSONP communication technique, which enables cross-domain requests.
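As a minimal illustration of the JSONP technique mentioned above, the sketch below wraps a JSON payload in a call to a function whose name is supplied by the client, so that the response can be loaded through a script tag served from another domain. This is not the jflow server code: the helper name \texttt{to\_jsonp}, the callback name and the sample payload are hypothetical.

```python
import json
import re

# Hypothetical helper, not part of jflow: wrap a JSON payload in a JSONP call.
# Only plain (optionally dotted) JavaScript identifiers are accepted as callbacks.
_CALLBACK_RE = re.compile(r"^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*$")


def to_jsonp(callback, payload):
    """Return 'callback(<json>);' for the given callback name and payload."""
    # Refuse anything that is not an identifier, to avoid script injection.
    if not _CALLBACK_RE.match(callback):
        raise ValueError("invalid JSONP callback name: %r" % (callback,))
    return "%s(%s);" % (callback, json.dumps(payload))


# Hypothetical payload: a list of workflows as a client might request it.
print(to_jsonp("handleWorkflows", [{"id": 1, "name": "demo"}]))
```

On the client side, the named callback receives the decoded payload as its argument when the script loads.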
\subsection{Jflow API}
Jflow relies on Makeflow (Albrecht et al., 2012) and weaver (Bui et al., 2012)
to manage job submission, status checking and error handling. It benefits from
the error recovery feature and the support of most distributed resource
management systems (Condor, SGE, Work Queue or a single multicore machine,
\ldots) provided by Makeflow.
A jflow component is in charge of a command line execution. A workflow chains
several components. Adding a new component requires writing a Python
\textit{Component} subclass, which provides functions such as map/reduce or
multimap to define how the command line pattern is applied. In jflow,
To be available from the different jQuery plugins, workflows have to be
implemented using the jflow API, which includes classes and functions to build
components and workflows. A jflow component is in charge of a command line
execution. A workflow chains several components. Adding a component to the
system requires writing a Python \textit{Component} subclass. In jflow,
different solutions are available to ease component creation. To wrap a single
command line, the developer can give a position or a flag for each parameter.
Jflow also embeds an XML parser which allows it to run genuine Mobyle (Neron et
al., 2009) components. Finally, to allow developers to integrate components
from other WMS, jflow provides a class skeleton, in which only the parsing step
has to be implemented.
has to be implemented. In the same way, a jflow workflow is built from a
\textit{Workflow} subclass. Components are added as variables and chained by
linking outputs to inputs.
A jflow workflow is built from a \textit{Workflow} subclass. Components are
added as variables and chained by linking outputs to inputs.
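The two ideas above, wrapping a command line by giving each parameter a flag or a position, and chaining components by feeding one output into the next input, can be sketched in plain Python. The class below is a toy stand-in written for illustration only; it is not the jflow \textit{Component} class, and the method names and the executables \texttt{trim\_reads} and \texttt{align\_reads} are hypothetical.

```python
import shlex


class Component:
    """Toy command-line wrapper (hypothetical, not the jflow Component class)."""

    def __init__(self, executable, output):
        self.executable = executable
        self.output = output      # file produced by the command
        self.flagged = []         # (flag, value) pairs, kept in insertion order
        self.positional = {}      # position -> value

    def add_flag(self, flag, value):
        self.flagged.append((flag, value))

    def add_positional(self, position, value):
        self.positional[position] = value

    def command_line(self):
        parts = [self.executable]
        for flag, value in self.flagged:
            parts += [flag, str(value)]
        # Positional arguments are emitted in increasing position order.
        parts += [str(self.positional[p]) for p in sorted(self.positional)]
        return " ".join(shlex.quote(part) for part in parts)


# Chaining: the output of the first component is the input of the second.
trim = Component("trim_reads", output="trimmed.fastq")
trim.add_flag("--out", trim.output)
trim.add_positional(1, "reads.fastq")

align = Component("align_reads", output="aligned.sam")
align.add_flag("--out", align.output)
align.add_positional(1, trim.output)

for step in (trim, align):
    print(step.command_line())
```

In this sketch the dependency between the two steps is carried only by the shared file name; a real WMS would also record it to order job execution.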
To define the parameters presented to the final user, jflow gives access to
different class methods. Each parameter has at least a name, a user help text
and a data type. For parameter types such as files or directories, it is
possible to set the required file format, size limit and location. Jflow
handles server side files with regular expressions, but also URL files and
client side files, in which case, it automatically uploads them. Before running
the workflow, jflow checks data type compliance for every parameter. To manage
job submission, status checking and error handling, it relies on Makeflow
(Albrecht et al., 2012) and weaver (Bui et al., 2012). It benefits from the
error recovery feature and the support of most distributed resource management
systems (Condor, SGE, Work Queue or a single multicore machine, \ldots) provided
by Makeflow.
\begin{figure}[ht]
\centering
\includegraphics[width=\linewidth]{jflow_architecture.png}
\caption{\textbf{Jflow architecture:} Workflows and Components are defined
using the jflow API. Both jflow user interfaces (CLI and WEB) use the API to give
users access to WMS features. Jflow produces a Makeflow DAG in charge of managing
job submission and error handling.}
\label{fig::jflow_architecture}
\end{figure}
\subsection{Workflow inputs and Parameters}
\section{Example}
To define the parameters presented to the final user, jflow gives access to
different class methods. Each parameter has at least a name, a user help text
and a data type. Before running the workflow, jflow checks data type compliance
for every parameter. Some types have a specific display in the \textit{wfform}
plugin, such as the date type, which will be displayed as a calendar in the WEB
page.
Jflow user interface has been designed to allow an easy integration in mashup
web applications. Below, we present an integration example within NG6, which
provides a user-friendly interface to process, store and download
high-throughput sequencing data. The environment presents sequencing runs in a
table. From this view, the user can add new data to the system by running
workflows that load the data and perform quality-check analyses.
Multiple workflows are available depending on the data type and the sequencing
technology.
Workflows are listed by the \textit{availablewf} plugin built within an
NG6 modal box. It queries the server to get the workflows implemented by the
developer. A \textit{select.availablewf} event thrown by the
\textit{availablewf} plugin is listened for and caught to generate the parameter
form using the \textit{wfform} plugin. Jflow adapts the display to the
parameter type. For example, a date is displayed as a calendar, whereas a
boolean is represented by a checkbox.
As NG6 is dedicated to biological data, its inputs are often experimental samples,
composed of read files and metadata such as name, tissue and development stage.
To help provide such parameters, jflow supports structured data inputs.
Such parameter sets are represented within the \textit{wfform} plugin as a
spreadsheet, allowing users to copy and paste multiple lines. Iterating over a set of
samples is thus as easy as filling in a spreadsheet.
When adding parameters such as files or directories, it is possible to set
required file format, size limitation and location. Jflow handles server side
files with regular expressions, but also URL files and client side files, in
which case, it automatically uploads them.
\begin{figure}[ht]
\centering
\includegraphics[width=\linewidth]{jflow_example.png}
\caption{\textbf{Jflow integration:} (a) An excerpt of the NG6 HTML source code
containing an empty div in which the \textit{activewf} plugin is built and a
modal box for the \textit{wfstatus} plugin. (b) The jQuery code that builds the
jflow plugins and handles user actions. When the \textit{select.activewf}
event is thrown from \textit{activewf-div}, a function is called with two
parameters: \textit{event} and \textit{workflow}. The latter stores
all the workflow's information, such as its name and its id, used in this
example to update the modal box title and to build the \textit{wfstatus}
plugin. (c) The workflow status displayed as a graph.}
\label{fig::jflow_example}
\end{figure}
Extending jflow types can easily be done by implementing a Python
function testing if the input value fits the defined criteria. In the same way,
new file formats can be added to the system.
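A type check of this kind can be sketched as a plain Python function that converts the raw input and raises an error when the value does not fit the criteria. The names \texttt{positive\_int} and \texttt{check\_parameters} are hypothetical and the exception type is an assumption; the signature jflow actually expects for its types may differ.

```python
def positive_int(value):
    """Hypothetical type checker: accept only strictly positive integers."""
    try:
        number = int(value)
    except (TypeError, ValueError):
        raise ValueError("%r is not an integer" % (value,))
    if number <= 0:
        raise ValueError("%r is not strictly positive" % (value,))
    return number


def check_parameters(values, types):
    """Convert every raw value with its checker; collect error messages by name."""
    checked, errors = {}, {}
    for name, raw in values.items():
        try:
            checked[name] = types[name](raw)
        except ValueError as exc:
            errors[name] = str(exc)
    return checked, errors


# Raw values as they might arrive from a web form (always strings).
checked, errors = check_parameters(
    {"threads": "4", "min_length": "-1"},
    {"threads": positive_int, "min_length": positive_int},
)
print(checked)
print(errors)
```

Collecting all errors before running anything lets the form report every invalid field at once instead of stopping at the first one.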
Jflow supports structured data inputs. For example, an experimental sample
is often composed of read files and metadata such as name, tissue and
development stage. Such parameter sets are represented within the
\textit{wfform} plugin as a spreadsheet, allowing users to copy and paste multiple
lines.
To monitor the running workflows, NG6 provides a table on a different page of
the application. This table uses the \textit{activewf} plugin. In the same way as
described above, the \textit{wfstatus} plugin is built in a modal box when a
\textit{select.activewf} event is thrown by the \textit{activewf} plugin, as
presented in Figure~\ref{fig::jflow_example}. This view shows the workflow's
execution graph, where a component is represented by a node and an input/output
link by an edge.
\section{Conclusion}
Jflow is a simple and efficient solution to embed WMS features within a WEB
Jflow is a simple and efficient solution to embed WMS features within a web
application. It is, to our knowledge, the only WMS designed with that purpose.
It is already embedded in NG6 (Mariette et al., 2012) and RNAbrowse (Mariette et
al., 2014).
......