
Journal of Open Source Software Paper

Merged Cedric Midoux requested to merge paper into main
1 file changed: +16 -18
@@ -29,7 +29,7 @@ authors:
affiliations:
- name: Université Paris-Saclay, INRAE, PROSE, 92761, Antony, France
index: 1
- name: Université Paris-Saclay, INRAE, MaIAGE, 78350, Jouy-en-Josas, France
- name: Université Paris-Saclay, INRAE, MaIAGE, 78350, Jouy-en-Josas, France
index: 2
- name: Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, 78350, Jouy-en-Josas, France
index: 3
@@ -40,50 +40,48 @@ joss-doi: 10.21105/joss.xxxxx
# Summary
The analysis of microbiome data has become a major asset for the study of microbial diversity and dynamics, whether in health [@LeChatelier2013], environmental studies [@Karimi2020], food-processing [@Chaillou2014], or environmental biotechnologies [@Poirier2016]. Due to sequencing advances, microbiome studies now require analysis and interpretation of large and high-dimensional datasets. Metabarcoding approaches in particular are based on a two-step process. First, a bioinformatics step relies on a pipeline which processes the raw sequencing reads to produce counts and taxonomic affiliations for each OTU/ASV. Second, those tables are enriched with sample metadata, visualizations as well as biostatistical analyses are performed to answer biological questions. The affordability of amplicon sequencing has led to its widespread use in microbial ecology. Therefore, there is presently a high demand for user-friendly, well-calibrated and interactive tools, empowering researchers to analyze their own data without systematically relying on bioinformaticians and biostatisticians or acquiring skills in R language.
The analysis of microbiome data has become a major asset for investigating microbial diversity and dynamics in diverse fields such as health [@LeChatelier2013], environmental studies [@Karimi2020], food-processing [@Chaillou2014], and environmental biotechnologies [@Poirier2016]. Due to sequencing advances, microbiome studies now require analysis and interpretation of large and high-dimensional datasets. Metabarcoding approaches, in particular, are based on a two-step process. First, a bioinformatics pipeline processes raw sequencing reads, generating counts and taxonomic affiliations for each Operational Taxonomic Unit (OTU) or Amplicon Sequence Variant (ASV). Second, these tables are enriched with sample metadata for biostatistical analyses to address relevant biological questions. The affordability of amplicon sequencing has led to its widespread use in microbial ecology. Therefore, there is a growing demand for user-friendly, well-calibrated, and interactive tools that enable researchers to analyze their data independently, reducing both their dependence on bioinformaticians and biostatisticians and the need to acquire R programming skills.
For the bioinformatics step, many solutions are available to produce tables from reads, relying on the command line or the Galaxy platform (e.g., `QIIME` [@QIIME], `FROGS` [@FROGS-ITS]) or on R pipelines (e.g., `DADA2` [@DADA2]). These solutions require varying levels of investment from users to master. For the second step, several packages dedicated to the analysis and representation of microbiomes are available, such as `phyloseq` [@phyloseq], `microbiome` [@microbiome], and `metacoder` [@metacoder]; however, their use can sometimes be complex. Relatively few tools are available for rapid, interactive analysis of microbiome data: `shiny-phyloseq` [@shiny-phyloseq] is no longer supported; `animalcules` [@animalcules] requires local installation; and `shaman` [@shaman] is complex and requires specific skills.
We present here Easy16S, an interactive Shiny application [@shiny] to facilitate exploratory microbiome data analysis, data visualization and biostatistical analysis. This tool is intended for biologists eager to explore their data and create figures rapidly and interactively. It is easy-to-use and especially focused on the mapping of covariates of interest.
# Statement of need
We present here `Easy16S`, a R-package and interactive Shiny application [@shiny] to facilitate exploratory microbiome data analysis, data visualization and biostatistical analysis. This tool is intended for biologists eager to explore their data and create figures rapidly and interactively. It is easy-to-use and especially focused on the mapping of covariates of interest.
Here, we introduce `Easy16S`, an R package and interactive Shiny application [@shiny] that aims to facilitate exploratory microbiome data analysis, data visualization, and biostatistical analysis. This tool is specifically designed for biologists eager to swiftly explore their data and generate figures interactively. It is easy to use and focuses especially on the mapping of covariates of interest.
This application is based on the use of phyloseq objects [@phyloseq]. Such objects contain a matrix of OTU or ASV abundance per sample, a data.frame of metadata and covariates associated to samples, a matrix of taxonomic affiliations for each OTU/ASV and, optionally, a phylogenetic tree.
This application is built on phyloseq objects [@phyloseq]. These objects contain a matrix of OTU or ASV abundance per sample, a data.frame of metadata and covariates associated with samples, a matrix of taxonomic affiliations for each OTU/ASV, and, optionally, a phylogenetic tree.
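
The four components listed above can be pictured as one container whose tables share taxon and sample identifiers. The following is a minimal sketch in Python, not the phyloseq API: the class name `MicrobiomeDataset`, its field names, and the `validate` helper are hypothetical and exist only to illustrate how the pieces line up.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MicrobiomeDataset:
    """Illustrative stand-in for a phyloseq-like object (hypothetical, not phyloseq)."""
    counts: Dict[str, Dict[str, int]]       # taxon id -> {sample id -> read count}
    sample_data: Dict[str, Dict[str, str]]  # sample id -> {covariate -> value}
    taxonomy: Dict[str, Dict[str, str]]     # taxon id -> {rank -> name}
    tree: Optional[str] = None              # optional phylogenetic tree, e.g. Newick text

    def validate(self) -> None:
        """Check that all tables refer to the same taxa and samples."""
        samples = set(self.sample_data)
        for taxon, row in self.counts.items():
            if taxon not in self.taxonomy:
                raise ValueError(f"no taxonomy entry for {taxon}")
            if set(row) != samples:
                raise ValueError(f"sample mismatch for {taxon}")


# Toy data invented for this example.
demo = MicrobiomeDataset(
    counts={"ASV1": {"S1": 10, "S2": 0}, "ASV2": {"S1": 3, "S2": 7}},
    sample_data={"S1": {"treatment": "control"}, "S2": {"treatment": "treated"}},
    taxonomy={"ASV1": {"Phylum": "Firmicutes"}, "ASV2": {"Phylum": "Bacteroidota"}},
)
demo.validate()  # raises ValueError if the tables are inconsistent
```

Keeping the tree optional mirrors the description above: the three tables are enough to define a dataset, and the tree is extra information.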
# App Features
To load data, the user has three options: use a demo dataset, upload files to let the application build a phyloseq object or uplaod phyloseq object.
For data loading, users have three options: use a demo dataset, upload files from which the application builds a phyloseq object, or directly upload a phyloseq object.
Before analysis, uploaded data can undergo preprocessing, allowing users to refine and clean the raw data, such as filtering samples, modifying the taxonomy table, rarefying data or applying mathematical functions on the count matrix.
Before analysis, uploaded data can undergo preprocessing, allowing users to refine and clean raw data. This includes options such as sample filtering, modification of the taxonomy table, rarefaction of data, and application of mathematical functions to the count matrix.
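
As an illustration of what these preprocessing steps do to a count table, here is a minimal Python sketch. It is not Easy16S code: the function names are hypothetical, the table is a plain `{taxon: {sample: count}}` dictionary, rarefaction is assumed to subsample without replacement (the text does not say which convention the application uses), and `log1p` stands in for "a mathematical function".

```python
import math
import random


def filter_samples(counts, keep):
    """Keep only the samples listed in `keep`."""
    return {t: {s: n for s, n in row.items() if s in keep}
            for t, row in counts.items()}


def rarefy(counts, depth, seed=0):
    """Subsample each sample to `depth` reads without replacement.

    Samples with fewer than `depth` reads are dropped.
    """
    rng = random.Random(seed)
    samples = sorted({s for row in counts.values() for s in row})
    out = {t: {} for t in counts}
    for s in samples:
        # One pool entry per read, labelled with its taxon.
        pool = [t for t, row in counts.items() for _ in range(row.get(s, 0))]
        if len(pool) < depth:
            continue
        drawn = rng.sample(pool, depth)
        for t in counts:
            out[t][s] = drawn.count(t)
    return out


def log1p_transform(counts):
    """Apply log(1 + x) to every count."""
    return {t: {s: math.log1p(n) for s, n in row.items()}
            for t, row in counts.items()}


# Toy table invented for this example: S1 has 15 reads, S2 has 3.
counts = {"ASV1": {"S1": 10, "S2": 2}, "ASV2": {"S1": 5, "S2": 1}}
rarefied = rarefy(counts, depth=5, seed=1)  # S2 is dropped, S1 keeps 5 reads
```

After rarefaction every remaining sample has the same total, so differences in richness between samples no longer reflect sequencing depth alone.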
The exploration and analysis sections are the following:
- Key tables constituting the phyloseq object.
- Metadata visualization with `esquisse` [@esquisse].
- Metadata visualization using `esquisse` [@esquisse].
- Taxonomic composition barplot.
- Rarefaction curves.
- Abundance heatmap.
- Richness within a sample (α-diversity) : table, scatterplot and ANOVA.
- Dissimilarity between samples (β-diversity) : table, sample heatmap, samples clustering, MultiDimensional Scaling, Multivariate ANOVA.
- Richness within a sample (α-diversity): table, scatterplot, and ANOVA.
- Dissimilarity between samples (β-diversity): table, sample heatmap, sample clustering, MultiDimensional Scaling, and Multivariate ANOVA.
- Principal Component Analysis.
- Differential abundance with `DESeq2` [@DESeq2].
- Differential abundance analysis with `DESeq2` [@DESeq2].
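
To make the α- and β-diversity entries in the list above concrete, the sketch below computes observed richness and the Shannon index for one sample, and the Bray-Curtis dissimilarity between two samples. These three measures are chosen here as common examples only; the text does not state which indices Easy16S reports, and the function names are hypothetical.

```python
import math


def observed_richness(sample_counts):
    """Number of taxa with at least one read in the sample."""
    return sum(1 for n in sample_counts if n > 0)


def shannon(sample_counts):
    """Shannon index H = -sum(p * ln p) over taxa present in the sample."""
    total = sum(sample_counts)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total)
                for n in sample_counts if n > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples over the same taxa."""
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Toy count vectors invented for this example (same taxon order in both).
sample_a = [4, 0, 6]
sample_b = [0, 5, 5]
alpha_a = shannon(sample_a)
beta_ab = bray_curtis(sample_a, sample_b)  # (4 + 5 + 1) / 20 = 0.5
```

α-diversity yields one value per sample, which can be compared across covariates, while β-diversity yields one value per pair of samples, the input for clustering and Multidimensional Scaling.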
Users can export their potentially preprocessed data for enables further analysis within R or for use in Easy16S. Tables and plots can also be easily exported. These export features enhance the usability and accessibility of both data and results, allowing users to seamlessly integrate Easy16S with their preferred analysis tools and workflows.
Users can export potentially preprocessed data for further analysis within R or for use in Easy16S. Additionally, tables and plots can be easily exported, enhancing the usability and accessibility of both data and results. This flexibility allows users to seamlessly integrate Easy16S with their preferred analysis tools and workflows.
# Use-case
We have identified three major use cases:
Three major use cases have been identified for Easy16S:
- Easy16S enables beginner users to perform their analyses in complete autonomy,
- Easy16S enables more advanced users to quickly explore the data and identify interesting patterns, before adjusting their R code to analyse them more in-depth,
- Easy16S is used during training sessions and enables users to focus on mastering biological concepts, by freeing them from small programming troubles.
- Easy16S empowers beginner users to independently conduct their analyses.
- More advanced users can use Easy16S to swiftly explore data and identify patterns before adjusting their R code for a more in-depth analysis.
- During training sessions, Easy16S serves as a valuable tool, allowing users to concentrate on mastering biological concepts without being encumbered by minor programming challenges.
![Summary in Easy16S and three examples of data visualization](screenshot/image.png)
# Acknowledgements
We are grateful to the INRAE @MIGALE bioinformatics facility (MIGALE, INRAE, 2020. Migale bioinformatics Facility, doi: 10.15454/1.5572390655343293E12) for providing help, computing and storage resources. We are grateful to Chrystelle Bureau, Patrick Dabert and all testers for their tests and suggestions during Easy16S development.
We are grateful to the INRAE @MIGALE bioinformatics facility (MIGALE, INRAE, 2020. Migale bioinformatics Facility, doi: 10.15454/1.5572390655343293E12) for providing help, computing and storage resources. Special thanks to Chrystelle Bureau, Patrick Dabert, and all testers for their invaluable tests and suggestions during the development of Easy16S. We also extend our thanks to the INRAE SK8 Team for their advice and recommendations throughout the Easy16S development and deployment.
# References