|
|
# ANOMALY Workflow tutorial.
|
|
|
|
|
|
This workflow is inpired by [DADA2 workflow](https://benjjneb.github.io/dada2/tutorial.html) and [Phyloseq analysis](https://joey711.github.io/phyloseq/index.html).
|
|
|
|
|
|
ANOMALY improvements:
|
|
|
- A versatile and customizable workflow, from fastq treatment to statistics analysis.
|
|
|
- Integrating contaminant filtering step with decontam R package.
|
|
|
- Taxonomic assignment based on IDTAXA with two databases, with validation step to keep best assignment.
|
|
|
- Integrated statistical analysis (ANOVA, permanova, non-parametric test).
|
|
|
- Multiple differential analysis methods (DESeq2, metacoder, metagenomeSeq)
|
|
|
|
|
|
|
|
|
## Clone ANOMALY repository
|
|
|
```{bash}
|
|
|
# Set ANOMALYpath
|
|
|
ANOMALYpath=/pathtoanomaly/
|
|
|
git clone git@forgemia.inra.fr:umrf/anomaly.git
|
|
|
cd /PATHTO/anomaly/test
|
|
|
```
|
|
|
|
|
|
## Tests samples
|
|
|
|
|
|
36 fastq subsamples of amplified 16S DNA extracted from cow teat skin surface are available in `./reads/` directory. There are 15 control samples for contamination filtering.
|
|
|
|
|
|
Here are a preview of samples metadatas:
|
|
|
|
|
|
| sample-id | nom | lot | temps | origin | type | extraction | temps_lot |
|
|
|
|-----------|------|------|-------|--------|--------|------------|-----------|
|
|
|
| 1181-T1 | 1181 | lot3 | T1 | trayon | sample | 2 | T1_lot3 |
|
|
|
| 1181-T3 | 1181 | lot3 | T3 | trayon | sample | 4 | T3_lot3 |
|
|
|
| 1181-T6 | 1181 | lot3 | T6 | trayon | sample | 9 | T6_lot3 |
|
|
|
| 1181-T9 | 1181 | lot3 | T9 | trayon | sample | 14 | T9_lot3 |
|
|
|
| 153-T1 | 153 | lot2 | T1 | trayon | sample | 1 | T1_lot2 |
|
|
|
| 153-T3 | 153 | lot2 | T3 | trayon | sample | 5 | T3_lot2 |
|
|
|
| 153-T6 | 153 | lot2 | T6 | trayon | sample | 9 | T6_lot2 |
|
|
|
|
|
|
|
|
|
## DADA2
|
|
|
```{bash}
|
|
|
Rscript $ANOMALYpath/dada2/dada2_process.R -a 16S -p ./reads -q TRUE -c TRUE
|
|
|
```
|
|
|
|
|
|
Main output:
|
|
|
- read_tracking.csv as below
|
|
|
|
|
|
| | input | filtered | denoisedF | denoisedR | merged | nonchim |
|
|
|
|---------|-------|----------|-----------|-----------|--------|---------|
|
|
|
| 1181-T1 | 10000 | 6644 | 6298 | 6344 | 4726 | 4245 |
|
|
|
| 1181-T3 | 10000 | 6370 | 5834 | 5784 | 4247 | 4049 |
|
|
|
| 1181-T6 | 10000 | 6581 | 6486 | 6509 | 6110 | 6089 |
|
|
|
| 1181-T9 | 10000 | 6652 | 6422 | 6402 | 5570 | 5419 |
|
|
|
| 153-T1 | 10000 | 6579 | 5880 | 6120 | 4027 | 3773 |
|
|
|
| 153-T3 | 10000 | 6630 | 6093 | 6097 | 4247 | 3975 |
|
|
|
|
|
|
- robjects.Rdata with raw otu table and representative sequences.
|
|
|
|
|
|
## Taxonomic assignment
|
|
|
|
|
|
This script uses IDTAXA function from DECIPHER package, and allows to use 2 differents databases. It keeps the best assignation on 2 criteria, resolution and confidence. The final taxonomy is validated by multiple ancestors taxa and incongruity correction step.
|
|
|
```{bash}
|
|
|
DB1=./bank/SILVA_SSU_r132_March2018.RData
|
|
|
DB2=./bank/DAIRYdb_v1.2.0_20190222_IDTAXA.RData
|
|
|
|
|
|
Rscript $ANOMALYpath/taxonomy/assign_taxo.R -r dada2_out/robjects.Rdata \
|
|
|
-o tax_out -i ${DB1},${DB2}
|
|
|
```
|
|
|
|
|
|
Main output:
|
|
|
- robjects.Rdata with taxonomy in phyloseq format.
|
|
|
- csv table with the different assignations.
|
|
|
- csv table with final assignation.
|
|
|
|
|
|
|
|
|
## Tree generation
|
|
|
Using phangorn and DECIPHER packages.
|
|
|
```{bash}
|
|
|
Rscript $ANOMALYpath/tools/generate_tree.R -r dada2_out/robjects.Rdata -o tree_out/
|
|
|
```
|
|
|
Main output:
|
|
|
- robjects.Rdata with phylogenetic tree in phyloseq format.
|
|
|
|
|
|
## Phyloseq object generation
|
|
|
```{bash}
|
|
|
Rscript $ANOMALYpath/tools/generate_phyloseq.R -r tree_out/robjects.Rdata \
|
|
|
-t dada2_out/robjects.Rdata -q tax_out/robjects.Rdata \
|
|
|
-m sample-metadata.csv
|
|
|
```
|
|
|
Main output:
|
|
|
- robjects.Rdata with phyloseq object grouping otu table, metadata, taxonomy, tree and sequences. Phyloseq object in rdata is named `data`.
|
|
|
|
|
|
## Decontamination
|
|
|
|
|
|
`decontam.r` scripts uses `decontam` R package with control samples to filter contaminants. Then it proceeds to a basic frequency (<0.005%) and prevalence in samples (2) filtering on otus.
|
|
|
|
|
|
```{bash}
|
|
|
Rscript $ANOMALYpath/filtering/decontam.r -r phyloseq/robjects.Rdata \
|
|
|
-o decontam_out/ -c type -i control -s sample \
|
|
|
-m prevalence -t 0.5 -p FALSE
|
|
|
```
|
|
|
|
|
|
Main output:
|
|
|
- robjects.Rdata with contaminant filtered phyloseq object
|
|
|
- `Exclu_out.csv` list filtered otus for each filtering step.
|
|
|
- Kronas before and after filtering.
|
|
|
|
|
|
|
|
|
## Statistic inferences
|
|
|
### Plot composition
|
|
|
Using [phyloseq-extended repository](https://github.com/mahendra-mariadassou/phyloseq-extended)
|
|
|
|
|
|
```{bash}
|
|
|
Rscript $ANOMALYpath/plot_stats/bars.R -r decontam_out/robjects.Rdata \
|
|
|
-a temps -b lot -i Family,Genus -e lot -s TRUE
|
|
|
```
|
|
|
![barplot](figures/plot_bar_Genus.png)
|
|
|
![plot composition](figures/plot_composition_lot3_Genus.png)
|
|
|
|
|
|
|
|
|
### Alpha diversity analysis
|
|
|
```{bash}
|
|
|
Rscript $ANOMALYpath/plot_stats/diversity.R -r decontam_out/robjects.Rdata \
|
|
|
-a temps -b lot -m Observed,Shannon,Simpson
|
|
|
```
|
|
|
![alpha boxplot](figures/alpha_diversity.png)
|
|
|
|
|
|
- Test ANOVA + pairwise wilcox.Test
|
|
|
|
|
|
```
|
|
|
############
|
|
|
Shannon
|
|
|
############
|
|
|
############
|
|
|
ANOVA + pairwise wilcox test
|
|
|
[1] "Shannon ~ Depth + temps + lot"
|
|
|
Df Sum Sq Mean Sq F value Pr(>F)
|
|
|
Depth 1 0.9210 0.9210 9.980 0.00368 **
|
|
|
temps 3 0.9146 0.3049 3.304 0.03407 *
|
|
|
lot 2 0.0113 0.0057 0.061 0.94056
|
|
|
Residuals 29 2.6763 0.0923
|
|
|
---
|
|
|
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
|
|
|
#############
|
|
|
post hoc LSD.test
|
|
|
$statistics
|
|
|
MSerror Df Mean CV t.value MSD
|
|
|
0.09228501 29 1.808946 16.79345 2.04523 0.2536484
|
|
|
|
|
|
$parameters
|
|
|
test p.ajusted name.t ntr alpha
|
|
|
Fisher-LSD fdr lot 3 0.05
|
|
|
|
|
|
$means
|
|
|
Shannon std r LCL UCL Min Max Q25
|
|
|
lot1 1.834951 0.4029377 12 1.655595 2.014308 0.6883945 2.196777 1.767502
|
|
|
lot2 1.825569 0.2704076 12 1.646213 2.004926 1.3592307 2.267488 1.617003
|
|
|
lot3 1.766318 0.4155738 12 1.586961 1.945674 0.6669089 2.229003 1.637089
|
|
|
Q50 Q75
|
|
|
lot1 1.973264 2.054566
|
|
|
lot2 1.836834 1.975197
|
|
|
lot3 1.788470 1.997920
|
|
|
|
|
|
$comparison
|
|
|
NULL
|
|
|
|
|
|
$groups
|
|
|
Shannon groups
|
|
|
lot1 1.834951 a
|
|
|
lot2 1.825569 a
|
|
|
lot3 1.766318 a
|
|
|
|
|
|
attr(,"class")
|
|
|
[1] "group"
|
|
|
|
|
|
##pvalues of pairwise wilcox test on Shannon
|
|
|
T1 T3 T6
|
|
|
T3 0.310 NA NA
|
|
|
T6 0.605 0.285 NA
|
|
|
T9 0.004 0.003 0.004
|
|
|
|
|
|
##pvalues of pairwise wilcox test on Shannon
|
|
|
lot1 lot2
|
|
|
lot2 0.771 NA
|
|
|
lot3 0.771 0.932
|
|
|
```
|
|
|
|
|
|
|
|
|
### Beta diversity analysis
|
|
|
```{bash}
|
|
|
Rscript $ANOMALYpath/plot_stats/diversity_beta.R -r decontam_out/robjects.Rdata \
|
|
|
-a temps -b lot
|
|
|
```
|
|
|
![global beta](figures/global_beta.png)
|
|
|
|
|
|
```
|
|
|
#####################
|
|
|
##PERMANOVA on Weighted UniFrac distances
|
|
|
#####################
|
|
|
|
|
|
Call:
|
|
|
adonis(formula = as.formula(paste("wUF.dist ~ Depth +", paste(cov1, collapse = "+"), "+", col)), data = mdata, permutations = 1000)
|
|
|
|
|
|
Permutation: free
|
|
|
Number of permutations: 1000
|
|
|
|
|
|
Terms added sequentially (first to last)
|
|
|
|
|
|
Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
|
|
|
Depth 1 0.15713 0.157128 4.3868 0.09074 0.002997 **
|
|
|
temps 3 0.51220 0.170733 4.7667 0.29580 0.000999 ***
|
|
|
lot 2 0.02350 0.011749 0.3280 0.01357 0.977023
|
|
|
Residuals 29 1.03872 0.035818 0.59988
|
|
|
Total 35 1.73155 1.00000
|
|
|
---
|
|
|
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
|
|
|
|
|
|
#####################
|
|
|
##pairwisePERMANOVA on Weighted UniFrac distances
|
|
|
#####################
|
|
|
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted
|
|
|
1 T1-lot3 vs T3-lot3 1 0.015712098 1.1899034 0.22927274 0.4 0.4888889
|
|
|
2 T1-lot3 vs T6-lot3 1 0.100176529 5.3879380 0.57392134 0.1 0.2357143
|
|
|
3 T1-lot3 vs T9-lot3 1 0.104963611 4.6977974 0.54011346 0.1 0.2357143
|
|
|
...
|
|
|
```
|
|
|
|
|
|
![lot beta](figures/lot_beta.png)
|
|
|
![temps beta](figures/temps_beta.png)
|
|
|
|
|
|
|
|
|
|
|
|
### Multiple differential analysis
|
|
|
- Generating output from analysis
|
|
|
|
|
|
```{bash}
|
|
|
# METACODER
|
|
|
Rscript $ANOMALYpath/plot_stats/metacoder.r -r decontam_out/robjects.Rdata \
|
|
|
-a temps_lot -i Genus
|
|
|
|
|
|
# DESEQ2
|
|
|
Rscript $ANOMALYpath/plot_stats/DESeq2.R -r decontam_out/robjects.Rdata \
|
|
|
-a temps_lot -i Genus -o ./deseq/
|
|
|
|
|
|
# METAGENOMESEQ
|
|
|
Rscript $ANOMALYpath/plot_stats/metagenomeSeq.R -r ./decontam_out/robjects.Rdata \
|
|
|
-a temps_lot -i Genus -o ./metagenomeSeq/
|
|
|
```
|
|
|
|
|
|
![metacoder](figures/metacoder_temps_lot_Genus.png)
|
|
|
|
|
|
|
|
|
|
|
|
- Grouping significant results
|
|
|
|
|
|
```{bash}
|
|
|
Rscript $ANOMALYpath/plot_stats/aggregate_diff.R -r ./decontam_out/robjects.Rdata \
|
|
|
-m ./metacoder/metacoder__temps_lot_Genus.csv \
|
|
|
-d ./deseq/ -g ./metagenomeSeq/ \
|
|
|
-a temps_lot -i Genus -o ./aggregate_diff/
|
|
|
```
|
|
|
![venn diagram](figures/venndiag_temps_lot_T1_lot2_vs_T9_lot2.png)
|
|
|
|
|
|
|
|
|
- Boxplot on differentially abundant taxa
|
|
|
```{bash}
|
|
|
Rscript $ANOMALYpath/plot_stats/boxplot.R -r ./decontam_out/robjects.Rdata \
|
|
|
-a temps -i Genus
|
|
|
```
|
|
|
|
|
|
![boxplot](figures/boxplot_temps_g__Pseudomonas.png)
|
|
|
|
|
|
|
|
|
## Supplementary tools
|
|
|
|
|
|
- export_to_stamp.R, to generate STAMP inputs.
|
|
|
```
|
|
|
Rscript $ANOMALYpath/tools/export_to_stamp.R -r ./decontam_out/robjects.Rdata
|
|
|
```
|
|
|
- generate_krona.R.
|
|
|
```
|
|
|
Rscript $ANOMALYpath/tools/generate_krona.R -r ./decontam_out/robjects.Rdata
|
|
|
```
|
|
|
- update_metadata.R, to update sample data of the final phyloseq object (ie new columns).
|
|
|
```
|
|
|
Rscript $ANOMALYpath/tools/update_metadata.R -r ./decontam_out/robjects.Rdata -o new_robjects.Rdata -m new_metadata.tsv
|
|
|
```
|
|
|
- phy2tsv.R, to generate tabulated ASV table with abundance, taxonomies and sequences.
|
|
|
```
|
|
|
Rscript $ANOMALYpath/tools/phy2tsv.R -r ./decontam_out/robjects.Rdata
|
|
|
```
|
|
|
- ASVenn.R, to generate venn diagram with ASV (or taxa) according to
|
|
|
factor, fixed by user. A table providing taxonomies/sequences of common or exclusive ASV is also generated.
|
|
|
```
|
|
|
Rscript $ANOMALYpath/tools/ASVenn.R -r ./decontam_out/robjects.Rdata -b lot
|
|
|
```
|
|
|
|
|
|
![boxplot](figures/lot_venndiag.png)
|
|
|
|
|
|
- csv2phyloseq.R, to generate phyloseq object from external datas (tabulated files).
|
|
|
|
|
|
- phy2cyto.R, to generate input file for Cytoscape allowing to plot bipartite network.
|
|
|
```
|
|
|
Rscript $ANOMALYpath/tools/phy2cyto.R -r ./decontam_out/robjects.Rdata -a lot
|
|
|
```
|
|
|
Example of bipartite network on real data:
|
|
|
![boxplot](figures/sif_tab1.tsv.png) |