--- title: Analyses 16s projet Rotenone Paula Perez Pardo description: | Analyses 16S author: - name: Magali Monnoye affiliation: INRAE - MICALIS date: "2021-11-16 (Last updated: `r Sys.Date()`)" bibliography: resources/biblio.bib # don't change csl: resources/biomed-central.csl # don't change output: html_document: lib_dir: site_libs self_contained: false number_sections: FALSE code_folding: "hide" toc: true toc_depth: 5 toc_float: true df_print: paged css: ['css/styles.css', 'https://use.fontawesome.com/releases/v5.0.9/css/all.css'] pdf_document: toc: yes word_document: toc: yes always_allow_html: yes --- ```{r setup, include=FALSE} knitr::opts_chunk$set(eval=FALSE, echo =FALSE, #echo=TRUE pour afficher le code dans rapport cache = FALSE, message = FALSE, warning = FALSE, cache.lazy = FALSE, fig.height = 5, fig.width = 10.5) ``` ```{r datatable global options, echo=FALSE, eval=TRUE} options(DT.options = list(pageLength = 10, scrollX = TRUE, language = list(search = 'Filter:'), dom = 'lftipB', buttons = c('copy', 'csv', 'excel'))) ``` - Espace de travail collaboratif [GitLab](https://forgemia.inra.fr/magali.monnoye/analyses_16s/202111_Rotenone) - Adresse du dépot: git@forgemia.inra.fr:magali.monnoye/analyses_16s.git - Repository sur la plateforme migale: ssh -X mmonnoye@migale.jouy.inrae.fr dans save/analyses_16S/202111_Rotenone/ ```{bash} #Pour faire des git clone / git add.....: #Via le terminal, se positionner dans le repertoire ou se trouve le depot donc ici: #~/work/Analyses_16S/202109_Utrech/ #et faire les git souhaités... #Tout ce qu'on veut pusher git add FROGS*.tsv git add FROGS*.biom git add *.html git add *affiliation.tsv #Enusite faire les commits git commit -m "fichiers tsv" #Puis faire les push git push #Si gitlab fatal: pack exceeds maximum allowed size #Pour supprimer dernier commit: git reset HEAD~ #Si gros conflits entre les git/Si impossible de faire de git pull/push (quelques pertes d'infos !!!!) git fetch --all git reset --hard origin/master #Pour supprimer les fichiers venndiagram rm -f VennDiagram*.log #Pour supprimer de force un dossier et tout ce qu'il contient rm -rf ... # Si j'ai ce message d'erreur: "The following untracked working tree files would be overwritten by merge" # Faire ceci dan le terminal du rstudio: git add * git stash git pull ``` # Rappel du contexte de l'étude Deux groupes de 10 souris ont été traitées ou non avec de la rotenone (modèle de Parkinson) # Metabarcoding analysis ## Import des séquences En premier, créer un nouveau dossier dans GitLab / Aller dans Rstudio / Faire un pull / Ouvrir Project Sur le terminal, je copie les dossiers nécéssaires pour l'export html ```{bash} #sur le terminal, je me mets dans le repertoire cd 202110_BALBc/ #puis je copie les fichiers ci dessous dans le nouveau dossier Rotenone: cp report_BALBc.Rmd ~/save/analyses_16s/202111_Rotenone/ cp gitlab-ci.yml ~/save/analyses_16s/202111_Rotenone/ cp -r resources/ ~/save/analyses_16s/202111_Rotenone/ cp -r site_libs/ ~/save/analyses_16s/202111_Rotenone/ cp -r css/ ~/save/analyses_16s/202111_Rotenone/ ``` Puis j'importe les séquences ```{bash} Import des séquences (fichiers FASTQ) sur migale : Je crée un zip de mon fichier séquences (Selection des séquences / clic droit / 7-zip / Ajouter à l'archive / Format de l'archive .zip) et j'importe avec Upload Ou si trop de séquences, copier le dossier de séquences de l'ordi sur le serveur comme ceci: Ouvrir le terminal et après C:\Users\magali> copier cette ligne: scp -r .\Documents\1-MANIPS\2021\2021-09_Lea\Sequences_BALBc\ mmonnoye@migale.jouy.inrae.fr:~/save/analyses_16s/202110_BALBc/ ``` ## Contrôle qualité Je me place dans le repertoire 202111_Rotenone/ Je lance l'outil fastqc dédié à l'analyse de la qualité des basemodules issues d'un séquençage haut-débit J'utilise aussi multiqc ```{bash} #Creation dossier FASTQC mkdir FASTQC mkdir LOGS #Je lance fastq for i in *.fastq.gz ; do echo "conda activate fastqc-0.11.8 && fastqc $i -o FASTQC && conda deactivate" >> fastqc.sh ; done qarray -cwd -V -N fastqc -o LOGS -e LOGS fastqc.sh #Je lance multiqc (-o pour specifier le dossier de destination) qsub -cwd -V -N multiqc -o LOGS -e LOGS -b y "conda activate multiqc-1.8 && multiqc FASTQC -o MULTIQC && conda deactivate" ``` Numéro du job **fastqc** sur le cluster de migale: 4045739.1-40:1 Numéro du job **multiqc** sur le cluster de migale: 4045740 Lien vers le [rapport multiqc](MULTIQC/multiqc_report.html)
En résumé, à part pour l'échantillon 2.1, les séquences sont de très bonnes qualités
```{bash} #Creer archive tar zcvf data.tar.gz *.fastq.gz #suppression des fichiers fastq car ils sont dans l'archive rm -f *.fastq.gz #suppression des fichiers zip dans FASTQC rm -f FASTQC/*.zip #Je copie l'archive data.tar.gz dans 202109_Utrech cp data.tar.gz ~/save/analyses_16s/202110_BALBc/ cp -r MULTIQC ~/save/analyses_16s/202110_BALBc/ #Suppression dossier Sequences_BALBc rm -rf Sequences_BALBc ``` ## FROGS ### Nettoyage des reads Je vais d'abord utiliser l'outil preprocess ```{bash} #Je me déplace dans 202110_BALBc et je cree fichier FROGS mkdir FROGS mkdir LOGS qsub -cwd -V -N preprocess -o LOGS -e LOGS -pe thread 8 -R y -b y "conda activate frogs-3.2.2 && preprocess.py illumina --input-archive data.tar.gz --min-amplicon-size 200 --max-amplicon-size 490 --merge-software pear --five-prim-primer ACGGGAGGCAGCAG --three-prim-primer GGATTAGATACCCTGGTA --R1-size 250 --R2-size 250 --nb-cpus 8 --output-dereplicated FROGS/preprocess.fasta --output-count FROGS/preprocess.tsv --summary FROGS/preprocess.html --log-file FROGS/preprocess.log && conda deactivate" #Pour voir avancé d'un job qstat #Pour voir infos quand job terminé qacct -j 3982109 ``` Numéro du job **preprocess** sur le cluster de migale: 4046130 Les paramètres suivants ont été choisis : |Parametre | Valeur | |----------|--------| |--min-amplicon-size | 200 | |--max-amplicon-size | 490 | |--already-contiged |Séquences non contiguées | Lien vers le [rapport preprocess](FROGS/preprocess.html) - Le nombre de séquences par échantillon |Preprocess |Min |Max | |---------|--------|--------| |Avant|27598|49717| |Après|26228|47574|
Hormis pour l'échantillon 2.1, le pourcentrage de séquences gardées est de 92 à 96 %, ce qui est très bien et normal à ce stade
### Clustering Clustering à l'aide de swarm et d=1 ```{bash} qsub -cwd -V -N clustering -o LOGS -e LOGS -pe thread 8 -R y -b y "conda activate frogs-3.2.2 && clustering.py --input-fasta FROGS/preprocess.fasta --input-count FROGS/preprocess.tsv --distance 1 --fastidious --nb-cpus 8 --log-file FROGS/clustering.log --output-biom FROGS/clustering.biom --output-fasta FROGS/clustering.fasta --output-compo FROGS/clustering_otu_compositions.tsv && conda deactivate" #Pour voir infos quand job en cours: qstat #Pour avoir infos quand job fini: qacct -j 4046530 ``` Numéro du job **clustering** sur le cluster de migale: 4046530 ### Remove Chimera On détecte les chimeres avec vsearch ```{bash} qsub -cwd -V -N chimera -o LOGS -e LOGS -pe thread 8 -R y -b y "conda activate frogs-3.2.2 && remove_chimera.py --input-fasta FROGS/clustering.fasta --input-biom FROGS/clustering.biom --non-chimera FROGS/remove_chimera.fasta --nb-cpus 8 --log-file FROGS/remove_chimera.log --out-abundance FROGS/remove_chimera.biom --summary FROGS/remove_chimera.html && conda deactivate" qacct -j 4046537 ``` Numéro du job **chimera** sur le cluster de migale: 4046537 Lien vers le [rapport remove_chimera](FROGS/remove_chimera.html) - Voici le nombre de séquences gardées et retirées |Remove Chimera|Kept|Removed |---------|--------|--------| |Clusters|48711|55712| |Abundance|536743|158730|
Le nombre de clusters gardés par échantillon varie lui de 2325 (3.6) à 3973(4.3), ceci est normal à ce stade car on retire les séquences chimériques
### Filtres sur l'abondance et la prevalence On retire les OTUs les moins abondants, limite 0,005% On retire egalement les séquences phiX J'utlise l'outil otu_filters ```{bash} qsub -cwd -V -N filters -o LOGS -e LOGS -pe thread 8 -R y -b y "conda activate frogs-3.2.2 && otu_filters.py --input-fasta FROGS/remove_chimera.fasta --input-biom FROGS/remove_chimera.biom --output-fasta FROGS/filters.fasta --nb-cpus 8 --log-file FROGS/filters.log --output-biom FROGS/filters.biom --summary FROGS/filters.html --excluded FROGS/filters_excluded.tsv --contaminant /db/frogs_databanks/contaminants/phi.fa --min-sample-presence 1 --min-abundance 0.00005 && conda deactivate" qacct -j 4046541 ``` Numéro du job **filters** sur le cluster de migale: 4046541 Lien vers le [rapport filters](FROGS/filters.html) - Voici le nombre d'OTUs et de séquences gardés et retirés |Filters|Kept|Removed | |---------|--------|--------| |OTUs|620|48091| |Abundance|473260|63483| ### Affiliation Maintenant, je vais affillier les séquences sur la database silva v138 Création fichier **affiliation.biom** pour import **affiliationExplorer** ```{bash} qsub -cwd -V -N affiliation -o LOGS -e LOGS -pe thread 8 -R y -b y "conda activate frogs-3.2.2 && affiliation_OTU.py --input-fasta FROGS/filters.fasta --input-biom FROGS/filters.biom --nb-cpus 8 --log-file FROGS/affiliation.log --output-biom FROGS/affiliation.biom --summary FROGS/affiliation.html --reference /db/frogs_databanks/assignation/silva_138_16S/silva_138_16S.fasta && conda deactivate" Job 4046545 qstat #Rapport de l'affiliation des OTUs qsub -cwd -V -N affiliations_stats -o LOGS -e LOGS -b y "conda activate frogs-3.2.2 && affiliations_stat.py --input-biom FROGS/affiliation.biom --output-file FROGS/affiliations_stats.html --log-file FROGS/affiliations_stats.log --multiple-tag blast_affiliations --tax-consensus-tag blast_taxonomy --identity-tag perc_identity --coverage-tag perc_query_coverage && conda deactivate" ``` Numéro du job **affiliation** sur le cluster de migale: 4046545 Lien vers le [rapport affiliation_OTU](FROGS/affiliation.html) Numéro du job **affiliation_stats** sur le cluster de migale: 4046552 Lien vers le [rapport affiliations_stats](FROGS/affiliations_stats.html)
La quasi totalité des échantillons ont un pourcentage d'identité >95% et un pourcentage de couverture de >99%
### Multi affiliation Création fichiers **multi_aff.tsv** pour import **affiliationExplorer** et **affiliation.tsv** qui sera la table d'abondance ```{bash} qsub -cwd -V -N biom_to_tsv -o LOGS -e LOGS -b y "conda activate frogs-3.2.2 && biom_to_tsv.py --input-biom FROGS/affiliation.biom --input-fasta FROGS/filters.fasta --output-tsv FROGS/affiliation.tsv --output-multi-affi FROGS/multi_aff.tsv --log-file FROGS/biom_to_tsv.log && conda deactivate" ``` Numéro du job **biom_to_tsv** sur le cluster de migale: 4046554 Je vais "rectifier" toutes les multis affiliations sur **affiliationExplorer** en important **multi_aff.tsv** et **affiliation.biom** * Lien pour accéder au module: [affiliationExplorer](https://shiny.migale.inrae.fr/app/affiliationexplorer) Je refais un affiliations_stats sur le biom nettoyé des multis affiliations ```{bash} #Il faut importer le fichiers "cleaned_biom-2021-06-17.biom" généré par affiliationExplorer #Affiliation stat pour comparer avant et après correction avec affiliationExplorer qsub -cwd -V -N affiliations_stats_cleaned -o LOGS -e LOGS -b y "conda activate frogs-3.2.2 && affiliations_stat.py --input-biom cleaned_biom-2021-11-17.biom --output-file FROGS/affiliations_stats_cleaned.html --log-file FROGS/affiliations_stats_cleaned.log --multiple-tag blast_affiliations --tax-consensus-tag blast_taxonomy --identity-tag perc_identity --coverage-tag perc_query_coverage && conda deactivate" ``` Numéro du job **affiliations_stats_cleaned** sur le cluster de migale: 4048514 Lien vers le [rapport affiliations_stats_cleaned](FROGS/affiliations_stats_cleaned.html)
Suite aux différents outils FROGS utilisés pour le nettoyage des séquences, l'échantillon 2.1 n'a plus que 42 séquences alors que tous les autres échantillons ont un nombre de séquences > 19000, cet échantillon devra être supprimé pour les analyses statistiques.
### Tree Je crée l'arbre phylogénique avec tree ```{bash} #Pour avoir aide sur outils tree.py conda activate frogs-3.2.2 && tree.py --help #Creation arbre phylogenique qsub -cwd -V -N tree -o LOGS -e LOGS -b y "conda activate frogs-3.2.2 && tree.py --input-sequences FROGS/filters.fasta --biom-file cleaned_biom-2021-11-17.biom --out-tree FROGS/tree.nhx --html FROGS/tree.html --log-file FROGS/tree.log && conda deactivate" ``` Numéro du job **tree** sur le cluster de migale: 4048515 Lien vers l'arbre [Tree](FROGS/tree.html) ```{bash} Copier fichier du serveur sur mon ordi: 1. Sur le terminal, je me place dans le dossier de destination des fichiers PS C:\Users\magali> cd .\Documents\1-MANIPS\2021\2021-11_Rotenone\ 2. Puis je copie cette ligne scp mmonnoye@migale.jouy.inrae.fr:~/save/analyses_16s/202111_Rotenone/data.tar.gz . ``` # Metagenomic phyloseq analysis ## Import packages et données ```{r load-packages, eval=TRUE, cache = FALSE} library(kableExtra) library(DT) library(tidyverse) library(phyloseq) library(phyloseq.extended) library(data.table) library(plotly) library(cowplot) library(vegan) ``` J'importe les données ```{r import-data, eval=TRUE} #si impossible de trouver la fonction "read.tree" library(ape) if(file.exists("frogs.data.rds")){ frogs.data <- readRDS("frogs.data.rds") }else{ frogsBiom <- "cleaned_biom-2021-11-17.biom" frogs.data.biom <- import_frogs(frogsBiom, taxMethod = "blast") metadata<-read.table("metadatasRotenone.txt",row.names=1, header=T) sample_data(frogs.data.biom)<-metadata sample_names(frogs.data.biom) <- get_variable(frogs.data.biom, "Name") frogs.data<-frogs.data.biom treefile<- read.tree("FROGS/tree.nhx") phy_tree(frogs.data) <- treefile tax_table(frogs.data) <- gsub("\"", "", tax_table(frogs.data)) tax_table(frogs.data) <- gsub("unknown.*", "Unknown", tax_table(frogs.data)) #sauvegarde de l'objet forgs.data et .rds saveRDS(frogs.data, file = "frogs.data.rds") } ``` Je dispose de l'object phyloseq suivant: ```{r data-preview, eval=TRUE, echo=TRUE} frogs.data ``` qui contient les métadonnées suivantes: ```{r metadata, eval=TRUE, echo=TRUE} sample_variables(frogs.data) ``` ## Sélection des données ```{r sample selection,fig.width=10,fig.height=4, eval=TRUE} #Je supprime les échantillons TF51_2, T026, TF50_2 et T023_2 qui ont un nombre de séquences < 5000 frogs.data <- subset_samples(frogs.data, !(sample_names(frogs.data) %in% c("2.1")) ) frogs.data ``` * Nombre d'échantillons par groupe: ```{r variable-1, eval = TRUE, echo=TRUE} table(get_variable(frogs.data, "Group")) ``` Le groupe **Vehicule** sans rotenone Le groupe **Rotenone** avec rotenone ## Visualisation de l'abondance par échantillon {.tabset} ### Abondances brutes ```{r 01-plot abondances, eval = TRUE} p<-plot_bar(frogs.data) plot(p) ``` ### Abondances niveau Phylum ```{r 02-plot abondances phylum, eval = TRUE} p <- plot_bar(frogs.data, fill = "Phylum") plot(p) ``` ## Composition des communautés ### Représentation des communautés au niveau Phylum {.tabset} * Nombre d'OTUs par Phylum: ```{r count-phylum, eval = TRUE} #Get count of phyla table<-table(phyloseq::tax_table(frogs.data)[, "Phylum"]) table ``` #### Tableaux relatives abondances Phylum * Relatives Abondances par échantillons: ```{r relative abundance-phylum, eval = TRUE} #Convert to relative abundance data_rel_abund = phyloseq::transform_sample_counts(frogs.data, function(x) 100 * x/sum(x)) #Agglomerate to phylum-level and rename data_phylum <- phyloseq::tax_glom(data_rel_abund, "Phylum") phyloseq::taxa_names(data_phylum) <- phyloseq::tax_table(data_phylum)[, "Phylum"] rel_abun<-phyloseq::otu_table(data_phylum) rel_abun %>% DT::datatable(extensions = 'Buttons', options = list(dom = 'Bfrtip', pageLength = 20, buttons = list('copy', 'csv', 'excel'))) ``` * Relatives abondances par groupes: ```{r 9-relative abundance-phylum, eval = TRUE} #Convert to relative abundance data_rel_abund = phyloseq::transform_sample_counts(frogs.data, function(x) 100 * x/sum(x)) #Agglomerate to phylum-level and rename data_phylum <- phyloseq::tax_glom(data_rel_abund, "Phylum") ps1 <- merge_samples(data_phylum, "Group") data_phylum_2 <- transform_sample_counts(ps1, function(x) 100 * x/sum(x)) phyloseq::taxa_names(data_phylum_2) <- phyloseq::tax_table(data_phylum_2)[, "Phylum"] rel_abun2<-phyloseq::otu_table(data_phylum_2) rel_abun2 %>% DT::datatable(extensions = 'Buttons', options = list(dom = 'Bfrtip', pageLength = 10, buttons = list('copy', 'csv', 'excel'))) ``` #### Plot composition Phylum ```{r 03-composition-phylum, eval = TRUE, fig.height=5, fig.width=10} correct.order <- c("Vehicule","Rotenone") sample_data(frogs.data)$Group <- factor(sample_data(frogs.data)$Group, levels = correct.order) p <- plot_composition(frogs.data, "Kingdom", "Bacteria", "Phylum",numberOfTaxa = 6, fill = "Phylum") p<- p + facet_wrap(~Group, scales = "free_x", nrow = 1) + theme(strip.text.x = element_text(size = 14, color = "black")) + scale_y_continuous(label = scales::percent) plot(p) ``` ```{r 04-composition-phylum-merged, eval = TRUE, fig.height=5, fig.width=6} frogs.data.merged<-merge_samples(frogs.data,group="Group") sample_data(frogs.data.merged)$Group <- sample_names(frogs.data.merged) correct.order <- c("Vehicule","Rotenone") sample_data(frogs.data.merged)$Group <- factor(sample_data(frogs.data.merged)$Group, levels = correct.order) p <- plot_composition(frogs.data.merged, "Kingdom", "Bacteria", "Phylum",numberOfTaxa = 6, fill = "Phylum") p <- p + facet_wrap(~Group, scales = "free_x", nrow = 1) p <- p + ggtitle("Phylum Composition (12 top Phylum)")+ scale_y_continuous(label = scales::percent) + theme(axis.text.x = element_blank()) plot(p) ``` #### Stats Phylum ##### Boxplot Phylum avec stats On va se focaliser sur les 6 Phylums et représenter leurs abondances dans les différents groupe sous forme de boxplot. Le nombre d'étoiles codent le niveau de significativité, les comparaisons non-significatives ne sont pas indiqués ```{r 05-Phylum stats,fig.width=6,fig.height=8, eval = TRUE} library(ggpubr) library(cowplot) #Select groups frogs.data_bis <- subset_samples(frogs.data, Group %in% c("Vehicule","Rotenone")) ## Select only some families families <- c("Actinobacteriota","Bacteroidota","Deferribacterota", "Desulfobacterota" ,"Firmicutes", "Proteobacteria") phy <- frogs.data_bis %>% subset_taxa(Phylum %in% families) %>% tax_glom(taxrank="Phylum") #agglomerate at family level ## Transform count to relative abundances depth <- sample_sums(frogs.data)[1] plotdata<-psmelt(phy) %>% mutate(Abundance = Abundance / depth, Group = factor(Group, labels = c("Vehicule","Rotenone" ))) p <- ggplot(plotdata,aes(x = Group,y=Abundance, color = Group, Group = Group)) + stat_boxplot(geom = "errorbar", width = 0.5) + geom_boxplot(outlier.alpha = 1, outlier.size = 0.8) + facet_wrap(~Phylum, scales = "free_y", ncol = 3) + scale_color_manual(values = c("Vehicule" = "#227432", "Rotenone" = "#8C67A9"), guide = "none") + # theme_classic() + ## fond blanc labs(x = NULL) + theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 10)) ## Compare reference level T0 to all other levels using a wilcoxon test and adjust p-values using the holm correction. p <- p + stat_compare_means(aes(label = ..p.signif..), method = "wilcox.test", p.adjust.method = "holm", ref.group = "Vehicule", hide.ns = T, label.y.npc = c(0.90), size = 7, fontface = "bold") plot(p) ``` ##### Kruskal test Test de Kruskal (Anova non paramétrique) pour chaque Phylum pour tester si les abondances sont similaires (pour ce Phylum) entre les différents groupes. ```{r data.-phylum, eval=TRUE} # depth <- sample_sums(frogs.data)[1] data.phylum <- frogs.data %>% transform_sample_counts(function(x) { x / sum(x)}) %>% ## transform counts to proportions fast_tax_glom(taxrank = "Phylum") %>% psmelt() %>% mutate(Group = factor(Group, labels = c("Vehicule","Rotenone"))) ``` ```{r test kruskal-phylum, eval=TRUE} data.test <- compare_means(Abundance ~ Group, data = data.phylum, method = "kruskal", group.by = "Phylum") data.test ``` ### Représentation des communautés au niveau Family {.tabset} * Nombre d'OTUs par Family: ```{r count-family, eval=TRUE} #Get count of phyla table(phyloseq::tax_table(frogs.data)[, "Family"]) ``` #### Tableaux relatives abondances Family * Relative Abondance par échantillon: ```{r relative abundance-phylum-family-1, eval = TRUE} library(openxlsx) #Convert to relative abundance data_rel_abund = phyloseq::transform_sample_counts(frogs.data, function(x) 100 * x/sum(x)) #Agglomerate to phylum-level and rename data_phylum <- phyloseq::tax_glom(data_rel_abund, "Family") phyloseq::taxa_names(data_phylum) <- phyloseq::tax_table(data_phylum)[, "Family"] rel_abun_3<-phyloseq::otu_table(data_phylum) rel_abun_3 %>% DT::datatable(extensions = 'Buttons', options = list(dom = 'Bfrtip', pageLength = 10, buttons = list('copy', 'csv', 'excel'))) ``` * Relative Abondance par groupe: ```{r 2.1-relative abundance-family-2, eval=TRUE} #Convert to relative abundance data_rel_abund = phyloseq::transform_sample_counts(frogs.data, function(x) 100 * x/sum(x)) #Agglomerate to phylum-level and rename data_phylum <- phyloseq::tax_glom(data_rel_abund, "Family") ps1 <- merge_samples(data_phylum, "Group") data_phylum_2 <- transform_sample_counts(ps1, function(x) 100 * x/sum(x)) phyloseq::taxa_names(data_phylum_2) <- phyloseq::tax_table(data_phylum_2)[, "Family"] rel_abun_4<-phyloseq::otu_table(data_phylum_2) rel_abun_4 %>% DT::datatable(extensions = 'Buttons', options = list(dom = 'Bfrtip', pageLength = 10, buttons = list('copy', 'csv', 'excel'))) ``` #### Plot composition Family ```{r 06-composition-phylum,fig.width=10,fig.height=5, eval=TRUE} correct.order <- c("Vehicule","Rotenone") sample_data(frogs.data)$Group <- factor(sample_data(frogs.data)$Group, levels = correct.order) p <- plot_composition(frogs.data, "Kingdom", "Bacteria", "Family",numberOfTaxa = 10, fill = "Family") p <- p + facet_wrap(~Group, scales = "free_x", nrow = 1) plot(p) ``` ```{r 07-composition-family merged,fig.width=6,fig.height=5, eval=TRUE} frogs.data.merged<-merge_samples(frogs.data,group="Group") sample_data(frogs.data.merged)$Group <- sample_names(frogs.data.merged) correct.order <- c("Vehicule","Rotenone") sample_data(frogs.data.merged)$Group <- factor(sample_data(frogs.data.merged)$Group, levels = correct.order) p <- plot_composition(frogs.data.merged, "Kingdom", "Bacteria", "Family",numberOfTaxa = 10, fill = "Family") p <- p + facet_wrap(~Group, scales = "free_x", nrow = 1) + theme(axis.text.x = element_blank()) plot(p) ``` #### Stats Family ##### Boxplot Family avec stats On va se focaliser sur les 10 Family et représenter leurs abondances dans les différents groupe sous forme de boxplot. Le nombre d'étoiles codent le niveau de significativité, les comparaisons non-significatives ne sont pas indiqués ```{r 08-Family stats,fig.width=10,fig.height=8, eval = TRUE} #Select groups frogs.data_quat <- subset_samples(frogs.data, Group %in% c("Vehicule","Rotenone")) ## Select only some families families <- c("Muribaculaceae","Marinifilaceae", "Rikenellaceae", "Tannerellaceae","Prevotellaceae" , "Bacteroidaceae", "Lachnospiraceae", "Ruminococcaceae","Anaerovoracaceae", "Peptostreptococcaceae") phy <- frogs.data_quat %>% subset_taxa(Family %in% families) %>% tax_glom(taxrank="Family") #agglomerate at family level ## Transform count to relative abundances depth <- sample_sums(frogs.data_quat)[1] plotdata<-psmelt(phy) %>% mutate(Abundance = Abundance / depth, Group = factor(Group, labels = c("Vehicule","Rotenone"))) p <- ggplot(plotdata,aes(x = Group,y=Abundance, color = Group, Group = Group)) + stat_boxplot(geom = "errorbar", width = 0.5) + geom_boxplot(outlier.alpha = 1, outlier.size = 0.8) + facet_wrap(~Family, scales = "free_y", ncol = 5) + scale_color_manual(values = c("Vehicule" = "#227432", "Rotenone" = "#8C67A9"), guide = "none") + # theme_classic() + ## fond blanc labs(x = NULL) + theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 10)) ## Compare reference level T0 to all other levels using a wilcoxon test and adjust p-values using the holm correction. p <- p + stat_compare_means(aes(label = ..p.signif..), method = "wilcox.test", p.adjust.method = "holm", ref.group = "Vehicule", hide.ns = T, label.y.npc = c(0.90), size = 7, fontface = "bold") plot(p) ``` ##### Kruskal test Test de Kruskal (Anova non paramétrique) pour chaque Family pour tester si les abondances sont similaires (pour la Family) entre les différents groupes. ```{r data.-family, eval=TRUE} # depth <- sample_sums(frogs.data)[1] data.phylum <- frogs.data %>% transform_sample_counts(function(x) { x / sum(x)}) %>% ## transform counts to proportions fast_tax_glom(taxrank = "Family") %>% psmelt() %>% mutate(Group = factor(Group, labels = c("Vehicule","Rotenone"))) ``` ```{r test kruskal-family, eval=TRUE} data.test <- compare_means(Abundance ~ Group, data = data.phylum, method = "kruskal", group.by = "Family") data.test ``` ### Zoom au niveau Family dans chacun des Phylums {.tabset} Les 6 phylums d'intérêts sont les Actinobacteriota, les Bacteroidota, les Desulfobacterota, les Fibrobacterota, les Firmicutes et les Proteobacteria. #### Actinobacteriota ```{r 09-composition-Actinobacteriota,fig.width=8,fig.height=5, eval=TRUE} correct.order <- c("Vehicule","Rotenone") sample_data(frogs.data)$Group <- factor(sample_data(frogs.data)$Group, levels = correct.order) p <- plot_composition(frogs.data, "Phylum", "Actinobacteriota", "Family",numberOfTaxa = 5, fill = "Family") p <- p + facet_wrap(~Group, scales = "free_x", nrow = 1) plot(p) ``` ```{r 10-composition-Actinobacteriota merged,fig.width=6,fig.height=5, eval=TRUE} frogs.data.merged<-merge_samples(frogs.data,group="Group") sample_data(frogs.data.merged)$Group <- sample_names(frogs.data.merged) correct.order <- c("Vehicule","Rotenone") sample_data(frogs.data.merged)$Group <- factor(sample_data(frogs.data.merged)$Group, levels = correct.order) p <- plot_composition(frogs.data.merged, "Phylum", "Actinobacteriota", "Family",numberOfTaxa = 5, fill = "Family") p <- p + facet_wrap(~Group, scales = "free_x", nrow = 1) + theme(axis.text.x = element_blank()) plot(p) ``` #### Bacteroidota ```{r 11-composition-bacteroidetes,fig.width=8,fig.height=5, eval=TRUE} p <- plot_composition(frogs.data, "Phylum", "Bacteroidota", "Family",numberOfTaxa = 5, fill = "Family") p <- p + facet_wrap(~Group, scales = "free_x", nrow = 1) plot(p) ``` ```{r 12-composition-Bacteroidota merged,fig.width=6,fig.height=5, eval=TRUE} frogs.data.merged<-merge_samples(frogs.data,group="Group") sample_data(frogs.data.merged)$Group <- sample_names(frogs.data.merged) correct.order <- c("Vehicule","Rotenone") sample_data(frogs.data.merged)$Group <- factor(sample_data(frogs.data.merged)$Group, levels = correct.order) p <- plot_composition(frogs.data.merged, "Phylum", "Bacteroidota", "Family",numberOfTaxa = 5, fill = "Family") p <- p + facet_wrap(~Group, scales = "free_x", nrow = 1) + theme(axis.text.x = element_blank()) plot(p) ``` #### Desulfobacterota ```{r 13-composition-tenericutes,fig.width=8,fig.height=5, eval=TRUE} p <- plot_composition(frogs.data, "Phylum", "Desulfobacterota", "Family",numberOfTaxa = 5, fill = "Family") p <- p + facet_wrap(~Group, scales = "free_x", nrow = 1) plot(p) ``` ```{r 14-composition-Desulfobacterota merged,fig.width=6,fig.height=5, eval=TRUE} frogs.data.merged<-merge_samples(frogs.data,group="Group") sample_data(frogs.data.merged)$Group <- sample_names(frogs.data.merged) p <- plot_composition(frogs.data.merged, "Phylum", "Desulfobacterota", "Family",numberOfTaxa = 5, fill = "Family") p <- p + facet_wrap(~Group, scales = "free_x", nrow = 1) + theme(axis.text.x = element_blank()) plot(p) ``` #### Deferribacterota ```{r 15-composition-Deferribacterota,fig.width=8,fig.height=5, eval=TRUE} p <- plot_composition(frogs.data, "Phylum", "Deferribacterota", "Family",numberOfTaxa = 5, fill = "Family") p <- p + facet_wrap(~Group, scales = "free_x", nrow = 1) plot(p) ``` ```{r 16-composition-Deferribacterota merged,fig.width=6,fig.height=5, eval=TRUE} frogs.data.merged<-merge_samples(frogs.data,group="Group") sample_data(frogs.data.merged)$Group <- sample_names(frogs.data.merged) p <- plot_composition(frogs.data.merged, "Phylum", "Deferribacterota", "Family",numberOfTaxa = 5, fill = "Family") p <- p + facet_wrap(~Group, scales = "free_x", nrow = 1) + theme(axis.text.x = element_blank()) plot(p) ``` #### Firmicutes ```{r 17-composition-firmicutes,fig.width=8,fig.height=5, eval=TRUE} p <- plot_composition(frogs.data, "Phylum", "Firmicutes", "Family",numberOfTaxa = 5, fill = "Family") p <- p + facet_wrap(~Group, scales = "free_x", nrow = 1) plot(p) ``` ```{r 18-composition-Firmicutes merged,fig.width=6,fig.height=5, eval=TRUE} frogs.data.merged<-merge_samples(frogs.data,group="Group") sample_data(frogs.data.merged)$Group <- sample_names(frogs.data.merged) p <- plot_composition(frogs.data.merged, "Phylum", "Firmicutes", "Family",numberOfTaxa = 5, fill = "Family") p <- p + facet_wrap(~Group, scales = "free_x", nrow = 1) + theme(axis.text.x = element_blank()) plot(p) ``` #### Proteobacteria ```{r 19-composition-proteobacteria,fig.width=8,fig.height=5, eval=TRUE} p <- plot_composition(frogs.data, "Phylum", "Proteobacteria", "Family",numberOfTaxa = 5, fill = "Family") p <- p + facet_wrap(~Group, scales = "free_x", nrow = 1) plot(p) ``` ```{r 20-composition-Proteobacteria merged,fig.width=6,fig.height=5, eval=TRUE} frogs.data.merged<-merge_samples(frogs.data,group="Group") sample_data(frogs.data.merged)$Group <- sample_names(frogs.data.merged) p <- plot_composition(frogs.data.merged, "Phylum", "Proteobacteria", "Family",numberOfTaxa = 5, fill = "Family") p <- p + facet_wrap(~Group, scales = "free_x", nrow = 1) + theme(axis.text.x = element_blank()) plot(p) ``` ## Diversité $\alpha$ ### Courbes de raréfaction Comme ces distances sont sensibles à la profondeur d'échantillonnage, on va raréfier les échantillons avant de calculer les distances. Normalisation par rarefaction: même nombre de séquences pour chaque échantillon. ```{r rarefaction,echo=TRUE, eval = TRUE} #On peut rajouter le filtre sur l'abondance >= 10000 #abundant.samples <- (sample_sums(frogs.data) >= 10000) #frogs.data.rare <- rarefy_even_depth(prune_samples(abundant.samples, frogs.data), rngseed = 20170317) frogs.data.rare<-rarefy_even_depth(frogs.data,rngseed=20170329) sample_sums(frogs.data.rare)[1:5] ``` Avant de calculer les diversités $\alpha$, on va faire des courbes de raréfaction pour vérifier si on a saturé la richesse sous-dominante (i.e. celle qui passe les filtres d'abondances). ```{r 21-ggrare, message=FALSE,fig.width=10,fig.height=6, eval = TRUE} if(file.exists("frogs.data.courbes.rds")){ p <- readRDS("frogs.data.courbes.rds") }else{ p <- ggrare(frogs.data, step = 100, color = "Group", plot = FALSE) saveRDS(p, file = "frogs.data.courbes.rds") } rare.level <- min(sample_sums(frogs.data)) family.palette<- c("Vehicule" = "#227432", "Rotenone" = "#8C67A9") p <- p + facet_wrap(~Group) + geom_vline(xintercept = rare.level, color = "Gray60")+ xlab("Sample Size (nb de séquences)") + ylab("Species Richness (nb d'OTU)")+ scale_color_manual(values = family.palette)+ scale_fill_manual(breaks = c("Vehicule","Rotenone"), values=c("#227432","#8C67A9")) plot(p) ``` ### Effet du régime sur la diversité alpha diversité: Diversité au sein d'un échantillon / Une valeur par échantillon ```{r 22-plot-richness,fig.width=10,fig.height=7, eval = TRUE} #Ne pas utiliser frogs.data.rare car cela altère le resultats sur la diversité p <- plot_richness(frogs.data, color = "Group",measures = c("Observed", "Chao1", "Shannon", "InvSimpson"),x = "Group") +theme_bw() + geom_boxplot(aes(fill = Group)) + geom_point() + theme(axis.text.x = element_blank())+ scale_color_manual(values = family.palette)+ scale_fill_manual(breaks = c("Vehicule","Rotenone"), values=c("#227432","#8C67A9")) plot(p) ``` ### Richness Table ```{r create-richness-table, message = FALSE, eval=TRUE} #On va tester si les différences de richess observées sont significatives en prenant soin de corriger pour les profondeurs de séquençage #inégales (on pourrait aussi raréfier et ne pas corriger) richness.table <- estimate_richness(frogs.data, measures = c("Observed", "Chao1", "Shannon", "InvSimpson")) richness.table <- cbind(richness.table, sample_data(frogs.data)) richness.table$Depth <- sample_sums(frogs.data) richness.table %>% DT::datatable(extensions = 'Buttons', options = list(dom = 'Bfrtip', pageLength = 10, buttons = list('copy', 'csv', 'excel'))) ``` ## Statistiques sur la diversité {.tabset} ### Statistiques sur Richness Observed #### Anova sur la Richness Observed Analyse de la variance, compare les moyennes d'échantillons. ```{r 33-anova, message = FALSE, eval=TRUE} div_data <- cbind(estimate_richness(frogs.data, measures = "Observed"), sample_data(frogs.data)) model <- aov(Observed ~ 0 + Group, data = div_data) anova(model) ## impact of Groupe and Time_point on richness after correction for depth / seulement possible si T0 et Tf #observed.aov <- aov(Observed ~ Depth + Groupe + Time_point, data = richness.table) #anova(observed.aov) ``` Df: nombre de degrés de liberté Sum Sq: inertie inter et intra respectivement Mean Sq: c'est la variance, obtenue en faisant le quotient de la 2eme colonne par la 1ere (moyenne des inerties) F value: obtenue en faisant quotient des 2 valeurs trouvées dans la 3eme colonne Pr(>F): pvalue #### Coefficient Richness Observed coef function: permet d'extraire que les coefficients estimés, mesure l'écart des groupes à la moyenne. Groupes équilibrés si les valeurs sont du même ordre de grandeur. ```{r 32-create-richness-table, eval=TRUE} coef(model) ``` #### Test de comparaisons multiples sur Richness Observed Utilisé pour déterminer les différences significatives entre les moyennes de groupes dans une analyse de variance. Pour étudier les différences inter-groupes, permet de distinguer parmi les échantillons s’il y en a qui diffèrent significativement des autres. ```{r 34-tukey, eval=TRUE} TukeyHSD(model) ``` diff: différences entre les moyennes obséervées lwr et upr: donnent les bornes inférieure et supérieur de l'intervalle p adj: pvalue après ajustement pour les comparaisons multiples ```{r wilcox test, eval=TRUE} # comparison_data <- compare_means( # Observed ~ Group, # data = div_data, # method = "wilcox.test" # ) %>% filter(p.adj <= 0.05) # comparison_data ``` ### Statistiques sur Richness Chao1 #### Anova sur la Richness Chao1 ```{r 35-create-richness-table, eval=TRUE} div_data <- cbind(estimate_richness(frogs.data, measures = "Chao1"), sample_data(frogs.data)) model <- aov(Chao1 ~ 0 + Group, data = div_data) anova(model) ``` #### Coefficient Chao1 ```{r 37-create-richness-table, eval=TRUE} coef(model) ``` #### Test de comparaisons multiples sur Richness Chao1 ```{r 38-create-richness-table, eval=TRUE} TukeyHSD(model) ``` ```{r wilcox test chao, eval=TRUE} # comparison_data <- compare_means( # Chao1 ~ Group, # data = div_data, # method = "wilcox.test" # ) %>% filter(p.adj <= 0.05) # comparison_data ``` ### Statistiques sur Richness shannon #### Anova sur la Richness Shannon ```{r 43-create-richness-table, eval=TRUE} div_data <- cbind(estimate_richness(frogs.data, measures = "Shannon"), sample_data(frogs.data)) model <- aov(Shannon ~ 0 + Group, data = div_data) anova(model) ``` #### Coefficient Shannon ```{r 44-create-richness-table, eval=TRUE} coef(model) ``` #### Test de comparaisons multiples sur Richness Shannon ```{r 45-create-richness-table,, eval=TRUE} TukeyHSD(model) ``` ```{r wilcox test Shannon, eval=TRUE } # comparison_data <- compare_means( # Shannon ~ Group, # data = div_data, # method = "wilcox.test" # ) %>% filter(p.adj <= 0.05) # comparison_data ``` ### Statistiques sur Richness InvSimpson #### Anova sur la Richness InvSimpson ```{r 49-create-richness-table, eval=TRUE} div_data <- cbind(estimate_richness(frogs.data, measures = "InvSimpson"), sample_data(frogs.data)) model <- aov(InvSimpson ~ 0 + Group, data = div_data) anova(model) ``` #### Coefficient InvSimpson ```{r 50-create-richness-table, eval=TRUE} coef(model) ``` #### Test de comparaisons multiples sur Richness InvSimpson ```{r 51-create-richness-table, eval=TRUE} TukeyHSD(model) ``` ```{r wilcox test InvSimpson, eval=TRUE } # comparison_data <- compare_means( # InvSimpson ~ Group, # data = div_data, # method = "wilcox.test" # ) %>% filter(p.adj <= 0.05) # comparison_data ``` ## Diversité $\beta$ Diversité entre les communautés, comparaison des échantillons entre eux / les communautés entre elles. ### Calcul des distances ```{r 23-dist-as-heatmap,fig.width=12,fig.height=11, eval=TRUE} SampleOrder <- levels(reorder(sample_names(frogs.data.rare), as.numeric(get_variable(frogs.data.rare, "Group")))) dist.bc <- distance(frogs.data.rare, "bray") p.bc<-plot_dist_as_heatmap(dist.bc, order = SampleOrder,title="Bray Distances", show.names = T) SampleOrder <- levels(reorder(sample_names(frogs.data.rare), as.numeric(get_variable(frogs.data.rare, "Group")))) dist.jac <- distance(frogs.data.rare, "cc") p.jac<-plot_dist_as_heatmap(dist.jac, order = SampleOrder,title="Jaccard Distances", show.names = T) SampleOrder <- levels(reorder(sample_names(frogs.data.rare), as.numeric(get_variable(frogs.data.rare, "Group")))) dist.uf <- distance(frogs.data.rare, "unifrac") p.uf<-plot_dist_as_heatmap(dist.uf, order = SampleOrder,title="Unifrac Distances", show.names = T) SampleOrder <- levels(reorder(sample_names(frogs.data.rare), as.numeric(get_variable(frogs.data.rare, "Group")))) dist.wuf <- distance(frogs.data.rare, "wunifrac") p.wuf<- plot_dist_as_heatmap(dist.wuf, order = SampleOrder,title="Weighted Unifrac Distances", show.names = T) plot_grid( p.bc, p.jac, p.uf, p.wuf, ncol = 2) ``` ### Ordination des échantillons {.tabset} #### Plot ordination Bray Curtis ```{r 24-ord-plot-bray,fig.width=10,fig.height=5, eval=TRUE} ord <- ordinate(frogs.data.rare, method = "MDS", distance = "bray") family.palette<- c("Vehicule" = "#227432", "Rotenone" = "#8C67A9") p <- plot_ordination(frogs.data.rare, ord, color = "Group",shape="Group") p <- p + theme_bw() + ggtitle("MDS + Bray Curtis")+ scale_color_manual(values = family.palette) p <- p +geom_point(size=3)+ stat_ellipse(aes(group = Group), size = 1) plot(p) ``` * Adonis sur la matrice Bray Curtis ```{r 35-adonis-bc-1, eval=TRUE} adonis(dist.bc ~ Group ,data = as(sample_data(frogs.data.rare), 'data.frame')) ``` Df: degree of freedom. Sums Of Sqs: sum of squares. MeanSqs: sum of squares by degree of freedom. F: F statistics. R2: partial R-squared. Pr(>F) p-value based #### Plot ordination Jaccard ```{r 28-ord-plot-jac,fig.width=10,fig.height=5, eval=TRUE} ord <- ordinate(frogs.data.rare, method = "MDS", distance = "cc") p <- plot_ordination(frogs.data.rare, ord, color = "Group",shape="Group") p <- p + theme_bw() + ggtitle("MDS + Jaccard")+ scale_color_manual(values = family.palette) p <- p +geom_point(size=3)+ stat_ellipse(aes(group = Group), size = 1) plot(p) ``` * Adonis sur la matrice Jaccard ```{r 35-adonis-jac-1, eval=TRUE} adonis(dist.jac ~ Group,data = as(sample_data(frogs.data.rare), 'data.frame')) ``` #### Plot ordination Unifrac ```{r 30-ord-plot-unifrac,fig.width=10,fig.height=5, eval=TRUE} ord <- ordinate(frogs.data.rare, method = "MDS", distance = "unifrac") p <- plot_ordination(frogs.data.rare, ord, color = "Group", shape="Group") p <- p + theme_bw() + ggtitle("MDS + Unifrac")+ scale_color_manual(values = family.palette) p <- p +geom_point(size=3)+ stat_ellipse(aes(group = Group), size = 1) plot(p) ``` * Adonis sur la matrice Unifrac ```{r 36-adonis-uni-1, eval=TRUE} adonis(dist.uf ~ Group,data = as(sample_data(frogs.data.rare), 'data.frame')) ``` #### Plot ordination Weighted Unifrac ```{r 31-ord-plot-Wunifrac,fig.width=10,fig.height=5, eval=TRUE} ord <- ordinate(frogs.data.rare, method = "MDS", distance = "wUnifrac") p <- plot_ordination(frogs.data.rare, ord, color = "Group", shape="Group") p <- p + theme_bw() + ggtitle("MDS + Weighted Unifrac")+ scale_color_manual(values = family.palette) p <- p +geom_point(size=3)+ stat_ellipse(aes(group = Group), size = 1) plot(p) ``` * Adonis sur la matrice wUnifrac ```{r 37-adonis-wuni-1, eval=TRUE} adonis(dist.wuf ~ Group,data = as(sample_data(frogs.data.rare), 'data.frame')) ``` ### Clustering des échantillons {.tabset} Clustering des échantillons pour vérifier s'ils se regroupent par groupe (ou autre). #### Clustering sur la matrice de Bray Curtis ```{r 33-plot-clust-bray curtis,fig.width=10,fig.height=5, eval=TRUE} plot_clust(frogs.data.rare, dist = dist.bc, method = "ward.D2", color ="Group", palette = family.palette) plot_clust(frogs.data.rare, dist = dist.bc, method = "ward.D", color = "Group", palette = family.palette) plot_clust(frogs.data.rare, dist = dist.bc, method = "median", color = "Group", palette = family.palette) plot_clust(frogs.data.rare, dist = dist.bc, method = "average", color = "Group", palette = family.palette) ``` #### Clustering sur la matrice de Jaccard ```{r 34-plot-clust-jaccard,fig.width=10,fig.height=5, eval=TRUE} plot_clust(frogs.data.rare, dist = dist.jac, method = "ward.D2", color = "Group", palette = family.palette) plot_clust(frogs.data.rare, dist = dist.jac, method = "ward.D", color = "Group", palette = family.palette) plot_clust(frogs.data.rare, dist = dist.jac, method = "median", color = "Group", palette = family.palette) plot_clust(frogs.data.rare, dist = dist.jac, method = "average", color = "Group", palette = family.palette) ``` #### Clustering sur la matrice Unifrac ```{r 35-plot-clust-unifrac,fig.width=10,fig.height=5, eval=TRUE} plot_clust(frogs.data.rare, dist = dist.uf, method = "ward.D2", color = "Group", palette = family.palette) plot_clust(frogs.data.rare, dist = dist.uf, method = "ward.D", color = "Group", palette = family.palette) plot_clust(frogs.data.rare, dist = dist.uf, method = "median", color = "Group", palette = family.palette) plot_clust(frogs.data.rare, dist = dist.uf, method = "average", color = "Group", palette = family.palette) ``` #### Clustering sur la matrice Weighted Unifrac ```{r 36-plot-clust-wunifrac,fig.width=10,fig.height=5, eval=TRUE} plot_clust(frogs.data.rare, dist = dist.wuf, method = "ward.D2", color = "Group", palette = family.palette) plot_clust(frogs.data.rare, dist = dist.wuf, method = "ward.D", color="Group", palette = family.palette) plot_clust(frogs.data.rare, dist = dist.wuf, method = "median", color = "Group", palette = family.palette) plot_clust(frogs.data.rare, dist = dist.wuf, method = "average", color = "Group", palette = family.palette) ``` ### Network {.tabset} #### Network Bray curtis ```{r 37-network bray,fig.width=8,fig.height=8, eval=TRUE} g <- make_network(frogs.data.rare, distance = dist.bc, max.dist = 0.4, keep.isolates = TRUE) p <- plot_network(g, physeq = frogs.data.rare, color = "Group", label = "value", hjust = 2, title = "Beta diversity network") p <- p+ scale_color_manual(values = family.palette) p ``` #### Network Jaccard ```{r 38-network jaccard,fig.width=8,fig.height=8, eval=TRUE} g <- make_network(frogs.data.rare, distance = dist.jac, max.dist = 0.4, keep.isolates = TRUE) p <- plot_network(g, physeq = frogs.data.rare, color = "Group", label = "value", hjust = 2, title = "Jaccard diversity network") p <- p+ scale_color_manual(values = family.palette) p ``` #### Network Unifrac ```{r 39-network unifrac,fig.width=8,fig.height=8, eval=TRUE} g <- make_network(frogs.data.rare, distance = dist.uf, max.dist = 0.4, keep.isolates = TRUE) p <- plot_network(g, physeq = frogs.data.rare, color = "Group", label = "value", hjust = 2, title = "Unifrac diversity network") p <- p+ scale_color_manual(values = family.palette) p ``` #### Network Weighted Unifrac ```{r 40-network Wunifrac,fig.width=8,fig.height=8, eval=TRUE} g <- make_network(frogs.data.rare, distance = dist.wuf, max.dist = 0.4, keep.isolates = TRUE) p <- plot_network(g, physeq = frogs.data.rare, color = "Group", label = "value", hjust = 2, title = "Weighted Unifrac diversity network") p <- p+ scale_color_manual(values = family.palette) p ``` ## Abondances différentielles entre les 2 groupes {.tabset}
Pour les analyses suivantes, je prends p-valeur ajustée **pval=0.05**, **min.abundance=30** et **lfc.threshold = 2**
```{r 50-helpers, eval=TRUE} if (Sys.info()["user"] == 'mmariadasso') { source("~/Documents/Custom_Functions.R") } else { source("https://raw.githubusercontent.com/mahendra-mariadassou/phyloseq-extended/master/load-extra-functions.R") } #remotes::install_github("mahendra-mariadassou/phyloseq-extended", ref = "dev") library(phyloseq.extended) #Fonctions auxiliaires g?n?riques qui sont utilis?es pour toutes les analyses (?vite la r?p?titon de blocs de code) ## Extraction des OTUs diff?rentiels et ajout des informations taxonomiques extract_daotus <- function(cond1, cond2, physeq, pval = 0.05) { res <- DESeq2::results(dds, contrast=c("Group", cond1, cond2)) %>% as.data.frame() %>% mutate(OTU = taxa_names(physeq)) %>% filter(padj < pval) %>% inner_join(tax_table(physeq) %>% as("matrix") %>% as_tibble(rownames = "OTU"), by = "OTU") %>% arrange(log2FoldChange) return(res) } ## Construction du tableau de donn?es pour repr?sentation graphique build_plotdata_daotus <- function(da.otus, physeq, min.abundance = 30) { #CHANGE IF YOU WANT da.data <- rarefy_even_depth(physeq, rng = 20170329, verbose = FALSE) %>% prune_taxa(da.otus$OTU, .) is_abundant <- function(x) { x > min.abundance } abundant_taxa <- genefilter_sample(da.data, is_abundant, A = 1) da.data <- prune_taxa(abundant_taxa, da.data) max.level <- apply(as(tax_table(da.data), "matrix"), 1, function(x) { y <- x[!is.na(x)]; y <- y[y != ""]; y <- y[!grepl("unknown", y, ignore.case = T)] y <- y[y != "Multi-affiliation"] # if (names(y)[length(y)] != "Family") { # return(paste(y["Family"], y[length(y)], sep = "/")) #} else { return(y[length(y)]) # } }) otu.formatted.names <- data.frame(OTU2 = paste0(gsub(pattern = "Cluster_", "", names(max.level)), "_", max.level), OTU = names(max.level), stringsAsFactors = FALSE) %>% inner_join(da.otus, by = "OTU") %>% arrange(log2FoldChange) otu.order <- otu.formatted.names %>% pull(OTU2) da.df <- merge(psmelt(da.data), otu.formatted.names) %>% mutate(OTU2 = factor(OTU2, levels = otu.order)) return(da.df) } ## Fonction de heatmap plot_da_heatmap <- function(da.otus, physeq) { ## Build plotdata and order OTU according to lfc plotdata <- build_plotdata_daotus(da.otus, physeq, 0) %>% mutate(OTU = factor(OTU, levels = da.otus$OTU)) ## Build vector label labvec <- with(da.otus, paste0(Family, " (", OTU, ")")) names(labvec) <- da.otus$OTU ## Automatic text size text_size <- function(n, mins = 8, maxs = 30, B = 12, D = 100) { s <- B * exp(-n/D) s <- ifelse(s > mins, s, mins) s <- ifelse(s < maxs, s, maxs) return(s) } ggplot(plotdata) + aes(x = Sample, y = OTU, fill = Abundance) + geom_tile() + facet_grid(~ Group, scales = "free_x") + labs(y = NULL, x = "Sample") + scale_y_discrete(labels = labvec) + scale_fill_gradient(low = "yellow", high = "red", na.value = "white", trans = scales::log_trans(4)) + theme(axis.text.x = element_text(angle = 270, size = text_size(nsamples(physeq))), axis.text.y = element_text(size = text_size(nrow(da.otus)))) } plot_da_boxplot <- function(plotdata) { ggplot(plotdata, aes(x = Group, y = Abundance)) + geom_boxplot(aes(group = Group, fill = Group)) + theme_bw() + theme(legend.position = "none") + facet_wrap(~OTU2, ncol = 3, scales = "free_y") + geom_text(aes(y = Inf, x=Inf, label = paste0("p = ", format(pvalue, digits = 4))), size=4, hjust = 2, vjust = 1) } ## Function pour le graphique composite plot_da_composite <- function(da.otus, physeq, labvec = NULL, lfc.threshold = 2, min.abundance = 30) { ## Custome theme #on peut mettre 3 custom.theme <- theme_bw() + theme(panel.spacing = unit(0.2, "lines"), axis.title.y = element_blank(), axis.ticks.y = element_blank(), axis.text.y = element_blank(), strip.text.y = element_blank()) ## Plotdata plotdata <- build_plotdata_daotus(da.otus, physeq, min.abundance = min.abundance) %>% filter(abs(log2FoldChange) >= lfc.threshold) %>% arrange(log2FoldChange) ## Correct OTU and Family Order otu.order <- da.otus %>% pull(OTU) fam.order <- da.otus %>% dplyr::select(Phylum, Family) %>% unique() %>% arrange(Phylum, Family) %>% pull(Family) plotdata <- plotdata %>% mutate(OTU = factor(OTU, levels = otu.order), Family = factor(Family, levels = fam.order), OTU.label = if_else(Genus != "Unknown", Genus, NA_character_)) ## Annotation data annotate.data <- plotdata %>% group_by(Family) %>% summarize(OTU = length(unique(OTU))) %>% mutate(label = factor(Family, levels = fam.order), log2FoldChange = min(plotdata$log2FoldChange) * 1.1 ## log2FoldChange = -Inf ) ## Plots p.lfc <- ggplot(plotdata, aes(y = OTU, x = log2FoldChange, color = Family)) + geom_point() + geom_vline(xintercept = 0, linetype = 2) + facet_grid(Family~., space = "free_y", scale = "free_y", switch = "both") + geom_text(data = annotate.data, aes(label = label), hjust = 0, vjust = 1) + #vjust=1 c'est le point au dessus du nom family geom_text(aes(label = OTU.label, hjust="inward", vjust=1),size=3.5) #size=3 taille nom family de couleur #vjust=0.5 Noms family à coté point p.evidence <- ggplot(plotdata, aes(x = OTU, y = -log10(padj), fill = Family)) + geom_bar(stat = "identity") + facet_grid(Family~., space = "free_y", scale = "free_y") + coord_flip() + ylab("Evidence") if (is.null(labvec)) { labvec <- setNames(nm = unique(plotdata$Group)) } hm.data <- plotdata %>% group_by(OTU, Group, Family) %>% summarize(Abundance = mean(Abundance)) %>% ungroup() %>% mutate(OTU = factor(OTU, levels = otu.order), Family = factor(Family, levels =fam.order), Group = factor(Group, levels = names(labvec), labels = labvec)) p.hm <- ggplot(hm.data, aes(y = OTU, x = Group, fill = Abundance)) + geom_tile() + facet_grid(Family~., space = "free_y", scale = "free_y") + #Heatmap séparée par Family scale_fill_gradient(trans = scales::log_trans(base = 2), na.value = "white", low = "white", high = "black") p.lfc.2 <- p.lfc p.lfc.2$layers[[3]] <- NULL p1 <- arrangeGrob(p.lfc.2 + custom.theme + theme(legend.position = "none", strip.text.y = element_text( angle = 0,hjust = 0, size = 10), #Noms Family sur la gauche strip.background = element_blank()), p.evidence + custom.theme + scale_x_discrete(position = "top") + theme(legend.position = "none", axis.text.y = element_text(size = 8)), p.hm + custom.theme, widths = c(0.47, 0.35, 0.18), #taille largeur des 3 cadres doit être =1 nrow = 1, ncol = 3) grid::grid.newpage() grid::grid.draw(p1) invisible(p1) } ############################################################## ######## graphique composite 2 ############################### format_names <- function(physeq, min.rank = 0) { tax <- as(tax_table(physeq), "matrix") tax[tax %in% c("Unknown", "Multi-Affiliation")] <- NA final_rank <- function(x) { x <- x[!is.na(x)] if (length(x) <= min.rank) { return("") } else { tail(x, 1) } } apply(tax, 1, final_rank) } ## Function pour le graphique composite plot_da_compositebis <- function(da.otus, physeq, labvec = NULL, lfc.threshold = 2, min.abundance = 30) { ## Custome theme #on peut mettre 3 ou 2 custom.theme <- theme_bw() + theme(panel.spacing = unit(0.2, "lines"), axis.title.y = element_blank(), axis.ticks.y = element_blank(), axis.text.y = element_blank(), strip.text.y = element_blank(), axis.text.x = element_text(size = 10)) #taille noms groupes heatmap ## Plotdata plotdata <- build_plotdata_daotus(da.otus, physeq, min.abundance = min.abundance) %>% inner_join(tax_table(data) %>% as("matrix") %>% as.data.frame() %>% mutate(OTU = taxa_names(data), OTU.label = format_names(data))) %>% filter(abs(log2FoldChange) >= 2) %>% #On peut changer la valeur du log2FoldChange =3 ou 2 arrange(log2FoldChange) ##### Correct OTU and Family Order Ajout Mahendra otu.order <- da.otus %>% pull(OTU) fam.order <- da.otus %>% dplyr::select(Phylum, Family) %>% unique() %>% arrange(Phylum, Family) %>% pull(Family) plotdata <- plotdata %>% ## Ce que tu veux rajouter inner_join(tax_table(data) %>% as("matrix") %>% as.data.frame() %>% mutate(OTU = taxa_names(data), OTU.label = format_names(data), OTU.label.2 = format_names(data, min.rank = 5))) %>% ## Tri des OTUs par log fold change décroissant et des familles par ordre alphabétique au sein des phylums mutate(OTU = factor(OTU, levels = otu.order), Family = factor(Family, levels = fam.order)) #library(paletteer) p.lfc <- ggplot(plotdata %>% distinct(OTU, log2FoldChange, Family, OTU.label.2), aes(y = OTU, x = log2FoldChange, color = Family)) + geom_point(size=3) + geom_vline(xintercept = 0, linetype = 2) + #size=1 taille point et noms family geom_text(aes(x = log2FoldChange - .25*sign(log2FoldChange), label = OTU.label.2, vjust=0.5, #Taille et position noms family à côté des points hjust = if_else(log2FoldChange > 0, 1, 0)))+ #scale_color_paletteer_d("bpalette")+ scale_color_discrete()+ #scale_color_viridis(discrete = TRUE, option = "D") + #scale_color_brewer(palette="Paired") si plus de 10 Family scale_y_discrete(position = "right") # + scale_x_continuous(limits = c(-15, 35)) ######## pour changer echelle axe des x (log2foldchange) #Heatmap hm.data <- plotdata %>% group_by(OTU, Group, Family) %>% summarize(Abundance = mean(Abundance)) %>% ungroup() %>% mutate(OTU = factor(OTU, levels = otu.order), Family = factor(Family, levels =fam.order), Group = factor(Group, levels = names(labvec), labels = labvec)) p.hm <- ggplot(hm.data, aes(y = OTU, x = Group, fill = Abundance)) + geom_tile() + scale_fill_gradient(trans = log_trans(base = 2), na.value = "white", low = "white", high = "black") # p2 <- arrangeGrob(p.lfc + custom.theme + guides(colour = guide_legend(override.aes = list(size=5)))+ #taille points de la legende theme(legend.position = c(0.025, 0.975), ########### OU theme(legend.position = c(0.575, 0.425) ################## legend.justification = c(0, 1), legend.text = element_text(size = 16), #taille noms legende legend.background = element_rect(fill = "transparent"), axis.text.y = element_text(size = 9, hjust = 0.5), #Affichage noms des clusters axis.text.x = element_text(size = 10)), #taille noms legende axe des x p.hm + custom.theme, #p.hm défini la heatmap widths = c(0.78, 0.22), #largeur heatmap nrow = 1, ncol = 2) grid::grid.newpage() grid::grid.draw(p2) invisible(p2) } ################################################################ ## tester la présence d'un OTU is_present <- function(comptage) {comptage > 0} ``` Nous avons 29 échantillons. ```{r 2-variable, eval = TRUE, echo=TRUE} data <- frogs.data table(get_variable(data, "Group")) ``` On ne conserve que les taxas prévalents ```{r 6-prevalent-taxa, eval=TRUE} prevalent_taxa <- genefilter_sample(data, is_present, A = 7) #25% des echantillons filtered.data <- prune_taxa(prevalent_taxa, data) ``` ```{r autres filtres, message = FALSE} # On a plusieurs dizaines d'OTUs avec des p-valeurs ajustées très basses (`r nrow(res %>% filter(padj <= 1e-5))` avec une p-valeur < 1e-5). # # Si en plus on conserve uniquement les OTUs abondants, c'est à dire avec une abondance relative supérieure à 0.1% dans au moins 7 échantillons (i.e. 50%) des échantillons. # # abundant.taxa <- genefilter_sample(fruc.data %>% transform_sample_counts(function(x) { x / sum(x) }), # function(x) {x > 1e-3}, A = 7) %>% which() %>% names() # res <- res %>% filter(OTU %in% abundant.taxa) # # on tombe à `r nrow(res)` OTUs DA. # # Finalement, si on se restreint en plus aux OTUs DA avec un fort effet, c'est à dire un log fold-change (lfc) supérieur à 3 en valeur absolue, on arrive à # `r nrow(res %>% filter(abs(log2FoldChange) >= 3))` OTUs. ## Représentation des OTUs DA (essais) #On va se fixer un certain nombre d'OTUs DA (ceux avec un lfc supérieur à 3 en valeur absolue, qui ont tous une p-valeur ajustée inférieure à `r res %>% filter(abs(log2FoldChange) >= 3) %>% pull(padj) %>% max() %>% round(5)`) et représenter les log-fold change de ces OTUs. ``` On fait ensuite l'analyse d'abondance différentielle avec DESeq2 ```{r 7-DESeq2, message = FALSE, eval=TRUE} cds <- phyloseq_to_deseq2(filtered.data, design = ~ Group) dds <- DESeq2::DESeq(cds) ``` ### Table des OTUs différentiels Avec **pval=0.05** ```{r 18-table-value, eval = TRUE} da.otus <- extract_daotus("Vehicule", "Rotenone", filtered.data, pval = 0.05) da.otus %>% DT::datatable(extensions = 'Buttons', options = list(dom = 'Bfrtip', pageLength = 10, buttons = list('copy', 'csv', 'excel'))) ``` ### Heatmap Des OTUs prévalents avec **pval=0.05** ```{r 41-heatmap_S-A_F3w-A_F3w, warning=FALSE,fig.width = 7, fig.height = 5, eval = TRUE} plot_da_heatmap(da.otus, data) + ggtitle("DAOTUs[Vehicule-Rotenone] ") ``` ### Boxplot Des OTUs prévalents avec **pval=0.05** et **min.abundance = 30** ```{r 42-boxplot, fig.width = 5, fig.height = 5, eval = TRUE} plotdata <- build_plotdata_daotus(da.otus, data, min.abundance = 30) plot_da_boxplot(plotdata) ``` ### Boxplot Phylum Représentation en boxplot des OTUs differentiellements abondants ```{r 43-boxplot-Phylum,fig.width = 8, fig.height = 4, eval=TRUE} p<-ggplot(plotdata, aes(x=OTU2,y=Abundance,fill=Phylum,color=Phylum,alpha=Group)) + scale_y_log10() + scale_alpha_discrete(range=c(0,1), guide=guide_legend(override.aes = list(fill="black"))) + theme(axis.text.x = element_text(angle=90,vjust = 1,hjust = 1,size=12))+ theme(axis.text.y = element_text(size=11)) + geom_boxplot()+ geom_text(aes(y = 500, label = paste0("p = ", format(pvalue)),angle=0))+ coord_flip() plot(p) # la boxplot pour un OTU prend toute la place si l'OTU n'a pas de comptages non-nuls dans l'autre groupe donc il est automatiquement retiré du graphe # Representation par un trait d'un OTU s'il n'a que des comptages égaux à 0 ou 1 dans 1 groupe. Comme les comptages nuls sont retirés (à cause de la transformation log), il ne reste que les comptages égaux à 1 et donc la boxplot est dégénérée (pas de variations dans les données) et se réduit à un trait (min = max = médiane) ``` ### Boxplot Family ```{r 44-boxplot-Family,fig.width = 8, fig.height = 4, eval=TRUE} p<-ggplot(plotdata, aes(x=OTU2,y=Abundance,fill=Family,color=Family,alpha=Group)) + scale_y_log10() + scale_alpha_discrete(range=c(0,1), guide=guide_legend(override.aes = list(fill="black"))) + theme(axis.text.x = element_text(angle=90,vjust = 1,hjust = 1,size=12))+ theme(axis.text.y = element_text(size=11)) + geom_boxplot()+ geom_text(aes(y = 500, label = paste0("p = ", format(pvalue)),angle=0))+ coord_flip() plot(p) ``` ### Plot divergence family Représentations avec **pval = 0.05** / **min abundance = 30** / **log2foldchange = 2** ```{r 45-divergence, fig.width = 10, fig.height= 3, message = FALSE, warning=FALSE, eval = TRUE} plot_da_composite(da.otus, data, labvec = c("Vehicule" = "V", "Rotenone" = "R"), min.abundance = 30, lfc.threshold = 2) ``` ### Plot divergence global Représentations avec **pval = 0.05** / **min abundance = 30** / **log2foldchange = 2** Graphique sur lequel n'est affiché que la classification d'un OTU que si elle est plus précise que la famille (qui apparait déjà via la couleur) ```{r 46-divergence, fig.width = 10, fig.height= 3, message = FALSE, warning=FALSE, eval = TRUE} plot_da_compositebis(da.otus, data, labvec = c("Vehicule" = "V", "Rotenone" = "R"), min.abundance = 30, lfc.threshold = 2) ``` ```{r session info,eval = TRUE} #conflicts(detail=TRUE) sessionInfo() ```