Commit dd909dc2 authored by Nathalie Vialaneix

final review of the second case study

parent f81a0df6
Pipeline #77428 passed with stages in 15 seconds
@@ -459,7 +459,7 @@ knitr::include_graphics('./images/casestudies/piglet/Asterics_PCA_Proteome_Muscl
knitr::include_graphics('./images/casestudies/piglet/Asterics_PCA_Proteome_Muscle_individual_plot_color_age_b.png')
```
-As expected, `Age` appears with a strong effect on the variability in the data.
+As expected, `Age` appears to have a strong effect on the variability in the data.
The same variable can be used for 2 different graphical features.
@@ -483,11 +483,11 @@ This could result in a graphical output relatively difficult to interpret.
knitr::include_graphics('./images/casestudies/piglet/Asterics_PCA_Proteome_Muscle_individual_plot_color_shape_size_b.png')
```
-As expected, weights at age 110 are higher than at age 90. Nevertheless, the weights didn't seem to explain the heterogeneity at age 110, i.e., higher weights are not positioned far from the first axis.
+As expected, weights at age 110 are higher than at age 90. Nevertheless, the weights don't seem to explain the heterogeneity at age 110, i.e., higher weights are not positioned at an extreme (left or right) of the first axis.
### Variable plot
-Use the slider to lower the number of variables displayed. The higher the correlation, the lower the number of variables to be displayed.
+Use the slider to lower the number of displayed variables. The higher the correlation, the lower the number of variables to be displayed.
```{r}
knitr::include_graphics('./images/casestudies/piglet/Asterics_PCA_Proteome_Muscle_variable_plot_threshold.png')
@@ -515,11 +515,11 @@ We carry on the exploration of the `Proteome_Muscle` dataset addressing a cluste
knitr::include_graphics('./images/casestudies/piglet/Asterics_Clustering_Proteome_Muscle.png')
```
-The first step consist in choosing the number of clusters.
+The first step consists in choosing the number of clusters.
-Biologically, we expect a number of clusters relative to the number of conditions. Here, we have two ages and we have four genotypes for the piglets. Then a number of clusters of 8 would be perfect but biology can reserve us surprises.
+Biologically, we expect a number of clusters relative to the number of conditions. Here, we have two ages and we have four genotypes for the piglets. Then a number of clusters of 8 would be perfect but surprises are to be expected in biology.
-The number of clusters is also statistically proposed by 'broken stick'.
+Some heuristics to choose the number of clusters are also proposed in ASTERICS, such as the “broken stick” heuristic.
```{r}
knitr::include_graphics('./images/casestudies/piglet/Asterics_Clustering_Proteome_Muscle_dendrogramme.png')
@@ -539,22 +539,22 @@ A table gives the number of individuals by cluster.
knitr::include_graphics('./images/casestudies/piglet/Asterics_Clustering_Proteome_Muscle_table.png')
```
-An the individual plot of PCA can be used again with the color of the points depending on the cluster.
+A plot of the projection of individuals on the first axis of the PCA is also given, with the colors of the points corresponding to the clusters.
```{r}
knitr::include_graphics('./images/casestudies/piglet/Asterics_Clustering_Proteome_Muscle_PCA.png')
```
-If we keep in mind the former PCA projection of individuals, we can deduce that the cluster 1 is composed of piglets at day 90, cluster 2 piglets at day 110, cluster 4 piglets at day 110, cluster 5 piglets at day 90. The cluster 3 is surprisingly composed of piglets at both ages! Further exploration may help to elucidate this clustering.
+If we have kept in mind the former PCA projection of individuals, we can deduce that the cluster 1 is composed of piglets at day 90, cluster 2 piglets at day 110, cluster 4 piglets at day 110, cluster 5 piglets at day 90. The cluster 3 is surprisingly composed of piglets at both ages! Further exploration may help to elucidate this clustering.
-We can explore these clusters in the 'Explore variables in a dataset' to test some variables already known to be important. You can test genotype, weight, and also glucose, albumine, MHC_E (the embryonic myosin). To do so, we have to retrieve the cluster information to know which individual is in which cluster.
+We can explore these clusters in the 'Explore variables in a dataset' to test some variables already known to be important for piglet maturity. We will test genotype, weight, and also glucose, albumine, MHC_E (the embryonic myosin). To do so, we have to retrieve the cluster information to know which individual is in which cluster.
```{r}
knitr::include_graphics('./images/casestudies/piglet/Asterics_Clustering_Proteome_Muscle_clusters.png')
```
-This information can be extracted to a new dataset and used in the framework we already know. For instance, explore 2 variables, one being categorical indicating the cluster.
+This information can be extracted to a new dataset and used in further analyses. For instance, explore 2 variables, one being categorical and indicating the cluster.
Let's first cross the categorical variable `Age` with the clusters.
@@ -578,17 +578,17 @@ The embryonic myosin is more expressed at day 90 than at day 110. This variable
knitr::include_graphics('./images/casestudies/piglet/Asterics_Clustering_Proteome_Muscle_clusters_Weight.png')
```
-The body weight cannot explained the clustering too. Further investigation are needed to explain this.
+The body weight cannot explained the clustering either. Further investigations are needed to explain this grouping of piglets.
#### Workflow
-Here is the updated workflow after the clustering analyses focused on the `Proteome_Muscle` dataset.
+Finally, the updated workflow after the clustering analyses focused on the `Proteome_Muscle` dataset is given by:
```{r}
knitr::include_graphics('./images/casestudies/piglet/Asterics_workflow_Proteome_Muscle_clustering_focus.png')
```
-and the global workflow.
+and the global workflow by:
```{r}
knitr::include_graphics('./images/casestudies/piglet/Asterics_workflow_Proteome_Muscle_clustering_global.png')
@@ -606,7 +606,7 @@ We propose to explore the relationships between the transcriptome and the proteo
knitr::include_graphics('./images/casestudies/piglet/Asterics_PLS_Proteome_Transcriptome_Muscle_50.png')
```
-We first verify that we have exactly the same piglets studied and for how many variables. The two tables provided here are the same because the two datasets we want to integrate were acquired from the very same individuals.
+We first verify that we have exactly the same piglets studied and for how many variables. The two tables provided here are the same because the two datasets we want to integrate were acquired from the very same individuals (the matching criterion is the row names, as given during the importation of datasets: be careful to set this value properly before you integrate datasets).
```{r}
knitr::include_graphics('./images/casestudies/piglet/Asterics_PLS_Proteome_Transcriptome_Muscle_50_Preprocessing.png')
@@ -631,13 +631,13 @@ The individual plot clearly highlights two clusters. To elucidate this phenomeno
knitr::include_graphics('./images/casestudies/piglet/Asterics_PLS_Proteome_Transcriptome_Muscle_50_individuals_age.png')
```
-Obviously, considering the previous analysis, investigation lead to an `Age` effect.
+As expected considering the previous analysis, the investigation leads to an `Age` effect.
Compared to the PCA, the differentiation between ages is stronger when combining the two datasets.
#### Explore variables
-The variable plot can be difficult to interpret depending on the number of variables. It is recommended to use correlation threshold to lower the number of variables displayed.
+The variable plot can be difficult to interpret depending on the number of variables. It is recommended to use correlation threshold to lower the number of displayed variables.
```{r}
knitr::include_graphics('./images/casestudies/piglet/Asterics_PLS_Proteome_Transcriptome_Muscle_50_variables.png')
@@ -655,7 +655,7 @@ knitr::include_graphics('./images/casestudies/piglet/Asterics_workflow_integrate
### Proteome Muscle 50 animals and Metabolome Plasma 444 animals
-Now, we explore a tissue's omic, proteome of the muscle, with a fluid omic, the metabolome of the plasma. This is could be interesting to investigate covariations between variables from the two omics, to identify possible biomarkers in the plasma the most correlated with tissue variables (difficult to investigate as biomarker).
+Now, we explore a tissue omic, proteome of the muscle, with a fluid omic, the metabolome of the plasma. This is could be interesting to investigate covariations between variables from the two omics, to identify possible biomarkers in the plasma that are best correlated with tissue variables (more difficult to use as a biomarker).
#### Preprocessing
@@ -679,13 +679,13 @@ The following studies will be performed with the 45 common individuals.
knitr::include_graphics('./images/casestudies/piglet/Asterics_PLS_Proteome_Muscle_50_Metabolome_Plasma_444_individuals.png')
```
-As in the previous analysis, we customize the individual plot to make easier the interpretation.
+As in the previous analysis, we customize the individual plot to make the interpretation easier.
```{r}
knitr::include_graphics('./images/casestudies/piglet/Asterics_PLS_Proteome_Muscle_50_Metabolome_Plasma_444_individuals_age.png')
```
-We can observe a very good differentiation between ages but in a different manner than with the two muscle omics. Here we observe a wider heterogeneity within the piglets at day 110.
+We can observe a very good differentiation between ages but in a different manner than with the two muscle omics. Here, we observe a wider heterogeneity within the piglets at day 110.
#### Explore variables
@@ -694,7 +694,9 @@ We can observe a very good differentiation between ages but in a different manne
knitr::include_graphics('./images/casestudies/piglet/Asterics_PLS_Proteome_Muscle_50_Metabolome_Plasma_444_variables.png')
```
-Here the two datasets contained less variables than the transcriptome. Nevertheless, we can identify variables highly correlated to the first axis and then differentiating the two ages. One of the metabolite is the myo-inositol we considered as one of the best indicator of a delayed development at the end of gestation.
+Here the two datasets contained less variables than the transcriptome. Nevertheless, we can identify variables highly correlated to the first axis and then differentiating the two ages. One of the metabolite is the myo-inositol that is known as one of the best indicator of a delayed development at the end of gestation.
+**RK NATH: MAYBE PUT THE REFERENCE TO GAËLLE'S ARTICLE AT THE BEGINNING OF THIS CASE STUDY AND ALSO HERE.**
#### Workflow
@@ -736,7 +738,7 @@ To discriminate ages, the error rate is very low (0 or 0.04).
#### Explore individuals
-Unsupervised, default color associated to the levels of the categorical variable.
+**RK NATH: I DO NOT UNDERSTAND THE FIRST SENTENCE** Unsupervised, default color associated to the levels of the categorical variable.
```{r}
knitr::include_graphics('./images/casestudies/piglet/Asterics_PLSDA_Proteome_Muscle_50_Age_individuals.png')
@@ -744,13 +746,13 @@ knitr::include_graphics('./images/casestudies/piglet/Asterics_PLSDA_Proteome_Mus
Previous results clearly indicates that the discrimination of the individuals based on their age is very easy. It is confirmed here with the very low error rate and the individual plot.
-The individual plot can be modified to highlight another categorical variable and complete the interpretation of the results, for instance below with the genotype (TG_F).
+The individual plot can be modified to highlight another categorical variable and complement the interpretation of the results, for instance below with the genotype (`TG_F`).
```{r}
knitr::include_graphics('./images/casestudies/piglet/Asterics_PLSDA_Proteome_Muscle_50_Age_individuals_enhanced.png')
```
-We can observe that the genotypes within each gestation day are mixed.
+We can observe that the genotypes within each gestation age are mixed.
#### Explore variables
@@ -777,14 +779,14 @@ The error rate is now relatively high to discriminate the genotypes.
#### Explore individuals
-Unsupervised, default color associated to the levels of the categorical variable. The discrimination is not as easy as with the age. Nevertheless, the extreme genotypes MSMS (pink) and LWLW (green) are discriminated.
+**RK NATH: I DO NOT UNDERSTAND THE FIRST SENTENCE** Unsupervised, default color associated to the levels of the categorical variable. The discrimination is not as easy as with the age. Nevertheless, the two extreme genotypes (pure breeds), MSMS (pink) and LWLW (green), are well discriminated.
```{r}
knitr::include_graphics('./images/casestudies/piglet/Asterics_PLSDA_Proteome_Muscle_50_TG_F_individuals.png')
```
-Even if, the supervision is led by the genotype, we can see that the age still has a strong effect.
+Even if the supervision is led by the genotype, we can see that the age still has a strong effect.
```{r}
knitr::include_graphics('./images/casestudies/piglet/Asterics_PLSDA_Proteome_Muscle_50_TG_F_individuals_enhanced.png')
@@ -798,7 +800,7 @@ The interpretation of the variables has to be done using the correlation thresho
knitr::include_graphics('./images/casestudies/piglet/Asterics_PLSDA_Proteome_Muscle_50_TG_F_variables.png')
```
-At this threshold, a lot of variables are selected. Most of them seem to be more correlated with the age effect as its effect is always very strong! Nevertheless, a protein as LXN must be more abundant in LWLW at day 90 and 110. A protein as GPD1.isoform4 must be more abundant in MSMS whatever the age. We can also explore variation in abundance of these proteins with univariate analyses (not shown).
+At this threshold, a lot of variables are selected. Most of them seem to be more correlated with the age effect as its effect is always very strong! Nevertheless, a protein as LXN should be more abundant in LWLW at day 90 and 110. A protein as GPD1.isoform4 should be more abundant in MSMS whatever the age. We can also explore variation in abundance of these proteins with univariate analyses (not shown).
#### Workflow
@@ -854,4 +856,4 @@ And here is the final workflow of the case study.
knitr::include_graphics('./images/casestudies/piglet/Asterics_workflow_final.png')
```
-We had not analyze four datasets. Now it's up to you...
+We had not analyze four of the imported datasets. Now it's up to you...