The goal of this practical is to combine different statistical approaches to explore an omics dataset (transcriptome, proteome).
We prepared a notebook (in R Markdown) detailing all the nitty-gritty steps needed to download the datasets from the Zenodo repository and to run some preprocessing steps.
We prepared memory images that enable you to reload the preprocessed datasets.
The following tasks will be covered in 2 sessions + a bit of personal work.
Alternatively, you can write the code directly in an R Markdown document, as long as it combines properly documented code with relevant interpretation comments.
Choose one of the datasets and load the memory image
Compute sample-wise and feature-wise descriptive statistics
Use different graphical representations to compare the values and get familiar with your data. For example:
Solution by the teachers: [02_descriptive_stats.R]
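As a starting point, the descriptive statistics step could look like the minimal sketch below. It is not the teachers' solution: the matrix `x` is simulated as a stand-in for the preprocessed dataset loaded from the memory image, and the pseudo-count of 1 added before the log2 transformation is an assumption.

```r
# Simulated stand-in for a preprocessed matrix (features in rows, samples in columns).
# In the practical, this object would come from the memory image instead.
set.seed(1)
x <- matrix(rexp(200 * 6, rate = 0.01), nrow = 200,
            dimnames = list(paste0("feature_", 1:200), paste0("sample_", 1:6)))
log2x <- log2(x + 1)  # pseudo-count of 1 avoids log2(0); this choice is an assumption

# Sample-wise statistics: one row per sample (column of the matrix)
sample_stats <- data.frame(
  mean   = colMeans(log2x),
  median = apply(log2x, 2, median),
  sd     = apply(log2x, 2, sd),
  min    = apply(log2x, 2, min),
  max    = apply(log2x, 2, max)
)

# Feature-wise statistics: one row per feature (row of the matrix)
feature_stats <- data.frame(
  mean = rowMeans(log2x),
  var  = apply(log2x, 1, var),
  sd   = apply(log2x, 1, sd)
)

# A few graphical summaries
boxplot(log2x, las = 2, ylab = "log2(value + 1)", main = "Distribution per sample")
hist(feature_stats$mean, breaks = 30, xlab = "Mean log2 value", main = "Feature means")
plot(feature_stats$mean, feature_stats$sd,
     xlab = "Mean", ylab = "Standard deviation", main = "Mean versus standard deviation")
```

Boxplots compare the samples with each other, whereas the histogram and the mean versus standard deviation plot describe the features.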
Select the 500 top features according to two different criteria
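The two selection criteria are left to your choice. The sketch below uses the variance and the median absolute deviation (MAD) purely as illustrations, on a simulated matrix; the object names (`log2x`, `top_var`, `top_mad`) are hypothetical.

```r
# Simulated log2 matrix: 2000 features x 8 samples, each feature with its own spread
set.seed(2)
n_features <- 2000
n_samples  <- 8
feature_sd <- runif(n_features, min = 0.1, max = 2)
log2x <- matrix(rnorm(n_features * n_samples, mean = 8, sd = rep(feature_sd, n_samples)),
                nrow = n_features,
                dimnames = list(paste0("feature_", seq_len(n_features)),
                                paste0("sample_", seq_len(n_samples))))

top_n <- 500

# Criterion 1 (illustrative choice): variance across samples
feature_var <- apply(log2x, 1, var)
top_var <- names(sort(feature_var, decreasing = TRUE))[seq_len(top_n)]

# Criterion 2 (illustrative choice): median absolute deviation, more robust to outliers
feature_mad <- apply(log2x, 1, mad)
top_mad <- names(sort(feature_mad, decreasing = TRUE))[seq_len(top_n)]

# Sub-matrices restricted to the selected features
log2x_top_var <- log2x[top_var, ]
log2x_top_mad <- log2x[top_mad, ]

# Number of features selected by both criteria
n_common <- length(intersect(top_var, top_mad))
```

Comparing `n_common` with 500 gives a first idea of how much the two criteria agree.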
Compute different metrics indicating the relations between samples (columns) of the log2-transformed values
Compute these on the complete dataset and on the two 500-feature selections
Draw graphical representations of the results with corrplot().
Comment the results
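A possible way to compute several between-sample metrics is sketched below on simulated data. The choice of metrics (Pearson correlation, Spearman correlation, Euclidean distance) is an assumption, and the corrplot() call only runs if the corrplot package is installed.

```r
# Simulated log2 matrix: 300 features x 6 samples
set.seed(3)
log2x <- matrix(rnorm(300 * 6, mean = 8), nrow = 300,
                dimnames = list(paste0("feature_", 1:300), paste0("sample_", 1:6)))

# cor() works column-wise, so it directly returns sample-to-sample coefficients
cor_pearson  <- cor(log2x, method = "pearson")
cor_spearman <- cor(log2x, method = "spearman")

# dist() works row-wise, hence the transposition to compare samples
dist_euclid <- as.matrix(dist(t(log2x), method = "euclidean"))

# Graphical representation (requires the corrplot package)
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cor_pearson, method = "ellipse", type = "upper")
  corrplot::corrplot(cor_spearman, method = "ellipse", type = "upper")
}
```

The same code can be applied to each 500-feature sub-matrix to see how the selection affects the relations between samples.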
Run hierarchical clustering with hclust() to extract clusters from the dataset with selected variables.
Plot the feature trees and compare the results obtained with the different choices.
Plot heatmaps with the different feature trees obtained before, and compare the results. Deactivate the clustering of individuals (samples), which is enabled by default in the heatmap.
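The clustering steps above can be sketched as follows, on a small simulated matrix with two groups of features. The two "choices" compared here (Euclidean distance with complete linkage, correlation-based dissimilarity with average linkage) and the number of clusters `k` are assumptions made for the illustration.

```r
# Simulated selection: 60 features x 8 samples, two groups with opposite profiles
set.seed(4)
profile <- rep(c(-1, 1), each = 4)
log2x_sel <- rbind(
  matrix(rnorm(30 * 8, mean = rep(profile, each = 30), sd = 0.5), nrow = 30),
  matrix(rnorm(30 * 8, mean = rep(-profile, each = 30), sd = 0.5), nrow = 30)
)
dimnames(log2x_sel) <- list(paste0("feature_", 1:60), paste0("sample_", 1:8))

# Choice 1: Euclidean distance between features, complete linkage
tree_euclid <- hclust(dist(log2x_sel), method = "complete")

# Choice 2: dissimilarity derived from the Pearson correlation, average linkage
tree_cor <- hclust(as.dist(1 - cor(t(log2x_sel))), method = "average")

# Extract clusters by cutting each tree (k is a hypothetical value)
k <- 2
clusters_euclid <- cutree(tree_euclid, k = k)
clusters_cor    <- cutree(tree_cor, k = k)
cluster_comparison <- table(clusters_euclid, clusters_cor)

# Feature trees
plot(tree_euclid, labels = FALSE, main = "Euclidean distance, complete linkage")
plot(tree_cor, labels = FALSE, main = "Correlation dissimilarity, average linkage")

# Heatmaps using the precomputed feature trees; Colv = NA disables sample clustering
heatmap(log2x_sel, Rowv = as.dendrogram(tree_euclid), Colv = NA,
        scale = "row", labRow = NA)
heatmap(log2x_sel, Rowv = as.dendrogram(tree_cor), Colv = NA,
        scale = "row", labRow = NA)
```

The contingency table `cluster_comparison` shows how the clusters obtained with one choice map onto those obtained with the other.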
We provide the following files:
An R script with the principal component analysis of the expression set: [04_PCA.R]
An R markdown model of scientific report: [YOUR-NAME_REPORT-TOPICS.Rmd]
Write a short report that integrates some chunks of code from the R script into the R Markdown notebook, add your interpretation of the results, and compile it as a self-contained HTML file.
Submit the features declared positive (provided by the teachers) to the g:Profiler functional profiling web tool (https://biit.cs.ut.ee/gprofiler/gost)
Write a piece of R code that runs the same analysis with the gProfileR package
Analyse (with either R or the web site) the different gene groups discovered by the hclust approach, and evaluate whether the results are more relevant when you submit all the differentially expressed features at once or when you submit them cluster by cluster
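One way to organise the comparison between the global and the cluster-by-cluster submissions is sketched below. The feature identifiers are placeholders, the organism `"hsapiens"` is an assumption, and the online query is disabled by default because it needs a network connection and the gProfileR package, whose `gprofiler()` function is assumed to accept a vector of identifiers as `query`.

```r
# Hypothetical cluster assignment, in the format returned by cutree():
# a named integer vector (names = feature identifiers, values = cluster numbers).
# Replace the placeholder names with real identifiers from your dataset.
clusters <- c(geneA = 1, geneB = 1, geneC = 2, geneD = 2, geneE = 2)

# One set with all features, followed by one set per cluster
features_by_cluster <- split(names(clusters), clusters)
feature_sets <- c(list(all = names(clusters)), features_by_cluster)

# Apply the same enrichment function to every feature set
run_enrichment <- function(feature_sets, enrich_fun) {
  lapply(feature_sets, enrich_fun)
}

# Online query, disabled by default (set run_online to TRUE to try it)
run_online <- FALSE
if (run_online && requireNamespace("gProfileR", quietly = TRUE)) {
  results <- run_enrichment(
    feature_sets,
    function(ids) gProfileR::gprofiler(query = ids, organism = "hsapiens")
  )
  # Number of significant terms returned for each feature set
  print(sapply(results, nrow))
}
```

Keeping the enrichment function as an argument makes it easy to swap in another tool, or a dummy function for testing, without changing the rest of the code.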