module-3-Stat-R


Statistical analysis with R


Statistics with R – 2020 session


Goals

The aim of this module is to teach you the basics of R programming and to introduce some statistical concepts for high-throughput data.

To follow this course, some prior knowledge is expected. The prerequisites are listed here:

https://du-bii.github.io/accueil/activites_preparatoires/

Teachers

Name Role(s)
Claire Vandiedonck Coordinator, teacher
Jacques van Helden Coordinator, teacher
Guillaume Achaz Teacher
Anne Badel Teacher
Magali Berland Teacher
Antoine Bridier-Nahmias Teacher
Olivier Sand Teacher
Natacha Cerisier Helper

Doc Description URL
Git pages Website of the course (to view the course material) https://du-bii.github.io/module-3-Stat-R/stat-R_2020/
Git repo Repository to download or clone the teaching material to your computer https://github.com/DU-Bii/module-3-Stat-R
RStudio at IFB cluster Link to RStudio on the IFB cluster https://rstudio.cluster.france-bioinformatique.fr/
Moodle Link to the DUBii Moodle page (log in with your ENT account) https://moodlesupd.script.univ-paris-diderot.fr/course/view.php?id=10629

R tutorials and good practice

Doc Description URL
Cheat sheets RStudio cheat sheets https://rstudio.com/resources/cheatsheets/
Tutorial Tutorial for Beginners by E. Paradis - English version https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
Tutorial Tutorial for Beginners by E. Paradis - French version https://cran.r-project.org/doc/contrib/Paradis-rdebuts_fr.pdf
R style guide Google’s R Style Guide https://google.github.io/styleguide/Rguide.html

Teaching material

R and Rmd basics

Topics Duration Material  
Slides for the whole session 1/2 day [pdf]  
Wooclap poll 5’ [html]  
Start R 20’ [link to RStudio on the IFB cluster] [start-R.html] [start-R.Rmd]  
Validation of the prerequisites: quiz on Moodle 15’ + 20’ [with your ENT account] [with password: dubii2020]  
Script: live demo on the board [R]  
Basic R structures (matrices, data frames, factors and lists) 45’ [basic_R-structures.html] [basic_R-structures.Rmd] [Factors_in_R.html] [Factors_in_R.Rmd]
Coffee break 15’    
Intro to programming with R 35’ [html] [Rmd]  
R Markdown 45’ Demo and [COVID-19_HK.nb.html]  

Statistical analysis with R

Type Description Links
Slides Slides for the whole session [pdf]
R Scripts Scripts used for the slides [R]
Shiny app Shiny app to explore sampling fluctuations http://shiny.calpoly.sh/Sampling_Distribution/
Practical Descriptive statistics [html] [Rmd]
R package Package to visualise effect sizes https://github.com/ACCLAB/dabestr
Document Memo on correlation and regression [pdf]
Practical A first data analysis with R [html] [Rmd]

Links mentioned during the session:

Topic Title Description Link
Basic stats explained to biologists Points of Significance Nature Methods collection https://www.nature.com/collections/qghhqm/pointsofsignificance
How to represent data Points of View Nature Methods collection http://blogs.nature.com/methagora/2013/07/data-visualization-points-of-view.html
How to represent data DEFAKATOR Detecting misleading graphs https://www.youtube.com/watch?v=crTt-QIyS-o
Collective result table Table to collect and compare trainees’ results <tinyurl.com/dubii20-randnumstat>  

Statistics for omics data

Topics Description Duration Material
Debrief of sessions 1 & 2 - part I R code: data structures, function usage, plots 20’ live
Practical part I simulated data 30’ [mean-comparison-test_random-numbers.html] [mean-comparison-test_random-numbers.pdf] [mean-comparison-test_random-numbers.Rmd]
Debrief of sessions 1 & 2 - part II basic statistics 20’ [pdf]
Coffee break 15’    
Practical part II industrialization of hypothesis tests 45’ [randnum.R]
Statistics on omics data - part I multiple testing issue 20’ [pdf]
Practical part III correction for multiple testing 15’ same as above
Statistics on omics data - part II parameter estimation issue 15’ [pdf]

Clustering

Contents: clustering/

Data exploration

Topics Type Duration Material
Clustering: recap of the previous session Slides   [html] [Rmd]
Principal component analysis Slides   [html] [Rmd]
Practical: data preparation (Pavkovic, 2019) Notebook code   [Rmd]
Practical: data preparation (Pavkovic, 2019) - UUO proteome dataset Notebook report   [html]
Practical: data preparation (Pavkovic, 2019) - UUO transcriptome dataset Notebook report   [html]
Practical: data preparation (Pavkovic, 2019) - FA proteome dataset Notebook report   [html]
Practical: data preparation (Pavkovic, 2019) - FA transcriptome dataset Notebook report   [html]
Practical: data exploration (Pavkovic, 2019) Exercises   [html]
R scripts (click on script, then “Raw”) R scripts   [github]
Report template with a nice YAML header Rmd   [Rmd] [html]
Report (partial): PCA of proteomics data Practical   [html] [Rmd]

Topics for 2020-06-03

Enrichment analysis

Topics Type Duration Material
Topics not covered yesterday mixed not specified see previous course
Enrichment analysis html not specified [html]
Enrichment analysis Rmd not specified [Rmd]

Supervised classification

Topics Type Material
Starter: brain-learning exercise Exercise [pdf] [Quiz]
Multivariate analysis: introduction Lecture [pdf]
Machine learning Lecture [pdf]
TCGA Breast Invasive Cancer dataset Practical [html] [Rmd]

Evaluation

  1. Quiz: https://moodlesupd.script.univ-paris-diderot.fr/mod/quiz/view.php?id=249268

  2. Personal work (report template): [Rmd] [html]