Statistical analysis with R
This module introduces the basics of R programming and presents some statistical concepts for high-throughput data.
To follow this course, some prior knowledge is expected. The prerequisites are listed here:
https://du-bii.github.io/accueil/activites_preparatoires/
| Name | Role(s) |
|---|---|
| Claire Vandiedonck | Coordinator, teacher |
| Jacques van Helden | Coordinator, teacher |
| Guillaume Achaz | Teacher |
| Anne Badel | Teacher |
| Magali Berland | Teacher |
| Antoine Bridier-Nahmias | Teacher |
| Olivier Sand | Teacher |
| Natacha Cerisier | Helper |
| Doc | Description | URL |
|---|---|---|
| Git pages | Web site of the course (to view the course materials) | https://du-bii.github.io/module-3-Stat-R/stat-R_2020/ |
| Git repo | Repository from which you can download or clone the teaching material to your computer | https://github.com/DU-Bii/module-3-Stat-R |
| RStudio at IFB cluster | Link to RStudio on the IFB cluster | https://rstudio.cluster.france-bioinformatique.fr/ |
| Moodle | Link to the Moodle web page of the DUBii (log in with your ENT account) | https://moodlesupd.script.univ-paris-diderot.fr/course/view.php?id=10629 |
| Doc | Description | URL |
|---|---|---|
| Cheat sheets | RStudio cheat sheets | https://rstudio.com/resources/cheatsheets/ |
| Tutorial | Tutorial for Beginners by E. Paradis - English version | https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf |
| Tutorial | Tutorial for Beginners by E. Paradis - French version | https://cran.r-project.org/doc/contrib/Paradis-rdebuts_fr.pdf |
| R style guide | Google’s R Style Guide | https://google.github.io/styleguide/Rguide.html |
| Topics | Duration | Material |
|---|---|---|
| Slides for the whole session | 1/2 day | [pdf] |
| Wooclap poll | 5’ | [html] |
| Start R | 20’ | [link to RStudio on the IFB cluster] [start-R.html] [start-R.Rmd] |
| Validation of the prerequisites: quiz on Moodle | 15’ + 20’ | [with your ENT account] [with password: dubii2020] |
| Script | Live demo on the board | [R] |
| Basic R structures (matrices, data frames, factors and lists) | 45’ | [basic_R-structures.html] [basic_R-structures.Rmd] [Factors_in_R.html] [Factors_in_R.Rmd] |
| Coffee break | 15’ | |
| Intro to programming with R | 35’ | [html] [Rmd] |
| R markdown | 45’ | Demo and [COVID-19_HK.nb.html] |
| Type | Description | Links |
|---|---|---|
| Slides | Slides for the whole session | [pdf] |
| R Scripts | Scripts used for the slides | [R] |
| Shiny app | Shiny app to explore sampling fluctuation | http://shiny.calpoly.sh/Sampling_Distribution/ |
| Practical | Descriptive statistics | [html] [Rmd] |
| R package | Package to visualise effect sizes | https://github.com/ACCLAB/dabestr |
| Document | Memo on correlation and regression | [pdf] |
| Practical | A first data analysis with R | [html] [Rmd] |
Links mentioned during the session:
| Topic | Title | Description | Link |
|---|---|---|---|
| Basic stats explained to biologists | Points of Significance | Nature Methods collection | https://www.nature.com/collections/qghhqm/pointsofsignificance |
| How to represent data | Points of View | Nature Methods collection | http://blogs.nature.com/methagora/2013/07/data-visualization-points-of-view.html |
| How to represent data | DEFAKATOR | Detecting misleading charts (video in French) | https://www.youtube.com/watch?v=crTt-QIyS-o |
| Collective result table | Table to collect and compare trainees’ results | <tinyurl.com/dubii20-randnumstat> |
| Topics | Description | Duration | Material |
|---|---|---|---|
| Debrief session 1 & 2 - part I | R code: data structures, function usage, plots | 20’ | live |
| Practical part I | simulated data | 30’ | [mean-comparison-test_random-numbers.html] [mean-comparison-test_random-numbers.pdf] [mean-comparison-test_random-numbers.Rmd] |
| Debrief session 1 & 2 - part II | basic statistics | 20’ | [pdf] |
| Coffee break | | 15’ | |
| Practical part II | industrialization of hypothesis tests | 45’ | [randnum.R] |
| Statistics on omics data - part I | multiple testing issue | 20’ | [pdf] |
| Practical part III | correction for multiple testing | 15’ | same as above |
| Statistics on omics data - part II | parameter estimation issue | 15’ | [pdf] |
Contents: clustering/
| Topics | Type | Material |
|---|---|---|
| Clustering: recap of the previous session | Slides | [html] [Rmd] |
| Principal component analysis | Slides | [html] [Rmd] |
| Practical: data preparation (Pavkovic, 2019) | Notebook code | [Rmd] |
| Practical: data preparation (Pavkovic, 2019) - UUO proteome dataset | Notebook report | [html] |
| Practical: data preparation (Pavkovic, 2019) - UUO transcriptome dataset | Notebook report | [html] |
| Practical: data preparation (Pavkovic, 2019) - FA proteome dataset | Notebook report | [html] |
| Practical: data preparation (Pavkovic, 2019) - FA transcriptome dataset | Notebook report | [html] |
| Practical: data exploration (Pavkovic, 2019) | Exercises | [html] |
| R scripts (click on script, then “Raw”) | R scripts | [github] |
| Report template with nice yaml header | Rmd | [Rmd] [html] |
| Report (partial): PCA of proteomics data | Practical | [html] [Rmd] |
Topics for 2020-06-03
Descriptive statistics: commented solutions of yesterday’s exercise (15’)
PCA: slides (15’)
Embedding your script in an R markdown report: demo (20’)
Practicals: clustering (20’)
| Topics | Type | Duration | Material |
|---|---|---|---|
| What was not done yesterday | mixed | who knows | see previous course |
| Enrichment analysis | html | who knows | [html] |
| Enrichment analysis | Rmd | who knows | [Rmd] |
| Topics | Type | Material |
|---|---|---|
| Starter: brain-learning exercise | Exercise | [pdf] [Quiz] |
| Multivariate analysis - Intro | Lecture | [pdf] |
| Machine learning | Lecture | [pdf] |
| TCGA Breast Invasive Cancer dataset | Practical | [html] [Rmd] |