Statistical analysis with R
The aim of this module is to give you the basics of R programming and to introduce some statistical concepts for high-throughput data.
To follow this course, prior knowledge is expected on:
Link to the prerequisites:
https://du-bii.github.io/accueil/activites_preparatoires/
Name | Role(s) |
---|---|
Claire Vandiedonck | Coordinator, teacher |
Jacques van Helden | Coordinator, teacher |
Guillaume Achaz | Teacher |
Anne Badel | Teacher |
Magali Berland | Teacher |
Antoine Bridier-Nahmias | Teacher |
Olivier Sand | Teacher |
Natacha Cerisier | Helper |
Doc | Description | URL |
---|---|---|
Git pages | Website of the course (to view the course materials) | https://du-bii.github.io/module-3-Stat-R/stat-R_2020/ |
Git repo | Repository from which you can download or clone the teaching material to your computer | https://github.com/DU-Bii/module-3-Stat-R |
RStudio at IFB cluster | Link to RStudio on the IFB cluster | https://rstudio.cluster.france-bioinformatique.fr/ |
Moodle | Link to the Moodle web page of the DUBii (with your ENT account) | https://moodlesupd.script.univ-paris-diderot.fr/course/view.php?id=10629 |
Doc | Description | URL |
---|---|---|
Cheat sheets | RStudio cheat sheets | https://rstudio.com/resources/cheatsheets/ |
Tutorial | Tutorial for Beginners by E. Paradis - English version | https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf |
Tutorial | Tutorial for Beginners by E. Paradis - French version | https://cran.r-project.org/doc/contrib/Paradis-rdebuts_fr.pdf |
R style guide | Google’s R Style Guide | https://google.github.io/styleguide/Rguide.html |
Topics | Duration | Material | |
---|---|---|---|
Slides for the whole session | 1/2 day | [pdf] | |
Wooclap poll | 5’ | [html] | |
Start R | 20’ | [link to RStudio on the IFB cluster] [start-R.html] [start-R.Rmd] | |
Validation of the prerequisites: quiz on Moodle | 15’ + 20’ | [with your ENT account] [with password: dubii2020] | |
Script | Live demo on the board | [R] | |
Basic R structures (matrices, data frames, factors and lists) | 45’ | [basic_R-structures.html] [basic_R-structures.Rmd] [Factors_in_R.html] [Factors_in_R.Rmd] | |
Coffee break | 15’ | ||
Intro to programming with R | 35’ | [html] [Rmd] | |
R markdown | 45’ | Demo and [COVID-19_HK.nb.html] |
Type | Description | Links |
---|---|---|
Slides | Slides for the whole session | [pdf] |
R Scripts | Scripts used for the slides | [R] |
Shiny app | Shiny app to explore sampling fluctuation | http://shiny.calpoly.sh/Sampling_Distribution/ |
Practical | Descriptive statistics | [html] [Rmd] |
R package | Package to visualise effect sizes | https://github.com/ACCLAB/dabestr |
Document | Memo on correlation and regression | [pdf] |
Practical | A first data analysis with R | [html] [Rmd] |
Links mentioned during the session:
Topic | Title | Description | Link |
---|---|---|---|
Basic stats explained to biologists | Points of Significance | Nature Methods collection | https://www.nature.com/collections/qghhqm/pointsofsignificance |
How to represent data | Points of View | Nature Methods collection | http://blogs.nature.com/methagora/2013/07/data-visualization-points-of-view.html |
How to represent data | DEFAKATOR | Detecting misleading charts | https://www.youtube.com/watch?v=crTt-QIyS-o |
Collective result table | Table to collect and compare trainees’ results | <tinyurl.com/dubii20-randnumstat> |
Topics | Description | Duration | Material |
---|---|---|---|
Debrief session 1 & 2 - part I | R code: data structures, function usage, plots | 20’ | live |
Practical part I | simulated data | 30’ | [mean-comparison-test_random-numbers.html] [mean-comparison-test_random-numbers.pdf] [mean-comparison-test_random-numbers.Rmd] |
Debrief session 1 & 2 - part II | basic statistics | 20’ | [pdf] |
Coffee break | | 15’ | |
Practical part II | industrialization of hypothesis tests | 45’ | [randnum.R] |
Statistics on omics data - part I | multiple testing issue | 20’ | [pdf] |
Practical part III | correction for multiple testing | 15’ | same as above |
Statistics on omics data - part II | parameter estimation issue | 15’ | [pdf] |
Contents: clustering/
Topics | Type | Duration | Material |
---|---|---|---|
Clustering: recap of the previous session | Slides | | [html] [Rmd] |
Principal component analysis | Slides | | [html] [Rmd] |
Practical: data preparation (Pavkovic, 2019) | Notebook code | | [Rmd] |
Practical: data preparation (Pavkovic, 2019) - UUO proteome dataset | Notebook report | | [html] |
Practical: data preparation (Pavkovic, 2019) - UUO transcriptome dataset | Notebook report | | [html] |
Practical: data preparation (Pavkovic, 2019) - FA proteome dataset | Notebook report | | [html] |
Practical: data preparation (Pavkovic, 2019) - FA transcriptome dataset | Notebook report | | [html] |
Practical: data exploration (Pavkovic, 2019) | Exercises | | [html] |
R scripts (click on script, then “Raw”) | R scripts | | [github] |
Report template with nice yaml header | Rmd | | [Rmd] [html] |
Report (partial): PCA of proteomics data | Practical | | [html] [Rmd] |
Topics for 2020-06-03
Descriptive statistics: commented solutions of yesterday’s exercise (15’)
PCA: slides (15’)
Embedding your script in an R markdown report: demo (20’)
Practicals: clustering (20’)
Topics | Type | Duration | Material |
---|---|---|---|
What was not done yesterday | mixed | who knows | see previous course |
Enrichment analysis | html | who knows | [html] |
Enrichment analysis | Rmd | who knows | [Rmd] |
Topics | Type | Material |
---|---|---|
Starter: brain-learning exercise | Exercise | [pdf] [Quiz] |
Multivariate analysis - Intro | Lecture | [pdf] |
Machine learning | Lecture | [pdf] |
TCGA Breast Invasive Cancer dataset | Practical | [html] [Rmd] |