Statistics with R – 2020 session

Goals

The aim of this module is to teach you the basics of R programming and to introduce some statistical concepts for high-throughput data.

To follow this course, prior knowledge is expected on:

Link to the prerequisites:

Doc	Description	URL
Git pages	Website of the course (to view the course materials)	https://du-bii.github.io/module-3-Stat-R/stat-R_2020/
Git repo	Repository to download or clone the teaching material to your computer	https://github.com/DU-Bii/module-3-Stat-R
RStudio at IFB cluster	Link to RStudio on the IFB cluster	https://rstudio.cluster.france-bioinformatique.fr/
Moodle	Link to the DUBii Moodle page (log in with your ENT account)	https://moodlesupd.script.univ-paris-diderot.fr/course/view.php?id=10629

Doc	Description	URL
Cheat sheets	RStudio cheat sheets	https://rstudio.com/resources/cheatsheets/
Tutorial	Tutorial for Beginners by E. Paradis - English version	https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
Tutorial	Tutorial for Beginners by E. Paradis - French version	https://cran.r-project.org/doc/contrib/Paradis-rdebuts_fr.pdf
R style guide	Google’s R Style Guide	https://google.github.io/styleguide/Rguide.html

Topics	Duration	Material
Slides for the whole session	1/2 day	[pdf]
Wooclap poll	5’	[html]
Start R	20’	[link to RStudio on the IFB cluster] [start-R.html] [start-R.Rmd]
Validation of the prerequisites: quiz on Moodle	15’ + 20’	[with your ENT account] [with password: dubii2020]
Script	Live demo on the board	[R]
Basic R structures (matrices, data frames, factors and lists)	45’	[basic_R-structures.html] [basic_R-structures.Rmd] [Factors_in_R.html] [Factors_in_R.Rmd]
Coffee break	15’
Intro to programming with R	35’	[html] [Rmd]
R Markdown	45’	Demo and [COVID-19_HK.nb.html]

Type	Description	Links
Slides	Slides for the whole session	[pdf]
R Scripts	Scripts used for the slides	[R]
Shiny app	Shiny app to explore sampling fluctuation	http://shiny.calpoly.sh/Sampling_Distribution/
Practical	Descriptive statistics	[html] [Rmd]
R package	Package to visualise effect sizes	https://github.com/ACCLAB/dabestr
Document	Memo on correlation and regression	[pdf]
Practical	A first data analysis with R	[html] [Rmd]

Links mentioned during the session:

Topic	Title	Description	Link
Basic stats explained to biologists	Points of Significance	Nature Methods collection	https://www.nature.com/collections/qghhqm/pointsofsignificance
How to represent data	Points of View	Nature Methods collection	http://blogs.nature.com/methagora/2013/07/data-visualization-points-of-view.html
How to represent data	DEFAKATOR	Detecting misleading graphs	https://www.youtube.com/watch?v=crTt-QIyS-o
Collective result table		Table to collect and compare trainees’ results	tinyurl.com/dubii20-randnumstat

Topics	Description	Duration	Material
Debrief session 1 & 2 - part I	R code: data structures, function usage, plots	20’	live
Practical part I	simulated data	30’	[mean-comparison-test_random-numbers.html] [mean-comparison-test_random-numbers.pdf] [mean-comparison-test_random-numbers.Rmd]
Debrief session 1 & 2 - part II	basic statistics	20’	[pdf]
Coffee break	15’
Practical part II	industrialization of hypothesis tests	45’	[randnum.R]
Statistics on omics data - part I	multiple testing issue	20’	[pdf]
Practical part III	correction for multiple testing	15’	same as above
Statistics on omics data - part II	parameter estimation issue	15’	[pdf]

Topics	Type	Material
Clustering: recap of the previous session	Slides	[html] [Rmd]
Principal component analysis	Slides	[html] [Rmd]
Practical: data preparation (Pavkovic, 2019)	Notebook code	[Rmd]
Practical: data preparation (Pavkovic, 2019) - UUO proteome dataset	Notebook report	[html]
Practical: data preparation (Pavkovic, 2019) - UUO transcriptome dataset	Notebook report	[html]
Practical: data preparation (Pavkovic, 2019) - FA proteome dataset	Notebook report	[html]
Practical: data preparation (Pavkovic, 2019) - FA transcriptome dataset	Notebook report	[html]
Practical: data exploration (Pavkovic, 2019)	Exercises	[html]
R scripts (click on script, then “Raw”)	R scripts	[github]
Report template with a nice YAML header	Rmd	[Rmd] [html]
Report (partial): PCA of proteomics data	Practical	[html] [Rmd]

Topics for 2020-06-03

Descriptive statistics: commented solutions to yesterday’s exercise (15’)
- Split into rooms depending on your level of completion of the exercise
PCA: slides (15’)
Embedding your script in an R Markdown report: demo (20’)
Practicals: clustering (20’)

Topics	Type	Duration	Material
What was not done yesterday	mixed	who knows	see previous course
Enrichment analysis	html	who knows	[html]
Enrichment analysis	Rmd	who knows	[Rmd]

Topics	Type	Material
Starter: brain-learning exercise	Exercise	[pdf] [Quiz]
Multivariate analysis - Intro	Lecture	[pdf]
Machine learning	Lecture	[pdf]
TCGA Breast Invasive Cancer dataset	Practical	[html] [Rmd]