This practical shows how to run a functional enrichment analysis on a list of genes with gProfiler.
Description: https://du-bii.github.io/study-cases/Homo_sapiens/TCGA_study-case/
First, load the gene list, which is stored in the following file of the GitHub repository.
#### Load the gene list ####
message("Loading gene list")
## URL of the data folder on GitHub
data_folder_url <- 'https://raw.githubusercontent.com/DU-Bii/module-3-Stat-R/master/stat-R_2021/data/TCGA_BIC_subset/'
gene_list_file <- "BIC_edgeR_DEG_top_1000_geneIDs.txt"
## Trick: the file is read as a one-column table, and we transform it to a vector with unlist()
gene_list <- unlist(read.delim(file = file.path(data_folder_url, gene_list_file), header = FALSE))
## Check the length of the gene list
length(gene_list)
[1] 1000
Use the gost() function to get the enrichment results.
We can now tune the parameters to get additional information: the evidence codes, and the list of genes at the intersection between the query set and each functional class. Note that this generates a table with very wide columns.
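The two steps described above (a basic query, then a query returning evidence codes and intersections) could look like the sketch below. It assumes that the gprofiler2 package is installed, that `gene_list` was loaded as shown earlier, and that the genes are human (`organism = "hsapiens"` is an assumption based on the TCGA study case). Both calls query the g:Profiler server, so a network connection is required.

```r
library(gprofiler2)

## Basic query. Human is assumed because the gene list comes from a TCGA study case.
gostres <- gost(query = gene_list, organism = "hsapiens")

## gost() returns a list; the enrichment table is in its "result" element
head(gostres$result[, c("source", "term_id", "term_name", "p_value")])

## Same query with evcodes = TRUE, which adds the columns
## "evidence_codes" and "intersection" to the result table
gostres_ev <- gost(query = gene_list, organism = "hsapiens", evcodes = TRUE)
names(gostres_ev$result)
```

The `intersection` column holds a comma-separated list of gene identifiers for each term, which is why the resulting table is so wide.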
Get a web link to share the results with your coworkers
Make a static plot
Highlight the GO:0071840 and HPA:0300000 terms
Produce a custom table with chosen results
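The tasks above can be approached with functions from gprofiler2. The following is a sketch rather than a reference solution: it assumes that `gene_list` is loaded as shown earlier, that the organism is human, and that both highlighted terms are among the significant results (otherwise there is nothing to highlight). The columns shown in the table are an arbitrary choice made for illustration.

```r
library(gprofiler2)

## Enrichment analysis (human assumed, as the genes come from a TCGA study case)
gostres <- gost(query = gene_list, organism = "hsapiens")

## 1. Web link: as_short_link = TRUE returns a short URL to the g:Profiler
##    web interface (a character string) instead of a result table
gost_link <- gost(query = gene_list, organism = "hsapiens", as_short_link = TRUE)
print(gost_link)

## 2. Static plot: interactive = FALSE returns a ggplot object
##    rather than an interactive plotly widget
p <- gostplot(gostres, capped = TRUE, interactive = FALSE)

## 3. Highlight selected terms on the static plot
##    (filename = NULL: the plot is returned but not saved to a file)
terms_of_interest <- c("GO:0071840", "HPA:0300000")
publish_gostplot(p, highlight_terms = terms_of_interest, filename = NULL)

## 4. Custom table restricted to the same terms
publish_gosttable(gostres,
                  highlight_terms = terms_of_interest,
                  use_colors = TRUE,
                  show_columns = c("source", "term_name", "term_size", "intersection_size"),
                  filename = NULL)
```

To keep the figure or the table, pass a file name (for instance a PDF or PNG path) as the `filename` argument instead of `NULL`.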
If you feel adventurous, you can try to do the same with multiple gene lists (for instance, clusters from the previous practical).
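A possible starting point for the multi-list case: `gost()` accepts a named list of gene vectors as `query`. Since the clusters from the previous practical are not available here, the sketch below splits `gene_list` into two arbitrary halves, named `first_half` and `second_half`, purely as hypothetical stand-ins for real clusters. Human is again assumed as the organism.

```r
library(gprofiler2)

## Hypothetical stand-ins for real clusters: two arbitrary halves of the gene list
genes <- unname(gene_list)
half <- length(genes) %/% 2
gene_sets <- list(
  first_half  = genes[1:half],
  second_half = genes[(half + 1):length(genes)]
)

## With a named list, one query is run per element; the results are stacked
## in a single table whose "query" column holds the list names
multi_res <- gost(query = gene_sets, organism = "hsapiens", multi_query = FALSE)
table(multi_res$result$query)

## With multi_query = TRUE, the results are grouped by term instead, with one
## p-value per query, which eases the comparison between lists
multi_res_wide <- gost(query = gene_sets, organism = "hsapiens", multi_query = TRUE)
head(multi_res_wide$result[, c("term_id", "term_name", "p_values")])
```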
For the sake of traceability, store the specifications of your R environment in the report, with the command sessionInfo(). This will indicate the version of R as well as the versions of all the libraries used in this notebook.
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /shared/ifbstor1/software/miniconda/envs/r-4.0.2/lib/libopenblasp-r0.3.10.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gprofiler2_0.2.0 knitr_1.30
loaded via a namespace (and not attached):
[1] pillar_1.5.1 compiler_4.0.2 tools_4.0.2 digest_0.6.27 viridisLite_0.3.0 jsonlite_1.7.2 evaluate_0.14 lifecycle_1.0.0 tibble_3.1.0 gtable_0.3.0 debugme_1.1.0 pkgconfig_2.0.3 rlang_0.4.10 DBI_1.1.1 yaml_2.2.1 xfun_0.20
[17] httr_1.4.2 stringr_1.4.0 dplyr_1.0.5 htmlwidgets_1.5.3 generics_0.1.0 vctrs_0.3.6 grid_4.0.2 tidyselect_1.1.0 data.table_1.14.0 glue_1.4.2 R6_2.5.0 fansi_0.4.2 plotly_4.9.3 rmarkdown_2.5 tidyr_1.1.3 ggplot2_3.3.3
[33] purrr_0.3.4 magrittr_2.0.1 scales_1.1.1 ellipsis_0.3.1 htmltools_0.5.1.1 assertthat_0.2.1 colorspace_2.0-0 utf8_1.2.1 stringi_1.5.3 lazyeval_0.2.2 munsell_0.5.0 crayon_1.4.1