Goal of this practical

This practical shows how to run a functional enrichment analysis on a list of genes with g:Profiler, using the gprofiler2 R package.

Case study: The Cancer Genome Atlas (TCGA) Breast Invasive Cancer (BIC) data

Description: https://du-bii.github.io/study-cases/Homo_sapiens/TCGA_study-case/

Load (clusters of differentially) expressed genes

First, load the gene list from the following file in the GitHub repository.

#### Load the gene list ####
message("Loading gene list")
## URL of the data folder on github
data_folder_url <- 'https://raw.githubusercontent.com/DU-Bii/module-3-Stat-R/master/stat-R_2021/data/TCGA_BIC_subset/'

gene_list_file <- "BIC_edgeR_DEG_top_1000_geneIDs.txt"

## Trick: the file is read as a one-column table, which unlist() then converts to a vector
gene_list <- unlist(read.delim(file = file.path(data_folder_url, gene_list_file), header = FALSE))

## Check the length of the gene list
length(gene_list)
[1] 1000

Gene list functional enrichment analysis with gost

  1. Use the gost() function to get the enrichment results.
  • Note: we use FDR as the multiple testing correction.
  2. Visualize the results as an interactive plot.
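
The two steps above can be sketched as follows. This is a minimal example rather than a reference solution: it assumes that gene_list is the vector loaded earlier, that the identifiers are human genes (hence organism = "hsapiens"), that a 0.05 threshold is wanted, and that the g:Profiler web service is reachable. The variable name gost_res is only illustrative.

```r
library(gprofiler2)

## Assumption: gene_list was loaded above; unname() drops the names added by unlist()
gost_res <- gost(
  query = unname(gene_list),
  organism = "hsapiens",        # assumption: human gene identifiers
  correction_method = "fdr",    # FDR as multiple testing correction
  user_threshold = 0.05         # assumption: significance threshold
)

## The enrichment table is stored in the "result" element
head(gost_res$result)

## Interactive Manhattan-like plot of the enriched terms
gostplot(gost_res, capped = TRUE, interactive = TRUE)
```

Hovering over a point in the interactive plot shows the corresponding term and its adjusted p-value.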

We can now tune the parameters to also retrieve the evidence codes and the list of genes at the intersection between the query set and each functional class. Note that this generates a table with very wide columns.
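
One possible way to do this is the evcodes argument of gost(), which adds the evidence_codes and intersection columns to the result table. The sketch below makes the same assumptions as before (human identifiers, gene_list loaded earlier, network access); the object names and the 60-character truncation are arbitrary choices made only to keep the preview readable.

```r
library(gprofiler2)

## Same query as before, now also asking for evidence codes
gost_res_ev <- gost(
  query = unname(gene_list),
  organism = "hsapiens",        # assumption: human gene identifiers
  correction_method = "fdr",
  evcodes = TRUE                # adds "evidence_codes" and "intersection" columns
)

## Check the columns of the result table
colnames(gost_res_ev$result)

## Preview with a truncated intersection column (the full strings are very long)
preview <- gost_res_ev$result[, c("term_id", "term_name",
                                  "intersection_size", "intersection")]
preview$intersection <- substr(preview$intersection, 1, 60)
head(preview)
```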

  1. Get a web link to share the results with your coworkers

  2. Make a static plot

  3. Highlight the GO:0071840 and HPA:0300000 terms

  4. Produce a custom table with chosen results
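
The four steps above could be approached along these lines. This is a sketch under several assumptions: the installed gprofiler2 version supports the as_short_link argument of gost(), the two highlighted terms are present in the results, and the columns shown in the table are just one possible selection. Object names such as static_plot and terms_of_interest are illustrative.

```r
library(gprofiler2)

query_genes <- unname(gene_list)   # assumption: gene_list was loaded above

## 1. Short web link to the same analysis on the g:Profiler website
gost_link <- gost(query = query_genes, organism = "hsapiens",
                  correction_method = "fdr", as_short_link = TRUE)
gost_link

## 2. Static plot: run the query, then draw a non-interactive plot
gost_res <- gost(query = query_genes, organism = "hsapiens",
                 correction_method = "fdr")
static_plot <- gostplot(gost_res, capped = TRUE, interactive = FALSE)
static_plot

## 3. Highlight selected terms on the static plot
terms_of_interest <- c("GO:0071840", "HPA:0300000")
publish_gostplot(static_plot, highlight_terms = terms_of_interest,
                 filename = NULL)

## 4. Custom table restricted to the chosen terms and columns
publish_gosttable(gost_res,
                  highlight_terms = terms_of_interest,
                  use_colors = TRUE,
                  show_columns = c("source", "term_name",
                                   "term_size", "intersection_size"),
                  filename = NULL)
```

Passing a file name instead of NULL to publish_gostplot() or publish_gosttable() saves the figure to disk rather than only displaying it.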

If you feel adventurous, you can try to do the same with multiple gene lists (for instance, clusters from the previous practical).
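
As a starting point, gost() also accepts a named list of gene vectors as its query. The clusters from the previous practical are not available in this document, so the sketch below uses a purely hypothetical stand-in: gene_list split into two arbitrary halves. Replace gene_lists with your own clusters; the names first_half and second_half have no biological meaning.

```r
library(gprofiler2)

## Hypothetical stand-in for real clusters: two arbitrary halves of gene_list
ids <- unname(gene_list)
half <- length(ids) %/% 2
gene_lists <- list(
  first_half  = ids[seq_len(half)],
  second_half = ids[(half + 1):length(ids)]
)

## One call, several queries; results are stacked with a "query" column
multi_res <- gost(
  query = gene_lists,
  organism = "hsapiens",        # assumption: human gene identifiers
  correction_method = "fdr",
  multi_query = FALSE
)

## Number of enriched terms per gene list
table(multi_res$result$query)

## One plot panel per gene list
gostplot(multi_res, capped = TRUE, interactive = TRUE)
```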

Save your session info

For the sake of traceability, store the specifications of your R environment in the report, with the command sessionInfo(). This will indicate the version of R as well as of all the libraries used in this notebook.

sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /shared/ifbstor1/software/miniconda/envs/r-4.0.2/lib/libopenblasp-r0.3.10.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] gprofiler2_0.2.0 knitr_1.30      

loaded via a namespace (and not attached):
 [1] pillar_1.5.1      compiler_4.0.2    tools_4.0.2       digest_0.6.27     viridisLite_0.3.0 jsonlite_1.7.2    evaluate_0.14     lifecycle_1.0.0   tibble_3.1.0      gtable_0.3.0      debugme_1.1.0     pkgconfig_2.0.3   rlang_0.4.10      DBI_1.1.1         yaml_2.2.1        xfun_0.20        
[17] httr_1.4.2        stringr_1.4.0     dplyr_1.0.5       htmlwidgets_1.5.3 generics_0.1.0    vctrs_0.3.6       grid_4.0.2        tidyselect_1.1.0  data.table_1.14.0 glue_1.4.2        R6_2.5.0          fansi_0.4.2       plotly_4.9.3      rmarkdown_2.5     tidyr_1.1.3       ggplot2_3.3.3    
[33] purrr_0.3.4       magrittr_2.0.1    scales_1.1.1      ellipsis_0.3.1    htmltools_0.5.1.1 assertthat_0.2.1  colorspace_2.0-0  utf8_1.2.1        stringi_1.5.3     lazyeval_0.2.2    munsell_0.5.0     crayon_1.4.1