Study case for the university diploma in integrative bioinformatics (DU-Bii)
This page describes a study case based on data from The Cancer Genome Atlas (TCGA; https://cancergenome.nih.gov/). This dataset contains more than 11,285 samples from patients suffering from a wide variety of cancer types. We use subsets of this large dataset in several courses of the Diplôme Universitaire en Bioinformatique Intégrative (DU-Bii).
The full datasets are available in the NCBI databases (Gene Expression Omnibus, Sequence Read Archive).
For the sake of simplicity, we used the pre-processed data made available by Ron Shamir's team.
We provide here:
Ciriello G. et al. (2015). Comprehensive Molecular Portraits of Invasive Lobular Breast Cancer. Cell 163: 506–519. https://doi.org/10.1016/j.cell.2015.09.033
Clinical annotation for the samples: http://acgt.cs.tau.ac.il/multi_omic_benchmark/data/clinical.zip
Note: the data are pre-normalised (library-size scaling followed by a log2 transformation).
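The note does not say exactly how the library scaling was done. As an illustration only, the sketch below assumes it means counts per million (CPM), and it adds a pseudo-count of 1 before the log2 transform to avoid log2(0). The pre-processed files may have been produced with a different scaling factor or pseudo-count. The function name `log2_cpm` and the toy gene names are hypothetical.

```python
import math


def log2_cpm(counts, pseudocount=1.0):
    """Scale each sample to counts per million, then log2-transform.

    Assumption (not stated in the source): "library scaling" = CPM,
    and a pseudo-count is added so that zero counts map to log2(1) = 0.

    counts: dict mapping gene name -> list of raw counts, one per sample.
    Returns a dict of the same shape holding log2(CPM + pseudocount).
    """
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    # Library size of each sample = sum of its counts over all genes
    lib_sizes = [sum(counts[g][j] for g in genes) for j in range(n_samples)]
    return {
        g: [
            math.log2(counts[g][j] / lib_sizes[j] * 1e6 + pseudocount)
            for j in range(n_samples)
        ]
        for g in genes
    }


# Toy example: 2 hypothetical genes measured in 2 samples
counts = {"GENE_A": [10, 0], "GENE_B": [90, 50]}
normalised = log2_cpm(counts)
```

Dividing by the library size makes samples sequenced at different depths comparable. The log2 transform compresses the strong right skew of expression values.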
Selected study case: Breast cancer dataset
Relevant columns for classification: two markers that clinicians use to assign tissues to cancer subtypes, ER_Status_nature2012 (estrogen receptor status) and HER2_Final_Status_nature2012 (HER2 status).
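The sketch below shows one way the two columns could be combined into a single class label per sample. It is hypothetical. It assumes the columns contain the strings "Positive" and "Negative", so check the actual encoding in clinical.zip. Any other value, such as missing or equivocal, is treated as unlabelled. The three-class grouping is a simplification chosen for illustration. It is not a full clinical subtype definition, which would also consider other markers such as PR.

```python
def assign_class(er_status, her2_status):
    """Combine ER and HER2 status into a coarse, hypothetical class label.

    Assumes values are "Positive" / "Negative"; anything else -> None.
    """
    valid = {"Positive", "Negative"}
    if er_status not in valid or her2_status not in valid:
        return None  # missing or equivocal annotation: leave unlabelled
    if her2_status == "Positive":
        return "HER2+"
    if er_status == "Positive":
        return "ER+/HER2-"
    return "ER-/HER2-"


# Hypothetical rows mimicking the clinical table (sample IDs are made up)
samples = [
    {"sample": "S1", "ER_Status_nature2012": "Positive",
     "HER2_Final_Status_nature2012": "Negative"},
    {"sample": "S2", "ER_Status_nature2012": "Negative",
     "HER2_Final_Status_nature2012": "Positive"},
    {"sample": "S3", "ER_Status_nature2012": "",
     "HER2_Final_Status_nature2012": "Negative"},
]
labels = {
    s["sample"]: assign_class(s["ER_Status_nature2012"],
                              s["HER2_Final_Status_nature2012"])
    for s in samples
}
```

Samples labelled `None` would typically be excluded before training or evaluating a classifier.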
We downloaded the TCGA raw counts from the Recount2 database and applied the following preprocessing steps:
The preprocessing was performed in an R Markdown notebook, so that anyone can reproduce the results and follow each step.