Jacques van Helden & Olivier Sand
2020-06-03
Symbol | Meaning |
---|---|
\(g = 6000\) | number of genes |
\(m = 40\) | genes involved in methionine metabolism |
\(n = 5960\) | genes not involved in methionine metabolism |
\(k = 10\) | number of genes in the cluster |
\(x = 6\) | number of methionine genes in the cluster |
Symbol | Meaning | Formula |
---|---|---|
\(C_1\) | choose 10 distinct genes among 6000 | \(C_1 = C_{m+n}^{k} = \frac{6000!}{10!\,5990!} \approx 1.65 \times 10^{31}\) |
\(C_2\) | choose 6 distinct genes among the 40 involved in methionine metabolism | \(C_2 = C_{m}^{x} = \frac{40!}{6!\,34!} \approx 3.8 \times 10^{6}\) |
\(C_3\) | choose 4 distinct genes among the 5960 not involved in methionine metabolism | \(C_3 = C_{n}^{k-x} = \frac{5960!}{4!\,5956!} \approx 5.2 \times 10^{13}\) |
\(C_4\) | choose 6 methionine and 4 non-methionine genes | \(C_4 = C_2 \cdot C_3 = C_{m}^{x}C_{n}^{k-x} \approx 2.0 \times 10^{20}\) |
Probability of observing exactly 6 methionine genes in a selection of 10
\[P(X=6) = \frac{C_4}{C_1} = \frac{C_{m}^{x}C_{n}^{k-x}}{C_{m+n}^{k}} = \frac{C_{40}^{6}C_{5960}^{4}}{C_{6000}^{10}} \approx 1.219 \times 10^{-11}\]
Probability of observing at least 6 methionine genes in a selection of 10
\[P(X \ge 6) = \sum_{i=x}^{k}\frac{C_{m}^{i}C_{n}^{k-i}}{C_{m+n}^{k}} \approx 1.222 \times 10^{-11}\]
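The two probabilities above can be checked numerically. The following is a minimal Python sketch, not the authors' code: it uses only the standard-library `math.comb`, and the function names `hypergeom_pmf` and `hypergeom_upper_tail` are chosen here for illustration. The variables mirror the symbols of the table (\(g\), \(m\), \(n\), \(k\), \(x\)).

```python
from math import comb

# Values taken from the symbol table above.
g = 6000   # total number of genes
m = 40     # genes involved in methionine metabolism
n = g - m  # genes not involved in methionine metabolism (5960)
k = 10     # number of genes in the cluster
x = 6      # number of methionine genes in the cluster


def hypergeom_pmf(i, m, n, k):
    """P(X = i): exactly i marked genes in a draw of k without replacement."""
    return comb(m, i) * comb(n, k - i) / comb(m + n, k)


def hypergeom_upper_tail(x, m, n, k):
    """P(X >= x): sum of the exact probabilities from x up to min(k, m)."""
    numerator = sum(comb(m, i) * comb(n, k - i) for i in range(x, min(k, m) + 1))
    return numerator / comb(m + n, k)


p_exact = hypergeom_pmf(x, m, n, k)
p_tail = hypergeom_upper_tail(x, m, n, k)

print(f"C1 = {comb(g, k):.3e}")
print(f"C2 = {comb(m, x):.3e}")
print(f"C3 = {comb(n, k - x):.3e}")
print(f"P(X = {x})  = {p_exact:.3e}")
print(f"P(X >= {x}) = {p_tail:.3e}")
```

Working with exact integers until the final division avoids the rounding problems that appear when factorials of this size are evaluated in floating point. The upper tail is only marginally larger than \(P(X=6)\) because the terms for 7 or more methionine genes are several orders of magnitude smaller.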
Organism: Saccharomyces cerevisiae
Data source: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE89530
The aim of this study is to compare NGS-derived transcriptome profiles (RNA-seq) of wild-type and bdf1-Y187F-Y354F mutant yeast strains after sporulation induction (time points: 0h, 4h and 8h).
Design: S. cerevisiae wild-type and bdf1-Y187F-Y354F mutant strains were collected in triplicate 0h, 4h and 8h after sporulation induction. mRNA was purified, prepared and sequenced on an Illumina HiSeq 2000 sequencer.
WT vs mutant at 0h: bdf1_Y187F_Y354F_mutant_0__vs__Wild_type_0_DESeq2_positive_geneIDs.txt
WT vs mutant at 4h: bdf1_Y187F_Y354F_mutant_4__vs__Wild_type_4_DESeq2_positive_geneIDs.txt
All genes are sorted according to some criterion (e.g. differential expression p-value, correlation of expression with other variables, …).
Each graph compares the ranked gene list with one reference class (e.g. one biological process).
Black bars denote genes belonging to the reference class.
The green curve estimates, at each rank \(i\), the degree of over-representation of the reference genes among the \(i\) top-ranking genes.
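A curve of this kind can be sketched in a few lines of Python. The text does not say which statistic the green curve plots, so this example assumes one plausible choice: at each rank \(i\), the \(-\log_{10}\) of the hypergeometric upper-tail probability of finding at least the observed number of reference genes among the \(i\) top-ranking genes. The function names and the gene identifiers `g1` … `g10` are hypothetical.

```python
from math import comb, log10


def upper_tail(x, m, n, k):
    """P(X >= x) when drawing k genes from m reference and n other genes."""
    numerator = sum(comb(m, j) * comb(n, k - j) for j in range(x, min(k, m) + 1))
    return numerator / comb(m + n, k)


def enrichment_curve(ranked_genes, reference):
    """Return one score per rank i: -log10 P(X >= hits among the top i genes).

    Assumption: the over-representation score is the hypergeometric
    upper-tail probability on a -log10 scale (not stated in the source).
    """
    m = sum(1 for gene in ranked_genes if gene in reference)  # reference genes
    n = len(ranked_genes) - m                                 # other genes
    curve = []
    hits = 0
    for i, gene in enumerate(ranked_genes, start=1):
        if gene in reference:
            hits += 1  # a "black bar" at this rank
        p = upper_tail(hits, m, n, i)
        curve.append(-log10(p) if p < 1 else 0.0)
    return curve


# Toy ranked list (hypothetical gene IDs), best-ranked gene first.
ranked_genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9", "g10"]
reference = {"g1", "g2", "g4"}

curve = enrichment_curve(ranked_genes, reference)
best_rank = max(range(len(curve)), key=curve.__getitem__) + 1
for rank, score in enumerate(curve, start=1):
    print(rank, round(score, 3))
print("rank with maximal over-representation:", best_rank)
```

In this toy list the score peaks at the rank where the last reference gene is encountered, and drops to zero at the bottom of the list, where the selection contains all genes and no enrichment is possible.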