Genes are pleiotropic and getting a better knowledge of their function requires a comprehensive characterization of their mutants. Within the ProMetIS project, a multi-level dataset has been generated, which combines phenomic, proteomic and metabolomic acquisitions from liver tissues of a mouse knock-out model lacking the Lat (linker for activation of T cells) gene (Imbert et al. 2021). The data and processing code are publicly available in the ProMetIS R package to ensure accessibility, interoperability, and reusability.
Here, we propose to perform a statistical analysis of this dataset and characterize the impact of the deletion of the Lat gene.
Sys.setlocale("LC_CTYPE", "French")
## Warning in Sys.setlocale("LC_CTYPE", "French"): using locale code page other
## than 65001 ("UTF-8") may cause problems
## [1] "French_France.1252"
The multi-omics dataset is stored in the ProMetIS package as three tabulated files (dataMatrix.tsv, sampleMetadata.tsv, and variableMetadata.tsv) for each of the 9 datasets (see the Appendix section below):
lat_mset_dir.c <- ProMetIS::aggregated_dir.c("LAT")
The whole multi-omics dataset can be loaded as a MultiDataSet
object (Hernandez-Ferrer et al. 2016),
by using the phenomis::reading
method from the phenomis package (Imbert et al. 2021):
lat.mset <- phenomis::reading(lat_mset_dir.c, report.c = "none", output.c = "set")
Let us view the content of this multi-dataset:
lat.mset
## Object of class 'MultiDataSet'
## . assayData: 9 elements
## . metabolomics_liver_c18hypersil_pos: 5665 features, 28 samples
## . metabolomics_liver_hilic_neg: 2866 features, 28 samples
## . metabolomics_plasma_c18acquity_neg: 1584 features, 28 samples
## . metabolomics_plasma_c18acquity_pos: 6104 features, 28 samples
## . metabolomics_plasma_c18hypersil_pos: 4788 features, 28 samples
## . metabolomics_plasma_hilic_neg: 3131 features, 28 samples
## . preclinical: 167 features, 28 samples
## . proteomics_liver: 2098 features, 28 samples
## . proteomics_plasma: 419 features, 24 samples
## . featureData:
## . metabolomics_liver_c18hypersil_pos: 5665 rows, 34 cols (set, ..., biosigner)
## . metabolomics_liver_hilic_neg: 2866 rows, 34 cols (set, ..., biosigner)
## . metabolomics_plasma_c18acquity_neg: 1584 rows, 24 cols (set, ..., biosigner)
## . metabolomics_plasma_c18acquity_pos: 6104 rows, 24 cols (set, ..., biosigner)
## . metabolomics_plasma_c18hypersil_pos: 4788 rows, 34 cols (set, ..., biosigner)
## . metabolomics_plasma_hilic_neg: 3131 rows, 34 cols (set, ..., biosigner)
## . preclinical: 167 rows, 15 cols (set, ..., biosigner)
## . proteomics_liver: 2098 rows, 14 cols (set, ..., biosigner)
## . proteomics_plasma: 419 rows, 14 cols (set, ..., biosigner)
## . rowRanges:
## . metabolomics_liver_c18hypersil_pos: NO
## . metabolomics_liver_hilic_neg: NO
## . metabolomics_plasma_c18acquity_neg: NO
## . metabolomics_plasma_c18acquity_pos: NO
## . metabolomics_plasma_c18hypersil_pos: NO
## . metabolomics_plasma_hilic_neg: NO
## . preclinical: NO
## . proteomics_liver: NO
## . proteomics_plasma: NO
## . phenoData:
## . metabolomics_liver_c18hypersil_pos: 28 samples, 4 cols (gene, ..., sex)
## . metabolomics_liver_hilic_neg: 28 samples, 4 cols (gene, ..., sex)
## . metabolomics_plasma_c18acquity_neg: 28 samples, 4 cols (gene, ..., sex)
## . metabolomics_plasma_c18acquity_pos: 28 samples, 4 cols (gene, ..., sex)
## . metabolomics_plasma_c18hypersil_pos: 28 samples, 4 cols (gene, ..., sex)
## . metabolomics_plasma_hilic_neg: 28 samples, 4 cols (gene, ..., sex)
## . preclinical: 28 samples, 10 cols (gene, ..., impc_phenotype)
## . proteomics_liver: 28 samples, 4 cols (gene, ..., sex)
## . proteomics_plasma: 24 samples, 4 cols (gene, ..., sex)
Questions:
How many datasets are included?
How many technologies?
How many tissues?
How many samples?
How many features?
We now restrict the dataset to the liver samples:
# subsetting the MultiDataSet to the datasets containing 'liver' in their name
latliv.mset <- lat.mset[, grep("liver", names(lat.mset), value = TRUE)]
and to the annotated metabolomics features only:
# for each dataset
for (set.c in grep("metabolomics", names(latliv.mset), value = TRUE)) {
# extracting the ExpressionSet
eset <- latliv.mset[[set.c]]
# restricting to the features with a name
eset <- eset[Biobase::fData(eset)[, "name"] != "", ]
# updating the MultiDataSet
stopifnot(methods::validObject(eset))
latliv.mset <- MultiDataSet::add_eset(latliv.mset,
eset,
dataset.type = set.c,
GRanges = NA,
overwrite = TRUE,
warnings = FALSE)
}
We finally rename the datasets ‘proteo’, ‘metabo_c18’ and ‘metabo_hilic’:
latliv_oldname.mset <- latliv.mset
# creating a new MultiDataSet to store the renamed ExpressionSets
latliv.mset <- MultiDataSet::createMultiDataSet()
for (set.c in names(latliv_oldname.mset)) {
# extracting the ExpressionSet
eset <- latliv_oldname.mset[[set.c]]
# including the renamed ExpressionSet
latliv.mset <- MultiDataSet::add_eset(latliv.mset,
eset,
dataset.type = switch(set.c,
proteomics_liver = "proteo",
metabolomics_liver_c18hypersil_pos = "metabo_c18",
metabolomics_liver_hilic_neg = "metabo_hilic"),
GRanges = NA,
overwrite = TRUE,
warnings = FALSE)
}
# re-ordering the datasets
latliv.mset <- latliv.mset[, c("proteo", "metabo_c18", "metabo_hilic")]
We have now the final multi-dataset which we will study in the rest of the session. Let us first have a look at the summary:
latliv.mset
## Object of class 'MultiDataSet'
## . assayData: 3 elements
## . proteo: 2098 features, 28 samples
## . metabo_c18: 138 features, 28 samples
## . metabo_hilic: 199 features, 28 samples
## . featureData:
## . proteo: 2098 rows, 14 cols (set, ..., biosigner)
## . metabo_c18: 138 rows, 34 cols (set, ..., biosigner)
## . metabo_hilic: 199 rows, 34 cols (set, ..., biosigner)
## . rowRanges:
## . proteo: NO
## . metabo_c18: NO
## . metabo_hilic: NO
## . phenoData:
## . proteo: 28 samples, 4 cols (gene, ..., sex)
## . metabo_c18: 28 samples, 4 cols (gene, ..., sex)
## . metabo_hilic: 28 samples, 4 cols (gene, ..., sex)
Questions:
How many datasets are included?
How many technologies?
How many tissues?
How many samples?
How many features?
Note that the phenomis::inspecting
method may be used to restrict the datasets to common samples only:
latliv.mset <- MultiDataSet::commonSamples(latliv.mset)
Finally, we extract the vectors of the our two factors of interest: the ‘gene’ (‘LAT’ or ‘WT’) and ‘sex’ (‘M’ or ‘F’)
# getting one of the sampleMetadata (e.g. the one from the 'proteo' dataset)
pdata.df <- Biobase::pData(latliv.mset[["proteo"]])
# getting the two vectors
gene.vc <- pdata.df[, "gene"]
print(table(gene.vc))
## gene.vc
## LAT WT
## 13 15
sex.vc <- pdata.df[, "sex"]
print(table(sex.vc))
## sex.vc
## F M
## 14 14
The phenomis::inspecting
method from the phenomis package applied to a MultiDataSet (or ExpressionSet) provides
a numerical and graphical summary of each dataset:
latliv.mset <- phenomis::inspecting(latliv.mset, report.c = "none")
Questions:
What shows each graphic?
How many missing values are present in the datasets?
How many zero values are present in the datasets?
Have the intensities been log transformed?
Are there outliers?
We continue our exploration of the individual datasets by focusing on the Principal Component Analysis.
The ropls::opls
method from the ropls package enables to build the PCA models for each dataset from a MultiDataSet object (Thévenot et al. 2015):
latliv.pca <- ropls::opls(latliv.mset, info.txtC = "none")
Questions:
- For each dataset, how many components contribute to more than 10% of the total variance?
Let us study if a separation between the genotypes is visible on the score-plot:
ropls::plot(latliv.pca, typeVc = "x-score",
parAsColFcVn = gene.vc)
Questions:
Are the samples from the two genotypes clustered on the score-plot?
Which omic approach gives the best separation?
What happens with the protein data?
Is there any separation according to ‘sex’?
The MultiDataSet object with new columns for the PCA scores (respectively, loadings) included in the sampleMetadata (respectively, variableMetadata) can be extracted from the PCA model object with the ropls::getMset
method:
latliv.mset <- ropls::getMset(latliv.pca)
We now would like to assess how much information about the gene is contained in each block. We therefore perform a univariate analysis. Since we have observed that there is also a variation according to sex in the datasets, we want to test the effects of gene, sex and their interaction simultaneously.
Questions:
What are the tests available with the
phenomis::hypotesting
method?Which one would you suggest?
What kind of correction for multiple testing are available? What are the default method and threshold?
We now perform a 2-way ANOVA with interaction, with the default Benjamini and Hochberg control of the False Discovery Rate at the 5% threshold (Benjamini and Hochberg 1995).
latliv.mset <- phenomis::hypotesting(latliv.mset,
factor_names.vc = c("gene", "sex"),
test.c = "anova2waysInter",
report.c = "none")
Questions:
Are there features with significant difference between means according to the genotype in all datasets?
Are there features with significant difference between means according to the sex in all datasets?
Are there interactions between genotype and sex?
What would be the result of a single Student’s t-test of a significant difference between means according to the genotype?
Partial Least Squares - Discriminant Analysis is a machine learning approach which is popular in the field of multivariate analysis of mass spectrometry data (Trygg, Holmes, and Lundstedt 2007). It is a latent variable approach based on the maximisation of the covariance between the components and the response (Wold, Sjöström, and Eriksson 2001).
The ropls::opls
method from the ropls package enables to build PLS models by specifying the response as the y
parameter:
latliv.plsda <- ropls::opls(latliv.mset,
y = "gene",
info.txtC = "none")
Questions:
How many components were computed?
What does the top right graphic tell you? Are the models significant?
What is the predictive performance of the models?
Are the results in agreement with the univariate analysis?
The MultiDataSet object with new columns for the PLS-DA scores (respectively, Variable Importance in Projection values) included in the sampleMetadata (respectively, variableMetadata) can be extracted from the PLS-DA model object with the ropls::getMset
method:
latliv.mset <- ropls::getMset(latliv.plsda)
Note: An Orthogonal Partial Least Squares - Discriminant Analysis (OPLS-DA) modeling can be performed with the method by setting the orthI argument to NA (instead of the 0 default value)
latliv.oplsda <- ropls::opls(latliv.mset,
y = "gene",
orthoI = NA,
info.txtC = "none")
## Warning: OPLS: number of predictive components ('predI' argument) set to 1
## Warning: OPLS: number of predictive components ('predI' argument) set to 1
## Warning: OPLS: number of predictive components ('predI' argument) set to 1
Many statistical approaches are available for integrative analysis of multi-omics datasets (Pierre-Jean et al. 2019)(Huang, Chaudhary, and Garmire 2017)(Cantini et al. 2021). Here we provide two examples of (unsupervised) multiple co-inertia analysis (Meng et al. 2014) and supervised multi-block PLS-DA (Rohart et al. 2017).
Multiple Co-Inertia Analysis (MCIA) is a multi-omics exploratory data analysis technique (Meng et al. 2016). The datasets are projected into the same dimensional space by defining both ‘global’ and ‘block-specific’ scores (and loadings), and maximizing the sum squared covariance between them (Meng et al. 2014).
Exporting the data matrices:
latliv_mn.ls <- MultiDataSet::as.list(latliv.mset)
Performing MCIA:
latliv.mcia <- omicade4::mcia(latliv_mn.ls)
Plotting the results (samples are colored by genotype):
require(ade4)
## Le chargement a nécessité le package : ade4
omicade4::plot.mcia(latliv.mcia, axes = 1:2,
phenovec = gene.vc,
sample.lab = FALSE)
Questions:
What is shown by each graphic?
Do you see any clustering of the two genotypes?
What about performing the same analysis with the full metabolomics datasets (i.e., including also the non-annotated variables?)
# restricting the full LAT MultiDataSet to the liver datasets
latliv_full.mset <- lat.mset[, grep("liver", names(lat.mset), value = TRUE)]
# renaming the 3 datasets
latliv_full_oldname.mset <- latliv_full.mset
latliv_full.mset <- MultiDataSet::createMultiDataSet()
for (set.c in names(latliv_full_oldname.mset)) {
eset <- latliv_full_oldname.mset[[set.c]]
latliv_full.mset <- MultiDataSet::add_eset(latliv_full.mset,
eset,
dataset.type = switch(set.c,
proteomics_liver = "proteo",
metabolomics_liver_c18hypersil_pos = "metabo_c18",
metabolomics_liver_hilic_neg = "metabo_hilic"),
GRanges = NA,
overwrite = TRUE,
warnings = FALSE)
}
# re-ordering the datasets
latliv_full.mset <- latliv_full.mset[, c("proteo", "metabo_c18", "metabo_hilic")]
# MCIA modeling
latliv_full_mn.ls <- MultiDataSet::as.list(latliv_full.mset)
latliv_full.mcia <- omicade4::mcia(latliv_full_mn.ls)
# MCIA results
require(ade4)
omicade4::plot.mcia(latliv_full.mcia, axes = 1:2,
phenovec = gene.vc,
sample.lab = FALSE)
# Note: the variables most contributing to the (second) axis may be viewed with:
# omicade4::selectVar(latliv_full.mcia,
# a1.lim = c(-Inf, Inf), a2.lim = c(1, Inf))
Questions:
Conclusion?
What about coloring the sample plot according to ‘sex’?
omicade4:::splot.sample.mcia(latliv_full.mcia,
phenovec = sex.vc,
sample.lab = FALSE)
The Data Integration Analysis for Biomarker discovery using a Latent component method for Omics studies (DIABLO) method extends sparse PLS-Discriminant Analysis to multi-omics analyses (Singh et al. 2019). It is based on sparse generalized canonical correlation analysis (sGCCA) (Tenenhaus et al. 2014). sGCCA is a multivariate dimension reduction technique that uses singular value decomposition and selects co-expressed (correlated) variables from several omics datasets. sGCCA maximizes the covariance between linear combinations of variables (latent component scores). The selection of the correlated molecules across omics levels is performed internally in sGCCA with L1–penalization on the variable coefficient vector defining the linear combinations. To extend sGCCA for a classification framework, one omics dataset is substituted with a dummy indicator matrix Y. In addition, the L1 penalty parameter was replaced by the number of variables to select in each dataset and each component (Singh et al. 2019). The mixOmics::block.splsda
is available in the mixOmics package Rohart et al. (2017).
We first extract the list of (transposed) data matrices:
exprs.ls <- MultiDataSet::as.list(latliv.mset)
texprs.ls <- lapply(exprs.ls, t)
We build the design matrix defining the link between the blocks. The weights were set to 0.1 between the blocks of omics and to 1 between the blocks of omics and the response block.
design.mn <- matrix(0.1,
ncol = length(exprs.ls),
nrow = length(exprs.ls),
dimnames = list(names(exprs.ls),
names(exprs.ls)))
diag(design.mn) <- 0
We next determine the optimal number of components:
latliv_5comp.blocksplsda <- mixOmics::block.splsda(X = texprs.ls,
Y = factor(gene.vc),
design = design.mn,
ncomp = 5)
## Design matrix has changed to include Y; each block will be
## linked to Y.
latliv_perf.blocksplsda <- mixOmics::perf(latliv_5comp.blocksplsda,
validation = 'Mfold', folds = 5, nrepeat = 10)
plot(latliv_perf.blocksplsda)
blocksplsda_comp.i <- latliv_perf.blocksplsda$choice.ncomp$WeightedVote["Overall.BER", "centroids.dist"]
message("Number of selected components: ", blocksplsda_comp.i)
## Number of selected components: 2
Questions:
- How many components would you select?
We now specify how many features are to be selected from each dataset for the 2 components by using cross-validation on a grid of values. Note: the code below will not be run in this session for the sake of time.
# grid for variable selection
varsel_grid.ls <- list(proteo = c(seq(20, 100, 10)),
metabo_c18 = c(seq(10, 60, 10)),
metabo_hilic = c(seq(10, 60, 10)))
tune.blocksplsda = mixOmics::tune.block.splsda(X = texprs.ls,
Y = factor(gene.vc),
design = design.mn,
ncomp = blocksplsda_comp.i,
test.keepX = varsel_grid.ls,
validation = 'Mfold',folds=5, nrepeat=50,
cpus = cpu, dist = "centroids.dist")
# optimum
varsel.ls = tune.blocksplsda$choice.keepX
# number of selected for each block and components:
varsel.ls
varsel.ls <- list(proteo = c(20, 20),
metabo_c18 = c(10, 20),
metabo_hilic = c(60, 60))
We can now build the model:
latliv.blocksplsda <- mixOmics::block.splsda(X = texprs.ls,
Y = factor(gene.vc),
design = design.mn,
scheme = 'horst',
keepX = varsel.ls)
We first look at the individual blocks.
mixOmics::plotIndiv(latliv.blocksplsda,
ind.names = TRUE,
legend = TRUE,
ellipse = TRUE,
col.per.group = ProMetIS::palette.vc()[c("LAT", "WT")])
Questions:
- How do you interpret the results?
mixOmics::plotLoadings(latliv.blocksplsda,
comp = 1,
legend.color = ProMetIS::palette.vc()[c("LAT", "WT")],
contrib = 'max',
method = 'median')
# lapply(within(mixOmics::selectVar(latliv.blocksplsda), rm(comp)),
# function(block_sel.ls) block_sel.ls[["name"]])
Questions:
- How do you interpret the results?
Let us look at the relationships between the block components.
mixOmics::plotDiablo(latliv.blocksplsda)
Questions:
- How do you interpret the results?
mixOmics::circosPlot(latliv.blocksplsda,
cutoff = 0.7,
line = TRUE,
comp = 1,
color.Y = ProMetIS::palette.vc()[c("LAT", "WT")])
Questions:
- How do you interpret the results?
latliv.network <- mixOmics::network(latliv.blocksplsda,
comp = list(proteo = 1,
metabo_c18 = 1,
metabo_hilic = 1),
shape.node = rep("circle", 3),
lty.edge = "solid",
lwd.edge = 3,
color.edge = mixOmics::color.spectral(50),
blocks = c(1, 2, 3), cutoff = 0.75)
latliv.graph <- latliv.network$gR
latliv.graph$layout <- igraph::layout.fruchterman.reingold(latliv.graph,
weights = (1 - abs(igraph::E(latliv.graph)$weight)))
igraph::V(latliv.graph)$size <- 5
plot(latliv.graph)
# knitr::kable(names(igraph::V(latliv.graph)))
Questions:
- What biological interpretation(s) would you suggest?
heatmap.ls <- mixOmics::cimDiablo(latliv.blocksplsda, margins = c(10, 20),
color.Y = ProMetIS::palette.vc()[c("LAT", "WT")],
transpose = TRUE, comp = 1,
size.legend = 0.6, legend.position = "right",
dist.method = c("correlation", "correlation"))
##
## trimming values to [-3, 3] range for cim visualisation. See 'trim' arg in ?cimDiablo
Questions:
What kind of clustering is used?
What additional information is provided by this graphic?
perf.blocksplsda <- mixOmics::perf(latliv.blocksplsda, validation = 'Mfold',
folds = 5, nrepeat = 10,
dist = 'centroids.dist', auc = TRUE)
plot(perf.blocksplsda)
message("Error rate:")
## Error rate:
print(perf.blocksplsda$error.rate)
## $proteo
## centroids.dist
## comp1 0.1428571
## comp2 0.1607143
##
## $metabo_c18
## centroids.dist
## comp1 0.08214286
## comp2 0.06428571
##
## $metabo_hilic
## centroids.dist
## comp1 0.017857143
## comp2 0.003571429
message("AUC:")
## AUC:
print(perf.blocksplsda$auc)
## $comp1
## AUC p-value
## LAT vs WT 0.9993493 6.69088e-05
##
## $comp2
## AUC p-value
## LAT vs WT 0.9999447 6.529187e-05
for (set.c in names(latliv.mset)) {
eset <- latliv.mset[[set.c]]
fdata.df <- Biobase::fData(eset)
loadings.ls <- latliv.blocksplsda[["loadings"]]
loadings.vn <- loadings.ls[[set.c]][, "comp1"]
loadings.vn <- loadings.vn[abs(loadings.vn) > 0]
mixomics.vn <- numeric(nrow(fdata.df))
names(mixomics.vn) <- rownames(fdata.df)
stopifnot(all(names(loadings.vn) %in% names(mixomics.vn)))
mixomics.vn[names(loadings.vn)] <- loadings.vn
fdata.df[, "mixOmics"] <- mixomics.vn
Biobase::fData(eset) <- fdata.df
stopifnot(methods::validObject(eset))
latliv.mset <- MultiDataSet::add_eset(latliv.mset,
eset,
dataset.type = set.c,
GRanges = NA,
overwrite = TRUE,
warnings = FALSE)
}
phenomis::writing(latliv.mset, dir.c = "my_preferred_directory")
Each dataset consists of:
a numeric matrix of intensities (dataMatrix)
a data frame of sample metadata (sampleMetadata)
a data frame of variable metadata (variableMetadata)
Theses 3 tables can be conveniently imported to/exported from R as tabular files:
The following table describes useful Biobase methods for the handling of ExpressionSet objects (see the ‘An introduction to Biobase and ExpressionSets’ documentation from the Biobase package):
Biobase methods | Description |
---|---|
dims(eset) | 2-row numeric matrix of ‘Features’ and ‘Samples’ dimensions |
exprs(eset) | ‘variable times samples’ numeric matrix - dataMatrix |
pData(eset) | sample metadata data frame - sampleMetadata |
fData(eset) | variable metadata data frame - variableMetadata |
sampleNames(eset) | sample names |
featureNames(eset) | variable names |
varLabels(eset) | Column names of the sampleMetadata, pData(eset) |
fvarLabels(eset) | Column names of the variableMetadata, fData(eset) |
Creating an ExpressionSet with Biobase::ExpressionSet
:
lat_dir.c <- ProMetIS::aggregated_dir.c("LAT")
latlivprot_dir.c <- file.path(lat_dir.c, "proteomics_liver")
dataMatrix.mn <- as.matrix(read.table(file.path(latlivprot_dir.c, "dataMatrix.tsv"),
header = TRUE,
row.names = 1,
sep = "\t",
comment.char = "",
quote = "\""))
sampleMetadata.df <- read.table(file.path(latlivprot_dir.c, "sampleMetadata.tsv"),
header = TRUE,
row.names = 1,
sep = "\t",
comment.char = "",
quote = "\"")
variableMetadata.df <- read.table(file.path(latlivprot_dir.c, "variableMetadata.tsv"),
header = TRUE,
row.names = 1,
sep = "\t",
comment.char = "",
quote = "\"")
latlivprot.eset <- Biobase::ExpressionSet(assayData = dataMatrix.mn,
phenoData = Biobase::AnnotatedDataFrame(data = sampleMetadata.df),
featureData = Biobase::AnnotatedDataFrame(data = variableMetadata.df),
experimentData = Biobase::MIAME(title = "lat_proteomics_liver"))
which is equivalent to:
latlivprot.eset <- phenomis::reading(latlivprot_dir.c, output.c = "set")
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 2098 features, 28 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: W616f W617f ... L862f (28 total)
## varLabels: gene mouse_id sex id
## varMetadata: labelDescription
## featureData
## featureNames: Q8R1I1_Cytochrome b-c1 complex . P62852_40S ribosomal
## protein S25 ... Q61553_Fascin (2098 total)
## fvarLabels: set gene ... mixomics (14 total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
Brackets may be used to restrict the ExpressionSet to specific sample or features:
wt.eset <- latlivprot.eset[, grep("W", Biobase::sampleNames(latlivprot.eset))]
Biobase::dims(wt.eset)
## exprs
## Features 2098
## Samples 15
signif.eset <- latlivprot.eset[Biobase::fData(latlivprot.eset)[, "WT.KO_signif"] > 0, ]
Biobase::dims(signif.eset)
## exprs
## Features 275
## Samples 28
Exporting the ExpressionSet in the 3 table format:
phenomis::writing(latlivprot.eset, dir.c = "your_prefered_directory")
The following table describes useful MultiDataSet (and Biobase) methods for the handling of MultiDataSet objects (Hernandez-Ferrer et al. 2016):
MultiDataSet and Biobase methods | Description |
---|---|
MultiDataSet::length(mset) | number of datasets |
MultiDataSet::names(eset) | names of the datasets |
Biobase::dims(mset) | list of 2-element numeric named (‘Features’ and ‘Samples’) vectors of data set dimensions |
MultiDataSet::commonSamples(mset) | multi-dataset restricted to the common samples |
MultiDataSet::as.list(mset) | list of data matrices |
Biobase::pData(mset) | list of sample metadata data frames |
Biobase::fData(mset) | list of variable metadata data frames |
BiocGenerics::subset(mset) | subsetting on common columns from the sampleMetadata and/or variableMetadata |
Creating a MultiDataSet with MultiDataSet::createMultiDataSet
:
lat_dir.c <- ProMetIS::aggregated_dir.c("LAT")
proteo.eset <- phenomis::reading(file.path(lat_dir.c, "proteomics_liver"),
report.c = "none", output.c = "set")
metabo_c18.eset <- phenomis::reading(file.path(lat_dir.c, "metabolomics_liver_c18hypersil_pos"),
report.c = "none", output.c = "set")
metabo_hilic.eset <- phenomis::reading(file.path(lat_dir.c, "metabolomics_liver_hilic_neg"),
report.c = "none", output.c = "set")
latliv.mset <- MultiDataSet::createMultiDataSet()
latliv.mset <- MultiDataSet::add_eset(latliv.mset,
proteo.eset,
dataset.type = "proteomics_liver",
GRanges = NA)
latliv.mset <- MultiDataSet::add_eset(latliv.mset,
metabo_c18.eset,
dataset.type = "metabolomics_liver_c18hypersil_pos",
GRanges = NA)
latliv.mset <- MultiDataSet::add_eset(latliv.mset,
metabo_hilic.eset,
dataset.type = "metabolomics_liver_hilic_neg",
GRanges = NA)
which is equivalent to:
latliv.mset <- phenomis::reading(lat_dir.c,
subsets.vc = c("proteomics_liver",
"metabolomics_liver_c18hypersil_pos",
"metabolomics_liver_hilic_neg"),
report.c = "none",
output.c = "set")
Brackets may be used to restrict the MultiDataSet to specific samples or datasets:
wt.mset <- latliv.mset[grep("W", Biobase::sampleNames(latliv.mset)[["proteomics_liver"]], value = TRUE), ]
Biobase::dims(wt.mset)
## $proteomics_liver
## Features Samples
## 2098 15
##
## $metabolomics_liver_c18hypersil_pos
## Features Samples
## 5665 15
##
## $metabolomics_liver_hilic_neg
## Features Samples
## 2866 15
metabo.mset <- latliv.mset[, c("metabolomics_liver_c18hypersil_pos", "metabolomics_liver_hilic_neg")]
names(metabo.mset)
## [1] "metabolomics_liver_c18hypersil_pos" "metabolomics_liver_hilic_neg"
Note: In case of the restriction to a single dataset, the single bracket (as shown above) will outpout a MultiDataSet object. To get an ExpressionSet, either add the drop = TRUE argument or use the double brackets:
proteo.eset <- latliv.mset[["proteomics_liver"]]
Subsetting based on (common) columns from the sampleMetadata and/or variableMetadata may be performed with BiocGenerics::subset
:
female.mset <- BiocGenerics::subset(latliv.mset, , sex == "F")
# or
signif.mset <- BiocGenerics::subset(latliv.mset, WT.KO_signif > 0, )
Exporting the MultiDataSet into the 3 table format (each dataset will be written as 3 tables in a specific subfolder):
phenomis::writing(latliv.mset, dir.c = "your_prefered_directory")