
Data sources and processing


GENES
The Cancer Genome Atlas (TCGA). All TCGA gene expression data (batch effects normalized mRNA data) were sourced from UCSC Xena (here). Cutoff values for data discretization (required for JSD calculation) are log2(10+1) and log2(1000+1). Processing is done in matrinetR.
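The discretization step can be illustrated with a minimal Python sketch. It is not the matrinetR implementation: the three bin labels, the function name `discretize`, and the choice of which side of each cutoff is inclusive are assumptions made for illustration. Only the two cutoff values come from the text.

```python
import math

# Cutoffs stated in the text, on the log2(x + 1) scale.
LOW_CUTOFF = math.log2(10 + 1)     # about 3.46
HIGH_CUTOFF = math.log2(1000 + 1)  # about 9.97


def discretize(value, low=LOW_CUTOFF, high=HIGH_CUTOFF):
    """Assign a log2(x + 1) expression value to one of three bins.

    Assumption: values below `low` are "low", values at or above `high`
    are "high", and everything in between is "medium".
    """
    if value < low:
        return "low"
    if value < high:
        return "medium"
    return "high"


# Hypothetical log2(x + 1) expression values for one gene.
samples = [0.0, 2.5, 5.0, 9.0, 12.3]
bins = [discretize(v) for v in samples]
print(bins)
```

The same function would apply to the single-cell data sets by passing their own cutoffs (for example `low=1, high=4`).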

The Genotype-Tissue Expression project (GTEx). All GTEx gene expression data (Toil normalized mRNA data) were sourced from UCSC Xena (here). Cutoff values for data discretization (required for JSD calculation) are log2(10+1) and log2(1000+1). Processing is done in matrinetR.
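To show why discretization is needed before the JSD calculation, here is a hedged Python sketch that assumes JSD refers to the Jensen-Shannon divergence between two distributions of discretized values. The bin labels and helper names are hypothetical, and the base-2 logarithm is a choice made here so that the result lies between 0 and 1. It is not taken from matrinetR.

```python
import math
from collections import Counter

# Assumed bin labels; the text only states that two cutoffs are used.
BINS = ("low", "medium", "high")


def bin_frequencies(labels):
    """Turn a list of bin labels into relative frequencies, in BINS order."""
    if not labels:
        raise ValueError("at least one label is required")
    counts = Counter(labels)
    return [counts[b] / len(labels) for b in BINS]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical discretized expression of one gene in two sample groups.
group_a = ["low", "low", "medium", "high"]
group_b = ["medium", "high", "high", "high"]
jsd = jensen_shannon_divergence(bin_frequencies(group_a), bin_frequencies(group_b))
print(round(jsd, 3))
```

Because the mixture `m` is positive wherever `p` or `q` is, the divergence stays finite even when a bin is empty in one group.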


PROTEINS
The Human Protein Atlas (THPA). All THPA staining profile data (Pathology data) were sourced from the THPA Downloadable data page (here). Data are compositional; no discretization is applied and no continuous measures are calculated.

Matrisome DB. All Matrisome DB data (Confidence Score data) were provided courtesy of A. Naba (here). Because each Confidence Score is a single value per gene per study, we augmented the values as follows to allow for calculations: 1) if a single value is found for a gene in a tissue, we generate 10 normally distributed values within 3 SD of that value (taken as the mean); 2) otherwise, we generate 10 normally distributed values per gene and tissue that span from the minimum to the maximum value found. Cutoff values for data discretization (required for JSD calculation) are log2(10+1) and log2(1000+1). Processing is done in matrinetR.
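The two augmentation rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in matrinetR. The text does not give the standard deviation, so `sd` is a hypothetical parameter. For the multi-value case, the sketch assumes the mean is the midpoint of the range and that the range covers plus or minus 3 SD. Rejecting draws outside the interval is also a choice made here.

```python
import random

N_VALUES = 10  # number of values generated per gene and tissue, as stated in the text


def augment_scores(scores, sd=1.0, n=N_VALUES, rng=None):
    """Generate `n` normally distributed values from Confidence Scores.

    `scores` holds the scores found for one gene in one tissue.
    Assumption: `sd` (hypothetical) is used only when a single score is found.
    """
    if not scores:
        raise ValueError("at least one score is required")
    rng = rng or random.Random()
    if len(scores) == 1:
        # Rule 1: the single value is the mean; keep draws within 3 SD of it.
        mean = scores[0]
        lower, upper = mean - 3 * sd, mean + 3 * sd
    else:
        # Rule 2: draws span the observed minimum to maximum.
        # Assumption: midpoint as mean, range equal to 6 SD.
        lower, upper = min(scores), max(scores)
        mean = (lower + upper) / 2
        sd = (upper - lower) / 6
    values = []
    while len(values) < n:
        draw = rng.gauss(mean, sd)
        if lower <= draw <= upper:  # discard the rare draws outside the interval
            values.append(draw)
    return values


single = augment_scores([5.0], sd=1.0, rng=random.Random(0))
several = augment_scores([2.0, 8.0, 4.0], rng=random.Random(1))
print(len(single), len(several))
```

Passing a seeded `random.Random` instance makes the generated values reproducible between runs.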


SINGLE CELLS
Tabula Sapiens. All Tabula Sapiens data (single-cell RNAseq data) were sourced from the Tabula Sapiens hub (here). Epithelial, endothelial and stromal cell collections (as Seurat objects) were downloaded and rearranged into a single data list before submitting to matrinetR (pipeline available on request). Cutoff values for data discretization (required for JSD calculation) are 1 and 4. Processing is done in matrinetR.

The Human Protein Atlas single cell data (scTHPA). All THPA data (single cell RNAseq type tissue cluster data, using pTPM as expression value) were sourced from the THPA Downloadable data page (here). Cells are grouped by type and tissue, and clusters are not considered in the current implementation. Data were downloaded and rearranged into a single data list before submitting to matrinetR (pipeline available on request). Cutoff values for data discretization (required for JSD calculation) are 100 and 400. Processing is done in matrinetR.



Further data and tools


MatrixDB. All ECM interactions are sourced from MatrixDB (here) and interactions involving at least one partner for which a gene name is not available are removed. The resulting network model is provided in matrinetR.
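The filtering rule can be illustrated with a short Python sketch. It assumes interactions are stored as pairs of gene names and that a missing name appears as `None` or an empty string. The function name and the example pairs are hypothetical placeholders, not MatrixDB records.

```python
def filter_named_interactions(interactions):
    """Keep only interactions in which both partners have a gene name.

    Assumption: a missing gene name is represented as None or "".
    """
    return [(a, b) for a, b in interactions if a and b]


# Hypothetical interaction pairs; placeholder names only.
raw = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", None),  # second partner has no gene name: removed
    ("", "GENE_C"),    # first partner has no gene name: removed
    ("GENE_B", "GENE_C"),
]
kept = filter_named_interactions(raw)
print(kept)
```

Dropping the whole pair, rather than the unnamed partner alone, matches the stated rule that any interaction with at least one unnamed partner is removed.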

Matrisome annotations. All matrisome annotations are downloaded from MSigDB (here) and applied to the MatrixDB model. The resulting network model is provided in matrinetR.

MatrinetR. matrinetR is the R package needed to build data formats compatible with MatriNet and to deploy and expand MatriNet locally. It is freely available for download from our GitHub page (here).

 
