How it works | matrinet

The founding idea of MatriNet is to estimate parameters describing the "strength" of protein-protein interactions (PPI) within the ECM using a prior network model.

Many network-building protocols, packages and databases exist. We are not among them. We source the list of expected ECM PPIs (aka the "matrisome connectome", kudos to A. Naba) from the expert-curated models in MatrixDB, after removing all interactions involving an element (e.g., a glycosaminoglycan such as heparin) for which a gene name is not available. Any data source is then compared against this model, and several measures of association between the two elements of each interacting pair are calculated using our freely available R package matrinetR.
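As a rough illustration of the filtering step, the sketch below drops every interaction in which at least one partner lacks a gene name. It is not the matrinetR code: the function name `filter_edges`, the identifier-to-gene mapping and the toy interaction list are all assumptions made for this example.

```python
# Minimal sketch (not matrinetR code): keep only interactions whose two
# partners both have a gene name. All names and records here are toy values.

def filter_edges(interactions, gene_names):
    """Return the pairs in which both elements map to a gene name."""
    kept = []
    for elem_a, elem_b in interactions:
        gene_a = gene_names.get(elem_a)
        gene_b = gene_names.get(elem_b)
        if gene_a and gene_b:  # drops elements mapped to None or not mapped
            kept.append((gene_a, gene_b))
    return kept

# Hypothetical mapping: heparin is a glycosaminoglycan, so it has no gene name.
gene_names = {"FN1": "FN1", "COL1A1": "COL1A1", "COL1A2": "COL1A2", "Heparin": None}
interactions = [("FN1", "COL1A1"), ("FN1", "Heparin"), ("COL1A1", "COL1A2")]

model_edges = filter_edges(interactions, gene_names)
print(model_edges)
```

Only the two protein-protein pairs remain in `model_edges`; the pair involving heparin is discarded before any association measure is computed.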

The key part of matrinetR output is the "matrigraph" object, a list containing annotations about the elements of the model in one data frame (the "node_df" element) and the association estimates for each edge of the network model in a second data frame (the "edge_df" element). While MatriNet tools use both data frames for visualization purposes, only the latter (edge_df) is made available for download in our Download section.
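To make the shape of such an object concrete, here is a minimal Python stand-in for a matrigraph: a container with a node table and an edge table. Only the names `node_df` and `edge_df` come from the description above; every column name and every number below is a made-up assumption, and the real object is an R list of data frames.

```python
import csv
import io

# Hypothetical stand-in for a matrigraph. Column names and values are
# illustrative assumptions, not the actual matrinetR schema.
matrigraph = {
    "node_df": [
        {"gene": "FN1", "annotation": "toy annotation"},
        {"gene": "COL1A1", "annotation": "toy annotation"},
    ],
    "edge_df": [
        {"from": "FN1", "to": "COL1A1", "JSD": 0.91, "Cor_C": 0.40},
    ],
}

def edge_table_to_csv(graph):
    """Serialize the edge table only, mirroring the downloadable part."""
    rows = graph["edge_df"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = edge_table_to_csv(matrigraph)
print(csv_text)
```

Each row of the edge table describes one interacting pair from the prior model, with one column per association measure that could be computed for it.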

The measures of association calculated by matrinetR are (those featured in the current MatriNet version are in bold):

- correlation (Cor_C)

- our derivative of the Jensen-Shannon Divergence (JSD), already published in the original MatriNet manuscript

- mutual information (MI_D)

- pairwise distances (PWD) as point-wise sum (sum_D), difference (diff_D) and product (prod_D).

MatriNet shows networks according to any of the above measures (in bold), but users can download matrinetR and explore any other single measure locally. Note that not all measures are necessarily available for all data sources. For example, data from THPA are compositional (frequency of samples staining as "low", "medium" or "high" for any given ECM protein) and thus not suitable for Cor_C, MI_D and PWD calculation; in this case, only JSD is computed. Additionally, not all data sources necessarily provide estimates for exactly the same edges. Although the prior network model is always the same for all data entering our pipeline, matrinetR checks for available and missing gene names, available genes without missing values, and available genes with fewer than 5% missing values over samples in any group. As different sources may pass or fail these checks differently, minor differences in total network composition might appear.
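The missing-value check can be pictured with the short sketch below. It is an illustration under stated assumptions, not the matrinetR implementation: the function name `passes_missing_check` is hypothetical, missing values are represented as `None`, and "in any group" is read here as "the threshold must hold in every group".

```python
# Minimal sketch (assumed logic, not matrinetR code): a gene is retained
# only if its fraction of missing values is below 5% in every sample group.

def passes_missing_check(values_by_group, max_missing=0.05):
    """values_by_group maps a group label to that gene's values (None = missing)."""
    for values in values_by_group.values():
        if not values:
            return False  # no samples at all in this group
        missing = sum(1 for v in values if v is None)
        if missing / len(values) >= max_missing:
            return False
    return True

# Toy data: 1 missing value out of 40 (2.5%) passes, 1 out of 10 (10%) fails.
gene_ok = {"group_1": [1.0] * 39 + [None], "group_2": [2.0] * 40}
gene_bad = {"group_1": [1.0] * 40, "group_2": [2.0] * 9 + [None]}

print(passes_missing_check(gene_ok), passes_missing_check(gene_bad))
```

Because such a filter depends on the samples present in each source, the same prior edge can be retained for one data source and dropped for another.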

The measures of association we employ are all meant to highlight a common "behavior" of any two nodes expected to interact in the prior model, though they evaluate different "aspects" of association, as follows:

Cor is the degree to which two variables move in coordination with one another (from Wikipedia). While this is the most intuitive form of association between two variables, it can only capture linear (monotonic) associations and is easily misled when the distributions of the two variables differ markedly or when outliers are present (depending on how abundant they are).
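The sensitivity to outliers is easy to reproduce. The sketch below assumes the Pearson coefficient (the text above does not state which coefficient matrinetR uses) and applies it to toy numbers: two variables that are perfectly anti-correlated, then the same data with one extreme sample appended.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Toy data: y decreases exactly as x increases.
x = [1, 2, 3, 4, 5, 6]
y = [6, 5, 4, 3, 2, 1]
r_clean = pearson(x, y)

# One extreme sample, high in both variables, is enough to flip the sign.
r_outlier = pearson(x + [100], y + [100])

print(round(r_clean, 3), round(r_outlier, 3))
```

Six of the seven samples still follow the original inverse relationship, yet the coefficient moves from perfectly negative to strongly positive.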

JSD is a method of measuring the similarity between two probability distributions (from Wikipedia). While the formulation of the JSD might sound daunting to biologists, it is easy to interpret as a measure of the association between two variables, given how well their distributions support each other. In other words, JSD estimates the loss of information that is observed when one variable is used to describe the other. Variables with globally similar distributions (e.g., two genes co-regulated by the same transcription factor) will likely describe each other well (they won't be spread differently across the range of their standardized values), and thus there will be little to no loss of information when one variable is used to describe the other. It follows that when two variables are likely associated, the actual JSD value approaches zero (as they don't diverge from one another). As this would be a confusing measure for the concept of "strength", we developed a JSD derivative that moves in the opposite direction to the actual JSD and approaches one when the two variables are strongly associated.
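A minimal numeric sketch of this idea follows. It computes the standard Jensen-Shannon divergence with base-2 logarithms, which bounds it between 0 and 1, and then uses `1 - JSD` as a stand-in for a score that approaches one for strongly associated variables. The exact form of the MatriNet derivative is not given in the text above, so `jsd_strength` is an assumption for illustration only, as are the toy "low / medium / high" frequencies.

```python
import math

def kl_divergence(p, m):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / mi) for pi, mi in zip(p, m) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def jsd_strength(p, q):
    """Assumed stand-in for the derivative: 1 for identical distributions."""
    return 1.0 - jsd(p, q)

# Toy staining frequencies for two proteins over ("low", "medium", "high").
similar_a = [0.2, 0.3, 0.5]
similar_b = [0.2, 0.3, 0.5]
opposite_a = [1.0, 0.0, 0.0]
opposite_b = [0.0, 0.0, 1.0]

print(jsd_strength(similar_a, similar_b), jsd_strength(opposite_a, opposite_b))
```

Identical distributions give a divergence of zero and therefore a strength of one, while distributions with no overlap give a divergence of one and a strength of zero.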

The processed data are then provided to the MatriNet tools (MatriNet GX, LX and CX) for online evaluation and visualization. Tutorials explain further how to use MatriNet.

The MatriNet pipeline

Current MatriNet data types