
Volume 598, Issue 6 p. 602-620
Review
Open Access

Molecular analysis of the extracellular microenvironment: from form to function

Jade K. Macdonald


Department of Cell and Molecular Pharmacology & Experimental Therapeutics, Medical University of South Carolina, Charleston, SC, USA

Anand S. Mehta


Department of Cell and Molecular Pharmacology & Experimental Therapeutics, Medical University of South Carolina, Charleston, SC, USA

Richard R. Drake


Department of Cell and Molecular Pharmacology & Experimental Therapeutics, Medical University of South Carolina, Charleston, SC, USA

Peggi M. Angel

Corresponding Author


Department of Cell and Molecular Pharmacology & Experimental Therapeutics, Medical University of South Carolina, Charleston, SC, USA

Correspondence

P. M. Angel, Department of Cell and Molecular Pharmacology & Experimental Therapeutics, Medical University of South Carolina, 173 Ashley Avenue BSB 358, Charleston, SC 29425, USA

Tel: +1 843 792-8410

E-mail: [email protected]

First published: 21 March 2024
Edited by Lukas Alfons Huber

Abstract

The extracellular matrix (ECM) proteome represents an important component of the tissue microenvironment that controls chemical flux and induces cell signaling through encoded structure. The analysis of the ECM represents an analytical challenge through high levels of post-translational modifications, protease-resistant structures, and crosslinked, insoluble proteins. This review provides a comprehensive overview of the analytical challenges involved in addressing the complexities of spatially profiling the extracellular matrix proteome. A synopsis of the process of synthesizing the ECM structure, detailing inherent chemical complexity, is included to present the scope of the analytical challenge. Current chromatographic and spatial techniques addressing these challenges are detailed. Capabilities for multimodal multiplexing with cellular populations are discussed with a perspective on developing a holistic view of disease processes that includes both the cellular and extracellular microenvironment.

Abbreviations

AFM, atomic force microscopy

ColXaY, alpha Y chain of collagen type X

DDA, data-dependent acquisition

DESI, desorption electrospray ionization

DIA, data-independent acquisition

ECM, extracellular matrix

FFPE, formalin-fixed paraffin-embedded

H&E, hematoxylin and eosin

HYP, hydroxylation of proline or hydroxylated proline

IHC, immunohistochemistry

IR-MALDESI, infrared-matrix-assisted laser desorption electrospray ionization

iTRAQ, isobaric tag for relative and absolute quantitation

LAESI, laser ablation electrospray ionization

LC–MS/MS, liquid chromatography tandem mass spectrometry

LOX, lysyl oxidase

LOXL, lysyl oxidase-like

MALDI, matrix-assisted laser desorption/ionization

MIBI, multiplexed ion beam imaging

MMP, matrix metalloproteinase

MRM, multiple reaction monitoring

MSI, mass spectrometry imaging

P3H1, prolyl-3-hydroxylase 1

P4HA, prolyl-4-hydroxylase

PC-MT, photocleavable mass tags

PLOD, lysyl hydroxylase

PNGase F, peptide-N-glycosidase F

PSR, picrosirius red

PTM, post-translational modification

QconCAT, quantitative concatemers

SALDI, surface-assisted laser desorption/ionization

SDS/PAGE, sodium dodecyl sulfate/polyacrylamide gel electrophoresis

SEM, scanning electron microscopy

SHG, second harmonic generation

SIMS, secondary ion mass spectrometry

SRM, selected reaction monitoring

THR, triple helical region

TMA, tissue microarray

TME, tissue microenvironment

TMT, tandem mass tag

TOF, time of flight

The extracellular matrix (ECM) is a three-dimensional complex network with significant proteins and post-translational patterns that forms the tissue microenvironment (TME) outside of the cell. Far from being a bare scaffold, the ECM is composed of significant chemical biology due to variations in protein structure forming the encoded architecture that is the basis for tissue structure and function. Notably, the ECM is a proteinaceous composition with significant post-translational modifications (PTMs). The ECM functions as a dynamic molecular highway for cellular signaling via control of chemical gradients, mechanotransduction, protein–protein interactions, receptor binding, and protease-induced remodeling [[1-6]]. Proteins that comprise the ECM include collagens, elastin, glycoproteins, proteoglycan proteins, and enzymes such as matrix metalloproteinases (MMPs) and lysyl oxidases that assemble structure outside of the cell [[7, 8]]. The structure and composition of the extracellular microenvironment are controlled by deposition of collagen superstructures and secretion of other extracellular proteins that regulate tissue homeostasis. Collagens are present within almost all of the animal kingdom, forming the basis of structure and function for all organs through multiscale processing from single chain fibrils to suprastructure bundles of fibers [[9, 10]] (Fig. 1). The analysis of structural features of fibrillar collagens, the surrounding extracellular matrix proteins, and chemical composition poses an analytical challenge. This review focuses on proteomic analysis of the ECM to include fibrillar collagens that form the structure of the tissue microenvironment. Collagen synthesis and its role in producing the extracellular structure is discussed to give context to associated analytical hurdles. 
Methods to visualize the pathology of the extracellular matrix cover historical use of chemical stains and microscopy techniques, chromatographic proteomic approaches, current spatial transcriptomics, and spatial proteomics targeting ECM. A perspective summarizes the potential of the extracellular matrix in the study of human disease.

Fig. 1. Multiscalar components of the extracellular matrix and collagen structure. The top schematic depicts processes involved in multiscalar collagen fibril formation and the post-translational modifications that can occur within each structure. Hydrogen bonds are indicated with red semi-circles accompanied by dotted lines pairing hydrogen bond donors and acceptors. The bottom schematic details the multiscalar small molecules, proteins, protein suprastructures, and cells that comprise the extracellular matrix. Together collagen composition and structure form a dynamic extracellular microenvironment.

Extracellular matrix form leads to function

Within the extracellular matrix, collagens form an intricate structural and cell sensory information network that shapes the basis of tissue health. There are 28 members of the remarkably complex collagen family that forms the suprastructure scaffolding throughout the TME [[11]]. Recent reviews and original studies provide deep detail into the process of extracellular collagen assembly, covered in brief here [[9, 12-14]]. All collagen proteins are characterized by the presence of a unique triple helical quaternary structure with repeat sequences of G-X-P or G-P-X, where X is frequently a nonpolar aliphatic amino acid [[15]]. This triple helical region (THR) encodes multiple protein and cell interaction domains that produce significant cell sensory information [[16, 17]]. Within the THR, proline is the second most abundant amino acid after glycine and is variably modified by hydroxylation, or the insertion of an oxygen, at the 3- or, most commonly, 4-position of the proline residue [[18, 19]]. Hydroxylation of proline (HYP) plays a significant role in hydrogen bonding of the THR [[20]]. Throughout the tissue microenvironment, HYP site variability controls and modulates the exposure of the collagen suprastructure cell binding domains, resulting in the structure–function relationships of specific organs [[21-24]]. The addition of the most common modification, 4-hydroxylation of proline, is an enzymatic process critical to collagen synthesis that uses any one of three cell-specifically produced prolyl-4-hydroxylases (P4HA1, P4HA2, P4HA3), ascorbic acid, iron, oxygen, and succinate to create the HYP site [[23, 25]]. Fibrillar collagens have several hundred potential sites for 4-HYP (putative: Col1a1, 283 sites; Col1a2, 237 sites; Col3a1, 286 sites; Col5a1, 293 sites), creating potential for an immense amount of variability and structural conformations. 
The 3-HYP position is rarer in fibrillar collagens, occurring at fewer residues (6–20 sites mapped [[26]]) and facilitated by prolyl 3-hydroxylase-1 (P3H1) complexing with cartilage-associated protein and cyclophilin B [[27]]. In early synthesis within the Golgi apparatus, HYP sites are added to single-strand chains [[22, 28, 29]]. HYP-facilitated hydrogen bonding occurs after chain combination at the C terminus of single strands to form a triple helical structured propeptide with free strands at the N and C termini [[22]]. Fibrillar collagen Col1a1 and Col1a2 are frequently combined in a 2 : 1 ratio to form an initial triple helical structure [[29, 30]], yet this ratio is dependent on the biological status of the tissue. After secretion from the cell, fibrillar collagens, now in a triple helical “brick,” fuse through enzymatic processes cleaving the N- and C-terminal strands to form insoluble fibrils and fibers with differing width, length, and curvatures. The mechanisms and sequence of assembly that drive structural formation within the extracellular microenvironment are not fully defined and are attributed to cell processing and/or mechanical loading [[9]]. Cellular processing involves, in part, fibripositors, or membrane channels, that form template spaces for organizing individual or groups of fibrils [[31]]. Modeling data that included electron microscopy and laser capture microdissection coupled with proteomics suggested that not all assembly is dependent on cellular processes and may be driven by mechanical loading [[32]]. In this model, free triple helical segments were observed to be secreted early in development and to self-assemble through mechanical loading and aggregation in the absence of cells [[32]]. As part of extracellular assembly, crosslinks are formed between collagen types as well as other ECM proteins. 
Crosslinking occurs through short-lived post-translational lysine hydroxylation, a modification placed by one of three lysyl hydroxylases (PLOD1-3) [[33, 34]]. Disulfide bridge formation by protein disulfide isomerases, glutamate/lysine crosslinks by transglutaminases, and lysine/lysine crosslinking by lysyl oxidase (LOX) or lysyl oxidase-like proteins (LOXL1-4) form crosslinks within the extracellular space [[35-37]]. Crosslinking produces ultrasuprastructure scaffolding throughout the TME to present an additional macro level of chemical biology and cell sensory mechanism. The way that fibers assemble within the extracellular microenvironment has been known for decades to change with disease status [[38, 39]]. Alterations in extracellular assembly are driven by mutations, propeptide splice variants, changes in ECM composition or post-translational modifications, differential crosslinking, and MMP activity [[30, 40-43]]. Consequently, fiber measurements have been used to predict disease progression and outcome [[39, 44, 45]]. The variations and scalability of the biology presented throughout collagen synthesis, secretion, and modeling direct the cell and structure network of TME form and function. These factors also pose analytical challenges toward understanding the complexity of fibrillar collagen composition and organization within the tissue microenvironment.

Scope of the analytical challenge

The cell processing, composition, and multiscalar combinations of the ECM pose analytical challenges toward evaluating a complete TME that includes the extracellular environment. With the exception of small molecules and minerals, the ECM is primarily composed of proteins with significant contributions from post-translational modifications [[46, 47]]. Post-translational modifications (PTMs) of N-linked glycoproteins and glycosaminoglycans form the basis for gradient control and protein–protein interaction within the ECM. These PTMs pose an analytical challenge through complex structural diversity within glycoform groups, high molecular weight, and a diversity of negatively charged functional groups that can be present on the polysaccharides [[48, 49]]. Methods that isolate and decellularize the tissue may damage ECM components, distort the amount of soluble ECM used to shape the microenvironment, and lose important spatial secretory information within a cellular niche. Transcription, which reports the assortment of potential products, frequently operates on a different time scale than the long-lived suprastructure proteins and is unable to report site-specific localization of PTMs. Trypsin, the conventional enzyme used for proteomics, has limited activity against collagen due to suprastructure formation and abundant G-P-X and G-X-P regions with lysine crosslinking. The presence of a significant glycine composition in collagens can produce shorter retention times during chromatographic analysis. The high content and variability in 4- and 3-HYP sites requires analytical approaches that can distinguish peptides with the same mass but with variation in which and where prolines are hydroxylated. The high levels of nonpolar aliphatic amino acids result in differences in ionization and fragmentation potential by mass spectrometry proteomics approaches. 
MMP activity produces significant variation of ECM protein structure [[50]] and, when combined with the use of trypsin, may result in larger numbers of semi-tryptic peptides in database searches. Crosslinking between different ECM proteins means that different strategies are needed for database searching, as a single detected peptide may be a combination of several proteins. Spatially, ECM significantly controls the topography of the TME, and this has been historically addressed with chemical stains that are ambiguous in reporting molecular composition. The majority of studies in spatial proteomics have mimicked chromatographic studies by using trypsin, resulting in the same limitations as tryptic LC–MS/MS approaches. From the collective information on biological processing of ECM, it is clear that analytical approaches directed at ECM must be proteomic in nature but with significant changes to the standard trypsin proteomic workflow. These changes include implementation of nontraditional proteolytic enzymes, alterations in chromatography based on compositional analysis, capabilities to evaluate complex PTMs and site variation, increased spatial resolution to define cellular neighborhoods, new tools to investigate the signaling response to cell type expression, and unique database searching approaches.
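
The HYP positional isomer problem noted above can be illustrated with a minimal sketch. The peptide sequence below is hypothetical and chosen only to resemble a collagen-like tryptic peptide; the function names are illustrative and do not correspond to any published tool. The sketch enumerates every way of placing two hydroxylations on the prolines of the peptide and shows that all placements share one monoisotopic mass, so precursor mass alone cannot localize the modification.

```python
from itertools import combinations

# Monoisotopic residue masses (Da) for the residues used in this toy peptide.
RESIDUE_MASS = {
    "G": 57.02146,
    "P": 97.05276,
    "A": 71.03711,
    "R": 156.10111,
}
WATER = 18.01056   # added once per peptide for the free N and C termini
OXYGEN = 15.99491  # mass shift of a single proline hydroxylation


def peptide_mass(sequence, hyp_positions=()):
    """Monoisotopic mass with hydroxylation at the given 0-based proline indices."""
    for pos in hyp_positions:
        if sequence[pos] != "P":
            raise ValueError(f"position {pos} is not a proline")
    base = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return base + OXYGEN * len(hyp_positions)


def hyp_isomers(sequence, n_hyp):
    """All ways of placing n_hyp hydroxylations on the prolines of a sequence."""
    prolines = [i for i, aa in enumerate(sequence) if aa == "P"]
    return list(combinations(prolines, n_hyp))


peptide = "GPPGAPGPR"  # hypothetical sequence, an assumption for illustration
isomers = hyp_isomers(peptide, 2)
masses = {round(peptide_mass(peptide, iso), 4) for iso in isomers}

for iso in isomers:
    print(iso, round(peptide_mass(peptide, iso), 4))
print(len(isomers), "positional isomers;", len(masses), "distinct precursor mass")
```

Distinguishing such isomers therefore depends on fragment ions or chromatographic separation rather than on the intact peptide mass.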

ECM proteomics by chromatography and mass spectrometry sequencing

The study of ECM proteomics by liquid chromatography coupled to tandem mass spectrometry (LC–MS/MS) has been elegantly summarized to cover the last 10 years [[51]]. Current techniques in ECM proteomics involve sample preparation techniques to enrich ECM proteins, alternative methods to solubilize and digest ECM for analysis, specialized ECM methods for mass spectrometry data acquisition, and database tools to identify ECM proteins [[52-54]].

ECM sample preparation techniques

Extracellular matrix proteomic preparation techniques include ECM enrichment by decellularization or ECM fractionation by solubility and can be followed by SDS/PAGE [[51-53, 55]]. Decellularizing the tissue, or removing the cells, is a first pass enrichment step. For LC–MS/MS proteomics, decellularization is done at the tissue level and must be customized per organ tissue type and disease state [[56-58]]. This is done by incubating tissue samples in lysis buffers [[59]] or solutions containing detergents, like sodium dodecyl sulfate (SDS) [[55, 58, 60-63]]. The soluble fractions contain cellular proteins, whereas the insoluble fraction is enriched for ECM proteins, including collagen types. The ECM fraction is solubilized with high concentrations of chaotropic agents, like urea or guanidine, and addition of reducing reagents such as dithiothreitol [[61, 62]]. A quantitative detergent-based method has been developed that does not include a decellularization step, and instead fractionates ECM based on solubility into buffers with increasing salt and detergent (SDS) concentration [[64]].

ECM digestion techniques

After ECM enrichment, ECM proteins are digested prior to LC–MS/MS analysis. Typical proteomics techniques rely on trypsin, an enzyme that cleaves on the C-terminal side of lysine and arginine residues except when the next residue is proline, to digest a diverse range of proteins. The high proline content of collagens, dense post-translational modifications, and crosslinked insoluble fibers make collagens relatively resistant to trypsin and can result in limited coverage of collagen-type proteins. Glycosylation is a significant PTM within the ECM and poses significant steric hindrance. To improve steric access to the protein structure, ECM-targeted methods have implemented deglycosylation with peptide-N-glycosidase F (PNGase F, removing N-linked glycans) [[62, 65, 66]], or with chondroitinase ABC [[67]], keratanase, or heparanase [[60]] (removing glycosaminoglycans), prior to digestion. For increased targeting and access to differential protein structure, alternate enzymes and non-enzyme approaches have been used alone or in combination with trypsin. These approaches include collagenase [[58, 68, 69]], pepsin [[67]], or LysC [[61, 64, 65]] and chemical digestion by hydroxylamine and cyanogen bromide [[70, 71]].
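
To make the effect of proline on tryptic cleavage concrete, the following is a minimal sketch of the in silico rule commonly used in database search tools (cleave after K or R unless the next residue is P). The sequence is a hypothetical collagen-like stretch and the function names are illustrative. The rule is a simplification: it ignores crosslinks, PTMs, missed cleavages, and the steric effects of the suprastructure discussed above.

```python
import re


def tryptic_digest(sequence):
    """Split after K or R unless the next residue is P (simplified trypsin rule)."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", sequence) if pep]


def proline_blocked_sites(sequence):
    """Count K/R residues immediately followed by P, which the rule leaves uncut."""
    return len(re.findall(r"[KR]P", sequence))


collagen_like = "GPKPGERGPPGKPGARPGS"  # hypothetical sequence for illustration

peptides = tryptic_digest(collagen_like)
print(peptides)
print("K/R sites blocked by proline:", proline_blocked_sites(collagen_like))
```

In this toy sequence, three of the four K/R residues are followed by proline, so only one cleavage occurs and one long peptide remains, which is the pattern that motivates the alternate enzymes listed above.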

ECM LC–MS/MS data acquisition and analysis techniques

LC–MS/MS experiments for ECM currently leverage proteomics approaches similar to those used in other cellular fractionation methods. Chromatographic approaches use combinations of reverse phase and strong cation exchange separations and may be paired with SDS/PAGE. Acquisition strategies that include data-dependent acquisition (DDA) to produce proteomic libraries are useful in targeted regional analysis [[72]] and with non-tryptic enzymes [[68, 73]]. Data-independent acquisition (DIA) is useful on tryptic digests of ECM, increasing reproducible peptide identification and increasing coverage of the tryptic matrisome [[74]]. Quantification methods using tandem mass tags (TMT) [[75]], isobaric tags for relative and absolute quantitation (iTRAQ) [[76]], selected reaction monitoring (SRM) [[70]], multiple reaction monitoring (MRM) [[77]], and quantitative concatemers (QconCATs) [[78-80]] have been employed to understand ECM protein regulation. Significantly, matrisome databases and data tools have been created to collect curated studies of the ECM across tissue types, with visualization of PTM and structural entities [[54, 81, 82]].

Spatial analysis of the extracellular microenvironment

Spatial analysis works to report relationships between single cell types to cellular neighborhoods throughout the TME. The extracellular microenvironment represents an important component of this crosstalk. The emergence of spatial biology has now resulted in a plethora of techniques focused on detecting single cells, yet less so on defining the in situ molecular components of the extracellular microenvironment that provide functional communication between cells. Historically, chemical staining has been the most common way to gain spatial information on the extracellular microenvironment. Chemical stains (Table 1) combined with microscopy methods target collagen structure and report distribution, intensity, and fiber measurements. Electron and photon microscopy techniques use the birefringent properties of the extracellular matrix to produce high spatial resolution images of collagen fibers surrounding tissue features or within cellular regions. Spatial transcriptomics and proteomics work to detect multiplexed combinations of extracellular components at the transcript level or protein level. New techniques in spatial proteomics leverage mass spectrometry imaging to target and access the ECM proteome (Fig. 2). Current trends in spatial omics are emerging to combine all of these techniques as multimodal, multiplexed studies to form a complete systems biology portrait of the TME. This section discusses historical and current trends in the spatial analysis of the extracellular microenvironment.

Table 1. Chemical stains for extracellular matrix visualization. Staining reagents that comprise the dye, their staining substrates, and the resulting color of the stained substrates are summarized. If known, staining reagent properties, such as acidity or basicity, optimal staining pH, and proposed staining mechanism are listed.
Stain | Staining reagents | Substrates stained | Color | Staining mechanism (optimal pH) | First used
Hematoxylin & Eosin | Mayer's Hematoxylin (potassium or ammonium aluminum sulfate mordant, sodium iodate oxidation) | Nuclei, rough ER, ribosomes | Black | Basic dye (pH 2.4–2.9); binds negatively charged residues (DNA) [[85]] | 1876
 | Eosin | Cytoplasm, ECM | Pink | Acidic dye (pH 5.0); binds basic residues |
Masson's trichrome | Weigert's Hematoxylin (ferric chloride mordant, natural oxidation) | Nuclei, rough ER, ribosomes | Black | Basic dye (pH 2.4–2.9); binds negatively charged residues (DNA) [[85]] | 1929
 | Acid Fuchsin with Ponceau | Cytoplasm and connective tissue | Red | Acidic dye (pH 2.5); binds basic residues [[115]] |
 | Aniline blue (methyl blue and Water blue) | Collagens | Blue | Acidic dye |
Movat's pentachrome | Weigert's or Verhoeff's Hematoxylin (ferric chloride mordant, natural or iodine oxidation) | Nuclei | Black | Basic dye (pH 2.4–2.9); binds negatively charged residues (DNA) [[85]] | 1955
 | Saffron | Collagen | Yellow | Unknown |
 | Alcian Blue (Methylene blue) | Polysaccharides | Blue | Basic dye (pH 2.5); binds protonated acidic residues (–HSO4 and –OH) |
 | Acid Fuchsin | Muscle fibers | Red | Acidic dye; binds basic residues [[115]] |
 | Crocein | Fibrin | Bright red | Unknown |
Herovici | Weigert's Hematoxylin (ferric chloride mordant, natural oxidation) | Nuclei, rough ER, ribosomes | Black | Basic dye (pH 2.4–2.9); binds negatively charged residues (DNA) [[85]] | 1963
 | Acid Fuchsin with picric acid | Muscle fibers | Red | Acidic dye; binds basic residues [[115]] |
 | Aniline blue (methyl blue and Water blue) | Collagens | Blue | Acidic dye |
Picrosirius Red | Weigert's Hematoxylin (ferric chloride mordant, natural oxidation) | Nuclei, rough ER, ribosomes | Black | Basic dye (pH 2.4–2.9); binds negatively charged residues (DNA) [[85]] | 1968
 | Sirius red (Red 80) with picric acid | Collagens | Birefringent yellow-red | Acidic dye; binds basic residues [[115]]; binds parallel to collagen fibers [[207]] |

Fig. 2. Approaches to analyzing multiscalar features of the extracellular microenvironment including collagen fibers. An assortment of analytical techniques has been used to probe the molecular and physical composition of the ECM. PTMs, post-translational modifications.

Chemical stains

The Hematoxylin and Eosin (H&E) stain is considered the gold standard for clinical applications and has been used in pathological workflows to diagnose diseases. H&E was one of the first stains reported for use on connective tissue in 1876 [[83]]. Hematoxylin or its oxidized form, hematein, is a stain initially extracted and synthesized from the logwood tree [[84, 85]]. It binds to negatively charged DNA and RNA molecules, staining nuclei purple. The hematoxylin stain comes in many forms based on the oxidizer and mordant used. The water-soluble and acid-resistant properties of hematoxylin make it compatible with downstream acidic counterstains [[86]]. The most common hematoxylin used in H&E stains is Mayer's hematoxylin, which is oxidized by sodium iodate and uses potassium or ammonium aluminum sulfate as the mordant. Eosin is ethanol-soluble and stains collagen, muscle fibers, and cytoplasm pink [[87]]. Combined, H&E can stain with up to five shades of pink, providing information on cellular morphology as well as distribution and patterning of collagen features [[86, 88]]. H&E cannot distinguish types of collagen. H&E may be used with second harmonic generation microscopy to report collagen fiber measurements (length, width, curvature) related to features within the TME [[89]].

Masson's Trichrome staining was first published in 1929 [[90]] and involves a nuclear stain (Weigert's hematoxylin: brown/black), a cytoplasmic and muscular tissue stain (acid fuchsin with ponceau: red) and a connective tissue stain (aniline blue (methyl blue): blue or green) [[91]]. It is shown to stain collagens blue or green dependent on basic residues as well as muscle fibers red or yellow depending on their maturity [[92]]. Masson's Trichrome staining has been used to stain collagen and assess its structure and organization [[93, 94]] and is oftentimes used alongside other ECM imaging techniques such as immunohistochemistry, second harmonic generation, and other imaging modalities [[95-98]].

Saffron, which comes from the stigmata of the saffron flower, stains collagen fibers yellow and is typically used in conjunction with hematoxylin and either eosin [[99, 100]] or, in earlier years, phloxine [[101, 102]]. Saffron is also one of the five stains in Movat's pentachrome stain. This stain dyes acidic carbohydrates blue (Alcian Blue), nuclei black (Weigert's hematoxylin), elastin dark purple (resorcin-fuchsin), muscle and fibrinoid red (woodstain-scarlet-acid fuchsin), and collagen fibers yellow (saffron) [[103]]. Saffron stains may be used as a single stain or in combination to quantify collagen fibers by a yellow intensity readout [[104]].

Herovici's picropolychrome stain [[105]] stains muscle fibers and cytoplasm yellow (picric acid), and produces either a fuchsia/red (acid fuchsin) or blue (methyl blue) staining in collagenous regions [[106, 107]]. Methyl blue is pH-dependent, binding to exposed basic sites, leading to the characteristic that blue stains represent immature collagen fibers. The exact target composition of the Herovici stain remains undefined. Herovici's stain has been used to report and relatively quantify Collagen Types I and III as well as immature and mature collagen [[108-113]], with controversy regarding the type of collagen that is stained.

Picrosirius Red (PSR) [[114]] selectively stains collagens reddish orange. The binding mechanisms have been shown to be electrostatic, meaning the dye can be removed with a low pH [[115]]. Combining PSR with the use of a polarized filter in microscopy can enhance birefringence of finer collagen fibers and aid in reporting collagen organization across the tissue. The ability of PSR to distinguish between collagen types remains undefined [[116, 117]].

Atomic/electron/photon microscopy techniques for fiber measurement

Microscopy techniques for assessing collagen fibers and overall ECM organization include atomic force microscopy (AFM), scanning electron microscopy (SEM) and second harmonic generation (SHG) [[38, 118, 119]]. These techniques do not chemically alter or ablate tissues, making them compatible with downstream analyses. In AFM, a mechanical probe is scanned across the surface of tissues and cantilever movements are detected to provide a three-dimensional image of a tissue's surface with a lateral resolution of 1 nm and a vertical resolution of 0.1 nm [[120]]. AFM may be used to spatially assess physical properties including stiffness, elasticity, and adhesion [[121]]. In SEM, a high-energy beam of electrons is stepped across unstained, thinly cut tissue sections [[122, 123]]. The readout is emitted, diffracted, and backscattered electrons focused through lenses and detected by a secondary electron detector. SEM resolution depends on the incident probe diameter and can produce features as small as 1 nm. In SHG, irradiation of tissue results in two incident photons with the same frequency simultaneously impacting a birefringent material, creating the emission of one photon with twice the frequency and same direction propagation as the incident photons [[124, 125]]. When imaged with a lens that filters for the second harmonic wavelength, the resulting image allows quantitative analysis of the distribution, shape, and orientation of collagens [[126]]. SHG microscopy resolution depends on the excitation wavelength, which typically ranges from 400 to 1000 nm [[127]]. There are numerous variations of SHG microscopy that work to measure the different physical properties arising from irradiation of birefringent extracellular matrix [[125]]. 
SHG has also been multiplexed with two-photon microscopy (multiphoton imaging) to visualize collagens within surrounding tissue architecture and with H&E staining to co-register collagen fiber information with pathological features [[128-130]].
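
The frequency doubling described above fixes the relationship between excitation and emission: two incident photons of wavelength λ combine into one photon of wavelength λ/2, with energy conserved. A small sketch of this arithmetic follows. The 900 nm excitation value is an arbitrary example chosen from within the range quoted above, and the function names are illustrative.

```python
import math

PLANCK = 6.62607015e-34       # Planck constant, J s
LIGHT_SPEED = 299_792_458.0   # speed of light in vacuum, m/s


def second_harmonic_wavelength_nm(excitation_nm):
    """Emission wavelength for SHG: twice the frequency means half the wavelength."""
    return excitation_nm / 2.0


def photon_energy_joules(wavelength_nm):
    """Photon energy E = h c / wavelength."""
    return PLANCK * LIGHT_SPEED / (wavelength_nm * 1e-9)


excitation_nm = 900.0  # example value, an assumption for illustration
emission_nm = second_harmonic_wavelength_nm(excitation_nm)

# Energy conservation: two incident photons yield one emitted photon.
assert math.isclose(2 * photon_energy_joules(excitation_nm),
                    photon_energy_joules(emission_nm))
print(f"{excitation_nm:.0f} nm excitation -> {emission_nm:.0f} nm SHG signal")
```

This relationship is why the detection filter is selected for half the excitation wavelength.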

Spatial transcriptomics

Spatial transcriptomics allows for spatially distributed expression profiles of RNA within fixed or frozen tissue sections with a diversity of workflows [[131-137]]. A general workflow uses UV-labile oligonucleotide tags on antibodies or in situ hybridization probes to bind to target RNA or proteins over the entire tissue section. In certain workflows, regions of interest may be identified by fluorescently labeled antibody markers of the cell types of interest. The areas of interest are irradiated with UV light, which releases probes for amplification (GeoMx Digital Spatial Profiler, Nanostring). This process can produce spatial information from single cells to thousands of cells with multiplexed information at the protein and genetic level. In most cases, the tissue must be placed onto a specified target area, which requires access to stored tissue blocks for specialized sectioning. Commercialization of these techniques has increased the accessibility of multiplexed studies that include ECM profiles. Spatial transcriptomics lacks information on protein composition and post-translational modification. In addition, the timing of transcription and of ECM modeling may be vastly different [[138, 139]].

Multiplexed imaging by antibodies

Multiplexed antibody approaches target specific proteins through antibody binding epitopes down to single cell levels. These approaches use fast mass spectrometry analyzers, such as time of flight (TOF), paired with detectors, such as multichannel detectors [[140, 141]], that are capable of capturing multiplexed signal. Mass cytometry [[142, 143]] or CyTOF® uses antibodies coupled with rare metal tags irradiated by a laser beam to visualize single cells down to 1 μm within the TME. Mass cytometry is generally restricted to a small area around 800 μm² where the tissue is completely ablated during analysis. Multiplexed Ion Beam Imaging (MIBI-TOF) [[144, 145]] uses combinations of rare metals detected by stepping an ion beam composed of oxygen ions (O₂⁺) [[146]] across the surface to release the tags. MIBI-TOF works within a select region around 800 μm² and has a spatial resolution of < 200 nm. MIBI-TOF is a surface scan, removing the top molecular layers for analysis. Photocleavable mass tags (PC-MTs) are a recent development that can be leveraged for multiplexed studies by any mass spectrometry instrument equipped with a matrix-assisted laser desorption/ionization (MALDI) source [[147]]. PC-MTs on antibodies are applied to the entire tissue and released by brief exposure to UV light, and the tags are detected in place after application of a chemical matrix that facilitates ionization. PC-MTs may be used to evaluate anything from entire tissue sections to smaller, high spatial resolution regions [[148]] and are likely compatible with other sources such as desorption electrospray ionization. A pitfall with all antibody approaches is that the mechanism of detection involves binding to a specific epitope within the protein structure. This greatly limits detection of detailed protein domain modulation, including post-translational modifications.

Laser capture microdissection coupled to proteomics

In this approach, a laser is used to isolate a small region, which is then captured and processed by microproteomics techniques [[149]]. Microproteomics typically uses the same enzymatic approaches and chromatography techniques as conventional proteomics studies. Current studies use multiplexed markers of disease to target specific regions for microproteomics [[72, 150]]. This is a powerful approach for comprehensive proteomic assessment of the ECM, especially when paired with other modalities such as SEM or nonlinear birefringent techniques [[32]]. An advantage is the proteomic depth achieved by microproteomics when compared to other tissue imaging proteomics. A main limitation is that removal of the region limits understanding of its association with the surrounding tissue, and the removed regions cannot be used for further multiplexed targeting of specific analytes.

Mass spectrometry imaging of ECM

Mass spectrometry imaging (MSI) has been used for decades to investigate the spatial distribution of many types of analytes [[151-153]]. Mass spectrometry imaging requires a source capable of focusing on a discrete x and y coordinate for downstream analysis and detection. The source is stepped across the tissue or two-dimensional target in an array fashion and at each x and y coordinate produces a mass spectrum that contains hundreds to thousands of analyte peaks. Each peak represents an image channel that can be mapped across the entire tissue section. There are many sample preparation approaches dependent on the study hypothesis and configurations of mass spectrometry instrumentation [[154-156]]. This review reports on detection of ECM by MSI, which we term ECM-MSI.
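The idea that each peak is an image channel can be illustrated with a minimal sketch. It assumes a toy dataset in which every pixel carries its own list of m/z values and intensities; the function name `ion_image`, the 10 ppm window, and the 2 × 2 acquisition are illustrative assumptions rather than any vendor's data format or processing method.

```python
import numpy as np

def ion_image(coords, spectra, target_mz, tol_ppm, shape):
    """Sum the intensity found within a ppm window of target_mz at each pixel."""
    image = np.zeros(shape, dtype=float)
    half_width = target_mz * tol_ppm * 1e-6  # window half-width in m/z units
    for (x, y), (mz, intensity) in zip(coords, spectra):
        mz = np.asarray(mz, dtype=float)
        intensity = np.asarray(intensity, dtype=float)
        in_window = np.abs(mz - target_mz) <= half_width
        image[y, x] = intensity[in_window].sum()  # row index = y, column index = x
    return image

# Hypothetical 2 x 2 acquisition: one (m/z list, intensity list) pair per pixel.
coords = [(0, 0), (1, 0), (0, 1), (1, 1)]
spectra = [
    ([740.390, 1242.582], [10.0, 5.0]),
    ([740.391, 1681.811], [20.0, 7.0]),
    ([1242.583], [3.0]),
    ([740.500, 1242.582], [50.0, 4.0]),
]

img_740 = ion_image(coords, spectra, target_mz=740.390, tol_ppm=10, shape=(2, 2))
img_1242 = ion_image(coords, spectra, target_mz=1242.582, tol_ppm=10, shape=(2, 2))
```

Each call yields one two-dimensional array, so a spectrum with hundreds of peaks yields hundreds of such arrays over the same pixel grid. In the toy data, the peak at m/z 740.500 falls outside the 10 ppm window and does not contribute to the 740.390 image.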

Ionization sources

Many different ionization methods have been developed in the field of mass spectrometry imaging (MSI). Secondary ion mass spectrometry (SIMS) has been used to spatially profile ECM proteins [[157]]. Other ionization techniques, initially used for small molecule analysis, have recently been optimized to image peptides and proteins but have yet to be applied to the ECM. These include infrared-matrix-assisted laser desorption electrospray ionization (IR-MALDESI, small peptides [[158, 159]] and glycans [[160]]), surface-assisted laser desorption ionization (SALDI [[161, 162]]), desorption electrospray ionization (DESI [[163-165]] and nanoDESI [[166-169]]), and laser ablation electrospray ionization (LAESI) [[170, 171]]. MALDI-MSI has been used as a main approach to image peptides derived from the ECM [[68, 172]]. This review will focus on the use of matrix-assisted laser desorption/ionization mass spectrometry imaging (MALDI-MSI) for ECM analysis.

Considerations for ECM sample preparation and peptide identification by ECM-MSI

Accessing the extracellular matrix proteome by MALDI-MSI leverages enzyme specificity to produce analytes from target protein structure. MALDI-MSI may be done as a single analytical approach or used with multimodal, multiplexed strategies to visualize the ECM (Fig. 3). The image data obtained by MALDI-MSI of the ECM contains spatial distribution of peptides produced from enzymatic digestion of targeted protein structures. Identification is done after MALDI-MSI in a separate step on the same tissue section [[173, 68]]. The workflow is highly synchronous with immunohistochemistry approaches [[68, 155, 173]]. Enzymes may be used on fresh-frozen tissue or fixed tissue and are sprayed onto the tissue using automated sprayers to maintain localization. Fresh-frozen tissue requires clearing of metabolites and lipids prior to enzymatic digestion, whereas formalin-fixed tissue requires additional steps that remove the formalin crosslinks between proteins by antigen retrieval [[172, 174, 175]]. Both clearing and antigen retrieval are required to increase access to the protein structure. The general workflow includes tissue clearing and/or antigen retrieval, enzymatic spraying and digestion, chemical matrix application for ionization (oftentimes α-cyano-4-hydroxycinnamic acid for proteins and peptides), and mass spectrometry data acquisition. Improvements in instrumentation and sample preparation have allowed for increases in sensitivity, throughput, mass accuracy, and resolution. Considerations for customizing sample preparation include the type of tissue being imaged, the enzymes used for targeting the ECM proteins or PTMs and required spatial resolution. MALDI ECM-MSI has been done on fresh-frozen tissues, formalin-fixed paraffin-embedded (FFPE) [[68, 176-178]] tissues, and tissue microarrays (TMAs) [[179-183]]. Embedded tissues or TMAs are sectioned with a typical thickness of 3–7 μm for FFPE tissues and 8–10 μm for fresh-frozen tissues [[173, 184]]. 
It should be noted that tissue thickness has been shown to alter MALDI-MSI peak intensities in FFPE tissues [[185]]. The type of instrument available dictates the type of substrate the sample may be mounted on. For charge decoupled sources where MALDI is coupled with an ion funnel, a standard microscope slide may be used. In our group, very little charge build-up has been observed over hundreds of slides analyzed. For charge coupled sources (MALDI-TOF), a conductive slide must be used, typically an indium tin oxide coated slide; these slides are commercially available. Decellularization may be done on fresh-frozen samples at the tissue or organ level to enrich ECM proteins prior to enzymatic digestion [[186]]. Enzymatic techniques used to report spatial distribution of the ECM include untargeted approaches using trypsin [[172]] or pepsin [[187]], and targeted approaches using elastase (elastin) [[68, 188]], PNGase F (N-linked glycans) [[189]], EndoF (core-fucosylated N-glycans) [[190]], chondroitinase ABC (chondroitins) [[191]], isoamylase (glycogen) [[192]], or collagenase (collagens and 40–60 other ECM proteins) [[68]]. When choosing enzymes, it is important to consider the consensus site of the target. Tryptic digests of proline-rich proteins, like collagen, show reduced coverage when compared to proteins with lower proline content due to limited cleavage when lysine/arginine is followed by a proline residue [[193]]. N-linked glycosylation is restricted by consensus site to N-X-S/T, where X cannot be proline. Fibrillar collagens have limited consensus sites; however, more than 75% of ECM proteins surrounding the collagens are N-linked glycosylated and have on average 4.3 N-linked consensus sites. Deglycosylation greatly improves access to the ECM protein structure for increased detection of ECM proteins [[194]]. 
Spatial studies reporting ECM expression have used either trypsin [[176-178, 182, 195]] or collagenase Type III [[68, 180, 181, 191, 194, 196-198]] in combination with deglycosylation.
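The two consensus-site rules described above can be checked in silico before an enzyme is chosen. The following is a minimal sketch that assumes the simple rules as stated (trypsin cleaves C-terminal to lysine or arginine unless the next residue is proline; an N-linked sequon is N-X-S/T with X not proline). The function names and the toy sequence are hypothetical, and missed cleavages and other enzyme behaviors are not modeled.

```python
import re

def tryptic_peptides(sequence):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    # Zero-width split points: preceded by K/R and not followed by P.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def n_glycan_sequons(sequence):
    """Return 1-based positions of N in N-X-S/T sequons where X is not P."""
    # The lookahead lets overlapping sequons be reported.
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", sequence)]

# Hypothetical toy sequence, not taken from any real protein.
toy = "AKPGRNGTKNPSR"
peptides = tryptic_peptides(toy)   # the K-P bond at position 2 is not cleaved
sequons = n_glycan_sequons(toy)    # N-P-S at position 10 is excluded
```

For the toy sequence, the lysine followed by proline stays within the first peptide, and only the N-G-T motif counts as a sequon. Running such a check over a proline-rich sequence gives a rough sense of why tryptic coverage drops for collagens.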

Fig. 3. Integrative MALDI-MSI targeting the extracellular microenvironment. (A) Overall Spatial Imaging Workflow by MALDI-MSI. The MALDI mass spectrometry imaging workflow details integral steps in imaging the matrix proteome of tissues on standard microscope slides by charge decoupled sources. Steps that are depicted are indicated with a black outline. * designates optional techniques that can be performed on the same tissue section; these are included at optimal locations within the workflow. Techniques discussed that are performed prior to MALDI-MSI of ECM include GeoMx, MALDI-immunohistochemistry (IHC), second harmonic generation microscopy (SHG), scanning electron microscopy (SEM), immunohistochemistry (IHC) and atomic force microscopy (AFM). (B) Hematoxylin and eosin (H&E) stains are pathologically annotated. H&E-staining of a representative formalin-fixed paraffin-embedded (FFPE) hepatocellular carcinoma tissue section enables pathological annotation of tumor (black) and stroma (blue) regions. (C) Tissues are sampled at discrete points with a laser. Shown are the designated sites where the laser will sample the tissue prior to actual acquisition. (D) Mass spectra are produced from each pixel. A MALDI mass spectrum shows representative mass range and m/z values detected from ECM-MSI. (E) Heuristic spatial clustering produces localized ECM proteomes. Example segmentation image analysis was produced from mass spectra from 79 341 pixels. Segments were produced from heuristic spatial clustering using bisecting k-means and the Manhattan metric. A legend of the cluster analysis is shown on the right. (F) Multiplexed image collection. Example of 280 MALDI-MS images representing a fraction of the peptides detected in tissue with high levels of stroma. (G) Mass spectrometry images are produced from individual m/z values. Images are matched by high mass accuracy to peptides derived from LC–MS/MS of the same tissue section. 
Individual MALDI-MS images show spatial distribution of peptides with m/z values of 740.390 (P740), 1242.582 (P1242), and 1681.811 (P1681). Yellow indicates high peptide intensity and blue indicates low peptide intensity.

A challenge with ECM-MSI is that identification is done in a separate workflow but on the same tissue section, either by creating a reference library to match by accurate mass or by using a targeted approach to sequence a peptide or analyte of interest. Targeted identification can be done by on-tissue MALDI-MS/MS [[179, 199]], or by using an in silico reference library [[182, 183]]. Using the same tissue section, an untargeted reference library may be created by LC–MS/MS from either locally-digested, liquid-extracted peptides [[177]] or in-solution-digested peptides from homogenized whole tissue [[178, 196]] or homogenized macrodissected regions of interest [[176]].
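Matching imaged peaks to a reference library by accurate mass reduces to a parts-per-million (ppm) comparison. The sketch below is a minimal illustration under stated assumptions: the library m/z values, the labels `peptide_A` to `peptide_C`, the function name `match_peaks`, and the 5 ppm tolerance are all hypothetical and are not identifications or settings from the studies cited here.

```python
from bisect import bisect_left

def match_peaks(observed_mz, library, tol_ppm=5.0):
    """Match each observed m/z to the closest library entry within tol_ppm.

    library is a list of (theoretical_mz, label) tuples.
    Returns {observed_mz: (label, ppm_error)} for matched peaks only.
    """
    entries = sorted(library, key=lambda e: e[0])
    masses = [mz for mz, _ in entries]
    matches = {}
    for obs in observed_mz:
        i = bisect_left(masses, obs)
        best = None
        # Only the neighbors on either side of the insertion point can be closest.
        for j in (i - 1, i):
            if 0 <= j < len(masses):
                ppm = (obs - masses[j]) / masses[j] * 1e6
                if abs(ppm) <= tol_ppm and (best is None or abs(ppm) < abs(best[1])):
                    best = (entries[j][1], ppm)
        if best is not None:
            matches[obs] = best
    return matches

# Hypothetical reference library, standing in for an LC-MS/MS-derived peptide list.
library = [
    (740.3921, "peptide_A"),
    (1242.5850, "peptide_B"),
    (1681.9000, "peptide_C"),
]
observed = [740.390, 1242.582, 1681.811]
matches = match_peaks(observed, library, tol_ppm=5.0)
```

In this toy case the first two peaks fall within 5 ppm of a library entry, whereas the third is roughly 53 ppm from the nearest entry and is left unmatched. A mass match alone does not confirm sequence, which is why targeted on-tissue MS/MS remains a complementary route.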

Multimodal multiplexing with ECM spatial proteomics by MALDI-MSI

An advantage of ECM-MSI is that, with careful planning, any of the enzymes may be used serially and combined with other omic approaches, thus producing a more complete view of the TME. This has been done with combinations of glycans, chondroitins, tryptic peptides, elastase peptides, and collagenase peptides [[191, 200-202]]. Single and multiplexed enzymes are highly compatible with common pathology stains, including H&E, done on the same tissue section [[194, 203]]. This opens the prospect for multimodal approaches that combine photon microscopy, such as second harmonic generation, to investigate collagen fiber measurements. Immunohistochemistry (IHC) is a well-established approach that probes epitopes of interest with primary antibodies. Their presence is detected by a chromophoric readout produced either directly from the primary antibody or indirectly through a conjugated secondary antibody. IHC and MALDI-MSI have been used together to further validate findings and increase biological information associated with the study [[176, 182, 183, 195, 204, 205]]. ECM-MSI using collagenase and IHC can be performed on the same FFPE tissue section with high reproducibility and without reduction in downstream ECM signal compared to unstained tissue sections [[196]]. A comprehensive multimodal multiplexing study on the same tissue sections was recently completed, evaluating the single cell type approaches GeoMx Digital Spatial Profiling, CyTOF, and PC-MTs (Ambergen) in combination with multiplexed ECM-MSI using PNGase F to release N-glycans and collagenase to target the ECM [[206]]. Single cell type approaches were used as “drop-in” methods without modifying the manufacturer's protocols to evaluate placement in the workflow. The overall conclusion was that antibody-directed workflows were best done as a first step prior to ECM-MSI. 
Combining modalities produces a complete view of cellular and extracellular imaging data and provides context on how specific cell types function within the surrounding extracellular microenvironment.

Perspective

The analysis of both the cellular and extracellular microenvironment presents a holistic view of the TME with enormous potential to impact human health (Fig. 4). The matrisome remains a complexity of post-translational modifications, mutations, splice variants, translational variation and regulation, crosslinking, and remodeling that we have simply not defined in depth at any one point. All of these variables of the matrisome can differ for each organ. A significant part of the problem is that the tools needed for a complete investigation of the tissue microenvironment that include the extracellular microenvironment have not been developed. Enzymes, not antibodies, represent considerable tools for targeting the extracellular matrix proteome. Multiplexed enzymatic approaches may be used either collectively or in serial to systematically define the matrisome. Although we have at hand spatial approaches to define the extracellular matrix “unit” along with cell signaling and composition, work needs to be done to fully integrate these modalities from analytics to data reporting and visualization. These integrative cell and ECM units would be best represented at a systems biology level, where across the tissue topography, one can report the cell interactive neighborhood along with ECM form and function. Collagen proteins should be represented as networked structures rather than single entities as it is the domain regulation that reflects tissue function. At a cellular level, we understand very little about how each cell type produces the cell-specific extracellular microenvironment. The interactive neighborhood specifically of fibroblast subtypes remains undefined and immense knowledge remains to be gained from how the fibroblast interactome responds to other cell types, especially immune cells, in different tissue and disease states. 
All current proteomic studies are limited in their ability to differentiate the secretory or waste products that cells extrude into the extracellular space or proximal fluid, and these products present a novel reservoir for disease markers. Many diseases are driven by changes in ECM deposition, yet the clinical significance of the matrisome remains largely untapped. It is expected that mining of the matrisome, especially with spatial data, will produce novel risk-of-progression markers, refine therapeutic targets, report on therapeutic response, and stratify patients for clinical trial selection. With this in mind, the spatial matrisome presents a critical part of precision medicine but is, as of yet, a nearly unexplored frontier.

Fig. 4. Perspective on where continued work is needed on cell-ECM interactions, matrisome output and the clinical utility of the matrisome.

Acknowledgements

JKM was supported by the Cellular, Biochemical and Molecular Sciences Training Program 5T32GM132055 (NIH/NIGMS). PMA was supported by NIH/NCI R01CA253460; P20GM103542 (NIH/NIGMS) and in part by Hollings Cancer Center Support Grant P30 CA138313 at the Medical University of South Carolina. Supported in part by the Biorepository & Tissue Analysis Shared Resource, Hollings Cancer Center, Medical University of South Carolina (P30 CA138313) and MUSC Digestive Disease Research Center (P30 DK123704) (NIH/NIDDK). We appreciate assistance from Jaclyn Dunne scanning H&Es for figures. The contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH or NCATS. Figures were created with the help of BioRender.com.