The promise and the challenges of cryo‐electron tomography

Structural biologists have traditionally approached cellular complexity in a reductionist manner in which the cellular molecular components are fractionated and purified before being studied individually. This ‘divide and conquer’ approach has been highly successful. However, awareness has grown in recent years that biological functions can rarely be attributed to individual macromolecules. Most cellular functions arise from their concerted action, and there is thus a need for methods enabling structural studies performed in situ, ideally in unperturbed cellular environments. Cryo‐electron tomography (Cryo‐ET) combines the power of 3D molecular‐level imaging with the best structural preservation that is physically possible to achieve. Thus, it has a unique potential to reveal the supramolecular architecture or ‘molecular sociology’ of cells and to discover the unexpected. Here, we review state‐of‐the‐art Cryo‐ET workflows, provide examples of biological applications, and discuss what is needed to realize the full potential of Cryo‐ET.


Edited by John Briggs
Keywords: cellular structural biology; correlative light-electron microscopy; cryo-electron tomography; image processing workflow; sample preparation workflows; structural biology in situ
A comprehensive understanding of the inner workings of cells needs more than knowledge of their molecular inventories or the sum of individual molecular structures [1,2]. When cells are taken apart and molecules are released from their functional environment, all information about their interactions and context is irrecoverably lost. It is common belief now that cellular functions are not the result of random collisions of individual molecules; they rather require the concerted actions of functional modules [3,4]. Many of these exist only transiently, while others, being more stable, may be so deeply rooted in their cellular environment that they cannot be isolated without violation of their structural integrity. Hence, there is a compelling need for methods that allow visualizing the molecular architecture of cells in situ [5].
Abbreviations: CLEM, Correlative light-electron microscopy; Cryo-EM, Cryo-electron microscopy; Cryo-ET, Cryo-electron tomography; Cryo-FM, Cryo-fluorescence light microscopy; CTF, Contrast transfer function; FIB, Focused-ion beam; FIB-SEM, Ion-beam block face scanning electron microscopy; TEM, Transmission electron microscope; VPP, Volta phase plate; ZPP, Zernike phase plate.
There is no single method that could give us the whole picture; it rather requires an 'imaging across scales' approach and the integration of data covering different length scales (Fig. 1). At one end of the spectrum are the methods providing high-resolution structures of isolated and purified molecules, such as X-ray crystallography, nuclear magnetic resonance spectroscopy (NMR), and cryo-electron microscopy (cryo-EM) single-particle analysis. In recent years, the latter method has emerged as the most versatile method unrivalled in particular when it comes to large and flexible macromolecular assemblies [6]. At the other end of the spectrum are methods allowing the visualization of whole cells and their dynamics such as super-resolution light microscopy. When combined with ion-beam block face scanning electron microscopy (FIB-SEM), striking multimodal views of large volumes can be obtained [7]. Cryo-electron tomography (cryo-ET) provides a crucial link between whole-cell imaging and high-resolution structure determination. It provides molecular resolution 3D images of cellular landscapes, but is restricted in volume because larger samples must be thinned to < 1 µm to render them electron transparent [8]. Otherwise, the only preparation step is vitrification by rapid freezing, yielding pristinely preserved samples. Chemical fixation and staining which bear the risk of altering the macromolecular organization of cells are avoided altogether.

Sample preparation
Small objects such as viruses [9][10][11][12], isolated organelles [13], cell appendages [14][15][16], small bacteria [17], or minicells [18] can be studied in toto. For cellular structural studies, cells are grown on grids which must be non-cytotoxic. To have some control over where cells grow, avoiding, for example, areas obstructed by grid bars, micropatterning methods can be used [19] (Fig. 2). Vitrification of single cells not exceeding a thickness of ~5 µm can be achieved by plunge freezing [20]. For thicker objects, high-pressure freezing is indispensable to avoid ice-crystal formation and its deleterious consequences for cellular ultrastructure [21,22].
After vitrification most cells require thinning to render them electron transparent. Serial sections can be cut with a cryo-microtome, but compression artifacts and poor reproducibility have limited the use of this method [23]. Focused-ion beam (FIB)-milling has become the method of choice for compression-free specimen thinning [24,25]. By controlling the stream of Ga+ ions, different thinning geometries (lamellae, wedges) can be realized. In combination with cryo-compatible micromanipulators, slabs of high-pressure frozen samples can be lifted out and placed on EM grids for further thinning. This cryo-FIB lift-out method expands the range of samples that can be studied by cryo-ET to eukaryotic cells, multicellular organisms, and tissues [26].

Fig. 1.
Imaging across scales aims at a detailed and comprehensive description of the cellular space. This can be achieved by the integration of high-resolution structures into large volume data. In situ cryo-ET provides a crucial link by bridging the resolution gap (dotted region) between ex situ high-resolution structures and low-resolution large volume data. Boxes of the structural methods below the scale bar indicate the attainable resolutions (left) and scales over which information can be obtained (right). The volume of the box corresponds to a typical volume of a single tomogram. It is populated with imaginary particles of sizes typical for the various high-resolution methods (red NMR, green X-ray crystallography, blue single-particle analysis). In this scheme, the occupancy of the cellular space is underrepresented (2-3%). In reality, cells are much more crowded (occupancy 20-30% of the volume).

Fig. 2.
Cryo-ET workflow from sample preparation to tomogram acquisition. Sample thickness determines the vitrification method and the subsequent processing steps, producing samples with thicknesses (< 0.5 µm) allowing electron imaging. Cryo-ET can be applied to a broad range of samples from isolated macromolecules to multicellular organisms. Arrows indicate possible workflow directions for different sample groups. For adhesive cells, the use of lasers for drawing patterns (micropatterning) provides control over where cells attach and can induce different morphologies. Vitrification is mostly achieved under atmospheric pressure either by plunge freezing, where samples are applied to EM grids, blotted, and plunged into a cryogen kept close to −196 °C, or microfluidic vitrification, where a rapid temperature drop is achieved by switching off the heater element between the sample and the heat sink. Thicker specimens need to be vitrified by high-pressure freezing (2045 bar) and may need the addition of cryoprotectants. Micromanipulation (box) includes optional steps that can be combined depending on experimental requirements. CLEM utilizes cryo-FM to identify regions of interest and enables targeted thinning of samples using FIB-milling. Cryo-FIB lift-out allows examining parts of bulky specimens. Cryo-electron microscopy of vitreous sections (CEMOVIS) can be used for initial trimming of bulky specimens. Finally, specimens are imaged using an electron beam in a fashion similar to computer-aided tomography. The main difference is that projections of different views are obtained by rotating the sample in the beam instead of rotating the beam around the sample.

FEBS Letters 594 (2020) 3243-3261 © 2020 Federation of European Biochemical Societies
Correlative light-electron microscopy (CLEM) allows for the identification and localization of features or events of interest in large cellular landscapes, and their precise targeting for FIB-milling [27][28][29][30]. Cryogenic super-resolution optical fluctuation imaging (cryo-SOFI) further improves the localization precision of fluorescently labeled proteins beyond the diffraction limit [31]. However, the signal localization precision of live cell imaging suffers from a time delay, between imaging and vitrification, on the scale of seconds. Recently, microfluidics cryofixation devices have been used to address this issue. They are able to arrest a particular cellular state within milliseconds by enabling vitrification on the imaging stage [32].

Data acquisition
To obtain a tomogram of the area of interest a series of projection images is recorded, while tilting the sample in the microscope. After refining the eucentric height over the recording position, four steps need to be repeated for each tilt in the series: tilting, centering, focusing, and recording. The angular increments can be either equally spaced or closer together as the tilt angle increases [33]. The tilt series range is usually limited to ±60°. Different mono- and bidirectional tilt schemes are used to distribute the electron dose in a preferred sequence. The now widely used dose-symmetric tilt scheme changes the tilt direction for every step, starting at 0° and climbing its way up to the maximum tilt [34,35]. The main advantage of starting data collection at low angles, where the path of the electrons through the sample is shortest, is the preservation of high-resolution information before the onset of serious radiation damage. In addition, almost symmetrical distribution of the electron dose minimizes alignment jumps notorious for bidirectional tilt schemes.
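The acquisition order of a dose-symmetric scheme can be illustrated with a short sketch. It assumes equally spaced integer tilt angles, a 3° increment (an illustrative value, not one given in the text), and one particular variant in which the stage visits the two angles of each magnitude as a pair and reverses the pair order at every step; acquisition software may group tilts differently. The function name `dose_symmetric_tilts` is hypothetical.

```python
def dose_symmetric_tilts(max_tilt=60, step=3):
    """Return tilt angles (degrees) in one possible dose-symmetric order.

    Assumptions for illustration: integer angles, equal increments,
    and a maximum tilt that is a multiple of the step.
    """
    order = [0]  # start at 0 degrees, where the electron path is shortest
    for angle in range(step, max_tilt + 1, step):
        # Reverse the side visited first at every step, so that
        # low-angle views on both sides are recorded early.
        if (angle // step) % 2 == 1:
            order.extend([angle, -angle])
        else:
            order.extend([-angle, angle])
    return order


if __name__ == "__main__":
    print(dose_symmetric_tilts(60, 3)[:7])
```

With these assumed settings the series contains 41 tilts, and the accumulated exposure grows with the absolute tilt angle on both sides of the series.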
Because of the radiation sensitivity of frozen-hydrated biological material, the exposure to the electron beam must be minimized. Optimal exposure is dependent on both the sample sensitivity to ionizing radiation and the resolution aimed at. At exposures around 160 e·Å⁻², gas bubbles develop causing severe structural damage [36]. Distributing the total allowable electron dose over a tilt series leads to low signal-to-noise ratios (SNR) of the individual images. In addition, for higher tilt angles the path electrons traverse through the slab-shaped samples increases, resulting in multiple electron scattering further reducing the SNR. Moreover, obtaining views over the whole 180° angular range is not possible due to the physical design of the specimen holder that obstructs the beam path at higher tilt angles. The limited angular sampling range of projections results in the 'missing wedge' of information in Fourier space that causes elongation artifacts in reconstructed tomograms parallel to the electron beam direction [37]. In transmission electron microscopes (TEM) with a dual-axis stage, a tilt series can be recorded in two directions, reducing the missing information to a missing pyramid yielding a more isotropic resolution [38].
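Two of the geometric effects described above can be made concrete with a small calculation. The sketch assumes an ideal flat slab and single-axis tilting; the 200 nm thickness in the usage note is an illustrative value, not one taken from the text, and both function names are hypothetical.

```python
import math


def effective_thickness(thickness_nm, tilt_deg):
    """Path length of the beam through a flat slab tilted by tilt_deg."""
    return thickness_nm / math.cos(math.radians(tilt_deg))


def missing_wedge_fraction(max_tilt_deg):
    """Fraction of the 180 degree range not sampled by a +/- max_tilt series."""
    return (180.0 - 2.0 * max_tilt_deg) / 180.0


if __name__ == "__main__":
    # Assumed example: a 200 nm lamella imaged over a +/-60 degree range.
    print(effective_thickness(200.0, 60.0))
    print(missing_wedge_fraction(60.0))
```

Under these assumptions the path length doubles at 60° tilt, and a ±60° series leaves one third of the angular range unsampled.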
An additional factor to consider when recording a tomogram is magnification, which determines the theoretically achievable maximum resolution according to the Nyquist-Shannon sampling theorem [39]. Here, one has to keep in mind that when magnification is increased the intensity of the beam needs to be increased as well, in order that the number of electrons per pixel stays the same and that frame and tilt series alignment are not compromised. Therefore, the electron exposure of the specimen, measured in electrons/Å², increases quadratically with magnification; that is for a 2-fold increase in magnification, the electron exposure of the sample increases 4-fold. Selecting optimal recording parameters needs to reconcile the optimization of sampling with the need to limit the cumulative exposure.
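The trade-off between sampling and exposure can be written down in a few lines. The sketch assumes that the pixel size at the specimen level is inversely proportional to magnification and that the Nyquist limit is twice the pixel size; the numerical values and function names are illustrative and not taken from the text.

```python
def nyquist_limit(pixel_size_angstrom):
    """Finest resolvable period (in Å) for a given pixel size at the specimen."""
    return 2.0 * pixel_size_angstrom


def exposure_per_area(electrons_per_pixel, pixel_size_angstrom):
    """Specimen exposure (electrons/Å^2) needed for a given count per pixel."""
    return electrons_per_pixel / pixel_size_angstrom ** 2


if __name__ == "__main__":
    # Assumed example: doubling the magnification halves the pixel size.
    for pixel_size in (2.0, 1.0):
        print(pixel_size, nyquist_limit(pixel_size), exposure_per_area(10.0, pixel_size))
```

In this assumed example, halving the pixel size from 2 Å to 1 Å improves the Nyquist limit from 4 Å to 2 Å, while the exposure needed for the same 10 electrons per pixel rises from 2.5 to 10 electrons/Å², the 4-fold increase stated above.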

Tomogram reconstruction
The projection images are pre-processed and aligned before a tomogram is reconstructed. Nowadays, preprocessing benefits from the fast readout and improved SNR provided by direct electron detectors. Recording in movie mode allows the correction of beam-induced sample motions, which otherwise limit resolution [40], and it allows for low-pass filtering of movie frames in such a way that the filter cutoff is following the gradual demise of high-resolution structural information as the total exposure to the electron beam accumulates [41]. Recently, fiducial-based motion correction has been extended to the whole tilt series, reducing the effect of out-of-plane sample deformations during exposure [42]. Next, alignment of the tilt series brings the projections into a common register. The addition of nano-sized high-contrast fiducial markers to the samples, typically 5-20 nm gold beads, increases the alignment accuracy. The tilt series alignment of FIB-milled lamellae without fiducials requires patch tracking [43] or tracking of intrinsic high-contrast features [44]. The functionality for automated tracking and tilt series alignment as well as routines for tomogram reconstruction are integrated in, for example, the widely used IMOD software package [43] or in the Protomo implementation inside Appion [45]. For sufficient levels of phase contrast for cryo-ET, a tilt series is conventionally acquired in underfocus. To successfully restore the high-resolution image information, it is necessary to accurately determine the defocus and correct for the contrast transfer function (CTF). The simplest approaches divide each tilt image into strips or patches and estimate the defocus gradient perpendicular to the tilt axis [46][47][48]. For sub-tomogram averaging more accurate 3D CTF correction methods consider also the particle height in the tomogram [49,50].
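The defocus gradient that strip-based CTF estimation has to account for follows from simple geometry. The sketch below assumes a perfectly flat specimen, strips running parallel to the tilt axis, offsets measured in the image plane perpendicular to the tilt axis, and a sign convention in which positive offsets lie at larger defocus; the function name and all numerical values are hypothetical.

```python
import math


def strip_defocus(center_defocus_um, tilt_deg, strip_offsets_um):
    """Expected defocus (µm) of strips parallel to the tilt axis.

    A point at image-plane offset x from the tilt axis sits at a height of
    x * tan(tilt) relative to the axis, which shifts its defocus by the
    same amount (sign convention assumed for illustration).
    """
    slope = math.tan(math.radians(tilt_deg))
    return [center_defocus_um + x * slope for x in strip_offsets_um]


if __name__ == "__main__":
    # Assumed example: 4 µm underfocus at the tilt axis, 45 degree tilt.
    print(strip_defocus(4.0, 45.0, [-1.0, 0.0, 1.0]))
```

In practice the defocus of each strip is estimated from its power spectrum; a geometric model of this kind only describes how the estimates are expected to vary across the image.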
The aligned tilt series data are then computationally reconstructed into a tomogram (Fig. 3) by any of the available reconstruction methods. Most commonly, weighted back-projection (WBP) [51,52] is used for its speed and preservation of high-resolution information, although direct Fourier inversion methods have certain advantages [44]. Iterative reconstruction methods yield improved contrast compared to WBP [53] and have been shown to reduce the distortions caused by the 'missing wedge' [54][55][56]. These characteristics are useful for visual tomogram interpretation and for sub-tomogram alignment [53]; however, for sub-tomogram averaging, unlike WBP, high-frequency information may be lost [53]. Recently, improved iterative methods outperformed the WBP method in the 2-3 nm resolution range [56][57][58].
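The principle of weighted back-projection can be sketched for a single 2D slice perpendicular to the tilt axis. This is a minimal illustration, not the implementation used by any of the packages cited above: it assumes parallel-beam geometry, an unapodized ramp weighting, linear interpolation, and a hypothetical array layout of one 1D projection per tilt.

```python
import numpy as np


def ramp_filter(projections):
    """Weight each 1D projection by |frequency| (the 'weighting' step)."""
    n = projections.shape[1]
    weights = np.abs(np.fft.fftfreq(n))
    spectrum = np.fft.fft(projections, axis=1) * weights
    return np.real(np.fft.ifft(spectrum, axis=1))


def weighted_back_projection(projections, tilts_deg):
    """Reconstruct an n x n slice from projections of shape (n_tilts, n)."""
    n = projections.shape[1]
    coords = np.arange(n) - n // 2
    x, z = np.meshgrid(coords, coords, indexing="xy")
    filtered = ramp_filter(projections)
    slice_xz = np.zeros((n, n))
    for proj, tilt in zip(filtered, np.radians(tilts_deg)):
        # Detector coordinate onto which each pixel projects at this tilt.
        t = x * np.cos(tilt) + z * np.sin(tilt)
        smeared = np.interp(t.ravel(), coords, proj, left=0.0, right=0.0)
        slice_xz += smeared.reshape(n, n)
    return slice_xz / len(tilts_deg)


if __name__ == "__main__":
    # Assumed example: a point object on the tilt axis, +/-60 degrees in 3 degree steps.
    n = 33
    tilts = np.arange(-60, 61, 3)
    projections = np.zeros((len(tilts), n))
    projections[:, n // 2] = 1.0
    recon = weighted_back_projection(projections, tilts)
    print(np.unravel_index(np.argmax(recon), recon.shape))
```

Because the tilt range in the example is limited to ±60°, the reconstructed point is not isotropic, which is the real-space counterpart of the missing wedge.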

Tomogram interpretation
Interpretation of structural features from reconstructed tomograms is especially challenging for crowded cellular systems and low SNR datasets. Denoising filters enhance structural features by suppressing noise, while preserving most of the signal. For example, non-linear anisotropic diffusion enhances edges [59,60], whereas more specialized filters are used to track filaments or to segment membranes [61,62]. Recently, convolutional neural networks have taken advantage of the fast detector readout to train a noise model on the basis of frames [63].
Denoising filters greatly facilitate segmentation. Image segmentation not only aids visual interpretation but also enables quantitative analysis, such as distance-based analysis of specific protein populations to segmented membranes [64,65]. Macromolecules can be located using templates derived from already known structures [66,67]. However, finding different molecular species in complex systems relies on the availability of large template libraries and the computationally demanding template search is usually limited to a few molecular species. Constraining the search space, where prior knowledge is available, can greatly facilitate the speed and accuracy of template matching. For example, the search space for membrane-embedded molecular complexes can be constrained to segmented membranes. Several software packages augment the labor-intensive manual segmentation [68,69]. Convolutional neural networks can be trained on segmented data [70] or to extract features automatically, in an unsupervised fashion [71]. While automation reduces user bias and increases throughput, some manual intervention is still required to set boundary conditions.

Fig. 3.
Projection images of a tilt series are aligned and back-projected into a tomographic volume. To facilitate particle picking, denoising filters can be applied and features, such as membranes, can be segmented to further constrain the search space. Resolution of the extracted subtomograms can be improved by averaging and classification methods. Finally, a tomographic model is generated by placing extracted objects back into their refined 3D orientations, enabling the analysis and quantitative interpretation of their spatial distribution in the context of the cellular environment.
To avoid bias and enable detection of novel complexes, it is desirable to develop methods that do not rely on predefined templates, but rather use the tomogram data for pattern mining. Recently, a template-free detection method was developed and successfully applied to membrane-bound complexes [72], and a multi-pattern pursuit method was introduced for automated detection of frequently occurring structural features in tomograms [73].

Sub-tomogram averaging
When tomograms contain repetitive features, resolution can be improved by averaging multiple copies of structures of interest using methods akin to those being used in single-particle analysis. Extracted sub-tomograms are iteratively aligned and averaged to increase their SNR and resolution. For every iteration, the relative rotations and shifts between sub-tomograms and the reference structure are determined by a cross-correlation similarity metric [53]. Applying these orientations yields a new reference, which is used to refine the sub-tomogram orientations in the subsequent iteration round. Refinement is repeated until convergence or a specified iteration number has been reached. To avoid overfitting, the reference structure is filtered between iteration rounds to its estimated resolution by comparing resolution shells of two independently reconstructed volumes obtained by splitting the dataset into two halves [74]. In addition, maximum likelihood methods further reduce the risk of overfitting by allowing each sub-tomogram to simultaneously contribute to different orientations in a weighted manner [75]. To reduce the compositional and conformational heterogeneity and to further improve the averaging results, it is necessary to classify the sub-tomograms into more homogeneous subsets. Different averaging and classification methods have co-evolved and among other features consider the missing wedge by performing constrained cross-correlations [67,76], provide speed improvements [77,78], and employ different classification strategies [72,79-81]. Instead of finding subpopulations by global comparison, masks can be applied to specific areas for focused classification. To reduce bias, masks can be automatically generated from the local variance between sub-tomogram averages generated from subsets of the whole dataset [82].
In this way, different assembly, functional, and binding states of in situ 26S proteasomes have been successfully identified as well as two distinct oligomeric forms of tripeptidyl peptidase II [64,83,84].
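The comparison of resolution shells between two independently reconstructed half-set averages, mentioned above as a safeguard against overfitting, is commonly computed as a Fourier shell correlation (FSC). The sketch below is a minimal illustration assuming cubic volumes, integer-width shells, and no masking or missing-wedge weighting; the function name is hypothetical, and sub-tomogram averaging packages implement more elaborate versions.

```python
import numpy as np


def fourier_shell_correlation(half1, half2):
    """FSC between two cubic half-maps, one value per integer frequency shell."""
    n = half1.shape[0]
    f1 = np.fft.fftshift(np.fft.fftn(half1))
    f2 = np.fft.fftshift(np.fft.fftn(half2))
    # Integer shell index of every Fourier voxel (distance from the origin).
    grid = np.indices(half1.shape) - n // 2
    shell = np.rint(np.sqrt((grid ** 2).sum(axis=0))).astype(int)
    n_shells = n // 2
    keep = shell < n_shells
    idx = shell[keep]
    cross = np.bincount(idx, weights=np.real(f1 * np.conj(f2))[keep], minlength=n_shells)
    power1 = np.bincount(idx, weights=(np.abs(f1) ** 2)[keep], minlength=n_shells)
    power2 = np.bincount(idx, weights=(np.abs(f2) ** 2)[keep], minlength=n_shells)
    return cross / np.sqrt(power1 * power2)


if __name__ == "__main__":
    # Assumed example: identical volumes correlate perfectly in every shell.
    rng = np.random.default_rng(0)
    volume = rng.normal(size=(16, 16, 16))
    print(fourier_shell_correlation(volume, volume))
```

The resolution estimate is then read off as the spatial frequency at which the curve falls below a chosen threshold, and the reference is low-pass filtered to that frequency before the next iteration.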

Viruses and bacterial S-layers
Structural studies of viruses have been driving methods developments in 3D EM since the early days [123]. Sub-tomogram averaging has been used widely for studies of pleomorphic viruses. Examples include the murine leukemia retrovirus [92,93], rubella virus [94], herpes simplex virus type I [9,95], and the severe acute respiratory syndrome coronavirus 2 [96,97]. Sub-tomogram averaging of viral lattices with high local order resulted in the first sub-nanometer resolutions for the ex situ Ebola nuclear capsid [10] and fullerene-like HIV-1 capsid [11]. For the well-behaved immature HIV-1 capsid, near-atomic 3.9 Å resolution was achieved [12]. This was further improved to 3.4 Å by applying 3D CTF correction [50], and more recently to 3.1 Å with emClarity, a sub-tomogram averaging package [87]. In addition to sub-tomogram averaging and data-collection advances, new insights into the assembly and maturation of viral capsids have been obtained.
Application of these developments has resulted in the first 4.8 Å in situ cryo-ET structure of Caulobacter crescentus surface layer [121] (Fig. 4A). S-layers are 2D arrays of proteins that provide protection and structural support to many bacteria and archaea. Using an integrative approach combining cryo-ET, single-particle averaging, and native mass spectrometry (MS), a high-resolution structural model of the S-layer was generated. This provided insights into the S-layer lattice organization, its interaction with surface lipopolysaccharides and dependence on calcium ions, and exemplifies the progress EM has made in the past three decades, since the first electron crystallography structure of C. crescentus S-layer [124].

COPI
Coat protein complex I (COPI) has been extensively studied ex situ and in situ. COPI-coated vesicles transport cargo within the Golgi network and in the retrograde direction to the endoplasmic reticulum. These heteroheptameric complexes assemble on the surface of the cis Golgi cisternae where they are recruited by Arf1 to facilitate the formation of membrane buds for transport. Cryo-ET of ex situ assembled COPI vesicles from mouse proteins revealed flexibly linked trimeric assemblies as the basic modules covering the vesicle surface [104]. At 13 Å the basic molecular architecture of the COPI was established [105] (Fig. 4B). Taking advantage of improved data acquisition schemes and local alignment, resolution was further improved to 9 Å. This allowed refinement of the COPI architecture and provided insights into the coatomer formation and disassembly [106].
In situ studies by cryo-ET of FIB-milled lamellae from Chlamydomonas reinhardtii provided the native COPI structure at 20 Å [107]. Comparison of the independently reconstructed in situ COPI structure showed high similarity to the ex situ COPI structure, confirming that reconstitution faithfully recapitulated the in situ scenario. In contrast to the empty ex situ vesicles, however, the luminal side of native COPI contained additional density indicating bound cargo or cargo receptors. Interpretation of the COPI in the context of their native environment provided insights into the COPI morphology and apparent half-life in the cell. As the COPI progress from the cis to trans Golgi, each time they bud off, uncoat, and fuse with the next cisterna their morphology is altered. The vesicle diameter, membrane thickness, and cargo all reflect the characteristics of the parent Golgi membrane, whereas the vesicle coat itself remains structurally unchanged. These insights could not have been obtained from a reconstituted system alone. On the other hand, understanding the COPI architecture at the secondary-structure level was only possible from the ex situ COPI data and the integration of atomic details provided by crystallographic methods. This example illustrates how different levels of information can be meaningfully integrated to provide a comprehensive picture of a complex system.

Poly-Gly-Ala aggregates in intact neurons
In situ cryo-ET of poly-Gly-Ala aggregates associated with amyotrophic lateral sclerosis and frontotemporal dementia provides an example of how alteration of the intracellular environment affects the local concentration of other molecular species [125]. CLEM was used to locate fluorescently tagged poly-Gly-Ala aggregates in transfected rat neurons and to target them for FIB-milling. Cryo-ET revealed networks of poly-Gly-Ala composed of polymorphic ribbons (Fig. 4C). The aggregates are densely populated with a single molecular species, the 26S proteasome. Compared with the surrounding cytosol, the concentration within the aggregate is increased ~30-fold. Given the local abundance of proteasomes, sub-tomogram averages with a resolution of ~10 Å could be generated, and classification made it possible to distinguish between conformations corresponding to the basic functional states: the ground state and the substrate-processing state [126]. Mapping these two conformations back into the tomograms of the segmented aggregate material showed that the proteasomes identified as being engaged with substrate are invariably in close contact with the poly-Gly-Ala ribbons. This suggests that they are engaged with the substrate, but fail to degrade it. Becoming stalled upon interaction with the aggregate material severely compromises the cellular protein quality control system.

Fig. 4.
…, and surface layers (S-layer). The 4.8 Å in situ cryo-ET sub-tomogram averages (gray) of the S-layer lattice are mapped back into the tomogram, and structures from X-ray crystallography at 2.7 Å (red) and single-particle averaging at 3.7 Å (blue) are docked. The inset (right) illustrates an integrative model of the S-layer anchoring into the outer membrane, combining structural and native MS methods. For comparison, the inset (left) shows an early electron crystallography map of the S-layer ring structure at 2 nm resolution [124]. (B) The in situ molecular organization of the Golgi apparatus from C. reinhardtii. A tomographic slice containing the cisternae stacks of the Golgi apparatus (left) and its segmented representation (right) revealing the cis (green), medial (magenta), and trans (blue) cisternae and COPI vesicles as well as the morphology of the endoplasmic reticulum (ER, yellow), trans Golgi network (TGN, purple), and two nuclear pore complexes, ribosomes, and other membranes (gray). In the tomographic slice, several COPI-coated vesicles and those budding off are indicated by asterisks. The inset shows the molecular architecture of the ex situ COPI-coated vesicle. (C) Molecular sociology within poly-Gly-Ala aggregates shows an increased concentration of 26S proteasomes. A tomographic slice (left) of a neuron expressing poly-Gly-Ala repeats. Mapping of macromolecules near the periphery of an aggregate region (right) shows 26S proteasomes in the ground state (GS, green) and an enrichment for the substrate-processing state (SPS, blue) proximal to poly-Gly-Ala twisted ribbons (red). Other molecular species, such as the ribosomes (yellow) and the chaperonin TRiC (purple), are largely excluded from the aggregate material. The inset shows a close-up of two 26S proteasome sub-tomogram averages after classification revealing the nonprocessing GS and the engaged SPS. Figure (A) inset (left) reproduced from Ref. [124].

Challenges ahead
From low throughput to higher throughput
The transition from electron tomography of resin-embedded and metal-stained cellular samples to low-dose imaging of vitrified specimens was enabled by automation of the data acquisition [127][128][129]. Since demonstrating the feasibility of cryo-ET [130], a workflow has been established that enables structural studies in situ to be performed with a wide range of cellular systems. However, throughput is still a limitation when large datasets are needed for quantitative analyses and for sub-tomogram classification and averaging. In single-particle analysis, purification of the molecules of interest results in a massive increase in concentration and the number of particles is almost never a limitation. In in situ cryo-ET, the natural abundance of a molecule in the cell determines the copy numbers per tomogram. Moreover, in situ a higher degree of heterogeneity may be encountered, as different functional and conformational states often coexist. Imaging of cellular samples has a considerable discovery potential for mapping macromolecular interactions, but often requires collecting more data than for purified specimens. For example, to perform the structural analysis of the COPI vesicles, 61 and 60 tomograms were collected for both ex situ and in situ datasets. However, the in situ dataset contained four times fewer copies of COPI, despite recording over an area twice as large. Another factor reducing the yield of in situ cryo-ET is often the lack of high-contrast intracellular fiducial markers. As a result, half of the in situ COPI tomograms had to be discarded due to unacceptable alignment errors resulting from variations in sample quality. To achieve a comparable in situ dataset would require the acquisition of ~250 tomograms even in a scenario where every tomogram would contain at least one Golgi apparatus.
This requires preparation of numerous lamellae using Ga+ FIB-milling, which is currently a time-consuming manual process. For thicker specimens, the milling time scales in proportion with the amount of material that needs to be ablated. At the current throughput of typically 5-10 lamellae per day, the sample preparation is slower than the recording capabilities of TEMs. Many steps of the milling process are repetitive and have been automated in the material sciences; this is not yet common in cryo-FIB-milling [131]. All steps of the workflow, from data acquisition to data analysis will benefit from task automation wherever this is possible. While automation would provide some remedy, Ga+ FIB-milling is intrinsically limited by low beam currents. Xe+ plasma FIB offers faster sputtering rates and two orders of magnitude higher beam currents [132] and is a promising alternative. Together with an automated milling procedure, preparation of up to 5 lamellae/hour is now possible [133].
Thicker samples present additional challenges. First, vitrification of ~5 µm samples or beyond cannot be achieved by plunge freezing. High-pressure freezing devices are limited to ~150 µm thickness of material or alternatively up to a few hundred µm with double-sided cooling devices [22]. With microfluidic freezing platforms, the sample thickness is given by the channel depth of 20 µm [32]. Freezing of bulk specimens has not seen substantial improvements in the past two or three decades and would need revisiting, to make tissue vitrification routine. Second, finding the region of interest in the block of ice and trimming it down requires a combination of cryo-fluorescence microscopy (FM), FIB-milling, and FIB lift-out. To enable precise 3D targeting of features of interest in an automated manner, such an approach requires the development of suitable models to incorporate the image information into deep learning approaches [134]. Correlating fluorescence and electron microscopy is relatively straightforward with thin samples. Correlation is aided by fiducial markers on the sample surface [135] and has recently become integrated into an automated workflow [131,136]. But this approach would need some adaptation for correlating features in larger frozen volumes.
Currently, the CLEM workflow requires the transfer of samples between specialized equipment for FM, FIB-milling, and finally TEM. Ice contamination is difficult to avoid during sample transfer. While acceptable in the early sample preparation steps, it should be avoided after preparation of lamellae; even minor contamination can affect data quality, and ice crystals obstruct the beam and reduce the areas suitable for tomogram acquisition. Moreover, any sample deformation after FM affects the signal correlation. This can be problematic during the final stages of FIB-milling, where bending of lamellae is not uncommon and often requires stopping the thinning process prematurely. Tension buildup resulting from large temperature changes during vitrification of materials with different expansion coefficients (carbon supports on metal grids) can be reduced by milling micro-expansion joints next to cells, but this can lead to displacement of the specimen and interfere with correlation [137]. Alternatively, all-gold supports do not suffer from different expansion coefficients, but their mechanical stability is much lower [138].
At the current resolution, FM is mainly used for targeting areas of interest and not for correlative interpretation. However, super-resolution FM correlative approaches have the potential to contribute to molecular identification. Increased fluorophore photostability at cryogenic temperatures can provide the photon counts needed for precise localization, even in samples considered too thick for EM. Recently, imaging inside a He-cooled cryostat enabled correlative super-resolution FIB-SEM to reveal the identities of vesicles 100-200 nm in size inside whole cells [7]. To fully harness the identification power of super-resolution methods, it would be advantageous to close the resolution gap down to ~30 nm or below, which is already possible for fixed cells with interferometric photoactivated localization microscopy imaging [139]. For vitrified specimens, localizing microtubule bundles inside human bone osteosarcoma epithelial cells (U2OS) is a recent achievement that required addressing the increased drift of cryostages and devitrification induced by the excitation laser [140].
Integrating FM and FIB-milling into one instrument would facilitate switching between different imaging modalities over different scales and eliminate the risk of contamination during sample transfer. Such an instrument would not only open up new possibilities for workflow optimization but would also help develop new strategies for dealing with thick specimens.

Data acquisition speedup
The acquisition of tomographic data is considerably slower than data acquisition for single-particle cryo-EM. Acquiring a tomogram with 41 projections takes around 32 min using the dose-symmetric tilt scheme [35], whereas anywhere between 100 and 200 micrographs can be recorded in the same time for single-particle analysis. Overhead time is spent on accurate focusing and on tracking to compensate for the movement of the stage between tilting steps. In addition, the requirements for precise tracking and stage settling increase with magnification, leading to even longer acquisition times. New single-tilt axis holders reduce the stage settling time practically to zero and obviate the need for tracking. This considerably speeds up acquisition of a dose-symmetric tilt series to under 5 min, but the effect on high-resolution sub-tomogram averaging still needs further investigation [141]. Alternatively, piezoelectric-driven stages provide superior precision (14 pm) and ultra-low drift (11 pm·s⁻¹), but they are not yet available commercially and would require some redesign of the current grids [142].
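The ordering of projections in a dose-symmetric tilt series, and the throughput implied by the acquisition times quoted above, can be illustrated with a minimal Python sketch. The text only specifies 41 projections; the ±60° range and 3° increment used here are assumptions chosen because they yield exactly 41 tilts, and the function names are hypothetical.

```python
def dose_symmetric_order(max_tilt_deg=60, step_deg=3):
    """Return tilt angles (degrees) in dose-symmetric acquisition order.

    Starts at 0 degrees and alternates between the positive and negative
    branch with increasing tilt: 0, +3, -3, -6, +6, +9, -9, ...
    max_tilt_deg and step_deg are assumed values, not taken from the text.
    """
    n_steps = max_tilt_deg // step_deg
    order = [0]
    sign = 1
    for i in range(1, n_steps + 1):
        angle = i * step_deg
        # Record both branches at this tilt magnitude, then flip the
        # starting side so that the stage moves as little as possible.
        order.extend([sign * angle, -sign * angle])
        sign = -sign
    return order


def tomograms_per_day(minutes_per_tomogram):
    """Upper bound on tilt series per 24 h, ignoring all setup overhead."""
    return 24 * 60 / minutes_per_tomogram


if __name__ == "__main__":
    tilts = dose_symmetric_order()
    print(len(tilts), tilts[:7])
    print(tomograms_per_day(32), tomograms_per_day(5))
```

Under these assumptions the low-tilt projections, which carry the best high-resolution signal, are recorded first while the specimen has accumulated the least dose. The throughput function is only an idealized bound: it ignores target setup, focusing areas, and grid exchange.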
Setting up acquisition areas will also require automation to keep up with faster tomogram recording rates. Automated data acquisition of in situ samples is challenging, because of the high variability of lamellae content and could benefit from machine learning approaches. For ex situ samples the task is simpler, since the grid support raster can be used to target areas, similar to single-particle acquisition.

Detection and classification of small particles
Identification of molecular complexes in complex in situ environments is challenging, because the inventory of molecular species spans several orders of magnitude in size and abundance. For example, a commonly cultured human cell line is populated with over 10 000 different proteins ranging in copy numbers from below 500 to over 20 million copies per cell [143]. Attempting to find molecular complexes in a template-based approach would require the availability of comprehensive template libraries, but often templates do not exist. As a step toward building a functional model of the whole cell, the number of available structural and homology models with a high sequence coverage was recently estimated for the pancreatic β-cell [144]. The estimates show that structures exist for 28% of the 11 700 estimated protein species. While these numbers may be discouraging for a template-based approach, they suggest that there is considerable potential for template-free approaches. To achieve a complete structural and spatial representation of the cell's proteome, which is the goal of visual proteomics [145], further developments of pattern mining methods will be necessary [72,73,146]. However, with decreasing size of molecular species, it becomes increasingly difficult to detect particles in noisy tomograms and to reliably align and classify them. Thus, smaller and less abundant complexes may escape detection and analysis. The same applies to single-particle cryo-EM, which relies heavily on averaging to improve the resolution and SNR, just as sub-tomogram averaging does. In practice, however, single-particle cryo-EM reconstructions of < 100 kDa proteins are possible [147], while molecular complexes smaller than 500 kDa are currently considered small for sub-tomogram averaging. To assess the detection rate of template matching by a scoring function, cryo-ET has been used in combination with quantitative MS in Leptospira interrogans [148].
One way to extend the lower-end size-limit in cryo-ET is to use a phase plate for contrast enhancement. Positioned at the diffraction plane, a phase plate converts the information imprinted on the phase of the electron wave into detectable amplitude modulation by inducing a π/2 phase shift between the unscattered and the scattered electron waves. This enables data collection close to focus and provides substantial contrast transfer at low spatial frequencies, which is advantageous for recognizing and aligning smaller objects. Despite the success of the Zernike phase plate (ZPP) in light microscopy, the precision required to manufacture analogous devices for TEM delayed the realization of early proposals by decades [149]. Higher contrast and similar resolution to conventional TEM were obtained with the thin-film ZPP [150]. However, the ZPP was impractical for routine use, because it had a short lifespan, required precise centering of the hole, and did not allow automated data acquisition [151]. Moreover, the central hole where the direct beam passes unaffected caused fringes in the images and its diameter determined the onset of the phase shift (cut-on periodicity). These problems were overcome with the Volta phase plate (VPP), which also uses a thin amorphous carbon film, but relies on the electron beam to induce the phase shift and offers a practical solution to many shortcomings of earlier designs [151,152]. Since there is no hole, the need for centering is obviated and the only requirement is the precise tuning of the TEM. The VPP regenerates over time and has a lifespan of years, if handled appropriately. More importantly, the VPP is compatible with automated data acquisition, and with software advances that included CTF phase fitting and correction, larger datasets could be acquired close to focus. This resulted in reconstructions of 20S proteasomes at 2.4 Å and enabled reconstructions of hemoglobin (64 kDa) at 3.2 Å [153,154].
At the theoretical lower-end size-limit for single-particle analysis, images and 2D class averages of myoglobin (17 kDa) showed the stunning improvement in contrast the VPP provides [154]. Recently, using the VPP in cryo-ET, the 52 kDa streptavidin became visible, although no sub-tomogram averaging was attempted [155]. However, a problem of using the VPP for lamellae is charge buildup, which causes local differences in phase contrast and limits its usability. An additional thin metal coating mitigates this problem and has provided impressive views of the nuclear periphery [156]. For Dictyostelium cells milled using a wedge geometry, beam-induced charging was not an issue. VPP imaging over the resulting thickness gradient revealed the detailed structure of actin networks and enabled tracing of branch structures within the actin waves, even in ~300 nm thick regions [157]. The boost in contrast from the VPP can be a decisive advantage for resolving features of cellular structures that require imaging in thicker areas. However, for extensive sub-tomogram averaging aimed at high resolution, the use of the VPP brings additional challenges. For tomography tilt series acquired close to focus, VPP images render CTF fitting and correction difficult, limiting the attainable resolution to the first CTF zero. Nevertheless, the contrast enhancement is advantageous and often the only way to visualize small non-repetitive features in tomograms.
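The effect of a π/2 phase shift on contrast transfer at low spatial frequencies, and the position of the first CTF zero, can be illustrated with a minimal Python sketch. It uses one common textbook form of the phase-contrast transfer function and ignores amplitude contrast and envelope functions; sign conventions differ between software packages. The spherical aberration (2.7 mm), the 300 kV acceleration voltage, and the 0.5 µm defocus are assumed example values, and all function names are hypothetical.

```python
import math

# Physical constants in SI units.
PLANCK = 6.62607015e-34
ELECTRON_MASS = 9.1093837015e-31
ELEMENTARY_CHARGE = 1.602176634e-19
LIGHT_SPEED = 299792458.0


def electron_wavelength_angstrom(voltage_v):
    """Relativistic electron wavelength in angstrom for a given voltage."""
    energy = ELEMENTARY_CHARGE * voltage_v
    rest_energy = ELECTRON_MASS * LIGHT_SPEED ** 2
    momentum = math.sqrt(2 * ELECTRON_MASS * energy * (1 + energy / (2 * rest_energy)))
    return PLANCK / momentum * 1e10


def ctf(k, defocus_a, wavelength_a, cs_a=2.7e7, phase_shift=0.0):
    """Phase-contrast transfer at spatial frequency k (1/angstrom).

    defocus_a is the underfocus in angstrom (positive), cs_a the spherical
    aberration in angstrom (assumed 2.7 mm), phase_shift the additional
    phase-plate shift in radians.
    """
    chi = (math.pi * wavelength_a * defocus_a * k ** 2
           - 0.5 * math.pi * cs_a * wavelength_a ** 3 * k ** 4)
    return math.sin(chi + phase_shift)


def first_zero_resolution_angstrom(defocus_a, wavelength_a, phase_shift=0.0):
    """Resolution (angstrom) of the first CTF zero, neglecting Cs."""
    if defocus_a <= 0 or not 0 <= phase_shift < math.pi:
        raise ValueError("expects positive underfocus and 0 <= phase shift < pi")
    k_squared = (math.pi - phase_shift) / (math.pi * wavelength_a * defocus_a)
    return 1.0 / math.sqrt(k_squared)


if __name__ == "__main__":
    lam = electron_wavelength_angstrom(300e3)
    k_low = 1.0 / 200.0  # a 200 angstrom periodicity
    print(ctf(k_low, 5000.0, lam))                           # no phase plate
    print(ctf(k_low, 5000.0, lam, phase_shift=math.pi / 2))  # with pi/2 shift
    print(first_zero_resolution_angstrom(5000.0, lam, math.pi / 2))
```

In this simplified model the transfer at low spatial frequencies is close to zero without a phase plate and close to one with a π/2 shift, which is the contrast gain described above. The first-zero estimate shows why data that cannot be CTF-corrected remain limited to a resolution set by the defocus.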
In a promising novel approach, the phase shift can be generated by sufficiently strong laser fields, taking advantage of the ponderomotive force [158]. The laser phase plate does not suffer from the signal loss inherent to thin-film phase plates and provides a stable phase shift that is tunable on demand. However, at the moment the laser field strength is able to induce a π/2 phase shift of 80 keV electrons, and further developments are needed before its use can be tested on the 300 keV electrons commonly used in cryo-ET [159].

The missing data problem
Distortions resulting from the missing information at higher tilts beyond 60° are a fundamental problem in cryo-ET [37]. One way to approach this is to fill the frequency space by averaging sub-tomograms of objects in different orientations [160]. Another possibility is to optimize the information content in one tomogram. Computationally it is possible to infer what the missing wedge data would be, but it is not possible to recover unobserved data [161]. Obtaining tomograms over the full angular range has been successful for small bacteria trapped in glass capillaries [162]. To extend the angular range, circular FIB-milling patterns could be used to make cylindrical specimens at the expense of a much reduced volume. However, precisely aligning cylindrical specimens for tilt series acquisition would be challenging and might render this approach impractical.
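The size of the missing wedge for a given tilt range can be estimated with a minimal Python sketch, assuming an idealized single-axis tilt geometry in which the unsampled region of Fourier space is a wedge whose opening angle equals the uncovered angular range. The elongation estimate uses a commonly quoted closed-form expression for the anisotropy along the beam direction; both functions are hypothetical helpers for illustration only.

```python
import math


def missing_wedge_fraction(max_tilt_deg):
    """Fraction of Fourier space left unsampled for tilts of +/- max_tilt_deg.

    Assumes single-axis tilting with a symmetric range, so the sampled
    region covers 2 * max_tilt_deg out of 180 degrees.
    """
    if not 0 < max_tilt_deg <= 90:
        raise ValueError("max_tilt_deg must be in (0, 90]")
    return 1.0 - (2.0 * max_tilt_deg) / 180.0


def elongation_factor(max_tilt_deg):
    """Commonly quoted estimate of the elongation along the beam axis."""
    if not 0 < max_tilt_deg <= 90:
        raise ValueError("max_tilt_deg must be in (0, 90]")
    alpha = math.radians(max_tilt_deg)
    sin_cos = math.sin(alpha) * math.cos(alpha)
    return math.sqrt((alpha + sin_cos) / (alpha - sin_cos))


if __name__ == "__main__":
    for tilt in (50, 60, 70, 90):
        print(tilt, missing_wedge_fraction(tilt), elongation_factor(tilt))
```

Under these assumptions a ±60° tilt range leaves one third of Fourier space unsampled, and only a full ±90° range, as in the capillary or cylindrical specimens mentioned above, removes the wedge entirely.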
Another challenge in cryo-ET is the low SNR of the recorded images resulting from the electron dose restrictions. In light of recent technological improvements in electron detection and software analysis, Peet et al. [163] set out to accurately estimate the optimal electron energy per induced radiation damage for cryo-EM. Their results confirm that, as the electron energy is lowered, the elastic cross-section contributing to signal in the TEM images increases proportionately faster than the inelastic cross-section contributing to radiation damage. Accurate measurements suggest that for thinner samples of up to 600 Å, using lower energy electrons (100 keV) would result in up to 25% improvement in information extraction, if appropriate detectors existed. For tomography the ability to image thicker samples would be desirable, but there is a limited advantage in increasing the acceleration voltage. The gain in penetration depth from 100 to 300 kV is 2-fold, but only an additional 1.5-fold from 300 kV to 1.2 MV, and the latter comes with a steep increase in equipment costs and a higher likelihood of knock-on damage [164]. Nevertheless, the suggested direction is interesting, because it challenges the paradigm that the sample thickness should be adjusted for TEM imaging. With automated tuning of microscopes, it would become possible to switch between different acceleration voltages and adjust the microscope to the specimen thickness to achieve optimal information transfer. The same authors also discuss the benefit of chromatic aberration (Cc) correction, which would render inelastically scattered electrons usable for phase contrast and enable an overall increase in the SNR.
An idea put forward by Danev et al. [165] is to use defocus modulations to optimally extract sample information. Inserting a small electrostatic lens at the back focal plane would allow control and fast tuning of the defocus parameter during acquisition, which can be used as an optical dose-dependent filter that follows the gradual demise of high-resolution structural features. For tomography this would allow the frequency space to be filled more evenly by combining close-to-focus high-resolution information with high-defocus information for more reliable CTF determination and tilt series alignment within one movie frame sequence.
Cryo-ET has the unique potential to bridge the gap between the cellular and molecular worlds. Ideally, it would provide structural information over large cellular volumes. But the quest for high-resolution cryo-ET is at odds with the size of the imaged volume. Imaging at high magnifications comes at the cost of reducing the field of view, and imaging thick samples results in low-quality data. Considering that the optimal specimen thickness for 300 keV electrons is around 100 nm [163], such tomograms show a small fraction of the cellular volume. For example, a 100 nm thin lamella of a typical baker's yeast with a diameter of 5 µm would contain ~0.22% of its total volume, if imaged at 2.5 Å/pixel on a 4 k detector. Large assemblies, such as nuclear pore complexes or vesicles, might not even fit in their entirety into 100 nm thick slabs, compromising the benefits. Depending on the biological question being addressed, it is often advantageous to compromise and record over a larger volume that accommodates the features of interest at the cost of resolution. Ensuring that high-resolution snapshots of the cell's interior are representative of the cell's state, and avoiding missing rare events of interest, requires the acquisition of many tomograms. Considering the increasing throughput in tomogram acquisition, it is not unreasonable to consider covering large areas in a tessellated cryo-ET approach, where tomograms would be stitched together in silico to represent the entire volume of thinned specimens. This means a 100-fold increase of the imaged area compared to a single tomogram for a 12 by 12 µm lamella, using the same imaging conditions as above. While beam-induced structural changes will present a challenge for implementing this approach in cryo-ET, such approaches were already successfully applied to plastic-embedded sections of, for example, U2OS cells [166]. Although overlapping regions will have higher electron exposures, the central parts of tomograms would retain the high-resolution information for sub-tomogram averaging. While such ideas are exciting in theory, practical solutions have to be found.
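The volume and area estimates in this section can be reproduced approximately with a minimal Python sketch. It assumes a spherical cell, a square field of view, and non-overlapping tiles, none of which is stated in the text, and the function names are hypothetical. The exact percentages depend on the effective field of view: 4096 pixels at 2.5 Å/pixel correspond to about 1.02 µm, whereas a field of view of 1.2 µm reproduces both the ~0.22% volume fraction and the 100-fold area factor quoted above.

```python
import math


def field_of_view_um(pixel_size_angstrom, detector_pixels):
    """Edge length of a square field of view in micrometers."""
    return pixel_size_angstrom * detector_pixels * 1e-4


def tomogram_volume_fraction(fov_um, thickness_um, cell_diameter_um):
    """Fraction of a spherical cell contained in one tomogram slab."""
    cell_volume = 4.0 / 3.0 * math.pi * (cell_diameter_um / 2.0) ** 3
    return fov_um ** 2 * thickness_um / cell_volume


def area_ratio(lamella_side_um, fov_um):
    """Area of a square lamella relative to one field of view (no overlap)."""
    return (lamella_side_um / fov_um) ** 2


if __name__ == "__main__":
    fov_4k = field_of_view_um(2.5, 4096)
    for fov in (fov_4k, 1.2):
        fraction = tomogram_volume_fraction(fov, 0.1, 5.0)
        print(f"FOV {fov:.3f} um: {100 * fraction:.2f}% of cell volume, "
              f"{area_ratio(12.0, fov):.0f} tiles for a 12 x 12 um lamella")
```

A real tessellated acquisition would need overlapping tiles for stitching, so the number of tomograms required would be higher than this idealized ratio suggests.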

The promise of Cryo-ET
Cryo-ET has made great progress over the last decade, which is reflected by the increasing number of successful applications. It has provided accurate views of unperturbed cellular landscapes, revealing their native molecular organization. For a long time, studying isolated molecular complexes was the only possible way of obtaining structural information for a mechanistic understanding of their functions. However, technological advances in cryo-ET now enable in situ structural studies at 3-4 nm resolution and, when applying averaging methods, sub-nanometer resolutions can be achieved. We are just beginning to realize the potential of cryo-ET as a tool for studying macromolecular crowding. Phase transitions in lipid droplets can already be detected and attributed to different cellular states [167], but the information contained in tomograms, when properly calibrated, could be used for systematically measuring local density fluctuations inside cells. This holds the potential to detect phase separation phenomena and map regions that define membraneless organelles [168]. While in favorable cases cryo-ET can provide near-atomic resolution, even at nanometer resolution the cellular environment offers a tremendous potential to discover unexpected scenarios. An increasing number of reports have revealed that macromolecules are organized into functional microcompartments at the nuclear pore basket [64] or the endoplasmic reticulum [65], highlighting the importance of in situ studies. However, the difficulty of in situ cryo-ET lies in mining the rich information contained in the tomograms. Cellular complexity makes it challenging to identify macromolecules when sub-tomogram averaging fails to provide the resolution for unambiguous identification. Integrative approaches have shown success where individual methods could not provide the full picture and helped generate pseudoatomic models of large assemblies such as the 26S proteasome [169] or the nuclear pore complex [170]. In proximity to the already mapped assemblies, interacting smaller proteins could be modeled based on distance constraints provided by cross-linking MS data. There is no reason to believe that the integrative modeling approach could not be expanded to tomograms. In fact, a tomogram spanning ~1 µm² and up to a few hundred nm in thickness can provide a structural framework to integrate functional data and for docking high-resolution atomic structures into their cellular context. In addition, integrating MS data allows cross-validating tomographic results. Quantitative MS has the potential to provide the ground truth of protein copy numbers, which could be used to assess the number of detected complexes and those that escaped detection. On a larger scale, cryo-ET can itself be integrated into larger volumes provided by FIB-SEM to provide pictures of whole cells. There is still a lot of potential to improve the workflow of sample preservation, data recording, information extraction, quantitative interpretation, and finally data integration with other methods. Developments in instrumentation and image analysis will undoubtedly continue to push the boundaries forward. They will help advance our understanding of known structures and lead to the discovery of new structural patterns orchestrating cellular functions.