Structure-based computational design of antibody mimetics: challenges and perspectives
Elton J. F. Chaves and Danilo F. Coêlho contributed equally to this article.
Edited by Claudio Soares
Abstract
The design of antibody mimetics holds great promise for revolutionizing therapeutic interventions by offering alternatives to conventional antibody therapies. Structure-based computational approaches have emerged as indispensable tools in the rational design of these molecules, enabling the precise manipulation of their structural and functional properties. This review covers the main classes of designed antigen-binding motifs, as well as alternative strategies to develop tailored ones. We discuss the intricacies of different computational protein–protein interaction design strategies, showcased by selected successful cases in the literature. Subsequently, we explore the latest advancements in computational techniques, including the integration of machine and deep learning methodologies into the design framework, which has led to an augmented design pipeline. Finally, we turn to the current challenges that stand between the high-throughput computational design of antibody mimetics and its experimental realization, offering a forward-looking perspective on the field and the promise it holds for biotechnology.
Abbreviations
- ∆∆G: binding free energy difference
- AF2: AlphaFold2
- AF3: AlphaFold3
- AI: artificial intelligence
- ANN: artificial neural network
- dArmRP: designed armadillo repeat proteins
- DARPin: designed ankyrin repeat proteins
- DL: deep learning
- DPPM: denoising diffusion probabilistic model
- FN3: fibronectin type III
- GM: generative model
- hACE2: human angiotensin-converting enzyme 2
- IL-17A: interleukin-17A
- kDa: kilodalton
- mAb: monoclonal antibody
- MC: Monte Carlo
- ML: machine learning
- MSA: multiple sequence alignment
- MSE: mean square error
- NMR: nuclear magnetic resonance
- PDB: Protein Data Bank
- R&D: research and development
- RBD: receptor binding domain
- RIF: rotamer interaction field
- RMSD: root mean square deviation
- SARS-CoV-2: severe acute respiratory syndrome coronavirus 2
- SASA: solvent accessible surface area
- VEGF-A: vascular endothelial growth factor A
- VHH: variable heavy domain
Recent advancements in therapeutic antibody research have led to significant progress in both key technologies and theoretical innovations. These include the development of antibody-drug conjugates, antibody-conjugated nucleotides, bispecific antibodies, nanobodies, and various other antibody derivatives. Furthermore, therapeutic antibodies have been effectively combined with technologies from other fields, giving rise to novel interdisciplinary applications, including cell-based therapies [[1]]. In fact, the biopharmaceutical industry is one of the most dynamic innovation and business ecosystems, with an estimated investment of hundreds of billions of dollars annually. In the United States alone, it accounted for 17% of the dollars spent on domestic research and development (R&D) in 2020, nearly double the investment in software development in the country [[2]]. Its main product, monoclonal antibodies (mAbs), can be designed to specifically target disease-causing molecules or cells, minimizing off-target effects. According to a report from Future Market Insights, the antibody therapy market was worth USD 235 billion in 2023 and is expected to reach USD 824 billion within the next decade. Most mAbs come from natural sources, which offers a biocompatibility advantage and reduces the risk of adverse reactions when they are employed in vivo. They have been developed to treat a wide range of diseases, including cancer, autoimmune disorders, and infectious diseases. However, producing mAbs requires complex and highly specialized protein production technology, and the resulting cost precludes population-wide use of this class of molecules.
The development of synthetic antibody mimetics (proteins structurally unrelated to antibodies, but capable of exerting a similar function) has been explored as a way around the limitations above. Unlike conventional antibodies, antibody mimetics offer simpler and scalable production via chemical synthesis or microbial fermentation. However, the development of an antibody mimetic typically requires a significant investment in R&D. In addition to designing and optimizing novel structures (e.g., designing target properties, engineering stability and solubility, and improving biocompatibility), validating their efficacy and safety in vivo may be challenging due to their novel and engineered nature, requiring extensive preclinical and clinical testing. Nevertheless, their versatility and low cost of production are unmatched. The main advantages and disadvantages of antibody mimetics and conventional antibodies are compared in Table 1. To date, nearly two dozen scaffold classes of antibody mimetics have been explored, in addition to a few tailored designs. Figure 1 illustrates the structures of the most widely used scaffold classes, highlighting their binding domains. (As the focus of this review is on computational design approaches for antibody mimetics, we suggest the review by Yu and colleagues for a more in-depth account of the biomedical applications of these molecules) [[3]].
| | Main advantages | Main disadvantages |
|---|---|---|
| Antibody mimetics | Size control: typically smaller and simpler in structure than conventional antibodies, which allows for better tissue penetration and potentially reduced immunogenicity | Achieving high affinity: compared to conventional antibodies, achieving high binding affinity and target specificity is challenging, often requiring multiple rounds of design |
| | Engineering flexibility: mimetics can be engineered with specific properties tailored to their intended applications (e.g., enhanced stability and/or binding affinity) | Clinical validation: extensive validation and a clinical track record are still needed, compared to conventional antibodies, potentially raising efficacy and safety concerns |
| | Versatility of administration: mimetics can be designed with the route of administration in mind. They are often small enough to be orally administered, if desired, offering advantages in terms of patient convenience and compliance | Shorter half-life: mimetics may have shorter half-lives in circulation than conventional antibodies, requiring either more frequent dosing for therapeutic applications or fusion to other proteins to extend their half-life |
| | Structural diversity: mimetics can be derived from various sources, including synthetic peptides, small proteins, non-protein molecules, and even novel scaffolds, providing a wide range of options for development | High development cost: development of mimetics by experimental means involves great financial risk and has historically been undertaken mostly by the private sector |
| | Production cost: they can be engineered to be produced in prokaryotes and to yield large quantities | Lack of effector function: mimetics do not carry the antibody constant (Fc) region |
| Conventional antibodies | High specificity: conventional antibodies, particularly monoclonal antibodies, exhibit high specificity for their target antigens, which minimizes off-target effects | Complex structure: conventional antibodies have a complex structure, making them expensive and challenging to produce at scale |
| | Natural recognition: they rely on the natural immune system's mechanisms for target recognition, ensuring biocompatibility | Immunogenicity: antibodies derived from non-human sources can provoke immune responses, potentially limiting their therapeutic use |
| | Versatility: antibodies can be modified and engineered for various applications, including therapeutics, diagnostics, and research tools | Limited tissue penetration: their large size can hinder tissue penetration, affecting efficacy in certain therapeutic applications |
| | Long half-life: IgG antibodies have a relatively long half-life in the bloodstream, providing sustained therapeutic effects | Storage and stability: antibodies require specific storage conditions and can degrade over time, affecting their shelf life and efficacy |
| | Well-established production: large-scale production methods for conventional antibodies are well established, facilitating manufacturing for commercial purposes | Production cost: production requires eukaryotic cell lines due to post-translational modifications |
As the field of computational protein design developed, its techniques have been employed to speed up the achievement of suitable structural properties (such as those mentioned above), thus reducing R&D-related costs, especially those associated with the early stages of development. These methods have mainly been used to leverage the potential of already validated classes of antibody mimetics by exploring, in silico, all possible sequences that fit the desired function criteria. However, the recent combination of AI technology with computational protein design now allows the development of novel binders that do not rely on predetermined protein templates. This opens up unprecedented potential for exploring this flourishing field beyond nature's protein portfolio.
Main classes of antibody mimetics
Affibody
Based on the B-domain of staphylococcal protein A, it has a molecular weight of ca. 6 kDa. Affibodies are designed to bind to specific target molecules, such as proteins or peptides, with high affinity and specificity. Advantages over conventional antibodies include smaller size, simpler structure, and ease of engineering. Affibodies have applications in various areas including diagnostics, imaging, drug delivery, and targeted therapy [[4]]. Izokibep, an affibody-based biopharmaceutical, was shown to bind to and inhibit the activity of IL-17A in in vitro and in vivo assays using a murine model. It was also found to be safe and well-tolerated in phase I and II clinical studies for the treatment of psoriatic arthritis [[5]].
Affimer
Previously known as Adhiron, its scaffold is based on the human protease inhibitor Stefin A [[6]]. Affimers have a molecular weight of ca. 11 kDa, and their structure contains four β-sheets, one α-helix, and two variable loops. These loops consist of nine amino acids each and are used to design binding interfaces for a desired target. To date, affimers have been mainly designed by molecular or directed evolution techniques, in which a diverse library of potential binding proteins is created and screened for those with the desired properties. They have been used as molecular probes for studying protein interactions, as diagnostic tools for detecting biomarkers or pathogens, and as therapeutic agents for targeting specific molecules involved in diseases such as cancer or inflammatory disorders [[6, 7]].
dArmRP
Designed Armadillo Repeat Proteins (dArmRPs) have an armadillo domain consisting of sequential armadillo repeats (8–12 internal repeats), each containing approximately 42 amino acids. Their molecular weight ranges between 39 and 58 kDa. Each repeat consists of three α-helices, designated H1, H2, and H3. Inserted between the N- and C-termini, the helical repeats protect the hydrophobic core from exposure to the solvent. They have been computationally designed to recognize and bind peptide ligands, overcoming the limited specificity of antibodies toward flexible peptides. Other uses include drug delivery and molecular imaging [[8-10]].
Anticalin
Derived from lipocalins, a family of naturally occurring proteins that typically bind small hydrophobic molecules, anticalins were created through protein engineering techniques to have binding sites tailored for specific targets, such as drugs, metabolites, or other molecules of interest. Their structure forms a cup-shaped binding pocket, and they have a molecular weight of about 20 kDa. Biomedical applications include the delivery of biopharmaceuticals across the blood–brain barrier [[11]] and theranostic applications [[12]].
DARPin
Designed ankyrin repeat proteins (DARPins) are composed of ankyrin repeat motifs of 33 amino acids, each arranged into two antiparallel α-helices and connected to the subsequent repeat through an elongated β-turn. DARPins are typically obtained through combinatorial protein design using libraries comprising two to three internal ankyrin repeat motifs. These motifs are sandwiched between positively and negatively charged N- and C-terminal caps, typically incorporating six random positions within the β-turn and the first α-helix of each repeat. As the number of repeats can vary, DARPins have a minimum molecular weight of 14 kDa. The scaffold has been designed for several applications, such as antivirals [[13-15]] and cancer treatment. The most advanced DARPin compound in clinical development is abicipar pegol, an antagonist of VEGF-A. It is currently undergoing phase III trials to explore its potential to treat ophthalmic conditions, including neovascular age-related macular degeneration and diabetic macular edema [[16]].
Miniprotein
This is a case of a tailored de novo design conceptualized by the group of David Baker [[17, 18]]. These small proteins are typically formed by fewer than 50 residues. Despite their small size, they can fold into three-dimensional structures, often three- or four-helix bundles, with a molecular weight ranging from 5 to 10 kDa. As a showcase, the Baker lab developed designs against the receptor binding domain (RBD) of the SARS-CoV-2 spike protein, with affinities ranging from 100 pM to 10 nM, that were able to block virus infection in vitro [[19]].
Monobody
Based on the human fibronectin type III domain (FN3), this scaffold has an immunoglobulin-like fold, with a molecular weight ca. 10–15 kDa. Monobodies lack disulfide bonds, and thus, they are particularly suited as genetically encoded reagents to be used intracellularly [[20]], while the small and simple structure of monomeric monobodies confers increased tissue distribution. When designed in a bead-on-a-string-like assembly, multiple domains of FN3 can bind to different targets, overcoming the multi-specificity challenge of conventional antibodies. Furthermore, full-length fibronectin can fold into multiple conformations as part of its natural function, providing structural and sequence versatility to monobodies [[21]], with affinity and specificity that rival those of antibodies [[22, 23]].
Nanobody
Nanobodies, also known as VHH or single-domain antibodies, are a class of antibody fragments derived from the variable region of heavy-chain antibodies found in camelids, such as camels, llamas, and alpacas. The proteins consist of a single monomeric antibody domain. With a molecular weight of around 12–15 kDa, they are one-tenth the size of conventional antibodies. Despite their small size, nanobodies retain high specificity and affinity for their target antigens, making them valuable tools in various biomedical applications. While they are generally obtained by animal immunization or phage display techniques, the computational design of nanobodies has recently become popular [[24]]. In silico affinity maturation has also been used to improve the thermal stability and binding affinity of natural nanobodies [[25]].
Computational protein design as the foundation for antibody mimetic development
The design of antibody mimetics is based on protein engineering principles. Our understanding of protein structure and function has matured significantly since the groundwork of Linus Pauling and Francis Crick in the 1950s, allowing us to design proteins with specific properties. Protein engineering techniques can be classified as empiricism-based (e.g., directed evolution, phage display) or mechanism-based (e.g., comparative modeling, de novo design). Although purely experimental designs have been largely successful, they are costly and labor-intensive, and the lessons learned are usually not applicable to unrelated systems [[26]]. In addition, evolution has explored only a limited region of protein sequence space, leading to clustered natural protein families [[27]]. De novo design allows exploration of a broader sequence space by leveraging the principles of protein biophysics.
Computational protein design methods are based on thermodynamic principles and biological observations, and they can be used for practically all classes of proteins [[27, 28]]. However, these methods rely on Anfinsen's thermodynamic hypothesis, that is, proteins fold into the lowest energy states that are accessible to their amino acid sequences [[29]]. In the last decade, rational design based on computational modeling and structural analysis has emerged as a powerful strategy to engineer proteins with enhanced stability, activity, and/or specificity. It is based on the assumption that the geometry of a protein, together with the specific presentation of charges and molecular groups on its surface, determines its function. A given sequence of amino acids (primary structure) often leads to a specific three-dimensional structure. On the other hand, different combinations of residues with similar properties can also lead to the same final topology. Protein design aims to determine an amino acid sequence that will fold into a three-dimensional structure able to perform a specific function. Thus, the two key ingredients are the conformational sampling method and the energy function used to predict stability and binding when searching for the lowest energy model.
The advancement of rational protein design has historically been tied to progress in structure prediction methods and to the growing number of experimentally determined target structures. Until three decades ago, most computational techniques were limited to modifications, through site-directed mutagenesis, at binding sites, surfaces, and interfaces of well-defined and otherwise untouched frameworks. This scenario changed in 2003 with the design from scratch of a novel protein called Top7. Using a Monte Carlo search with molecular force fields and scoring functions, the authors designed a protein with a topology that was unprecedented at the time [[30]]. To achieve the final model, several iterative cycles of sequence design and backbone optimization were performed. X-ray crystallography and NMR resolved the structure of the unnatural 93-mer α/β fold protein. Comparison with the model showed a backbone RMSD of approximately 1 Å, and the protein also exhibited remarkable thermodynamic stability [[30]]. As an ultra-stable scaffold, Top7 has since been used in a rich body of work with implications for diverse areas of medicine and biotechnology [[31, 32]]. Since then, structure-based protein design has been marked by exceptional advances and a significant increase in the number of designed proteins with high levels of complexity. However, we are just now experiencing a further substantial advance in the field. The recent use of neural networks in computational protein structure prediction (e.g., AlphaFold2 [[33]] and RoseTTAFold [[34]]) has allowed large protein structures to be solved with atomic precision and remarkable speed, overcoming half a century of challenges. These advances have paved the way for the development of robust yet efficient sampling algorithms and sophisticated design methods.
Design of antibody mimetics
While protein structure prediction methods have matured in the last few years, protein complex prediction has lagged behind. Success is highly dependent on the strategy used to estimate the parameters that affect binding affinity. Predicting association strength and complex structure accurately often requires combined techniques and guidance from experimental data [[35]]. This likely stems from the electrostatic diversity of protein–protein interactions. Algorithms were initially designed to fold proteins, which almost invariably are formed by a hydrophobic core and a mostly hydrophilic solvent-accessible surface. In contrast, protein–protein interactions are seldom characterized by hydrophobic contacts only. Designing antibody mimetics with the molecular signature of the target epitope in mind yields higher success rates than post-design optimization. Based on that strategy, a number of target-oriented computational protein design methods have been used for the development of antibody mimetics. The main techniques are discussed below, and a schematic workflow of each technique is shown in Fig. 2.
Docking and design
When designing a new protein, sometimes the goal is simply to modify positions in the amino acid sequence of an already known protein complex structure. In this case, the template is the native protein complex itself, and the design is performed on top of it. This method is commonly used to design proteins with improved binding affinities to other proteins or ligands [[36]]. Alternatively, key interacting residues in the antigen interface are selected as the starting point for the design of the new protein (antibody mimetic). The idea is to search high-resolution protein structures for scaffolds with shapes complementary to the antigen, and then to engineer the interaction interface through alternating local docking and design steps. As in protein folding, protein association is driven by energy minimization, governed by van der Waals interactions, the hydrophobic effect, electrostatic interactions, hydrogen bonding, and the shape and chemical complementarity of the interaction partners [[37]]. Sequence optimization is performed in every cycle, and a scoring function is used to find residues that stabilize the interaction and improve the binding propensity. Subsequently, the interface sequence of the antibody mimetic is kept fixed, and sequence optimization follows for the remaining scaffold residues to ensure stability and solubility. A similar strategy for interface design is the so-called hotspot-centric approach, which consists of docking disembodied residues, selecting suitable scaffolds that display residues at positions similar to those of the hotspots, and refining the interface [[38, 39]]. This approach has recently been used to design a biopharmaceutical targeting the conserved fusion loop region of the envelope proteins of flaviviruses. The protein neutralized infections by Zika virus and dengue virus serotypes 1 and 2 in vitro, with an EC50 on par with that of human monoclonal antibodies [[39]].
Another alternative is rotamer interaction field (RIF) docking, which searches for protein interface hotspots from scratch (de novo). Disembodied residue conformations are docked onto the target interface to create favorable hydrogen bonds and hydrophobic interactions [[40]]. Scaffolds that display residues at similar positions are subsequently used as the starting point for protein design. Scaffold residues are mutated to the residues output by RIF docking, followed by interface and scaffold optimization.
Motif grafting design
Knowledge of the antigen–antibody interface structure is required as a starting point for the motif grafting method. Typically, antibody loops or selected regions of the interacting interface are grafted onto a protein scaffold [[41]]. The method consists of the following steps: define the motif to be transplanted, structurally align the motif to a putative carrier protein to determine the best region for grafting, transplant the motif, and redesign the scaffold protein around the grafted motif to ensure folding and stability of the chimeric protein. After the three-dimensional alignment, one can choose between keeping the backbone scaffold and grafting only the side chains of the antibody (side-chain grafting) or discarding the structural elements of the carrier, replacing it completely with the structural motif of the antibody (backbone-grafting). A flexible-backbone remodeling is employed to optimize the conformation of the protein skeleton after its modification [[42]]. Nevertheless, motif grafting by sidechain or backbone replacement faces limitations when the motif is too complex to find a structurally compatible protein scaffold. Proteins sporting a single immunoglobulin domain motif (e.g., VHH, selectins, ankyrin repeat proteins) are more commonly used with this technique.
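The structural alignment step can be illustrated with a minimal sketch. It slides a motif along the Cα trace of a scaffold and scores each window by distance RMSD (dRMSD), a superposition-free stand-in for the full three-dimensional alignment used in actual grafting protocols. The function names (`drmsd`, `best_graft_window`) and the toy coordinates are hypothetical and serve only as an illustration; note that dRMSD cannot distinguish mirror images, so a real pipeline would confirm the match by explicit superposition.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Return all intra-set distances for a list of (x, y, z) points."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def drmsd(coords_a, coords_b):
    """Distance RMSD between two equally sized coordinate sets."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same length")
    da, db = pairwise_distances(coords_a), pairwise_distances(coords_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))


def best_graft_window(motif, scaffold):
    """Slide the motif along the scaffold; return (start_index, dRMSD)."""
    n = len(motif)
    scores = [
        (drmsd(motif, scaffold[i:i + n]), i)
        for i in range(len(scaffold) - n + 1)
    ]
    best_score, best_start = min(scores)
    return best_start, best_score


# Toy C-alpha trace (hypothetical): an extended segment followed by a bend
scaffold = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0),
            (11.4, 3.8, 0), (11.4, 7.6, 0)]
# The motif is the bent segment of the scaffold, translated in space
motif = [(p[0] + 5.0, p[1] - 2.0, p[2] + 1.0) for p in scaffold[2:5]]

start, score = best_graft_window(motif, scaffold)
print(start, round(score, 3))
```

The window with the lowest dRMSD marks the scaffold region whose local geometry best matches the motif and is therefore a candidate grafting site.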
De novo design
Template-based techniques face several critical restrictions, imposed by the strict fitting of motifs onto scaffolds as well as by the requirement for well-defined and suitable frameworks. Comparative methods rely on known homologous structures, limiting design to pre-existing interfaces and excluding the exploration of new sites [[37]]. For the design of antibody mimetics against more complex epitopes, de novo methods represent a powerful alternative. They have been used to design new proteins around a given motif/epitope [[43, 44]].
De novo proteins began to be designed around binding motifs using minimal knowledge of their interactions and no prior knowledge of scaffold atom positions. At first, these methods were considered ab initio or template-free techniques, as they did not use any known structure as a base. As techniques have become more sophisticated, the boundaries between the categories have blurred [[45]], and some current methods are considered hybrids between template-free and template-based approaches. In de novo design, the target sequence is not compared with known proteins but with a blueprint of α-helix and β-sheet fragments, creating low-energy structures [[27]]. Folding follows the basic principles of the diffusion-collision model developed by Karplus and Weaver [[46]], in which a local thermodynamically favorable interaction, once formed, is maintained, creating a bias that brings other contacts into spatial proximity. The process is repeated until the protein is completely folded. Generally, pipelines comprise three steps: (a) the core is built first by an assembly process starting from valine-based secondary structures in the presence of the target; (b) the sequence is subsequently tailored on the fixed backbone, taking into account the chemical environment (which also includes the antigen interface) and amino acid occurrence in secondary structure elements through layer-based design approaches; and (c) possible backbone conformations are sampled using a library of short amino acid fragments based on the distribution of homologous structures taken from the PDB [[45]]. The fragment library covers the accessible local structures, while side chains are assembled from a rotamer library. The final structure is assembled using a Metropolis Monte Carlo procedure [[47, 48]].
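The Metropolis Monte Carlo search mentioned above can be sketched on a toy fixed-backbone sequence design problem. This is not the energy function or move set of any published pipeline: the layer-based score `toy_energy`, the residue classes, and all parameter values are assumptions made for illustration, standing in for a real force field and fragment or rotamer moves.

```python
import math
import random

HYDROPHOBIC = set("AVILMFW")
POLAR = set("STNQDEKR")
ALPHABET = sorted(HYDROPHOBIC | POLAR)  # sorted for reproducibility


def toy_energy(sequence, burial):
    """Hypothetical layer score: -1 if a residue suits its layer, +1 otherwise."""
    energy = 0.0
    for aa, buried in zip(sequence, burial):
        matches = (aa in HYDROPHOBIC) == buried
        energy += -1.0 if matches else 1.0
    return energy


def metropolis_design(burial, steps=2000, kT=0.5, seed=0):
    """Single-site Metropolis Monte Carlo over sequences on a fixed backbone."""
    rng = random.Random(seed)
    seq = [rng.choice(ALPHABET) for _ in burial]
    energy = toy_energy(seq, burial)
    best_seq, best_energy = list(seq), energy
    for _ in range(steps):
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice(ALPHABET)
        new_energy = toy_energy(seq, burial)
        delta = new_energy - energy
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with Boltzmann probability exp(-delta / kT)
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            energy = new_energy
            if energy < best_energy:
                best_seq, best_energy = list(seq), energy
        else:
            seq[pos] = old  # reject: restore the previous residue
    return "".join(best_seq), best_energy


# True = buried (core) position, False = solvent-exposed position
burial = [True, False, False, True, True, False, True, False, False, True]
designed, score = metropolis_design(burial)
print(designed, score)
```

Occasionally accepting uphill moves is what lets the search escape local minima; lowering `kT` makes the search greedier.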
Binding affinity is achieved through successive design calculations at the interface, with the aim of maximizing the number of buried hydrogen bonds and creating hydrophobic contacts as good as those commonly found in native protein–protein complexes [[49]]. Although binding free energy would be expected to be the most relevant feature, the empirical nature of the scoring functions means that other interface variables must be considered for a proper and more realistic interface description. Interfaces are evaluated by comparing the difference in binding free energies (∆∆G), the solvent accessible surface area (SASA), the shape complementarity, the number of total and unsaturated hydrogen bonds, and the ∆∆G/SASA ratio of each decoy against the pool of generated structures. Although these energetic quantities do not correspond directly to physical observables and therefore cannot be directly related to experimental measurements, they are crucial to triage the best candidates out of several thousand, or even a few million, generated designs. As a showcase, we highlight work from the Baker group, which recently harnessed the power of de novo methods to design a potent antibody mimetic against infection by the SARS-CoV-2 virus [[19]]. Miniproteins were developed to specifically block the interaction of human angiotensin-converting enzyme 2 (hACE2) with the receptor binding domain (RBD) of the SARS-CoV-2 spike protein, thereby preventing the entry of the virus into host cells. The authors showed that two candidates, selected from several thousand designs, prevented virus entry into host cells and prevented lung disease and pathology in mice [[50]].
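A triage step of this kind can be sketched as a filter followed by a ranking. The metric names mirror those listed above, but the thresholds, the decoy values, and the choice of ranking by ∆∆G/SASA are hypothetical and would need to be tuned for any real design campaign.

```python
from dataclasses import dataclass


@dataclass
class Decoy:
    name: str
    ddg: float          # predicted binding energy (score units, lower is better)
    sasa: float         # buried interface SASA in square angstroms
    shape_comp: float   # shape complementarity, 0 to 1
    unsat_hbonds: int   # buried unsatisfied hydrogen-bond donors/acceptors

    @property
    def ddg_per_sasa(self):
        """Binding energy normalized by interface size."""
        return self.ddg / self.sasa


def triage(decoys, max_ddg=-30.0, min_sc=0.65, max_unsat=3, top_n=2):
    """Keep decoys passing all (hypothetical) cutoffs, ranked by ddG/SASA."""
    passed = [
        d for d in decoys
        if d.ddg <= max_ddg
        and d.shape_comp >= min_sc
        and d.unsat_hbonds <= max_unsat
    ]
    return sorted(passed, key=lambda d: d.ddg_per_sasa)[:top_n]


# Toy pool of generated designs (made-up numbers)
pool = [
    Decoy("design_A", -45.0, 1500.0, 0.70, 1),
    Decoy("design_B", -25.0, 900.0, 0.75, 0),   # fails the ddG cutoff
    Decoy("design_C", -40.0, 1000.0, 0.68, 2),
    Decoy("design_D", -50.0, 1600.0, 0.55, 1),  # fails shape complementarity
    Decoy("design_E", -35.0, 1400.0, 0.72, 5),  # too many unsatisfied H-bonds
    Decoy("design_F", -33.0, 1200.0, 0.66, 3),
]

selected = triage(pool)
print([d.name for d in selected])
```

Normalizing ∆∆G by SASA favors interfaces that bind efficiently per unit of buried surface rather than simply large interfaces.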
Machine and deep learning–driven design
Predicting binder–target binding strengths with experimental precision is crucial for picking out the best candidates. Nevertheless, the accurate calculation of absolute binding free energies for protein–protein interactions has long posed a challenge for protein design methods. Historically, the calculation of protein–protein binding free energies at experimental accuracy was restricted to enhanced sampling methods, which are highly demanding in terms of computational resources. The advent of artificial intelligence (AI) in the most diverse areas of knowledge has inspired the development of new approaches with great potential for engineering and triaging protein–protein complexes without prior structural knowledge, at exceptional performance. In the last few years, the community has resorted to machine and deep learning algorithms to reweight the contributions of molecular features to experimental binding free energies. The approach has achieved accuracy similar to that of more costly methods at unprecedented efficiency, making it possible to triage high-affinity binders from thousands or even millions of candidates in a high-throughput fashion [[51-53]]. In addition, AI has also been used to design antibody mimetic scaffolds, where it provides advantages over both template-based and ab initio methods. The former cannot predict new protein folds. The latter may provide a greater range of new topologies, but they require extensive sampling, which results in a higher computational cost.
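The idea of reweighting feature contributions against measured binding free energies can be illustrated, in its simplest form, with a linear model fitted by gradient descent on the mean squared error. The cited studies use more elaborate ML and DL models; the two features, the "true" weights, and the noise-free synthetic targets below are assumptions chosen only so that the fit can be checked.

```python
def predict(weights, bias, features):
    """Linear model: weighted sum of interface features plus a bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def fit_weights(samples, targets, lr=0.1, epochs=5000):
    """Fit linear weights by batch gradient descent on the mean squared error."""
    n_feat = len(samples[0])
    n = len(samples)
    weights, bias = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        errors = [predict(weights, bias, x) - y for x, y in zip(samples, targets)]
        for j in range(n_feat):
            grad = 2.0 / n * sum(e * x[j] for e, x in zip(errors, samples))
            weights[j] -= lr * grad
        bias -= lr * 2.0 / n * sum(errors)
    return weights, bias


# Hypothetical, pre-scaled interface features per complex,
# e.g. (hydrogen-bond term, buried-surface term)
samples = [(0.2, 0.7), (0.8, 0.1), (0.5, 0.4), (0.9, 0.9), (0.1, 0.3), (0.6, 0.2)]

# Synthetic "experimental" free energies generated from known weights (no noise)
true_weights, true_bias = (-1.5, -0.5), -2.0
targets = [predict(true_weights, true_bias, x) for x in samples]

weights, bias = fit_weights(samples, targets)
print([round(w, 3) for w in weights], round(bias, 3))
```

With real data the targets would be measured affinities, and the recovered weights would indicate how much each molecular feature contributes to the prediction.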
AI refers to systems or machines that imitate human intelligence. More specifically, machine learning (ML) and deep learning (DL) are subsets of AI that focus on building or improving predictive models that learn from data or identify informative groupings within data. ML has been used to improve existing antibodies, but most of the algorithms developed rely on deep sequencing or deep mutational scans for training data. A shortcoming of these methods is that they are often specific to particular classes of antibodies, so applying them to other antibodies requires training with new data [[54]]. Few approaches have been able to generate new antibodies and antigens without antibody-specific sequencing data [[55-57]].
Among AI methods, artificial neural networks (ANNs) and generative models (GMs) have gained considerable attention in the development of methods for structural biology, whether for describing interactions between biomolecules or for structure modeling [[58-62]]. Neural networks, a set of DL techniques, are mathematical models that mimic the connectivity and behavior of neurons in the brain. Artificial neurons, the building blocks of ANNs, are simply mathematical functions that convert inputs to outputs in a specific way. To create an ANN, artificial neurons are organized into layers, with the output of one layer being the input of the next. GMs, on the other hand, are designed to generate new data instances that resemble the data they were trained on. They learn the underlying structure of the data and then use this knowledge to generate new samples. Among the ANN and GM models available in the literature for protein modeling, two stand out: AlphaFold2 [[33]] and RFdiffusion [[34]].
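The layered organization of artificial neurons described above can be made concrete with a minimal forward pass. The sigmoid activation, the network shape, and the weight values are arbitrary choices for illustration; real networks learn their weights from data.

```python
import math


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum followed by a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weight_rows, biases):
    """A layer applies several neurons to the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def forward(inputs, network):
    """Feed the output of each layer into the next one."""
    activations = inputs
    for weight_rows, biases in network:
        activations = layer(activations, weight_rows, biases)
    return activations


# A tiny 3-input network: one hidden layer with 2 neurons, one output neuron
network = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1]),  # hidden layer
    ([[1.2, -0.7]], [0.05]),                              # output layer
]

output = forward([1.0, 0.5, -1.0], network)
print(output)
```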
AlphaFold2 (AF2) predicts protein structures by leveraging neural networks and training procedures based on evolutionary, physical, and geometric constraints [[33]]. The core of AF2's architecture is the Evoformer block, which learns how different parts of the protein are related to each other in 3D space and represents the protein as a graph in which each amino acid is a node and the connections between them are edges. Evoformer uses two input representations: the pair representation and the multiple sequence alignment (MSA) representation. The former is a matrix that captures how pairs of amino acids interact with each other, while the MSA matrix represents homologous sequences, identifying positions prone to simultaneous mutations (coevolution) and potential contact points. Evoformer updates these representations through 48 blocks, ensuring accurate 3D structure representation by applying geometric constraints and using axial attention to focus on critical information. This continuous information exchange between the MSA and pair representations enables precise structure predictions. AF2 achieved top performance in CASP14 with a median backbone accuracy of 0.96 Å RMSD and an all-atom accuracy of 1.5 Å RMSD [[33]]. AF2 was also adapted to predict multichain complexes (AlphaFold-Multimer) [[63]] and benchmarked for antibody–antigen interactions, achieving 43% top-ranked results for various protein complexes [[64]]. Recently, AlphaFold3 (AF3) was released [[65]], featuring a diffusion-based architecture capable of predicting structures containing proteins, ions, nucleic acids, and small molecules, with improved accuracy for antibody–antigen complexes. AF3 replaces Evoformer with a simpler module and predicts atom coordinates using a diffusion module. The MSA is processed in four blocks and incorporated into the pairwise representation, which is then processed through 48 blocks in the new Pairformer module. The diffusion module then refines the initial atom coordinates through a denoising process. AF3 surpasses classical docking tools and shows enhanced accuracy for protein complexes, including antibody–protein interactions [[65]].
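To make the notion of coevolution in an MSA more concrete, the sketch below scores pairs of alignment columns by their mutual information, a classical covariation measure. This is a stand-in chosen for illustration: AF2 does not compute mutual information explicitly but learns such signals through attention over the MSA. The toy alignment and function names are hypothetical.

```python
import math
from collections import Counter


def column(msa, i):
    """Residues found at alignment position i across all sequences."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    col_i, col_j = column(msa, i), column(msa, j)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: positions 1 and 3 always mutate together
# (K pairs with E, R pairs with D), positions 0 and 2 are fully conserved.
toy_msa = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
]

covarying = mutual_information(toy_msa, 1, 3)
conserved = mutual_information(toy_msa, 0, 1)
print(covarying, conserved)
```

Column pairs with high scores are candidates for residues in spatial contact, whereas fully conserved columns carry no covariation signal. Real alignments additionally require corrections for phylogenetic bias and limited sampling, which this sketch omits.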
The RFdiffusion method replaces physics-based Rosetta methods with DL approaches, aiming to generate diverse proteins and scaffold-free binder interactions with atomic accuracy and unprecedented success [[34]]. RFdiffusion builds on RoseTTAFold [[66]] and uses denoising diffusion probabilistic models (DDPMs) to generate low-resolution backbone models, and then uses the ProteinMPNN network [[58]] to design sequences encoding these structures. Binder design is analogous to the generation of photorealistic images from textual instructions, resulting in novel proteins with higher binding potential and experimental success. The denoising process starts from a random sample of residue backbones and, through an iterative DL-based design workflow, moves the coordinates toward those of true proteins; the network is trained by minimizing the mean square error (MSE) between its predictions and the true structures. The authors demonstrated that RFdiffusion is capable of generating binders for proteins used as target context, by selecting input residues in the target chain (defined as hotspots) to which the designed chain binds. The proof of concept was carried out with the target proteins influenza A H1 hemagglutinin, interleukin-7 receptor-α, programmed death-ligand 1, insulin receptor, and tropomyosin receptor kinase A, showing the potential of RFdiffusion for binder design [[34]].
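The noising side of a DDPM-style model, together with the MSE used to compare a prediction with the true structure, can be sketched as follows. This is a toy illustration under strong assumptions: it perturbs Cartesian coordinates of a hypothetical four-residue trace with a linear noise schedule, whereas RFdiffusion operates on residue frames (translations and rotations) and uses a trained network for the reverse, denoising direction. All names and parameter values are illustrative.

```python
import math
import random


def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal left after t steps of a linear noise schedule."""
    product = 1.0
    for step in range(t):
        beta = beta_start + (beta_end - beta_start) * step / max(num_steps - 1, 1)
        product *= 1.0 - beta
    return product


def add_noise(coords, t, num_steps, rng):
    """Forward process: blend the true coordinates with Gaussian noise at step t."""
    a_bar = alpha_bar(t, num_steps)
    return [
        tuple(
            math.sqrt(a_bar) * c + math.sqrt(1.0 - a_bar) * rng.gauss(0.0, 1.0)
            for c in atom
        )
        for atom in coords
    ]


def mse(predicted, target):
    """Mean square error over all coordinates of all atoms."""
    squared = [(p - q) ** 2 for a, b in zip(predicted, target) for p, q in zip(a, b)]
    return sum(squared) / len(squared)


rng = random.Random(0)
# Hypothetical C-alpha trace of four residues spaced 3.8 angstroms apart.
backbone = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

slightly_noised = add_noise(backbone, t=10, num_steps=1000, rng=rng)
fully_noised = add_noise(backbone, t=1000, num_steps=1000, rng=rng)

# Error of a predictor that returns its noisy input unchanged,
# that is, the error a trained denoiser would have to remove.
loss_early = mse(slightly_noised, backbone)
loss_late = mse(fully_noised, backbone)
print(loss_early, loss_late)
```

At the final step almost no information about the starting structure remains, which is why generation can begin from pure noise: a trained network applied repeatedly in the reverse direction gradually recovers a protein-like backbone.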
More recently, a new generative AI model was released to the public for use in structural biology. The 310 CoPILOT model was developed by 310 AI as an AI Chat for Designer Bio (https://310.ai/). This model allows users to perform structural biology tasks through a web-based chat platform that draws on third-party tools, for example to search for and load proteins from the UniProt database, compare proteins using the TM-align method [[67]], fold proteins using the ESMFold method [[68]], and design sequences with the ProteinMPNN model [[58]]. It also allows the use of 310.ai's own algorithm for designing new proteins. For binder design, the model allows the user to fold an antibody mimetic and then dock it with the target structure. To date, the tool is still under development and requires a considerable amount of work to deliver on its promises.
Challenges and perspectives
The recent progress in the computational engineering of antibody mimetics has eased, but not eliminated, the challenges that stand between direct design and experimental application. Successful design of antibody mimetics often requires several rounds of experimental validation. Through this process, it is possible to identify the changes to the design or computational protocol that are required to fine-tune the structure and achieve the desired biological function [[45]].
In computational protein design, the quality of the final model depends on the efficiency of sampling and on the accuracy of the energy function [[28]]. Commonly, the configuration sampling method is a time-independent stochastic process, such as Monte Carlo (MC) [[69]]. However, because of the stochastic nature of the method, not every local energy minimum is sampled [[70]]. The free energy of binding, which is directly related to the binding affinity, is the most important indicator of a protein's binding strength, and the most challenging to predict. The energy function is usually based on classical molecular mechanics force fields combined with other empirical terms that employ simplified interaction potentials, offering computational speed at the expense of a certain degree of accuracy [[27, 70]]. Another well-known limitation of computational protein design is the treatment of the solvent. Water molecules are important for the structure, stability, dynamics, and function of proteins [[71]] and, given the importance of the desolvation penalty, generally should not be treated as an implicit interaction agent. The most frequent reasons for failure in protein design are insolubility and the formation of unintended oligomeric states. Proteins that bind to other proteins usually have hydrophobic residues on their surface, which may lead to unanticipated intermolecular hydrophobic interactions and aggregation. Increasing the robustness of designs will require improvements in the accuracy of the energy function, not only for the binding free energy but also for the thermodynamic stability of monomeric proteins [[27]]. In addition, for such calculations to be useful in protein–protein binding discovery, where it is common to produce millions of candidates in silico, the predictions must be computed rapidly, preferably within a few hours at most, and they should also be accurate and reproducible.
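The sampling limitation mentioned above can be illustrated with a minimal Metropolis MC sketch on a rugged one-dimensional energy landscape. The energy function, step size, and temperature are hypothetical choices for illustration only; design software samples sequence and conformational space with far more elaborate move sets and energy functions.

```python
import math
import random


def energy(x):
    """Hypothetical rugged 1D landscape with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)


def metropolis(energy_fn, x0, steps, step_size, kT, rng):
    """Metropolis Monte Carlo: always accept downhill moves, and accept
    uphill moves with probability exp(-delta / kT)."""
    x, e = x0, energy_fn(x0)
    best_x, best_e = x, e
    accepted = 0
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy_fn(x_new)
        delta = e_new - e
        if delta <= 0.0 or rng.random() < math.exp(-delta / kT):
            x, e = x_new, e_new
            accepted += 1
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e, accepted / steps


rng = random.Random(0)
best_x, best_e, acceptance = metropolis(
    energy, x0=4.0, steps=5000, step_size=0.5, kT=0.5, rng=rng
)
print(best_x, best_e, acceptance)
```

Lowering `kT` makes the walk greedier and more likely to remain trapped in the minimum nearest to the starting point, whereas raising it improves barrier crossing at the cost of spending more time in high-energy states. In neither regime is there a guarantee that every minimum is visited within a finite number of steps.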
In the last few years, AI methods have found use in all aspects of protein design, from weighting scoring functions and enhancing the sampling of configurational space to the design process itself. These models can achieve high accuracy when predicting binding free energies. However, this accuracy is highly dependent on the quality of the experimental data used for training [[72, 73]]. In addition, while predicting binding free energies is key to designing antibody mimetics, other aspects of developability need to be addressed, including selectivity, stability, aggregation prevention, solubility, biocompatibility, deimmunization, bioavailability, and clearance rate. Other challenges that cannot yet be predicted include the possibility that recombinantly expressed proteins are toxic to the prokaryotic cell line used, or other complications arising from the bacterium's biology [[27]]. Therefore, unless these issues are addressed, the process of protein validation and progression to preclinical and clinical studies will remain largely hindered.
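Predicted binding free energies are easier to interpret once they are converted to the affinities measured experimentally. The sketch below uses the standard thermodynamic relation ΔG = RT ln(Kd), with Kd expressed relative to a 1 M standard state. The function names, the default temperature of 298.15 K, and the sign convention ΔΔG = ΔG(variant) − ΔG(reference) are assumptions made for this illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature=298.15):
    """Binding free energy in kcal/mol from a dissociation constant in mol/L."""
    return R_KCAL * temperature * math.log(kd_molar)


def kd_fold_change(ddg, temperature=298.15):
    """Ratio Kd(variant) / Kd(reference) implied by a ∆∆G in kcal/mol.

    Assumes ddg = dG(variant) - dG(reference), so negative values mean
    tighter binding (a smaller Kd).
    """
    return math.exp(ddg / (R_KCAL * temperature))


dg_nanomolar = binding_free_energy(1e-9)  # a 1 nM binder
tenfold = kd_fold_change(-1.364)          # about a tenfold tighter Kd
print(round(dg_nanomolar, 2), round(tenfold, 3))
```

Because the relation is logarithmic, an error of roughly 1.4 kcal/mol in a predicted ∆∆G corresponds to an order of magnitude in Kd at room temperature, which illustrates why accurate free energy prediction is so demanding.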
On the experimental side, current attempts to reduce the time required to validate a protein's usefulness in clinical settings include the implementation of automated processes aimed at promoting scalability of testing in a compatible time frame. Successful examples have shown that fully automated laboratories (also known as self-driving labs) can achieve higher standards of accuracy and quality controls in in vitro assays to determine protein biocompatibility and function, when compared to their conventional counterparts [[74, 75]]. Although automation offers a glimpse of hope toward accelerating the translatability of computationally designed antibody mimetics, this approach is hardly a tangible solution for the vast majority of research groups due to its inherent high cost, need for stringent cybersecurity and the intrinsic characteristics of a molecular biology laboratory [[74]].
Considering the current issues, the efficient translation of computationally designed synthetic antibody mimetics to real-world applications seems to lie in the comprehensive integration of AI methods into the design framework, so that it goes beyond achieving binding affinity and stability. Toward this end, we envision that a multi-context optimization engine that combines protein structure, protein language, protein images, and biological labels may prove crucial for designing the next generation of novel antibody mimetics.
Acknowledgements
This work was supported by grants from FACEPE (APQ-0346-2.09/19); CNPq (303833/2022-0, 151860/2022-0, INCT-FCx); and the Oswaldo Cruz Foundation through its Innovation Program (VPPCB-007-FIO-18-2-134 and IAM-005-FIO-22-2-44).
Conflict of interest
The authors declare no conflict of interest.
Author contributions
EJFC, EGM, JCMS, and MJN performed an in-depth review of antibody mimetics; DFC and CHBC reviewed the computational methodologies used to design antibody mimetics. RDL conceptualized the manuscript, oversaw the work, and wrote the final version. All coauthors have approved the submission of this manuscript.