Abstract
Immunotherapy has become increasingly popular in recent years for treating a variety of diseases, including inflammatory, neurological, oncological, and autoimmune disorders. The significant interest in antibody development stems from the high binding affinity and specificity of an antibody for its antigen. Recent advances in antibody engineering have changed how antibodies are engineered in silico for therapeutic and diagnostic applications. To improve the clinical utility of therapeutic antibodies, it is essential to understand the molecular properties that affect antigen targeting and potency. In antibody engineering, antibody numbering (AbN) systems play an important role in identifying the complementarity-determining regions (CDRs) and the framework regions (FRs). Hence, it is crucial to accurately define the CDRs, the FRs, and the key residues of the heavy and light chains that aid the binding of the antibody to the antigenic site. A detailed understanding of amino acid positions is useful for modifying the binding affinity, specificity, physicochemical features, and half-life of an antibody. In this review, we summarize the antibody numbering systems that are widely used in antibody engineering and highlight their significance. We also systematically survey the tools and servers that implement different AbN systems.
Introduction
Schematic illustration of an IgG antibody: (A) Legends for the various parts of the antibody structure. (B) Antibody structure: depicting the variable domain of the light chain consisting of three CDR loops and FR regions, V
Antibodies, also known as immunoglobulins (Ig), are produced by B cells and serve as a crucial part of the immune system. Lymphocytes play a pivotal role in the antigen-specific, or acquired, immune response. Among these lymphocytes, B cells mainly differentiate into plasma cells and produce antibodies, which are chiefly responsible for humoral immunity, whereas T cells are responsible for cellular immunity. T cells harbor a receptor (known as the TCR) that enables them to identify a broad range of antigens derived from pathogens, tumors, and the environment. Additionally, they play a crucial role in maintaining immunological memory and self-tolerance [1, 2]. In the field of drug discovery, the development of therapeutic monoclonal antibodies via antibody engineering holds considerable potential to treat various diseases and disorders. To obtain a therapeutic antibody with superior performance, it is necessary to carry out standardized analyses of various molecular aspects such as binding affinity, half-life (stability), specificity, effector properties, and antigenicity [3, 4]. Specifically, the strength of the binding interaction between antigen and antibody is known as the binding affinity of that antibody. Interestingly, approaches based on next generation sequencing (NGS) give far greater insight into antibody library diversity by providing a large number of sequences (approximately 10
Globally, antibody research and development is increasing significantly with scientific and technological advancement. Monoclonal antibodies (mAbs) have great therapeutic importance because they are produced by a single clone or B cell lineage and bind to a specific epitope, the segment of an antigen where the antibody binds. mAbs have been developed into effective clinical therapies, and around 79 mAbs had been approved by the US Food and Drug Administration by the year 2020 [6]. Structurally, antibodies are glycoproteins with a molecular weight of 150 kDa that consist of two identical heavy chains and two identical light chains linked by disulfide bonds. The heavy chains contain variable heavy (VH) and constant heavy (CH1, CH2, CH3) domains, and the light chains have constant light (CL) and variable light (VL) domains. The specificity and high affinity of antigen-antibody interactions are primarily due to the complementarity-determining regions (CDRs). CDRs are part of the variable domain and are chiefly responsible for binding to the specific epitope of an antigen. The set of CDRs in an antibody that binds to a particular epitope of an antigen is known as the paratope, or antigen-binding pocket. As shown in Fig. 1, the antibody variable domain is made up of four FR regions, denoted FR1, FR2, FR3, and FR4, and three CDR regions, denoted CDR1, CDR2, and CDR3. Compared with the CDR regions, the FR regions have relatively more conserved amino acid sequences in the heavy and light chains [7, 8]. Sequence diversity is essential for antibodies to bind to a diverse range of antigens. Hence, the gene arrangement of antibodies is highly complex, which leads to the formation of various germlines for antibody repertoires [9].
The development of therapeutic antibodies by engineering the variable regions directed against a specific epitope of the target antigen demands precise identification of the CDRs and hence requires an adequate alignment of antibody sequences from human and non-human species. Researchers have observed that the framework regions of an antibody may also exert a significant impact on antibody affinity [10]. Antibody numbering systems can help identify the precise corresponding positions of amino acid residues in the heavy and light chains of the immunoglobulin. Nonetheless, the variety of numbering schemes available can be confusing and may lead to inconsistent identification of CDR and FR residues.
Antibody numbering systems: Timeline depicting evolution and progress of different numbering systems. Kabat first introduced a numbering system followed by Chothia, IMGT, Gelfand, AHo’s and Martin.
The emergence of sequencing techniques and the growing number of reported antigen-antibody complex structures have facilitated the statistical analysis of antibodies. Consequently, antibody numbering systems have become a crucial technique in immunoinformatics and antibody analysis [10, 11, 12]. It is established that the position of each residue in the antibody sequence is crucial for its binding affinity, so it is of utmost importance to have accurate definitions of the CDR and FR regions [13]. CDR numbering can be used to define and modify antibody functions such as binding affinity and stability, and to decrease the immunogenicity of non-human antibodies [10, 11]. A chimeric antibody is an antibody molecule derived from different species and hence requires humanization for therapeutic applications [14]. This process requires accurate identification of the CDRs and appropriate alignment of antibody sequences from human and non-human species. The need for a numbering system was recognized at an early stage of antibody research, which eventually led to the development of various numbering systems such as IMGT, Kabat, Chothia, Martin, and Honegger’s (AHo’s); Fig. 2 depicts the timeline of their evolution. In this review, we discuss the various numbering systems and their importance from the perspective of antibody engineering, and we list the tools currently used for identifying and defining the CDRs and FRs. Additionally, we compare and analyze the outputs of different numbering systems using an antibody structure (PDB ID 3SO3).
Kabat numbering system
Kabat and Wu were the first to evaluate the differences in amino acid composition at each consecutive position in the variable regions of different antibodies. They analyzed and aligned a total of 77 Bence-Jones protein and immunoglobulin light and heavy chain sequences. Bence-Jones proteins are monoclonal immunoglobulin light chain proteins that are excreted in the urine of patients with multiple myeloma [15]. They also coined the term “Variability Parameter”, defined as the number of different amino acids at a given position divided by the frequency of the most common amino acid at that position. Their analysis and alignment studies led to the observation that three hypervariable regions exist in the variable region of the heavy and light chains; these regions are the parts of the heavy and light chains that are in direct contact with the antigen and are able to mutate frequently to allow diverse epitope-specific recognition. It was also observed that cysteine and tryptophan were two highly conserved amino acids in the variable region [15, 16, 17]. Kabat and colleagues aligned various light chains (
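The variability parameter defined above can be illustrated with a short calculation. The following is a minimal sketch, not the original Kabat and Wu implementation: the function name, the toy alignment, and the decision to skip gap characters are assumptions made here for illustration only.

```python
from collections import Counter


def kabat_wu_variability(column):
    """Variability of one alignment column.

    Computed as (number of different amino acids at the position)
    divided by (frequency of the most common amino acid at the position).
    Gap characters ("-") are ignored; this is an assumption of the sketch.
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        raise ValueError("column contains no residues")
    counts = Counter(residues)
    n_different = len(counts)
    # Frequency is the fraction of sequences carrying the most common residue.
    most_common_frequency = counts.most_common(1)[0][1] / len(residues)
    return n_different / most_common_frequency


# Hypothetical toy alignment of four three-residue sequences (not real data).
toy_alignment = ["CAW", "CSW", "CTW", "CAW"]
columns = list(zip(*toy_alignment))
profile = [kabat_wu_variability(col) for col in columns]
```

In this toy case the invariant first and last columns give the minimum value of 1, while the middle column has three different residues and a most-common frequency of 0.5, giving a variability of 6. Positions with consistently high values across a real alignment correspond to the hypervariable regions described in the text.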
Comparison of CDR regions across numbering systems: The anti-matriptase (MT-SP1) light chain sequence was taken from PDB ID 3SO3 and aligned. Pink, blue, green, purple and yellow represent the CDR regions assigned by the IMGT, Kabat, Chothia, Martin and AHo’s numbering systems, respectively. The numbering systems place the three CDRs at different amino acids. Kabat and Chothia place CDR1, CDR2 and CDR3 at the same amino acids, whereas IMGT, Martin and AHo’s place the CDRs at different amino acids.
Chothia numbering system
In 1987, Chothia and Lesk introduced the first structure-based antibody numbering scheme for variable regions. They aligned the variable regions of different antibodies mainly on the basis of their 3D structures, described in detail how the CDRs form loops, and identified the possible insertion positions of amino acids in the CDRs of the heavy and light chains. CDRs show a high degree of sequence variation; however, these loops adopt only a limited number of main-chain conformations, which are referred to as “canonical structures”. Canonical structures are the three-dimensional structures that exhibit a definite number of conformations for five of the six hypervariable loops of antibodies. It has been observed that the length of the CDR and the presence of particular amino acids within the loop regions determine these conformations, which in turn shape the antigen-binding pocket, or paratope [20]. A canonical structure is characterized by its loop length, loop conformation, and the conserved amino acid residues present in the CDR and FR regions of the antibody. Antibody Modeling Assessments demonstrated that accurate identification and appropriate use of canonical structures are crucial aspects of antibody modeling [20, 21]. It was suggested that certain residues contribute most to the conformational variation among canonical structures; these include glycine, proline, aromatic residues, and hydrogen-bond donors and acceptors. Owing to the greater number of structures available by 1997, Chothia and colleagues identified a total of 25 canonical classes in their publication [22]. Interestingly, all these classes were defined by manual grouping of antibody loops and sequences [20, 22].
Because the Chothia numbering system is based on three-dimensional alignment of antibodies, it shifts the amino acid insertion point from light chain position L27 to L30 and from heavy chain position H35 to H32 [23]. The most important aspect of the Chothia numbering system is that it assigns the same position number to structurally aligned residues from different antibodies; the CDRs are usually defined by matching the structural antigen-binding loops to the antibody sequences. However, this numbering system also has limitations: it assumes CDR regions of similar length, so sequences with different lengths have been ignored. A further challenge is that the vast diversity of antibodies present in a species contrasts with the very limited number of refined structures available to improve the accuracy of the Chothia numbering system. It should be noted that the Chothia and Kabat numbering systems place the CDR regions at the same positions, which implies that they are essentially the same (Fig. 3) except for the placement of insertions in CDR-L1 and CDR-H1.
Martin numbering system
In 2008, Martin and group [24] introduced a structural alignment of antibody CDR and FR regions of different sequence lengths. The Martin numbering system is an updated version of the Chothia scheme of AbN. One of its key features is that structural considerations determine where insertions and deletions are placed in the variable region, including in the FR regions. The only difference between the Chothia and Martin numbering systems is the site of insertions and deletions in the CDR-H1 and CDR-L1 regions.
In addition, Martin and group used a quantitative clustering approach to define the canonical classes of variable loops instead of manually clustering the antibody loops [25]. In the Martin system, CDRs were grouped into clusters based on their structural features; the authors considered 244 hypervariable regions from 49 immunoglobulin fragment antigen-binding (Fab) or variable domain (Fv) structures resolved at resolutions between 1.7 and 3.1 Å [26]. Like the Chothia system, the Martin numbering system is constrained by the limited number of structures available to define the exact CDR and FR regions of the antibody.
Gelfand numbering system
The Gelfand numbering system is an interesting but relatively complex AbN system. It was introduced in 1997, when Gelfand and group defined a nomenclature for antibody variable regions [27]. They divided the light and heavy chain variable sequences into 21 parts, termed “words”. Every “word” corresponds to a secondary structure element of the antibody, such as a helix or a beta sheet. They further labeled the helices with two indexing letters (AB, BC, CD) and the beta sheets with one indexing letter (A, B, C) [28]. This numbering system does not take deletions or gaps into account, but correlates antibody secondary structures (helix and beta sheet) with aligned variable region sequences. However, several loops in the Gelfand system do not exactly match the definitions of the Chothia system [27, 29, 30].
Honegger’s numbering system (AHo’s)
Honegger’s numbering system, also known as the AHo’s numbering system, was developed by Honegger and Plückthun [31]. AHo’s is one of the more recent numbering systems for assigning amino acid residues to the various regions in the variable domains of the heavy and light chains. It used the 3D structures of different antibody variable regions to align the heavy and light chains, and it additionally covered immunoglobulins of different lengths in the analysis. This system first described the
The AHo’s numbering system is similar to the Chothia numbering system, since both are based on 3D structural alignments. In 2018, Wagner’s group designed their own synthetic library of VHH (Variable Heavy domain of Heavy chain) domains focused specifically on the CDR3 region; they additionally introduced randomized residues in the CDR2 region. CDR3 regions are known to be highly diverse and crucial for antigen binding. The AHo’s system recognizes the appropriate FR and CDR regions due to its ability to define the conserved residues C
IMGT numbering system
Marie-Paule Lefranc, in 1989 [34], established IMGT
IMGT has developed a unique numbering system that enables comparison of variable domains across species and across different types of antibodies and antigen-binding receptors [39, 40]. The IMGT system provides a standardized framework for defining the regions of the antibody, which include the CDR regions (CDR1-IMGT: 27 to 38, CDR2-IMGT: 56 to 65, and CDR3-IMGT: 105 to 117) and the FR regions (FR1-IMGT: 1 to 26, FR2-IMGT: 39 to 55, FR3-IMGT: 66 to 104, and FR4-IMGT: 118 to 128). Moreover, this system recognizes the importance of CDR length in the variable domain: positions are fixed, and gaps represent unoccupied positions, so CDRs of different lengths are described in a single format. The IMGT unique numbering is also used in 2D graphical representations known as IMGT Colliers de Perles [41]. The IMGT method offers a significant advantage in that it is based on sequence alignments derived from a comprehensive reference gene database that covers the entire IgSF. This approach has resulted in the creation of extremely valuable tools. IMGT
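Because the IMGT boundaries quoted above are fixed position ranges, assigning a numbered residue to a region reduces to a table lookup. The following is a minimal sketch under stated assumptions: the function names and the input format (a mapping from integer IMGT position to residue, with unoccupied positions simply absent) are hypothetical and are not part of any IMGT tool.

```python
# Region boundaries exactly as quoted in the text (inclusive ranges).
IMGT_REGIONS = [
    ("FR1-IMGT", 1, 26),
    ("CDR1-IMGT", 27, 38),
    ("FR2-IMGT", 39, 55),
    ("CDR2-IMGT", 56, 65),
    ("FR3-IMGT", 66, 104),
    ("CDR3-IMGT", 105, 117),
    ("FR4-IMGT", 118, 128),
]


def imgt_region(position):
    """Return the region name for an integer IMGT position."""
    for name, start, end in IMGT_REGIONS:
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} is outside the range 1 to 128")


def split_by_region(numbered):
    """Group residues of an IMGT-numbered domain by region.

    `numbered` maps IMGT position -> one-letter residue; gaps
    (unoccupied positions) are absent from the mapping.
    """
    regions = {name: "" for name, _, _ in IMGT_REGIONS}
    for position in sorted(numbered):
        regions[imgt_region(position)] += numbered[position]
    return regions


# Hypothetical fragment spanning the FR1/CDR1/FR2 boundaries (not real data).
fragment = {26: "S", 27: "G", 38: "Y", 39: "M"}
fragment_regions = split_by_region(fragment)
```

Positions 28 to 37 are unoccupied in the toy fragment, which is how a short CDR1 would appear under a scheme where gaps mark unused positions. The sketch handles integer positions only.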
Numbering system: Significance and application
Antibody numbering is important for the development and optimization of therapeutic antibodies for a range of applications, including cancer, autoimmune disease, and infectious disease. Engineering of the variable domains is a common technique for modifying antibody properties such as specificity, affinity, and stability under different physiological or experimental conditions. Affinity is usually the most important property to be improved in antibody engineering. This process involves identifying the CDRs of the antibody and making specific modifications through site-directed mutagenesis. The goal is to improve the biochemical and biophysical properties of the antibody so that it becomes a more effective therapeutic. Another common aim of antibody engineering is to reduce the immunogenicity of therapeutic antibodies of murine origin in order to avoid an anti-mouse antibody response in humans, a process commonly known as humanization [45]. To reduce immunogenicity, researchers may use a CDR-grafting approach to obtain humanized antibodies, or fuse the murine variable domains with human constant regions to create chimeric antibodies [46]. Usually, humanized mAbs are generated by grafting mouse CDR residues onto human acceptor antibody frameworks. Defining the boundaries of the CDRs is important: to minimize the number of non-human residues, the CDR regions should be as small as possible, yet they must contain all the residues that are in direct contact with the epitope of the antigen. It should be noted that the different CDR definitions provided by different numbering systems have both advantages and disadvantages when it comes to CDR grafting.
For example, IMGT includes residues 93 and 94 in CDR-H3 because it considers them crucial for maintaining the conformation of the CDR; however, numbering systems such as Kabat, Chothia and Martin do not include these residues in CDR3 [47, 48, 49]. It has been observed that approximately 20% of the residues that bind the antigen are located outside the CDRs, regardless of which CDR definition is selected [11]. These residues are as essential to antigen binding as those found within the CDRs, and in certain instances they are even more energetically significant. Therefore, when choosing a CDR definition for CDR grafting, it is important to take into account the FR residues that interact with the antigen. With shorter CDR definitions, more FR residues generally require back mutation, whereas with longer CDR definitions, fewer FR back mutations are needed [11].
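The CDR-grafting idea described above, in which donor CDR residues are placed onto an acceptor framework and selected framework positions are back-mutated, can be sketched in a few lines. This is an illustrative sketch only and not a humanization protocol: it assumes both sequences have already been numbered with the same scheme (here the IMGT CDR ranges quoted earlier in this review), and the function names, the `back_mutations` parameter, and the toy residues are hypothetical.

```python
# CDR ranges under IMGT numbering, as quoted earlier in this review.
CDR_RANGES = {"CDR1": (27, 38), "CDR2": (56, 65), "CDR3": (105, 117)}


def in_cdr(position):
    """True if an integer position falls inside any CDR range."""
    return any(start <= position <= end for start, end in CDR_RANGES.values())


def graft_cdrs(donor, acceptor, back_mutations=()):
    """Combine donor CDRs with acceptor frameworks.

    `donor` and `acceptor` map position -> residue under the same numbering.
    Framework positions come from the acceptor, CDR positions from the donor.
    Framework positions listed in `back_mutations` keep the donor residue.
    """
    grafted = {}
    for position, residue in acceptor.items():
        if not in_cdr(position):
            grafted[position] = residue
    for position, residue in donor.items():
        if in_cdr(position) or position in back_mutations:
            grafted[position] = residue
    return dict(sorted(grafted.items()))


# Hypothetical numbered fragments (not real antibody sequences).
donor = {26: "T", 27: "G", 28: "F", 39: "M", 40: "H"}
acceptor = {26: "S", 27: "A", 28: "Y", 29: "T", 39: "I", 40: "S"}

plain_graft = graft_cdrs(donor, acceptor)
with_back_mutation = graft_cdrs(donor, acceptor, back_mutations={39})
```

Swapping in a different set of CDR ranges changes which residues count as donor-derived, which is the practical reason the choice of CDR definition affects how many framework back mutations are needed.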
Accurate numbering of an antibody is key to the success of antibody engineering and development. It requires precise identification of the residues that affect the affinity, solubility, stability, and other properties of an antibody being developed for diagnostic or therapeutic purposes [10]. For example, the CDR-grafting method is widely used for the humanization of non-human antibodies; this technique needs precise information about residue number and location in order to change or substitute an existing residue [50]. Several studies have highlighted the importance of AbN in the rational design of therapeutic antibodies. For example, Klein and group (2013) used various AbN systems to identify and define key residues in the CDRs and FRs of antibodies that were critical for antigen binding and neutralization of HIV-1 [51]. AbN can be used to identify "hotspots", residues that make up a very small fraction of the overall interface yet contribute significantly to antigen binding. By targeting these hotspots, researchers can optimize the antibody for improved binding and therapeutic efficacy. Jian and colleagues (2019) used antibody numbering to identify hotspots in the CDRs of therapeutic antibodies that were critical for binding to the protein target, and used this information to generate a phage-displayed synthetic antibody library [52]. Antibody phage display is a high-throughput technique considered the best alternative to traditional hybridoma technology for discovering antibodies specific to different target antigens. With this method, fully human-derived mAbs can be isolated from a large Ig gene repertoire displayed on the surface of bacteriophages [53].
Nucleotide gene sequences often need to be translated into amino acid sequences and annotated with CDRs and FRs. An advantage of the IMGT numbering system is that it can perform these annotations and also classify nucleotide sequences into lambda and kappa immunoglobulin light chains [47].
AbN is critical for the development of bispecific and multispecific antibodies, which are designed to target multiple antigens or pathways simultaneously. Bispecific antibodies have two different binding sites directed at two different antigens or at two different epitopes on the same antigen. By identifying specific residues in the CDRs that contribute to binding one antigen, researchers can engineer the antibody to also bind a second antigen or pathway. For example, researchers used antibody numbering to engineer a bispecific antibody that targeted both CD3-positive T cells and B-cell maturation antigen for the treatment of multiple myeloma [54].
List of different tools for antibody numbering systems
Antibody numbering and annotation is a critical step in antibody engineering that enables the rational design, optimization, and characterization of therapeutic antibodies. The various numbering tools provide a standardized and consistent way of identifying and labeling specific residues. AbN tools may help researchers develop more effective and targeted therapies with desirable properties for a wide range of applications, including the treatment of diseases. The tools used for antibody numbering and residue labeling are listed in Table 1 with a brief description and source information. Furthermore, ANARCI is a tool that can be used for VH, V
Comparative CDR analysis by various numbering systems
A comparative analysis of CDR regions was performed using the different numbering systems. For illustration, an antibody that targets the type II transmembrane serine protease matriptase (MT-SP1) was taken from PDB ID 3SO3, and its light chain sequence was aligned using the various numbering systems, as depicted in Fig. 3 [55]. As the outcome shows, the CDR1 region varies with the numbering system: 24 to 34 for AHo’s, 30 to 36 for Martin, 24 to 35 for Chothia and Kabat, and 27 to 33 for IMGT. Kabat and Chothia place CDR1, CDR2 and CDR3 at the same amino acid residues, i.e., CDR1 from 24(R) to 35(A) and CDR2 from 51(G) to 57(T). The CDR3 prediction, 90(Q) to 100(T), is the same for three numbering systems: Kabat, Chothia and IMGT. These observations show the variability that arises from the different tools used for predicting CDR regions. It can be attributed to the different concepts underlying the numbering systems, as discussed in the respective sections.
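The CDR1 boundaries reported above can be compared quantitatively. The following is a small sketch, assuming the quoted boundaries are inclusive residue positions in the 3SO3 light chain; the helper names and the choice of the Jaccard index as an overlap measure are illustrative choices made here, not part of the original analysis.

```python
# CDR1 boundaries for the 3SO3 light chain as reported in the text (inclusive).
CDR1_BOUNDS = {
    "AHo": (24, 34),
    "Martin": (30, 36),
    "Chothia": (24, 35),
    "Kabat": (24, 35),
    "IMGT": (27, 33),
}


def positions(bounds):
    """Expand an inclusive (start, end) pair into a set of positions."""
    start, end = bounds
    return set(range(start, end + 1))


def shared_core(bounds_by_scheme):
    """Positions assigned to the CDR by every scheme."""
    sets = [positions(b) for b in bounds_by_scheme.values()]
    return set.intersection(*sets)


def jaccard(scheme_a, scheme_b, bounds_by_scheme=CDR1_BOUNDS):
    """Overlap between two schemes: |intersection| / |union|."""
    a = positions(bounds_by_scheme[scheme_a])
    b = positions(bounds_by_scheme[scheme_b])
    return len(a & b) / len(a | b)


core = sorted(shared_core(CDR1_BOUNDS))
kabat_vs_chothia = jaccard("Kabat", "Chothia")
imgt_vs_martin = jaccard("IMGT", "Martin")
```

With these boundaries, only positions 30 to 33 are assigned to CDR1 by all five systems, Kabat and Chothia overlap completely, and IMGT and Martin share 4 of the 10 positions that either of them covers.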
Conclusion
In conclusion, engineering antibodies for desirable properties requires precise identification of the relevant amino acid positions. Various numbering systems have therefore been developed to support antibody engineering by identifying specific regions in antibodies, such as the CDRs and FRs. To the best of our knowledge, among all the available numbering systems, the IMGT numbering system stands out owing to its extensive and widely accepted immunogenetics database. Unlike Kabat, Chothia, and AHo’s, which are based on sequence and structural alignments, the IMGT numbering system includes germline data not only from humans but also from many other organisms, such as mice, camels, and bovines. Hence, the IMGT system’s distinctive approach to antibody numbering provides significant advantages over other systems.
Acknowledgments
The authors are thankful to Innoplexus Consulting Services Pvt Ltd, Pune, India, for facilitating and supporting this work.
Conflict of interest
The authors declare that they have no conflict of interest.
Author contribution
Each mentioned author in the manuscript has substantially contributed to the research and drafting of this manuscript. Conception: JP; Data collection: RP, PV, AKN; Interpretation or analysis of data: RP, PV, AKN, AG, JP; Preparation of the manuscript: RP, PV, AKN, AG, JP; Revision for important intellectual content: AKN, OS, JP; Supervision: OS, JP.
