Significance of antibody numbering systems in the development of antibody engineering

Abstract

Immunotherapy has become increasingly popular in recent years for treating a variety of diseases, including inflammatory, neurological, oncological, and autoimmune disorders. The significant interest in antibody development is due to the high binding affinity and specificity of an antibody for its antigen. Recent advances in antibody engineering have provided a new perspective on how to engineer antibodies in silico for therapeutic and diagnostic applications. To improve the clinical utility of therapeutic antibodies, it is important to understand the molecular properties that affect antigen targeting and potency. In antibody engineering, antibody numbering (AbN) systems play an important role in identifying the complementarity-determining regions (CDRs) and the framework regions (FRs). Hence, it is crucial to accurately define and understand the CDRs, the FRs, and the key residues of the heavy and light chains that aid in binding of the antibody to the antigenic site. A detailed understanding of amino acid positions is useful for modifying the binding affinity, specificity, physicochemical features, and half-life of an antibody. In this review, we summarize the antibody numbering systems that are widely used in antibody engineering and highlight their significance. We also systematically survey the tools and servers that implement the different AbN systems.

Keywords

Antibody numbering system, IMGT, Kabat, antigen, epitope and paratope

1. Introduction

Figure 1.

Schematic illustration of an IgG antibody. (A) Legend for the various parts of the antibody structure. (B) Antibody structure depicting the variable domain of the light chain, which consists of three CDR loops and the FR regions. V$_{L}$, variable light; V$_{H}$, variable heavy; C$_{L}$, constant light; C$_{H}$, constant heavy. There are three constant heavy domains, and the chains are connected by disulfide bonds.

Antibodies, also known as immunoglobulins (Ig), are produced by B cells and serve as a crucial part of the immune system. Lymphocytes play a pivotal role in the antigen-specific, or acquired, immune response. Among these lymphocytes, B cells differentiate into plasma cells and produce antibodies, which are chiefly responsible for humoral immunity, whereas T cells are responsible for cell-mediated immunity. T cells harbor a receptor (the T cell receptor, TCR) that enables them to recognize a broad range of antigens derived from pathogens, tumors, and the environment. Additionally, they play a crucial role in maintaining immunological memory and self-tolerance [1, 2]. In drug discovery, the development of therapeutic monoclonal antibodies via antibody engineering holds great potential for treating various diseases and disorders. To obtain a therapeutic antibody with superior performance, it is necessary to apply standardized analysis processes and to consider molecular aspects such as binding affinity, half-life (stability), specificity, effector properties, and antigenicity [3, 4]. Binding affinity is the strength of the binding interaction between an antigen and an antibody. Next-generation sequencing (NGS) approaches provide far greater insight into antibody library diversity by yielding a large number of sequences (approximately $10^{7}$). In post-processing, these sequences must be annotated with an antibody numbering system such as IMGT to generate precise information about the sequences in the dataset and to pair the VH and VL domain sequences. Finally, bioinformatics tools or custom scripts can be used to analyze the data and to summarize the diversity and other characteristics of the NGS library [5].

Globally, antibody research and development is growing rapidly alongside scientific and technological advances. Monoclonal antibodies (mAbs) have great therapeutic importance because they are produced by a single clone or B cell lineage and bind to a specific epitope, the segment of an antigen where the antibody binds. mAbs have been developed into effective clinical therapies, and around 79 mAbs had been approved by the US Food and Drug Administration as of 2020 [6]. Structurally, antibodies are glycoproteins with a molecular weight of about 150 kDa that consist of two identical heavy chains and two identical light chains linked by disulfide bonds. The heavy chains contain variable heavy (VH) and constant heavy (CH1, CH2, CH3) domains, and the light chains contain variable light (VL) and constant light (CL) domains. The specificity and high affinity of antigen-antibody interactions are primarily due to the complementarity-determining regions (CDRs), which are part of the variable domain and are chiefly responsible for binding to a specific epitope of an antigen. The set of CDRs in an antibody that binds a particular epitope is known as the paratope, or antigen-binding pocket. As shown in Fig. 1, the antibody variable domain is made up of four FR regions (FR1, FR2, FR3, and FR4) and three CDR regions (CDR1, CDR2, and CDR3). Compared with the CDRs, the FR regions of the heavy and light chains have relatively conserved amino acid sequences [7, 8]. Sequence diversity is essential for antibodies to bind a diverse range of antigens. Hence, the genetic organization of antibodies is highly complex, leading to a variety of germline genes for antibody repertoires [9].

Developing therapeutic antibodies by engineering variable regions directed against a specific epitope of the target antigen demands precise identification of the CDRs, and hence requires an adequate alignment of antibody sequences from human and non-human species. Researchers have observed that the framework regions of an antibody may also have a significant impact on antibody affinity [10]. Antibody numbering systems help identify the precise corresponding positions of amino acid residues in the heavy and light chains of the immunoglobulin. Nonetheless, the variety of numbering schemes in use can be confusing and may lead to inconsistent identification of CDR and FR residues.

Figure 2.

Antibody numbering systems: Timeline depicting evolution and progress of different numbering systems. Kabat first introduced a numbering system followed by Chothia, IMGT, Gelfand, AHo’s and Martin.

The emergence of sequencing techniques and the growing number of reported antigen-antibody complex structures have facilitated the statistical analysis of antibodies. As a result, antibody numbering systems have become a crucial technique in immunoinformatics and antibody analysis [10, 11, 12]. The position of each residue in the antibody sequence is crucial for binding affinity, so it is of utmost importance to have accurate definitions of the CDR and FR regions [13]. CDR numbering can be used to define and modify antibody functions such as binding affinity and stability, and to decrease the immunogenicity of non-human antibodies [10, 11]. A chimeric antibody is an antibody molecule assembled from different species, and hence requires humanization for therapeutic applications [14]. This process requires accurate identification of the CDRs and appropriate alignment of antibody sequences from human and non-human species. The need for a numbering system was recognized at an early stage of antibody research, which eventually led to the development of several numbering systems such as IMGT, Kabat, Chothia, Martin, and Honegger’s (AHo’s); Fig. 2 depicts the timeline of their evolution. In this review, we discuss the various numbering systems and their importance from the perspective of antibody engineering, and list the tools currently used to identify and define the CDRs and FRs. Additionally, we compare and analyze the outputs of the different numbering systems using an antibody structure (PDB ID 3SO3).

2. Antibody numbering systems and importance

2.1 Kabat numbering system

Kabat and Wu were the first to evaluate the differences in amino acid composition at each consecutive position in the variable regions of different antibodies. They analyzed and aligned a total of 77 Bence-Jones protein and immunoglobulin light and heavy chain sequences. Bence-Jones proteins are monoclonal immunoglobulin light chains that are excreted in the urine of patients with multiple myeloma [15]. They also coined the term “Variability Parameter”, defined as the number of different amino acids at a given position divided by the frequency of the most common amino acid at that position. Their analysis and alignment studies showed that three hypervariable regions exist in the variable region of the heavy and light chains; these regions are the parts of the chains that are in direct contact with the antigen and that mutate frequently, allowing diverse epitope-specific recognition. Cysteine and tryptophan were observed to be two highly conserved amino acids in the variable region [15, 16, 17]. Kabat and colleagues aligned light-chain ($\lambda$, $\kappa$) and heavy-chain sequences from a number of antibodies, and additionally aligned and analyzed several TCR sequences ($\alpha$, $\beta$, $\gamma$, $\delta$) [18]. Based on these alignment studies, they proposed a numbering system for the variable regions of antibodies [19]. They analyzed antibody variable region sequences of different lengths and concluded that amino acid insertions are possible only at specific positions: insertions were located mostly in the CDRs, with the exception of light chain CDR2, and were rarely observed in the framework regions [18]. Although the Kabat numbering system has been widely used and accepted, it has a few limitations. Its inferences were drawn from an alignment of only a small number of proteins (77), so only a limited set of insertion and deletion positions in the CDRs and FRs was taken into account. Likewise, it does not consider the three-dimensional (3D) structure of the antibody when defining the CDRs and FRs [19].
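The Variability Parameter described above can be illustrated with a short sketch. The code below is a minimal illustration rather than the original authors' procedure: it assumes the sequences are already aligned to equal length, treats "-" as a gap to be ignored, and uses a made-up four-sequence alignment (`toy_alignment`) purely for demonstration.

```python
from collections import Counter


def wu_kabat_variability(column):
    """Variability at one alignment column.

    variability = (number of different amino acids)
                  / (frequency of the most common amino acid)
    Gap characters ("-") are ignored (an assumption of this sketch).
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    most_common_frequency = counts.most_common(1)[0][1] / len(residues)
    return len(counts) / most_common_frequency


def variability_profile(aligned_sequences):
    """Variability for every column of a pre-aligned set of sequences."""
    return [wu_kabat_variability(col) for col in zip(*aligned_sequences)]


# Hypothetical toy alignment: columns 0 and 2 are invariant,
# columns 1 and 3 each contain three different residues.
toy_alignment = ["CAWS", "CGWT", "CAWY", "CDWS"]
profile = variability_profile(toy_alignment)
print(profile)
```

An invariant position gives the minimum value of 1, while positions with many different residues and no dominant one give high values, which is how hypervariable stretches stand out in such a profile.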

Figure 3.

Comparison of CDR regions. The anti-matriptase (MT-SP1) light chain sequence was taken from PDB ID 3SO3 and aligned. Pink, blue, green, purple, and yellow represent the CDR regions assigned by the IMGT, Kabat, Chothia, Martin, and AHo’s numbering systems, respectively. The numbering systems place the three CDRs at different amino acids: Kabat and Chothia assign CDR1, CDR2, and CDR3 to the same residues, whereas IMGT, Martin, and AHo’s assign the CDRs to different residues.

2.2 Chothia numbering system

In 1987, Chothia and Lesk introduced the first structure-based antibody numbering scheme for variable regions. They aligned the variable regions of different antibodies mainly on the basis of their 3D structures, described in detail how the CDRs form loops, and identified the possible insertion positions in the CDRs of the heavy and light chains. CDRs show a high degree of sequence variation; however, these loops adopt only a limited number of main-chain conformations, which are referred to as “canonical structures”. Canonical structures are the three-dimensional structures that exhibit a definite number of conformations for five of the six hypervariable loops of an antibody. The length of the CDR and the presence of particular amino acids within the loop regions determine these conformations, which in turn shape the antigen-binding pocket, or paratope [20]. A canonical structure is characterized by its loop length, its loop conformation, and the conserved amino acid residues present in the CDR and FR regions of the antibody. Antibody Modeling Assessments demonstrated that accurate identification and appropriate use of canonical structures are crucial aspects of antibody modeling [20, 21]. Certain residues were suggested to contribute most to conformational variation among canonical structures, including glycine, proline, aromatic residues, and hydrogen-bond donors and acceptors. Owing to the greater number of structures available by 1997, Chothia and colleagues described a total of 25 canonical classes in their publication [22]. All of these classes were defined by manual grouping of antibody loops and sequences [20, 22].

Because the Chothia numbering system is based on three-dimensional alignment of antibodies, it shifts the amino acid insertion point, relative to Kabat numbering, from light chain position L27 to L30 and from heavy chain position H35 to H32 [23]. The most important aspect of the Chothia numbering system is that it assigns the same position number to structurally aligned residues from different antibodies, and CDRs are usually defined by matching the structural antigen-binding loops to the antibody sequences. However, this numbering system also has limitations: it was derived from CDRs of similar length, so sequences of different lengths were not taken into account. A further challenge is the vast diversity of antibodies within a species compared with the very limited number of refined structures available to improve the accuracy of the Chothia numbering system. It should be noted that the Chothia and Kabat numbering systems place the CDRs at the same positions (Fig. 3), so they are essentially the same except for the placement of insertions in CDR-L1 and CDR-H1.
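The effect of moving the insertion point can be made concrete with a small sketch. The helper below is hypothetical (it is not taken from any numbering tool): it labels a light-chain loop that carries extra residues by appending letters after the chosen insertion position. The loop range L24 to L34 and the two inserted residues are assumptions chosen only for illustration; the insertion points L27 and L30 are the ones named in the text.

```python
import string


def label_loop(first, last, insert_after, n_inserted, chain="L"):
    """Return position labels for a loop with inserted residues.

    Inserted residues receive the number of the insertion position
    followed by a letter (e.g. L27A, L27B), as in Kabat-style schemes.
    """
    labels = []
    for pos in range(first, last + 1):
        labels.append(f"{chain}{pos}")
        if pos == insert_after:
            for letter in string.ascii_uppercase[:n_inserted]:
                labels.append(f"{chain}{pos}{letter}")
    return labels


# Same 13-residue loop, numbered with the two different insertion points.
kabat_l1 = label_loop(24, 34, insert_after=27, n_inserted=2)
chothia_l1 = label_loop(24, 34, insert_after=30, n_inserted=2)

for k, c in zip(kabat_l1, chothia_l1):
    print(f"{k:>5}  {c:>5}")
```

Both labelings cover the same residues, but the residues between the two insertion points receive different labels, which is why positions from the two schemes cannot be compared without knowing which scheme was used.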

2.3 Martin numbering system

In 2008, Martin and group [24] introduced a structural alignment of CDR and FR regions of different sequence lengths. The Martin numbering system is a modernized and updated version of the Chothia scheme. A key feature of this system is that it takes structure into account when placing insertions and deletions in the variable region, including the FR regions. The only difference between the Chothia and Martin numbering systems is the site of insertions and deletions in the CDR-H1 and CDR-L1 regions.

In addition, Martin and group used a quantitative clustering approach to define the canonical classes of the variable loops instead of manually clustering the antibody loops [25]. In the Martin system, CDRs were grouped into clusters based on their structural features; the analysis considered 244 hypervariable regions from 49 immunoglobulin fragment antigen-binding (Fab) or variable fragment (Fv) structures solved at resolutions between 1.7 and 3.1 Å [26]. The Martin numbering system faces similar challenges to the Chothia system because of the limited number of structures available to define the exact CDR and FR regions of the antibody.

2.4 Gelfand numbering system

The Gelfand numbering system, introduced in 1997, is an interesting but relatively complex AbN system in which Gelfand and group defined a nomenclature for antibody variable regions [27]. They divided the light and heavy chain variable sequences into 21 parts, termed “words”. Every “word” corresponds to a secondary structure element of the antibody, such as a helix or a beta sheet. They further labeled helices with two indexing letters (AB, BC, CD) and beta sheets with one indexing letter (A, B, C) [28]. This numbering system does not take deletions or gaps into account, but it correlates antibody secondary structures (helix and beta sheet) with the aligned variable region sequences. However, several loops in the Gelfand system do not exactly match the definitions of the Chothia system [27, 29, 30].

2.5 Honegger’s numbering system (AHo’s)

Honegger’s numbering system, also known as AHo’s numbering system, was developed by Honegger and Plückthun [31]. AHo’s is one of the more recent numbering systems for defining the regions in the variable domains of the heavy and light chains. It used 3D structures of antibody variable regions to align the heavy and light chains, and additionally covered immunoglobulins of different lengths. The system first described the C$\alpha$ positions of structurally conserved regions to deduce the exact FR and CDR lengths. Along with the FR and CDR regions, the authors defined the gap locations in CDR1 and the FRs (CDR1: 27 to 28, FR2: 36, FR3: 63, and FR4: 123) and the positions of conserved amino acid residues in the FR regions (Cys 23, Trp 43, Cys 106, and Gly 140) [31].
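As a small illustration of how such conserved positions can be used, the sketch below checks an AHo-numbered sequence against the four conserved residues listed above. It is a hypothetical sanity check, not part of any published tool: the input format (a dictionary from AHo position to one-letter residue) and the example values in `toy_numbered` are assumptions.

```python
# Conserved framework residues as listed in the text (AHo positions).
AHO_CONSERVED = {23: "C", 43: "W", 106: "C", 140: "G"}


def find_conserved_mismatches(numbered, expected=AHO_CONSERVED):
    """Return {position: observed residue} where the expected residue is absent.

    `numbered` maps an AHo position to a one-letter amino acid code.
    Missing positions are reported with the value None.
    """
    return {
        pos: numbered.get(pos)
        for pos, residue in expected.items()
        if numbered.get(pos) != residue
    }


# Hypothetical numbered fragment: position 140 carries Ala instead of Gly.
toy_numbered = {23: "C", 43: "W", 106: "C", 140: "A"}
mismatches = find_conserved_mismatches(toy_numbered)
print(mismatches)
```

A non-empty result suggests either an unusual sequence or a numbering that has drifted out of register, so a check of this kind can serve as a quick quality control step after numbering.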

AHo’s numbering system is similar to the Chothia numbering system in that both are based on 3D structural alignments. In 2018, Wagner’s group designed a synthetic VHH (variable heavy domain of heavy chain) library focused specifically on the CDR3 region, and additionally introduced randomized residues in the CDR2 region. CDR3 regions are known to be highly diverse and crucial for antigen binding. AHo’s system recognizes the appropriate FR and CDR regions because it defines the C$\alpha$ positions of conserved residues. To ensure accuracy, the authors compared and aligned the corresponding amino acids using sequence logos derived from the alignment of eight hapten-binding and nine protein-binding VHHs. The library was generated using AHo’s numbering system to minimize structural deviations. By analyzing these sequence logos, they identified highly diverse positions in both the CDR2 and CDR3 regions, such as positions 60, 67, and 69 in CDR2. The authors assumed that these positions are suitable for affinity maturation on the basis of their natural diversity [32, 33]. Affinity maturation is the process by which antibodies acquire increased affinity for their antigen.

2.6 IMGT numbering system

Marie-Paule Lefranc, in 1989 [34], established IMGT®, the international ImMunoGeneTics information system®, to standardize and manage the enormous and diverse data on antigen receptors, namely TCRs and antibodies (immunoglobulins) [35]. Globally, IMGT® is recognized as the reference for immunogenetics and immunoinformatics, providing access to standardized and highly integrated data from high-throughput proteomic, genomic (genetic), and 3D structural sources [36]. The IMGT unique numbering system was proposed on the basis of the structural conservation of the variable domain (V-DOMAIN), drawing on information from sequence alignments, literature data, CDR characterizations, and structural data (X-ray diffraction) [37]. The FR-IMGT and CDR-IMGT regions were established by analyzing alignments of the longest CDR1 and CDR2 from multiple Ig and TCR germline genes, and by statistical analysis of Ig and TCR rearrangements for CDR3-IMGT [38]. The system was initially designed for numbering the Ig and TCR V-DOMAIN; it was subsequently extended to the V-LIKE-DOMAIN and C-LIKE-DOMAIN of the Ig superfamily (IgSF), and later to the C-DOMAIN of Ig and TCR [35, 38].

IMGT has developed a unique numbering system that enables comparison of variable domains across species and across different types of antibodies and antigen-binding receptors [39, 40]. The IMGT system provides a standardized skeleton for defining the regions of the antibody, which comprise the CDR regions (CDR1-IMGT: 27 to 38, CDR2-IMGT: 56 to 65, and CDR3-IMGT: 105 to 117) and the FR regions (FR1-IMGT: 1 to 26, FR2-IMGT: 39 to 55, FR3-IMGT: 66 to 104, and FR4-IMGT: 118 to 128). Because CDR length in the variable domain is so important, the IMGT numbering system defines the CDRs in a unique format in which gaps represent unoccupied positions. The IMGT unique numbering is used in 2D graphical representations known as IMGT Colliers de Perles [41]. A significant advantage of the IMGT method is that it is based on sequence alignments derived from a comprehensive reference gene database that covers the entire IgSF. This approach has resulted in extremely valuable tools: IMGT® comprises seventeen online tools, seven databases, and more than 20,000 pages of web resources [42, 43]. By providing a standardized analysis framework for antibody sequences and 3D structures, IMGT® supports research in antibody engineering, including single-chain variable fragments (scFv), combinatorial libraries, and phage display [44].
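The fixed IMGT region boundaries listed above lend themselves to a simple lookup. The sketch below is a minimal illustration, not IMGT software: it maps an IMGT position to its region and counts occupied CDR positions. The function names and the example set of occupied positions are hypothetical, and insertion positions used for very long CDR3 loops are deliberately left out of scope.

```python
# Region boundaries (inclusive) as given in the text.
IMGT_REGIONS = [
    ("FR1-IMGT", 1, 26),
    ("CDR1-IMGT", 27, 38),
    ("FR2-IMGT", 39, 55),
    ("CDR2-IMGT", 56, 65),
    ("FR3-IMGT", 66, 104),
    ("CDR3-IMGT", 105, 117),
    ("FR4-IMGT", 118, 128),
]


def imgt_region(position):
    """Return the IMGT region name for an integer position from 1 to 128."""
    for name, start, end in IMGT_REGIONS:
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} is outside 1..128")


def cdr_lengths(occupied_positions):
    """Count occupied positions per CDR; gaps are simply absent positions."""
    lengths = {name: 0 for name, _, _ in IMGT_REGIONS if name.startswith("CDR")}
    for pos in occupied_positions:
        name = imgt_region(pos)
        if name in lengths:
            lengths[name] += 1
    return lengths


# Hypothetical domain with gaps at 31-34 (CDR1) and 60-61 (CDR2).
toy_gaps = {31, 32, 33, 34, 60, 61}
toy_occupied = [p for p in range(1, 129) if p not in toy_gaps]
lengths = cdr_lengths(toy_occupied)
print(lengths)
```

Because the boundaries never move, CDR lengths fall out directly from which positions are occupied, which is the property the gap-based IMGT format is designed to provide.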

3. Numbering system: Significance and application

Antibody numbering is important for the development and optimization of therapeutic antibodies for a range of applications, including cancer, autoimmune disease, and infectious disease. Engineering of the variable domains is a common way to modify antibody properties such as specificity, affinity, and stability under different physiological or experimental conditions. Improving affinity is usually the most important objective. This process involves identifying the CDRs of the antibody and making specific modifications through site-directed mutagenesis, with the goal of improving the biochemical and biophysical properties of the antibody for more effective therapy. Another common aim of antibody engineering is to reduce the immunogenicity of therapeutic antibodies of murine origin to avoid an anti-mouse antibody response in humans, a process known as humanization [45]. To reduce immunogenicity, researchers may use a CDR-grafting approach, or fuse the murine variable domains with human constant regions to create chimeric antibodies [46]. Usually, humanized mAbs are generated by grafting mouse CDR residues onto human acceptor antibody frameworks. Defining the boundaries of the CDRs is therefore important: to minimize the number of non-human residues, the CDR regions should be as small as possible, yet they must contain all the residues that are in direct contact with the epitope of the antigen. The different CDR definitions provided by different numbering systems have both advantages and disadvantages for CDR grafting. For example, IMGT includes residues 93 and 94 in CDR-H3 because it considers them crucial for maintaining the conformation of the CDR, whereas numbering systems such as Kabat, Chothia, and Martin do not include these residues in CDR3 [47, 48, 49]. Approximately 20% of the residues that bind the antigen are located outside the CDRs, regardless of which CDR definition is used [11]. These residues are as essential to antigen binding as those within the CDRs, and in certain instances they are even more energetically significant. Therefore, when choosing a CDR definition for CDR grafting, it is important to take into account the FR residues that interact with the antigen. With shorter CDR definitions, more FR residues generally need to be back-mutated, whereas with longer CDR definitions fewer FR back mutations are needed [11].
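The grafting and back-mutation steps described above can be sketched in a few lines. This is a deliberately simplified illustration under strong assumptions: donor and acceptor sequences are taken to be pre-aligned and of equal length, CDR boundaries are supplied as 0-based half-open index ranges rather than derived from a numbering scheme, and the sequences and ranges shown are toy placeholders, not real antibody data.

```python
def graft_cdrs(donor, acceptor, cdr_ranges):
    """Copy donor CDR segments into the acceptor framework.

    donor, acceptor: pre-aligned sequences of equal length (assumption).
    cdr_ranges: iterable of (start, end) 0-based half-open index ranges.
    """
    if len(donor) != len(acceptor):
        raise ValueError("donor and acceptor must be aligned to equal length")
    grafted = list(acceptor)
    for start, end in cdr_ranges:
        grafted[start:end] = donor[start:end]
    return "".join(grafted)


def back_mutate(grafted, donor, framework_positions):
    """Restore donor residues at selected framework positions."""
    sequence = list(grafted)
    for index in framework_positions:
        sequence[index] = donor[index]
    return "".join(sequence)


# Toy placeholders: 'G' marks donor (e.g. murine) residues,
# 'A' marks acceptor (e.g. human) residues.
toy_donor = "G" * 12
toy_acceptor = "A" * 12
toy_cdrs = [(2, 5), (8, 10)]  # hypothetical CDR index ranges

grafted = graft_cdrs(toy_donor, toy_acceptor, toy_cdrs)
rescued = back_mutate(grafted, toy_donor, framework_positions=[1])
print(grafted)
print(rescued)
```

Widening or narrowing the ranges in `toy_cdrs` mimics choosing a different CDR definition: a narrower definition transfers fewer donor residues at first, but may require more entries in `framework_positions` to recover binding.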

Accurate numbering of an antibody is key to the success of antibody engineering and development. It requires precise identification of the residues that affect the affinity, solubility, stability, and other properties of an antibody being developed for diagnostic or therapeutic purposes [10]. For example, CDR grafting is widely used for the humanization of non-human antibodies, and this technique needs precise information about residue number and location for each change or substitution of an existing residue [50]. Several studies have highlighted the importance of AbN in the rational design of therapeutic antibodies. For example, Klein and group (2013) used various AbN systems to identify and define key residues in the CDRs and FRs of antibodies that were critical for antigen binding and neutralization of HIV-1 [51]. AbN can also be used to identify "hotspots", residues that make up a very small fraction of the overall interface yet contribute significantly to antigen binding. By targeting these hotspots, researchers can optimize the antibody for improved binding and therapeutic efficacy. Jian and colleagues (2019) used antibody numbering to identify hotspots in the CDRs of therapeutic antibodies that were critical for binding to the protein target, and used this information to generate a phage-displayed synthetic antibody library [52]. Antibody phage display is a high-throughput technique considered the best alternative to traditional hybridoma technology for discovering antibodies specific to different target antigens. With this method, fully human mAbs can be isolated from a large Ig gene repertoire displayed on the surface of bacteriophages [53]. Nucleotide gene sequences must be translated into amino acid sequences and their CDRs and FRs annotated; an advantage of the IMGT numbering system is that it can perform these annotations and also classify nucleotide sequences as lambda or kappa immunoglobulin light chains [47].
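The translation step mentioned above can be sketched with the standard genetic code. The code below is a minimal illustration and does not reproduce what IMGT tools do internally: it assumes the reading frame starts at the first nucleotide, drops any trailing incomplete codon, and writes "X" for codons containing ambiguous bases. The example input is an arbitrary short nucleotide string.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides):
    """Translate a DNA or RNA string in frame 1 ('*' marks stop codons)."""
    seq = nucleotides.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


protein = translate("GACATCCAGATGACC")
print(protein)
```

The resulting amino acid string is what a numbering tool that expects protein input would then annotate with CDR and FR boundaries.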

AbN is also critical for the development of bispecific and multispecific antibodies, which are designed to target multiple antigens or pathways simultaneously. Bispecific antibodies have two different binding sites directed at two different antigens or at two different epitopes on the same antigen. By identifying the specific CDR residues that contribute to binding one antigen, researchers can engineer the antibody to also bind a second antigen or pathway. For example, researchers used antibody numbering to engineer a bispecific antibody that targets both CD3-positive T cells and B-cell maturation antigen for the treatment of multiple myeloma [54].

Table 1
List of different tools for antibody numbering systems

No.	Tool name	Numbering system	Description	Source
01	IMGT/HighV-QUEST	IMGT	Uses nucleotide sequences and automatically detects the heavy- and light-chain CDR regions [37]	https://www.imgt.org/HighV-QUEST/home.action
02	ANARCI	IMGT, Kabat, Chothia, Martin, AHo	Uses amino acid sequences and automatically detects the heavy- and light-chain CDR regions [56]	http://opig.stats.ox.ac.uk/webapps/newsabdab/sabpred/anarci/
03	AbRSA	Kabat, Chothia, IMGT	Uses amino acid sequences and automatically detects the heavy- and light-chain CDR regions [57]	http://cao.labshare.cn/AbRSA/abrsa.php
04	Novoprolabs	Kabat, Chothia, IMGT	Uses amino acid sequences and automatically detects the heavy- and light-chain CDR regions [58]	https://www.novoprolabs.com/tools/cdr
05	OPIG	IMGT	Does not give all heavy- and light-chain CDR regions [59]	https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabdab
06	Abnum	Kabat, Chothia, Martin	Uses amino acid sequences and automatically detects the heavy- and light-chain CDR regions [58]	http://www.bioinf.org.uk/abs/abnum/
07	PyIgClassify	IMGT	Uses nucleotide, amino acid, or RNA sequences and automatically detects the heavy- and light-chain CDR regions [60]	http://dunbrack2.fccc.edu/PyIgClassify/User/UserSequences.aspx
08	VBASE2	IMGT	Takes a DNA sequence as input [61]	http://www.vbase2.org/
09	proABC	Chothia	Machine learning tool [62]	https://wenmr.science.uu.nl/proabc2/

4. Major numbering system tools in antibody engineering

Antibody numbering and annotation is a critical step in antibody engineering that enables the rational design, optimization, and characterization of therapeutic antibodies. Numbering tools provide a standardized and consistent way of identifying and labeling specific residues. AbN tools may help researchers develop more effective and targeted therapies with desirable properties for a wide range of applications, including the treatment of diseases. Table 1 lists the tools used for antibody numbering and residue labeling, with a brief description and source for each. ANARCI can be used for VH, V$\lambda$, and V$\kappa$ domain amino acid sequences, and can apply the IMGT, AHo, Kabat, Chothia, or extended Chothia scheme. AbRSA applies expert knowledge of region-specific characteristics to sequence mapping to improve numbering quality; its reported benchmark shows strong performance in numbering sequences with variable lengths and patterns in comparison with the best available tools. Novoprolabs has been reported to be particularly useful for antibody numbering and, specifically, antigen receptor classification. OPIG has developed SAbDab, an online database that contains all publicly accessible antibody PDB structures, annotated and presented in a consistent way. The Abnum tool numbers an antibody sequence or structure by applying the Chothia, Kabat, or Martin scheme. PyIgClassify is a database and web server that provides access to the assignments of all CDR structures available in the PDB to its classification system; the database also contains assignments to the IMGT germline V regions for heavy and light chains from multiple species. The germline sequences of human and mouse immunoglobulin variable genes can be retrieved from the VBASE2 database. proABC-2 is available as a web server and needs only the sequences of the heavy and light chains as input. The input is processed to compute the sequence-associated features, which are then passed to the algorithms that make the sequence-based predictions.
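When choosing among the tools in Table 1, it can help to filter them by the numbering schemes they support. The sketch below simply transcribes the "Numbering system" column of Table 1 into a dictionary and queries it; the helper name `tools_supporting` is hypothetical, and the listing reflects only what Table 1 states, not the current feature set of each tool.

```python
# Numbering schemes per tool, transcribed from Table 1.
TOOL_SCHEMES = {
    "IMGT/HighV-QUEST": {"IMGT"},
    "ANARCI": {"IMGT", "Kabat", "Chothia", "Martin", "AHo"},
    "AbRSA": {"Kabat", "Chothia", "IMGT"},
    "Novoprolabs": {"Kabat", "Chothia", "IMGT"},
    "OPIG": {"IMGT"},
    "Abnum": {"Kabat", "Chothia", "Martin"},
    "PyIgClassify": {"IMGT"},
    "VBASE2": {"IMGT"},
    "proABC": {"Chothia"},
}


def tools_supporting(*schemes):
    """Return the sorted names of tools that support every requested scheme."""
    wanted = set(schemes)
    return sorted(
        name for name, supported in TOOL_SCHEMES.items() if wanted <= supported
    )


print(tools_supporting("Martin"))
print(tools_supporting("Kabat", "IMGT"))
print(tools_supporting("AHo"))
```

Requesting several schemes at once is useful when the same sequence needs to be numbered consistently under more than one definition, as in the comparison in Section 5.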

5. Comparative CDR analysis by various numbering systems

A comparative analysis of CDR regions was performed using the different numbering systems. For illustration, the light chain sequence of an antibody that targets the type II transmembrane serine protease matriptase (MT-SP1) was taken from PDB ID 3SO3 and aligned using the various numbering systems, as depicted in Fig. 3 [55]. As the results show, the CDR1 region varies with the numbering system: 24 to 34 for AHo’s, 30 to 36 for Martin, 24 to 35 for Chothia and Kabat, and 27 to 33 for IMGT. Kabat and Chothia assign the CDR regions to the same amino acid residues, i.e., CDR1 from 24(R) to 35(A) and CDR2 from 51(G) to 57(T). The CDR3 assignment, 90(Q) to 100(T), is the same for three numbering systems: Kabat, Chothia, and IMGT. These observations show the variability that arises from the different tools used for predicting CDR regions. It can be attributed to the different concepts underlying the numbering systems, as discussed in the respective subsections.
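The CDR1 ranges reported above can be compared programmatically. The sketch below takes the five ranges from the text and computes which positions are assigned to CDR1 by every system and which by at least one. It assumes, as the residue labels in the text suggest, that all ranges refer to residue positions along the same 3SO3 light chain sequence; if they were scheme-specific numbers, this comparison would not be meaningful.

```python
# CDR1 ranges (inclusive) for the 3SO3 light chain, as reported in the text.
CDR1_RANGES = {
    "AHo": (24, 34),
    "Martin": (30, 36),
    "Chothia": (24, 35),
    "Kabat": (24, 35),
    "IMGT": (27, 33),
}


def shared_and_union(ranges):
    """Return (positions in every range, positions in at least one range)."""
    position_sets = [
        set(range(start, end + 1)) for start, end in ranges.values()
    ]
    shared = sorted(set.intersection(*position_sets))
    union = sorted(set.union(*position_sets))
    return shared, union


shared, union = shared_and_union(CDR1_RANGES)
print("assigned by all systems:", shared)
print("assigned by any system: ", union)
```

Under this assumption only a small core of positions is called CDR1 by all five systems, while the union spans a much wider stretch, which quantifies the disagreement visible in Fig. 3.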

6. Conclusion

To engineer antibodies with desirable properties, it is essential to precisely identify the relevant amino acid positions. Various numbering systems have therefore been developed to support antibody engineering by identifying specific regions of antibodies, such as the CDRs and FRs. To the best of our knowledge, among the available numbering systems, the IMGT numbering system stands out because of its extensive and widely accepted immunogenetics database. Unlike Kabat, Chothia, and AHo’s, which are based on sequence and structural alignments, the IMGT numbering system includes germline data from humans and a plethora of other organisms, such as mice, camels, and bovines. Hence, the IMGT system’s distinctive approach to antibody numbering provides significant advantages over other systems.

Footnotes

Acknowledgments

The authors are thankful to Innoplexus Consulting Services Pvt Ltd, Pune, India, for facilitating and supporting this work.

Conflict of interest

The authors declare that they have no conflict of interest.

Author contribution

Each author listed in the manuscript has contributed substantially to the research and drafting of this manuscript. Conception: JP; Data collection: RP, PV, AKN; Interpretation or analysis of data: RP, PV, AKN, AG, JP; Preparation of the manuscript: RP, PV, AKN, AG, JP; Revision for important intellectual content: AKN, OS, JP; Supervision: OS, JP.

References

1. Bingaman A.W., Patke D.S., Mane V.R., Ahmadzadeh, Ndejembi, Bartlett S.T. and Farber D.L., Novel phenotypes and migratory properties distinguish memory CD4 T cell subsets in lymphoid and lung tissue, Eur J Immunol 35 (2005), 3173–3186.

2. Cohen I.R., Activation of benign autoimmunity as both tumor and autoimmune disease immunotherapy: a comprehensive review, J Autoimmun 54 (2014), 112–117.

3. Fagain and Kennedy, Antibody stability: A key to performance – Analysis, influences and improvement, Biochimie 177 (2020), 213–225.

4. R.M., Hwang Y.C., Liu I.J. and Wu H.C., Development of therapeutic antibodies for the treatment of diseases, J Biomed Sci 27 (2020).

5. Rouet, Jackson K.J.L., Langley D.B. and Christ, Next-Generation Sequencing of Antibody Display Repertoires, Front Immunol 9 (2018), 118.

6. R.M., Hwang Y.C., Liu I.J., Lee C.C., Tsai H.Z., H.J. and Wu H.C., Development of therapeutic antibodies for the treatment of diseases, J Biomed Sci 27 (2020), 1.

7. J.L. and Davis M.M., Diversity in the CDR3 region of V(H) is sufficient for most antibody specificities, Immunity 13 (2000), 37–45.

8. D’Angelo, Ferrara, Naranjo, Erasmus M.F., Hraber and Bradbury A.R.M., Many Routes to an Antibody Heavy-Chain CDR3: Necessary, Yet Insufficient, for Specific Binding, Front Immunol 9 (2018), 395.

9. Mikocziova, Greiff and Sollid L.M., Immunoglobulin germline gene variation and its impact on human disease, Genes Immun 22 (2021), 205–217.

10. Dondelinger, Filée, Sauvage, Quinting, Muyldermans, Galleni and Vandevenne M.S., Understanding the Significance and Implications of Antibody Numbering and Antigen-Binding Surface/Residue Definition, Front Immunol 9 (2018), 2278.

11. Kunik, Peters and Ofran, Structural consensus among antibodies defines the antigen binding site, PLoS Comput Biol 8 (2012), e1002388.

12. Robin, Sato, Desplancq, Rochel, Weiss and Martineau, Restricted diversity of antigen binding residues of antibodies revealed by computational alanine scanning of 227 antibody–antigen complexes, J Mol Biol 426 (2014), 3729–3743.

13. Jarasch and Skerra, Aligning, analyzing, and visualizing sequences for antibody engineering: Automated recognition of immunoglobulin variable region features, Proteins 85 (2017), 65–71.

14. Khoo Y.L., Cheah S.H. and Chong, Humanization of chimeric anti-CD20 antibody by logical and bioinformatics approach with retention of biological activity, Immunotherapy 9 (2017), 567–577.

15. Kabat E.A. and Wu T.T., Attempts to locate complementarity-determining residues in the variable positions of light and heavy chains, Ann NY Acad Sci 190 (1971), 382–393.

16. T.T. and Kabat E.A., Pillars article: an analysis of the sequences of the variable regions of Bence Jones proteins and myeloma light chains and their implications for antibody complementarity, J Exp Med 132 (1970), 211–250. J Immunol 180 (2008), 7057–7096.

17. Capra J.D. and Kehoe J.M., Variable region sequences of five human immunoglobulin heavy chains of the VH3 subgroup: definitive identification of four heavy chain hypervariable regions, Proc Natl Acad Sci USA 71 (1974), 845–848.

18. Kabat E.A., T.T. and Bilofsky, Sequences of Immunoglobulin Chains: Tabulation and Analysis of Amino Acid Sequences of Precursors, V-regions, C-regions, J-chain and BP-microglobulins 1979.

19. Kabat E.A., T.T., Perry, Gottesman and Foeller, Sequences of Proteins of Immunological Interest, Fifth Edition 1991.

20. Chothia and Lesk A.M., Canonical structures for the hypervariable regions of immunoglobulins, J Mol Biol 196 (1987), 901–917.

21. Almagro J.C., Beavers M.P., Hernandez-Guzman, Maier, Shaulsky, Butenhof, Labute, Thorsteinson, Kelly, Teplyakov, Luo, Sweet and Gilliland G.L., Antibody modeling assessment, Proteins 79 (2011), 3050–3066.

22. Al-Lazikani, Lesk A.M. and Chothia, Standard conformations for the canonical structures of immunoglobulins, J Mol Biol 273 (1997), 927–948.

23. Chothia, Lesk A.M., Tramontano, Levitt, Smith-Gill S.J., Air, Sheriff, Padlan E.A., Davies and Tulip W.R., Conformations of immunoglobulin hypervariable regions, Nature 342 (1989), 877–883.

24. Abhinandan K.R. and Martin A.C.R., Analysis and improvements to Kabat and structurally correct numbering of antibody variable domains, Mol Immunol 45 (2008), 3832–3839.

25. Martin A.C., Accessing the Kabat antibody sequence database by computer, Proteins 25 (1996), 130–133.

26. MacCallum R.M., Martin A.C. and Thornton J.M., Antibody-antigen interactions: contact analysis and binding site topography, J Mol Biol 262 (1996), 732–745.

27. Gelfand, Kister, Kulikowski and Stoyanov, Geometric invariant core for the V(L) and V(H) domains of immunoglobulin molecules, Protein Eng 11 (1998), 1015–1025.

28. Gelfand, Kister, Kulikowski and Stoyanov, Algorithmic determination of core positions in the VL and VH domains of immunoglobulin molecules, J Comput Biol 5 (1998), 467–477.

29. Gelfand I.M., Kister A.E. and Leshchiner, The invariant system of coordinates of antibody molecules: prediction of the “standard” C alpha framework of VL and VH domains, Proc Natl Acad Sci USA 93 (1996), 3675–3678.

30. Gelfand I.M. and Kister A.E., Analysis of the relation between the sequence and secondary and three-dimensional structures of immunoglobulin molecules, Proc Natl Acad Sci USA 92 (1995), 10884–10888.

31. Honegger and Plückthun, Yet another numbering scheme for immunoglobulin variable domains: an automatic modeling and analysis tool, J Mol Biol 309 (2001), 657–670.

32. Fellouse F.A., Compaan D.M., Peden A.A., Hymowitz S.G. and Sidhu S.S., Molecular recognition by a binary code, J Mol Biol 348 (2005), 1153–1162.

33. Wagner H.J., Wehrle, Weiss, Cavallari and Weber, A Two-Step Approach for the Design and Generation of Nanobodies, Int J Mol Sci 19 (2018), 3444, doi: 10.3390/ijms19113444.

34. Lefranc M.P., Giudicelli, Ginestoux, Jabado-Michaloud, Folch, Bellahcene, Gemrot, Brochet, Lane, Regnier, Ehrenmann, Lefranc and Duroux, IMGT®, the international ImMunoGeneTics information system®, Nucl Acids Res 37 (2009), D1006–D1012.

35. Lefranc M.P., Giudicelli, Ginestoux, Bosc, Folch, Guiraudou, Jabado-Michaloud, Magris, Scaviner, Thouvenin, Combres, Girod, Jeanjean, Protat, Yousfi-Monod, Duprat, Kaas, Pommié, Chaume and Lefranc, IMGT-ONTOLOGY for immunogenetics and immunoinformatics, In Silico Biol 4 (2004), 17–29.

36. Lefranc M.P., IMGT, the international ImMunoGeneTics database, Nucleic Acids Res 31 (2003), 307–310.

37. Brochet, Lefranc M.P. and Giudicelli, IMGT/V-QUEST: the highly customized and integrated system for IG and TR standardized V-J and V-D-J sequence analysis, Nucleic Acids Res 36 (2008), W503–8.

38. Ehrenmann, Kaas and Lefranc M.P., IMGT/3Dstructure-DB and IMGT/DomainGapAlign: a database and a tool for immunoglobulins or antibodies, T cell receptors, MHC, IgSF and MhcSF, Nucleic Acids Res 38 (2010), D301–7.

39. Lefranc M.P., Unique database numbering system for immunogenetic analysis, Immunol Today 18 (1997), 509.

40. Lefranc M.P., IMGT Collier de Perles for the variable (V), constant (C), and groove (G) domains of IG, TR, MH, IgSF, and MhSF, Cold Spring Harb Protoc 2011 (2011), 643–651.

41. Kaas, Ehrenmann and Lefranc M.-P., IG, TR and IgSF, MHC and MhcSF: what do we learn from the IMGT Colliers de Perles? Brief Funct Genomic Proteomic 6 (2007), 253–264.

42. Lefranc M.P., Giudicelli, Regnier and Duroux, IMGT, a system and an ontology that bridge biological and computational spheres in bioinformatics, Brief Bioinform 9 (2008), 263–275.

43. Kaas and Lefranc M.P., T cell receptor/peptide/MHC molecular characterization and standardized pMHC contact sites in IMGT/3Dstructure-DB, In Silico Biol 5 (2005), 505–528.

44. Lefranc M.P., Pommié, Ruiz, Giudicelli, Foulquier, Truong, Thouvenin-Contet and Lefranc, IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains, Dev Comp Immunol 27 (2003), 55–77.

45. Klee G.G., Human anti-mouse antibodies, Arch Pathol Lab Med 124 (2000), 921–923.

46. Jones P.T., Dear P.H., Foote, Neuberger M.S. and Winter, Replacing the complementarity-determining regions in a human antibody with those from a mouse, Nature 321 (1986), 522–525.

47. Lefranc M.P., IMGT, the international ImMunoGeneTics information system: a standardized approach for immunogenetics and immunoinformatics, Immunome Res 1 (2005), 3.

48. Lefranc M.P., Ehrenmann, Kossida, Giudicelli and Duroux, Use of IMGT Databases and Tools for Antibody Engineering and Humanization, Methods Mol Biol 1827 (2018), 35–69.

49. Chiu M.L., Goulet D.R., Teplyakov and Gilliland G.L., Antibody Structure and Function: The Basis for Engineering Therapeutics, Antibodies (Basel) 8 (2019), 55.

50. Sela-Culang, Kunik and Ofran, The structural basis of antibody-antigen recognition, Front Immunol 4 (2013), 302.

51. Klein, Diskin, Scheid J.F., Gaebler, Mouquet, Georgiev I.S., Pancera, Zhou, Incesu R.-B., B.Z., Gnanapragasam P.N.P., Oliveira T.Y., Seaman M.S., Kwong P.D., Bjorkman P.J. and Nussenzweig M.C., Somatic mutations of the immunoglobulin framework are generally required for broad and potent HIV-1 neutralization, Cell 153 (2013), 126–138.

52. Jian J.W., Chen H.S., Chiu Y.K., Peng H.P., Tung C.P., Chen I.C., C.M., Tsou Y.L., Kuo W.Y., Hsu H.J. and Yang A.S., Effective binding to protein antigens by antibodies from antibody libraries designed with enhanced protein recognition propensities, MAbs 11 (2019), 373–387.

53. Winter and Milstein, Man-made antibodies, Nature 349 (1991), 293–299.

54. Yam, Yang, Armstrong and Abigail Y.U., Anti-bcma receptor antibodies, compositions comprising anti bcma receptor antibodies and methods of making and using anti-bcma antibodies, World Patent (2019), Patent Number: WO2019190969A1.

55. Schneider E.L., Lee M.S., Baharuddin, Goetz D.H., Farady C.J., Ward, Wang C.-I. and Craik C.S., A reverse binding motif that contributes to specific protease inhibition by antibodies, J Mol Biol 415 (2012), 699–715.

56. Dunbar and Deane, ANARCI: antigen receptor numbering and receptor classification, Bioinformatics 32 (2016), 298–300.

57. Chen, Miao, Liu, Xiao Z.-X. and Cao, AbRSA: A robust tool for antibody numbering, Protein Sci 28 (2019), 1524–1531.

58. Abhinandan K.R. and Martin A.C., Analysis and improvements to Kabat and structurally correct numbering of antibody variable domains, Mol Immunol 45 (2008), 3832–9.

59. Dunbar, Krawczyk, Leem, Baker, Fuchs, Georges, Shi and Deane C.M., SAbDab: the structural antibody database, Nucleic Acids Res 42 (2014), D1140–6.

60. Adolf-Bryfogle, North, Lehmann and Dunbrack R.L., Jr., PyIgClassify: a database of antibody CDR structural classifications, Nucleic Acids Res 43 (2015), D432–8.

61. Retter, Althaus H.H., Münch and Müller, VBASE2, an integrative V gene database, Nucleic Acids Res 33 (2005), D671–4.

62. Ambrosetti, Olsen T.H., Olimpieri P.P., Jiménez-García, Milanetti, Marcatilli and Bonvin A.M.J.J., proABC-2: PRediction of AntiBody contacts v2 and its application to information-driven docking, Bioinformatics 36 (2020), 5107–5108.

Table 1
List of different tools for antibody numbering systems