Abstract
Gene therapy with adeno-associated viral (AAV) vectors has reached the clinical stage for many inherited and acquired diseases. However, due to a cargo capacity limited to <5 kb, AAV-mediated treatment of diseases that require transfer of larger genes still appears elusive. This is a major drawback of a platform that has otherwise been repeatedly found to be safe and effective. Thus, great efforts have been directed toward the identification of strategies to overcome this limitation. Among the most studied approaches is the use of dual vectors, in which a transgene is split across two separate AAV vectors. Mechanisms acting at either the DNA, pre-mRNA, or protein levels have been explored to restore full-length transgene expression in infected cells. Here, we will review them as well as additional strategies developed to deliver large genes with AAV. We discuss the pros and cons of these strategies and the aspects that still need to be addressed.
Introduction
Recombinant adeno-associated viral (AAV) vectors are currently considered the leading platform for in vivo gene therapy. Since the initial isolation of an infectious AAV clone, 1,2 many efforts have been devoted to refining this virus as a gene therapy vehicle. Several factors explain the growing popularity of AAV vectors over the years, including the lack of pathogenicity of wild-type AAV, the low immunogenicity of the vectors, and the availability of different serotypes with broad but distinct tropism that can transduce a variety of target tissues, including liver, retina, cardiac muscle, and central nervous system. 3 –5 These aspects, together with the substantial innovations introduced over the years, have boosted the investigation of AAV as a human gene therapy platform.
Although initially explored for the delivery of genes mutated in monogenic inherited diseases, AAV vectors were later adapted to target more complex conditions or to deliver innovative tools, such as those for genome editing. More than three decades of pre-clinical research have thus culminated in three AAV-based gene therapy products for inherited diseases affecting different tissues, which received market approval from the European Medicines Agency and the US Food and Drug Administration between 2012 and 2019. 6 –8 Despite the encouraging clinical success of these products, a number of challenges still limit the broad use of this gene delivery platform. Addressing these challenges is one of the main current focuses of the gene therapy field.
One of the drawbacks of AAV is that, as one of the smallest viruses described to date, it has an optimum genome-packaging capacity of <5 kb. Because recombinant AAV vectors are generated by replacing the entire AAV genome except for the two 145-bp inverted terminal repeats (ITRs), less than 4.5 kb of foreign DNA can be incorporated, and this must include at a minimum the coding sequence (CDS) of the therapeutic gene as well as the promoter and the polyadenylation (polyA) signal. This cargo capacity is therefore clearly insufficient to deliver genes with a CDS that exceeds 4 kb, including those causative of common inherited diseases (Table 1).
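The size budget described above can be made concrete with a short calculation. This is an illustrative sketch only: the 4,700-bp packaging limit is an assumed value consistent with the "<5 kb" figure in the text, and the promoter and polyA sizes in the usage example are hypothetical; only the 145-bp ITR length comes from the text.

```python
# Illustrative single-vector size budget (all sizes in base pairs).
# Assumption: 4,700 bp is used as the optimum packaging limit; the text only states "<5 kb".
PACKAGING_LIMIT_BP = 4_700
ITR_BP = 145  # length of each of the two inverted terminal repeats


def max_cds_bp(promoter_bp: int, polya_bp: int, limit_bp: int = PACKAGING_LIMIT_BP) -> int:
    """Room left for the coding sequence after the two ITRs, promoter and polyA."""
    return limit_bp - 2 * ITR_BP - promoter_bp - polya_bp


def fits_single_vector(cds_bp: int, promoter_bp: int, polya_bp: int,
                       limit_bp: int = PACKAGING_LIMIT_BP) -> bool:
    """True if the whole expression cassette fits within one AAV genome."""
    return cds_bp <= max_cds_bp(promoter_bp, polya_bp, limit_bp)


if __name__ == "__main__":
    # Hypothetical regulatory element sizes, chosen only for illustration.
    promoter, polya = 500, 200
    print("CDS room:", max_cds_bp(promoter, polya))
    for name, cds in [("CFTR (~4.4 kb)", 4_400), ("dystrophin (~11 kb)", 11_000)]:
        print(name, "fits:", fits_single_vector(cds, promoter, polya))
```

With these assumed sizes about 3.7 kb remain for the CDS, so neither CFTR nor dystrophin fits. Shrinking the promoter and polyA raises this ceiling, but usually at the cost of expression strength.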
Table 1. Large therapeutic genes and multiple adeno-associated viral vector-based strategies for their delivery
AAV, adeno-associated viral vectors; CDS, coding sequence.
In this review, we will discuss the various strategies that have been explored to adapt AAV for delivering large genes.
Fitting Large Genes in a Single AAV Vector
A seemingly straightforward way to overcome the limited cargo capacity of AAV is to shrink the transgene expression cassette. Efforts have been directed toward generating either shortened versions of the regulatory elements (i.e., promoters and polyA signals) that retain their efficiency and specificity, 9 –12 or shortened versions of therapeutic genes that encode a truncated but functional protein. 9,11,13 Both strategies have been explored with some degree of success, but both have also shown limitations. Indeed, shorter versions of the regulatory elements often led to reduced levels of transgene expression, and a combination of short synthetic enhancers and promoters was explored to overcome this issue. 14 The development of truncated versions of a therapeutic gene, in turn, requires a deep understanding of the structure and function of the encoded protein.
Historically, the first attempt to optimize the delivery of a large gene via AAV vectors was the AAV2-mediated delivery of the cystic fibrosis transmembrane conductance regulator (CFTR) gene.
The large size of the CFTR CDS (∼4.4 kb) leaves little space for the regulatory elements in the vector; therefore, a number of efforts have been directed toward shrinking the CFTR expression cassette, either by generating truncated forms of CFTR that retain their functionality or by identifying short promoter/polyA signals. 15 Flotte et al. first attempted the delivery of an AAV vector in which the expression of a truncated version of CFTR was driven by the ITRs of AAV. The ITRs were found to have weak promoter activity, sufficient to yield detectable levels of CFTR both in vitro and in vivo. 16 –19 However, the levels of expression in patients with cystic fibrosis (CF) enrolled in the first clinical trial were generally low. 18,20,21
Further studies have shown that targeted deletions in the CFTR CDS, which make room in the AAV genome for longer and more efficient promoters and/or enhancers, result in higher efficiency of CFTR reconstitution in cells. 9,14,15,22 –24 These findings confirm that shortening the expression cassette by minimizing the size of the regulatory elements, at the cost of their strength, might limit the efficacy of the gene therapy approach.
Micro-gene therapy is also under extensive clinical evaluation for Duchenne muscular dystrophy (DMD), with three ongoing clinical trials in the United States and one being planned in Europe. 25 The large size of the dystrophin gene (CDS: ∼11 kb) has long been a major obstacle to the development of DMD gene therapy. However, the initial discovery of a highly functional mini-version of the dystrophin protein in mildly affected Becker muscular dystrophy patients, 26 together with numerous studies that generated considerable knowledge of the structural domains of dystrophin, prompted the rational design of smaller but highly functional mini- and micro-dystrophin versions suitable for AAV-mediated gene therapy. These have been shown to provide some degree of therapeutic benefit. 27 –30
The CDS of additional large proteins has also been shortened to fit into AAV vectors. 31 –34 Some of these mini-protein versions have already reached the clinical stage, 35 whereas others are still under preclinical investigation to thoroughly evaluate whether they retain proper function. 36 For many diseases, however, truncated versions that retain protein function have not yet been generated. Consequently, the development of additional methods to express large therapeutic proteins by AAV gene transfer has remained an important area of research.
Another interesting strategy that several research groups have used to deliver large genes through AAV vectors is to encapsidate the full-length expression cassette (>5 kb) in a single AAV, without any size optimization. 37 –41 These so-called “oversize” AAV vectors have been found to express full-length large proteins both in vitro and in vivo. 37 –41 However, the genome contained in oversize AAV vectors was found not to be a pure population of intact large genomes but rather a mixture of genomes highly heterogeneous in size. 39,42 –45 Reassembly of these truncated genomes in the target cell nucleus after infection has been hypothesized to be the mechanism that reconstitutes the full-length expression cassette in infected cells and, thus, accounts for the full-length protein expression seen in different studies. 39,40,42 –45
Despite the well-documented ability of this straightforward approach to mediate large gene transduction in vivo, the genome heterogeneity of oversize AAV vectors raises fundamental safety questions that limit further clinical translation of the approach without additional optimization.
An alternative strategy under investigation for the delivery of large genes is to leverage the capsid of human bocavirus 1 (HBoV1), 46,47 an autonomous parvovirus related to AAV with a 5.5-kb genome. Indeed, an oversize AAV2 vector genome including the CFTR expression cassette has been shown to be effectively packaged in the HBoV1 capsid, resulting in efficient transduction of polarized human airway epithelia. 47 Further investigation of the tropism of this vector for additional tissues could expand the portfolio of strategies available for the delivery of large genes.
Exploring the Use of Multiple AAV Vectors to Deliver Large Genes
Although the ability to deliver large transgenes in a single AAV vector has been a longstanding goal of the gene therapy field, the strategy that has been most extensively studied and tested in recent years to overcome the limited AAV cargo capacity is the co-administration of two or more AAV vectors that carry separate parts of a gene. The separate parts are then joined at the DNA, pre-mRNA, or protein level to reconstitute expression of the large therapeutic protein (Fig. 1), as described below.

Fig. 1. Overview of multiple AAV-based approaches for expression of large therapeutic proteins. The CDS of a large gene is split in two halves (5′ and 3′), flanked by the ITRs, which are separately packaged into two AAV capsids. On co-transduction of the same cell, different mechanisms are explored to reconstitute full-length protein expression through joining of the two halves at either the DNA (A), pre-mRNA (B), or protein (C) level.
Dual AAV-mediated reconstitution of a large gene
Various dual AAV strategies to reconstitute a large gene in target cells have been described, which rely on different mechanisms: trans-splicing, overlapping, and hybrid AAV vectors (Fig. 1A).
The trans-splicing approach takes advantage of the inherent ability of AAV ITRs to concatemerize, which can reconstitute full-length genomes. 48,49 In this approach, the two vectors carry separate halves of the transgene and are designed as follows: the 5′-vector contains the promoter, the 5′ half of the CDS, and a splice donor (SD) signal; the 3′-vector contains a splice acceptor (SA) signal, the 3′ half of the CDS, and the polyA signal. Once the full-length genome is reconstituted through tail-to-head ITR-mediated concatemerization of the two AAV genomes, the SD and SA signals allow the ITR structure that forms at the junction to be spliced out, yielding an intact large mRNA molecule.
The overlapping approach relies on homologous recombination between a region of the CDS included in both AAV vectors to reconstitute a full-length genome. 50 Thus, the 5′-vector carries the promoter and the 5′ half of the CDS, whereas the 3′-vector carries the 3′ half of the CDS and the polyA signal, with the two halves sharing the overlapping sequence.
The hybrid approach combines the trans-splicing and overlapping approaches: it adds a highly recombinogenic exogenous sequence to the trans-splicing vectors to increase recombination efficiency. 51 This recombinogenic sequence is placed downstream of the SD signal in the 5′-vector and upstream of the SA signal in the 3′-vector. After either ITR-mediated concatemerization of the two AAV genomes or homologous recombination mediated by the exogenous recombinogenic sequence, the splicing signals allow the recombinogenic sequence and/or the ITR structure at the junction to be spliced out, restoring the large gene mRNA.
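The three joining logics can be contrasted with a toy string model. This is a hypothetical sketch, not a model of real vector biology: the CDS is a plain string, the markers `[SD]`, `[SA]` and `[ITR]` and the placeholder recombinogenic sequence are invented for illustration, homologous recombination is reduced to merging on an identical overlap, and splicing is reduced to deleting everything from SD through SA.

```python
"""Toy string model of full-length CDS reconstitution by dual AAV designs.

Hypothetical sketch: markers and sequences are illustrative placeholders,
recombination is modelled as merging on an identical overlap, and splicing
as deleting everything from the splice donor through the splice acceptor.
"""

SD, SA, ITR = "[SD]", "[SA]", "[ITR]"


def split_cds(cds: str, split: int, overlap: int = 0) -> tuple[str, str]:
    """5' and 3' CDS portions; with overlap > 0 they share `overlap` bases."""
    return cds[: split + overlap], cds[split:]


def splice(pre_mrna: str) -> str:
    """Remove the intron-like segment from SD through SA."""
    start = pre_mrna.index(SD)
    end = pre_mrna.index(SA, start) + len(SA)
    return pre_mrna[:start] + pre_mrna[end:]


def trans_splicing(five: str, three: str) -> str:
    # Tail-to-head ITR concatemer: 5' half, SD, ITR junction, SA, 3' half.
    return splice(five + SD + ITR + SA + three)


def overlapping(five: str, three: str, overlap: int) -> str:
    # Homologous recombination reduced to merging on the shared sequence.
    if overlap <= 0 or five[-overlap:] != three[:overlap]:
        raise ValueError("no shared overlap: no recombination in this model")
    return five + three[overlap:]


def hybrid(five: str, three: str, recomb: str, via_itr: bool) -> str:
    if via_itr:
        # ITR concatemer keeps one copy of the recombinogenic sequence per vector.
        junction = SD + recomb + ITR + recomb + SA
    else:
        # Recombination on the exogenous sequence leaves a single copy.
        junction = SD + recomb + SA
    return splice(five + junction + three)


if __name__ == "__main__":
    cds = "ATGGCTTCAGGAACCTGA"  # made-up CDS
    five, three = split_cds(cds, 9)
    print(trans_splicing(five, three) == cds)
    print(hybrid(five, three, "ggtacc", via_itr=True) == cds)
    print(hybrid(five, three, "ggtacc", via_itr=False) == cds)
    five_o, three_o = split_cds(cds, 9, overlap=4)
    print(overlapping(five_o, three_o, 4) == cds)
```

In this model only the overlapping design depends on the CDS sequence itself (it fails when the two portions share no identical region), whereas the trans-splicing and hybrid designs depend on the splicing markers, a simplified echo of the transgene dependence of overlapping vectors.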
Each of these strategies has some advantages as well as drawbacks for successful application.
Although the overlapping approach has the simplest design, since unlike the other approaches it requires no foreign or artificial DNA elements, its efficiency strictly depends on both the transgene and the target cell. Indeed, both the activity in the target cells of the repair mechanism on which overlapping vectors rely for large gene reconstitution and the intrinsic ability of the overlapping region to mediate efficient homologous recombination significantly affect the efficiency of the platform. 52,53
Conversely, ITR concatemerization, on which trans-splicing vectors rely, is a spontaneous process that has been described to occur in target cells upon infection. 49 However, efficient trans-splicing requires a number of rate-limiting steps to be overcome (for a detailed discussion, see Ghosh and Duan 54 ). These include formation of the concatemer in the productive orientation (i.e., tail-to-head) and effective splicing, the latter being influenced by both the gene sequence surrounding the splitting point and the efficiency of the splicing signals.
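Why productive orientation is rate-limiting can be illustrated with a toy enumeration. The assumptions are hypothetical and chosen only for simplicity: equal numbers of 5′- and 3′-vector genomes, random pairing into dimers, and equal probability for the four possible end-to-end ITR joins. Under these assumptions only a 5′-vector tail joined to a 3′-vector head counts as productive.

```python
from fractions import Fraction
from itertools import product

# Toy model (hypothetical assumptions): genomes pair at random with equal
# numbers of 5'- and 3'-vectors, and the four end-to-end joins between the
# first and second genome of a dimer are equally likely.
VECTORS = ("5prime", "3prime")
JOINS = ("tail-to-head", "head-to-tail", "head-to-head", "tail-to-tail")


def is_productive(first: str, second: str, join: str) -> bool:
    """Productive only if the 5'-vector tail is joined to the 3'-vector head."""
    return (first, second, join) in {
        ("5prime", "3prime", "tail-to-head"),  # 5' tail joined to 3' head
        ("3prime", "5prime", "head-to-tail"),  # same junction, listed the other way round
    }


def productive_fraction() -> Fraction:
    """Fraction of equally likely dimer configurations that are productive."""
    outcomes = list(product(VECTORS, VECTORS, JOINS))
    hits = sum(is_productive(*o) for o in outcomes)
    return Fraction(hits, len(outcomes))


if __name__ == "__main__":
    print(productive_fraction())  # 1/8 under these toy assumptions
```

The exact value carries no biological weight; the point is that under random joining most dimers are non-productive, which is part of the rationale for adding a recombinogenic sequence that favors the correct junction.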
The hybrid approach was thus developed with the aim of overcoming the limitations of the two previous approaches. 51 Compared with the trans-splicing approach, the highly recombinogenic exogenous sequence is expected to increase the likelihood that the two AAV genomes join in the proper orientation. At the same time, unlike the overlapping approach, it should ensure efficient homologous recombination independently of the transgene. Yet, as in the trans-splicing approach, the sequences surrounding the splicing signals affect splicing efficiency, so careful selection of the splitting point is required to maximize large gene reconstitution.
The potential of dual AAV vectors to reconstitute both reporter and therapeutic genes has been assessed in vitro and in vivo in various target tissues, including sensory organs, 55,56 muscle, 41,57 –59 lung, 60 heart, 61 and liver. 59,62 These studies have yielded variable results. This is somewhat expected, as several factors affect the success of transduction from dual AAV vectors: the design of the platform, the target cell types, the AAV serotype and dose used, and the transduction levels achieved. Thus, even slight differences between studies might account for variable outcomes.
Historically, the trans-splicing approach was the first to be developed 48,63 –65 and was initially tested for erythropoietin 48 and factor VIII 62 gene delivery. Some years later, successful reconstitution of the mini-dystrophin gene was also achieved by intramuscular injection of trans-splicing vectors. 58 Although the mini-dystrophin studies clearly highlighted how important the choice of the optimal splitting point is for efficient transgene expression with this platform, 58,66 the success in reconstituting the gene and improving the phenotype of the DMD mouse model prompted further investigation of the ability of trans-splicing vectors to reconstitute large transgenes. 57,67,68
Overlapping vectors have been used more consistently to reconstitute genes in muscle than in other tissues, such as the retina, suggesting that some tissues are more amenable than others to homologous recombination-based approaches.
Intravenous delivery of the mini-dystrophin gene with dual overlapping AAV vectors has been found to result in expression levels that, although lower than in wild-type muscle, were able to improve muscle performance. 29,69 Interestingly, the efficiency of reconstitution from overlapping vectors was also found to be superior to that of three different sets of hybrid vectors, suggesting that specific regions of the DMD gene might be particularly prone to homologous recombination. 70 Similarly, a study comparing different strategies for dysferlin delivery to the muscle identified the overlapping approach as the most efficient, although the hybrid vectors in that study relied on a region of homology derived from an intron of the dysferlin gene, which might not have been sufficiently recombinogenic. 41
In the retina, by contrast, overlapping vectors have generally achieved low levels of transduction in terminally differentiated photoreceptors across different studies, 52,71,72 whereas more efficient reconstitution was found in the retinal pigment epithelium. 52 Optimization of the overlapping region and the use of serotypes with high retinal transduction ability were found to be prerequisites for sustained transgene expression. 53,73
When no optimal overlapping region can be identified in the transgene, or when only poor splitting sites are available, the hybrid approach has been found to significantly increase the efficiency of transgene reconstitution compared with overlapping or trans-splicing vectors. 51,52
In the retina, various studies have shown the ability of hybrid vectors to reconstitute large transgenes 52,72 –74 at levels that were higher than those of the other dual AAV strategies tested side by side 52,72,73 and that improved the retinal phenotype of animal models of inherited retinal diseases. 52,75 Prompted by the promising results obtained with delivery of the large MYO7A gene to the retina of the Usher 1B mouse model, 52 a phase I/II clinical trial has been planned to investigate the efficacy of hybrid vectors in Usher 1B patients (
Recently, different groups have also investigated expanding the dual-vector system to a triple-vector system, which increases the genome capacity of AAV vectors up to 14 kb. 76 –78 Although full-length gene reconstitution with triple AAV vectors has been shown in different studies, 76 –78 resulting in transient improvement of the phenotype of a mouse model of an inherited retinal disease, 78 expression levels were overall weak. Nonetheless, future optimizations that improve single or multiple AAV transduction might allow the use of triple AAV vectors to deliver proteins that are too large even for dual-vector systems.
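The 14-kb figure follows roughly from multiplying the single-vector capacity by three. The sketch below makes this explicit under hypothetical assumptions: a 4,700-bp per-vector limit, illustrative promoter and polyA sizes placed only once (in the first and last vector, respectively), and an assumed fixed cost per junction for splicing or recombinogenic elements. Only the 145-bp ITR length is taken from the text.

```python
# Illustrative capacity of a split-cassette system with n AAV vectors (bp).
# Hypothetical values: per-vector limit, promoter, polyA and per-junction cost.
PER_VECTOR_LIMIT_BP = 4_700
ITR_BP = 145


def total_packaged_bp(n_vectors: int, per_vector_bp: int = PER_VECTOR_LIMIT_BP) -> int:
    """Raw DNA that n vectors can carry together, ITRs included."""
    return n_vectors * per_vector_bp


def cds_capacity_bp(n_vectors: int, promoter_bp: int = 500, polya_bp: int = 200,
                    junction_bp: int = 300,
                    per_vector_bp: int = PER_VECTOR_LIMIT_BP) -> int:
    """CDS room when the promoter sits in the first vector, the polyA in the
    last, and each of the n - 1 junctions costs `junction_bp` in total."""
    usable = n_vectors * (per_vector_bp - 2 * ITR_BP)
    return usable - promoter_bp - polya_bp - (n_vectors - 1) * junction_bp


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, total_packaged_bp(n), cds_capacity_bp(n))
```

Under these assumptions three vectors carry about 14.1 kb of DNA in total but leave roughly 11.9 kb for the CDS: the duplicated ITRs and junction elements are the price of splitting. The arithmetic says nothing about efficiency, which in practice drops as more vectors must reach the same cell.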
Two major drawbacks are associated with dual and triple AAV vector platforms. First, various studies have observed short, unwanted protein products derived from free single halves that have not engaged in full-length gene reconstitution. Optimization of dual AAV vector design, as well as the inclusion of in-frame degradation signals that affect only the production of the unwanted products without altering full-length protein production, has been used to overcome this issue. 53,75
Second, with the exception of a few reports, 60,79 all the strategies have shown lower efficiency compared with a single normal-size AAV vector. Whether these levels are sufficient to achieve therapeutically meaningful improvements in some diseases is still an open question that needs to be investigated in future clinical trials.
Pre-mRNA trans-splicing-mediated reconstitution of a large mRNA
The discovery that primary transcripts of eukaryotic genes can undergo the process of RNA trans-splicing (i.e., spliceosome-mediated splicing between two different pre-mRNA molecules) 80,81 opened new possibilities for gene therapy, based on correction of defective genes at the RNA rather than at the DNA level.
Although trans-splicing was initially explored to correct mutations in endogenous mutated pre-mRNAs by delivering an exogenous, corrected fragment of the endogenous gene, 82 adaptations of this process were later developed. In the approach defined as “segmental trans-splicing” (STS), two engineered DNA fragments encode the 5′ and 3′ portions of the pre-mRNA of a large gene and share an intronic hybridization domain that favors trans-splicing, joining the two half-transcripts into an intact full-length mRNA 83 (Fig. 1B). Interestingly, AAV-mediated delivery of the 5′ and 3′ halves of the CFTR cDNA to human CF airway epithelial cells, followed by STS, resulted in full-length CFTR reconstitution and restored the function of CF cells. 84
Despite this pivotal proof-of-concept study, however, the inability to achieve significant levels of pre-mRNA trans-splicing, as well as the several scientific and technical challenges related to RNA-based repair strategies, 82,85 have limited further investigation of STS in combination with AAV vectors as a strategy to correct mutations in large genes.
Protein trans-splicing-mediated reconstitution of a large protein
Another system that has been tested to reconstitute large proteins via AAV vectors relies on the use of the protein trans-splicing (PTS) mechanism.
PTS is a post-translational process catalyzed by intervening proteins called split-inteins. Split-inteins are expressed as two independent polypeptides (N-intein and C-intein) at the extremities of two host proteins (N-polypeptide and C-polypeptide) and remain inactive until encountering their complementary partner. 86 On association with the counterpart, the reconstituted intein precisely excises itself from the host proteins while mediating ligation of the N- and C-polypeptides via a peptide bond, in a traceless manner (Fig. 1C). This PTS process occurs without any known cofactor or source of energy. Proper folding of the split-inteins as well as the presence of a nucleophilic residue as first amino acid of the C-polypeptide, to assist catalysis, are the only requirements. 87
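The two requirements stated above, a complementary intein partner and a nucleophilic first residue in the C-polypeptide, can be captured in a minimal sketch on one-letter amino-acid strings. It is hypothetical: the intein-pair labels and example sequences are invented, and accepting Cys, Ser, or Thr as the nucleophile is an assumption of the sketch (these are the residues commonly found at this position).

```python
from typing import Optional

# Residues accepted as the first (nucleophilic) amino acid of the C-polypeptide.
# Assumption of this sketch: Cys, Ser or Thr.
NUCLEOPHILES = {"C", "S", "T"}


def protein_trans_splice(n_poly: str, n_intein_pair: str,
                         c_intein_pair: str, c_poly: str) -> Optional[str]:
    """Return the ligated protein, or None if splicing cannot occur.

    n_poly is fused to an N-intein and c_poly to a C-intein; each intein half
    is identified only by the (hypothetical) label of the pair it belongs to.
    The intein halves are excised, so they never appear in the product.
    """
    if n_intein_pair != c_intein_pair:
        return None  # non-complementary halves stay inactive
    if not c_poly or c_poly[0] not in NUCLEOPHILES:
        return None  # catalysis needs a nucleophile at the C-polypeptide start
    return n_poly + c_poly  # traceless peptide-bond ligation


if __name__ == "__main__":
    print(protein_trans_splice("MKLV", "pairA", "pairA", "CDEF"))  # MKLVCDEF
    print(protein_trans_splice("MKLV", "pairA", "pairB", "CDEF"))  # None
    print(protein_trans_splice("MKLV", "pairA", "pairA", "ADEF"))  # None
```

The model deliberately ignores folding, which the text names as the other requirement; in practice a correctly matched pair can still fail if either intein half is misfolded.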
The first natural split-intein was identified in 1998. 86 Since then, a plethora of natural and artificial split-inteins have been developed and exploited in different biotechnological applications, 88,89 including the reconstitution of large proteins. The first proof of concept that split-inteins could effectively reconstitute large proteins was obtained with the functional B-domain-deleted FVIII (BDD-FVIII): the BDD-FVIII light and heavy chains were efficiently ligated by split-inteins, resulting in increased circulating FVIII and plasma coagulation activity. 90,91
Given this promising result, split-inteins have been further adapted to AAV-based applications to deliver therapeutic genes that are mutated in muscle, liver, and retinal diseases. The first evidence that split-intein-based gene therapy approaches could be used to deliver large genes via AAV vectors was obtained with the mini-dystrophin gene. 92 On co-delivery in mdx mice of the two halves of the mini-dystrophin cDNA fused to N- and C-inteins, efficient production of the two polypeptides was shown. However, the functionality of the reconstituted protein was not investigated in detail. 92
The most extensive use of the AAV-split-inteins platform has been for the reconstitution of the large and widely used clustered regularly interspaced short palindromic repeats (CRISPR)–Cas9 genome editing nuclease, both in vitro and in vivo. 93 –95 Indeed, many of the developed Cas9 proteins have a CDS that exceeds the canonical AAV cargo capacity 96 ; thus, identification of split-intein-Cas9 sets that retain Cas9 endonuclease efficiency on reconstitution has been an exciting field of research over the past few years.
The possibility of splitting Cas9 proteins in halves has, in addition, opened the unprecedented opportunity to deliver via AAV vectors engineered Cas9-fusion proteins, 94 which are even larger than wild-type Cas9. Editing of a disease-causing mutation using CRISPR-associated base editors in a mouse model of phenylketonuria was found to restore physiological blood phenylalanine levels. 97 Similarly, targeted repression of a master regulator gene of photoreceptor determination, mediated by split-intein Cas9 fused to a repressor domain, prevented vision loss in a mouse model of retinitis pigmentosa. 98
We have recently demonstrated that AAV-intein-based platforms can be used to reconstitute large therapeutic proteins in the retina of animal models and in human photoreceptors from retinal organoids. Delivery of AAV-intein vectors encoding the ABCA4 protein in a mouse model of Stargardt disease resulted in significant amelioration of the phenotype. 99 Importantly, in a side-by-side comparison between the AAV-intein and dual AAV hybrid platforms, we found that the levels of AAV-intein-mediated large protein reconstitution largely exceeded those achieved via AAV genome recombination by dual AAV vectors. 99 In addition, in the large pig retina we found levels of AAV-intein-mediated protein reconstitution comparable to those achieved with a single AAV vector.
One of the drawbacks of the AAV-intein platform is that part of the cloning capacity is taken up by regulatory elements that must be duplicated in each vector, since each vector has to express its own polypeptide independently. Because of this design, the packaging capacity of the platform might still be insufficient for some large genes, and short, often weaker, regulatory elements may therefore be required.
Thanks to the availability of different inteins that do not cross-react, 91,99,100 we have shown that the AAV-intein system can be expanded to a triple vector-based system in which the three polypeptides are fused to two different sets of inteins, thus mediating directional full-length protein reconstitution. 99 Along these lines, triple AAV-intein-mediated delivery of the large CEP290 protein resulted in more efficient full-length protein expression than that obtained with two AAV-intein vectors, which lacked additional potent regulatory elements, and in therapeutic efficacy in the mouse model of Leber congenital amaurosis type 10. 99
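How two non-cross-reacting intein sets enforce the order of three polypeptides can be sketched as a small assembly routine. This is a hypothetical model on one-letter amino-acid strings: the fragment sequences and the pair labels `pairA` and `pairB` are invented, each label stands for one orthogonal N-/C-intein set, and requiring Cys, Ser, or Thr at the start of each C-polypeptide is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional

NUCLEOPHILES = {"C", "S", "T"}  # assumed accepted first residues of a C-polypeptide


@dataclass(frozen=True)
class Fragment:
    seq: str                     # polypeptide encoded by one AAV vector
    left: Optional[str] = None   # pair label of the C-intein at its N-terminus
    right: Optional[str] = None  # pair label of the N-intein at its C-terminus


def assemble(fragments: list[Fragment]) -> str:
    """Join fragments by matching orthogonal intein pairs, in any input order."""
    pool = list(fragments)
    starts = [f for f in pool if f.left is None]
    if len(starts) != 1:
        raise ValueError("expected exactly one N-terminal fragment")
    current = starts[0]
    pool.remove(current)
    protein, open_end = current.seq, current.right
    while open_end is not None:
        partners = [f for f in pool if f.left == open_end]
        if len(partners) != 1:
            raise ValueError(f"no unique partner for intein pair {open_end!r}")
        nxt = partners[0]
        if not nxt.seq or nxt.seq[0] not in NUCLEOPHILES:
            raise ValueError("C-polypeptide lacks a nucleophilic first residue")
        pool.remove(nxt)
        protein += nxt.seq  # inteins are excised; only host sequences remain
        open_end = nxt.right
    return protein


if __name__ == "__main__":
    middle = Fragment("CDEF", left="pairA", right="pairB")
    last = Fragment("SGHI", left="pairB")
    first = Fragment("MKLV", right="pairA")
    print(assemble([middle, last, first]))  # MKLVCDEFSGHI
```

Because `pairA` halves can only meet each other, and likewise for `pairB`, the middle fragment can join only after the first and before the last. If the same pair label were used at both junctions, the first fragment would have two candidate partners and the sketch raises an error, which is the situation orthogonal inteins avoid.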
Additional drawbacks of the AAV-intein platform need to be addressed in the near future to expand the use of this promising platform. First, construct design appears to be particularly critical compared with other platforms and will need careful evaluation for each large protein. Second, although the AAV-intein platform does not appear to be toxic, 99 the presence of both the inteins excised from the mature protein and the non-trans-spliced single polypeptides raises safety questions that will ultimately require formal long-term toxicity studies.
Conclusions and Future Directions
Expanding the cargo capacity of AAV vectors beyond 5 kb has been a longstanding goal of the gene therapy field. Both adaptations of large expression cassettes to fit into a single AAV vector and adaptations of the AAV platform to deliver large genes have been explored. Strategies that have shown more success have attracted more attention, and their efficiency has therefore been investigated in multiple tissues. The results are often conflicting and highlight that no single strategy fits every need.
Overall, dual AAV strategies have more reproducibly been found to reconstitute large genes, although each platform clearly requires optimization to achieve maximal success, as described earlier. Limitations related to the efficiency of gene reconstitution, which remains lower than that of a single vector, and to the safety of the platform, owing to the potential production of shorter protein products, have not been definitively addressed. Because of these limitations, dual AAV applications targeting diseases that affect small and confined tissues, such as the eye, may have higher chances of success, given the high rates of co-transduction achievable there 74 and the more limited safety concerns raised by the production of shorter protein products.
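The dependence on co-transduction can be illustrated with a simple probabilistic sketch. It assumes, hypothetically, that the genomes of each vector reach a given cell independently and follow a Poisson distribution whose mean is a per-vector multiplicity of infection (MOI); real tissues deviate from this, so the numbers are illustrative only.

```python
import math


def p_transduced(moi: float) -> float:
    """Poisson probability that a cell receives at least one genome of a vector."""
    return 1.0 - math.exp(-moi)


def p_cotransduced(moi_per_vector: float, n_vectors: int = 2) -> float:
    """Probability that a cell receives every one of n independently delivered vectors."""
    return p_transduced(moi_per_vector) ** n_vectors


if __name__ == "__main__":
    for moi in (0.5, 1.0, 5.0):
        single = p_transduced(moi)
        dual = p_cotransduced(moi, 2)
        triple = p_cotransduced(moi, 3)
        print(f"MOI {moi}: single {single:.2f}, dual {dual:.2f}, triple {triple:.2f}")
```

At low MOI the dual and triple probabilities fall well below the single-vector value, whereas at high MOI, which is easier to reach in a small, confined compartment, they converge toward it. Co-transduction is necessary but not sufficient: genome joining or protein splicing must still occur in each co-transduced cell.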
More recently, strategies that rely on intein-mediated PTS for large protein reconstitution are being increasingly explored. Interestingly, the efficiency of reconstitution appears to be higher compared with other platforms.
However, the design of AAV-intein constructs is particularly critical, as it must both preserve, at the junction points, the amino acid residues needed for efficient PTS and split the therapeutic protein outside of structural domains to avoid incorrect polypeptide folding. Thus, various splitting points might need to be screened to identify a suitable set of AAV-intein vectors for each therapeutic gene. Even after optimization, some half polypeptides might show limited stability or incorrect intracellular targeting, resulting in inefficient PTS. In addition, safety issues related to the production of additional protein products remain.
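The first screening step described above, shortlisting split points that satisfy sequence and size constraints, can be sketched as a simple filter. Everything here is hypothetical: the protein sequence, domain boundaries, intein lengths and per-vector CDS budget are invented inputs, the Cys/Ser/Thr rule stands in for the residues needed for efficient PTS, and stop codons are ignored. Stability, folding and intracellular targeting are not modelled and would still require experimental testing.

```python
# Hypothetical first-pass filter for AAV-intein split points.
NUCLEOPHILES = {"C", "S", "T"}  # assumed acceptable first residues of the C-half


def candidate_split_points(protein: str, domains: list[tuple[int, int]],
                           n_intein_aa: int, c_intein_aa: int,
                           max_cds_bp: int) -> list[int]:
    """Return indices i such that the C-half starts at protein[i].

    domains: inclusive 0-based (start, end) intervals that must not be cut.
    max_cds_bp: CDS budget per vector once the duplicated regulatory
    elements are accounted for (stop codons ignored for simplicity).
    """
    hits = []
    for i in range(1, len(protein)):
        if protein[i] not in NUCLEOPHILES:
            continue  # no nucleophile at the start of the C-polypeptide
        if any(start < i <= end for start, end in domains):
            continue  # cut would fall inside a structural domain
        n_half_bp = 3 * (i + n_intein_aa)
        c_half_bp = 3 * (len(protein) - i + c_intein_aa)
        if n_half_bp <= max_cds_bp and c_half_bp <= max_cds_bp:
            hits.append(i)
    return hits


if __name__ == "__main__":
    toy = "MAAASAAAACAAAT"  # made-up sequence
    print(candidate_split_points(toy, [(0, 4)], n_intein_aa=2,
                                 c_intein_aa=1, max_cds_bp=40))  # [9]
```

In the toy run, the Ser at index 4 is rejected because the cut would fall inside the protected domain, and the Thr at index 13 because the N-half would exceed the per-vector budget, leaving only the Cys at index 9.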
In conclusion, although still at the pre-clinical stage, research on large gene delivery using AAV vectors is showing encouraging signs of progress. The choice of the best strategy for AAV-mediated large gene delivery will most probably depend on the protein to be reconstituted, the target cells, and the nature of the disease. Emerging strategies, such as genome editing 101,102 or targeted exon skipping using antisense oligonucleotides, 103 –105 could further enrich the portfolio of available strategies to correct diseases caused by mutations in large genes. With these advances, more and more diseases are likely to become amenable to AAV-mediated gene delivery.
Footnotes
Author Disclosure
No competing financial interests exist.
Funding Information
This work was supported by the University of Naples Federico II under STAR Program (to I.T.).
