Abstract
Recombinant proteins have long been used in the pharmaceutical, chemical, and agricultural industries. These proteins can be produced in hosts such as mammalian cells, bacteria, insect cells, yeast, and plants. However, the demand for recombinant proteins, especially for the prevention, diagnosis, and treatment of diseases, is increasing. Meeting this growing demand on a large scale remains a challenge for many industries. Developing new tools to increase the yield and quality of these proteins is therefore a necessity. Many strategies to optimize protein production in various expression systems have emerged in recent years. This review summarizes the different characteristics of expression systems, as well as the current strategies used to improve the yield of recombinant proteins.
Introduction
Recombinant proteins, which are modified forms of native proteins that can be biosynthesized in large quantities, are valuable to the chemical, agricultural, and biopharmaceutical industries [1]. They have been produced in several expression systems using mammalian cells, insect cells, yeast, bacteria, plants, and animals as hosts. However, several parameters need to be taken into account for large-scale production of these proteins. These factors include the recombinant protein’s mass, purity, and solubility; the number of disulfide linkages; the required post-translational modifications (PTMs); the location where the expressed protein will be used; the cost and ease of use of the protein expression system; and, finally, regulatory issues in the case of pharmaceutical production [2]. Heterologous pharmaceutical proteins have been produced in expression hosts such as Escherichia coli, Streptomyces [3, 4], Pichia pastoris, Saccharomyces cerevisiae [5, 6], and HEK293T cells [7]. Prokaryotic expression systems are very appealing for the expression of recombinant DNA (rDNA). Because of its rapid growth, continuous fermentation capacity, low cost, well-understood metabolism and genetics, and high production yields, Escherichia coli is by far the most widely used host cell for protein expression [3, 9]. However, because E. coli lacks PTMs, the yeast Pichia pastoris is an excellent choice for producing proteins ranging from simple to complex [11]. Further studies are still needed to provide strategies for optimizing the production efficiency of these proteins, given the ever-increasing demand [11, 12]. Recently developed bioengineering and in silico approaches have provided clear advantages in terms of yield, cost, and production time, as well as final product quality.
The goal of this review is to describe the various expression systems used to biosynthesize recombinant proteins, as well as their benefits and drawbacks, with a focus on current strategies used for high protein production.
Literature review
Strategies for overexpression of protein in prokaryotic expression hosts
Escherichia coli
For over forty years, the E. coli expression system has been used to produce large quantities of recombinant proteins owing to its simplicity, speed, ease of genetic modification, and overall low cost [8, 13]. These features have made this Gram-negative bacterium a well-established platform for gene expression and commercial production. E. coli has contributed to the development of a variety of enzymes and proteins used industrially, in clinical applications, and in biological analyses [14–17]. However, E. coli has several drawbacks, including expression of proteins as inclusion bodies (IBs), intracellular accumulation of proteins, endotoxin production, codon bias, protein degradation by proteases, and a lack of PTMs [2, 13]. Many methods have been developed in recent years to address these issues. Supplementation with cofactors, co-expression of the protein of interest with molecular chaperones, and the addition of fusion tags to the gene sequence can all help decrease IB formation [9, 13]. As recently demonstrated, several tags such as β-fructofuranosidase (β-FFase), 3′-phosphoadenosine-5′-phosphatase (CysQ), and carbohydrate binding module 66 (CBM66) can aid in the production of correctly folded, soluble protein [18–20]. Furthermore, in silico approaches have been established for predicting the tendency to form IBs and for identifying proteins with structural characteristics that allow expression in soluble form [21, 22]. Moreover, optimizing heterologous gene transcription is essential to avoid IB formation, which can result from inefficient translocation or transport of the produced proteins [20]. To address endotoxin production, the endotoxin-free recombinant strain ClearColi® BL21(DE3) can be used to synthesize non-toxic protein [23]. Light induction of protein synthesis in E. coli combined with cell autolysis can also help release protein into the culture medium [24], and codon bias can be addressed through host modification and codon optimization of the gene of interest [25, 26]. E. coli is well known for its limited ability to process proteins properly. Molecular chaperones can help with issues such as incorrect disulphide bond formation, proteolytic degradation [27], and the intracellular accumulation of protein [14, 15]. In addition, the N-glycosylation system of Campylobacter jejuni has been isolated and transferred to E. coli, enabling simple protein glycosylation [28]. Another study used the genetic code expansion technique to introduce individual PTMs such as phosphorylation and acetylation into a single protein simultaneously [29]. Some predictive models, such as the mutation predictor for increased protein expression (MPEPE), can also enhance protein expression in this system [30].
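Codon optimization of the gene of interest, cited above as a remedy for codon bias, can be illustrated with a minimal Python sketch. It back-translates a protein sequence by choosing, for each residue, the codon with the highest usage frequency in the host. The function name optimize_codons and the table CODON_USAGE are hypothetical, and the frequencies are illustrative placeholders rather than measured E. coli values; the sketch covers only a few amino acids.

```python
# Hypothetical, illustrative codon-usage table (relative frequency per residue).
# The numbers are placeholders, NOT measured host values.
CODON_USAGE = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.7, "AAG": 0.3},
    "L": {"CTG": 0.5, "TTA": 0.15, "TTG": 0.15, "CTT": 0.1, "CTC": 0.1},
    "R": {"CGT": 0.45, "CGC": 0.35, "AGA": 0.1, "AGG": 0.1},
    "*": {"TAA": 0.6, "TGA": 0.3, "TAG": 0.1},
}


def optimize_codons(protein, usage=CODON_USAGE):
    """Back-translate a protein, choosing the most frequent codon per residue."""
    codons = []
    for aa in protein.upper():
        if aa not in usage:
            raise ValueError(f"no codon usage data for residue {aa!r}")
        table = usage[aa]
        # Pick the codon with the highest relative frequency for this residue.
        codons.append(max(table, key=table.get))
    return "".join(codons)


if __name__ == "__main__":
    print(optimize_codons("MKLR*"))
```

Always choosing the most frequent codon is the simplest possible strategy; it is shown here only to make the idea concrete, and practical optimization usually balances codon usage against other sequence features.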
Bacillus subtilis
Bacillus subtilis is a Gram-positive bacterium that is widely recognized for its exceptional ability to secrete proteins and has been presented as a better alternative to the E. coli expression system [31]. Unlike Gram-negative bacteria, B. subtilis lacks an outer membrane, which allows proteins to be secreted directly into the culture medium while avoiding the production of endotoxins. Furthermore, it shows no significant bias in codon usage, and it can facilitate large-scale production of properly folded protein [11, 32]. Another benefit of this bacterium is its lack of toxins, which has led the US Food and Drug Administration (FDA) to declare the proteins produced by this cell factory to be generally recognized as safe (GRAS). B. subtilis is also appealing as a host cell because the European Food Safety Authority (EFSA) has granted it Qualified Presumption of Safety (QPS) status [32, 33]. However, as with other systems, protein production with this bacterium is challenging owing to factors such as plasmid instability, the production of extracellular proteases, and the presence of multiple secretion bottlenecks in the general (Sec) secretion pathway, including translocation, protein folding, and membrane targeting [11, 32]. Optimizing media conditions and combining a signal peptide with a strong promoter can improve protein expression in this system [34]. Furthermore, using a structurally stable replicating plasmid, molecular chaperones, and signal peptides can improve the translocation, folding, stability, and yield of correctly folded proteins [35, 36]. Driving gene expression with multiple tandem promoters can also help to overexpress proteins in B. subtilis [37]. Moreover, the use of strains with numerous deletions in extracellular protease genes significantly reduced protein degradation in this system [38].
Streptomyces
Streptomyces are Gram-positive bacteria that live in aerobic soils and can degrade complex organic matter. Many strains have been used to produce recombinant proteins, but Streptomyces lividans remains the most appealing [39, 40]. Like Bacillus, Streptomyces can secrete correctly folded protein into the extracellular medium, making it easy to purify [40, 41]. Streptomyces are distinguished by a high proportion of guanine and cytosine in their DNA (70–73% GC) and a very large genome (11.9 Mbp) [42] with gene clusters that biosynthesize diverse secondary metabolites [43]. Furthermore, S. lividans has low proteolytic and restriction enzyme activities, which facilitates the insertion of recombinant DNA [40, 44]. Nevertheless, this expression system is not always efficient for certain proteins, and optimization is needed to improve their production and activity. A robust and stable antibiotic marker-free Streptomyces expression system can enhance the production of secreted proteins [45]. The Sec-dependent secretion pathway is currently used to secrete protein in Streptomyces, but the Tat-dependent pathway can also be used [46]. Unexpectedly, it has been demonstrated that Sec-dependent translocation in a tatB deletion mutant can increase protein secretion in this system [47]. Additional research is needed to fully understand these findings. Analysis of transcriptomic and fluxomic changes, on the other hand, can help to develop a more efficient S. lividans strain that secretes protein readily [48]. Optimization of operational conditions, screening of signal peptides, vectors, and alternative promoters [39, 50], and codon optimization [51] have also been shown to improve protein production in S. lividans.
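The high GC content noted above (70–73%) is one reason codon optimization matters when moving genes into or out of Streptomyces. As a minimal sketch, the GC fraction of a candidate gene can be computed before cloning. The function name gc_content and the example sequence are hypothetical and are not taken from the cited studies.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


if __name__ == "__main__":
    # Hypothetical 10-nt fragment: 8 of 10 bases are G or C.
    fragment = "GCGCGCGCAT"
    print(f"GC content: {gc_content(fragment):.0%}")
```

A gene whose GC fraction falls far below the host range is a natural candidate for the codon optimization strategies cited in this section.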
Strategies for overexpression of protein in Eukaryotic expression hosts
Yeast
Because of its rapid growth, ease of genetic engineering, low-cost growth medium, and ability to integrate new genes, yeast has emerged as a highly suitable industrial host for protein expression [11, 52–54]. The methylotrophic Komagataella phaffii (K. phaffii) and the non-methylotrophic Saccharomyces cerevisiae (S. cerevisiae) are the most promising yeast expression hosts for large-scale protein production [52, 54].
K. phaffii, also known as Pichia pastoris, is a yeast that is commonly used in the pharmaceutical and biotechnological industries. Many industrially important proteins have been produced using K. phaffii [54–57]. The most significant benefit of this host is its ability to process proteins through folding, signal peptide cleavage, and intracellular PTMs, with a lower degree of glycosylation [53, 54]. In addition, the methanol-inducible promoters of the alcohol oxidase genes (AOX1 and AOX2) and the GAP promoter (PGAP), which drives strong constitutive expression in the presence of glucose and glycerol, contribute significantly to K. phaffii’s reputation as a production host [58]. However, this system has some drawbacks, such as a high concentration of proteases, which can degrade the product, and the risks associated with the use of methanol, which is toxic [54, 59]. To overcome protein degradation, several strategies have been used, including the addition of amino acids, yeast peptone, or protease inhibitors, lowering the pH, and reducing the induction time and temperature during fermentation [53]. Furthermore, because controlling the concentration of methanol in the culture medium is critical for high protein yield, using glycerol or ascorbic acid/sorbitol as a co-substrate in a defined ratio with methanol could increase PAOX1 induction levels compared with cultures fed methanol only [55, 60]. Alternatively, the PGCW14 and PPDF promoters can be used for methanol-free expression in P. pastoris fermentations as alternatives to the PGAP promoter [61, 62]. Recently, a novel dual-plasmid P. pastoris system was developed by combining integrated PGAP and episomal PGCW14; it increases protein yield without methanol induction. Indeed, it increased the yield of XynA protein 16.7-fold and 2.86-fold when compared to constitutive and methanol-induced expression, respectively. Because it avoids methanol toxicity, this dual-plasmid system is very promising for use in the pharmaceutical and food industries [63]. Moreover, co-expression of transcription factors, molecular chaperones, and fusion tags can increase protein yield and reduce the intracellular aggregation of proteins in the K. phaffii expression system [5, 65].
S. cerevisiae differs from other yeast systems in that the proteins it produces are GRAS, making it an ideal host for protein production. It is non-infectious and has been used for many decades in the nutritional and biopharmaceutical industries [66]. Marketed products derived from the S. cerevisiae expression system include insulin, glucagon, macrophage, hepatitis B surface antigen, and hirudin [67, 68]. Nevertheless, some drawbacks have limited the efficiency of this system, particularly protein hyperglycosylation [54]. Indeed, the hypermannosylation capacity of S. cerevisiae can reduce the activity of heterologous glycoproteins. The GlycoSwitch® system is a good option for solving hypermannosylation because it inactivates the yeast hypermannosylation gene (OCH1) and introduces glycosyltransferase and glycosidase genes for the biosynthesis of the glycosylated protein [69]. Furthermore, disrupting the activity of the key Golgi mannosyltransferases Och1p and Mnn9p in S. cerevisiae increased protein secretion by activating the secretory pathway and affecting cell wall integrity [70]. The unfolded protein response signaling pathway is an endoplasmic reticulum (ER) stress response pathway that acts as a quality controller during the synthesis of secreted or membrane-localized proteins. Overexpression of Hac1p, a key regulator of this pathway, increased xylanase production in the S. cerevisiae expression host [6]. Furthermore, moderate overexpression of SEC16, which is involved in anterograde protein trafficking from the ER to the Golgi apparatus in S. cerevisiae, could improve protein secretion [71]. In addition, fungal laccase production in S. cerevisiae has been enhanced by using the fittest mutated α-factor preproleader (MFα1), introducing conserved amino acid residues into the enzyme to improve protein folding and stability, applying random mutagenesis to the sequence, and engineering new N-glycosylation sites in the enzyme [72]. It has also been shown that a secretion-enhancing peptide cassette derived from hIL-1 can enhance production of the human granulocyte colony-stimulating factor (hG-CSF) in this system [73]. Moreover, CRISPR-associated protein 9 (Cas9) has been used to create yeast platform strains by integrating genes at specific sites or deleting unwanted genes to improve protein production [52, 75]. Metabolic pathway bottlenecks continue to be a major problem in industrial bioproduction using yeast cell factories. As a solution, the HapAmp approach has proven to be an efficient way to eliminate metabolic bottlenecks by using haploid sufficiency as an evolutionary force and promoting gene amplification in vivo [76].
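Engineering new N-glycosylation sites, as mentioned above for laccase, usually starts from knowing where candidate sites already exist. The following minimal Python sketch scans a protein sequence for the commonly described eukaryotic N-glycosylation sequon, Asn-X-Ser/Thr with X being any residue except proline. The sequon definition is general background rather than a detail taken from the cited studies, and the function name find_sequons and the toy sequence are hypothetical.

```python
import re

# N-X-S/T sequon, where X is any residue except proline.
# The lookahead leaves X and S/T unconsumed so overlapping sequons are reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    toy = "MNGTANPSNNSS"  # hypothetical sequence for illustration only
    print(find_sequons(toy))  # -> [2, 9, 10]
```

A matching sequon is only a candidate site: whether it is actually glycosylated, and with which glycan, depends on the host and on the protein context.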
Mammalian cells
Mammalian cell-based expression systems are the best option for producing recombinant proteins used in disease diagnosis and treatment [77, 78]. They ensure correct protein folding, PTMs, and product assembly, all of which are critical for the final protein’s quality (see Table 1). Furthermore, these cells can recognize the signals used to synthesize, process, and secrete eukaryotic proteins with ease and efficiency [79].
Proteins of industrial interest have been produced using a variety of mammalian cell lines [80–83]. The Chinese hamster ovary (CHO) cell line is the most commonly used, followed by human embryonic kidney (HEK), mouse myeloma (NS0), mouse myeloma (SP2/0), and baby hamster kidney (BHK-21) cells [84, 85]. A recent study found that CHO cell-based systems account for 84% of approved monoclonal antibodies (mAbs) [86]. However, industrial protein production using mammalian systems is currently hampered by high costs, complicated technology, and the risk of contamination with animal viruses [11]. Because the CHO cell line can be grown at high cell densities in serum-free, chemically defined media and has a low risk of infection with animal viruses, it can be used to generate safe proteins [87]. This is the case for the recombinant fusion protein of the human tumour necrosis factor receptor and the IgG1 Fc domain, which has been successfully produced in genetically modified CHO-DG44 cells and is being presented as a new therapeutic strategy for cytokine-mediated diseases [88]. Furthermore, mitochondrial membrane potential-enriched CHO hosts have been presented as an innovative and powerful tool for improving bioproduction capacity, outperforming the parental host in terms of long-term cell viability, fed-batch productivity, lactate metabolism, and cell cloning efficiency during the generation of monoclonal cell lines [89].
In another study, the use of appropriate protein secretion signal sequences, fusion tags, and expression vectors proved beneficial in increasing the yield of proteins purified from mammalian cell lines [90]. Promoters are required for transcriptional signal integration and processing. As a result, using the correct promoter, optimizing the combination of promoter and regulatory elements, and avoiding promoter methylation can all improve the protein’s stability and expression [91]. This is the case with the Hspa5 promoter, which has been identified as a powerful promoter for increasing monoclonal antibody production per cell at a late stage of culture [92]. Moreover, optimization of the culture process and codon optimization can be used as strategies to improve protein yields in mammalian systems [93].
CRISPR and other gene editing tools have been used successfully to improve mammalian expression system product quality [94, 95]. The function of the ARC gene was investigated in vitro in a recent study by creating an ARC-knockout (KO) HEK293 cell line via CRISPR/Cas9-mediated gene editing [96]. Some CHO cell lines have been engineered to resist apoptosis in order to increase protein yield. Many variables, however, disrupt the kinetics that govern cell fate in bioreactors. Using knockouts of three BCL2 family effector proteins, Bak1, Bax, and Bok, CRISPR/Cas9 was effective in modelling apoptosis resistance in CHO cells [97]. Another study found that a newly discovered CRISPR/Cas12a-based synthetic transcription factor induced multiplexed activation in mammalian cells and upregulated gene expression [98]. Furthermore, CRISPR/Cas13a is a versatile platform that can be modified for RNA silencing and binding in mammalian cells [99].
Transgenic plants
Recombinant proteins, including vaccines, antibodies, antigens, growth factors, cosmetic ingredients, enzymes, and research or diagnostic reagents, have been produced using transgenic plants. The concept of plant molecular farming, which describes the production of proteins of industrial interest using plants as host platforms, has been discussed for over a decade [100–103]. Plant-based expression systems have several advantages, including low cost, well-understood production processes, high scalability, the ability to produce complex proteins with eukaryotic PTMs, and a low risk of infection by animal pathogens [12, 105]. Recently, edible vaccines have been developed as a new vaccine concept that can replace traditional vaccines and that combines medical science and plant biology. Whereas conventional vaccines are often made up of pathogens with reduced virulence, edible vaccines are produced in host plants such as maize, rice, and potatoes [106–108]. They have several advantages, including the ability to elicit mucosal immunity, immunization without the need for sterile injections, storage as seed for years, no need for adjuvants to elicit an immune response, and no risk of reversion to virulence [106, 109]. Nevertheless, current challenges of this system include a lack of human-type glycosylation and of regulatory approval [12, 100]. Using stronger vectors and promoters, engineering glycosylation and sialylation, codon optimization, fusion tags, and signal sequences can therefore enhance the production of correctly folded proteins with human-type glycosylation in this system [110].
Transgenic animals
Transgenic animal technologies use genetic engineering to insert exogenous genes into the animal genome, allowing these genes to be inherited and expressed by offspring. Their effectiveness depends on two factors: the transport of the DNA across the recipient cell’s plasma membrane and its subsequent transport across the nuclear membrane to the chromosomes. Transgenic efficiency and precise control of gene expression are two factors that can limit the success of animal transgenesis [111]. Seminal plasma, mammary gland, egg white, urine, blood, and silkworm cocoon are all examples of production sites in transgenic animal systems. Because of its high level of expression and ability to introduce PTMs, the mammary gland is considered an excellent system for recombinant protein biosynthesis. Moreover, the rabbit is considered the most suitable for protein production among the mammalian species currently being studied as bioreactors (pig, goat, sheep, and cow) [112]. Some authors, however, have highlighted promising methodologies that favor the use of transgenic cattle as bioreactors for the production of protein in milk for industrial use [113]. Furthermore, significant advances in genome editing over the last decade, such as the CRISPR/Cas9 system, have greatly simplified animal transgenesis, allowing for the development of new approaches to animal production, including in economically important species such as those used for milk production [114]. The advantages and disadvantages of the various expression systems used for recombinant protein production are given in Table 1.
Insect cells
Insect cell-based expression systems have been used to produce recombinant proteins important in vaccination and in the diagnosis and treatment of many diseases [17, 116]. The baculovirus expression vector system (BEVS) is a tool for protein production in insect cells that was developed through baculovirus genetic engineering. Co-transfection of a transfer plasmid with linearized, non-infectious Autographa californica multiple nucleopolyhedrovirus (AcMNPV) DNA results in the production of recombinant baculovirus [115].
Compared to other systems, BEVS has many advantages, such as providing correctly folded proteins with disulphide bridges and introducing PTMs, which are very important for protein function. Furthermore, the baculovirus genome can accommodate DNA inserts of up to 34 kbp, and the use of a strong viral promoter increases the system’s protein yield [116]. However, protein N-glycosylation in insect cells differs from that in mammalian cells, which can result in a significant reduction of protein biological activity [117]. The development of humanized insect cells and the modification of the baculoviral genome to form the SweetBac system allow the expression of mammalian glycosylating enzymes and can overcome this problem [118, 119].
Baculoviruses are typically propagated in insect cell lines such as Sf9. Recombinant proteins have also been successfully expressed in S2, Sf21, and Tn-368 cells, as they grow quickly and tolerate stress [94, 120]. Recently, a stable master clone High Five™ cell line was created for the first time by combining recombinase-mediated cassette exchange (RMCE) using flippase (Flp) with fluorescence-activated cell sorting (FACS). Unlike other cells, these cells can express single or multiple proteins of interest from tagged genomic loci, which aids in the development of biological therapies [121]. Furthermore, MultiBac, a recently developed advanced baculovirus-insect cell system, is efficient for heterologous multigene transfer and multiprotein complex production [122]. In addition, commercially available systems such as BaculoDirect use Gateway technology, which relies on the site-specific recombination system of bacteriophage lambda and is simple to use [123]. An improved MultiBac system was used in another study to express high levels of VP2 protein and efficiently purify canine parvovirus particles, which stimulated the production of high levels of hemagglutination inhibitory antibodies in mice [124]. Moreover, an extended process has been established for the continuous production, at high cell density, of Gag virus-like particles (VLPs) pseudotyped with influenza haemagglutinin (HA), using stable insect cells adapted to low-temperature culture. It is regarded as an appealing alternative platform to the insect cell-baculovirus expression vector system (IC-BEVS) [125]. Meanwhile, a novel baculovirus surface display system was developed to efficiently present the S proteins of SARS-CoV and SARS-CoV-2 on the surface of recombinant baculoviruses. These baculoviruses can transduce HEK 293T cells overexpressing ACE2 receptors with higher transduction efficiency [126]. It has also been shown that RNAi technology, enhancer optimization, co-expression of chaperone proteins (Hsp40, PDI, and Hsc70), and promoter selection can increase the production of folded proteins in this system [123].
CRISPR/Cas9 technology allows for specific genetic modifications to further improve and expand the BEVS platform’s suitability for recombinant protein production [123]. When compared to the original Drosophila melanogaster CRISPR/Cas9 plasmid, a Culex-optimized CRISPR/Cas9 plasmid efficiently edited the genomic loci of the RNAi proteins Dicer-2 and PIWI4 in the Hsu cell line [127]. This technology has also been used to create transgenic Cas9 cell lines for future research, including knock-in and multiplexed editing [128]. Moreover, CRISPR/Cas9 has been used to create single baculovirus vectors as well as single and multiplexed prime editing toolkits capable of performing cleavage-free DNA search-and-replace interventions with no detectable indels [129].
Conclusion
Recombinant proteins are very useful in the clinical, food, and biopharmaceutical industries. E. coli is the simplest and least expensive system for protein production, but Gram-positive bacteria such as Bacillus and Streptomyces are also suitable alternative hosts because they secrete proteins directly into the growth medium, which reduces the number of purification steps. Because proteins produced by yeast, especially S. cerevisiae, are GRAS and these hosts can introduce PTMs, yeasts are among the most widely used hosts for producing complex and safe proteins. Moreover, current limitations of these expression systems can be addressed through bioengineering approaches, including HapAmp, the GlycoSwitch system, the use of strong promoters and plasmids, and co-expression of transcription factors, molecular chaperones, and fusion tags. Technologies such as the CRISPR editing tool and MPEPE have been shown to be important in disrupting unwanted genes, predicting protein misfolding, and engineering apoptosis-resistant cells to optimize protein biological activity. However, these strategies are not yet sufficient to produce recombinant proteins in the quantities needed to meet the market’s ever-increasing demand. Therefore, policymakers should allocate more funds to research institutes working on recombinant protein production, and expert researchers in the field from around the world should collaborate to achieve meaningful results.
Conflict of interest
The authors have no conflict of interest to report.
