Abstract
Nucleocytoplasmic large DNA viruses (NCLDVs) are a group of large viruses that infect a wide range of hosts, from animals to protists. These viruses are grouped together as NCLDVs based on genomic sequence analyses. They share a set of essential genes for virion morphogenesis and replication. Most NCLDVs have large physical sizes, but their morphologies vary among families and include icosahedral, brick, and oval shapes, raising the question of what factors regulate their morphogenesis. The capsids of icosahedral NCLDVs are assembled from small building blocks, named capsomers, which are the trimeric form of the major capsid proteins. Notably, the capsids of immature poxviruses are spherical even though they are assembled from capsomers that share high structural conservation with those of icosahedral NCLDVs. The recently published high-resolution structures of two NCLDVs, Paramecium bursaria Chlorella virus 1 and African swine fever virus, revealed an extensive network of minor capsid proteins located underneath the capsomers. Among these minor proteins is the elongated tape measure protein (TmP), which spans from one icosahedral fivefold vertex to another. In this study, we focus on the critical roles that TmP plays in the assembly of icosahedral NCLDV capsids, answering a question raised by a previously proposed spiral mechanism. Interestingly, a basic local alignment search with the TmPs showed no significant hits in poxviruses, and the absence of a TmP might be the factor that differentiates poxviruses from icosahedral NCLDVs in their morphogenesis.
The Shared Structural Features of Icosahedral Nucleocytoplasmic Large dsDNA Viruses
Nucleocytoplasmic large dsDNA virus (NCLDV) (24) is a loosely defined cluster of large DNA viruses that currently contains ten different families: Ascoviridae (5), Asfarviridae (3), Iridoviridae (14), Marseilleviridae (8), Mimiviridae (69), Pandoraviridae (45), Phycodnaviridae (57), Pithoviridae (31), Mininucleoviridae (52), and Poxviridae (23). These families were clustered because they have large genomes, ranging in size from about 100 kb to 2.5 Mb, and share similar genomic features, such as genes from all three kingdoms. Early phylogenetic analyses suggested that they might have a common cellular ancestor (16–18,24). In the last two decades, different theories regarding the evolution and classification of these viruses have been proposed as more NCLDVs were discovered, leading to many debates (19,21,27,36,38–40,44,46,47,54,56,70–72). Recent studies (2,9,28) suggested that NCLDVs more likely evolved from smaller viruses through gene acquisition and duplication over multiple gene gain and loss events, rather than from a cellular ancestor via reductive evolution.
While being interesting targets for understanding the evolutionary relationships of viruses, NCLDVs also pose significant challenges for structural studies (64) because of their gigantic sizes, some of which are comparable to those of bacteria (30,45). Most NCLDVs, except those from the families Poxviridae, Pandoraviridae, Ascoviridae, and Pithoviridae, are roughly icosahedral in shape. Early cryogenic electron microscopy (cryo-EM) reconstructions of several NCLDVs, such as Acanthamoeba polyphaga mimivirus (APMV) (60,63), Cafeteria roenbergensis virus (CroV) (61), Paramecium bursaria Chlorella virus 1 (PBCV-1) (12,66,74), Faustovirus (26), Chilo iridescent virus (CIV) (66,67), and Phaeocystis pouchetii virus 01 (PpV01) (65), have shown that these viruses mostly possess an external capsid, an inner membrane, and a nucleocapsid, with the exception of Faustovirus, which contains an interprotein capsid instead of an inner membrane (26). The external capsids of NCLDVs from certain families possess some unique features. For example, the outer surfaces of Mimivirus and Samba virus (10) are decorated with long fibers that may facilitate host recognition and attachment (18,25,63,73). Short fibers were also seen on the surfaces of PBCV-1 (74), CIV (66,67), and PpV01 (65). In addition to the fibers, Tupanvirus (1), a virus closely related to Mimivirus, also has a unique large cylindrical tail attached to the capsid. Furthermore, unique portals were observed in many NCLDVs, such as the stargate portal in APMV (25,63,73) and the spike portal of PBCV-1 (12,74).
Apart from these unique structural features, the main parts of these icosahedral NCLDV capsids are all assembled from similar building blocks, called capsomers (64). Cryo-EM reconstructions of icosahedral NCLDVs reveal that their capsomers are arranged into triangular and pentagonal patches separated by discontinuous lines on the surface (61,65,66). These discontinuous lines appear because the capsomers on one side of a line share the same orientation, which differs by 60° from that of the capsomers on the other side. These patches were named trisymmetrons and pentasymmetrons, respectively (Fig. 1A). The terms trisymmetron and pentasymmetron were first used by Wrigley in 1969 to describe the triangular and pentagonal patches observed in the decomposed samples of the Sericesthis iridescent virus (58). In addition to the trisymmetron and pentasymmetron, Wrigley (58) proposed the disymmetron, which has never been observed in cryo-EM reconstructions. Wrigley (58) also formulated the possible sizes of di-, tri-, and pentasymmetrons for constructing the icosahedral capsid. All the possible sizes of tri- and pentasymmetrons were further explored mathematically (49). These studies led to the assumption that the capsomers are preassembled into individual symmetrons, which are then assembled into the large icosahedral capsid.
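The symmetron bookkeeping can be made concrete with a short counting sketch. It rests on facts that are not taken from this review and should be read as assumptions: the Caspar-Klug relation T = h² + hk + k², a total of 10T + 2 capsomers of which 12 are pentons, and literature (h, k) values for four viruses. The pentasymmetron size of 30 capsomers is the value given in this review. Function names are illustrative only.

```python
import math

# Hexavalent capsomers per pentasymmetron (penton excluded), as stated in the review.
PENTASYMMETRON_CAPSOMERS = 30


def t_number(h, k):
    """Caspar-Klug triangulation number for lattice indices (h, k)."""
    return h * h + h * k + k * k


def trisymmetron_size(t):
    """Capsomers per trisymmetron for a capsid with 12 pentasymmetrons and
    20 trisymmetrons, or None if the count does not divide evenly."""
    hexavalent = 10 * (t - 1)  # 10T + 2 capsomers in total, minus 12 pentons
    remaining = hexavalent - 12 * PENTASYMMETRON_CAPSOMERS
    if remaining < 0 or remaining % 20:
        return None
    return remaining // 20


def triangle_edge(n):
    """Edge length e with e(e+1)/2 == n (a triangular patch), else None."""
    e = (math.isqrt(8 * n + 1) - 1) // 2
    return e if e * (e + 1) // 2 == n else None


# Assumed literature values of (h, k); they are not given in this review.
EXAMPLES = {"PBCV-1": (7, 8), "CIV": (7, 7), "PpV01": (7, 10), "CroV": (7, 18)}

for name, (h, k) in EXAMPLES.items():
    t = t_number(h, k)
    size = trisymmetron_size(t)
    print(name, "T =", t, "trisymmetron =", size, "edge =", triangle_edge(size))
```

Under these assumptions, PBCV-1 (T = 169) comes out with 66 capsomers per trisymmetron, a triangle with 11 capsomers along each edge. Only the trisymmetron size changes from virus to virus while the pentasymmetron stays fixed.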

Fig. 1. The icosahedral PBCV-1 capsid and NCLDV structural proteins.
The Structures of the Major and Minor Capsid Proteins of Icosahedral NCLDVs
The sizes of these NCLDVs approach the scale of small bacteria, presenting significant challenges for structural studies (64). Therefore, for decades, the structures of NCLDVs were studied through a combination of techniques, including cryo-EM reconstruction and X-ray crystallography. Structural determinations of the capsomers from PBCV-1 (43) and Faustovirus (26) by X-ray crystallography show that the capsomers are trimers of the major capsid proteins (MCPs), each of which consists of two tandemly linked jelly roll folds (Fig. 1B–D). A jelly roll fold is a wedge-shaped β-barrel structure (48), built from ∼8 antiparallel β-strands, and is commonly seen in viruses infecting hosts throughout all three kingdoms of life (7). This trimeric double jelly roll capsomer was later found in many icosahedral NCLDVs. There are six jelly roll folds in each capsomer, contributing to its pseudohexagonal shape (Fig. 1C). In each double jelly roll, when viewed from the side, one jelly roll (J2 in Fig. 1B) is higher (toward the exterior of the virus) than the other (J1 in Fig. 1B). In addition, the surface loops of the two jelly rolls have different lengths. These two structural features contribute to the trimeric appearance of the capsomers (Figs. 1A and 2D), leading to two distinct orientations when they are packed together (43,61,64). These two orientations differ by 60° (Fig. 2D), which is consistent with the observations from the cryo-EM reconstructions mentioned above.
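Why exactly two orientations arise can be illustrated with a small sketch. It assumes an idealized capsomer with perfect threefold symmetry sitting on a hexagonal lattice, so that only in-plane rotations in multiples of 60° occur. This is a simplification made for illustration, not a model taken from the cited structures, and the function name is hypothetical.

```python
def orientation_class(rotation_deg):
    """Map an in-plane rotation (a multiple of 60 degrees) to an orientation class.

    An idealized trimer has threefold symmetry, so rotations that differ by
    120 degrees are indistinguishable. Only two classes remain: 0 and 1.
    """
    if rotation_deg % 60 != 0:
        raise ValueError("expected a multiple of 60 degrees")
    return (rotation_deg % 120) // 60


# All six lattice-compatible rotations collapse into two distinguishable classes.
classes = {orientation_class(r) for r in range(0, 360, 60)}
print(sorted(classes))
```

A rotation of 60° or 180° gives the second class, whereas a rotation of 120° or 240° is indistinguishable from no rotation. This matches the 60° difference between the two capsomer orientations described above.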

Fig. 2. The icosahedral framework formed by PBCV-1 TmPs and their interactions with capsomers at one pentasymmetron.
Owing to recent innovations in cryo-EM techniques, significant breakthroughs in NCLDV structural studies occurred in 2019, when the structures of two NCLDVs, PBCV-1 (20) and African swine fever virus (ASFV) (35,55), were published at 3.5 Å and 4.5 Å resolution, respectively. Besides providing high-resolution details of the capsomers, these structures uncovered 14 and 4 different types of minor capsid proteins (mCPs) in PBCV-1 (20) and ASFV (35,55), respectively. One mCP is the penton protein, which sits at each fivefold vertex of the icosahedral capsid. This penton protein has five single jelly roll folds. All the other mCPs identified in these two structures form hexagonal networks located underneath the capsomers. These mCPs were proposed to stabilize the capsomer assembly. Most mCPs have parts associated with the membrane, suggesting that they play roles in shaping the membrane into capsid. While all these mCPs play significant roles in viral capsid assembly, our main interest lies in the elongated tape measure proteins (TmPs), P2 (gene A342L) in PBCV-1 and M1249 (gene L454) in ASFV. These TmPs build the framework of the aforementioned network beneath the capsomers.
The Role of TmP in Establishing the Icosahedral Shape of the Viral Capsid While Favoring the Spiral Assembly Hypothesis
Understanding the capsid assembly process of NCLDVs has been an intriguing but difficult challenge because the process can involve thousands of protein units. Despite the high complexity of NCLDV capsid assembly, several interesting and insightful hypotheses have been proposed (20,55), among which is the spiral mechanism (61). This spiral hypothesis is largely based on the orientation of the capsomers, especially that of the 30 capsomers located on the vertices of the pentasymmetron (Fig. 2C, D). As mentioned above, capsomers located in the same trisymmetron share the same orientation. The capsomers within a trisymmetron are rotated by 60° relative to the capsomers of its neighboring trisymmetron (Fig. 1A). However, the capsomers located in the same asymmetric unit (ASU; highlighted with a yellow dashed triangle in Fig. 2C and enlarged in Fig. 2D) of a pentasymmetron do not all share the same orientation: the capsomer at the Pd location (P1d, P2d, P3d, P4d, and P5d) is rotated 60° relative to the other five and instead shares its orientation with the capsomers of the neighboring ASU of the pentasymmetron (Fig. 2C, D). For example, P1d has the same orientation as P5f. As a result of this rotation, the orientation distribution of the capsomers in the pentasymmetron shows a pattern of five interlocked golf clubs (Fig. 2C) (61). More interestingly, the capsomer at the Pd location also shares the same orientation with those of the nearby trisymmetron, extending the orientation distribution into a spiral pattern (61). The same spiral pattern is conserved in many icosahedral NCLDVs (61). Based on this observation, the capsid assembly of icosahedral NCLDVs was proposed to initiate from the fivefold vertex and proceed continuously in a spiral manner (61). An animation of the spiral assembly pathway (62) was presented in the same study (61).
This hypothesis is consistent with previous EM observations of virus particle assembly in viral factories (37,51). In these studies, the virus capsid was assembled continuously from an icosahedral fivefold vertex, rather than in the stepwise manner expected if the capsomers were preassembled into symmetrons. In addition, isolated patches of symmetrons have never been observed inside cells in any study. Although the hypothesis provides a beautiful model for the initiation of the assembly, one question remains unanswered: how does the assembly proceed from the initial fivefold vertex to the neighboring vertices to complete the whole capsid (61)?
The assembly of these capsomers with such selective orientations can result from multiple factors, including the electrostatic interactions among the capsomers (59) as well as their interactions with mCPs. As mentioned above, these mCPs were suggested to play significant roles in stabilizing the neighboring capsomers by forming a hexagonal network underneath the MCP layer. Here, we focus on the longest mCP, the TmP, which lays out the frame of the network.
The TmP from PBCV-1 is rich in proline and glycine residues (7.3% and 9.7%, respectively), suggesting that its conformation includes both rigid turns and flexible regions. The cryo-EM reconstruction of PBCV-1 shows that there are 60 copies of the TmP in total, each of which has a remarkably extended polypeptide structure that is ∼720 Å in length and harbors many bends. These bends are likely due to the richness of proline and glycine, allowing the TmP to snake flexibly through the gaps between the capsomers. This TmP spans from near the center of one fivefold vertex to the edge of a neighboring pentasymmetron. Two TmPs lie antiparallel on the boundary of two adjacent trisymmetrons, linking the two neighboring pentasymmetrons (Fig. 2A). In addition, the lengths of the currently identified TmP genes of PBCV-1 and ASFV, as well as those of the putative ones in Mimivirus and CroV, are approximately proportional to the physical capsid sizes of the corresponding viruses (20). These results support the idea that the length of the TmP determines the size of the capsid, since all icosahedral NCLDVs have pentasymmetrons of the same size (30 capsomers) whereas their trisymmetron sizes vary (61).
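Residue percentages such as the proline and glycine content quoted above are simple to compute from a one-letter amino acid sequence. The following is a minimal sketch. The toy sequence is invented for illustration and is not the PBCV-1 TmP sequence, and the function name is hypothetical.

```python
from collections import Counter


def residue_percentages(sequence, residues="PG"):
    """Return the percentage of each requested residue in a one-letter sequence."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {r: 100.0 * counts[r] / len(seq) for r in residues}


# Hypothetical 20-residue toy sequence (NOT the real TmP sequence).
toy_sequence = "MPGSAPGLKTPGAVGDEPRS"
composition = residue_percentages(toy_sequence)
print(composition)
```

Applied to the real TmP sequence retrieved from a sequence database, the same function would be expected to reproduce the percentages quoted in the text.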
In general, the TmP from PBCV-1 has an overall extended structure with 17.7% and 0.6% of its residues in α-helices and β-strands, respectively (Fig. 2E). It is noteworthy that none of these secondary structures is packed into globular shapes. There are six short α-helices (5–6 amino acids) and one β-strand (3 amino acids) located in the middle region of the TmP, which is relatively straight (Fig. 2E). Longer α-helices (5–13 amino acids) are located at the detectable N- and C-terminal regions of the TmP (Fig. 2E). These helices may stabilize the significant curvature of the TmP where it wraps around certain capsomers (Fig. 2). The very beginning of the N-terminus of the TmP was suggested to be disordered because its density could not be detected in the cryo-EM reconstruction. The detectable density of the TmP begins below the center of the capsomer at the Pa location, which is adjacent to the penton protein P0 (Fig. 2C, D). The TmP then wraps around capsomer Pb. Notably, instead of following the boundary of the ASU of the pentasymmetron, the TmP bends to wrap around capsomer Pe, excluding Pd, before it reaches the trisymmetron. It is interesting that capsomer Pd is oriented differently from the other five within the same ASU of the pentasymmetron (Fig. 2D). One can speculate that the bending of the TmP allows two differently oriented capsomers (Pd and Pe) to pack together. As a result of this bending, the five TmPs from the same fivefold vertex form an interesting spiral pattern that is consistent with the capsomer orientation distribution (Fig. 2C). Based on these observations, the TmP likely facilitates the previously proposed spiral assembly of the capsomers.
While the spiral assembly mechanism is consistent with many EM studies of viral assembly inside the cell, this hypothesis did not address the question of how the neighboring vertices are connected to the initial vertex (61). As an extension of this hypothesis, Figure 2A and B shows a pair of TmPs running between every two vertices. These two TmPs run antiparallel to each other along the icosahedron edges. We speculate that as the assembly of capsomers proceeds from the initial fivefold vertex, as presented in the animation (62), the second TmP comes into contact with the assembled capsomers located at the boundary of the pentasymmetron (Fig. 2B). Then, together with the first TmP, it connects to the second fivefold vertex, allowing the continuous assembly of capsomers to proceed around the neighboring vertices (Supplementary Video S1).
Two Similar but Different Schemes of NCLDVs Assembly
Besides high-resolution structural studies of icosahedral NCLDVs, their morphogenesis has also been intensively studied within host cells through a combination of imaging techniques (37,42,50,51,73). The assembly of NCLDVs takes place in viral factories, where massive reorganization of the host cell cytoskeleton and membranes occurs to coordinate viral genome replication and virion assembly. For example, in the PBCV-1 viral factory, the membrane cisternae serve as scaffolds on which capsid proteins assemble into an icosahedral procapsid with the membrane held inside. This capsid remains incomplete until the genome is packaged through a large portal (37). Similar viral assembly processes were also seen in Mimivirus (42) and ASFV (50), suggesting a general mechanism of viral membrane formation and capsid assembly.
It is noteworthy that a similar assembly process in the viral factory was also observed in Vaccinia virus (VACV) (13,15,29), the prototypic member of the Poxviridae family. VACV's scaffold protein D13 also possesses a double jelly roll structure that is remarkably similar to that of icosahedral NCLDVs such as PBCV-1 (Fig. 1B–G) (6). However, instead of forming an icosahedral capsid, the assembly of the D13 protein on the exterior of the membrane cisternae results in a spherical procapsid. Deep-etch electron microscopy revealed that D13 forms a curved honeycomb lattice, which is made mostly of hexagons and some pentagons (53). Inserting pentagonal structures into a flat surface of hexagonal arrays is a common scheme in viruses (11,22). However, no arrangement of the capsomers into symmetrons like that of icosahedral NCLDVs was observed in VACV. Two other proteins, A14 and A17, both of which are anchored in the membrane, were identified to associate with D13. Cleavage of the A17 N-terminus results in the dissociation of D13 proteins from the membrane of the spherical immature virions, allowing their transition into mature brick-shaped virions (34).
Basic local alignment search tool (BLAST) (4,41) searches with the two known TmPs from PBCV-1 (A342L) and ASFV (M1249L) produce hits in three main families of NCLDVs: Megaviridae, Asfarviridae, and Phycodnaviridae (Fig. 3), suggesting that the viruses from these three families may also possess a similar TmP that plays roles in establishing their icosahedral morphology. These hits are consistent with previous studies that identified TmPs in other NCLDVs (20,35). No hit was found in the families of nonicosahedral NCLDVs, including Poxviridae, suggesting that poxviruses are unlikely to possess TmP-like genes. Based on these observations, we speculate that the absence of the TmP would result in a nonicosahedral viral capsid, supporting our hypothesis that TmPs are essential for assembling the icosahedral capsid.

Fig. 3. A taxonomic tree built from the merged BLASTp hits of the two known TmPs. Two BLAST searches using the amino acid sequences of the two known TmPs from PBCV-1 and ASFV were performed against all available virus genomes with an e-threshold of 0.01. On the left is the merged taxonomic tree of the BLAST hits of the two known TmPs, based on their taxonomy IDs provided by the International Committee on Taxonomy of Viruses. On the right of the tree are two columns of heatmaps based on the BLAST e-values corresponding to the hits on the tree. The exact e-values are shown in the columns. The color scale is shown on the right. The tree was generated using the phyloT tool version 2019.1 (32) with National Center for Biotechnology Information taxonomy and was visualized using iTOL (33). BLAST, basic local alignment search tool; ASFV, African swine fever virus.
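The filtering and merging step behind such a figure can be sketched as follows. This is not the authors' pipeline. It assumes the BLAST results were exported in the standard 12-column tabular format (outfmt 6), where column 2 is the subject ID and column 11 is the e-value. The query names, subject IDs, and numbers in the toy tables are invented for illustration.

```python
import csv
import io

E_THRESHOLD = 0.01  # e-value cutoff quoted in the figure legend


def significant_hits(tabular_text, e_threshold=E_THRESHOLD):
    """Return {subject_id: best_evalue} from BLAST tabular (outfmt 6) text."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject, evalue = row[1], float(row[10])
        if evalue <= e_threshold and evalue < best.get(subject, float("inf")):
            best[subject] = evalue
    return best


def merge_hits(hits_a, hits_b):
    """Merge two hit tables into {subject: (evalue_a or None, evalue_b or None)}."""
    subjects = sorted(set(hits_a) | set(hits_b))
    return {s: (hits_a.get(s), hits_b.get(s)) for s in subjects}


# Hypothetical toy results; IDs and values are made up.
PBCV1_TOY = (
    "tmp_pbcv1\tvirusA\t35.0\t200\t120\t5\t1\t200\t10\t210\t1e-20\t90.1\n"
    "tmp_pbcv1\tvirusB\t25.0\t80\t55\t3\t5\t85\t40\t120\t0.5\t30.2\n"
)
ASFV_TOY = (
    "tmp_asfv\tvirusA\t28.0\t150\t100\t4\t3\t150\t20\t170\t0.005\t40.0\n"
    "tmp_asfv\tvirusC\t40.0\t300\t170\t6\t1\t300\t1\t300\t1e-50\t180.0\n"
)

merged = merge_hits(significant_hits(PBCV1_TOY), significant_hits(ASFV_TOY))
print(merged)
```

Each merged entry keeps one e-value per query, which corresponds to the two heatmap columns described in the legend. A subject found by only one query carries None in the other column.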
Summary
In this review, we highlighted the role of the TmP by thoroughly investigating its structural characteristics, its location within the capsid, and its interactions with the MCP. We suggest that the TmP plays critical roles in regulating the size of the viral capsid by connecting the neighboring fivefold vertices and defining the boundaries of the trisymmetrons. The function of the TmP extends the previous hypothesis of spiral assembly by providing a mechanism for how the assembly proceeds from the initial vertex to the neighboring ones.
Although the D13 protein of VACV also possesses a similar double jelly roll fold, the assembly of VACV D13 proteins results in a spherical procapsid instead of the icosahedral one seen in many other NCLDVs. A BLAST search of the TmPs against virus genomes shows that the poxviruses do not possess TmP-like genes, which supports our hypothesis that the TmP is essential for the morphogenesis of icosahedral NCLDVs. Future studies on the assembly mechanisms of NCLDVs will deepen our knowledge of how the TmP, as well as other mCPs, facilitates viral assembly.
Footnotes
Disclaimer
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Author Disclosure Statement
No competing financial interests exist.
Funding Information
Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R01GM129525 to C.X. Z.Y. is supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Number R01AI143709.
Supplementary Material
Supplementary Video S1
