Abstract
Alternative splicing (AS) is one of the most important ways to enhance the functional diversity of genes. Huge amounts of data have been produced by microarray, expressed sequence tag, and RNA-seq, and plenty of methods have been developed specifically for this task. The most frequently asked questions in previous research were as follows. What proportion of the whole gene set undergoes AS? How many AS types are present in the genome, and which type is dominant? How well is AS conserved among different species? Which isoforms of which genes respond to the environment and help individuals adapt? Based on this background, we collected analysis results from 17 species to try to map out the landscape of AS studies in plants. We note the shortcomings of previous results, and we appeal to all scientists working in the AS field to establish a standard protocol so that analyses from different projects are comparable.
Introduction
Alternative splicing (AS) efficiently enhances protein diversity by using limited gene loci to form different isoforms. In the process of AS, particular exons of a gene are retained in the messenger RNA (mRNA), whereas some introns are removed. AS was first observed in 1977 1,2 and, in the following years, it became one of the most active fields in biological research, especially the search for the mechanism of AS.3,4 When we searched the web using “alternative splicing” as the keyword, the growth of research in this area was apparent. For example, in the journal Cell, the total number of articles has doubled every 10 years. Research interests have varied greatly over time; in the early stages, the focus was on the AS of a single gene, 5 but researchers now consider AS at the genome level 6,7 or even across many diverse conditions compared in time and space. 8
It is accepted that there are five basic modes of AS 9–11: exon skipping, intron-retention, mutually exclusive exons, alternative donor site, and alternative acceptor site (Fig. 1). Exon skipping means that an exon may be spliced out together with its flanking introns in some transcripts or retained in others; it is the most common mode of AS in mammals. 9 Intron-retention means that an intron may either be spliced out or remain unspliced, in which case it combines with the two flanking exons to form a new, larger exon; intron-retention is the most common mode in plants. 12 Compared with intron-retention and exon skipping, the other AS modes are not dominant. Mutually exclusive exons means that, of two exons, only one appears in the final mRNA; alternative donor site means that the 3’ boundary of the upstream exon changes, whereas alternative acceptor site means that the 5’ boundary of the downstream exon changes. Besides these five basic modes, there are several other interesting modes, such as alternative promoters and alternative poly(A). 13 We present in this work a review of genome-level analyses of AS in 17 plants.
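The definitions above can be made concrete by comparing the exon coordinates of two isoforms of the same gene. The following Python sketch is only an illustration under stated assumptions: isoforms are given as sorted lists of exon intervals with 1-based inclusive coordinates on the plus strand (on the minus strand, donor and acceptor would be swapped), and the function names `introns`, `contains`, and `compare_isoforms` are hypothetical. It labels four of the five basic modes; mutually exclusive exons are not distinguished and would be reported as exon skipping.

```python
from typing import List, Set, Tuple

# (start, end), 1-based inclusive, plus strand assumed
Interval = Tuple[int, int]


def introns(exons: List[Interval]) -> List[Interval]:
    """Return the introns implied by a sorted list of exons."""
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def contains(outer: Interval, inner: Interval) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def compare_isoforms(iso_a: List[Interval], iso_b: List[Interval]) -> Set[str]:
    """Label the AS modes that distinguish two isoforms of one gene."""
    events: Set[str] = set()
    introns_a, introns_b = introns(iso_a), introns(iso_b)

    # Intron retention: an intron of one isoform lies inside an exon of the other.
    if (any(contains(e, i) for i in introns_a for e in iso_b)
            or any(contains(e, i) for i in introns_b for e in iso_a)):
        events.add("intron_retention")

    # Exon skipping: an exon of one isoform lies inside an intron of the other.
    if (any(contains(i, e) for e in iso_a for i in introns_b)
            or any(contains(i, e) for e in iso_b for i in introns_a)):
        events.add("exon_skipping")

    all_exons = iso_a + iso_b
    for ia in introns_a:
        for ib in introns_b:
            if ia == ib:
                continue
            # Skip intron pairs that differ only because an exon was skipped.
            longer = max(ia, ib, key=lambda iv: iv[1] - iv[0])
            if any(contains(longer, e) for e in all_exons):
                continue
            if ia[1] == ib[1]:
                # Same acceptor, different donor (3' end of the upstream exon moved).
                events.add("alternative_donor")
            elif ia[0] == ib[0]:
                # Same donor, different acceptor (5' end of the downstream exon moved).
                events.add("alternative_acceptor")
    return events


# Hypothetical three-exon gene and four alternative isoforms.
reference = [(1, 100), (201, 300), (401, 500)]
retained = [(1, 300), (401, 500)]
skipped = [(1, 100), (401, 500)]
alt_donor = [(1, 120), (201, 300), (401, 500)]
alt_acceptor = [(1, 100), (181, 300), (401, 500)]
```

Calling `compare_isoforms(reference, retained)` on these made-up coordinates yields the intron-retention label, and the other three isoforms yield their respective labels. Real pipelines work from aligned reads or assembled transcripts and must also handle strand, partial transcripts, and combined events, which this sketch ignores.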

Figure 1. Five basic modes of AS (drawn with Illustrator for Biological Sequences [IBS] 59).
AS analysis at the genome level is prevalent, since microarray profiling, expressed sequence tag complementary DNA (EST-cDNA) sequence data, and RNA-seq data are easy to obtain. According to our research, AS analyses of 17 species have been released at the genome level: Oryza sativa, Arabidopsis thaliana, Glycine max, Populus euramericana, Eucalyptus grandis, Populus trichocarpa, Physcomitrella patens, Digitalis purpurea, Chlamydomonas reinhardtii, Brachypodium distachyon, Sorghum bicolor, Medicago truncatula, Lotus japonicus, Gossypium raimondii, Zea mays, Vitis vinifera, and Solanum lycopersicum. For most of these species, reference genomes have been released, which helps in AS analysis. These species include nine dicots, six monocots, one Bryopsida, and one Chlorophyceae; among them, a large number of species are from Poaceae, which indicates that crops are the most studied plants.
Arabidopsis
Arabidopsis has been studied in depth; we have collected five publications relating to AS analysis at the genome level.6,7,14–16 The first technologies used in genome-level analysis were ESTs and full-length cDNA. Because of the limited data generated by these two methods, the detected AS rate was very low, ~11.6%. 7 In studies analyzing high-throughput RNA-seq data, estimates of the AS rate ranged from a little more than 40% to 60%.14–16 The Arabidopsis genome analyses also yielded some interesting results from these large data sets. For example, the main AS type in Arabidopsis is intron retention, with a high rate of ~40%, and these retained introns have a great potential to splice out as an intron in the same frame. 16 Also, alternative 3’ and 5’ splice sites may introduce a frameshift, 15 which means that the protein diversity of the same gene is much larger than previously expected. Another interesting conclusion was that although the Arabidopsis annotation was widely used in the published studies and referred to as the gold standard, novel AS events were still detected. This means that the AS rate in Arabidopsis may be much higher than expected, and the annotation can be improved.14–16
Rice
Analyses of another model plant, rice, yielded many similar conclusions. Recent genome-level analyses have suggested that the AS rate in rice ranges from 33% 17 to 48%. 18 Many chimeric transcripts were also observed in rice; 17 this phenomenon is prevalent in tumors and is also known as trans-splicing of RNAs or gene fusion. 19 These transcripts contain partial sequences from two genes located in the same or the opposite orientation, and the two genes may even come from two chromosomes (Fig. 2). It was suggested that AS may contribute to the fusion transcripts. 17

Other Plants
Of the other plants for which we found published genome-level AS analyses, most were crops, such as soybean, 8 sorghum, 20 maize, 21 tomato, 22 and cotton. 23 The analyses of these nonmodel plants suggested that AS events may relate to developmental stage, with plants at a younger developmental stage possibly being more likely to present AS. 8 Also, gene features, transcriptional level, intron length, and exon number contribute to the AS rate. 8 Analysis of cotton suggested a higher rate of AS events, of which intron-retention was the major type. 23 Transposable elements (TEs) are present at a much higher rate in retained introns (43%) than in other introns (2.9%), and TE insertion-induced mechanisms may play an important role in the birth of new exons. 24 Analyses of more than 90 RNA-seq libraries of maize suggested that the majority of genotype-specific AS can be genetically mapped to cis-acting quantitative trait loci. This kind of AS plays an important role in tissue identity and genotypic variation in maize. 21
Lower AS Rate in Plants Compared to Animals
With improving technology and tools, and with transcriptome depth increasing over time, it is acknowledged that whole-genome analysis of AS in plants is still at an early stage compared with that in humans or rats, which is one of the main reasons that plants appear to have a lower AS rate. However, another hypothesis worth considering is that the AS rate really is lower in plants than in animals. In 1994, Hughes 25 proposed that after gene duplication, two AS isoforms can be fixed in the two copies, which is called functional-sharing for short. Although another hypothesis, eg, duplicability-age (Age-dependent gain of alternative splice forms and biased duplication explain the relation between splicing and duplication), tries to challenge it, that hypothesis was rebutted by Su and Gu (Revisit on the evolutionary relationship between AS and gene duplication). Thus, as 50%-80% of angiosperms are polyploids (genome evolution in polyploids) and A. thaliana and rice are both ancient polyploids (Analysis of the genome sequence of the flowering plant A. thaliana. Duplication and DNA segmental loss in the rice genome implications for diploidization), we can logically expect plant genes to have a lower AS rate.
Comparative Analysis in Plants
Comparative analyses of AS patterns in multiple plants are prevalent, including analysis of pairs such as rice and Arabidopsis,12,26,27 Populus and Eucalyptus, 28 Brassica and Arabidopsis, 29 Populus and Arabidopsis, 30 and tomato and Arabidopsis. 31 The analysis of Populus and Arabidopsis indicates that the isochorismate synthase gene undergoes extensive AS in Populus, whereas AS of this gene is rare in Arabidopsis. 30 Through analysis of AS events in the developing xylem of two trees, Xu et al revealed that in the wood-forming tissues of Populus and Eucalyptus, one-third of genes are related to AS; among these AS genes, ~42% of AS events result in changes to the original reading frame and about one-third cause protein domain modification. 28
Variable Rate of AS Events in Plants
From analysis of microarray profiling, EST-cDNA sequence data, and RNA-seq data, it has been estimated that more than 95% of human genes and 60% of Drosophila multiexon genes are alternatively spliced.9,32 In plants, this proportion is much lower; 61% of intron-containing genes in Arabidopsis undergo AS and 48% of genes in rice are affected by AS.16,18 More than 63% of soybean multiexon genes undergo AS. 8 The AS ratio data in plants and animals are provided in Table 1. Recent studies in plants have found AS rates of 50%-60%.14,16,18 The most common type of AS in Drosophila and humans is exon skipping, which occurs when an exon is spliced out along with its flanking introns (Fig. 1), accounting for ~40% of AS events. 13 Even though AS rates are lower in plants, plants have a higher percentage of intron-retention AS, where an intron remains in the mature RNA transcript as part of an exon (Fig. 1); intron-retention accounts for ~45.1%-55% of the AS events in rice and 30%-64.1% of those in Arabidopsis (Table 1) but occurs at a much lower frequency in Drosophila and humans, with rates of ~5%-15% (Table 1). Unlike the model organisms, other plants have a much lower AS ratio, eg, the AS ratios in S. lycopersicum, L. japonicus, and M. truncatula are less than 10%.22,31,33
Table 1. The AS rate and the dominant AS types in species (updated from Zhang et al. 44).
Conservation of AS in Plants
The evolutionary conservation of plant AS has been studied across many species, and the results indicate that the conservation ratio is not high. Genome-wide analysis has identified that 56 out of 380 AS events (14.7%) are conserved between Arabidopsis and rice and 49 out of 298 AS events (16.4%) are conserved between rice and maize. 27 In closely related species of Brassica and Arabidopsis, 537 out of 9878 AS events (5.4%) were reported as conserved events, and the results indicated that intron-retention and exon skipping were underrepresented. 29 In L. japonicus, 22 out of 115 AS events (19.1%) were conserved AS events between two or more legume species 33 and 71 out of 716 orthologous groups (9.9%) have conserved AS events between Populus and Eucalyptus. 28 AS between paralogous genes in two independently synthesized allotetraploid Brassica napus lines showed parallel loss of AS events after polyploidy, and 26%-30% of genes showed changes in AS compared with the parents. 34
Calling AS Junctions from RNA Sequencing Data
The AS data now available, even for the human genome, are not enough to support a complete annotation of genes at the genome-wide level through the whole life cycle. In reality, plenty of RNA sequencing data have been produced and sit unused in NCBI's SRA database. The number of RNA sequencing data sets in NCBI is now ~170,000. The most represented organisms are Homo sapiens and Mus musculus, with more than 40,000 data sets in total. The most represented plants, A. thaliana, Z. mays, and O. sativa, have far fewer data sets than animals, ranging from 887 to 3646. This difference may partially account for why estimates of the AS content are as high as 95% in humans, whereas they are only ~60% in plants. Additionally, a large number of RNA sequencing data sets have not been used to call AS junctions.
Useful Tools for AS Analysis in Plants
Most software for transcriptome analysis, such as Cufflinks, could be used in AS analysis. However, the so-called gold standard for isoform annotations is far from impeccable. Even in such exhaustive annotations as human and mouse, the new gene annotations are sometimes inconsistent. 35 Previous analysis used eight popular publicly available software packages to reveal that, even using the exact same data, the detected isoforms using different tools can be distinctly variable. 36 These results have indicated that no significant evidence supports the notion that any one tool has the best performance, and the overlap of detected isoforms from different tools is low. In light of this, we suggest people use as many tools as they can and transfer each result into tracks that are suitable for display in a genome browser (eg, GBrowse) to provide convenient access and use for others.
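One simple way to quantify how little the results of different tools overlap, and to prepare a shared track for a genome browser, is to compare the sets of splice junctions each tool reports. The sketch below is a minimal illustration, not the procedure used in the cited comparison: the tool names and junction coordinates are made up, junctions are assumed to be intron intervals with 1-based inclusive coordinates, and the helper names (`jaccard`, `pairwise_overlap`, `supported_by`, `to_bed_lines`) are hypothetical. Overlap is measured with the Jaccard index, and junctions are written as six-column BED lines.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

# (chromosome, intron start, intron end, strand); 1-based inclusive coordinates assumed
Junction = Tuple[str, int, int, str]


def jaccard(a: Set[Junction], b: Set[Junction]) -> float:
    """Shared junctions divided by all junctions reported by either tool."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(calls: Dict[str, Set[Junction]]) -> Dict[Tuple[str, str], float]:
    """Jaccard index for every pair of tools."""
    return {(x, y): jaccard(calls[x], calls[y])
            for x, y in combinations(sorted(calls), 2)}


def supported_by(calls: Dict[str, Set[Junction]], min_tools: int) -> Set[Junction]:
    """Junctions reported by at least `min_tools` tools."""
    counts = Counter(j for junctions in calls.values() for j in junctions)
    return {j for j, n in counts.items() if n >= min_tools}


def to_bed_lines(junctions: Set[Junction], name: str) -> List[str]:
    """Format junctions as 6-column BED lines (0-based start, half-open)."""
    return [f"{chrom}\t{start - 1}\t{end}\t{name}\t0\t{strand}"
            for chrom, start, end, strand in sorted(junctions)]


# Hypothetical calls from three unnamed tools run on the same data.
calls = {
    "tool_a": {("chr1", 101, 200, "+"), ("chr1", 301, 400, "+"), ("chr2", 51, 90, "-")},
    "tool_b": {("chr1", 101, 200, "+"), ("chr1", 301, 420, "+")},
    "tool_c": {("chr1", 101, 200, "+"), ("chr2", 51, 90, "-")},
}
overlaps = pairwise_overlap(calls)
shared = supported_by(calls, min_tools=2)
track = to_bed_lines(shared, name="junction_2plus_tools")
```

Publishing one track per tool alongside a track of junctions supported by several tools lets other researchers judge for themselves how much weight to give each call.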
Distinctive AS Analysis in Plants
Unlike animals, plants cannot move, run, or hide, so they must endure difficult environments, such as drought, flooding and waterlogging, salinity, and many kinds of passive damage. Environmental stresses lead to specific changes in the AS patterns of plants.
AS Patterns under Stress
The patterns of AS in plants not only vary along the life cycle and among tissues but also vary under diverse conditions, such as salt stress, cold stress, strong light, and disease. High-coverage RNA-seq data of seedlings treated with different concentrations of NaCl indicated that AS increases significantly under salt stress compared with unstressed conditions; most differentially spliced genes may not themselves be regulated by salt stress but are associated with specific functional pathways linked to stress responses and RNA splicing, such as serine/arginine-rich splicing factors. 37 Analysis of AS under cold conditions indicated that the process of cold regulation may introduce a premature termination codon, and the resulting transcripts could be potential targets for degradation by the nonsense-mediated mRNA decay process, thus protecting the normal transcripts. 38 Analysis based on P. patens indicated that AS was related to light exposure and that during photomorphogenesis, light regulated AS, with intron-retention occurring preferentially in transcripts involved in photosynthesis and translation. 39 Another analysis based on P. patens indicated that heat shock treatment also leads to novel AS events, and ~50% of genes are alternatively spliced. 40 Analysis of B. distachyon infected with the Panicum mosaic virus and its satellite virus revealed that ~30% more transcripts were detected, and that the infected plant was enriched in defense-related genes, such as nucleotide-binding site leucine-rich repeat (NB-LRR) resistance proteins. 41 In addition, in a more specific group of genes, such as clock genes, the analysis suggested that the structures of the ELF3 (Early Flowering) gene and the ZTL (Zeitlupe) gene varied significantly with photoperiod, temperature, and salt stress. 42 Extensive AS has also been reported in clock genes, including LHY (Late Elongated Hypocotyl) and pseudo-response regulator 7. 43
Intron-Retention May Contribute to Very Complex Gene Structures in Plants
Since intron-retention is a dominant mode of AS in the majority of plants, our recent research has focused on the structural evolution related to AS. We have found that intron-retention isoforms of ancestral genes can be reverse-transcribed from RNA into cDNA, recruited into the genome, and become new genes. Compared with the parental gene, the new gene has lost some introns and kept others, which makes the gene structures among these homologs look very complex. In the new gene, the lost introns were removed by RNA splicing, whereas the remaining introns were kept in the RNA by intron-retention. This hypothesis was supported using 25 pairs of such complex genes, which showed that the ratio of complex gene pairs in plants is much higher than that in animals. 44 This hypothesis could help to explain the recurrent loss of introns in orthologous gene families in plants. 45
Conclusion and Future Perspectives
Shortage of Current Studies in AS Analysis
Although many AS analyses have been conducted in the last few decades and hundreds of thousands of RNA-seq data sets have been produced, our knowledge is still far from complete. Comparisons among species are biased because the detected AS is affected by the depth of coverage, the developmental stage of the sample, the sampling environment and time, and the tools used. Since these factors vary so much, the results may lead to wrong conclusions. Besides, the conservation ratios from different analyses are currently not comparable, because the methods are not clearly described. We propose the following method for measuring AS conservation: (1) find the orthologous gene pairs between two species (eg, L pairs); (2) find the AS events among these gene pairs in each species (eg, Ma and Mb events); and (3) find the event pairs whose CDSs match perfectly, with no gaps and no extension (eg, N pairs); the conservation ratio can then be expressed as N/L or 2N/(Ma + Mb). To compare different groups of species, dividing the ratio by the divergence time of the two species would be one option.
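The two proposed ratios are simple to compute once L, Ma, Mb, and N have been counted. The following Python sketch only illustrates the arithmetic; the function and field names are hypothetical, the example counts are invented, and the unit of divergence time (millions of years here) is an assumption, since the text does not fix one.

```python
from typing import NamedTuple


class ConservationRatios(NamedTuple):
    per_ortholog_pair: float  # N / L
    per_as_event: float       # 2N / (Ma + Mb)


def conservation_ratios(L: int, Ma: int, Mb: int, N: int) -> ConservationRatios:
    """Compute both conservation ratios from the four counts.

    L  : orthologous gene pairs between the two species
    Ma : AS events found among those genes in species a
    Mb : AS events found among those genes in species b
    N  : event pairs with a perfect CDS match (no gaps, no extension)
    """
    if min(L, Ma, Mb, N) < 0:
        raise ValueError("counts must be non-negative")
    if L == 0 or Ma + Mb == 0:
        raise ValueError("L and Ma + Mb must be positive")
    return ConservationRatios(N / L, 2 * N / (Ma + Mb))


def per_divergence_time(ratio: float, divergence_mya: float) -> float:
    """Normalize a ratio by divergence time (assumed unit: million years)."""
    if divergence_mya <= 0:
        raise ValueError("divergence time must be positive")
    return ratio / divergence_mya


# Invented counts for illustration only.
ratios = conservation_ratios(L=1000, Ma=300, Mb=200, N=50)
normalized = per_divergence_time(ratios.per_as_event, divergence_mya=150.0)
```

With these invented counts, N/L is 50/1000 and 2N/(Ma + Mb) is 100/500. The two ratios answer different questions: the first is relative to all orthologous gene pairs, whereas the second is relative to the AS events actually detected, so a report should state which one is used.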
AS in the Developmental Stage with the Single Cell
In recent years, more and more single-cell RNA sequencing analyses have been done, opening a new chapter of AS research focused on more specific targets. Zhang et al profiled the mRNA in 10 types of cells, including five major cell types of the differentiating endosperm, and in the embryo and four maternal compartments of the maize kernel. 46 The results suggest that even in the small seed, the mRNA populations show divergent gene expression profiles. 46 Adrian et al quantified the gene expression profile using RNA sequencing data of stomata lineage cells marked by fluorescence and showed the regulatory modules that have different developmental decisions compared to previous expectation. 47
This review has discussed AS analysis in the plant field as well as the tools used in such research. Although plenty of analyses of AS in plants have been presented and a huge quantity of data produced, it is necessary to think about the most efficient way to display these data. An AS database, eg, Alternative Splicing in Plants (ASIP), 12 and genome browser tracks that are open source and easy to download may be the best choice.
Also, since AS is affected by many factors, such as time, stress, and tissue, a professional standard may also be essential in the future. In this standard, the species name, the sampling time, the environment, the tissue, the analysis tools, and the reference genome should be recorded as mandatory fields.
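As an illustration of what such a standard could require, the sketch below defines a sample record in which all six items are mandatory. It is an assumption-laden example, not an existing standard: the class name `ASSampleRecord`, the field names, and the example values are hypothetical.

```python
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass(frozen=True)
class ASSampleRecord:
    """Minimal metadata for one AS analysis; every field is mandatory."""
    species: str
    sampling_time: str
    environment: str
    tissue: str
    analysis_tools: List[str]
    reference_genome: str

    def __post_init__(self) -> None:
        # Reject empty strings and empty tool lists so no field is left blank.
        for f in fields(self):
            if not getattr(self, f.name):
                raise ValueError(f"missing required field: {f.name}")


# Illustrative values only.
record = ASSampleRecord(
    species="Arabidopsis thaliana",
    sampling_time="14 days after germination",
    environment="150 mM NaCl, 22 C, long day",
    tissue="seedling",
    analysis_tools=["Cufflinks"],
    reference_genome="TAIR10",
)
row = asdict(record)  # plain dict, ready to be written alongside the results
```

Storing such a record with every set of AS calls would let later comparisons filter on tissue, stage, or tool before computing any cross-species ratio.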
Most importantly, in the plant field, more appropriate tools or suitable pipelines need to be developed according to the characteristics of plant gene structures.
Author Contributions
Hong Yang is responsible for data collection and writing. Huizhao Yang and Chengjun Zhang are responsible for writing. All authors reviewed and approved the final manuscript.
