Multiscale Feedback Loops in SARS-CoV-2 Viral Evolution

Abstract

COVID-19 is an infectious disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The viral genome is considered to be relatively stable, and the mutational analyses reported thus far have focused mainly on the coding region. This article provides evidence that macrolevel pandemic dynamics, such as social distancing, modulate the genomic evolution of SARS-CoV-2. This view complements the prevalent paradigm that microlevel observables control macrolevel parameters such as death rates and infection patterns. First, we observe differences in mutational signals for geospatially separated populations, such as the prevalence of A23404G in CA versus NY and WA. We show that the feedback between macrolevel dynamics and the viral population can be captured using a transfer entropy framework. Second, we observe complex interactions within mutational clades. Namely, when C14408T first appeared in the viral population, the frequency of A23404G spiked in the subsequent week. Third, we identify a noncoding mutation, G29540A, within the segment between the coding gene of the N protein and the ORF10 gene, which is largely confined to NY (>95%). These observations indicate that macrolevel sociobehavioral measures have an impact on the viral genomics and may be useful for the dashboard-like tracking of its evolution. Finally, although SARS-CoV-2 is a genetically robust organism, our findings suggest that it exhibits a high degree of adaptability. Owing to its ample spread, mutations of unusual form are observed and a high complexity of mutational interaction is exhibited.

1. Introduction

COVID-19 is an infectious disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (Hui et al., 2020). It was first identified in December 2019 in Wuhan, China, and has since been spreading rapidly across the globe reaching pandemic levels (Ghebreyesus, 2020; Li et al., 2020).

SARS-CoV-2 is highly transmissible: estimates for its basic reproductive ratio, R₀, vary widely but commonly range between 2.2 and 3.9 (Lv et al., 2020), compared with an R₀ between 1.4 and 1.6 for seasonal flu (Epstein et al., 2007). SARS-CoV-2 has a prolonged incubation period, which can be as long as 14 days (Lauer et al., 2020), during which an infected person can be asymptomatic while still being able to infect others. Finally, SARS-CoV-2 has a high mortality rate, with estimated deaths per confirmed case varying within the range of 0.5%–15% (Zhou et al., 2020). The combination of these features poses significant challenges to public health responses concerning this pandemic.

After the first genome sequence from Wuhan was isolated, the National Center for Biotechnology Information (NCBI) and Global Initiative on Sharing All Influenza Data (GISAID) databases began collecting full genome data worldwide (Elbe and Buckland-Merrett, 2017; Hatcher et al., 2017; Shu and McCauley, 2017). The collected viral genomes were annotated using sequence alignment and tagged with their corresponding geographical and temporal metadata. Phylogenetic relations between the novel SARS-CoV-2 viral genome and other types of coronaviruses were reported in Andersen et al. (2020) and Lu et al. (2020), which helped to identify the origin of this virus (Xie and Chen, 2020). Its genomic architecture was described in Astuti and Ysrafil (2020) and Kim et al. (2020), enabling efficient genetic annotation. Although significant mutations in the viral genome have been observed (Yin, 2020), the genome itself is considered stable (Jia et al., 2020), with current mutational analysis being mostly focused on the coding segments.

A23404G is one such significant mutation; it leads to a D-to-G amino acid change at position 614 of the Spike protein. It has been reported to be responsible for an increase in the severity of infection across Europe (Korber et al., 2020). In silico structural modeling suggests that the amino acid change can improve binding to the ACE2 receptor (Song et al., 2018; Xu et al., 2020), and laboratory testing suggests that G23404 significantly increases the viral load in patients when compared with A23404 (Korber et al., 2020), which in turn affects the transmissibility of the virus. This mutation has dominated the viral population in almost every geographical location since the end of March 2020.

We provide evidence that, although SARS-CoV-2 is a genetically robust organism (Jia et al., 2020), its mutational signature is geospatially differentiated and the virus exhibits a high degree of adaptability. The high transmission rate of this virus facilitates an ample spread, which in turn allows for a wide range of mutations and a high complexity of mutational interactions. Specifically, this article provides evidence that macrolevel pandemic dynamics, such as social distancing, steer the genomic evolution of the SARS-CoV-2 population. This view complements the prevalent paradigm that microlevel observables control macrolevel parameters.

We organize our analysis as follows. First, owing to differences in culture, policy, and the severity of infections, macrolevel dynamics vary with geographical location as the pandemic progresses, resulting in distinct selective pressures. This manifests in distinct mutational signatures of the virus population for different geospatial blocks. Indeed, we observe differences in mutational signals for geospatially separated populations, such as the prevalence of A23404G in CA versus NY and WA, suggesting a feedback loop connecting sociobehavioral patterns with mutational signatures. We show that the feedback between macrolevel dynamics and viral population can be captured using a transfer entropy (TE) framework.

Second, we establish the aforementioned complexity of mutational interactions by taking a closer look at the mutation C14408T and its connection with A23404G. Our analysis indicates that the interaction of these two mutations, among others, is responsible for an increase in viral fitness.

Third, we illustrate the high degree of adaptability the virus possesses by identifying a noncoding mutation, G29540A, which is overwhelmingly localized in NY (>95%). Such adaptability is further underscored by reports of patients in Jilin and Heilongjiang, China, who take longer to develop symptoms and to test negative after an infection event (Tu et al., 2020). This extended incubation period seems to be a result of the pressure generated by the drastic social distancing rules imposed in China.

All these findings suggest that macrolevel sociobehavioral measures have a direct impact on viral genomics, with TE emerging as a well-suited framework for analyzing the resulting feedback loops.

To facilitate an in-depth analysis, we develop an analysis platform that monitors mutations in the SARS-CoV-2 genome at different geographical and temporal resolutions, with interactions among multiple mutations being of particular interest. We believe that it is of crucial importance to identify such feedback loops to formulate effective responses to the COVID-19 crisis.

2. Methods

2.1. SARS-CoV-2 full genome data and significant mutations

We collected full SARS-CoV-2 genomic sequences from the NCBI and GISAID databases (Elbe and Buckland-Merrett, 2017; Hatcher et al., 2017; Shu and McCauley, 2017), covering isolates gathered starting from December 24, 2019 up to May 11, 2020. The data were differentiated by geographical and temporal resolutions. All sequences were aligned to the reference sequence (NCBI ID: NC_045512) to obtain genomic annotation.

2.2. Macrolevel mobility-derived data

The workplace mobility indices used in the TE calculations were collected from the publicly available Google mobility-derived data sets (Aktay et al., 2020) for the states of CA and NY. The Google workplace mobility indices reflect how many LH (location history) users spent >1 hour at their places of work for each day and each geographical (state) area. The counts are aggregated by places of residence of LH users, with the metrics being protected by differential privacy. To match our available microscale series, mobility-derived data were considered over the following time intervals: 02/29/2020-05/01/2020 for CA and 02/29/2020-04/24/2020 for NY. See Supplementary Data for details on the data sets.

2.3. TE estimation and statistical tests

We employed TE measures, rooted in information theory (Schreiber, 2000), to quantify the causal flow from the mobility-derived data to the mutational composition of the viral population. TE is a nonparametric statistic that measures the amount of directed (time-asymmetric) transfer of information between two random processes. TE calculations employ portions of the histories of the two time series and as such can naturally account for the time delay between the series studied. Given two time series I and J, the TE from J to I quantifies the degree to which knowledge about the history of J allows one to predict future values of I. The quantity we compute is based on the Rényi entropy, which introduces a weight parameter $q > 0$, $q \neq 1$, for the individual probabilities $p(j)$ in the J series. For a given series J and a fixed q, this entropy is given by $H_{J}^{q} = \frac{1}{1 - q} \log \left( \sum_{j} p^{q}(j) \right)$ (Schreiber, 2000). Rényi entropy represents a flexible tool for estimating uncertainty, since the parameter q allows one to emphasize distinct areas of the distribution. The formula for the TE is given by $T_{J \to I}(k, l) = \frac{1}{1 - q} \log \left( \frac{\sum_{i} \phi_{q}(i_{t}^{(k)}) \, p^{q}(i_{t+1} \mid i_{t}^{(k)})}{\sum_{i, j} \phi_{q}(i_{t}^{(k)}, j_{t}^{(l)}) \, p^{q}(i_{t+1} \mid i_{t}^{(k)}, j_{t}^{(l)})} \right),$

where $\phi_{q}(j) = \frac{p^{q}(j)}{\sum_{j} p^{q}(j)}$ is the escort distribution (Beck and Schlögl, 1995; Jizba et al., 2012), $i_{t}^{(k)} = \{i_{t}, \dots, i_{t-k+1}\}$ and $j_{t}^{(l)} = \{j_{t}, \dots, j_{t-l+1}\}$ are past states, and I and J can be approximated by kth- and lth-order Markov processes, respectively, such that I depends on its k previous values and J depends on its l previous values. In the literature, k and l are also known as the embedding dimensions (Schreiber, 2000).
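As a minimal illustration of the two ingredients entering the TE formula, the following Python sketch evaluates the Rényi entropy and the escort distribution of a discrete probability vector. The function names are ours, and the use of the natural logarithm is an assumption, since the base of the logarithm is not specified here.

```python
import math


def renyi_entropy(probs, q):
    """Rényi entropy H^q = log(sum_j p(j)^q) / (1 - q); natural log assumed."""
    if q <= 0 or q == 1:
        raise ValueError("q must be positive and different from 1")
    return math.log(sum(p ** q for p in probs if p > 0)) / (1.0 - q)


def escort_distribution(probs, q):
    """Escort distribution phi_q(j) = p(j)^q / sum_j p(j)^q."""
    weights = [p ** q if p > 0 else 0.0 for p in probs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical probability vector, for illustration only.
probs = [0.7, 0.2, 0.1]
print(renyi_entropy(probs, q=0.1))
print(escort_distribution(probs, q=0.1))
```

For $q < 1$ the escort distribution is flatter than $p$, so that low-probability states receive relatively more weight; this is the sense in which q emphasizes distinct areas of the distribution.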

In our analysis, the TE is estimated using the RTransferEntropy package (Behrendt et al., 2019). By construction, the aforementioned TE formula requires discrete time series as input. As our microscale data are mutational frequencies, the value at each point in time is drawn from a continuous distribution. A discretization scheme is therefore employed for each of the macro- and micro time series: the values of each series are distributed into three bins based on the 33% and 66% quantiles. The q parameter was set to $q = 0.1$ while $l = k = 1$.
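The discretization step can be sketched as follows. This is an illustration under our own assumptions, not the code of the RTransferEntropy package: the quantile interpolation rule and the handling of values that fall exactly on a bin boundary are choices made here, and the input series is invented.

```python
import bisect
import math


def quantile(values, prob):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = prob * (len(ordered) - 1)
    low = math.floor(pos)
    high = min(low + 1, len(ordered) - 1)
    frac = pos - low
    return ordered[low] * (1 - frac) + ordered[high] * frac


def discretize_three_bins(series, lower=0.33, upper=0.66):
    """Map each value to bin 1, 2, or 3 using the 33% and 66% quantiles."""
    cuts = [quantile(series, lower), quantile(series, upper)]
    # bisect_right counts how many cut points are <= value (0, 1, or 2).
    return [bisect.bisect_right(cuts, value) + 1 for value in series]


# Hypothetical mutational frequencies, for illustration only.
frequencies = [0.0, 0.05, 0.1, 0.3, 0.45, 0.6, 0.62, 0.7, 0.8, 0.81]
print(discretize_three_bins(frequencies))
```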

Effective TE is a means of reducing self-correlation within each time series. The RTransferEntropy package accomplishes this by subtracting from the base TE the mean TE from a shuffled copy of the series J to the series I, obtained by averaging over 100 shuffles. Namely, $ET_{J \to I}(k, l) = T_{J \to I}(k, l) - T_{J_{\text{shuffled}} \to I}^{\text{avg}}(k, l).$
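To make the shuffling correction concrete, the sketch below implements a plug-in estimate of the Rényi TE for $k = l = 1$ on already discretized series, and subtracts the mean TE over shuffled copies of the source series. It is a simplified stand-in written for illustration, not the RTransferEntropy implementation: the function names, the natural logarithm, the default of 100 shuffles, and the reading of the sums as running over observed state combinations are all assumptions.

```python
import math
import random
from collections import Counter


def renyi_te(source, target, q=0.1):
    """Plug-in Rényi TE from `source` (J) to `target` (I) with k = l = 1."""
    # Each triple is (i_{t+1}, i_t, j_t).
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    c_full = Counter(triples)
    c_next_cur = Counter((nxt, cur) for nxt, cur, _ in triples)
    c_cur_src = Counter((cur, src) for _, cur, src in triples)
    c_cur = Counter(cur for _, cur, _ in triples)
    # Normalizers of the escort distributions phi_q(i_t) and phi_q(i_t, j_t).
    norm_cur = sum((c / n) ** q for c in c_cur.values())
    norm_cur_src = sum((c / n) ** q for c in c_cur_src.values())
    numerator = sum(
        (c_cur[cur] / n) ** q / norm_cur * (c / c_cur[cur]) ** q
        for (nxt, cur), c in c_next_cur.items()
    )
    denominator = sum(
        (c_cur_src[(cur, src)] / n) ** q / norm_cur_src
        * (c / c_cur_src[(cur, src)]) ** q
        for (nxt, cur, src), c in c_full.items()
    )
    return math.log(numerator / denominator) / (1.0 - q)


def effective_te(source, target, q=0.1, n_shuffles=100, seed=0):
    """Base TE minus the mean TE obtained from shuffled copies of `source`."""
    rng = random.Random(seed)
    shuffled = list(source)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        total += renyi_te(shuffled, target, q)
    return renyi_te(source, target, q) - total / n_shuffles


# Hypothetical discretized series in which the target copies the source
# with a one-step delay.
rng = random.Random(1)
mobility = [rng.choice([1, 2, 3]) for _ in range(60)]
composition = [1] + mobility[:-1]
print(renyi_te(mobility, composition), effective_te(mobility, composition))
```

Shuffling destroys the temporal ordering of J while preserving its marginal distribution, so whatever TE survives the shuffle reflects finite-sample effects rather than directed information flow.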

The p-value and standard error for the TE calculation are computed by the RTransferEntropy package in the following manner: for a $(J, I)$ time series pair, a set of 2000 pairs is sampled using a Markov block bootstrapping technique (Dimpfl and Peter, 2013) with a burn parameter of 50, under the assumption that both J and I are Markov chains of order 1. In contrast to the shuffling technique used to compute the effective TE, bootstrapping preserves the dependencies within each of the variables J and I, but eliminates the statistical dependencies between them. The TE is then calculated for each bootstrapped pair, and the TE of the original $(J, I)$ pair is compared with the resulting distribution of TE values, from which the p-value and standard error are derived.
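The logic of this test can be sketched in Python as follows. The sketch replaces the Markov block bootstrap of Dimpfl and Peter (2013) by a simpler resampling scheme, namely simulating a first-order Markov chain from the empirical transitions of a single series; the function names, the strict inequality in the p-value, and the use of the sample standard deviation as standard error are assumptions made for illustration.

```python
import random
import statistics


def simulate_markov_chain(series, length, rng, burn=50):
    """Draw a first-order Markov chain from the empirical transitions of `series`."""
    successors = {}
    for current, following in zip(series[:-1], series[1:]):
        successors.setdefault(current, []).append(following)
    state = rng.choice(series)
    path = []
    for step in range(burn + length):
        # Restart from a random observed state if `state` has no recorded successor.
        options = successors.get(state)
        state = rng.choice(options) if options else rng.choice(series)
        if step >= burn:
            path.append(state)
    return path


def p_value_and_se(observed_te, null_te_values):
    """One-sided empirical p-value and standard error from null TE values."""
    exceed = sum(1 for value in null_te_values if value > observed_te)
    return exceed / len(null_te_values), statistics.stdev(null_te_values)


# Hypothetical discretized series and hypothetical null TE values.
rng = random.Random(0)
print(simulate_markov_chain([1, 2, 2, 3, 1, 3, 3, 2, 1, 1], length=10, rng=rng))
print(p_value_and_se(0.5, [0.1, 0.2, 0.6, 0.7]))
```

Simulating J and I separately in this way retains the serial dependence within each series while removing any dependence between them, which is the null hypothesis against which the observed TE is compared.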

3. Results

3.1. Geospatially differentiated mutational signatures

As the pandemic progresses, macrolevel dynamics vary with geographical location, so that distinct selective pressures are exerted. We provide evidence that these pressures modulate the mutational landscape and lead to distinct geospatially separated mutational signatures within the viral population, under the assumption that the isolates investigated form an adequate representation of the distribution of viral genomes across the pandemic.

To this end we first investigate the relative frequency of G23404 in the viral population in different locations and monitor its relative frequency changes over time. The relative frequency is computed within a 14-day sliding window. For such a fixed time window and location, we consider all sequences that are collected at that location and we compute the relative frequency of G23404 within that data set.
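The windowed frequency described above can be sketched as follows. The data layout (a list of collection dates paired with a flag indicating whether the sequence carries G23404) and the convention that the window covers 14 calendar days ending on the reported day are assumptions made for this illustration; the sample data are invented.

```python
from datetime import date, timedelta


def window_frequency(samples, window_end, window_days=14):
    """Relative frequency of the variant among sequences in the window.

    `samples` is a list of (collection_date, carries_variant) pairs for one
    location; returns (frequency, count) or None when the window is empty.
    """
    window_start = window_end - timedelta(days=window_days - 1)
    flags = [flag for day, flag in samples if window_start <= day <= window_end]
    if not flags:
        return None
    return sum(flags) / len(flags), len(flags)


# Toy data, invented for illustration only.
samples = [
    (date(2020, 3, 1), False),
    (date(2020, 3, 5), True),
    (date(2020, 3, 10), True),
    (date(2020, 3, 20), True),
]
print(window_frequency(samples, window_end=date(2020, 3, 14)))
```

Sliding `window_end` forward one day at a time yields the time series of relative frequencies, and the returned count is the sample size needed for a confidence interval.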

In the United States, infections were initiated and established across the country by SARS-CoV-2 genomes exhibiting the original A23404 form. In early March, the G23404 variant was introduced to the U.S. territories, and by the end of March it accounted for >60% of the viral population, see Figure 1. Since then, its pace has slowed considerably: it took G23404 another month to grow from 60% to 80%. We also observe a slight decrease in G23404's relative frequency around the beginning of May. This observation is counterintuitive: assuming G23404 is more advantageous, it should increase its prevalence at an increasing rate as it begins to dominate the viral population instead of slowing down (Forst et al., 1995). One potential explanation of this phenomenon is the imposition of social distancing and travel restrictions: in the course of March, various policies to increase social distance were established. These policies greatly reduced host population mobility and thus slowed down the viral population dynamics.

FIG. 1.

Relative frequency of G23404 in the United States: the y-axis represents the G23404 frequency in the SARS-CoV-2 data collected in the United States within a 14-day time window. The x-axis indicates the last day of the time window. The shaded area denotes a 95% Wilson binomial CI. CI, confidence interval; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
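For reference, a Wilson binomial CI of the kind shown as the shaded area can be computed as in the following sketch. The function name and the example counts are ours; the formula is the standard Wilson score interval.

```python
import math


def wilson_interval(successes, total, z=1.959964):
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    denom = 1.0 + z * z / total
    center = (p_hat + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(
        p_hat * (1.0 - p_hat) / total + z * z / (4.0 * total * total)
    ) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical window: 12 of 20 sequences carry G23404.
print(wilson_interval(12, 20))
```

Unlike the normal-approximation interval, the Wilson interval stays within [0, 1] and remains informative when the window contains few sequences or when the observed frequency is close to 0 or 1.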

If we increase the resolution to the state level, we observe the aforementioned distinct mutational signatures, see Figure 2. The observed patterns reflect the impact of the sociobehavioral dynamics on the mutational composition within the viral population. For instance, the frequency of G23404 for WA shows a similar trend to that of the U.S. data: A23404 dominates, then G23404 takes over the viral population. In NY, however, the frequency is distinctly different. At the beginning of March, G23404 already accounts for 80% of the NY viral population. This could be a result of the founder effect (Mayr, 1999), as the initial infections in NY were related to European travel, and in Europe G23404 was already dominant at that time. It took G23404 almost 2 months to entirely take over the NY viral population. Given that G23404 grew from 60% to 80% within 1 month in the U.S. data, its growth in the NY data was significantly slower. CA exhibits an even more interesting phenomenon: from the beginning to the end of March, the relative frequency of G23404 went from 0% to 60%, similar to the national trend. Then, the frequency decreased within the next 14 days to <20%, after which it increased again. This could suggest that the viral population responds rapidly to distinct macrolevel dynamics. We speculate that, for specific sociobehaviorally induced pressures, A23404 is more advantageous compared with G23404.

FIG. 2.

The relative frequency of G23404 in WA (green), NY (blue), and CA (red): the y-axis represents the G23404 frequency in the SARS-CoV-2 sequences collected within the 14-day window. The x-axis indicates the last day of the time window. The shaded area denotes a 95% Wilson binomial CI.

3.2. Quantifying the effects of policies on the viral population

To quantify the impact of sociobehavioral patterns on the composition of the viral population, we employ TE measurements between sociobehavioral and mutational composition data for CA and NY, respectively. TE is a nonparametric statistical measure that quantifies the amount of directed (time-asymmetric) transfer of information between two random processes (Schreiber, 2000). For this purpose we use Google mobility-derived data sets (Aktay et al., 2020), in the form of a day-to-day time series of workplace mobility scores for CA and NY, respectively. We compute the directed Rényi TE ($q = 0.1$) as well as the effective Rényi TE from the time series of mobility-derived data to the series of mutational compositions of each state. Our findings are displayed in Table 1. Assuming that the two time series are Markovian, we can report a positive, statistically significant information flow from the mobility-derived data to the mutational composition data. The significance of the TE calculation, captured by the p-value, is assessed against the null hypothesis that the two time series are given by independent Markov processes. The p-value represents the probability of observing, under this independence assumption, a more extreme TE value than the one computed.

Table 1.

Macro to Micro Causal Flow for CA and NY

State	TE	Effective TE	SE	p	Sig
CA	0.5305	0.1456	0.1493	0.0910	*
NY	0.5526	0.1079	0.1358	0.0170	**

From left to right: columns represent the states considered, the macro to micro TE values, the effective macro to micro TE, the SE, and the p-values. The last column indicates the TE significance: “**” for $p < 0.05$, “*” for $p < 0.1$.

SE, standard error; TE, transfer entropy.

Our TE analysis indicates that the observed viral populations of CA and NY are causally linked to the respective state policies that govern social contact and workplace mobility in each of the two states.

3.3. Interactions of A23404G with other mutations

In this study, we address the complexity of mutational interactions by studying mutations co-occurring with A23404G. We observe that A23404G is often accompanied by three other mutations: C241T, C3037T, and C14408T (Yin, 2020). C241T is a mutation in the 5′-untranslated region (UTR) of the SARS-CoV-2 genome. The mutation lies in the hairpin loop close to the 3′-end tail of the 5′-UTR sequence and does not change the secondary structure of the 5′-UTR. This suggests that the mutation could instead impact target binding efficacy. C3037T is a synonymous mutation changing the codon TTC to TTT, both of which code for the same amino acid (F) in the nsp3 protein. C14408T is a nonsynonymous mutation changing the codon CCT to CTT, leading to an amino acid change from P to L in the RdRp protein (Yang and Leibowitz, 2015; Astuti and Ysrafil, 2020; Kim et al., 2020).
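The classification of the two coding mutations can be checked against the standard genetic code, as in the short Python sketch below. Only the four codons mentioned above are tabulated, and the function name is ours.

```python
# Partial codon table (standard genetic code): only the codons named in the text.
CODON_TO_AMINO_ACID = {"TTC": "F", "TTT": "F", "CCT": "P", "CTT": "L"}


def classify_substitution(ref_codon, alt_codon):
    """Return (reference amino acid, mutant amino acid, mutation type)."""
    ref_aa = CODON_TO_AMINO_ACID[ref_codon]
    alt_aa = CODON_TO_AMINO_ACID[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return ref_aa, alt_aa, kind


print("C3037T:", classify_substitution("TTC", "TTT"))
print("C14408T:", classify_substitution("CCT", "CTT"))
```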

The earliest A23404G mutation in Europe was identified in Germany (EPI_ISL_406862, sampled January 28, 2020), and was accompanied by C241T and C3037T, but not by the mutation at position 14408. Among European sequences with specific collection dates, C14408T was not observed until February 20, 2020. During this time period, C241T, C3037T, and A23404G always co-occurred. In the viral population, the A23404 and G23404 forms were cocirculating, with A23404 being more common, while the relative frequency of G23404 oscillated around 20%, see Figure 3. One sequence sampled on February 20, 2020 in Italy (EPI_ISL_412973) was the earliest data point in Europe that exhibited all four mutations. After that, the frequency of all four mutations grew abruptly to >60% in the span of a week. This observation suggests that the increase in fitness might not depend solely on the mutation A23404G and, instead, might be caused by the presence of interactions within a mutational clade. We suspect that these sites, taken together, are involved in certain biochemical functions that constrain them to coevolve.

FIG. 3.

The relative frequency of T241 (blue), T3037 (yellow), T14408 (green), and of G23404 (red) in Europe: the y-axis represents the relative frequency of the mutation in the SARS-CoV-2 sequences collected within the 14-day window. The x-axis indicates the last day of the time window.

3.4. The mutation G29540A

We illustrate the high degree of adaptability this virus possesses by highlighting the noncoding mutation G29540A, occurring within the segment between the coding gene of the N protein and the ORF10 gene (Ryder, 2020). This noncoding segment is 24 nts long and is believed to regulate viral ncRNA production (Kim et al., 2020). Although this mutation does not yet dominate the viral population, this viral strain is mostly localized within NY, with >95% of the total variant strain being present there. As New York exhibits a comparatively large viral population with respect to the rest of the United States, this further supports the notion that sociobehavioral measures affect the viral population.

4. Discussion

Our analysis provides evidence that studying geospatially resolved SARS-CoV-2 populations allows one to assess how sociobehavioral patterns affect the virus, that is, to what extent social distancing measures represent an effective tool to combat the pandemic.

The evolution of SARS-CoV-2 populations shows that, although maintaining a high degree of genetic robustness, the virus is nevertheless highly adaptable in both coding and noncoding regions.

We illustrated the existence of complex interactions of mutations and provided an instance of mutations emerging in viral noncoding segments, all of which pose challenging problems for an analysis of what governs viral evolution. We believe that many more mutational signals within the viral population will be identified once adequate analysis tools are put in place. The identification of these signals will be instrumental in understanding how SARS-CoV-2 evolves.

Unfortunately, data on the SARS-CoV-2 genomic populations and sociobehavioral patterns are too sparse and potentially biased to allow for definitive conclusions. However, given the available data, the distinct evolutionary patterns exhibited across different geographical locations and timescales suggest the existence of macro–micro feedback loops within the dynamics of the COVID-19 pandemic. Differences in environment provide distinctive selection pressures, leading to mutations that form nonlinear interaction networks in the viral population.

Macroscale parameters of the COVID-19 pandemic, such as number of infections, deaths, and human encounters, are closely monitored by researchers and policy makers alike and are naturally representable by means of time series. This representation equally applies to the mutational composition of viral populations. Accordingly, any feedback loops manifest as relational patterns between two such time series.

Such patterns are not a result of instantaneous interaction, since there is a time delay between infection and viral genome collection. Additional difficulty arises from the fact that for each infection-collection event pair, the time delay actually varies. As a result, the relational patterns are not one to one. The mutational composition of the viral population in a single day is affected by the sociobehavioral dynamics of multiple previous days. In light of this, standard correlation methods might not be adequate in identifying such patterns. By construction, the two time series exhibit nonlinearity, a feature that is difficult to address through conventional correlation methods.

Our TE analysis suggests that the viral populations of CA and NY are causally linked to their respective state policies, which in turn govern social contact and workplace mobility. The TE analysis produces significant p-values; however, we remark that the quality of the analysis is affected by sparseness of data, sampling biases, and the assumption that the underlying processes are Markov-like. In particular, the number of collected sequences is relatively low while the collection time interval (January 2020–May 2020) is short. Such deficiencies have been shown to render statistical inferences, as for instance the identification of phylogenetic networks, unreliable (Mavian et al., 2020). The fact that the number of sequences restricted to a specific geographical location is limited necessitates a certain aggregation of data over multiple days. Such aggregates have to be carefully designed as they could, at some point, become incompatible with the feedback loops we wish to identify. To better combat the pandemic, further study of this information-theoretic framework is called for. A balance between the temporal and geographical resolutions has to be struck to be able to conduct a meaningful TE analysis.

In an upcoming study, we aim to remedy the lack of data on COVID-19 by employing large-scale computer simulations. Synthetic agents, populations, and networks provide a natural data structure that facilitates forecasting, planning, and intervention modeling in complex social systems. Synthetic populations have been successfully applied in infrastructure modeling, computational epidemiology, and disaster response (Eubank et al., 2004; Parikh et al., 2013; Marathe et al., 2014). By incorporating strain competition in viral infection, the synthetic populations can provide us with simulated time series of macrolevel sociobehavioral dynamics and microlevel viral population compositions. This would enable a more comprehensive analysis of the macro–micro feedback loop.

In our analysis, the embedding dimensions of both the macro- and micro time series are set to one. This is arguably not optimal for identifying the macro–micro relational pattern, as the mutational composition of the viral population in a single day is affected by the sociobehavioral dynamics of multiple days. In the presence of sufficient data, however, one can apply methods introduced in Ragwitz and Kantz (2002) to determine the optimal embedding dimensions. The TE calculation can then be used to study the time delay between the coupled processes. In fact, the optimal time delay can be estimated using a scanning approach (Wibral et al., 2013).

Our TE analysis focuses on the impact the macrolevel sociobehavioral measures have on the microlevel mutational composition of the viral population. Similar analysis can be conducted to investigate the reverse: the information flow from the microscale to the macroscale. The combination of both analyses allows one to discern the full picture of the complex macro–micro coupling, providing deeper insights and aiding in combating the COVID-19 pandemic.

Acknowledgments

We gratefully acknowledge the discussions with Stephen Eubank, Henning Mortveit, Ricky Chen, and Neelav Dutta.

Author Disclosure Statement

The authors declare they have no conflicting financial interests.

Funding Information

The authors received no financial support for the research, authorship, and/or publication of this article.

Supplementary Material

References

Aktay, Bavadekar, Cossoul, et al. 2020. Google COVID-19 community mobility reports: Anonymization process description (version 1.0). arXiv preprint arXiv:2004.04145.

Andersen, K.G., Rambaut, Lipkin, W.I., et al. 2020. The proximal origin of SARS-CoV-2. Nat. Med. 26, 450–452.

Astuti and Ysrafil. 2020. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2): An overview of viral structure and host response. Diabetes Metab. Syndr. 14, 407–412.

Beck and Schlögl. 1995. Thermodynamics of Chaotic Systems: An Introduction. Number 4. Cambridge University Press, New York, NY.

Behrendt, Dimpfl, Peter, F.J., et al. 2019. RTransferEntropy: Quantifying information flow between different time series using effective transfer entropy. SoftwareX, 10, 100265.

Dimpfl and Peter, F.J. 2013. Using transfer entropy to measure information flows between financial markets. Stud. Nonlinear Dyn. Econ. 17, 85–102.

Elbe and Buckland-Merrett. 2017. Data, disease and diplomacy: GISAID's innovative contribution to global health. Global Challenges, 11, 33–46.

Epstein, J.M., Goedecke, D.M., Yu, et al. 2007. Controlling pandemic flu: The value of international air travel restrictions. PLoS One, 2, e401.

Eubank, Guclu, Kumar, V.A., et al. 2004. Modelling disease outbreaks in realistic urban social networks. Nature, 429, 180–184.

Forst, C.V., Reidys, and Weber. 1995. Evolutionary dynamics and optimization. European Conference on Artificial Life. Springer, 128–147.

Ghebreyesus, T.A. 2020. WHO Director-General's opening remarks at the media briefing on COVID-19 - 11 March 2020. World Health Organization, 11.

Hatcher, E.L., Zhdanov, S.A., Bao, et al. 2017. Virus variation resource–improved response to emergent viral outbreaks. Nucleic Acids Res. 45(D1), D482–D490.

Hui, Azhar, E.I., Madani, et al. 2020. The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health: The latest 2019 novel coronavirus outbreak in Wuhan, China. Int. J. Infect. Dis. 91, 264–266.

Jia, Shen, Zhang, et al. 2020. Analysis of the mutation dynamics of SARS-CoV-2 reveals the spread history and emergence of RBD mutant with lower ACE2 binding affinity. bioRxiv; DOI: 10.1101/2020.04.09.034942.

Jizba, Kleinert, and Shefaat. 2012. Rényi's information transfer between financial time series. Physica A, 391, 2971–2989.

Kim, Lee, J.Y., Yang, J.S., et al. 2020. The architecture of SARS-CoV-2 transcriptome. Cell 181, 914.e10–921.e10.

Korber, Fischer, Gnanakaran, S.G., et al. 2020. Spike mutation pipeline reveals the emergence of a more transmissible form of SARS-CoV-2. bioRxiv; DOI: 10.1101/2020.04.29.069054.

Lauer, S.A., Grantz, K.H., Bi, et al. 2020. The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: Estimation and application. Ann. Intern. Med. 172, 577–582.

Li, Guan, Wu, et al. 2020. Early transmission dynamics in Wuhan, China, of novel coronavirus–infected pneumonia. N. Engl. J. Med. 382, 1199–1207.

Lu, Zhao, Li, et al. 2020. Genomic characterisation and epidemiology of 2019 novel coronavirus: Implications for virus origins and receptor binding. Lancet, 395, 565–574.

Lv, Luo, Estill, et al. 2020. Coronavirus disease (COVID-19): A scoping review. Eurosurveillance, 25, 2000125.

Marathe, M.V., Mortveit, H.S., Parikh, et al. 2014. Prescriptive analytics using synthetic information, 1–19. In Hsu, W.H. (ed): Emerging Methods in Predictive Analytics: Risk Management and Decision-Making. IGI Global, Hershey, Pennsylvania, USA.

Mavian, Pond, S.K., Marini, et al. 2020. Sampling bias and incorrect rooting make phylogenetic network tracing of SARS-CoV-2 infections unreliable. Proc. Natl. Acad. Sci. USA, 117, 12522–12523.

Mayr. 1999. Systematics and the Origin of Species, from the Viewpoint of a Zoologist. Harvard University Press, Cambridge, MA.

Parikh, Youssef, Swarup, and Eubank. 2013. Modeling the effect of transient populations on epidemics in Washington DC. Sci. Rep. 3, 1–10.

Ragwitz and Kantz. 2002. Markov models from data by simple nonlinear time series predictors in delay embedding spaces. Phys. Rev. E. 65, 056201.

Ryder, S.P. 2020. Analysis of rapidly emerging variants in structured regions of the SARS-CoV-2 genome. bioRxiv; DOI: 10.1101/2020.05.27.120105.

Schreiber. 2000. Measuring information transfer. Phys. Rev. Lett. 85, 461–464.

Shu and McCauley. 2017. GISAID: Global initiative on sharing all influenza data–from vision to reality. Eurosurveillance, 22, 30494.

Song, Gui, Wang, et al. 2018. Cryo-EM structure of the SARS coronavirus spike glycoprotein in complex with its host cell receptor ACE2. PLoS Pathog. 14, e1007236.

Tu, Tu, Gao, et al. 2020. Current epidemiological and clinical features of COVID-19; a global perspective from China. J. Infect. 81, 1–9.

Wibral, Pampu, Priesemann, et al. 2013. Measuring information-transfer delays. PLoS One, 8, e55809.

Xie and Chen. 2020. Insight into 2019 novel coronavirus: An updated interim review and lessons from SARS-CoV and MERS-CoV. Int. J. Infect. Dis. 94, 119–124.

Xu, Chen, Wang, et al. 2020. Evolution of the novel coronavirus from the ongoing Wuhan outbreak and modeling of its spike protein for risk of human transmission. Sci. China Life Sci. 63, 457–460.

Yang and Leibowitz, J.L. 2015. The structure and functions of coronavirus genomic 3′ and 5′ ends. Virus Res. 206, 120–133.

Yin. 2020. Genotyping coronavirus SARS-CoV-2: Methods and implications. Genomics, 112, 3588–3596.

Zhou, Yang, X.L., Wang, X.G., et al. 2020. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature, 579, 270–273.
