Abstract
The COVID-19 Genomics UK (COG-UK) Consortium was created to deliver large-scale and rapid whole-genome virus sequencing. Its data will help public health agencies to manage the COVID-19 pandemic in the UK and inform vaccine research efforts. From a wider perspective, data sharing around the genomics of viruses (and bacteria, protozoan parasites and fungi) offers researchers contrasting perspectives and new biological insights into the evolution of microbes, and supports the development of new tools for managing infectious disease, which causes significant mortality on a global scale. Genomics has an evolving role in pathogen diagnosis and surveillance.
Clinical case scenario
An international outbreak of Salmonella enteritidis was linked to restaurants around Europe and a hospital in central England. The infective strain was characterised by Whole Genome Sequencing. Using epidemiology and environmental health data with the sequence data, it was determined that the source was a German egg producer. Public health control measures were implemented to ensure appropriate processing of the contaminated eggs.
Background
Whole Genome Sequencing (WGS) is being implemented by Public Health England (PHE) and as part of the Modernising Medical Microbiology Movement (2020). The potential value of WGS in managing infectious disease includes increased precision, sensitivity and speed. With the cost of WGS falling, there is an opportunity for other cost-effective benefits. The use of WGS to identify and track emerging infections is, for example, becoming a more significant benefit (phgFoundation, 2014).
In the 100 000 Genomes Project, WGS has a focus on infectious diseases, including the study of Human Immunodeficiency Virus (HIV), Tuberculosis (TB) and Hepatitis C (HCV). WGS is providing rapid insights into potential, existing and novel treatment targets. Treatments are likely to develop from the understanding of genetic pathways gained through genome sequencing studies, particularly in HCV and HIV, which offer researchers new targets for anti-viral therapies. HIV and HCV sequencing data will also offer information on current and evolving resistance to anti-viral therapies.
The aspiration is that advanced genomics sequencing technology can assist clinical practice addressing key issues such as those listed below:
Of importance will be information sharing, which includes populating databases linking epidemiology with genome data. Leading health organisations, such as the World Health Organisation (WHO), the European Centre for Disease Prevention and Control and PHE, provide crucial parallel activities of oversight and monitoring, contact tracing and expert advice to policy makers to inform the development of public health programmes. Other examples of supporting infrastructure are the CEPI initiative (https://cepi.net/ accessed May 2020), co-ordinating COVID-19 vaccine development globally, and the ARTIC project (https://artic.network/ncov-2019 accessed May 2020), aiming to support rapid diagnostic testing in the field.
Infectious disease surveillance
Disease surveillance is an ongoing process that involves the systematic collection, analysis, interpretation, and dissemination of information regarding the occurrence of diseases in defined populations for public health action to reduce morbidity and mortality (Osterholm and Hedberg, 2015). The goals of infectious disease surveillance are threefold:
To describe the current burden and epidemiology of disease, helping to guide vaccine development and inform recommendations such as infection control or drug administration
To monitor trends such as treatment resistance and response to vaccine programmes
To identify outbreaks and new pathogens (Murray and Cohen, 2017).
In order to understand disease surveillance, there must be an understanding of how pathogens generate diversity and spread; these principles are discussed using the influenza virus as an exemplar.
There are three types of influenza: A, B and C. Influenza A virus (IAV) is the most virulent and is the pathogen responsible for most major epidemics or pandemics. Influenza B generally causes less severe illness in adults, but the severity may be similar in children to that caused by IAV. Influenza C causes mild illness, a ‘common cold’, or may even be asymptomatic. Up to 650 000 deaths annually are associated with respiratory diseases from seasonal influenza (Burch et al., 2009), but this is the tip of the iceberg; disease burden also comprises impact on families and communities, as well as the economic impact on healthcare services and days lost from work.
In influenza, two important components are the major surface glycoproteins haemagglutinin (HA) and neuraminidase (NA), on which sub-typing is based (e.g. H1N1). HA binds to sialic acid (SA), which acts as a receptor on the host cell plasma membrane, mediating attachment and entry of the virus to host cells. HA is also a target for neutralising antibodies. NA has sialidase enzymatic activity and is involved in the final step of the replication cycle, leading to release of mature virions. HA and NA are significant in the infectious cycle of the virus and are targets for anti-viral medications and vaccine development.
Swine, avian and human viruses generally have HA receptor-binding specificity for particular SA: swine can become infected with human and avian viruses, and humans can become infected with avian and swine viruses. The existence of swine and avian reservoirs is fundamental to the evolution and generation of diversity of influenza viruses, and to the emergence of zoonotic strains with potential to cause pandemics. Two factors, antigenic drift and antigenic shift, are the main mechanisms of generation of genetic diversity and are crucial in determining the potential for the virus to cause epidemics and pandemics (Treanor, 2004).
Antigenic drift
Antigenic drift refers to the accumulation of point mutations in the virus genome and occurs as a result of the lack of proof-reading capability of RNA polymerase, which in turn means that RNA viruses mutate and drift more rapidly than DNA viruses or higher organisms. This leads to high genetic diversity within a viral population, helping the influenza virus to adapt rapidly to a new environment and to selection pressures such as host immune response and anti-viral agents. Viral sub-types with different variants can also gain survival advantage by exhibiting co-operation, for example, through the co-existence of one variant facilitating cell entry and another facilitating cell exit.
Antigenic shift
Antigenic shift can occur if different strains concurrently infect a cell. It describes what happens when genetic material from related but different strains of a virus recombines to create a new virus, with some genetic material from one of the original viruses and some from another. In this respect it has some parallels with sexual reproduction, although the material comes from related viral strains rather than from parents. Antigenic shift thus refers to the evolution of the virus through reassortment and recombination of larger lengths of genetic material, or gene exchange, and is enabled by the segmented nature of the viral genome (Treanor, 2004). As swine can be infected concurrently by both human and avian influenza viruses, they have been described as a ‘mixing vessel’ for antigenic shift to occur. Antigenic shift causes a rapid change in the virus's antigenicity, and potentially pathogenicity, if there is a lack of existing immunity in the human population. If a resultant reassortant virus is then transmissible between humans, an epidemic or pandemic can occur, as exemplified by the 2009 pandemic caused by virus strain H1N1. The current SARS-CoV-2 pandemic similarly illustrates the impact of a novel virus spreading through a population that lacks existing immunity. Following an epidemic or pandemic, the causal strain may continue to evolve and become responsible for seasonal influenza.
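To illustrate why a segmented genome makes reassortment such a rich source of diversity, the following minimal Python sketch enumerates the segment combinations that could in principle arise when two strains co-infect one cell. It assumes the eight genome segments of influenza A; the labels "human" and "avian" and the function name are illustrative only, and the sketch ignores the fact that many combinations would not produce a viable virus.

```python
from itertools import product

# The eight genome segments of influenza A virus.
SEGMENTS = ("PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS")


def reassortants(parent_a: str, parent_b: str):
    """Yield every possible segment combination from two co-infecting strains.

    Each segment is inherited from one parent or the other, so there are
    2 ** 8 combinations in total (hypothetical helper for illustration).
    """
    for choice in product((parent_a, parent_b), repeat=len(SEGMENTS)):
        yield dict(zip(SEGMENTS, choice))


all_combos = list(reassortants("human", "avian"))

# Exclude the two combinations that are simply the unchanged parental strains.
novel = [combo for combo in all_combos if len(set(combo.values())) > 1]

print(len(all_combos), len(novel))  # 256 254
```

Only a small fraction of these theoretical reassortants would be expected to replicate and transmit, which is one reason surveillance of circulating strains matters more than enumeration.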
Antigenic drift and shift necessitate development of influenza vaccine on at least an annual basis to tackle seasonal flu, and additional vaccines in response to epidemic or pandemic. In contrast to influenza, tracking of SARS-CoV-2 has so far demonstrated very little evolution within the viral genome as it does not undergo antigenic shift (Kolchinsky, 2020).
Phylogenetic trees
Comparison of viral genome sequences through WGS of samples from different patients is used to build a phylogenetic or ‘family’ tree (Mutreja et al., 2011). Phylogenetic trees show evolutionary relationships among species, sub-types or strains, based on genetic difference. The ability of WGS to detect variation between viral genomes down to the level of individual bases results in phylogenetic trees with much greater resolution, for example, in surveillance of IAV.
They are used as a tool in population level surveillance and infectious disease outbreaks in combination with epidemiological data and contact tracing to establish geographical and temporal origins and trends. They can be used to pinpoint infection outbreaks, infection ‘hotspots’ and also ‘superspreaders’ (circumstances where an individual infects significantly more people than expected). A key question is ‘Do we have an outbreak?’ i.e. is there evidence of transmission from one person to another? If it can be shown that the pathogen involved is different in one person than in the other (apart from minor differences that could have arisen through mutation in that time), then you can rule out transmission, and you do not have an outbreak to investigate and manage. This can be very powerful and WGS can be very helpful – as it has been when, for example, two people with some degree of contact have different strains of TB.
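The reasoning above, that sufficiently different genomes rule out recent transmission, can be sketched as a pairwise comparison of aligned sequences. The Python example below is a simplified illustration under stated assumptions: the sequences are toy data, already aligned to equal length, and the threshold of 2 differences is hypothetical (real thresholds depend on the organism, its mutation rate and the time between samples).

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned sequences differ.

    Positions where either sequence has an ambiguous or missing base
    (anything other than A, C, G, T) are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a in "ACGT" and b in "ACGT"
    )


def pairwise_distances(isolates: dict) -> dict:
    """Return the SNP distance for every pair of named isolates."""
    return {
        (x, y): snp_distance(isolates[x], isolates[y])
        for x, y in combinations(sorted(isolates), 2)
    }


def plausible_links(isolates: dict, threshold: int) -> list:
    """Pairs close enough that recent transmission cannot be ruled out."""
    return [
        pair
        for pair, distance in pairwise_distances(isolates).items()
        if distance <= threshold
    ]


# Toy aligned sequences (not real genomes).
isolates = {
    "patient_1": "ACGTACGTACGTACGT",
    "patient_2": "ACGTACGTACGTACGA",
    "patient_3": "TCGAACCTAGGTTCGT",
}

print(pairwise_distances(isolates))
print(plausible_links(isolates, threshold=2))
```

In this toy data, patients 1 and 2 differ at a single position and remain a plausible transmission pair, whereas patient 3 differs from both at several positions. A small distance is compatible with transmission but does not prove it; epidemiological data and contact tracing are still needed.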
Influences on viral populations that lead to genetic diversity include frequency of replication, mutation rates and the fitness of the mutant strain. Together with comparative genomics, these offer a window into understanding how viral populations may change.
Disease tracking and surveillance
At the population level, influenza monitoring and surveillance are used to predict and track outbreaks, epidemics and pandemics, and to characterise viral sub-types to inform vaccine development. In the case of influenza, both are embedded within a co-ordinated public health programme that includes vaccination of ‘at-risk’ groups.
The WHO has an influenza surveillance system, with an underlying rationale that surveillance and laboratory diagnosis are necessary to allow antigenic characterisation of emerging strains to guide vaccine development, to detect epidemics in order to implement public health control measures, to plan for impact of disease on services and to collect morbidity and mortality data (Fakiola et al., 2015). Appropriate surveillance is a pre-requisite to vaccine development. The 2009 H1N1 influenza A pandemic began with emergence of a new swine-origin IAV in Mexico and the United States, spreading to 30 countries within a few weeks. Retrospective WGS of samples and phylogenetic analysis demonstrated that it was derived from several viruses which had been circulating in swine for many years prior to the outbreak (Smith et al., 2009). This provided evidence to support pre-emptive monitoring of swine populations in predicting future pandemics.
Surveillance (Fig. 1) is important in detecting global pathogens such as Vibrio cholerae (Moore et al., 2014), including emerging ones such as Ebola, Zika and of course coronavirus. Surveillance assists in identifying emerging pathogens, whether they are human-to-human or zoonotic in origin. Active surveillance is exemplified by longitudinal prospective observational studies that use whole-genome sequencing to detect and establish linked sources of infection following a cluster of cases, such as for influenza, Legionella pneumophila and HCV. Mandatory surveillance by PHE includes that for Clostridium difficile, and surveillance is particularly important for pathogens with the potential to cause national and global outbreaks, such as influenza, e.g. the national surveillance by the RCGP Research and Surveillance Centre for influenza.
Microbial surveillance takes different forms.
There are many examples of the use of WGS data to show whether clusters of cases are in fact outbreaks (i.e. person-to-person transmission has occurred), or if there is an alternative explanation. For example, some ward ‘outbreaks’ of C. difficile have been shown to be a result of the same antibiotic pressure allowing overgrowth, rather than transmission due to poor hygiene. Another example is the tracking of Pseudomonas infections to a cleaning solution marketed for ear piercings; WGS of the organisms in infected individuals proved conclusively that they had been infected by organisms within a barrel used to produce the cleaning solution. This enabled rapid tracking, successful removal of the inadequately manufactured product and, in fact, prosecution (Evans et al., 2018).
Cholera outbreaks highlight the importance of public health infectious disease monitoring and surveillance in tracking outbreaks and guiding implementation of public health and infection control measures particularly to prevent further devastation to communities that may have fragile infrastructure.
In the identification of V. cholerae O1, which causes diarrhoea and dehydration, WGS may be applied to the study of outbreaks and epidemics in the context of local infrastructure disasters such as flooding, which may lead to contaminated water and food. The importance of high-quality global genomics databases cannot be stressed enough. The value that WGS offers is the discrimination between isolates.
A good example of where WGS may have an impact in prevention is healthcare-associated MRSA bacteraemia. WGS identified the involvement of a healthcare worker in the transmission chain on the Special Care Baby Unit (SCBU). This was made possible by the discriminatory power of WGS and enabled implementation of infection control measures to control the outbreak (Harris et al., 2013).
For complex studies, combinations of investigative techniques have been used to characterise the strains involved in an outbreak, as in the case of a US multistate listeriosis outbreak. Retrospective use of rapid WGS in an outbreak of Legionnaires' disease demonstrated that it was possible to differentiate between outbreak and non-outbreak strains (Reuter et al., 2016).
Another value of WGS is the ability to provide data that support treatment decisions and the investigation of outbreaks for organisms such as Chlamydia trachomatis that are sometimes difficult to culture. In a similar vein, WGS has been used in the investigation of a suspected TB outbreak. The challenge for outbreak analysis is that Mycobacterium tuberculosis is slow growing, but WGS has been shown to be an accurate method of typing M. tuberculosis in an outbreak (Gardy et al., 2011). The UK National Enhanced TB Surveillance system records typing data for M. tuberculosis (UK Government, 2015). The use of WGS in observing the transmission of Neisseria gonorrhoeae infections illustrates a new tool in the analysis of the spread of gonorrhoea, with the need for national co-ordination around case-finding for treatment purposes, especially in antibiotic-resistant gonorrhoea cases (De Silva et al., 2016).
Diagnosis and treatment
‘Phenotype testing’ is based on observable characteristics of the pathogen, such as protein expression, receptor-binding or motility and methods include serological methods and enzyme-linked immunosorbent assays.
Current pathogen detection methods by culture-based techniques and genotyping with identification of virulent strains assist with identification and management of microbial resistance and outbreak investigation (Fournier et al., 2014).
The primary focus of pathogen diagnosis should be quick, accurate and timely reporting with the potential for high throughput (e.g. methicillin-resistant Staphylococcus aureus (MRSA), Salmonella, Escherichia coli). The reporting must go through rigorous quality control assessment, including performance metrics. This is particularly true in the management of sepsis (where ideally answers should be available within hours).
The transition to genotype has developed at least since the 1980s when polymerase chain reaction (PCR) was developed. Such candidate gene-based molecular typing methods then led on to refinements (RT-PCR) which improved analytic and clinical utility. Laboratories have switched to reverse transcriptase polymerase chain reaction (RT-PCR, or just PCR) technologies, which use gene primers to look for DNA or RNA sequences. This is a technology that relates to WGS, in as much as it also looks at sequences of DNA or RNA, but it only looks for specific sequences, rather than describing the whole genome. It is quick and uses the primers and kit, which are relatively cheap compared with labour-intensive traditional microbiology.
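The idea that PCR looks only for specific sequences flanked by primers, rather than describing the whole genome, can be shown with a simplified in-silico sketch in Python. This is an illustration, not a laboratory method: it assumes a DNA (or cDNA) template, exact primer matches and toy sequences, and the function names are hypothetical. For RNA viruses, reverse transcription to cDNA would come first.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_amplicon(template: str, forward_primer: str, reverse_primer: str):
    """Return the region bounded by the two primers, or None if absent.

    The reverse primer anneals to the opposite strand, so on the template
    strand we search for its reverse complement downstream of the
    forward primer. Only exact matches are considered.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Toy template and primers (not real pathogen sequences).
template = "TTGACCATGGCTAGCTAGGATCCGATTACAGGTCAAT"
forward_primer = "ACCATGG"
reverse_primer = "CTGTAATC"

print(find_amplicon(template, forward_primer, reverse_primer))
# ACCATGGCTAGCTAGGATCCGATTACAG
print(find_amplicon(template, "GGGGGGG", reverse_primer))
# None
```

A positive result says only that the targeted region is present; everything outside the primers is invisible to the test, which is the key contrast with WGS.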
In the healthcare setting, diagnosis and assessment for anti-viral drugs needs to be rapid. Rapid influenza diagnostic tests (RIDTs) and RT-PCR may be used. RIDTs are antigen-based tests used at point of care and have high specificities, but low sensitivities. RT-PCR can be used directly on clinical samples giving rapid results and more in-depth genetic analysis in order to monitor evolution and degree of relatedness of viruses. It is currently regarded as the gold standard for influenza diagnosis (Sintchenko and Holmes, 2015), although its use is primarily in disease surveillance, with clinical diagnosis guiding the management of the individual patient.
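The terms sensitivity and specificity used above can be made concrete with a short Python sketch. The counts below are hypothetical, chosen only to illustrate a test with high specificity but low sensitivity; they are not taken from any study of RIDTs.

```python
def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard accuracy measures from a 2x2 table.

    tp/fn: infected patients testing positive/negative
    fp/tn: uninfected patients testing positive/negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # proportion of infected detected
        "specificity": tn / (tn + fp),  # proportion of uninfected cleared
        "ppv": tp / (tp + fp),          # chance a positive is a true case
        "npv": tn / (tn + fn),          # chance a negative is truly negative
    }


# Hypothetical counts: 100 infected and 200 uninfected patients.
result = diagnostic_accuracy(tp=60, fp=4, fn=40, tn=196)

for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

With these illustrative numbers a positive result is usually correct, but 40 of 100 infected patients are missed, which is why a negative point-of-care result does not exclude influenza.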
‘Panels’ of PCR-based tests are now commonly used in testing samples for e.g. respiratory and gastrointestinal pathogens. These have the advantages of speed and economy, but interpreting the results is not always straightforward. Distinguishing pathogenic from non-pathogenic strains of E. coli (and, indeed, from other bacteria that produce the same toxins) has proven challenging, for example.
COVID-19 testing is currently performed by qPCR, with assays based on data from the genome sequence. In this context PCR testing detects viral RNA. It does not show, for example, whether viable, potentially infectious virus is present.
Time from sample to result is influenced by many other factors including sample transport to a testing laboratory, report generation and upload. In COVID-19, WGS provides an example where speed has conferred a major advantage in identifying potential treatments. Rapid comparison of the COVID-19 genome with that of other coronaviruses had initially identified that the existing medications hydroxychloroquine and remdesivir were of potential benefit to those infected with COVID-19, leading to initiation of clinical trials.
In the setting of cultures of slow-growing pathogens such as M. tuberculosis, sequencing can provide rapid drug susceptibility information in the setting of multi-drug resistance and identify extensively drug-resistant (XDR) strains of TB. TB is one of the organisms for which WGS has proven most valuable. The data are applied to assessing evolution, transmission, typing and resistance analysis for M. tuberculosis (Kato-Maeda et al., 2013; UK Government, 2015). The use of WGS for this slow-growing organism has been validated in retrospective cohort studies and for prospective use, both for outbreak investigation and for typing to guide treatment by identifying resistant organisms.
Differing virulence and susceptibility
WGS may help us to understand how some microbial strains can cause more serious disease than others (virulence) and describe patterns of antibiotic resistance if we know the genes that encode it (Fig. 2). Through deeper coverage and speed of analysis, WGS offers genetic discrimination between isolates. The use of WGS to study within-host mutation rates provides insights into how microbes adapt and develop resistance mechanisms. The importance of genomic databases grows as data accrue.
Antimicrobial resistance mechanisms.
The UK Chief Medical Officer's report (UK Government, 2013) on antimicrobial resistance, and the position of antimicrobial resistance on the National Risk Register, stress the importance of antimicrobial stewardship, given that deaths from bacterial causes are a worldwide issue (Fig. 1). On a global scale there is a drive to tackle antimicrobial resistance through the implementation of the Global Action Plan (2020). The British Society for Antimicrobial Chemotherapy (BSAC) has projects that have a particular focus on antimicrobial susceptibility and resistance (BSAC, 2020). After identification of the pathogen, clone typing and antimicrobial resistance (AMR) are interlinked factors that determine likely response to treatment. Virulence can be altered by horizontal gene transfer, through plasmid transfer by conjugation, transposons and phage insertions, all of which can lead to drug resistance and to the production of toxins through adaptation.
One such example is through understanding the effect of mobile genetic elements on different hospital-adapted Enterococcus faecalis lineages in the response to vancomycin (sensitivity or resistance).
Another example is C. difficile (2020), which causes disease in individuals who have a predisposition (such as a change in gut flora due to antibiotic therapy) (UK Government 2014a). Genetically distinct isolates have been identified through WGS, together with an understanding of transmission patterns, which demonstrated that only one-third of infections were due to direct transmission from symptomatic cases. Rapid bench-top analysis is now possible for C. difficile.
In the case of coronavirus, the COVID-19 UK consortium aims to support research to identify host genomic factors that may influence features and severity of disease within an individual. It will also be important to help understand variability, such as within ethnic groups. The ACE2 receptor, ACE inhibitors and angiotensin II receptor blockers, ethnicity, obesity, co-morbidity and immunosuppression are just some of the factors being studied in the context of virulence.
Vaccine development
Influenza also demonstrates that an effective surveillance programme is fundamental for effective vaccine development. When there is high similarity between influenza viruses in the vaccine and circulating viruses, the vaccine effectiveness estimate is around 50–60%. The late emergence of an antigenic variant IAV (H3N2) in mid-2014 meant it could not be incorporated into the 2014–15 northern hemisphere vaccine. This resulted in lower estimated vaccine effectiveness of 23%, illustrating the importance of accurate prediction of circulating strains, and prompting a consultation in improving influenza vaccine virus selection.
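As a rough guide to what a vaccine effectiveness figure means, it can be expressed as the percentage reduction in risk among vaccinated people relative to unvaccinated people. The Python sketch below uses this simple relative-risk form with hypothetical cohort numbers chosen to reproduce a value of 23%; it is an illustration only, and published surveillance estimates are often derived differently (for example from odds ratios in test-negative study designs).

```python
def vaccine_effectiveness(
    cases_vaccinated: int,
    n_vaccinated: int,
    cases_unvaccinated: int,
    n_unvaccinated: int,
) -> float:
    """Vaccine effectiveness (%) as 1 minus the relative risk."""
    risk_vaccinated = cases_vaccinated / n_vaccinated
    risk_unvaccinated = cases_unvaccinated / n_unvaccinated
    return (1 - risk_vaccinated / risk_unvaccinated) * 100


# Hypothetical cohorts of 1000 people each (not real surveillance data).
poor_match = vaccine_effectiveness(77, 1000, 100, 1000)
good_match = vaccine_effectiveness(45, 1000, 100, 1000)

print(round(poor_match, 1))  # 23.0
print(round(good_match, 1))  # 55.0
```

On these assumed figures, a well-matched vaccine would prevent roughly half of the cases that would otherwise occur in vaccinated people, while a poorly matched one would prevent fewer than a quarter.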
Prevention in the form of administration of influenza vaccine to at-risk groups remains a cornerstone of influenza control. Vaccines for seasonal influenza are developed annually following recommended composition by the WHO as a result of surveillance data. There is an annual immunisation programme in the UK administered through primary care for those in at-risk groups, healthcare workers and carers.
Vaccines have historically been trivalent, including two influenza A strains and one influenza B strain, or quadrivalent, including two influenza A and two influenza B strains. Two types are available: inactivated (administered via injection) and live attenuated (usually administered nasally). The latter is indicated for children; it is not recommended in pregnancy, and pregnant women are offered the inactivated vaccine. A monovalent vaccine has previously been developed as an interim response to pandemic influenza.
Vaccines are traditionally developed and produced through viral propagation in eggs or cell cultures. Current vaccine technologies give a timeline of many months, necessitating a decision on the antigenic composition of the vaccine almost a year before peak influenza activity. The antigenic composition is revised twice annually in light of data from the WHO global influenza surveillance and response system, with recommendations issued in February and September.
WGS is a powerful tool potentially of value in influenza surveillance and vaccine development. WGS could provide sequence data more rapidly and in a more timely way, in particular at the end of the influenza season or in the event of a pandemic, hopefully shortening the timeline for deriving suitable egg isolates and reassortments for use as candidate vaccine viruses (Hampson et al., 2015).
Reverse vaccinology (and multigenome reverse vaccinology which incorporates serotypes) describes the process whereby using high throughput sequencing, genes encoded by bacterial pathogens that have a role in immunogenicity are tested for in pre-clinical studies to develop potential vaccines. It may help identify stable parts of the virus – antigens which are less subject to mutation, and which will thereby provide longer-lasting protection than the current vaccines, which rely on rapidly changing epitopes. It was also used to develop a vaccine against group B meningococcus. The usual polysaccharide antigen in group B cross-reacts with human tissue, and so was not antigenic (or risked causing autoimmune disease). ‘Reverse vaccinology’ was used to identify other proteins that might be antigenic and unique to meningococci. The process was extremely slow and expensive.
Vaccine development for coronavirus is happening at the time of writing with an anticipated shorter timescale for development partly as a result of speed of publication of the viral genome sequence.
Summary
Reducing turnaround time relies on laboratory infrastructure and accreditation, including trained staff and bioinformaticians with the necessary access to curated pathogen genomic databases. Nevertheless, we can expect to see genomics play an ever-expanding role in clinical management of infectious diseases such as influenza, MRSA, Salmonella and COVID-19, and in implementation of public health programmes which have direct impact and relevance to us in primary care.
KEY POINTS
This article provides an overview of the principles around pathogens and genomics using case studies in the context of diagnosis and monitoring of infectious disease
Genomic technologies play key roles in diagnosis, treatment, vaccine development and surveillance of common infectious diseases such as influenza, TB and MRSA
In addition, supporting infrastructures for knowledge-sharing, database development and disease surveillance are essential in combatting infectious disease outbreaks, epidemics and pandemics
Referring clinicians need timely reports, including antibiotic sensitivity, to enable real-time clinical utility
In general practice the annual vaccination programme around the influenza virus provides a real-time exemplar of the potential of infectious disease genomics
Since the middle of the 18th century five pandemics have occurred: 1918 ‘Spanish flu’ pandemic, causing an estimated 20 000 000-40 000 000 deaths worldwide, 1957 ‘Asian flu’ pandemic, 1968 ‘Hong Kong flu’ epidemic, and in 2009 IAV (H1N1) ‘swine flu’ pandemic
In 2019–20 we have witnessed the SARS-CoV-2 or COVID-19 global pandemic
