About COG-UK/Mutation Explorer
The COG-UK/Mutation Explorer (COG-UK/ME) provides information and structural context on mutations and associated variants in the genes encoding SARS-CoV-2 proteins that have been identified from sequence data generated by the COVID-19 Genomics UK (COG-UK) Consortium. We focus on SARS-CoV-2 spike gene mutations of potential or known importance based on epidemiological, clinical and/or experimental observations.
The Mutation Explorer comprises:
- the designated global variants of concern and their structural contexts
- high frequency individual amino acid replacements, a subset of which may be important
- a heatmap of antigenic mutations accumulating on top of the lineage-defining mutations of each VOC/VUI
- frequency plots for mutations at specific residues in SARS-CoV-2 ORFs (Mutation Visualiser)
- mutations of potential antigenic significance as indicated by experimental studies: shown to lead to weaker neutralisation of the virus by convalescent plasma from people who have been infected with SARS-CoV-2 and/or demonstrated escape from some monoclonal antibodies (mAbs) that may be given to patients with COVID-19 (Antigenic Information: Antibody Sites)
- mutations in T cell epitopes as indicated by experimental studies (Antigenic Information: T Cell Epitopes).
Data source and processing
The analysis described in this report is based on 651,228 UK-derived genomes after deduplication, sequenced by COG-UK: complete data in the MRC-CLIMB database to 02/08/2021, with the latest sequence from 30/07/2021.
A report of the geographic distribution and prevalence of SARS-CoV-2 lineages in general, and global variants of interest, can be found here. Amino acid replacement, insertion and deletion counts for all SARS-CoV-2 genes in the global GISAID database can be found here.
- This report is for information only. The clinical and public health importance of any single mutation, or combination of mutations, cannot be determined from sequence data alone.
- Putative evidence for the importance of any single mutation, or combination of mutations, can be derived from computational biology and further evaluated by laboratory experiments. Genomic and laboratory evidence then needs to be combined with clinical datasets that are designed to allow detection of increased transmissibility, change in disease severity, drug resistance or altered vaccine efficacy. For this reason, surveillance and risk assessment of mutations and variants is a multi-agency process involving UK Public Health Agencies, which have access to detailed information on patients and populations, and other groups including NERVTAG (New and Emerging Respiratory Virus Threats Advisory Group).
- COG-UK generates around 10,000 genomes a week, which will rise to 20,000 per week by March 2021. When COVID-19 infection rates are high, not all viruses from infected people will be sequenced and some mutations at low frequency will not be detected, but COG-UK aims to take representative samples from across the UK.
Mutations arise naturally in the SARS-CoV-2 genome as the virus replicates and circulates in the human population. As a result of this on-going process, many thousands of mutations have already arisen in the SARS-CoV-2 genome since the virus emerged in late 2019. As mutations continue to arise, novel combinations of mutations are increasingly observed. The vast majority of mutations have no apparent effect on the virus. Only a very small minority are likely to be important and change the virus in any appreciable way. This could include a change in the ability to infect/transmit between people; a change in disease severity; or a change in the way the virus interacts with the immune system (including the response generated by a vaccine). We pay most attention to mutations in the gene that encodes the Spike protein, which is associated with viral entry into cells and is relevant in the context of immunity and vaccine efficacy.
- Mutation is used to describe a change of a nucleotide in the virus RNA genome, a subset of which results in a change in amino acid (sometimes referred to as a substitution or replacement), or a mutation can refer to a deletion or insertion event in the virus genome. By convention an amino acid change is written N501Y to denote the wildtype (N, asparagine) and replacement amino acid (Y, tyrosine) at site 501 in the amino acid sequence.
- Viral variant refers to a genetically distinct virus with different mutations to other viruses. Variant can also refer to the founding virus of a cluster/lineage and be used to refer collectively to the resulting variants that form the lineage.
- Lineages are assigned by combining genetic data with epidemiological data, the latter being needed for SARS-CoV-2 because of weak phylogenetic signals. COG-UK uses the nomenclature system introduced by Rambaut et al. (2020), see https://cov-lineages.org.
- VUI is used by Public Health England to indicate Variant Under Investigation.
- VOC is used by Public Health England to indicate Variant of Concern.
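The amino acid notation described above (for example N501Y) can be split into its three parts programmatically. The following is a minimal sketch only: the names `Replacement` and `parse_replacement` are hypothetical and are not part of COG-UK/ME, and the pattern assumes a single-residue replacement written with the 20 standard one-letter codes (deletions and insertions are not handled).

```python
import re
from typing import NamedTuple


class Replacement(NamedTuple):
    """Hypothetical container for one amino acid replacement."""
    wildtype: str      # one-letter code of the wildtype residue, e.g. "N"
    site: int          # position in the amino acid sequence, e.g. 501
    replacement: str   # one-letter code of the replacement residue, e.g. "Y"


# The 20 standard amino acid one-letter codes.
_AA = "ACDEFGHIKLMNPQRSTVWY"
_PATTERN = re.compile(rf"^([{_AA}])(\d+)([{_AA}])$")


def parse_replacement(label: str) -> Replacement:
    """Parse a label such as 'N501Y' into wildtype, site and replacement."""
    match = _PATTERN.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a single amino acid replacement: {label!r}")
    wildtype, site, replacement = match.groups()
    return Replacement(wildtype, int(site), replacement)


if __name__ == "__main__":
    print(parse_replacement("N501Y"))
```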
Dashboard reports are not advice. They capture research findings which are always necessarily provisional. They are for research use only. Commercial use/resale is not permitted.
COG-UK/ME is developed within and funded by the COVID-19 Genomics UK Consortium by Derek W. Wright, Joseph Hughes, William Harvey, MacGregor Cox, Rachel Colquhoun, Ben Jackson, Andrew Rambaut, Thomas Peacock, David L. Robertson, Alessandro M. Carabelli. COG-UK/ME is based on the CLIMB framework, and maintained by the MRC-University of Glasgow Centre for Virus Research. Follow COG-UK to be notified of updates.
Variants of concern (VOC) and under investigation (VUI) and any other variant by weeks and days
Variant sequence counts are grouped either by week starting on Sunday or by day. The most recent sequence data (approx. the last two weeks) have low sample numbers, so are highlighted with a grey box for the last two weeks of the weekly chart or from the second-to-last Sunday onwards for the daily chart.
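As an illustration of the weekly grouping, the sketch below maps a sample date to the Sunday that starts its week and derives the point from which the daily chart would be shaded grey. The function names are hypothetical, and the cutoff is one reading of "second-to-last Sunday", namely the Sunday before the one that starts the week of the latest sequence.

```python
from datetime import date, timedelta


def week_start_sunday(sample_date: date) -> date:
    """Return the Sunday on or before sample_date."""
    # date.weekday() gives Monday=0 ... Sunday=6, so shift by one day
    # to count the days elapsed since the most recent Sunday.
    days_since_sunday = (sample_date.weekday() + 1) % 7
    return sample_date - timedelta(days=days_since_sunday)


def low_sample_cutoff(latest_date: date) -> date:
    """Assumed start of the grey box: the second-to-last Sunday."""
    return week_start_sunday(latest_date) - timedelta(days=7)


if __name__ == "__main__":
    latest = date(2021, 7, 30)  # latest sequence date quoted in this report
    print(week_start_sunday(latest), low_sample_cutoff(latest))
```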
Variants of concern (VOC) and under investigation (VUI) detected in the UK data
Download a CSV file, for each variant, containing COG-UK sequence name, sample date, epidemiological week, epidemiological week start date and global lineage. Cumulative UK sequences are filtered by the selected lineage of concern.
Spike protein structure (B.1.1.7)
Spike protein mutations (B.1.1.7)
Spike protein mutations (B.1.617.2)
Spike protein mutations (B.1.351)
Spike protein mutations (P.1)
Spike amino acid replacements detected in the UK data: frequency, nations and date of first detection
Individual amino acid replacements detected in at least 5 UK genome sequences are shown. Insertions, deletions and synonymous mutations are not included.
NB Number of genomes is not equal to number of COVID-19 cases as data have not been deduplicated.
Download a CSV file, for each amino acid replacement, comprising COG-UK sequence name, sample date, epidemiological week, epidemiological week start date and global lineage. UK sequences are filtered by a 28 day period up to and including the most recent UK sequence date.
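The 28 day filter can be pictured as a date window ending on the most recent UK sequence date. This sketch assumes the window counts the latest date as day 28, so it starts 27 days earlier; the function name is hypothetical and the dashboard may define the boundary differently.

```python
from datetime import date, timedelta


def in_recent_window(sample_date: date, latest_date: date, days: int = 28) -> bool:
    """True if sample_date falls in the `days`-day period ending on latest_date.

    Assumption: the period includes latest_date itself, so a 28 day
    window begins 27 days before it.
    """
    window_start = latest_date - timedelta(days=days - 1)
    return window_start <= sample_date <= latest_date


if __name__ == "__main__":
    latest = date(2021, 7, 30)
    for d in (date(2021, 7, 2), date(2021, 7, 3), date(2021, 7, 30)):
        print(d, in_recent_window(d, latest))
```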
Antigenic amino acid replacements in variants of concern (VOC) and variants under investigation (VUI) in addition to their defining mutations
Spike amino acid replacements reported to confer antigenic change relevant to antibodies, detected in the UK data
The table lists those mutations in the spike gene identified in the UK dataset that have been associated with weaker neutralisation of the virus by convalescent plasma from people who have been infected with SARS-CoV-2, and/or some mAbs that may be given to patients with COVID-19 (referred to below as "escape").
There is no evidence at the time of writing for this impacting on the efficacy of current vaccines or the immune response to natural SARS-CoV-2 infection.
- High: Antigenic role of mutation is supported by multiple studies including at least one that reports an effect observed with convalescent plasma (post-infection serum).
- Medium: Antigenic role of mutation is supported by multiple studies.
- Lower: Mutation is supported by a single study.
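The three confidence levels can be read as a simple rule on the number of supporting studies. The sketch below encodes that reading; the function and parameter names are hypothetical, and it assumes a mutation backed by a single study is "Lower" even when that study used convalescent plasma.

```python
def antigenic_confidence(n_studies: int, n_plasma_studies: int) -> str:
    """Assign High / Medium / Lower from counts of supporting studies.

    n_studies: all studies supporting an antigenic role for the mutation.
    n_plasma_studies: the subset reporting an effect with convalescent plasma.
    """
    if n_studies < 1 or not 0 <= n_plasma_studies <= n_studies:
        raise ValueError("need at least one study and a consistent plasma count")
    if n_studies >= 2 and n_plasma_studies >= 1:
        return "High"
    if n_studies >= 2:
        return "Medium"
    return "Lower"  # single study, whatever its type (assumption)


if __name__ == "__main__":
    print(antigenic_confidence(3, 1), antigenic_confidence(2, 0), antigenic_confidence(1, 1))
```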
Spike protein domain definitions
- SP, signal peptide (residues 1-13)
- NTD, N-terminal domain (14-303)
- RBD, receptor-binding domain (331-527) which includes the RBM, receptor-binding motif (437-508)
- FP, fusion peptide (815-834)
- Residues outside of these specific domains are labelled by subunit, S1 (residues 1-685) or S2 (residues 686-1173)
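The domain definitions above amount to a lookup from residue position to label. The following sketch uses the ranges exactly as listed; the function name is hypothetical, returning "RBM" rather than "RBD" for residues 437-508 is a choice made here for illustration, and no upper bound is enforced on S2.

```python
def spike_domain(position: int) -> str:
    """Return the domain or subunit label for a spike residue position."""
    if position < 1:
        raise ValueError("residue positions start at 1")
    if position <= 13:
        return "SP"
    if position <= 303:
        return "NTD"
    if 437 <= position <= 508:
        return "RBM"   # the motif lies inside the RBD (331-527)
    if 331 <= position <= 527:
        return "RBD"
    if 815 <= position <= 834:
        return "FP"
    # Outside the named domains: label by subunit.
    # The text gives S2 as 686-1173; the upper limit is not checked here.
    return "S1" if position <= 685 else "S2"


if __name__ == "__main__":
    for pos in (69, 417, 501, 614, 820, 1118):
        print(pos, spike_domain(pos))
```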
Download a CSV file containing COG-UK sequence name, sample date, epidemiological week, epidemiological week start date and global lineage. Cumulative UK sequences are filtered by the selected amino acid replacement.
Spike amino acid replacements in T cell epitopes, detected in the UK data
T-cell epitope data have been compiled by Dhruv Shah and Thushan de Silva, University of Sheffield.
Predicted binding percentile rank values have been calculated by Morten Nielsen, The Technical University of Denmark.
- WT Percentile Rank Value and Mut Percentile Rank Value: predicted IC50 nM for the corresponding reported restricting allele. Predictions were performed using the NetMHCpan BA 4.1 algorithm, hosted by the IEDB.
- Fold difference indicates an increase or decrease in affinity, defined by a two-fold difference in predicted IC50 nM.
- Binding is reported as a percentile rank value (as described here), the lower the value the stronger the binding.
- For HLA-I, values less than 2 are binders and values less than 0.5 are strong binders.
- For HLA-II, values less than 5 are binders and values less than 1 are strong binders.
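The thresholds above can be expressed as a small classifier. This is a sketch under stated assumptions: the function names and the "non-binder" and "no change" labels are hypothetical, and the direction of the fold difference relies on the general fact that a lower IC50 means higher affinity, so a mutant IC50 at least two-fold higher than wild type is treated as a decrease.

```python
# (strong binder cutoff, binder cutoff) in percentile rank, per HLA class.
_THRESHOLDS = {"I": (0.5, 2.0), "II": (1.0, 5.0)}


def classify_binding(percentile_rank: float, hla_class: str) -> str:
    """Classify a predicted percentile rank for HLA class 'I' or 'II'."""
    strong_cutoff, binder_cutoff = _THRESHOLDS[hla_class]
    if percentile_rank < strong_cutoff:
        return "strong binder"
    if percentile_rank < binder_cutoff:
        return "binder"
    return "non-binder"  # label assumed; the text only defines binders


def affinity_change(wt_ic50_nm: float, mut_ic50_nm: float, fold: float = 2.0) -> str:
    """Label the change in affinity from wild type to mutant."""
    ratio = mut_ic50_nm / wt_ic50_nm
    if ratio >= fold:
        return "decrease"   # higher IC50, weaker binding
    if ratio <= 1.0 / fold:
        return "increase"   # lower IC50, stronger binding
    return "no change"


if __name__ == "__main__":
    print(classify_binding(0.3, "I"), classify_binding(4.9, "II"))
    print(affinity_change(100.0, 250.0))
```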
T cell epitope sequence viewer
Move the slider to see sequence logos showing amino acid replacements in any epitope that overlaps a specific position in the spike protein sequence. Each letter represents an amino acid replacement present in a specific epitope. The number below the sequence logo shows the position relative to the start position of the epitope. The height of a letter gives a measure of the frequency of a mutation, whereas colour indicates amino acid chemistry. Frequencies are normalised within each epitope on a scale of entropy (0 to 4.3 bits). The wild-type epitope sequence and the start and end positions of the epitope are displayed above each sequence logo.
We thank Wagih, Omar. ggseqlogo: a versatile R package for drawing sequence logos. Bioinformatics (2017).
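The upper limit of 4.3 bits on the logo scale matches the maximum Shannon entropy of a 20-letter amino acid alphabet, log2(20) ≈ 4.32. The sketch below shows that calculation for one logo position. It illustrates the general entropy formula only and is not taken from the ggseqlogo source; the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy_bits(residues: Iterable[str]) -> float:
    """Shannon entropy, in bits, of the residues observed at one position."""
    counts = Counter(residues)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Maximum possible entropy when all 20 amino acids are equally frequent.
MAX_BITS = math.log2(20)

if __name__ == "__main__":
    print(round(MAX_BITS, 2))
    print(shannon_entropy_bits("AAAA"))  # a single residue type: no uncertainty
    print(shannon_entropy_bits("AC"))    # two equally frequent residues
```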
Amino acid mutations reported to confer resistance to antiviral therapies, detected in the UK data
The table lists those mutations in the SARS-CoV-2 genome identified in the UK dataset that have been associated with resistance of the virus to antiviral treatments. There is variation in the detail of the viral assays between the different studies displayed here.