Machine learning tool identifies new COVID-19 variants earlier

CSIRO has developed a machine learning tool, VariantSpark, that acts as an early warning system for identifying dangerous COVID-19 variants. The approach could be adopted as an international standard for disease surveillance.

The tool offers a faster and more comprehensive way to identify emerging and dangerous variants because it analyses the whole genome of each variant. This is an improvement on the current method, which monitors changes to the spike protein.

There are more than 11 genes within SARS-CoV-2, the virus that causes COVID-19, and these interact with the human immune system in different ways. By looking beyond the spike protein, researchers can better predict how a new variant might behave inside the human body.

It is hoped that the research could help inform an early warning system that can determine which variants will be the deadliest to humans. 

CSIRO scientist Dr Denis Bauer said that, until now, new variants had been tracked by looking for genetic changes in variants already being monitored, such as Delta and Omicron.

“By harnessing the capability of a powerful machine-learning tool we developed, called VariantSpark, we were able to analyse the genomes of 10,000 COVID-19 samples, which is the largest number of samples ever analysed in this way,” Bauer said.  

“Our approach was able to identify variants that could be monitored a week before they were flagged by health organisations – and a week is a long time when you’re trying to outsmart a pandemic. We can also apply this approach to other viruses – in fact it has the potential to become the international standard of disease surveillance.” 

The tool was programmed to provide hourly updates, so that information can be shared quickly with public health decision makers and hospitals can prepare for increases in demand.

“VariantSpark analyses the entire genome of a virus and can account for small changes that on their own may not seem significant, but when combined with other small changes can influence the way the virus behaves,” Bauer said. 
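
The point about combined changes can be illustrated with a minimal Python sketch on made-up data. This is not VariantSpark's code or API: the three genome sites, the labelling rule and the function names are all assumptions chosen for illustration. The sketch compares how strongly a single mutation, and a pair of co-occurring mutations, is associated with a sample being flagged.

```python
from itertools import combinations, product


def rate(labels):
    """Fraction of samples that are flagged; 0.0 for an empty group."""
    return sum(labels) / len(labels) if labels else 0.0


def effect(samples, labels, sites):
    """Flagged rate among samples carrying every mutation in `sites`,
    minus the flagged rate among all other samples."""
    carriers = [y for x, y in zip(samples, labels) if all(x[s] for s in sites)]
    others = [y for x, y in zip(samples, labels) if not all(x[s] for s in sites)]
    return rate(carriers) - rate(others)


# Hypothetical toy data: every presence/absence pattern over three sites.
samples = list(product([0, 1], repeat=3))
# Assumed rule, for illustration only: a sample is flagged when the
# mutations at sites 0 and 1 occur together.
labels = [int(x[0] and x[1]) for x in samples]

# Association of each single site, and of each pair of sites, with the flag.
single = {(s,): effect(samples, labels, (s,)) for s in range(3)}
pairs = {p: effect(samples, labels, p) for p in combinations(range(3), 2)}

# The site set with the strongest association.
best = max({**single, **pairs}.items(), key=lambda kv: kv[1])
print(single, pairs, best)
```

In this toy data, each of the two relevant sites shows only a partial association when examined alone, while the pair examined together separates flagged from unflagged samples completely. Scanning every pair by hand does not scale to a whole genome across thousands of samples, which is where a machine learning approach becomes useful.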

“The power of data and technology is vital to preparing hospital systems for the ongoing impact of the COVID-19 pandemic, as well as future pandemics.” 

CSIRO worked on the study, which is the largest of its kind in the world, with Intel and with RONIN, whose cloud-based system supported the analysis.

“One of the biggest challenges in learning about new health phenomena is managing large amounts of information, which may consist of millions of data points, as is the case when you’re looking at genetic data,” RONIN principal bioinformatician Dr Parice Brandies said. 

It is hoped this new approach will eventually be used to develop vaccines to prevent future variants and pandemics.