The coronavirus pandemic has been the first global crisis in which data has played a major role in the response of governments, agencies and the healthcare sector. Here we take a look at the multiple ways in which data science is helping to tackle the pandemic.

Data sharing

Open source data communities are providing datasets to help scientists test theories and develop models. For instance, i2b2 is a data warehouse containing COVID-specific data which enables scientists and statisticians across the world to investigate healthcare data. The international 4CE consortium publishes healthcare data and analytical tools so that researchers can collaborate on projects and check their findings. Research groups are also using Oxford University’s OxCOVID19 database, which draws daily data from sources worldwide and applies modelling and machine learning techniques to predict the future course of the pandemic.

Many of us have provided a huge amount of personal data to organisations during the pandemic, from check-ins in pubs and bars to vaccination certification apps. During a pandemic it is crucial that public trust is not undermined, making experts in data ethics more important than ever. Data ethicists help to ensure that data is used fairly and in accordance with all data protection regulations. According to the Information Commissioner’s Office, this “means high standards of governance and accountability to ensure compliance with data protection principles, including transparency, fairness, data minimisation and storage limitation, and utilising a ‘data protection by design’ approach as part of their planning”.

Policy development

In many countries, there has been a very close feedback loop between data, lockdowns and social distancing policy. Government legislation and guidelines have been informed, to a greater or lesser extent depending on the country, by scientists basing their recommendations on the latest data. For instance, at the time of writing in June 2021, the UK Government is basing its easing of England’s lockdown on four tests. These tests incorporate the latest statistics from NHS England on infection rates and the rates of hospitalisations and deaths, compiled using notifications and survey data from Public Health England and the Office for National Statistics.

Public education

The COVID-19 pandemic has seen citizens as well as governments debating multiple aspects of the situation. The latest Ofcom survey found that eight out of ten people access news about COVID-19 at least once per day. Given this urge for information, a range of sources have been developed by data scientists to help the public understand the data fully.

For example, statisticians at the Office for National Statistics (ONS) have created an interactive tool that enables the public to explore the latest data on the pandemic. It combines data from the ONS itself with sources ranging from the NHS and Public Health England to the REACT Study.

Another example is the Guardian newspaper’s COVID world map. This regularly updated set of maps and graphs clearly shows trends in the virus across the world, including case numbers and the rollout of vaccines. Users can also focus on individual countries.

Predictive modelling

As mentioned earlier, projects such as the OxCOVID19 database are modelling the future spread of the pandemic. Other studies are homing in on specific datasets. For instance, researchers at the University of Pennsylvania in the US are using mobile phone data to build networks of citizens’ movements, creating a very detailed model of the spread of the virus. These large datasets can then be used to create public health strategies based on predicted caseloads. According to Professor Duncan Watts, “This type of epidemiological modeling is a dramatic leap forward. Before, researchers would create these complex, agent-based models, but they would have to use indirect data like airline traffic or school attendance. Now, you have data on real people moving around. You can see what they’re actually doing and how their behavior changes when lockdown policies go into place. It’s drastically improving our ability to model the spread of disease and could have profound consequences for future pandemic response.”
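To give a feel for how movement data can feed a model of spread, here is a minimal sketch in Python. It is not the researchers’ model: instead of an agent-based simulation built on real mobile phone data, it uses a much simpler SIR (susceptible, infectious, recovered) model for two regions. The mobility matrices, population sizes and the transmission and recovery rates (beta and gamma) are all made-up values chosen purely for illustration.

```python
# Illustrative sketch only: a two-region SIR model driven by a hypothetical
# mobility matrix. All numbers below are invented for demonstration.

def step(state, mobility, beta, gamma):
    """Advance every region by one day.

    state    -- list of (S, I, R) tuples, one per region (numbers of people)
    mobility -- mobility[i][j] is the share of region i's residents who
                spend the day in region j (each row sums to 1)
    beta     -- transmission rate per day (assumed value)
    gamma    -- recovery rate per day (assumed value)
    """
    n = len(state)
    # People present in each destination during the day, and how many of
    # them are infectious.
    present = [sum(mobility[i][j] * sum(state[i]) for i in range(n)) for j in range(n)]
    infectious = [sum(mobility[i][j] * state[i][1] for i in range(n)) for j in range(n)]

    new_state = []
    for i, (s, inf, r) in enumerate(state):
        # Infection pressure on residents of region i, weighted by where
        # they spend their day.
        force = sum(
            mobility[i][j] * beta * infectious[j] / present[j]
            for j in range(n)
            if present[j] > 0
        )
        new_inf = min(s, force * s)
        new_rec = gamma * inf
        new_state.append((s - new_inf, inf + new_inf - new_rec, r + new_rec))
    return new_state


def simulate(state, mobility, beta=0.3, gamma=0.1, days=60):
    """Run the model for a number of days and return the daily states."""
    history = [state]
    for _ in range(days):
        state = step(state, mobility, beta, gamma)
        history.append(state)
    return history


# Region 0 starts with 10 infections; region 1 starts with none.
initial = [(9990.0, 10.0, 0.0), (5000.0, 0.0, 0.0)]
open_travel = [[0.9, 0.1], [0.2, 0.8]]  # some daily movement between regions
lockdown = [[1.0, 0.0], [0.0, 1.0]]     # nobody leaves their home region

open_run = simulate(initial, open_travel)
lockdown_run = simulate(initial, lockdown)
print("Region 1 infections on day 60, open travel:", round(open_run[-1][1][1]))
print("Region 1 infections on day 60, no travel:", round(lockdown_run[-1][1][1]))
```

Changing only the mobility matrix changes the predicted caseload in the region that started with no infections. That is the basic idea behind using observed movement data to compare policy options, although real models are far more detailed.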

During the early stages of the pandemic, some of the most accurate forecasting came from a Massachusetts Institute of Technology (MIT) graduate with no formal training in epidemiology. 26-year-old Youyang Gu spent a week building his own COVID-19 mortality predictor while living with his parents in California. His model soon started generating more accurate results than models from established institutions such as Imperial College London or the Institute for Health Metrics and Evaluation in Seattle. “His model was the only one that seemed sane,” Jeremy Howard, a data expert at the University of San Francisco, told Bloomberg. “The other models were shown to be nonsense time and again, and yet there was no introspection from the people publishing the forecasts or the journalists reporting on them. Peoples’ lives were depending on these things, and Youyang was the one person actually looking at the data and doing it properly.”

Diagnosis and treatment

Data scientists have also helped to develop tools for the diagnosis and treatment of COVID-19. Researchers at the Massachusetts Institute of Technology have built an algorithm that uses artificial intelligence to identify people with COVID-19 from the sound of their coughs alone. In testing, the system correctly identified 98.5% of coughs from people confirmed to have COVID-19. It was developed using the MIT lab’s collection of 70,000 audio samples, each of which contains a number of coughs. Of those samples, 2,500 are from patients with COVID-19.
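Headline figures for a diagnostic tool like this are usually reported as sensitivity (how many people with the disease are correctly flagged) and specificity (how many people without it are correctly cleared). The short Python sketch below shows how the two are calculated. The counts are hypothetical and are not MIT’s results.

```python
def sensitivity(true_pos, false_neg):
    """Share of people who have the disease that the test correctly flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of people without the disease that the test correctly clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical screening results, invented for illustration only.
counts = {"true_pos": 197, "false_neg": 3, "true_neg": 940, "false_pos": 60}

sens = sensitivity(counts["true_pos"], counts["false_neg"])
spec = specificity(counts["true_neg"], counts["false_pos"])
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

A tool can score very highly on one measure and poorly on the other, which is why both are normally quoted together.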

MIT have also developed a machine learning system to identify existing drugs that could be repurposed to treat COVID-19 in elderly patients. The project identified RIPK1 as a target gene/protein for potential treatments. The researchers then identified a series of drugs, already approved for cancer, that act on RIPK1 and may therefore have the potential to treat COVID-19. The team plan to share their findings with the pharmaceutical industry.

Throughout the pandemic, clinicians have had to make complex and difficult decisions around mortality risk and the treatments provided. An international team led by Stefan Bauer of the Max Planck Institute for Intelligent Systems and Patrick Schwab, formerly of Roche, have harnessed machine learning to identify patterns in COVID-19 patients’ risk factors. Their algorithm has been trained to predict an individual patient’s mortality risk by analysing the data of thousands of patients from around the world. Their system can correctly predict a patient’s risk of dying up to eight days in advance in 95 out of 100 cases, helping clinicians to make informed decisions about treatment.

In conclusion

Throughout the pandemic, governments and health agencies have relied on data to make informed decisions. Fast analysis of large datasets, coupled with the predictive possibilities of artificial intelligence, has been crucial. But particularly where healthcare is involved, ensuring that the use of data is secure and transparent is more important than ever.
