In an interview, Professor Dr. (Univ. Simón Bolívar) Maria-Esther Vidal discusses the use of big data technologies in the health sector
You lead the “Scientific Data Management” research group at TIB – Leibniz Information Centre for Science and Technology. You focus your research on how big data technologies can be used in the health sector to improve health care. What exactly are you researching?
The amount of available big data has grown drastically in the last decade, and an even faster growth rate is expected in the coming years. Specifically, in the biomedical domain, there is a wide variety of methods, e.g. liquid biopsies, medical images, and genome sequencing, that produce large volumes of data from which new biomarkers can be discovered. Biomarkers are biological characteristics or medical signs that can be measured in tissue or blood and can indicate the incidence of a disease or the effects of a treatment.
The outcomes of the analysis of heterogeneous biomedical data are the building blocks of precise diagnostics and effective treatments. However, biomedical data may suffer from various complexity issues – volume, variety, and veracity – which demand novel techniques for query processing and knowledge discovery to ensure accurate insights and conscientious decisions. I investigate computational methods for tackling the challenges imposed by the complexity issues of big data. In particular, I work on the definition of novel computational approaches able to exploit knowledge encoded in big data and efficiently solve critical tasks, such as big data integration, query processing at scale, and knowledge mining and discovery.
In concrete terms, this currently means two projects: BigMedilytics – Big Data for Medical Analytics, and iASiS – Big Data for Precision Medicine. Also, ImPROVit was accepted recently. Please briefly describe what these projects are about.
The projects iASiS, BigMedilytics, and ImPROVit all aim at developing knowledge-driven computational tools or frameworks that enable the transformation of big data into actionable knowledge for the support of precision medicine; each of these projects tackles different health problems.
In iASiS, we’re focusing on two life-threatening diseases: lung cancer and Alzheimer’s. The goal is to develop a knowledge graph that integrates clinical data related to patients suffering from these two diseases, and exploit innovative machine-learning methods to predict survival time and treatment efficacy. Similarly, BigMedilytics deals with big data, but the main objective is to develop technologies in the healthcare sector to deliver low-cost, high-quality care to increase healthcare productivity and the market share of big data providers in oncology, cardiology, radiology, and hospital logistics. Both projects are supported by the European Union’s Horizon 2020 research and innovation programme.
ImPROVit is funded by the Volkswagen Foundation and the Niedersächsisches Ministerium für Wissenschaft und Kultur (Lower Saxony Ministry of Science and Culture) with the cooperation of partners from Hannover Medical School (MHH); Twincore, the Centre for Experimental and Clinical Infection Research; the Helmholtz Zentrum für Infektionsforschung (HZI); and TIB. In ImPROVit, our goal is to improve the profiling of the individual immune system of a patient in order to understand the effects of vaccination, infectious disease, and transplantation. Big data, knowledge graphs, and machine learning provide the basis for addressing these research challenges.
What roles does TIB have in these projects and what are some of the unique challenges?
In these projects, we lead the development of knowledge-driven frameworks able to integrate heterogeneous sources, e.g. clinical records, sequencing data, scientific publications, and pharmacologic data, into a knowledge graph. These frameworks rely on ontologies to describe the meaning of the integrated data. Additionally, we investigate query processing methods for knowledge exploration, as well as knowledge discovery techniques for uncovering unknown patterns and associations. Our goal is to identify the characteristics of a patient that facilitate precise diagnostics and the prescription of effective treatments. These characteristics include a large number of phenotypic and genomic features extracted from medical records and genome sequencing; the challenge is to devise the most suitable machine-learning methods for identifying and using the most relevant features to accurately predict a treatment outcome. Additionally, our techniques support the management of patients during their treatment, follow-ups, and final period of life, and help to reduce health costs.
One point that quickly comes up when dealing with health data: What dangers exist, for example regarding data protection, and how can it be guaranteed that this data will not be misused?
Dealing with clinical and sequencing data requires the definition of privacy policies to be enforced during the access and management of this data. The General Data Protection Regulation (GDPR) is an important EU law that protects the privacy of European citizens; in iASiS and BigMedilytics, we are supervised by ethics advisors who guide us in how to follow the GDPR. In addition, we count on the advice and guidance of our data protection officers at TIB, and we have developed computational methods that protect and strengthen data privacy. In this way, we can ensure that clinical data is not misused and is only utilised according to the consent given by the corresponding patients.
Let’s take a look into the future: How will big data technologies have changed medicine and healthcare in ten years’ time?
Precision medicine relies on the observation that individuals differ genetically, and as a consequence, the efficacy of general treatments may be negatively affected by individual genetic variants. Tailoring treatments to the individual demands the analysis of specific genes in order to determine the eligibility of a patient for a given medication. However, patients may have different mutations that may not even be related to a diagnosed disease, but may still reduce the effectiveness of treatments or the quality of life as a consequence of the adverse effects of a treatment. A massive volume of available data needs to be processed and integrated in order to identify these patterns in the individual characteristics of a patient. Big data technologies and knowledge graphs will provide the basis for managing and mining this enormous mass of data, and enable the development of new paradigms where computational methods, physicians, and patients will be in a loop of holistic diagnosis and prescription of effective treatments. Thus, in the future, we can expect that life-threatening diseases that cause the death of millions of people every year will be treated using individualised treatments that maximise patients’ chances of survival and quality of life.