Data Science with the Open Research Knowledge Graph

by Kamel Fadel and Markus Stocker

The Open Research Knowledge Graph (ORKG) team devotes considerable thought and time to developing user-friendly approaches that support creating content, i.e. structured scholarly knowledge. This is an important task. Equally important is to develop and demonstrate approaches that use this content to enable data science, i.e. new research, with the ORKG.

We briefly present a recent effort that demonstrates how data from multiple ORKG comparisons can be integrated to learn something new.

Let’s start with the end product. Figure 1 visualizes the contagiousness and deadliness of SARS-CoV-2 relative to numerous other infectious diseases. The so-called “Microbe scope” was inspired by an article in The Guardian on how Ebola compares to other infectious diseases, which also links to the required (third-party) data. We map SARS-CoV-2 using data from two ORKG comparisons: SARS-CoV-2 basic reproduction number and case fatality rate. These comparisons tabulate relevant data extracted from the scholarly literature and are useful data sources for our little data science experiment.

Fig. 1: The contagiousness and deadliness of SARS-CoV-2 relative to other infectious diseases.

You may ask: the comparisons are great, but how do you create this kind of visualization?

Easy.

The ORKG Python library supports reading ORKG content directly into Python data structures (specifically, a pandas DataFrame). This can be done with the following three lines of Python code:

from orkg import ORKG
orkg = ORKG(host='https://orkg.org/orkg',
            simcomp_host='https://orkg.org/orkg/simcomp')
df = orkg.contributions.compare_dataframe(comparison_id='[…]')

Now you can manipulate the DataFrame df as needed. Since we have basic reproduction number and case fatality rate estimates in ORKG from dozens of papers, we first compute mean values and then use a plotting library to create the final visualization.
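The averaging step can be sketched as follows. This is a minimal illustration, not the code behind Figure 1: the layout of the comparison DataFrame is not shown here, so the sketch works on plain sequences of cell values and assumes that some cells may be empty or non-numeric. The helper names (to_float, mean_estimate) and the sample cells are hypothetical placeholders, not estimates taken from the ORKG comparisons.

```python
import math
from statistics import mean


def to_float(cell):
    """Return the cell as a float, or None if it is not a usable number."""
    try:
        value = float(cell)
    except (TypeError, ValueError):
        return None
    # Treat NaN (e.g. a missing DataFrame cell) as unusable as well.
    return None if math.isnan(value) else value


def mean_estimate(cells):
    """Average the numeric entries of an iterable of raw cell values."""
    numbers = [v for v in map(to_float, cells) if v is not None]
    if not numbers:
        raise ValueError("no numeric values to average")
    return mean(numbers)


# Made-up placeholder cells, NOT estimates from the ORKG comparisons.
r0_cells = ["2.0", "3.0", "n/a", None, "4.0"]
cfr_cells = ["1.0", "", "3.0"]

# One (contagiousness, deadliness) pair that a plotting library could place
# alongside the other diseases.
point = (mean_estimate(r0_cells), mean_estimate(cfr_cells))
print(point)
```

Because mean_estimate accepts any iterable, the relevant row or column of df can be passed to it directly once you know where the estimates sit in the DataFrame.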

ORKG makes reuse of scholarly knowledge in downstream data science a lot easier. Communities can decide which essential scholarly knowledge should be published in machine-actionable form with the ORKG, and they can curate the published content to ensure it is current and of high quality. The information we traditionally bury in text can then be reused more easily. As such, ORKG is an important and timely step towards FAIR scholarly knowledge.