The German National Library of Science and Technology (TIB) has set itself the goal of promoting the use and dissemination of its collections. To this end, TIB publishes both the authoritative metadata and the time-based, automatically generated metadata of scientific videos from the TIB AV Portal as linked open data. With this, TIB is offering a new service: datasets in the standard RDF format, provided for further use.
What is linked open data?
Linked open data (LOD) describes linked, structured data published under open access licences, which means that it can be reused. Resources in this data are identified by Uniform Resource Identifiers (URIs), which allow them to be linked and queried. Unlike Uniform Resource Locators (URLs), which point to a location that may change, the URIs used in LOD are intended to serve as stable identifiers. The flexible, machine-readable Resource Description Framework (RDF), the standard model for describing metadata, is used to express the data. RDF is based on a triple structure, in which a predicate relates a subject to an object. In this way, LOD enables metadata to be linked and enriched, so that related media can be located in the web of data and the data can be reused.
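To make the triple structure concrete, here is a minimal sketch in Python of a few triples describing a single video, together with a wildcard lookup and serialisation to the N-Triples syntax. The example.org URIs and the `URI` marker class are hypothetical; only the Dublin Core property URIs (`http://purl.org/dc/terms/...`) come from a real vocabulary, and the sketch is not meant to show how TIB actually stores its data.

```python
from typing import NamedTuple

class URI(str):
    """Marks a string as a URI (a resource) rather than a literal value."""

class Triple(NamedTuple):
    subject: URI    # the resource being described
    predicate: URI  # the property linking subject and object
    obj: str        # another resource (URI) or a literal value

# Hypothetical resource URIs; the dcterms properties are real Dublin Core terms.
DCT = "http://purl.org/dc/terms/"
VIDEO = URI("http://example.org/video/1")

graph = [
    Triple(VIDEO, URI(DCT + "title"), "An example lecture recording"),
    Triple(VIDEO, URI(DCT + "subject"), URI("http://example.org/subject/physics")),
    Triple(VIDEO, URI(DCT + "language"), "en"),
]

def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]

def to_ntriples(t: Triple) -> str:
    """Serialise one triple as an N-Triples line: URIs in <...>, literals quoted."""
    def term(x: str) -> str:
        if isinstance(x, URI):
            return f"<{x}>"
        # Escape backslashes and double quotes inside literal values.
        escaped = x.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return f"<{t.subject}> <{t.predicate}> {term(t.obj)} ."

# Print every statement made about the example video.
for t in match(graph, s=VIDEO):
    print(to_ntriples(t))
```

Because the object of one triple can be the subject of another, following such URIs from dataset to dataset is what links the data into a web.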
Why use linked open data?
The advantage of LOD is that structured, machine-readable data can be provided by one party and reused by others. As a result, libraries can draw greater attention to their collections on the internet. LOD also helps users to formulate more precise and efficient searches and to locate and link relevant information. Unlike linked data in general, linked open data is published under open access licence terms, making it freely available.
Many institutions already allow users to download datasets in RDF format. The German National Library (DNB), for example, offers title data from the DNB and authority data from the Integrated Authority File (GND) to libraries and other customers through its Data Shop. The European Library, an online portal offering access to the collections of the 48 national libraries of Europe and numerous research libraries, provides 95 million bibliographic records, including a subset of around 20 million records from 34 British partner libraries.
Europeana, the European virtual library of Europe's scientific and cultural heritage, has made its metadata available under a free licence since October 2012. The collection includes 20 million items: not only texts, but also photos, videos and sound recordings.
Linked open data in the TIB AV Portal
With the AV Portal (av.tib.eu), TIB and the Hasso Plattner Institute have developed a user-centred platform for scientific films. The portal offers free access to high-quality computer visualisations, simulations, experiments and interviews, as well as recordings of lectures and conferences from the fields of science and technology. The AV Portal's automatic video analysis includes not only structural analysis (scene recognition) but also text, audio and image analysis. This automatic indexing describes the videos at segment level, so that searches can pinpoint specific passages within a video. Each film is assigned a Digital Object Identifier (DOI), so that it can be referenced unambiguously. Individual film segments are assigned a Media Fragment Identifier (MFID), which allows the video to be referenced and cited to the second.
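Second-accurate references of this kind can be illustrated with the temporal syntax of the W3C Media Fragments URI 1.0 specification, which appends `#t=start,end` (in seconds) to a media URI. The sketch below assumes that syntax and a hypothetical base URL; whether the portal's MFIDs are formatted exactly this way should be checked against the portal itself. It handles only plain seconds, not the clock-time forms the specification also allows.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urldefrag

def _secs(x: float) -> str:
    """Format seconds without trailing zeros (65.0 -> '65', 90.5 -> '90.5')."""
    return f"{x:.3f}".rstrip("0").rstrip(".")

def fragment_uri(video_uri: str, start: float, end: float) -> str:
    """Address the interval from start to end (in seconds) of a video."""
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    return f"{video_uri}#t={_secs(start)},{_secs(end)}"

# Temporal fragment in plain seconds, optionally with the 'npt:' prefix.
_T = re.compile(r"t=(?:npt:)?(\d+(?:\.\d+)?)?(?:,(\d+(?:\.\d+)?))?")

def parse_fragment(uri: str) -> Tuple[str, float, Optional[float]]:
    """Split a media fragment URI into (base URI, start, end); end None means 'to the end'."""
    base, frag = urldefrag(uri)
    m = _T.fullmatch(frag)
    if not m or (m.group(1) is None and m.group(2) is None):
        raise ValueError(f"unsupported temporal fragment: {frag!r}")
    start = float(m.group(1)) if m.group(1) else 0.0  # a missing start means 0
    end = float(m.group(2)) if m.group(2) else None
    return base, start, end
```

For example, `fragment_uri("https://example.org/media/1234", 65, 90.5)` yields `https://example.org/media/1234#t=65,90.5`, and `parse_fragment` recovers the base URI and the interval from it. Keeping the segment in the fragment means the base URI (and the DOI behind it) still identifies the whole film.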
TIB now also makes the metadata and thumbnails of its audiovisual materials available in RDF format at https://av.tib.eu/opendata, under a Creative Commons CC0 1.0 Universal licence. Please note that some of the data was generated automatically and may therefore be incomplete or inaccurate. In future, the datasets will be updated every three months.
The website also offers a tutorial that gives a brief overview of the structure of the TIB AV Portal datasets and explains how they can be imported into an RDF database and queried using SPARQL.
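Once the datasets are loaded into an RDF database that exposes a SPARQL endpoint, queries can be sent over HTTP as defined by the SPARQL 1.1 Protocol. The sketch below builds such a GET request and converts results in the SPARQL 1.1 JSON results format into plain dictionaries. The endpoint URL (a local triple store on port 3030) and the use of `dcterms:title` are assumptions for illustration only; the tutorial describes the actual structure of the TIB datasets.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint of a local triple store into which the dump was imported.
ENDPOINT = "http://localhost:3030/av/sparql"

# Assumed vocabulary: list resources that carry a Dublin Core title.
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?video ?title WHERE {
  ?video dcterms:title ?title .
}
LIMIT 10
"""

def build_request(endpoint: str, query: str) -> Request:
    """Prepare (but do not send) a SPARQL 1.1 Protocol query via HTTP GET."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

def parse_bindings(results_json: str):
    """Turn SPARQL 1.1 JSON results into a list of {variable: value} dicts.

    Unbound variables are simply absent from a row, as in the JSON format itself.
    """
    data = json.loads(results_json)
    return [{var: cell["value"] for var, cell in row.items()}
            for row in data["results"]["bindings"]]
```

To run the query against a live endpoint, pass the request to `urllib.request.urlopen` and hand the decoded response body to `parse_bindings`.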
For more information about the topic explored in this blog post, see also Paloma Marín Arraiza's "Scientific Audiovisual Materials and Linked Open Data".
Margret Plank, Head of the Competence Centre for Non-Textual Materials
Sandra Simon, Subject Librarian in training