TIB at WikidataCon: Part 1

Reflecting on questions of sustainability, growing the ecosystem of decentralized data repositories and ensuring knowledge equity

Introduction

This year, WikidataCon marked the ninth birthday of Wikidata, “a free, collaborative, multilingual knowledge base with a focus on verifiability” [1]. The biennial conference took place online across all time zones from October 29 to 31, opening up participation to a global audience. It included 142 sessions and roughly 80 hours of programming, and over 700 unique visitors checked into the event platform Venueless [2]. Beyond the numbers, the conference marks two kinds of growth. Wikidata has become a mature product within the family of applications developed and maintained by the Wikimedia Movement. It has also gained a dedicated community of “project shapers”, “gardeners”, and “re-users” [3].

Shortly before the conference opened, Wikimedia Germany (the primary maintainer of Wikidata) and the Wikimedia Foundation published updated 2021 strategy documents on the development of Linked Open Data (LOD) within the Wikimedia movement. These set out the vision for Wikidata, their flagship LOD platform, and for Wikibase, the underlying software that can enable a decentralized ecosystem of LOD data repositories to grow. The strategy documents focus on several key areas that were also reflected in the conference programme. Below we provide a short overview of these.

A view of the Wikimedia Linked Open Data web: an ecosystem of decentralized Wikibase knowledge bases. Credit: Dan Shick (WMDE) / CC-BY-SA 4.0

Focus on services

A strong thread running through both the strategy documents and the conference programme is the scalable and sustainable provision of knowledge services. This includes the recognition that users need to find and re-use Wikidata's data easily, with a high degree of trust in its quality. Meeting that need depends on a range of additional tools and interfaces that must connect easily to Wikidata via new and improved APIs. Sessions at the conference that focused on this topic included:

Another key aspect of the focus on services is the scalability of the Wikidata Query Service (WDQS), which has come under significant strain as the knowledge graph has grown over the past few years. In the spirit of openness, members of the Wikidata and Wikimedia Search Platform technical teams held two dedicated conference sessions. There they gave an overview of current issues and outlined how they plan to manage the risks of rapid scaling and system overload. Beyond short-term solutions, one of the key longer-term strategies for scalability that was discussed was decentralization and federation across multiple data stores.

Last but not least, reliable service provision requires sustainable management of the tool ecosystem. This is a particular challenge for large open source software movements that rely on a high degree of self-initiative and volunteer labour. A dedicated panel session at the conference brought tool developers, maintainers, volunteers and Wikimedia Foundation (WMF) officials around the same (virtual) table to discuss this issue. The day before the session, a member of the tool development community published a related blog post analysing the challenges currently facing the WMF and its tool environment. The post proposed mitigation tactics, including a focus on collaboration and on harnessing the contributions of non-technical volunteers:

It takes a village to raise a tool − and various specialties ranging from product ownership, design, development, operations, testing, QA, security, documentation… −  yet more often than not, a single person is behind a tool. ~ Jean-Frédéric [4]

2×2 matrix for prioritizing tool support needs, drafted by Andrew Lih and shared during the sustainable tool ecosystem management panel session.

Focus on equity 

Sustainability was indeed the main theme of the conference, but it was also discussed in the context of a parallel initiative: Reimagining Wikidata from the margins [5]. This year, the new strategy documents and the conference as a whole had an explicitly social focus alongside the technical one. They acknowledged the various inequities endemic to all open movements that rely on volunteer contributions. Such contributions depend on access to technical skills, digital literacy, financial means and leisure time, among other forms of social privilege. In practice, the conference was co-organized in partnership with Wiki Movimento Brasil, and many sessions explicitly aimed to represent a diversity of national, ethnic and linguistic backgrounds, for example:

These sessions aimed to amplify a plurality of voices traditionally marginalized by organisations and communities from (primarily) North America and Western Europe. These groups have dominated the decision-making and data (re)use policies and practices around Wikidata and the Wikimedia movement in general. Crucially, the conference engaged with the question of equity beyond the issue of representation alone. The opening keynote, ‘Decolonizing Wikidata: why does knowledge justice matter for structured data’, was delivered by Anasuya Sengupta, an Indian feminist activist, scholar, and long-time Wikimedian. In the keynote and in subsequent sessions, Sengupta offered a nuanced analysis of the state of the Wikimedia movement and of the call to decolonize. She also stressed the need to move away from universalizing ideas of what a global knowledge base should look like. A clear message throughout these thought-provoking sessions was the need to focus on decentralization. That means sustaining an interlinked, but non-universalizing, ecosystem of plural community knowledge bases and plural ontologies.

Three ideas were highlighted throughout the second and third days of the conference:
1) decentralization
2) sustainability through broad community engagement
3) the importance of bringing diverse perspectives to the movement as a whole, and to the development of software tools like Wikidata and Wikibase in particular

The community tracks on those days spanned 10 different topics, including Sustainability, GLAM, and Education and Science, among others [6].

Focus on the Wikibase track

Of particular significance to our work at the Open Science Lab (OSL) at TIB were the GLAM and Education and Science tracks, as well as the track dedicated to Wikibase. OSL researcher Lozana Rossenova serves as Wikibase community manager for NFDI4Culture. Wikimedia Germany invited her to co-curate and help facilitate the programme for the Wikibase track. The track offered an opportunity to learn about the latest research-led and institutional projects featuring Wikibase, to draw inspiration from diverse use cases, and to hear about the latest developments in the tool ecosystem around Wikibase. It featured an introduction to the Wikibase Stakeholder Group, a new cross-institutional effort (including TIB) established to secure the further development and long-term sustainability of Wikibase and related extensions. Furthermore, Adam Shorland (Tech Lead for Wikidata and Wikibase at Wikimedia Germany) and Sam Alipio (Product Manager for Wikibase Ecosystem at Wikimedia Germany) announced wikibase.cloud, a new service launching in 2022. The service aims to meet independent Wikibase users' need to easily deploy and manage cloud-based services. At TIB, we will be working closely with the team at Wikimedia Germany to evaluate how wikibase.cloud can help meet the needs of our research partners in ongoing programmes at OSL and NFDI4Culture.

OSL team members took part in three presentations on the final day of the conference, Sunday, October 31, as part of the Wikibase and Education and Science tracks. You can learn more about these presentations in the second part of this blog post.


Endnotes

[1] Source: https://meta.wikimedia.org/wiki/LinkedOpenData/Strategy2021/Wikidata

[2] Stats provided by Léa Lacroix, Community Engagement Coordinator at Wikimedia Germany.

[3] Source: https://meta.wikimedia.org/wiki/LinkedOpenData/Strategy2021/Wikidata

[4] Berthelot, Jean-Frédéric. 2021. “Where is the technical volunteer support in the Wikiverse?” Available from: https://commonists.wordpress.com/2021/10/29/where-is-the-technical-volunteer-support-in-the-wikiverse/

[5] Source: https://www.wikidata.org/wiki/Wikidata:Reimagining_Wikidata_from_the_margins

[6] Source: https://www.wikidata.org/wiki/Wikidata:WikidataCon_2021/Program/Day_2_and_3_-_Community_tracks

Dr Lozana Rossenova is based at the Open Science Lab at TIB and works on the NFDI4Culture project in the task areas of data enrichment and knowledge graph development.