Scientific software – there’s still a long way to go!
A post-conference report on the Second Conference on Non-Textual Information “Software and Services for Science (S3)” – 80 experts gathered in Hannover on 10 and 11 May 2017 to discuss the challenges of scientific software development, operation and reuse.
“Nowadays, texts, whether in printed or electronic form, are no longer the only source of knowledge and information. Non-textual materials, such as audiovisual media, research data and software, are steadily growing in importance in research and teaching,” stated Barbara Hartung from the Lower Saxony Ministry for Science and Culture (MWK), Chair of the TIB Foundation Council, welcoming the 80 participants of the Second Conference on Non-Textual Information. Hartung expressed her hope that, over the next two days, the Leibnizhaus Conference Centre and Guest Residence in Hannover would become a place for talks and discussions about the challenges that this change poses for researchers and for experts from infrastructure facilities such as libraries. Scientific software was the focus of the conference, held under the title “Software and Services for Science (S3)” and hosted jointly by the Technische Informationsbibliothek (TIB) – German National Library of Science and Technology and its partners ZB MED – Information Centre for Life Sciences and ZBW – Leibniz Information Centre for Economics on 10 and 11 May 2017.
“At this conference, we will be addressing topics such as the sustainability and referencing of scientific software, as well as trends in programming practice, legal aspects and software sharing,” stated Irina Sens, Interim Director of TIB, summarising the wide range of topics on the agenda. “Make use of the two days to engage in interesting discussions and intense exchange, whether during the breaks or at the evening get-together at the Old Town Hall in Hannover,” she urged the attendees. Besides the conference participants, Sens had a special guest to welcome at the start of the event: Sören Auer, who will become the new Director of TIB in July 2017. Auer, who currently works at the University of Bonn and at the Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS, delivered a lecture on Big Data on the second day of the conference. “We are delighted that, as of July, TIB will be able to work with Professor Auer on new, innovative research topics such as data science, digital libraries and open knowledge. There are a great many ideas for cooperation, for example with the L3S Research Centre,” declared Sens.
“The past few years show quite clearly that there is a development towards non-textual materials such as videos and research data in science,” said Wolfgang Nejdl, Director of the L3S Research Centre, which had supported TIB in planning the conference programme. For years now, he explained, DataCite has been helping researchers to make their research data citable and permanently accessible by allocating Digital Object Identifiers (DOI). “One of the challenges facing libraries is to solve the issue of how to deal with scientific software in research,” stated Nejdl. The conference presentations accordingly centred on finding answers to questions concerning scientific software and presenting potential approaches for dealing with software in research, as well as on identifying new challenges.
Sustainable infrastructure for software
Edzer Pebesma (University of Münster) opened the English-language conference with his keynote “Incentives and rewards in scientific software communities”. In his talk, he addressed incentives and reward systems for scientists involved in the development and provision of software, using the example of R, an open source programming language for statistical computing. The core set of R packages can be extended by additional packages for solving specific statistical problems; there are currently more than 10,000 such packages, created by 8,000 different authors. The packages are freely available, with metadata and dependencies, in the R repository CRAN (Comprehensive R Archive Network). Every time R itself changes, however, authors must update their packages so that they remain up to date. As a result, R enables academics to reuse, for their own research, software that other developers have already written. “The R community is a shining example of the sustainable infrastructure of a software program,” Pebesma concluded.
Research software from a scientific and legal perspective
In his lecture “What is good scientific practice for research software?”, Konrad U. Förstner (University of Würzburg; chair of the Alliance of Science Organisations ad-hoc working group Scientific Software) focused on the symbiosis of science and technology: software is, of course, a tool in science, but it can also be a research result. Today, the strength and growth of research are closely linked to software. It is therefore important to secure the quality, accessibility and citability of scientific software. Förstner urged: “Good scientific practice must also be applicable to scientific software.”
Nikolaus Forgó (Leibniz Universität Hannover) shed light on the legal aspects of software under the heading “Legal requirements for software sharing and collaborations”. He placed particular emphasis on the relevant paragraphs of the German Copyright Act (UrhG). He stressed that software is frequently developed collaboratively and that the question of authorship must be settled before entering into such collaborations in order to avoid legal disputes at a later stage. In this context, Forgó also advocated better communication and collaboration between legal experts and developers.
Sustainable infrastructures for scientific software
In his lecture “Managing research software from the perspective of a scientific infrastructure provider”, Timo Borst (ZBW – Leibniz Information Centre for Economics, Kiel) presented an infrastructure facility’s perspective on scientific software. Scientific software should be part of Open Science; how research software is dealt with will be a central topic for infrastructure facilities such as ZBW in the future. The open question is which aspects the management of research software will cover – examples could include the dissemination, acceptance or representation of different versions of scientific software.
In his presentation “Solid scenarios for sustainable software”, Patrick J. C. Aerts (Data Archiving and Networked Services (DANS); Netherlands eScience Center) emphasised that software and research data ought to have the same status, since both are forms of scientific output. In his view, several aspects play a role when dealing with scientific software: for example, it makes sense to consider which software programs are worth preserving and how software code can be maintained. He called for clear guidelines on the future handling and development of software code. In order to promote the sustainability of software, Aerts advocated applying the FAIR principles to scientific software – it should be findable, accessible, interoperable and reusable.
Sustainable access to scientific software
Daniel S. Katz (University of Illinois Urbana-Champaign, USA) gave a talk on “Software citation: a cornerstone of software-enabled research”, in which he explored the issue of how to cite scientific software. A FORCE11 working group has drawn up rules for the citation of software. One of the aims of these citation guidelines is to improve the recognition of software as a research result across all disciplines and to encourage the citation of software – ideally using Digital Object Identifiers (DOI). Many unresolved issues still need to be discussed in this connection, such as how to deal with different versions of software programs.
In his presentation “Workflows for assigning and tracking DOIs for scientific software”, Martin Fenner (DataCite, Hannover) described the tasks yet to be resolved in connection with scientific software: besides the fundamental question of how to define scientific software, there are also issues concerning missing metadata, a lack of archives, and software programs existing in different versions. He advocated allocating a Digital Object Identifier to software, which would make the software and any digital objects linked to it (for example, publications and contributor identifiers such as ORCID) easier to search for and find.
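As a small illustration of why a DOI makes software easier to find: a DOI can be turned into a resolvable link by prefixing it with the DOI resolver address https://doi.org/. The following Python sketch is not taken from the talk; the function name `doi_to_url` and the example DOI are hypothetical and serve only to show the idea.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def doi_to_url(doi: str) -> str:
    """Turn a DOI string into a resolver URL.

    Only a rough syntactic check is made here: a DOI starts with "10."
    and separates prefix and suffix with a slash.
    """
    doi = doi.strip()
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the slash separator.
    return DOI_RESOLVER + quote(doi, safe="/")


# Hypothetical DOI for one specific version of a software package.
print(doi_to_url("10.1234/example-software.v1.0"))
```

One possible way of handling versions, assumed here purely for illustration, is to give each release its own DOI so that a citation points to exactly the code that was used.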
The presentation “Software as a first-class citizen in web archives” by Helge Holzmann (L3S Research Centre, Hannover) dealt with how to make accessible the countless types of information that are stored in web archives nowadays. One method is the Wayback Machine, which can be used to retrieve different versions of websites. However, it would be desirable to retrieve not URLs with timestamps, but the objects themselves – such as scientific software. The “Tempas TimePortal”, a web service developed in the context of the Scientific Information Service Mathematics, offers users the option of displaying the status of a software program at a certain point in the past – such as the date on which a scientific article was published.
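To make the phrase “URLs with timestamps” concrete: the Wayback Machine addresses an archived page by combining a 14-digit timestamp (year, month, day, hour, minute, second) with the original URL. The Python sketch below only builds such an address; it is not part of the Tempas TimePortal, and the function name `wayback_url` and the example page are chosen for illustration.

```python
from datetime import datetime

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, moment: datetime) -> str:
    """Build a Wayback Machine address for `original_url` at `moment`.

    The archive serves the capture closest to the requested timestamp,
    so the moment does not have to match a capture exactly.
    """
    timestamp = moment.strftime("%Y%m%d%H%M%S")  # e.g. 20170510000000
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


# Illustrative example: a project website as of the first conference day.
print(wayback_url("https://www.r-project.org/", datetime(2017, 5, 10)))
```

The address identifies a page at a point in time, not the software object described on it, which is the gap the talk pointed to.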
After numerous informative talks, the first day of the conference finally came to a close with a get-together at the Old Town Hall in Hannover. In the course of the evening, participants had the opportunity to network, making the most of the ample time available to discuss the day’s presentations in a historic environment.
Impact and sustainability thanks to accessible, reusable and open software
“BigDataEurope – The collaborative creation of an open software platform for researchers addressing Europe’s societal challenges” was the title of the presentation given by Sören Auer (University of Bonn; Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS). Starting off the second day of the conference, Auer’s talk centred on the management and analysis of Big Data, i.e. large, complex volumes of data. Big Data opens up numerous possibilities in areas such as healthcare, energy, transport and climate change, and these possibilities may help us tackle societal challenges. The problem is that integrating Big Data in these areas is often difficult. This is where the collaborative platform “BigDataEurope” comes into play. The aim of this platform, led by Fraunhofer IAIS, is to support European companies and institutions from these communities in using and working with Big Data: to show how Big Data can be exploited for their purposes, to make it easier to access Big Data applications, and to integrate them into existing work processes, thereby promoting the more widespread use of Big Data in different areas.
In his presentation “Software sustainability – guidelines for the selfish scientist”, Neil Chue Hong (Software Sustainability Institute, UK) stated that researchers often give little thought to how they deal with software, and do not consider themselves programmers. He called for a rethink towards greater recognition of software sharing and for an improvement in programming skills among researchers. After all: “Better software = better research”.
In his talk “How to tidy up the jungle of mathematical models. A prerequisite for sustainable research software”, Thomas Koprucki (Weierstrass Institute for Applied Analysis and Stochastics – WIAS, Berlin) explored the topic of mathematical modelling and simulation (MMS), which is fundamental to scientific work in many specialist areas nowadays. He stated that mathematical modelling and simulation are just as much a part of a scholar’s research results as software and research data, so there is also a need for an infrastructure in this area to improve the sustainability of MMS.
The lecture “Jupyter and IPython facilitating open access and reproducible research” by Benjamin Ragan-Kelley (Simula Research Laboratory; Jupyter, Norway) focused on the open source web application “Jupyter Notebook”, which is used to create and share documents interactively. Such documents can contain code as well as text, visualisations and calculations. Besides its interactive aspects, what also makes Jupyter special is that the notebook documents store representations of all content, including, for example, the inputs to calculations and explanatory text, enabling the scientific approach to be reproduced.
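What “storing representations of all content” means in practice can be sketched with a hand-written miniature notebook. Notebook files are JSON documents; the Python example below builds a reduced structure following the general layout of notebook format version 4 (a list of cells, where code cells carry their source and their outputs). The notebook content and the helper `code_inputs_and_outputs` are made up for illustration, and a real notebook contains more metadata than shown here.

```python
import json

# Hand-written, reduced notebook document (assumed layout: nbformat 4).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "Mean of three measurements",
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": "sum([2.0, 4.0, 6.0]) / 3",
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": "4.0"},
                }
            ],
        },
    ],
}


def code_inputs_and_outputs(nb):
    """Return (source, results) pairs for every code cell in the notebook."""
    pairs = []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        results = [
            out["data"]["text/plain"]
            for out in cell["outputs"]
            if out["output_type"] == "execute_result"
        ]
        pairs.append((cell["source"], results))
    return pairs


# Round trip through JSON, as if the notebook were saved and reopened.
restored = json.loads(json.dumps(notebook, indent=1))
for source, results in code_inputs_and_outputs(restored):
    print(source, "->", results)
```

Because the explanation, the input and the stored result travel together in one file, a reader can rerun the input and compare it with the stored result.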
“Blockchain for science and knowledge creation: An intro and overview” was the title of the presentation given by Sönke Bartling (Alexander von Humboldt Institute for Internet and Society (HIIG), Berlin), which explored how blockchain technology can be used as a decentralised and transparent data register for Open Science. In addition to the primary task of collecting data, all other steps in the research cycle (for instance, data analysis and its clear identification) could be undertaken in a blockchain system. In this way, for example, processed data could be clearly attributed by means of a timestamp. In addition, the blockchain environment facilitates the provision of computer programs, known as smart contracts, that can perform defined tasks. Thanks to decentralised access to data sources, they also make it possible, for example, to evaluate sensitive data covered by data protection laws without creating a link to a person.
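The idea of a timestamped, tamper-evident register can be illustrated with a toy hash chain in Python. This is a deliberately simplified sketch and not a real blockchain: there is no network, no consensus mechanism and no smart contract. All names (`make_block`, `is_valid`) and the example records are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first block


def _digest(data, timestamp, previous_hash):
    """SHA-256 over the block content, serialised deterministically."""
    payload = {"data": data, "timestamp": timestamp, "previous_hash": previous_hash}
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_block(data, timestamp, previous_hash):
    """Create a block that records data, a timestamp and its predecessor."""
    return {
        "data": data,
        "timestamp": timestamp,
        "previous_hash": previous_hash,
        "hash": _digest(data, timestamp, previous_hash),
    }


def is_valid(chain):
    """Check every block's own hash and its link to the previous block."""
    for i, block in enumerate(chain):
        expected = _digest(block["data"], block["timestamp"], block["previous_hash"])
        if block["hash"] != expected:
            return False
        if i > 0 and block["previous_hash"] != chain[i - 1]["hash"]:
            return False
    return True


# Two made-up research steps, each with a timestamp.
chain = [make_block("raw measurements, version 1", "2017-05-10T09:00:00Z", GENESIS_HASH)]
chain.append(make_block("analysis of version 1", "2017-05-11T09:00:00Z", chain[-1]["hash"]))

print(is_valid(chain))  # the untouched chain verifies
```

Changing the data or the timestamp of an earlier entry alters its hash, so the check fails; this is what allows an entry to be tied to a particular point in time.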
James Littlejohn (Edinburgh Napier University, UK) addressed the three pillars of blockchain – cryptography, efficiency and behaviour – in his presentation “Dsensor.org peer to peer science”. He demonstrated how blockchain technology can be used to make academic research open and transparent (“to keep science honest”). In a live demonstration using pharmaceutical products as an example, he connected a software program for treating excess cholesterol to the corresponding database within a blockchain environment. He showed that blockchain technology may help scholars to make academic research more comprehensible, so that blockchains could be used to address the lack of reproducibility of many scientific findings.
Scientific software – there’s still a long way to go
The conference offered a forum where scientists and experts from infrastructure facilities could share information about their requirements, expectations and needs concerning scientific software in practice. The presentations on the various aspects of scientific software clearly showed that a lot of work still has to be done in this area: for example, establishing rules for dealing with scientific software; creating infrastructures for storing software; clarifying copyright issues regarding the development of software; recognising software code as scientific output; and using blockchain technology in science. “Non-textual materials such as software play an important role in science; the topic has by no means been explored comprehensively,” said Irina Sens in her closing statement on the second day of the conference. “A number of issues could be discussed at a Third Conference on Non-Textual Information,” she stated, raising the possibility of continuing the conference series.
The lectures delivered at the conference are available in TIB’s AV-Portal.
More information about the conference: www.nontextualinformation2017.de.