Publication and Citation of Scientific Software with Persistent Identifiers
Software has become an integral part of science, yet it is not properly integrated into scientific discourse. We will look at the requirements for software publication, code archives, and persistent identification of software. Are you interested in getting involved? Join us at one of these events.
The Missing Link
Findings presented in papers are based on data, and once in a while they are accompanied by that data, but rarely by software. For several years it has been standard practice to publish data professionally in a scientific context, either together with papers or on its own. By contrast, this is not standard practice for the related software. Yet findings are based not only on raw data but also on data derived through analyses, which are most likely supported by software. The software used to obtain findings therefore plays a crucial role in scientific work. Nevertheless, software is rarely regarded as publishable in the sense of a scientific publication, although it is the link between the findings presented in papers and the data on which those findings are based. Without the software, researchers may be unable to reproduce the findings, which conflicts with the principle of reproducibility in the natural sciences. Software is made available in various ways, primarily through solutions that originated in the free and open source software movement, but this provision lacks solutions that serve researchers’ needs regarding software used in a scientific context. Making software available as a kind of scientific publication would therefore supply the missing link between findings, data, and software, and would foster their interplay.
Disciplinary journals require that articles discuss scientific problems. Software, however, is often seen only as a contribution to the solution of a question or problem, not as an independent contribution to science. Authors of software must therefore first find a question that motivates publication in the desired journal. Releasing software directly as a kind of scientific publication is not possible. As a result, the scientific achievements of software and its contributions to science are poorly perceived and hardly measurable. The resulting gap in interdisciplinary communication about scientific software might be closed by software publications in new types of journals, by a common understanding of how to handle scientific software with defined processes, and by commonly accepted and adopted metrics. Software, which occupies an increasingly prominent place in research and has become an indispensable commodity, especially in the natural sciences, could then be valued and assessed as a contribution to science.
Scientific software development often implies that the software and code are not written for others to use. The code is therefore kept and maintained on the developers' own computers and servers. When the code grows or groups work together, code repositories and version control systems are set up. In many cases these systems are available for internal use only, so external users have no access to the code. Reuse, where it happens, is mostly informal or anonymous, even in the sciences. Moreover, scientists use existing software and code, for example from open source software repositories, but only a few contribute their code back to those repositories. For cooperation and reuse of software there are already a number of platforms, such as SourceForge and GitHub, which scientists already use to provide scientific software code. Unfortunately, these platforms only partly fulfil the need to serve software and code in a scientific context, as part of scientific work and scientific tradition. It is unclear whether these platforms can be augmented for scientific purposes or whether special repositories must be created that foster not only open source but also open science. Fostering open science includes enabling subsequent users to run the code, e.g. by providing sufficient documentation, sample data sets, tests, and comments, which in turn can be verified by adequate and qualified reviews. This assumes, however, that scientists learn to write and release code and software just as they learn to write and publish papers.
For many programming scientists, the careful treatment of source code, e.g. code design, version control, documentation, and testing, means additional work that is not covered by the primary research task. If reusable software is to be developed and traceability safeguarded, this work has to be planned and supported accordingly. This includes adopting processes that follow the software development life cycle. Furthermore, the adoption of software engineering rules and best practices has to be recognized and accepted as part of scientific performance. Most scientists have little incentive to improve their code and do not publish it, either with their papers or on its own, because software engineering habits are rarely practised by faculty and research facility staff, postdocs, or doctoral and graduate students, and consequently not by undergraduate students either. Software engineering skills are not passed on to the next generation in the way paper writing skills are. It is therefore often felt that the software or code produced is not publishable. Yet the quality of software and its source code has a decisive influence on the quality of the research results obtained and on their traceability. Establishing software engineering best practices that are not only adopted but also adapted to scientific needs is therefore crucial for the success of software publications.