This year GitHub, FigShare, and Mozilla ScienceLab introduced the minting of DOIs for software. It is worth mentioning that Zenodo is part of the game as well, with less marketing noise. The latest blog post by Arfon Smith (GitHub) will probably boost the number of software publications, so that software may become more citable, may become part of the scientific tradition, and may help address open reproducibility issues.
So one of the best things this year is that key players like GitHub, FigShare, Mozilla ScienceLab, and Zenodo allow software to be labelled with DOIs. The recognition of this topic has gained a new dimension. That’s great!
However, interests matter, especially commercial interests. GitHub has recognized that the amount of scientific software hosted on GitHub has reached a critical mass. Researchers are therefore a target group that may become valuable customers, if they aren’t already. So Arfon Smith, known for co-founding Zooniverse and thus familiar to researchers and the interested public alike, is taking care of GitHub’s fitness for the scientific community. This happens in alliance with other key players, FigShare and Mozilla ScienceLab, accompanied by subtle but noticeable marketing within the scientific community. There is no reason to complain: commercial interests already support the sciences through various business models. In this case GitHub and FigShare / Zenodo should be seen as a new type of publisher, concentrating on publishable material other than plain-text, reviewed papers. This kind of business is welcome as long as it serves the scientific community. Time will tell whether these commercial interests address researchers’ needs and provide solutions for current problems in the sciences. Open Access journals and new publishing approaches, as practised by PLOS and PeerJ, came to life because of the problems encountered in the past. We shouldn’t repeat this painful and long-winded process by cementing half-baked solutions for the publication of software.
So it’s time to mention that the current solution has drawbacks which may become serious problems if not addressed in due course.
1. GitHub attaches DOIs to code copies via FigShare or Zenodo, although it could do this itself. So why does GitHub go through FigShare and Zenodo? Perhaps GitHub wants to save some money on minting DOIs. More importantly, though, GitHub thereby sheds the responsibilities associated with DOIs in the scientific world and can act freely, without any commitments to the scientific community. Sooner or later this will lead to serious problems, at least for researchers. Furthermore, DOIs should refer to code freezes and revisions in a completely different way, rather than just pointing to detached copies in FigShare and Zenodo, which so far are dead ends with no way back.
2. GitHub, FigShare, and Zenodo attach DOIs to code copies. Again: plain code copies! It would make no difference to zip code from a repository or a file system and publish it as ‘data’ with a DOI. Labelling it ‘software’ does not change the fact that it is a static data publication.
3. The connection between a frozen code copy and the further developed software isn’t addressed sufficiently. It’s important to find the way back to the ‘original’ that was copied, and from there onwards to new versions and eventually forks, branches, etc. This could probably be addressed much better if the DOI were minted for the original revision instead of for a copy. Even for the current copy-based solution, it could be addressed with proper metadata and linking. So far this is done insufficiently, e.g. by linking to the main repository without any version information.
4. DOIs on software suggest valuable publications. That isn’t the case here. A DOI is just an identifier enabling citability; the question of quality has not been addressed so far.
5. This means that the topic of software publications in the sense of valuable publications is not addressed either. Yet this topic in particular requires a lot of effort. It’s about breaking with traditional processes and extending them, so that software can be published adequately and seriously in a scientific context. Minting a DOI for a code copy without any quality control is not a serious publication.
6. The software or code is not properly citable. Check examples such as Dynsim, Nimbus, and Scythe, and ask yourself whether the code was simply copied from GitHub to mint a DOI. Then check the ‘Cite this’ and ‘Export’ sections to see what information is provided at all. You may even use DataCite’s search interface to check whether the software is citable. You will notice that the version or revision of the software is missing, along with other information. Using these snippets to cite software in papers will most likely cause problems.
7. It isn’t clear which metadata is set automatically. Publishers and library experts of the DOI world should determine which information should be set automatically and which manually, and especially which fields should be offered in addition to the limited set available now. A comprehensive guideline, clear recommendations, or best practices would also help, so that every DOI-fication of code can proceed with guidance. Normally DOIs are minted by publishers, so submitters don’t have to care much about the DOI world and its library aspects. This is different when transferring GitHub repository copies to FigShare or Zenodo: the metadata is curated by the submitters themselves or set automatically.
8. Without proper metadata for detached copies – with almost no connection to their living originals and not embedded in their ecosystems – software can’t cite software. Referring to third-party libraries and listing the dependencies needed to run the code is just half of the game. This information has an indispensable value: not only citations in papers can be counted, but also how often software is used by other software. Just look at CRAN, where the information on reverse dependencies, imports, and linking makes it easy to see which packages are used often in other packages and are thus important within their ecosystem. Debian provides this information in RDF along with its packages. So citations of software, and of other citable items within software, may lead to important metrics in the future.
9. As mentioned before, metrics aren’t addressed properly, e.g. how software is used by other software and referenced in papers. It’s not yet clear which metrics may be used for evaluation purposes, but available solutions and new ideas are just waiting to be implemented so that experience can be gained in the field of software publications. Software could then be acknowledged in researchers’ publication lists similarly to high-ranking paper publications.
10. Software DOIs are not used to their full potential. DOIs follow a well-defined format, and this format would only have to be extended a bit to follow the footprints of software and its tree of life, e.g. 10.6084/figshare.fidgit.0-0-3 or 10.4321/98765.fidgit.0-0-4. Citations in other software and in papers could then be looked up by simple wildcard searches to gain detailed insights.
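The missing traceability criticized in points 3 and 7 could be caught by something as simple as a completeness check on the submitted metadata. The sketch below assumes a simplified record; the field names (version, repository_url, commit) are illustrative assumptions, not the actual FigShare or Zenodo schema:

```python
# Sketch: check whether a software DOI record carries the fields needed
# to trace a frozen copy back to its living original. Field names are
# illustrative assumptions, not an existing FigShare/Zenodo schema.

REQUIRED_FIELDS = ["title", "authors", "version", "repository_url", "commit"]

def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [f for f in REQUIRED_FIELDS if not record.get(f)]

record = {
    "title": "fidgit",
    "authors": ["A. Researcher"],
    "repository_url": "https://github.com/example/fidgit",
    # no 'version' and no 'commit' -- exactly the gap criticized above
}
print(missing_fields(record))  # → ['version', 'commit']
```

A repository-to-archive pipeline could refuse to mint a DOI until this list is empty, instead of silently publishing an untraceable copy.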
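The software-cites-software metrics of point 8 only need plain dependency lists to get started, as CRAN’s reverse-dependency pages demonstrate. A minimal sketch, with made-up package names, of deriving reverse dependencies from forward ones:

```python
from collections import defaultdict

# Sketch: derive reverse dependencies from plain dependency lists,
# the way CRAN reports them. Package names are made up for illustration.
deps = {
    "pkgA": ["pkgC"],
    "pkgB": ["pkgC", "pkgD"],
    "pkgC": [],
    "pkgD": [],
}

def reverse_dependencies(deps):
    """Map each package to the set of packages that depend on it."""
    rev = defaultdict(set)
    for pkg, used in deps.items():
        for dep in used:
            rev[dep].add(pkg)
    return dict(rev)

rev = reverse_dependencies(deps)
print(sorted(rev["pkgC"]))  # → ['pkgA', 'pkgB']
```

If DOI metadata recorded dependencies as identifiers rather than free text, the same inversion would yield a citation count for software used by other software.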
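The versioned DOI scheme suggested in point 10 would be trivially machine-readable. A sketch of parsing it, using the example DOIs from above; the scheme itself is a proposal, not an existing DataCite convention:

```python
import re

# Sketch: parse the proposed versioned DOI scheme, e.g.
# 10.6084/figshare.fidgit.0-0-3 -> prefix, service, project, version.
DOI_PATTERN = re.compile(
    r"^(?P<prefix>10\.\d{4,9})/"   # registrant prefix
    r"(?P<service>\w+)\."          # hosting service, e.g. 'figshare'
    r"(?P<project>[\w-]+?)\."      # project name
    r"(?P<version>\d+-\d+-\d+)$"   # encoded version, e.g. '0-0-3'
)

def parse_doi(doi):
    """Split a versioned software DOI into its components, or None."""
    m = DOI_PATTERN.match(doi)
    return m.groupdict() if m else None

print(parse_doi("10.6084/figshare.fidgit.0-0-3"))
# → {'prefix': '10.6084', 'service': 'figshare',
#    'project': 'fidgit', 'version': '0-0-3'}
```

A wildcard search such as 10.6084/figshare.fidgit.* could then collect all versions of one piece of software across citations in papers and in other software.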
Keep it up!
In summary, however, the work done so far by GitHub, FigShare, Mozilla ScienceLab, and Zenodo is awesome! It has been an important and overdue step. But it’s just the first step on a long road. The current state is half-baked and shouldn’t be cemented. Please keep it up!
You may also want to cross-check how others mint DOIs for software, e.g. Purdue’s HUBzero-based nanoHUB, which has been minting DOIs for software for a while.
Special thanks go to Kaitlin and Arfon for their input at the BD2K and CW14 workshops, and to Jure for his inspiring post.