Reusability
Note
These standards are designed to facilitate reuse of model code, which in principle supports reproducibility claims and verification of model results. Comments and suggestions are welcome and will be carefully considered by the OMF Working Groups and Membership. The goals and minimum implementation standards aim to capture concerns and practices among the members of OMF. Individual application domains may extend these standards to capture additional context relevant to their domain.
Overview of Reusability Standards
In this document we adopt the reuse terminology defined in the FAIR Principles for Research Software.
Reusability implicitly includes usability and focuses on the ability of humans and machines to execute, inspect, and understand the software so that it can be modified, built upon, or incorporated into other software.
Goals for Reusability Standards
A reusable computational model can be executed, understood, modified, built upon, or incorporated into other software.
Minimal Reusability Standards
The following is a minimal set of guidelines that journals can adopt to ensure that submitted publications meet baseline reproducibility and reusability requirements.
Reusable computational models must:
- meet OMF minimal standards for Accessibility and Documentation
- have a clear and accessible OSI-approved open source license
- include detailed metadata that facilitate reuse (e.g., input and output semantics, data types, units)
- include detailed provenance on authorship and contributions
- provide qualified information on all software and system dependencies with versions (operating system, software and system libraries)
- provide clear instructions on how to execute the software
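As an illustration of the dependency requirement above, the following is a minimal sketch (not an OMF-prescribed format) of how a Python-based model could record its operating system, interpreter version, and library versions in a machine-readable form. The function name `describe_environment`, the JSON layout, and the example package names are assumptions made for this sketch.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages):
    """Return a dict recording the OS, Python version, and package versions.

    `packages` is a list of distribution names the model depends on.
    The layout of the returned dict is an illustrative assumption,
    not a format defined by these standards.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap explicitly instead of silently dropping it.
            versions[name] = None
    return {
        "operating_system": {
            "system": platform.system(),
            "release": platform.release(),
            "machine": platform.machine(),
        },
        "python": platform.python_version(),
        "packages": versions,
    }


if __name__ == "__main__":
    # "numpy" and "pandas" are placeholders; list the model's real dependencies.
    print(json.dumps(describe_environment(["numpy", "pandas"]), indent=2))
```

The resulting JSON could be committed alongside the model code or attached to an archived release so that readers can see which environment produced the published results.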
Ideal Reusability Standards
In order to meet the ideal standards, computational models should:
- favor open file formats for data inputs and outputs (e.g., CSV, netCDF, GeoJSON, Parquet, Feather)
- provide durable containerization recipes (i.e., archival quality container images)
- include relevant output analyses, data pipelines, and/or workflows
- include metadata on related research outputs (publications, other software, relationship)
- use continuous integration services that run automated tests on the software
- for software with large compute or data requirements, provide representative input data samples along with the sampling methodology
- adhere to additional community-established, domain-specific standards
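Two of the items above, open file formats and representative input samples with a documented sampling methodology, can be sketched together. The example below is a minimal sketch that assumes simple random sampling with a fixed seed; other sampling designs may suit a given model better. The function names, the methodology fields, and the `site_id` / `rainfall_mm` columns are hypothetical.

```python
import csv
import io
import random


def draw_sample(rows, sample_size, seed):
    """Draw a simple random sample without replacement.

    Returns the sampled rows (in their original order) together with a
    small dict describing how the sample was drawn, so the methodology
    can be published with the data.
    """
    rng = random.Random(seed)  # local generator so the draw is repeatable
    k = min(sample_size, len(rows))
    picked = sorted(rng.sample(range(len(rows)), k))
    sample = [rows[i] for i in picked]
    methodology = {
        "method": "simple random sample without replacement",
        "seed": seed,
        "population_size": len(rows),
        "sample_size": k,
    }
    return sample, methodology


def to_csv(rows, fieldnames):
    """Serialize rows (a list of dicts) to CSV text, an open file format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical full input data set standing in for a large model input.
    full_input = [{"site_id": i, "rainfall_mm": 10 * i} for i in range(1000)]
    sample, methodology = draw_sample(full_input, sample_size=20, seed=42)
    print(methodology)
    print(to_csv(sample, fieldnames=["site_id", "rainfall_mm"]))
```

Publishing the methodology record next to the sample lets others regenerate the same subset from the full data set, or draw a comparable one.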
Cyberinfrastructure and Tools to Support Reusability Standards
Build Docker images from research code:
- stencila/dockta https://github.com/stencila/dockta
- ReproZip https://www.reprozip.org/
- SciUnit https://github.com/scidash/sciunit
- binder https://mybinder.org/
- repo2docker https://repo2docker.readthedocs.io (used by binder)
Computational Archives:
OMF may consider developing scaffolding for common modeling frameworks that reduces friction of adoption:
- examples: https://github.com/uwescience/shablona and https://github.com/geodynamics/software_template
- GitHub bot that can help improve compliance with minimal / ideal standards
- cookiecutter project structure that supports best practices for reproducibility and reusability (e.g., Cookiecutter Data Science)
Examples and References
- Lorena Barba’s reproducible workflow for computational fluid dynamics https://github.com/barbagroup/cloud-repro
- https://carpentries-incubator.github.io/good-enough-practices/
- http://www.practicereproducibleresearch.org/
- Software Deposit Guidelines from SSI
- Proposed Standards for Peer-Reviewed Publication of Computer Code
- TODO: find or build example codebases that meet minimal and ideal standards
Issues / Errata
Dependencies on commercial or closed source products are fine so long as they are clearly qualified with version and operating system, e.g., MATLAB R2016b (Windows 10), AnyLogic 8.7 (Windows 10), ArcGIS 10.8.1 (macOS 10.15), NetLogo 6.2.1 (Ubuntu 20.04 LTS).