This is an update to a previous post, creating an ontology project.
Version 1.1.2 of the ODK is available on GitHub.
The Ontology Development Kit (ODK; formerly ontology-starter-kit) provides a way of creating an ontology project ready for pushing to GitHub, with a number of features in place:
- A Makefile that specifies your release workflow, including building imports, creating reports and running tests
- Continuous integration: A .travis.yml file that configures Travis-CI to check any Pull Requests using ROBOT
- A standard directory layout that makes working with different projects easier and more predictable
- Standardized documentation and additional file artifacts
- A procedure for releasing your ontologies using the GitHub release mechanism
The overall aim is to borrow as much as possible from modern software engineering practice and apply it to the ontology development lifecycle.
The basic idea is fairly simple: a template folder contains a canonical repository layout. This is copied into a target area, with the template variables replaced by user-supplied values.
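To make the copy-and-substitute idea concrete, here is a minimal shell sketch. It is not the ODK's actual seed script: the `@PROJECT_ID@` placeholder syntax, the `seed_repo` helper, and the demo file contents are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: the @PROJECT_ID@ placeholder and the seed_repo helper are
# hypothetical, not the ODK's real template syntax or seed script.
set -eu

seed_repo() {
  seed_template="$1"   # folder holding the canonical layout
  seed_target="$2"     # folder to create the new repo in
  seed_id="$3"         # user-supplied value (assumed to contain no '/' or '&')

  # Walk every file in the template and write a substituted copy
  # to the same relative path under the target.
  (cd "$seed_template" && find . -type f) | while IFS= read -r seed_rel; do
    mkdir -p "$seed_target/$(dirname "$seed_rel")"
    sed "s/@PROJECT_ID@/$seed_id/g" "$seed_template/$seed_rel" \
      > "$seed_target/$seed_rel"
  done
}

# Demo in a scratch directory with a one-file template.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/template/src/ontology"
printf 'ONT = @PROJECT_ID@\n' > "$demo_dir/template/src/ontology/Makefile"

seed_repo "$demo_dir/template" "$demo_dir/target" myont
cat "$demo_dir/target/src/ontology/Makefile"
```

Run as is, the demo prints the substituted Makefile line, `ONT = myont`. A real seed step would handle several variables and preserve file permissions; this sketch does neither.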
Some recent improvements include:
- Upgrade to the new ROBOT v1.1.0
- Use of Docker
- Inclusion of Design Pattern templates
- Standard set of SPARQL queries
- Minor feature improvements such as interactive mode
I will focus here on the adoption of Docker within the ODK. Most users of the ODK don’t really need to know much about Docker – just that they have to install it, and it runs their ontology workflow inside a container. This has multiple advantages – ontology developers don’t need to install a suite of semi-independent tools, and execution of workflows becomes more predictable and easier to debug, since the environment is standardized. I will provide a bit more detail here for people who are interested.
What is Docker?
From Wikipedia: Docker is a program that performs operating-system-level virtualization also known as containerization. Docker can run containers on your machine, where each container bundles its own tools and environments.
Figure: Docker containers (from Docker 101)
A common use case for Docker is deploying services. In this particular case we’re not deploying a service but are instead using Docker as a means of providing and controlling a standard environment with various command line tools.
The ODK Docker container
The odkfull container bundles the following:
- A standard unix environment, including GNU Make
- ROBOT v1.1.0 (Java)
- Dead Simple OWL Design Patterns (DOSDP) tools v0.9.0 (Scala)
- Associated Python tooling for DOSDPs (Python 3)
- OWLTools (for older workflows) (Java)
- The ODK seed script (Perl)
There are a few different scenarios in which an odkfull container is executed:
- As a one-time run when setting up a repository using seed-via-docker.sh (which wraps a script that does the actual work)
- After initial setup and pushing to GitHub, ontology developers may wish to execute parts of the workflow locally – for example, extracting an import module after adding new seeds for external ontology classes
- Travis-CI uses the same container used by ontology developers
- Embedding within a larger pipeline
Setting up a repo
Running seed-via-docker.sh initiates the process of making a new repo, depositing the results in the target/ folder. This is all done within a container. The seed process generates a workflow in the form of a Makefile and then runs that workflow, all in the container. The final step of pushing the repo to GitHub is currently done by the user directly in their own environment, rather than from within the container.
Running parts of the workflow
Note that repos built from the ODK include a one-line script called “run.sh” in the src/ontology folder*. This is a simple wrapper for running the Docker container. (If you built your repo from an earlier ODK, you can simply copy this script.)
Now, instead of typing
make test
the ontology developer can type
./run.sh make test
The former requires that the user have all the relevant tooling installed (which on OS X requires at least Xcode, which not all curators have). The latter requires only Docker.
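For readers curious about what a wrapper like run.sh does, the sketch below shows one plausible shape. It is an illustration, not a copy of the generated script: the image name `obolibrary/odkfull`, the `/work` mount point, the `odk_run` function, and the `DRY_RUN` switch are all assumptions. The repo root (two levels above src/ontology, as the footnote explains) is mounted into the container, and the working directory is set back to src/ontology so that make finds the Makefile.

```shell
#!/bin/sh
# Sketch of a run.sh-style wrapper, meant to be run from src/ontology.
# Assumptions: image name, /work mount point, and the DRY_RUN switch.
set -eu

ODK_IMAGE="${ODK_IMAGE:-obolibrary/odkfull}"   # assumed image name

odk_run() {
  # Mount the whole repo (two levels up) at /work and start in
  # /work/src/ontology, then pass the user's command through unchanged.
  set -- docker run --rm -v "$PWD/../..:/work" -w /work/src/ontology \
    "$ODK_IMAGE" "$@"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"   # print the command instead of running it
  else
    "$@"
  fi
}

# Show the command that "./run.sh make test" would correspond to.
DRY_RUN=1
odk_run make test
```

With `DRY_RUN` unset, the same call would start the container and run `make test` inside it, which is why only Docker needs to be installed on the host.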
Note that the generated .travis.yml file is configured to run the Travis job in an odkfull container. If you generated your repo using an earlier ODK, you can manually adapt your existing Travis file.
Is this the right approach?
Docker may seem quite heavyweight for something like running an ontology pipeline. Before deciding on this path, we ran some tests with volunteers in GO who were not familiar with Docker. These editors needed to rebuild import files frequently, and having everyone install their own tools had not worked well in the past. Preliminary results indicate that the editors are happy with this approach.
In future, it may be possible to trigger more of the workflow directly from within Protege, and some ontology environments such as Tawny-OWL are powerful enough to do everything from one tool chain. But for now the reality is that many ontology workflows require a fairly heterogeneous set of tools, often written in multiple languages, which complicates the install environment. Docker provides a nice way to unify this.
We’ll put this into practice at ICBO this week, in the Phenotype Ontology and OBO workshops.
Thanks to the many developers and testers: David Osumi-Sutherland, Nico Matentzoglu, Jim Balhoff, Eric Douglass, Marie-Angelique Laporte, Rebecca Tauber, James Overton, Nicole Vasilevsky, Pier Luigi Buttigieg, Kim Rutherford, Sofia Robb, Damion Dooley, Citlalli Mejía Almonte, Melissa Haendel, David Hill, Matthew Lange.
More help and testers wanted! See: https://github.com/INCATools/ontology-development-kit/issues
* Footnote: the Makefile has always lived in the src/ontology folder, but the build process requires the whole repo, so the run.sh wrapper maps two levels up. It looks a little odd, but it works. In future, if there is demand, we may move the Makefile to the root folder.