New version of Ontology Development Kit – now with Docker support

This is an update to a previous post, “Creating an ontology project”.

Version 1.1.2 of the ODK is available on GitHub.

The Ontology Development Kit (ODK; formerly ontology-starter-kit) provides a way of creating an ontology project ready for pushing to GitHub, with a number of features in place:

  • A Makefile that specifies your release workflow, including building imports, creating reports and running tests
  • Continuous integration: A .travis.yml file that configures Travis-CI to check any Pull Requests using ROBOT
  • A standard directory layout that makes working with different projects easier and more predictable
  • Standardized documentation and additional file artifacts
  • A procedure for releasing your ontologies using the GitHub release mechanism

The overall aim is to borrow as much as possible from modern software engineering practice and apply it to the ontology development lifecycle.

The basic idea is fairly simple: a template folder contains a canonical repository layout; this is copied into a target area, with template variables replaced by user-supplied values.
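Conceptually, it is something like the following (a loose sketch only; the real seed script is more sophisticated, and the MY_ONTOLOGY_ID placeholder is invented for illustration):

# copy the canonical layout, then fill in template variables
cp -r template/ target/myont/
grep -rl MY_ONTOLOGY_ID target/myont | xargs sed -i 's/MY_ONTOLOGY_ID/myont/g'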

Among the recent improvements, I will focus here on the adoption of Docker within the ODK. Most users of the ODK don’t really need to know much about Docker – just that they have to install it, and that it runs their ontology workflow inside a container. This has multiple advantages: ontology developers don’t need to install a suite of semi-independent tools, and execution of workflows becomes more predictable and easier to debug, since the environment is standardized. I will provide a bit more detail here for those who are interested.

What is Docker?

From Wikipedia: Docker is a program that performs operating-system-level virtualization, also known as containerization. Docker can run containers on your machine, where each container bundles its own tools and environment.
(Figure: Docker architecture – Docker containers, from Docker 101)
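If you have Docker installed, you can see this in action by running a container from the standard hello-world image:

# pulls the image if needed, runs it as a container, and removes the container on exit
docker run --rm hello-world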

A common use case for Docker is deploying services. In this particular case we’re not deploying a service but are instead using Docker as a means of providing and controlling a standard environment with various command line tools.

The ODK Docker container

The ODK Docker container, known as odkfull, is available from the obolibrary organization on DockerHub. As of the latest release, it comes bundled with a number of tools:

  • A standard Unix environment, including GNU Make
  • ROBOT v1.1.0 (Java)
  • Dead Simple OWL Design Patterns (DOSDP) tools v0.9.0 (Scala)
  • Associated Python tooling for DOSDPs (Python 3)
  • OWLTools (for older workflows) (Java)
  • The ODK seed script (Perl)
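You can pull the image and open a shell inside it to inspect this environment yourself (a sketch; the mount of the current directory to /work is illustrative):

docker pull obolibrary/odkfull
docker run -v "$PWD":/work -w /work --rm -ti obolibrary/odkfull bash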

There are a few different scenarios in which an odkfull container is executed:

  • As a one-time run when setting up a repository using seed-via-docker.sh (which wraps a script that does the actual work)
  • After initial setup and pushing to GitHub, ontology developers may wish to execute parts of the workflow locally – for example, extracting an import module after adding new seeds for external ontology classes
  • Travis-CI uses the same container used by ontology developers
  • Embedding within a larger pipeline

Setting up a repo

Typing

./seed-via-docker.sh

will initiate the process of making a new repo, depositing the results in the target/ folder. This is all done within a container. The seed process will generate a workflow in the form of a Makefile, and then run that workflow, all in the container. The final step of pushing the repo to GitHub is currently done by the user directly in their own environment, rather than from within the container.
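For example, a seed run might look like the following (a sketch based on the seed script’s options in this release; -d names an ontology to import, -u your GitHub user or organization, -t the ontology title, and the final argument is the ontology ID; my-org and myont are placeholders):

./seed-via-docker.sh -d pato -d ro -u my-org -t "My Example Ontology" myont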

Running parts of the workflow

Note that repos built from the ODK will include a one-line script in the src/ontology folder* called “run.sh”. This is a simple wrapper for running the Docker container. (If you built your repo from an earlier ODK, you can simply copy this script in.)
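For reference, the wrapper is essentially the following (a sketch consistent with the footnote below; the copy generated in your repo may differ in detail):

#!/bin/sh
# mount the whole repo (two levels up) and run the supplied command
# from src/ontology inside the odkfull container
docker run -v "$PWD/../..":/work -w /work/src/ontology --rm -ti obolibrary/odkfull "$@"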

Now, instead of typing

make test

the ontology developer can type

./run.sh make test

The former requires that the user have all the relevant tooling installed (which at minimum requires Xcode on OS X, which not all curators have). The latter requires only Docker.
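For example, to refresh an import module after adding new seed terms (the target name below is illustrative; the actual targets are defined in the generated Makefile):

./run.sh make imports/ro_import.owl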

Travis execution

Note that the generated .travis.yml file will be configured to run the Travis job in an odkfull container. If you generated your repo using an earlier ODK, you can manually adapt your existing Travis file.
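For orientation, the relevant part of such a configuration looks roughly like this (a hedged sketch, not the verbatim generated file):

sudo: required
services:
  - docker
script:
  - docker run -v $PWD:/work -w /work/src/ontology --rm obolibrary/odkfull make test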

Is this the right approach?

Docker may seem quite heavyweight for something like running an ontology pipeline. Before deciding on this path, we ran some tests with volunteers in GO who were not familiar with Docker. These editors need to rebuild import files frequently, and having everyone install their own tools has not worked out so well in the past. Preliminary results indicate that the editors are happy with this approach.

It may be that in the future more can be triggered directly from within Protege; or some ontology environments, such as Tawny-OWL, may prove powerful enough to do everything from one tool chain. But for now the reality is that many ontology workflows require a fairly heterogeneous set of tools to operate, and there is frequently a need to use multiple languages, which complicates the install environment. Docker provides a nice way to unify this.

We’ll put this into practice at ICBO this week, in the Phenotype Ontology and OBO workshops.

Acknowledgments

Thanks to the many developers and testers: David Osumi-Sutherland, Nico Matentzoglu, Jim Balhoff, Eric Douglass, Marie-Angelique Laporte, Rebecca Tauber, James Overton, Nicole Vasilevsky, Pier Luigi Buttigieg, Kim Rutherford, Sofia Robb, Damion Dooley, Citlalli Mejía Almonte, Melissa Haendel, David Hill, Matthew Lange.

More help and testers wanted! See: https://github.com/INCATools/ontology-development-kit/issues

Footnotes

* The Makefile has always lived in the src/ontology folder, but the build process requires the whole repo, so the run.sh wrapper maps two levels up. It looks a little odd, but it works. In the future, if there is demand, we may move the Makefile to the root folder.


Debugging Ontologies using OWL Reasoning. Part 1: Basics and Disjoint Classes axioms

This is the first part in a series on pragmatic techniques for debugging ontologies. See also part 2

All software developers are familiar with the concept of debugging, a process for finding faults in a program. The term ‘bug’ has been used in engineering since the 19th century, and was used by Grace Hopper to describe a literal bug gumming up the works of the Mark II computer. Since then, debugging and debugging tools have become ubiquitous in computing, and the modern software developer is fortunate enough to have a large array of tools and techniques at their disposal. These include unit tests, assertions and interactive debuggers.

(Figure: the original bug – the moth taped into the Mark II log book)

Ontology development has many parallels with software development, so it’s reasonable to assume that debugging techniques from software can be carried over to ontologies. I’ve previously written about the use of continuous integration in ontology development, and it is now standard to use Travis to check pull requests on ontologies. Of course, there are important differences between software and ontology development. Unlike typical computer programs, ontologies are not executed, so the concept of an interactive debugger stepping through an execution sequence doesn’t quite translate to ontologies. However, there are still a wealth of tooling options for ontology developers, many of which are under-used.

There is a great deal of excellent academic material on the topic of ontology debugging; see for example the 2013 and 2014 proceedings of the excellently named Workshop on Debugging Ontologies and Ontology Mappings (WoDOOM), or the seminal Debugging OWL Ontologies. However, many ontology developers may not be aware of some of the more basic ‘blue collar’ techniques in use for ontology debugging.

Using OWL Reasoning and disjointness axioms to debug ontologies

In my own experience, one of the most effective means of finding problems in ontologies is through the use of OWL reasoning. Reasoning is frequently used for automated classification, and this is supported in tools such as ROBOT through the reason command. In addition to classification, reasoning can also be used to debug an ontology, usually by determining whether the ontology is incoherent. The term ‘incoherent’ isn’t a value judgment here; it’s a technical term for an ontology that is either inconsistent or contains unsatisfiable classes, as described in this article by Robert Stevens, Uli Sattler and Phillip Lord.

A reasoner will not find bugs without some help from you, the ontology developer.


You have to impart some of your own knowledge of the domain into the ontology in order for incoherency to be detected. This is usually done by adding axioms that constrain the space of what is possible. The Ontogenesis article has a nice example using red blood cells and the ‘only’ construct. I will give another example using the DisjointClasses axiom type; in my experience working on large inter-related ontologies, disjointness axioms are one of the most effective ways of finding bugs (and they have the added advantage of being within the profile of OWL understood by ELK).

Let’s take the following example, a slice of an anatomical ontology dealing with cranial nerves. The underlying challenge here is the fact that the second cranial nerve (the optic nerve) is not in fact a nerve, as it is part of the central nervous system (CNS), whereas true nerves are part of the peripheral nervous system (PNS). This seeming inconsistency has plagued different anatomy ontologies.

Prefix: : <http://example.org/>
Ontology: <http://example.org>
ObjectProperty: part_of
Class: CNS
Class: PNS
Class: StructureOfPNS EquivalentTo: part_of some PNS
Class: StructureOfCNS EquivalentTo: part_of some CNS
DisjointClasses: StructureOfPNS, StructureOfCNS
Class: Nerve SubClassOf: part_of some PNS
Class: CranialNerve SubClassOf: Nerve
Class: CranialNerveII SubClassOf: CranialNerve, part_of some CNS

(Figure: the CNS/PNS disjointness example)

You may have noted that this example uses slightly artificial classes of the form “Structure of X”. These are not strictly necessary; we’ll return to this when we discuss General Class Inclusion (GCI) axioms in a future part.

If we load this into Protege and switch on the reasoner, we will see that CranialNerveII shows up red, indicating that it is unsatisfiable and rendering the ontology incoherent. We can easily find all unsatisfiable classes under the built-in ‘Nothing’ class in the inferred hierarchy view. Clicking on the ‘?’ button will make Protege show an explanation, such as the following:

(Screenshot: Protege’s explanation for the unsatisfiability of CranialNerveII)

This shows all the axioms that lead to the conclusion that CranialNerveII is unsatisfiable. At least one of these axioms must be wrong (for example, the assumption that all cranial nerves are nerves may be terminologically justified, but could be wrong here; or perhaps it is the assumption that CN II is actually a cranial nerve; or we may simply want to relax the constraint and allow spatial overlap between peripheral and central nervous system parts). The ontology developer can then set about fixing the ontology until it is coherent.

Detecting incoherencies as part of a workflow

Protege provides a nice way of finding ontology incoherencies, and of debugging them by examining explanations. However, it is still possible to accidentally release an incoherent ontology, since the ontology editor is not compelled to check for unsatisfiabilities in Protege prior to saving. It may even be possible for an incoherency to be inadvertently introduced through changes to an upstream dependency, for example, by rebuilding an import module.

Luckily, if you are using ROBOT to manage your release process, it should be all but impossible to accidentally release an incoherent ontology. This is because the robot reason command will throw an error if the ontology is incoherent. If you are using ROBOT as part of a Makefile-based workflow (as configured by the ontology starter kit), this will block progression to the next step, as ROBOT returns a non-zero exit code when performing a reasoner operation on an incoherent ontology. Similarly, if you are using Travis-CI to vet pull requests or check the current ontology state, the Travis build will automatically fail if an incoherency is encountered.
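As a minimal sketch of that blocking behaviour in a Makefile (file and target names are illustrative, not those generated by the kit):

# robot reason exits non-zero if the ontology is incoherent,
# so make stops here and the release step is never reached
myont-reasoned.owl: myont-edit.owl
        robot reason -r ELK -i $< -o $@

release: myont-reasoned.owl
        cp $< myont.owl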


(Figure: ROBOT reason flow diagram. Exiting with code 0 indicates success; non-zero indicates failure.)

Running robot reason on our example ontology yields:

$ robot reason -r ELK -i cranial.omn
ERROR org.obolibrary.robot.ReasonerHelper - There are 1 unsatisfiable classes in the ontology.
ERROR org.obolibrary.robot.ReasonerHelper -     unsatisfiable: http://example.org/CranialNerveII

Generating debug modules – incoherent SLME

Large ontologies can strain the limits of the laptop computers usually used to develop them. In such cases it can be useful to make something analogous to a ‘core dump’ in software debugging: a minimal standalone component that reproduces the bug. Here, that is a module (extracted using a standard technique such as SLME) seeded by all unsatisfiable classes (there may be more than one). This provides sufficient axioms to generate all explanations, plus additional context.

I use the term ‘unsatisfiable module’ for this artefact. It can be generated using the robot reason command with the --dump-unsatisfiable option (-D for short, as used in the Makefile target below).

In our Makefiles we often have a target like this:

debug.owl: my-ont.owl
        robot reason -i $< -r ELK -D $@

If the ontology is incoherent then “make debug.owl” will make a small-ish standalone file that can be easily shared and quickly loaded in Protege for debugging. The ontology will be self-contained with no imports – however, if the axioms come from different ontologies in an import chain, then each axiom will be annotated with the source ontology, making it easier for you to track down the problematic import. This can be very useful for large ontologies with multiple dependencies, where there may be different versions of the same ontology in different import chains. 

Coming up

The next article will deal with the case of detecting unwanted equivalence axioms in ontologies, and future articles in the series will deal with practical tips on how best to use disjointness axioms and other constraints in your ontologies.

Carry on reading: Part 2, Unintentional Entailed Equivalence


Acknowledgments

Thanks to Nico Matentzoglu for comments on a draft of this post.