Ontologies and Continuous Integration
February 16, 2012
Wikipedia describes continuous integration (http://en.wikipedia.org/wiki/Continuous_integration) as follows:
In software engineering, continuous integration (CI) implements continuous processes of applying quality control — small pieces of effort, applied frequently. Continuous integration aims to improve the quality of software, and to reduce the time taken to deliver it, by replacing the traditional practice of applying quality control after completing all development.
This description could – or should – apply equally well to ontology engineering, especially in contexts such as the OBO Foundry, where ontologies are becoming increasingly interdependent.
Jenkins is a web-based environment for running integration checks. Sebastian Bauer, in Peter Robinson’s group, had the idea of adapting Jenkins to perform ontology builds rather than software builds (in fact he used Hudson, but the differences between Hudson and Jenkins are minimal). He used Oort as the tool to build the ontology: Oort takes in one or more ontologies in obo or owl, runs some non-logical and logical checks (the latter via your choice of reasoner) and then “compiles” downstream ontologies in obo and owl formats. Converting to obo takes care of a number of stylistic checks that are non-logical and wouldn’t be caught by a reasoner (e.g. no class can have more than one text definition).
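To make the “more than one text definition” check concrete, here is a minimal sketch of what such a non-logical check could look like. This is not how Oort implements it; the function name, the sample term IDs and the simplified line-by-line reading of the obo format are all assumptions made for illustration.

```python
def terms_with_multiple_definitions(obo_text):
    """Return IDs of [Term] stanzas that carry more than one `def:` tag.

    Simplified, illustrative reading of the obo format; not Oort's code.
    """
    offenders = []
    stanza_type, term_id, def_count = None, None, 0
    # A trailing sentinel stanza header flushes the final stanza.
    for line in obo_text.splitlines() + ["[EOF]"]:
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza begins: report the one we just finished, if needed.
            if stanza_type == "[Term]" and def_count > 1:
                offenders.append(term_id)
            stanza_type, term_id, def_count = line, None, 0
        elif line.startswith("id:"):
            term_id = line[len("id:"):].strip()
        elif line.startswith("def:"):
            def_count += 1
    return offenders


# Hypothetical sample ontology; the EX: identifiers are made up.
SAMPLE = """format-version: 1.2

[Term]
id: EX:0000001
name: example one
def: "First definition." []
def: "Second definition." []

[Term]
id: EX:0000002
name: example two
def: "Only definition." []
"""

print(terms_with_multiple_definitions(SAMPLE))  # ['EX:0000001']
```

A reasoner would never flag this, since two text definitions are logically harmless; it is a purely editorial constraint, which is why a check like this belongs in the build.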
We took this idea and built our own Jenkins ontology build environment, adding ontologies that were of relevance to the projects we were working on. This turned out to be extraordinarily easy – Jenkins is simple to install and configure, and help is always just a mouse click away.
Here’s a screenshot of the main Jenkins dashboard. Ontologies have a blue ball if the last build was successful, a red ball otherwise. The weather icon is based on the “outlook” – lots of successful builds in a row earns a sunny icon. Every time an ontology is committed to a repository it triggers a build (we try to track the edit version of the ontology rather than the release version, so that we can provide direct feedback to the ontology developer). Each job can be customized – for example, if ontology A depends on ontology B, you might want to trigger a build of A whenever a new version of B is committed, allowing you to be forewarned if something in B breaks A.
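The dependency-triggered builds are set up in the Jenkins job configuration rather than in code, but the underlying logic is easy to sketch. The snippet below is only an illustration: the dependency map, the ontology names and the function name are hypothetical, and it simply computes which jobs would need to run after a commit.

```python
from collections import deque

# Hypothetical dependency map: each ontology lists the ontologies it depends on.
DEPENDS_ON = {
    "A": ["B"],
    "B": [],
    "C": ["A"],
    "D": [],
}


def builds_to_trigger(changed, depends_on):
    """Return the changed ontology followed by everything downstream of it,
    in breadth-first order."""
    # Invert the map so we can look up "who depends on X?".
    dependents = {}
    for onto, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(onto)

    order, seen, queue = [], {changed}, deque([changed])
    while queue:
        current = queue.popleft()
        order.append(current)
        for downstream in sorted(dependents.get(current, [])):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return order


print(builds_to_trigger("B", DEPENDS_ON))  # ['B', 'A', 'C']
```

Here a commit to B rebuilds B, then A (which depends on B), then C (which depends on A), while the unrelated D is left alone.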
Below is a screenshot of the configuration settings for the go-taxon build – this is used to check whether there are violations of the GO taxon constraints (dx.doi.org/10.1016/j.jbi.2010.02.002). We also include an external ontology of disjointness axioms (for various reasons it’s hard to include this in the main GO ontology). You can include any shell commands you like – in principle it would be possible to write a Jenkins plugin for building ontologies using Oort, but for now you have to be semi-familiar with the command line and the Oort command line options.
Often, when a job fails, the Oort output can be a little cryptic – the usual protocol is then to investigate in detail using Protege and a reasoner like HermiT to track down the exact problem.
The basic idea is very simple, but works extremely well in practice. Whilst it’s generally better to have all checks performed directly in the editing environment, this isn’t always possible where multiple interdependent ontologies are concerned. The Jenkins environment we’ve built has proven popular with ontology developers, and we’d be happy to add more ontologies to it. It’s also fairly easy to set up yourself, and I’d recommend doing this for groups developing or using ontologies in a mission-critical way.
I uploaded some slides on ontologies and continuous integration to slideshare.
The article Continuous Integration of Open Biological Ontology Libraries is available on the Bio-Ontologies SIG KBlog site.