New version of Ontology Development Kit – now with Docker support

This is an update to a previous post on creating an ontology project.

Version 1.1.2 of the ODK is available on GitHub.

The Ontology Development Kit (ODK; formerly ontology-starter-kit) provides a way of creating an ontology project ready for pushing to GitHub, with a number of features in place:

  • A Makefile that specifies your release workflow, including building imports, creating reports and running tests
  • Continuous integration: A .travis.yml file that configures Travis-CI to check any Pull Requests using ROBOT
  • A standard directory layout that makes working with different projects easier and more predictable
  • Standardized documentation and additional file artifacts
  • A procedure for releasing your ontologies using the GitHub release mechanism

The overall aim is to borrow as much as possible from modern software engineering practice and apply it to the ontology development lifecycle.

The basic idea is fairly simple: a template folder contains a canonical repository layout; this is copied into a target area, with user-supplied values substituted for the template variables.

There have been a number of recent improvements to the ODK.

I will focus here on the adoption of Docker within the ODK. Most users of the ODK don’t really need to know much about Docker – just that they have to install it, and it runs their ontology workflow inside a container. This has multiple advantages – ontology developers don’t need to install a suite of semi-independent tools, and execution of workflows becomes more predictable and easier to debug, since the environment is standardized. I will provide a bit more detail here for people who are interested.

What is Docker?

From Wikipedia: Docker is a program that performs operating-system-level virtualization, also known as containerization. Docker can run containers on your machine, where each container bundles its own tools and environment.
[Figure: Docker architecture (Docker containers, from Docker 101)]

A common use case for Docker is deploying services. In this particular case we’re not deploying a service but are instead using Docker as a means of providing and controlling a standard environment with various command line tools.
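As a concrete sketch of what this looks like in practice (the image name is the one published on Dockerhub; the mount path and the choice of command are illustrative assumptions):

```shell
# Pull the ODK image, then run one of its bundled tools (here ROBOT)
# in a throwaway container, with the current directory mounted as /work.
docker pull obolibrary/odkfull
docker run --rm -v "$PWD":/work -w /work obolibrary/odkfull robot --version
```

The container is discarded after each run (`--rm`); only the files written into the mounted directory persist.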

The ODK Docker container

The ODK Docker container, known as odkfull, is available from the obolibrary organization on Dockerhub. As of the latest release, it comes bundled with a number of tools:

  • A standard unix environment, including GNU Make
  • ROBOT v1.1.0  (Java)
  • Dead Simple OWL Design Patterns (DOSDP) tools v0.9.0 (Scala)
  • Associated python tooling for DOSDPs (Python3)
  • OWLTools (for older workflows) (Java)
  • The odk seed script (perl)

There are a few different scenarios in which an odkfull container is executed:

  • As a one-time run when setting up a repository using seed-via-docker.sh (which wraps a script that does the actual work)
  • After initial setup and pushing to GitHub, ontology developers may wish to execute parts of the workflow locally – for example, extracting an import module after adding new seeds for external ontology classes
  • Travis-CI uses the same container used by ontology developers
  • Embedding within a larger pipeline

Setting up a repo

Typing

./seed-via-docker.sh

will initiate the process of making a new repo, depositing the results in the target/ folder. This is all done within a container. The seed process generates a workflow in the form of a Makefile and then runs that workflow, all inside the container. The final step of pushing the repo to GitHub is currently done by the user directly in their own environment, rather than from within the container.
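A seed invocation typically supplies a few options. The flags below are assumptions for illustration, not a documented interface; run the script with -h to see what your version actually supports:

```shell
# Hypothetical example: -d lists ontology dependencies, -u is your GitHub
# user/org, -t is the ontology title, and the final argument the ontology ID.
# Check ./seed-via-docker.sh -h for the real option list.
./seed-via-docker.sh -d pato -u mygithubname -t "Sample Ontology" sampo
```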

Running parts of the workflow

Note that repos built from the ODK will include a one-line script in the src/ontology folder* called run.sh. This is a simple wrapper for running the Docker container. (If you built your repo from an earlier ODK, you can simply copy this script.)
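For the curious, the wrapper amounts to something like the following sketch; the exact mount paths, flags and image tag are assumptions, so consult the run.sh generated in your own repo:

```shell
#!/bin/sh
# Sketch of run.sh: mount the repo root (two levels up from src/ontology)
# into the container, make src/ontology the working directory there, and
# run whatever command was supplied (e.g. "make test") inside odkfull.
docker run --rm -ti -v "$PWD/../../":/work -w /work/src/ontology obolibrary/odkfull "$@"
```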

Now, instead of typing

make test

the ontology developer can type

./run.sh make test

The former requires that the user have all the relevant tooling installed (which at a minimum requires Xcode on OS X – not something all curators have). The latter requires only Docker.

Travis execution

Note that the generated .travis.yml file is configured to run the Travis job in an odkfull container. If you generated your repo using an earlier ODK, you can manually adapt your existing Travis file.

Is this the right approach?

Docker may seem quite heavyweight for something like running an ontology pipeline. Before deciding on this path, we ran tests with some GO volunteers who were not familiar with Docker. These editors need to rebuild import files frequently, and having everyone install their own tools had not worked out so well in the past. Preliminary results indicate that the editors are happy with this approach.

It may be that in future more can be triggered directly from within Protege. And some ontology environments, such as Tawny-OWL, are powerful enough to do everything from one tool chain. But for now the reality is that many ontology workflows require a fairly heterogeneous set of tools, and there is frequently a need to use multiple languages, which complicates the install environment. Docker provides a nice way to unify this.

We’ll put this into practice at ICBO this week, in the Phenotype Ontology and OBO workshops.

Acknowledgments

Thanks to the many developers and testers: David Osumi-Sutherland, Nico Matentzoglu, Jim Balhoff, Eric Douglass, Marie-Angelique Laporte, Rebecca Tauber, James Overton, Nicole Vasilevsky, Pier Luigi Buttigieg, Kim Rutherford, Sofia Robb, Damion Dooley, Citlalli Mejía Almonte, Melissa Haendel, David Hill, Matthew Lange.

More help and testers wanted! See: https://github.com/INCATools/ontology-development-kit/issues

Footnotes

* Footnote: the Makefile has always lived in the src/ontology folder, but the build process requires the whole repo, so the run.sh wrapper maps two levels up. It looks a little odd, but it works. In future if there is demand we may switch the Makefile to being in the root folder.


owljs – a javascript library for OWL hacking

owljs is a javascript library for doing stuff with OWL. It's available from github:

https://github.com/cmungall/owljs

Whilst it attempts to follow CommonJS, you currently have to use RingoJS (a Rhino-based engine), as it makes use of JVM calls to the OWL API.

[Image: owl plus rhino equals fun]

Why javascript?

Why javascript you may ask? Isn’t that a hacky language run in browsers? In fact javascript is increasingly used on the server side as well as in browsers, as can be seen in the success of node.js. With Java 8 incorporating Nashorn as a scripting engine, it looks like javascript on the server side is here to stay.

Why not just use java? Java can be very verbose and is not the ideal language for writing short ontology processing scripts, especially with the OWL API.

There are a number of other languages better suited to scripting and declarative programming in general, many of which run on the JVM. These include:

  • Groovy – a popular choice for interfacing with the OWL API
  • The Armed Bear flavor of Common Lisp, as used in LSW2.
  • Clojure, a variant of lisp, as used in Phil Lord’s powerful Tawny-OWL framework.
  • Scala, a superbly elegant functional programming language used to great effect in Jim Balhoff's scowl.
  • Iron Python – a popular choice for interfacing with the Brain. And of course, Python is the de facto language for bioinformatics these days

There are also offerings in non-JVM languages such as my own posh – in addition, most languages provide some kind of RDF library, but these are often too low-level for working with OWL.

I decided to write a javascript library for a number of reasons. Our group already produces a lot of javascript code, most of which can be run on the server. For example, the golr libraries used in the AmiGO 2 codebase are CommonJS, as are those used for the Monarch API. These APIs all access ontologies through services (and can thus be run on a non-JVM javascript engine like node), and we would not want them to depend on a JVM. However, the ability to go the other way is useful: in a powerful ontology processing environment that offers all the features of the OWL API, it is handy to be able to access all kinds of bioinformatics data through ready-made javascript APIs.

Another reason is that JSON is ubiquitous, and having your data format be a subset of the language has some major advantages.

Plus, after an initial period of ambivalence, I have grown to really like javascript. It’s just functional enough to do some cool things.

What can you do with it?

I hope to provide some specific examples later on this blog. For now, take a look at the docs on github for an overview of the major features.

Stay tuned for more information!

Perl library for OWL hacking

I would recommend using a JVM language plus the OWL API for doing programmatic processing of OWL.

NOT perl.

If you really insist on perl, and you don't mind insane, magical, AUTOLOAD-heavy modules with no documentation:

https://github.com/cmungall/owlhack

Unlike many modules, this doesn't attempt to map some RDF monster into OWL axioms. It takes in a very simple JSON format and provides a very slim layer on top of that. Unfortunately there isn't a standard JSON serialization for OWL, so owlhack uses a custom translation as provided by OWLTools. This is a very generic, axiom-oriented, lispy rendering of OWL functional syntax.

Currently I’m using this module for tasks such as generating ad-hoc chunks of markdown derived from the ontology. The resulting md can then be pasted into github tracker postings, or used to generate html.

There’s also a “sed” script that comes with the library that’s useful for performing perl “s/” operations on annotation values.

It’s all a bit hacky, kind of an OWL replacement for https://github.com/cmungall/obo-scripts

Caveat emptor!

Migration of Gene Ontology bridging ontologies to OWL

In the GO, we've historically maintained a number of experimental extensions to the ontology as obo-format bridging files in the infamous "scratch" folder. Sometimes sets of these migrate into the main ontology (the regulation logical definitions, as well as the occurs-in ones, started this way). This has always been harder for axioms that point to external ontologies like CHEBI (although with this change it should become simpler).

These “bridging ontologies” fall into two main categories:

  • logical definitions [1]
  • taxon constraints [2]

As part of a general move to make more direct use of OWL in GO, we have moved the primary location of some of the bridging ontologies to the new SVN repository, with the primary format being OWL. See the Ontology extensions page on the GO wiki.

This has a number of advantages:

  • the bridging axioms can be edited directly in protege, in the context of the GO and the external ontology, using an importer ontology
  • we can make use of more expressive class expressions

We will provide translations back to obo but these may be lossy.

Eventually these will move into the main ontology, but there are issues regarding import chains, MIREOTing and public releases that need to be resolved. For now you have to explicitly import these ontologies plus any dependent ontologies. Note that this is made easier by a combination of svn:externals and catalog XML files – see tips.
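As an illustration, a catalog entry redirects an import PURL to the copy pulled down via svn:externals. The mapping below is a hypothetical example; check the catalog-v001.xml in the extensions folder for the real entries:

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<catalog prefer="public" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <!-- resolve the web PURL to the local svn:externals checkout -->
    <uri name="http://purl.obolibrary.org/obo/uberon/composite-metazoan.owl"
         uri="../external/uberon/composite-metazoan.owl"/>
</catalog>
```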

Check out the repository like this:

cd
svn co svn+ssh://ext.geneontology.org/share/go/svn/trunk go-trunk
cd go-trunk/ontology/extensions

This should result in a directory structure like this:

ontology/
  external/
    cell-ontology/
    ncbitaxon-slim/
    ..
  editors/
    gene_ontology_write.obo
  extensions/
    x-FOO.owl
    x-FOO-importer.owl
    ...
    catalog-v001.xml

The importer ontologies are designed to be loaded into Protege or programmatically via OWLTools (use the new --use-catalog option with OWLTools).

Here is an example importer file – it contains no axioms of its own; it exists only as a parent that imports other ontologies:

    <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/go/extensions/x-metazoan-anatomy-importer.owl">
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/go.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/uberon/composite-metazoan.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/go/extensions/x-cell.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/go/extensions/x-metazoan-anatomy.owl"/>
    </owl:Ontology>

Note that most of these will be fetched from the svn externals directory (if you load from svn, rather than the web).
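From the command line, loading the importer with catalog resolution might look like this (the file name comes from the directory listing above; any further processing arguments are up to your workflow):

```shell
# Load the importer ontology, letting OWLTools resolve imports through
# the catalog-v001.xml that sits in the same directory.
owltools --use-catalog x-metazoan-anatomy-importer.owl
```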

After loading, you should see something like:
[Screenshot: Protege import chain]
See the wiki for more details.

The bridging axioms have been under-used in the GO; expect more use of them, as part of ontology development and downstream applications, over the coming year!

[1] Mungall, C. J., Bada, M., Berardini, T. Z., Deegan, J., Ireland, A., Harris, M. A., Hill, D. P., and Lomax, J. (2011). Cross-product extensions of the Gene Ontology. Journal of Biomedical Informatics 44, 80-86. http://dx.doi.org/10.1016/j.jbi.2010.02.002

[2] Deegan Née Clark, J. I., Dimmer, E. C., and Mungall, C. J. (2010). Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development. BMC Bioinformatics 11, 530. http://www.biomedcentral.com/1471-2105/11/530

Phenotype ontologies on googlecode

For the PATO project we’ve set up a repository on googlecode to collect phenotype ontologies and various bridging axioms:

http://code.google.com/p/phenotype-ontologies

This aggregates together the main phenotype ontologies, together with logical definitions bridging ontologies, as defined in

Mungall, C. J., Gkoutos, G. V., Smith, C. L., Haendel, M. A., Lewis, S. E., and Ashburner, M. (2010). Integrating phenotype ontologies across multiple species. Genome Biology 11, R2. Available at: http://dx.doi.org/10.1186/gb-2010-11-1-r2

You can access the aggregated ontology via this PURL:

http://purl.obolibrary.org/obo/upheno/uberpheno-subq-importer.owl

It may be slow to open this via the web. If you have the phenotype-ontologies repository checked out, you can open the file from the filesystem – external ontologies will be obtained via svn:externals.

I recommend using Elk as the reasoner, others will be too slow with the combination of HP, MP, FMA, MA, PATO, etc. Unfortunately Elk doesn’t yet allow DL queries or explanations of inferences.

The above ontology uses a slightly modified version of the definitions described in the Genome Biology paper – instead of modeling each phenotype as a single quality (e.g. redness of nose), we now model them as aggregates of phenotypes. This tends to work better for HPO, which has many composite phenotypes.

Note also that we’re using a hacked version of the uberon bridging axioms to ZFA, MA and FMA – we treat these as precise equivalents rather than taxonomic equivalents. This is necessary as we mix uberon in with the species ontologies in the logical definitions.