GO annotation origami: Folding and unfolding class expressions

With the introduction of Gene Association Format (GAF) v2, curators are no longer restricted to pre-composed GO terms – they can use a limited form of anonymous OWL Class Expressions of the form:

GO_Class AND (Rel_1 some V_1) AND (Rel_2 some V2)

The set of relationships is specified in column 16 of the GAF file.

However, many tools are not capable of using class expressions – they discard the additional information leaving only the pre-composed GO_Class.

Using OWLTools it is possible to translate a GAF-v2 set of associations and an ontology O to an equivalent GAF-v1 set of associations plus an analysis ontology O-ext. The analysis ontology O-ext contains the set of anonymous class expressions folded into named classes, together with equivalence axioms, and pre-reasoned into a hierarchy using Elk.

See http://code.google.com/p/owltools/wiki/AnnotationExtensionFolding

For example, given a GO annotation of a gene ‘geneA’:

gene: geneA
annotation_class:  GO:0006915 ! apoptosis
annotation_extension: occurs_in(CL:0000700) ! dopaminergic neuron

The folding process will generate a class with a non-stable URI, automatic label and equivalence axiom:

Class: GO/TEMP_nnnn
  Annotations: label "apoptosis and occurs_in some dopaminergic neuron"
  EquivalentTo: 'apoptosis' and occurs_in some 'dopaminergic neuron'
  SubClassOf: 'neuron apoptosis'

This class will automatically be placed in the hierarchy using the reasoner (e.g. under ‘neuron apoptosis’). For the reasoning step to achieve optimal results, the go-plus-dev.owl version should be used (see new GO documentation). A variant of this step is to perform folding to find a more specific subclass that the one used for direct annotation.

The reverse operation – unfolding – is also possible.  For optimal results, this relies on Equivalent Classes axioms declared in the ontology, so make sure to use the go-plus-dev.owl. Here an annotation to a pre-composed complex term (eg neuron apoptosis) is replaced by an annotation to a simpler GO term (eg apoptosis) with column 16 filled in (e.g. occurs_in(neuron).

The folding operation allows legacy tools to take some advantage of GO annotation extensions by generating an ‘analysis ontology’ (care must be taken in how this is presented to the user, if at all). Ideally more tools will use OWL as the underlying ontology model and be able to handle c16 annotations directly, ultimately requiring less pre-coordination in the GO.


Elk disjoint hack

Elk is a blindingly fast EL++ reasoner. Unfortunately, it doesn’t yet support the full EL++ profile – in particular it lacks disjointness axioms. This is unfortunate, as these kinds of axioms are incredibly useful for integrity checking. See the methods section of the Uberon paper for some details on how partwise disjointness axioms were created.

However, Elk does support intersection and equivalence. This means we should be able to perform a translation:

DisjointClasses(x1, x2, …, xn) ⇒
EquivalentClasses(owl:Nothing IntersectionOf(xi xj)) for all i<j<=n

I asked about this on the Elk mail list – see  Satisfiability checking and DisjointClasses axioms

The problem is that whilst Elk supports intersection and equivalence, it doesn’t support Nothing. This means that there may be corner cases in which it doesn’t work.

Proper disjointness support may be coming in the next version Elk, but it’s been a few months so I decided to go ahead and implement the above translation in OWLTools (also available in Oort).

If we have an ontology such as foo.owl:

Ontology: <http://example.org/x.owl>

Class: :reasoner
Class: :animal
  DisjointWith: :reasoner

Class: :elk
  SubClassOf: :reasoner, :animal

We can translate it using owltools:

owltools foo.owl --translate-disjoints-to-equivalents -o file://`pwd`/foo-x.owl

Remeber, ordering of arguments is significant in owltools -make sure you translate *after* the ontology is loaded.

And then load this into Protege and reason over it using Elk. As expected, “elk” is unsatisfiable:

You can also do the checking directly in owltools:

owltools foo.owl --translate-disjoints-to-equivalents --run-reasoner -r elk -u

The “-u” option will check for unsatisfiable classes and exit with a nonzero code if any are found, allowing this to be used within a CI system like Jenkins (see this previous post).

You can also use this transform within Oort (command line version only):

ontology-release-runner --translate-disjoints-to-equivalents --reasoner elk foo.owl

Remember, there are corner cases where this translation will not work. Nevertheless, this can be useful as part of an “early warning” system, backed up by slower guaranteed checks running in the background with HermiT or some other reasoner.

Perhaps the ontologies I work with have a simpler structure, but so far I have found this strategy to be successful, identifying subtle part-disjointness problems, and not giving any false positives. There don’t appear to be any scalability problems, with Elk being its usual zippy self even when uberon is loaded with ncbitaxon/taxslim and taxon constraints translated into Nothing-axioms (~3000 disjointness axioms).