GO annotation origami: Folding and unfolding class expressions
March 31, 2013
With the introduction of Gene Association Format (GAF) v2, curators are no longer restricted to pre-composed GO terms – they can use a limited form of anonymous OWL Class Expressions of the form:
GO_Class AND (Rel_1 some V_1) AND (Rel_2 some V_2)
The set of relationships is specified in column 16 of the GAF file.
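As a rough illustration of how a column 16 value maps onto the expression form above, here is a minimal Python sketch. It is not part of OWLTools. The function names parse_extensions and to_class_expression are hypothetical, and the sketch assumes the usual GAF convention that commas join relations that hold together (AND) while pipes separate independent expressions.

```python
import re

# Matches a single relation(filler) pair, e.g. occurs_in(CL:0000700)
_EXT = re.compile(r"^\s*(\w+)\(([^()]+)\)\s*$")


def parse_extensions(col16):
    """Parse a column 16 string into a list of conjunctions.

    Each conjunction is a list of (relation, filler) pairs.
    Assumption: '|' separates independent expressions, ',' means AND.
    """
    if not col16.strip():
        return []
    expressions = []
    for group in col16.split("|"):
        pairs = []
        for item in group.split(","):
            match = _EXT.match(item)
            if match is None:
                raise ValueError(f"malformed extension: {item!r}")
            pairs.append((match.group(1), match.group(2).strip()))
        expressions.append(pairs)
    return expressions


def to_class_expression(go_class, pairs):
    """Render GO_Class AND (Rel some V) ... in the notation used above."""
    parts = [go_class] + [f"({rel} some {filler})" for rel, filler in pairs]
    return " AND ".join(parts)


# The example annotation used later in this post
for pairs in parse_extensions("occurs_in(CL:0000700)"):
    print(to_class_expression("GO:0006915", pairs))
```

Keeping each conjunction as a list of pairs means an annotation with several pipe-separated extensions simply yields several class expressions.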
However, many tools are not capable of using class expressions – they discard the additional information leaving only the pre-composed GO_Class.
Using OWLTools it is possible to translate a GAF-v2 set of associations and an ontology O to an equivalent GAF-v1 set of associations plus an analysis ontology O-ext. The analysis ontology O-ext contains the set of anonymous class expressions folded into named classes, together with equivalence axioms, and pre-reasoned into a hierarchy using Elk.
For example, given a GO annotation of a gene ‘geneA’:
gene: geneA
annotation_class: GO:0006915 ! apoptosis
annotation_extension: occurs_in(CL:0000700) ! dopaminergic neuron
The folding process will generate a class with a non-stable URI, automatic label and equivalence axiom:
Class: GO/TEMP_nnnn
  Annotations: label "apoptosis and occurs_in some dopaminergic neuron"
  EquivalentTo: 'apoptosis' and occurs_in some 'dopaminergic neuron'
  SubClassOf: 'neuron apoptosis'
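The bookkeeping behind folding can be sketched in a few lines of Python. This is an illustration only, not the OWLTools implementation. The fold function, the tuple layout of the annotations, and the GO/TEMP_0001 numbering scheme are all assumptions made for the example. The sketch also stops short of the reasoning step: placing the new class under 'neuron apoptosis' requires a reasoner such as Elk.

```python
from itertools import count


def fold(annotations, labels):
    """Fold extended annotations into named temporary classes.

    annotations: iterable of (gene, go_class, [(relation, filler), ...])
    labels: dict mapping class IDs to human-readable labels
    Returns (rows, new_classes): rows are (gene, class_id) pairs, and
    new_classes maps each temporary ID to its label and equivalence axiom.
    """
    ids = count(1)      # hypothetical numbering for the non-stable IDs
    temp_for = {}       # expression key -> temporary class ID
    new_classes = {}
    rows = []
    for gene, go_class, pairs in annotations:
        if not pairs:
            # No extension: the annotation is already to a named class
            rows.append((gene, go_class))
            continue
        # Sort the pairs so the same expression always gets the same class
        key = (go_class, tuple(sorted(pairs)))
        if key not in temp_for:
            temp_id = f"GO/TEMP_{next(ids):04d}"
            genus = labels[go_class]
            diffs = [(rel, labels[filler]) for rel, filler in key[1]]
            new_classes[temp_id] = {
                "label": " and ".join(
                    [genus] + [f"{r} some {f}" for r, f in diffs]),
                "equivalent_to": " and ".join(
                    [f"'{genus}'"] + [f"{r} some '{f}'" for r, f in diffs]),
            }
            temp_for[key] = temp_id
        rows.append((gene, temp_for[key]))
    return rows, new_classes


labels = {"GO:0006915": "apoptosis", "CL:0000700": "dopaminergic neuron"}
rows, classes = fold(
    [("geneA", "GO:0006915", [("occurs_in", "CL:0000700")])], labels)
print(rows)
print(classes)
```

Reusing one temporary class per distinct expression keeps the analysis ontology small when many genes share the same extension.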
This class will automatically be placed in the hierarchy using the reasoner (e.g. under ‘neuron apoptosis’). For the reasoning step to achieve optimal results, the go-plus-dev.owl version should be used (see new GO documentation). A variant of this step is to perform folding to find a more specific subclass than the one used for direct annotation.
The reverse operation, unfolding, is also possible. For optimal results, this relies on Equivalent Classes axioms declared in the ontology, so make sure to use go-plus-dev.owl. Here an annotation to a pre-composed complex term (e.g. neuron apoptosis) is replaced by an annotation to a simpler GO term (e.g. apoptosis) with column 16 filled in (e.g. occurs_in(neuron)).
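Unfolding can be pictured as a lookup against the equivalence axioms. The following Python sketch is a simplification under stated assumptions: the unfold function is hypothetical, the equivalence axioms are supplied as a plain dictionary rather than read from go-plus-dev.owl, and labels stand in for the identifiers a real GAF file would carry.

```python
def unfold(annotations, equivalences):
    """Replace annotations to pre-composed terms with simpler term + column 16.

    annotations: iterable of (gene, term) pairs
    equivalences: dict mapping a pre-composed term to
                  (simpler_term, [(relation, filler), ...])
    Returns a list of (gene, term, column_16) triples.
    """
    unfolded = []
    for gene, term in annotations:
        if term in equivalences:
            genus, pairs = equivalences[term]
            # Serialise the differentiae in relation(filler) form
            col16 = ",".join(f"{rel}({filler})" for rel, filler in pairs)
            unfolded.append((gene, genus, col16))
        else:
            # No equivalence axiom: leave the annotation unchanged
            unfolded.append((gene, term, ""))
    return unfolded


# Mirrors the example in the text, using labels in place of IDs
equivalences = {"neuron apoptosis": ("apoptosis", [("occurs_in", "neuron")])}
result = unfold(
    [("geneA", "neuron apoptosis"), ("geneB", "apoptosis")], equivalences)
print(result)
```

Terms without an equivalence axiom pass through untouched, which is why the completeness of the axioms in the ontology determines how much can be unfolded.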
The folding operation allows legacy tools to take some advantage of GO annotation extensions by generating an ‘analysis ontology’ (care must be taken in how this is presented to the user, if at all). Ideally more tools will use OWL as the underlying ontology model and be able to handle column 16 annotations directly, ultimately requiring less pre-coordination in the GO.