GO annotation origami: Folding and unfolding class expressions

With the introduction of Gene Association Format (GAF) v2, curators are no longer restricted to pre-composed GO terms – they can use a limited form of anonymous OWL Class Expressions of the form:

GO_Class AND (Rel_1 some V_1) AND (Rel_2 some V_2)

The set of relationships is specified in column 16 of the GAF file.

However, many tools are not capable of using class expressions – they discard the additional information, leaving only the pre-composed GO_Class.

Using OWLTools it is possible to translate a GAF-v2 set of associations and an ontology O to an equivalent GAF-v1 set of associations plus an analysis ontology O-ext. The analysis ontology O-ext contains the set of anonymous class expressions folded into named classes, together with equivalence axioms, and pre-reasoned into a hierarchy using Elk.

See http://code.google.com/p/owltools/wiki/AnnotationExtensionFolding

For example, given a GO annotation of a gene ‘geneA’:

gene: geneA
annotation_class:  GO:0006915 ! apoptosis
annotation_extension: occurs_in(CL:0000700) ! dopaminergic neuron

The folding process will generate a class with a non-stable URI, automatic label and equivalence axiom:

Class: GO/TEMP_nnnn
  Annotations: label "apoptosis and occurs_in some dopaminergic neuron"
  EquivalentTo: 'apoptosis' and occurs_in some 'dopaminergic neuron'
  SubClassOf: 'neuron apoptosis'

This class will automatically be placed in the hierarchy using the reasoner (e.g. under ‘neuron apoptosis’). For the reasoning step to achieve optimal results, the go-plus-dev.owl version should be used (see new GO documentation). A variant of this step is to perform folding to find a more specific subclass than the one used for direct annotation.
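The string manipulation behind folding can be sketched in a few lines of Python. This is only an illustration, not how OWLTools implements it: the LABELS table and the function names are hypothetical, and the parser assumes a column-16 value is a comma-separated list of relation(ID) pairs that are all combined with "and".

```python
import re

# Hypothetical label lookup; a real pipeline would read labels from the ontology.
LABELS = {
    "GO:0006915": "apoptosis",
    "CL:0000700": "dopaminergic neuron",
}

# Matches a single extension of the form relation(ID).
EXTENSION = re.compile(r"^\s*(\w+)\(([^()]+)\)\s*$")


def parse_extensions(column16):
    """Split a column-16 value into (relation, filler) pairs."""
    pairs = []
    for part in column16.split(","):
        match = EXTENSION.match(part)
        if match is None:
            raise ValueError(f"cannot parse extension: {part!r}")
        pairs.append((match.group(1), match.group(2).strip()))
    return pairs


def fold(go_id, column16):
    """Build the class expression a folded class would be declared equivalent to."""
    terms = [f"'{LABELS[go_id]}'"]
    for relation, filler in parse_extensions(column16):
        terms.append(f"{relation} some '{LABELS[filler]}'")
    return " and ".join(terms)


expression = fold("GO:0006915", "occurs_in(CL:0000700)")
print(expression)
```

The printed expression is what would appear on the EquivalentTo line of the generated class. Placing the class in the hierarchy still requires a reasoner.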

The reverse operation – unfolding – is also possible. For optimal results, this relies on EquivalentClasses axioms declared in the ontology, so make sure to use go-plus-dev.owl. Here an annotation to a pre-composed complex term (e.g. neuron apoptosis) is replaced by an annotation to a simpler GO term (e.g. apoptosis) with column 16 filled in (e.g. occurs_in(neuron)).
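As a rough sketch of unfolding, assume the equivalence axioms have already been extracted from the ontology into a lookup table. The EQUIVALENCES table and the unfold function below are hypothetical names used for illustration only.

```python
# Hypothetical equivalence axioms, keyed by pre-composed class label.
# Each value is (simpler class, list of (relation, filler) pairs).
EQUIVALENCES = {
    "neuron apoptosis": ("apoptosis", [("occurs_in", "neuron")]),
}


def unfold(annotation_class):
    """Return (class, column16); the class is unchanged if no axiom applies."""
    if annotation_class not in EQUIVALENCES:
        return annotation_class, ""
    genus, differentia = EQUIVALENCES[annotation_class]
    column16 = ",".join(f"{rel}({filler})" for rel, filler in differentia)
    return genus, column16


unfolded = unfold("neuron apoptosis")
print(unfolded)
```

A class with no equivalence axiom passes through untouched, which is why the quality of the result depends on how many such axioms the ontology declares.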

The folding operation allows legacy tools to take some advantage of GO annotation extensions by generating an ‘analysis ontology’ (care must be taken in how this is presented to the user, if at all). Ideally more tools will use OWL as the underlying ontology model and be able to handle c16 annotations directly, ultimately requiring less pre-coordination in the GO.

 

Querying for connections between the GO and FMA

Can we query for connections between FMA and GO? This should be
possible by using a combination of

  • GO
  • Uberon
  • FMA
  • Axioms linking GO and Uberon (x-metazoan-anatomy)
  • Axioms linking FMA and Uberon (uberon-to-fma)

This may seem like more components than is necessary. However,
remember that GO is a multi-species ontology, and “heart development”
in GO covers not only vertebrate hearts, but also (perhaps
controversially) drosophila “hearts”. In contrast, the FMA class for
“heart” represents a canonical adult human heart. This is why we have
to go via Uberon, which covers similar taxonomic territory to GO. The
uberon class called “heart” covers all hearts.

GO to metazoan anatomical structures

http://purl.obolibrary.org/obo/go/extensions/x-metazoan-anatomy.owl contains axioms of the form:


'heart morphogenesis' EquivalentTo 'anatomical structure morphogenesis' and
'results in morphogenesis of' some uberon:heart

(note that sub-properties of ‘results in developmental progression of’
are used here)

Generic metazoan anatomy to FMA

http://purl.obolibrary.org/obo/uberon/bridge/uberon-bridge-to-fma.owl contains axioms of the form:


fma:heart EquivalentTo uberon:heart and part_of some 'Homo sapiens'

GO to FMA

Note that there is no existential dependence between GO ‘heart
development’ and fma:heart. This is as it should be – if there were no
human hearts then there would still be heart development
processes. This issue is touched on in Chimezie Ogbuji’s presentation at DILS 2012.

This lack of existential dependence has consequences for querying
connections. An OWL query for:

?p SubClassOf ‘results in developmental progression of’ some ?u

will return GO-Uberon connections only.

We must perform a join in order to get what we want:

?p SubClassOf ‘results in developmental progression of’ some ?u,
?a SubClassOf ?u,
?a SubClassOf part_of some ‘Homo sapiens’

Actually executing this query is not straightforward. Ideally we would
have a way of using OWL syntax, such as the above. To get complete
results, either EL++ or RL reasoning is required. In the next post I’ll present some possible options for issuing this query.
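To make the shape of the join concrete, here is a toy Python sketch. It assumes the three sets of facts have already been inferred by a reasoner and exported. Every identifier in the data is illustrative rather than a real ontology ID.

```python
# Toy, pre-inferred facts; all identifiers are illustrative.

# ?p SubClassOf 'results in developmental progression of' some ?u
develops = {
    ("GO:heart development", "UBERON:heart"),
    ("GO:fin morphogenesis", "UBERON:fin"),
}

# ?a SubClassOf ?u
subclass_of = {
    ("FMA:heart", "UBERON:heart"),
}

# ?a SubClassOf part_of some 'Homo sapiens'
part_of_human = {"FMA:heart"}


def go_to_fma(develops, subclass_of, part_of_human):
    """Join the three fact sets on the shared variables ?u and ?a."""
    return sorted(
        (p, a)
        for p, u in develops
        for a, parent in subclass_of
        if parent == u and a in part_of_human
    )


connections = go_to_fma(develops, subclass_of, part_of_human)
print(connections)
```

Only the heart pair survives the join. The fin process has no human-specific anatomical class beneath its Uberon class, so it drops out.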

Taxon constraints in OWL

A number of years ago, the Gene Ontology database included such curiosities as:

  • A slime mold gene that had a function in fin morphogenesis
  • Chicken genes that were involved in lactation

These genes would be pretty fascinating, if they actually existed. Unfortunately, these were all annotation errors, arising from a liberal use of inference by sequence similarity.

We decided to adopt a formalism specified by Wacek Kusnierczyk[1], in which we placed taxon constraints on classes in the ontology, and used these to detect annotation errors[2].

The taxon constraints make use of two relations:

  • only_in_taxon
  • never_in_taxon

You can see examples of usage in GO either in QuickGO (e.g. lactation), or by opening the x-taxon-importer.owl ontology in Protege. This ontology is used in the GO Jenkins environment to detect internal inconsistencies in the ontology.

The same relations are also in use in another multi-species ontology, Uberon[3].

 

In Uberon, the constraints are used for ontology consistency checking, and to provide taxon subsets – for example, aves-basic.owl, which excludes classes such as mammary gland, pectoral fin, etc.

Semantics of the shortcut relations

In the Deegan et al. paper we described a rule-based procedure for using the taxon constraint relations. This has the advantage of being scalable over large taxon ontologies and large gene association sets. A better approach, however, is to encode the constraints directly as OWL axioms and use a reasoner. For this we need to choose a particular way of representing a taxonomy.

Both relations make use of a class-based representation of a taxonomy such as ncbitaxon.owl or a subset such as taxslim.owl.
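For comparison, a rule-based check in the spirit of the procedure mentioned above might look like the following Python sketch. The taxonomy and the constraints are toy data chosen for illustration. The sketch also ignores the fact that constraints are inherited down the GO hierarchy.

```python
# Toy taxonomy as child -> parent; illustrative names, not NCBITaxon IDs.
PARENT = {
    "Gallus gallus": "Aves",
    "Aves": "Vertebrata",
    "Homo sapiens": "Mammalia",
    "Mammalia": "Vertebrata",
    "Vertebrata": "Eukaryota",
    "Dictyostelium": "Eukaryota",
}

# Illustrative constraints on GO classes.
ONLY_IN = {"lactation": "Mammalia", "fin morphogenesis": "Vertebrata"}
NEVER_IN = {"lactation": "Aves"}


def lineage(taxon):
    """Return the taxon followed by all of its ancestors."""
    result = [taxon]
    while taxon in PARENT:
        taxon = PARENT[taxon]
        result.append(taxon)
    return result


def violations(go_class, taxon):
    """List the constraints broken by annotating a gene of this taxon."""
    found = []
    ancestors = lineage(taxon)
    if go_class in ONLY_IN and ONLY_IN[go_class] not in ancestors:
        found.append(f"only_in_taxon {ONLY_IN[go_class]}")
    if go_class in NEVER_IN and NEVER_IN[go_class] in ancestors:
        found.append(f"never_in_taxon {NEVER_IN[go_class]}")
    return found


chicken = violations("lactation", "Gallus gallus")
slime_mold = violations("fin morphogenesis", "Dictyostelium")
print(chicken, slime_mold)
```

The check only needs a lineage walk per annotation, which is what makes this style of procedure scale to large association sets.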

We can treat the taxon constraint relations as convenient shortcut relations which ‘expand’ to OWL axioms that capture the intended semantics in terms of a standard ObjectProperty “in_organism”. For now we leave in_organism undefined, but the basic idea is that for anatomical structures and cell components “in_organism” is the part_of parent that is an organism, whereas for processes it is the organism that encodes the gene products that execute the process.

In fact there are two ways to expand to the “in_organism” class axioms:

The more straightforward way:

?X only_in_taxon ?Y ===> ?X SubClassOf in_organism only ?Y
?X never_in_taxon ?Y ===> ?X SubClassOf in_organism only not ?Y

To achieve the desired entailments, it is necessary for sibling taxa to be declared disjoint (e.g. Eubacteria DisjointWith Eukaryota). Note that these disjointness axioms are not declared in the default NCBITaxon translation.
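The expansion itself is a simple macro substitution. A minimal Python sketch, which emits Manchester-syntax strings rather than real OWL objects and uses a hypothetical function name, could be:

```python
def expand(subject, relation, taxon):
    """Expand one shortcut triple into a Manchester-syntax axiom string."""
    if relation == "only_in_taxon":
        return f"'{subject}' SubClassOf in_organism only '{taxon}'"
    if relation == "never_in_taxon":
        return f"'{subject}' SubClassOf in_organism only (not '{taxon}')"
    raise ValueError(f"unknown shortcut relation: {relation}")


# Illustrative constraints only.
axioms = [
    expand("lactation", "only_in_taxon", "Mammalia"),
    expand("lactation", "never_in_taxon", "Aves"),
]
for axiom in axioms:
    print(axiom)
```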

A different way which has the advantage of staying within the OWL2-EL subset:

?X only_in_taxon ?Y ===> ?X SubClassOf in_organism some ?Y
?X never_in_taxon ?Y ===> ?X DisjointWith in_organism some ?Y

This requires all sibling nodes (A,B) in the NCBI taxonomy to have a
General Axiom:

in_organism some ?A DisjointWith in_organism some ?B

These general axioms are automatically generated and available in taxslim-disjoint-over-in-taxon.owl
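Generating such general axioms is mechanical. The sketch below shows one way it could be done in Python. The toy taxonomy and the function name are assumptions, and this is not the script that produced the file mentioned above.

```python
from collections import defaultdict
from itertools import combinations

# Toy child -> parent taxonomy; not the real NCBI hierarchy.
PARENT = {
    "Eubacteria": "cellular organisms",
    "Eukaryota": "cellular organisms",
    "Archaea": "cellular organisms",
    "Metazoa": "Eukaryota",
    "Fungi": "Eukaryota",
}


def sibling_disjointness(parent_of):
    """Emit one general axiom per unordered pair of sibling taxa."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    axioms = []
    for siblings in children.values():
        for a, b in combinations(sorted(siblings), 2):
            axioms.append(
                f"(in_organism some '{a}') DisjointWith (in_organism some '{b}')"
            )
    return axioms


general_axioms = sibling_disjointness(PARENT)
for axiom in general_axioms:
    print(axiom)
```

The number of axioms grows quadratically with the number of children per node, which is one reason to work with a slim taxonomy rather than the full NCBI one.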

Taxon groupings

GO also makes use of taxon groupings – these include new classes such as “prokaryotes” which are defined using UnionOf axioms. They are available in go-taxon-groupings.owl.

Taxon modules

One of the uses of taxon constraints is to build taxon-specific subsets of ontologies. This will be covered in a future post.

References

  1. Waclaw Kusnierczyk (2008) Taxonomy-based partitioning of the Gene Ontology, Journal of Biomedical Informatics
  2. Deegan Née Clark, J. I., Dimmer, E. C., and Mungall, C. J. (2010). Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development. BMC Bioinformatics 11, 530
  3. Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., and Haendel, M. A. (2012) Uberon, an integrative multi-species anatomy ontology Genome Biology 13, R5. http://genomebiology.com/2012/13/1/R5

Migration of Gene Ontology bridging ontologies to OWL

In the GO, we’ve historically maintained a number of experimental extensions to the ontology as obo-format bridging files in the infamous “scratch” folder. Sometimes sets of these migrate into the main ontology (the regulation logical definitions, as well as the occurs in ones started this way). This has always been harder for axioms that point to external ontologies like ChEBI (although with this change it should be simpler).

These “bridging ontologies” fall into two main categories:

  • logical definitions [1]
  • taxon constraints[2]

As part of a general move to make more direct use of OWL in GO, we have moved the primary location of some of the bridging ontologies to the new SVN repository, with the primary format being OWL. See the Ontology extensions page on the GO wiki.

This has a number of advantages:

  • the bridging axioms can be edited directly in protege, in the context of the GO and the external ontology, using an importer ontology
  • we can make use of more expressive class expressions

We will provide translations back to obo but these may be lossy.

Eventually these will move into the main ontology, but there are issues regarding import chains, MIREOTing and public releases that need to be resolved. For now you have to explicitly import these ontologies plus any dependent ontologies. Note that this is made easier by a combination of svn:externals and catalog XML files – see tips.

Check out the repository like this:

cd
svn co svn+ssh://ext.geneontology.org/share/go/svn/trunk go-trunk
cd go-trunk/ontology/extensions

This should result in a directory structure like this:

ontology/
  external/
    cell-ontology/
    ncbitaxon-slim/
    ..
  editors/
    gene_ontology_write.obo
  extensions/
    x-FOO.owl
    x-FOO-importer.owl
    ...
    catalog-v001.xml

The importer ontologies are designed to be loaded into Protege or programmatically via OWLTools (use the new “--use-catalog” option with OWLTools).

Here is an example importer file – it contains no axioms of its own, it exists as a parent to other ontologies:

    <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/go/extensions/x-metazoan-anatomy-importer.owl">
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/go.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/uberon/composite-metazoan.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/go/extensions/x-cell.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/go/extensions/x-metazoan-anatomy.owl"/>
    </owl:Ontology>

Note that most of these will be fetched from the svn externals directory (if you load from svn, rather than the web).
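If you want to inspect an importer file without opening Protege, the import list can be read with the Python standard library. The sketch below wraps an abbreviated copy of the fragment above in an rdf:RDF element so that it parses on its own. The list_imports function is a hypothetical helper, not part of OWLTools.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"

# Abbreviated copy of the importer, wrapped so it is a complete document.
IMPORTER = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:owl="{OWL}">
    <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/go/extensions/x-metazoan-anatomy-importer.owl">
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/go.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/go/extensions/x-metazoan-anatomy.owl"/>
    </owl:Ontology>
</rdf:RDF>"""


def list_imports(rdf_xml):
    """Return the IRIs named by owl:imports elements, in document order."""
    root = ET.fromstring(rdf_xml)
    return [
        element.attrib[f"{{{RDF}}}resource"]
        for element in root.iter(f"{{{OWL}}}imports")
    ]


imports = list_imports(IMPORTER)
for iri in imports:
    print(iri)
```

Each IRI printed here is one that the catalog file can redirect to a local copy.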

After loading, you should see something like:
[Screenshot: Protege import chain]
See the wiki for more details.

The bridging axioms have been under-used in the GO. Expect more use of them as part of ontology development and downstream applications over the coming year!

[1] Mungall, C. J., Bada, M., Berardini, T. Z., Deegan, J., Ireland, A., Harris, M. A., Hill, D. P., and Lomax, J. (2011). Cross-product extensions of the Gene Ontology. Journal of Biomedical Informatics 44, 80-86. http://dx.doi.org/10.1016/j.jbi.2010.02.002

[2] Deegan Née Clark, J. I., Dimmer, E. C., and Mungall, C. J. (2010). Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development. BMC Bioinformatics 11, 530. http://www.biomedcentral.com/1471-2105/11/530