GO annotation origami: Folding and unfolding class expressions

With the introduction of Gene Association Format (GAF) v2, curators are no longer restricted to pre-composed GO terms – they can use a limited form of anonymous OWL Class Expressions of the form:

GO_Class AND (Rel_1 some V_1) AND (Rel_2 some V_2)

The set of relationships is specified in column 16 of the GAF file.

However, many tools are not capable of using class expressions – they discard the additional information leaving only the pre-composed GO_Class.

Using OWLTools it is possible to translate a GAF-v2 set of associations and an ontology O to an equivalent GAF-v1 set of associations plus an analysis ontology O-ext. The analysis ontology O-ext contains the set of anonymous class expressions folded into named classes, together with equivalence axioms, and pre-reasoned into a hierarchy using Elk.

See http://code.google.com/p/owltools/wiki/AnnotationExtensionFolding

For example, given a GO annotation of a gene ‘geneA’:

gene: geneA
annotation_class:  GO:0006915 ! apoptosis
annotation_extension: occurs_in(CL:0000700) ! dopaminergic neuron

The folding process will generate a class with a non-stable URI, automatic label and equivalence axiom:

Class: GO/TEMP_nnnn
  Annotations: label "apoptosis and occurs_in some dopaminergic neuron"
  EquivalentTo: 'apoptosis' and occurs_in some 'dopaminergic neuron'
  SubClassOf: 'neuron apoptosis'

This class will automatically be placed in the hierarchy using the reasoner (e.g. under ‘neuron apoptosis’). For the reasoning step to achieve optimal results, the go-plus-dev.owl version should be used (see new GO documentation). A variant of this step is to perform folding to find a more specific subclass than the one used for direct annotation.
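The folding step can be sketched in a few lines of Python. This is only an illustration and not the OWLTools implementation: the function names, the `labels` lookup table and the four-digit numbering of the temporary IDs are all assumptions, and the sketch only handles a comma-separated list of `rel(ID)` extensions.

```python
import re
from itertools import count

# Matches one extension of the form rel(ID), e.g. occurs_in(CL:0000700)
EXTENSION_RE = re.compile(r"^\s*([A-Za-z_]\w*)\(([^()]+)\)\s*$")


def parse_extensions(column16):
    """Split a comma-separated extension string into (relation, filler_id) pairs."""
    pairs = []
    for part in column16.split(","):
        match = EXTENSION_RE.match(part)
        if match is None:
            raise ValueError(f"unrecognised extension: {part!r}")
        pairs.append((match.group(1), match.group(2).strip()))
    return pairs


def fold(go_id, column16, labels, id_counter):
    """Return (temp_id, label, equivalence_expression) for one annotation."""
    extensions = parse_extensions(column16)
    plain = [labels[go_id]] + [f"{rel} some {labels[f]}" for rel, f in extensions]
    quoted = [f"'{labels[go_id]}'"] + [f"{rel} some '{labels[f]}'" for rel, f in extensions]
    # Hypothetical numbering scheme for the non-stable URI
    temp_id = f"GO/TEMP_{next(id_counter):04d}"
    return temp_id, " and ".join(plain), " and ".join(quoted)
```

Calling `fold("GO:0006915", "occurs_in(CL:0000700)", labels, count(1))` with a `labels` dictionary covering both IDs yields the automatic label and the right-hand side of the equivalence axiom shown above. Placing the new class in the hierarchy is still the reasoner's job.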

The reverse operation – unfolding – is also possible. For optimal results, this relies on Equivalent Classes axioms declared in the ontology, so make sure to use go-plus-dev.owl. Here an annotation to a pre-composed complex term (e.g. neuron apoptosis) is replaced by an annotation to a simpler GO term (e.g. apoptosis) with column 16 filled in (e.g. occurs_in(neuron)).
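Unfolding can be sketched the same way. The lookup table below is a hand-written stand-in for the equivalence axioms that would really come from the ontology, and the function name is hypothetical.

```python
# Hypothetical equivalence axioms: pre-composed class -> (genus, [(relation, filler)])
EQUIVALENCES = {
    "neuron apoptosis": ("apoptosis", [("occurs_in", "neuron")]),
}


def unfold(annotation_class, equivalences):
    """Return (simpler_class, column16_string); unchanged if no axiom applies."""
    if annotation_class not in equivalences:
        return annotation_class, ""
    genus, differentia = equivalences[annotation_class]
    column16 = ",".join(f"{rel}({filler})" for rel, filler in differentia)
    return genus, column16
```

A class with no equivalence axiom is returned untouched with an empty column 16, which is why the choice of ontology version matters for how much gets unfolded.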

The folding operation allows legacy tools to take some advantage of GO annotation extensions by generating an ‘analysis ontology’ (care must be taken in how this is presented to the user, if at all). Ideally more tools will use OWL as the underlying ontology model and be able to handle c16 annotations directly, ultimately requiring less pre-coordination in the GO.

 

Perl library for OWL hacking

I would recommend using a JVM language plus the OWL API for doing programmatic processing of OWL.

NOT perl.

If you really insist on perl, and you don’t mind insane magical AUTOLOAD heavy modules with no documentation:

https://github.com/cmungall/owlhack

Unlike many modules, this doesn’t attempt to map some RDF monster into OWL axioms. It takes in a very simple JSON format and provides a very slim layer on top of that. Unfortunately there isn’t a standard JSON for OWL, so owlhack uses a custom translation as provided by OWLTools. This is a very generic axiom-oriented lispy rendering of OWL functional syntax.

Currently I’m using this module for tasks such as generating ad-hoc chunks of markdown derived from the ontology. The resulting md can then be pasted into github tracker postings, or used to generate html.

There’s also a “sed” script that comes with the library that’s useful for performing perl “s/” operations on annotation values.

It’s all a bit hacky, kind of an OWL replacement for https://github.com/cmungall/obo-scripts

Caveat emptor!

Querying for connections between the GO and FMA

Can we query for connections between FMA and GO? This should be
possible by using a combination of

  • GO
  • Uberon
  • FMA
  • Axioms linking GO and Uberon (x-metazoan-anatomy)
  • Axioms linking FMA and Uberon (uberon-to-fma)

This may seem like more components than are necessary. However,
remember that GO is a multi-species ontology, and “heart development”
in GO covers not only vertebrate hearts, but also (perhaps
controversially) drosophila “hearts”. In contrast, the FMA class for
“heart” represents a canonical adult human heart. This is why we have
to go via Uberon, which covers similar taxonomic territory to GO. The
uberon class called “heart” covers all hearts.

GO to metazoan anatomical structures

http://purl.obolibrary.org/obo/go/extensions/x-metazoan-anatomy.owl contains axioms of the form:


'heart morphogenesis' EquivalentTo 'anatomical structure morphogenesis' and
'results in morphogenesis of' some uberon:heart

(note that sub-properties of ‘results in developmental progression of’
are used here)

Generic metazoan anatomy to FMA

http://purl.obolibrary.org/obo/uberon/bridge/uberon-bridge-to-fma.owl contains axioms of the form:


fma:heart EquivalentTo uberon:heart and part_of some 'Homo sapiens'

GO to FMA

Note that there is no existential dependence between go ‘heart
development’ and fma:heart. This is as it should be – if there were no
human hearts then there would still be heart development
processes. This issue is touched on in Chimezie Ogbuji’s presentation at DILS 2012.

This lack of existential dependence has consequences for querying
connections. An OWL query for:

?p SubClassOf ‘results in developmental progression of’ some ?u

will return GO-Uberon connections only.

We must perform a join in order to get what we want:

?p SubClassOf ‘results in developmental progression of’ some ?u,
?a SubClassOf ?u,
?a part_of some ‘Homo sapiens’

Actually executing this query is not straightforward. Ideally we would
have a way of using OWL syntax, such as the above. To get complete
results, either EL++ or RL reasoning is required. In the next post I’ll present some possible options for issuing this query.
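To make the shape of the join concrete, here is a minimal Python sketch over hand-written toy data. It is not one of the query options promised for the next post: the three sets stand in for what a reasoner would entail from GO, Uberon and the two bridge files, and every name in them is illustrative.

```python
# Toy data, hypothetical and hand-written for illustration only.
# (process, structure): process SubClassOf
#   'results in developmental progression of' some structure
PROCESS_TO_STRUCTURE = {
    ("heart development", "uberon:heart"),
}
# (subclass, superclass): entailed SubClassOf pairs between anatomy classes
SUBCLASS_OF = {
    ("fma:heart", "uberon:heart"),
    ("uberon:heart", "uberon:organ"),
}
# (class, taxon): class SubClassOf part_of some taxon
PART_OF_TAXON = {
    ("fma:heart", "Homo sapiens"),
}


def go_to_species_anatomy(taxon):
    """Join the three relations on the shared variables ?u and ?a."""
    in_taxon = {a for a, t in PART_OF_TAXON if t == taxon}
    return sorted(
        (p, a)
        for p, u in PROCESS_TO_STRUCTURE
        for a, parent in SUBCLASS_OF
        if parent == u and a in in_taxon
    )
```

The two SubClassOf facts about fma:heart both follow from the bridge equivalence axiom, which is why a reasoner is needed to get complete results on the real ontologies.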

ubermouth

Jim Balhoff has written a nice image depiction plugin for Protege4. Here it is in action showing uberon’s mouth.

[Figure: screenshot of uberon/depictions.owl using the image depiction plugin, showing uberon mouths]

The plugin assumes that images are represented as individuals of type foaf:depicts some <Class>. For example:

Individual: wc:thumb/0/06/Mouth_illustration-Otis_Archives.jpg/180px-Mouth_illustration-Otis_Archives.jpg
Types: foaf:depicts some :UBERON_0000165

The plugin is available from github. You can try it on the uberon depictions owl file, http://purl.obolibrary.org/obo/uberon/depictions.owl.

Images are stored in a somewhat hacky way in uberon right now – as xrefs. There is a hacky way to translate them into the correct OWL – in future they will be stored directly with explicit OWL semantics. We will also include additional metadata about the image; for example (with IDs replaced by labels):

Individual: wc:180px-Mouth_illustration-Otis_Archives.jpg
Types: depicts some ('mouth' and part_of some 'Homo sapiens')
Annotations: description "Medical illustration of a human mouth by Duncan Kenneth Winter. Part of an unpublished manuscript on medical illustration written by Winter."

Individual: uberon/images/lamprey_sucker_rosava_3238889218.jpg
Types: depicts some ('mouth' and part_of some Petromyzontida)

Jim’s plugin makes use of the reasoner, so these species-specific depictions would show up in the generic uberon “mouth” class (unfortunately Elk0.2 doesn’t support individuals, and a fast reasoner like Elk is required for Uberon – however, Elk0.3, due very soon, should support individuals).

Many of the images in uberon were derived automatically by dbpedia SPARQL queries and may not have been verified. Whilst probably SFW, some of the depictions may be a little racy, so exercise caution whilst poking around the nether regions! The images in wikipedia are obviously human centric – it would be nice to have more sources for other animals. If anyone knows any sources that would be easy to mark up let me know.

Elk disjoint hack

Elk is a blindingly fast EL++ reasoner. Unfortunately, it doesn’t yet support the full EL++ profile – in particular it lacks support for disjointness axioms. This is unfortunate, as these kinds of axioms are incredibly useful for integrity checking. See the methods section of the Uberon paper for some details on how partwise disjointness axioms were created.

However, Elk does support intersection and equivalence. This means we should be able to perform a translation:

DisjointClasses(x1, x2, …, xn) ⇒
EquivalentClasses(owl:Nothing IntersectionOf(xi xj)) for all i<j<=n

I asked about this on the Elk mail list – see  Satisfiability checking and DisjointClasses axioms

The problem is that whilst Elk supports intersection and equivalence, it doesn’t support Nothing. This means that there may be corner cases in which it doesn’t work.
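The pairwise expansion is easy to sketch in Python. This is an illustration of the rule rather than the OWLTools code; the function name is made up, and the output uses `ObjectIntersectionOf`, the OWL 2 functional-syntax spelling of the intersection constructor.

```python
from itertools import combinations


def disjoints_to_equivalents(classes):
    """Yield one EquivalentClasses axiom string per unordered pair of classes."""
    for left, right in combinations(classes, 2):
        yield f"EquivalentClasses(owl:Nothing ObjectIntersectionOf({left} {right}))"
```

A DisjointClasses axiom over n classes expands to n(n-1)/2 equivalence axioms, so very large mutually disjoint sets will produce a correspondingly large number of axioms.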

Proper disjointness support may be coming in the next version of Elk, but it’s been a few months so I decided to go ahead and implement the above translation in OWLTools (also available in Oort).

If we have an ontology such as foo.owl:

Ontology: <http://example.org/x.owl>

Class: :reasoner
Class: :animal
  DisjointWith: :reasoner

Class: :elk
  SubClassOf: :reasoner, :animal

We can translate it using owltools:

owltools foo.owl --translate-disjoints-to-equivalents -o file://`pwd`/foo-x.owl

Remember, ordering of arguments is significant in owltools – make sure you translate *after* the ontology is loaded.

We can then load this into Protege and reason over it using Elk. As expected, “elk” is unsatisfiable.

You can also do the checking directly in owltools:

owltools foo.owl --translate-disjoints-to-equivalents --run-reasoner -r elk -u

The “-u” option will check for unsatisfiable classes and exit with a nonzero code if any are found, allowing this to be used within a CI system like Jenkins (see this previous post).
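As a sketch of how a build script might wrap this check, the Python below assembles the same command line and turns the exit code into a boolean. The function names are hypothetical, and it assumes an `owltools` executable is on the PATH.

```python
import subprocess


def build_check_command(ontology_path, reasoner="elk"):
    """Return the owltools argument list; the ontology comes before the translation flag."""
    return [
        "owltools",
        ontology_path,
        "--translate-disjoints-to-equivalents",
        "--run-reasoner",
        "-r",
        reasoner,
        "-u",
    ]


def ontology_is_coherent(ontology_path):
    """Run the check; True when owltools exits with code 0 (assumes owltools is on PATH)."""
    result = subprocess.run(build_check_command(ontology_path))
    return result.returncode == 0
```

Keeping the argument list in one function makes the ordering explicit, which matters because owltools processes its arguments in sequence.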

You can also use this transform within Oort (command line version only):

ontology-release-runner --translate-disjoints-to-equivalents --reasoner elk foo.owl

Remember, there are corner cases where this translation will not work. Nevertheless, this can be useful as part of an “early warning” system, backed up by slower guaranteed checks running in the background with HermiT or some other reasoner.

Perhaps the ontologies I work with have a simpler structure, but so far I have found this strategy to be successful, identifying subtle part-disjointness problems, and not giving any false positives. There don’t appear to be any scalability problems, with Elk being its usual zippy self even when uberon is loaded with ncbitaxon/taxslim and taxon constraints translated into Nothing-axioms (~3000 disjointness axioms).

 

Taxon constraints in OWL

A number of years ago, the Gene Ontology database included such curiosities as:

  • A slime mold gene that had a function in fin morphogenesis
  • Chicken genes that were involved in lactation

These genes would be pretty fascinating, if they actually existed. Unfortunately, these were all annotation errors, arising from a liberal use of inference by sequence similarity.

We decided to adopt a formalism specified by Wacek Kusnierczyk[1], in which we placed taxon constraints on classes in the ontology, and used these to detect annotation errors[2].

The taxon constraints make use of two relations, only_in_taxon and never_in_taxon.

 

You can see examples of usage in GO either in QuickGO (e.g. lactation), or by opening the x-taxon-importer.owl ontology in Protege. This ontology is used in the GO Jenkins environment to detect internal inconsistencies in the ontology.

The same relations are also in use in another multi-species ontology, Uberon[3].

 

In uberon, the constraints are used for ontology consistency checking, and to provide taxon subsets – for example, aves-basic.owl, which excludes classes such as mammary gland, pectoral fin, etc.

Semantics of the shortcut relations

In the Deegan et al paper we described a rule-based procedure for using the taxon constraint relations. This has the advantage of being scalable over large taxon ontologies and large gene association sets. But a better approach is to encode the constraints directly as OWL axioms and use a reasoner. For this we need to choose a particular way of representing a taxonomy.
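For comparison, a rule-based check of this general kind can be sketched in Python as follows. This is a simplification and not the procedure from the paper: the taxonomy and the constraints are hypothetical toy data, and the propagation of constraints down the ontology hierarchy is left out.

```python
# Hypothetical toy taxonomy: child -> parent
TAXON_PARENT = {
    "Gallus gallus": "Aves",
    "Aves": "Vertebrata",
    "Mus musculus": "Mammalia",
    "Mammalia": "Vertebrata",
}
# Hypothetical constraints attached directly to ontology classes
ONLY_IN_TAXON = {"lactation": "Mammalia"}
NEVER_IN_TAXON = {"fin morphogenesis": "Aves"}


def taxon_lineage(taxon):
    """Return the taxon together with all of its ancestors."""
    lineage = [taxon]
    while taxon in TAXON_PARENT:
        taxon = TAXON_PARENT[taxon]
        lineage.append(taxon)
    return lineage


def violates_constraints(go_class, gene_taxon):
    """True if annotating a gene from gene_taxon to go_class breaks a constraint."""
    lineage = taxon_lineage(gene_taxon)
    required = ONLY_IN_TAXON.get(go_class)
    if required is not None and required not in lineage:
        return True
    forbidden = NEVER_IN_TAXON.get(go_class)
    return forbidden is not None and forbidden in lineage
```

With this toy data a chicken gene annotated to lactation is flagged, while a mouse gene is not.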

Both relations make use of a class-based representation of a taxonomy such as ncbitaxon.owl or a subset such as taxslim.owl.

We can treat the taxon constraint relations as convenient shortcut relations which ‘expand’ to OWL axioms that capture the intended semantics in terms of a standard ObjectProperty “in_organism”. For now we leave in_organism undefined, but the basic idea is that for anatomical structures and cell components “in_organism” is the part_of parent that is an organism, whereas for processes it is the organism that encodes the gene products that execute the process.

In fact there are two ways to expand to the “in_organism” class axioms:

The more straightforward way:

?X only_in_taxon ?Y ===> ?X SubClassOf in_organism only ?Y
?X never_in_taxon ?Y ===> ?X SubClassOf in_organism only not ?Y

To achieve the desired entailments, it is necessary for sibling taxa to be declared disjoint (e.g. Eubacteria DisjointWith Eukaryota). Note that these disjointness axioms are not declared in the default NCBITaxon translation.

A different way which has the advantage of staying within the OWL2-EL subset:

?X only_in_taxon ?Y ===> ?X SubClassOf in_organism some ?Y
?X never_in_taxon ?Y ===> ?X DisjointWith in_organism some ?Y

This requires all sibling nodes (A,B) in the NCBI taxonomy to have a
General Axiom:

in_organism some ?A DisjointWith in_organism some ?B

These general axioms are automatically generated and available in taxslim-disjoint-over-in-taxon.owl
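The OWL2-EL expansion and the sibling axioms can be sketched in Python as below. This only shows the shape of the generated axioms and is not the script that produces taxslim-disjoint-over-in-taxon.owl; the function names and the `children_by_parent` input format are assumptions.

```python
from itertools import combinations


def expand_el(relation, subject, taxon):
    """Expand one shortcut relation into its OWL2-EL form (Manchester-like string)."""
    if relation == "only_in_taxon":
        return f"{subject} SubClassOf in_organism some {taxon}"
    if relation == "never_in_taxon":
        return f"{subject} DisjointWith in_organism some {taxon}"
    raise ValueError(f"unknown shortcut relation: {relation}")


def sibling_disjointness_axioms(children_by_parent):
    """Yield one general axiom per unordered pair of sibling taxa."""
    for parent in sorted(children_by_parent):
        for a, b in combinations(sorted(children_by_parent[parent]), 2):
            yield f"in_organism some {a} DisjointWith in_organism some {b}"
```

The number of sibling axioms grows quadratically with the number of children under each parent, which is one reason to work with a slim subset of the taxonomy.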

Taxon groupings

GO also makes use of taxon groupings – these include new classes such as “prokaryotes” which are defined using UnionOf axioms. They are available in go-taxon-groupings.owl.

Taxon modules

One of the uses of taxon constraints is to build taxon-specific subsets of ontologies. This will be covered in a future post.

References

  1. Waclaw Kusnierczyk (2008) Taxonomy-based partitioning of the Gene Ontology, Journal of Biomedical Informatics
  2. Deegan Née Clark, J. I., Dimmer, E. C., and Mungall, C. J. (2010). Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development. BMC Bioinformatics 11, 530
  3. Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., and Haendel, M. A. (2012) Uberon, an integrative multi-species anatomy ontology Genome Biology 13, R5. http://genomebiology.com/2012/13/1/R5

A response to “Unintended consequences of existential quantifications in biomedical ontologies”

In Unintended consequences of existential quantifications in biomedical ontologies, Boeker et al attempt to

…scrutinize the OWL-DL releases of OBO ontologies to assess whether their logical axioms correspond to the meaning intended by their authors

The authors examine existential restriction axioms in a number of ontologies (whose source is in obo-format) and rate them according to the correspondence between the semantics and the presumed author intent. They claim:

  • usability issues with OBO ontologies
  • lack of ontological commitment for several common terms
  • proliferation of domain-specific relations
  • numerous assertions which do not properly describe the underlying biological reality, or are ambiguous and difficult to interpret.

The proposed solution:

The solution is a better anchoring in upper ontologies and a restriction to relatively few, well defined relation types with given domain and range constraints

I think this is an interesting paper, and have great respect for all the authors involved. However, I find some of the claims to be suspect, and they need to be countered. I do think the paper shows that we need much better ontology and ontology-technology documentation from the obo foundry effort (which I am a part of); however, I think the authors have read far too much into the lack of documentation and consequently muddy the issues on a number of matters.

The initial misunderstanding is presented at the start of the paper:

This extract asserts the relationship part_of between the terms ankle and hindlimb in OBO format.

[Term]
id: MA:0000043
name: ankle
relationship: part_of MA:0000026 ! hindlimb

This assertion does not commit to a semantics in terms of the real world entities which are denoted by the terms. It does not allow us to infer that, e.g., all hindlimbs have ankles, or all ankles are part of a hindlimb. Descriptions at this level require some kind of ontological interpretation for the OBO syntax in terms of OWL axioms, as OWL axioms are explicitly quantified

In fact this is incorrect. There is an ontological interpretation for the OBO syntax in terms of OWL axioms (which the authors provide, falsely stating that it is “one such interpretation”):

Ankle subClassOf part_of some Hindlimb

The authors provide links to official documentation confirming that this is the correct interpretation. They then go on to say:

Our mouse limb example could therefore be alternatively translated into at least the following three OWL expressions:

(i) Ankle subClassOf part_of some Hindlimb

(ii) Ankle subClassOf part_of exactly 1 Hindlimb

(iii) Ankle subClassOf part_of only Hindlimb

In fact there is some legitimate confusion over interpretation of relations due to the impedance mismatch between the treatment of time in the 2005 Relations Ontology paper and what is possible in OWL. But positing additional unwarranted interpretations just muddies the waters. In fact, regardless of the time issue, the RO 2005 paper is quite clear that the relations used should be read in an all-some fashion (i.e. interpretation (i)). This is consistent with the Golbreich/Horrocks translation and its current successor, the obof1.4 specification, all of which are cited by the authors.

This claimed lack of a standard interpretation informs the main thesis advanced by the authors: the translation of obo-format relationships to existential restrictions is not always what ontology authors intend. In fact they are testing for something stronger, specifically the claim that every such translated existential restriction implies existential dependence, where this is defined:

x dependsG for its existence upon Fs = df

Necessarily, x exists only if some F exists

It is worth noting that the dependence claim they are testing is a strong one: it is stronger than anything in the OWL semantics, and would be violated by a number of other ontologies, many of them in OWL, such as the NCIt, due to the prevalence of “may_do_X” type relations. This is a subtle point that may escape the casual reader of the paper.

The authors examined axioms in a number of ontologies and evaluated them to see whether there were uses of existential restrictions where this strong dependence claim is not justified. Their test set included ontologies from the OBO library as well as a number of external support ontologies (aka “cross product” ontologies). Most of these ontologies currently use obo-format as their source. They did not invite external domain experts, and they did not check their results with the authors of the ontologies.

The authors provide examples where they believe there are unintended consequences of existential restrictions, based on this strong interpretation. Many of the examples they provide are problematic, as I will illustrate.

They provide this example from the GO:

“Interkinetic nuclear migration SubClassOf

part_of some Cell proliferation in forebrain

The ontological dependence expressed by this assertion is that there are no interkinetic nuclear migration processes without a corresponding cell proliferation in forebrain process. This is obviously false, since interkinetic nuclear migration is a very fundamental cell process, which is not limited to forebrains. An easy fix to this error is the inversion of the expression by using the inverse relationship:

Cell proliferation in forebrain subclassOf

has_part some Interkinetic nuclear migration”

In fact, the GO editors are well aware of the all-some interpretation, and they did intend to say that all instances of IKNM are in a forebrain. This is clear from the textual definition (I have highlighted the relevant part):

[Term]
id: GO:0022027
name: interkinetic nuclear migration
def: “The movement of the nucleus of the ventricular zone cell between the apical and the basal zone surfaces. Mitosis occurs when the nucleus is near the apical surface, that is, the lumen of the ventricle.” [GO_REF:0000021, GOC:cls, GOC:dgh, GOC:dph, GOC:jid, GOC:mtg_15jun06]
is_a: GO:0051647 ! nucleus localization
relationship: part_of GO:0021846 ! cell proliferation in forebrain

The mistake GO has made is giving the class a misleadingly generic label. This kind of thing is not unheard of in the GO – a class is given a label that has a specific meaning to one community when in fact the label is used more generally by a wider community. This is not to understate this kind of mistake – it’s actually quite serious (annotators are meant to always read the definition but unfortunately this rule isn’t always followed). However, the problem is entirely terminological and not in any way related to interpretations of the relationship tag or existential quantification. The creators of this class really did intend to restrict the location to the forebrain (this was confirmed by one of the GO editors listed as provenance for the definition).

The authors are on safer ground with their analysis of structural relations such as has_parent_hydride in CHEBI. I don’t have such a problem here, but it would have been useful to see the claims tested. Can we use a reasoner to detect an inconsistency in the ontology (supplemented with additional axioms)? It seems that the problem is less in the computational properties of the existential restriction, and more in the existential dependence claim (which, remember, is stronger than what is claimed by the OWL semantics).

They also cover what they perceive to be a problem with the use of existential restrictions in conjunction with what BFO calls “realizables”:

A statement such as

Anisotropine methylbromide subclassOf has_role some Anti-ulcer drug

in ChEBI asserts that each and every anisotropine methylbromide molecule has the role of an anti-ulcer drug. However, this role may never be realized for a particular molecule instance, since that molecule may play a different role in the treatment of a different disease, or play no role at all. It is thus problematic to assert an existential dependence between the molecule and the realization of the role (in the treatment of an ulcer)

This is a reasonable philosophical analysis. But are there actually any negative consequences for a user of the ontology or for reasoning? Does it lead to any incorrect inferences? I’m not convinced that an existential restriction is so wrong here. The problems uncovered with this example are really to do with some obscure conditions on bfo roles (i.e. all roles are realized – if roles were like dispositions this would not be a problem) and, to be fair to the CHEBI people, they might not have been aware of that when they made the axiom (BFO needs better, more user-friendly documentation).

The same “problem” is uncovered with some of the GO MF cross products, but this time the mistake lies with the authors. They say:

This is particularly apparent in the Gene Ontology molecular function ontology. For example, the statement

tRNA sulfurtransferase subClassOf

has_input some Transfer RNA

asserts a dependency of every instance of tRNA sulfurtransferase on some instance of Transfer RNA. Functions include the possibility that the bearer of a function is never involved in any process that realizes the function, thus may never have input molecules. This kind of error predominates in the Cross Product sample, especially in the cross product ‘GO Molecular Function X ChEBI’. Interrater agreement was low here because of two conflicting positions: (1) the assertion is false, because functions can remain unrealized, or (2) the assertion is true, but the categorization as a function is false, as implied by the suffix “activity”.

In fact this latter interpretation (2) is the correct one. The term in the GO is “tRNA sulfurtransferase activity”. Now, the authors do have a good point here – the ontological commitment of GO towards BFO was unclear here (this is now made more explicit with an ontology of bridging axioms that make “catalytic activity” a subclass of bfo:process – but note this is still controversial with the BFO people). With the correct interpretation, the authors’ statement that “This kind of error predominates in the Cross Product sample” is not supported. The authors have simply jumped to the conclusion that everything in GO MF must be a BFO function based on the name of the ontology (which was named by biologists, not philosophers) and extrapolated from this that the ontology is full of errors, in particular unintended consequences of existential restrictions.

Interestingly, whilst focusing on the inconsequential problem of violation of existential dependence, they missed the real unintended consequences. In fact there is a lurking closed world assumption with all of these GO MF logical definitions, and OWL is open world! Each of the reactions that’s defined in terms of inputs and outputs should explicitly state the cardinality of all participants, and in addition there needs to be a closure axiom to say there are no additional inputs or outputs! Unlike the philosophical problem of positing existential dependence between a function and a (potential) continuant (which is not even a problem here, as the reactions are intended to be interpreted as bfo:processes), this gives results that are empirically wrong! So there was a real, serious example of unintended consequences that was missed.

It’s important to bear in mind that some of these cross-product files are separate from the main ontology, not fully official, and not of as high quality as the main ontology. The GO BP chebi ones are maintained by the GO editors and are high quality, but the definitions for GO MF reactions were created by me, semi-automatically, and are not as high quality. This draws attention to the need for better documentation here – if the paper had simply criticized the lack of documentation and clear commitment to BFO they would have been spot on, but instead they use this to make spurious claims.

The authors are on shaky ground with anatomy ontologies again:

Time dependencies

These are commonly expressed in ontologies encoding development or other time-dependent processes. Kinds of participation in such time dependent processes can be difficult to pin down as can the exact ontological dependence between the process and the material entities. The start and end relations are intending to express just such time dependencies to do with the development of anatomical structures.

Pharyngeal endoderm subClassOf

end some Pharyngula:Prim-15

Roof plate rhombomere 5 subClassOf

start some Segmentation:10-13 somites

However, the stages of development mentioned may not be complete before the material entity comes fully into existence. They also may not be complete when the material entity stops existing. It is difficult to claim a processual entity (which extends over time) is ontologically necessary for a material entity to exist (the claim of existential dependence) unless the material entity was a clear output of this process. The solution here is, again, to substitute existential restriction by value restriction, such as

Pharyngeal endoderm subClassOf

end only Pharyngula:Prim-15

It’s difficult to see a real problem here. So what if the stages are not completed? The authors’ problem is not with the generated OWL but with the strong existential dependence claim: “It is difficult to claim a processual entity (which extends over time) is ontologically necessary for a material entity to exist (the claim of existential dependence) unless the material entity was a clear output of this process”. To which I would reply: so what? The authors are testing a claim that is too strong. The OWL is correct, and the OWL does not make any claims about existential dependence; that claim is in the authors’ minds. It’s difficult to see any practical problems with the OWL representation of the ZFA relations here. If the problem is purely philosophical, this should be published in a philosophical journal.

Furthermore, the supposed correct solution to this non-problem is terrible: using a universal restriction means fixing at a single artificial level of granularity for the stages (i.e we can’t have a property chain end o part_of -> end, which would lead to incorrect inferences).

Again, there are some real problems with some ontologies hidden here:

  • the start and end relations are undefined. There are a number of people working on this, but admittedly it’s lame there is no standard solution in place yet. We should at least have some better documentation for start/end.
  • there are a lot of hidden assumptions in OBO ontologies regarding how applicable each ontology is for the full range of variation found in nature. The FMA has a whole story about “canonical anatomy” here. For many biological ontologies there’s a shared understanding between authors and users that the ontology represents a simplified reference picture, and in fact there may be the occasional zebrafish pharyngeal endoderm that ends a bit after or a bit before the prim-15 stage. We, the OBO Foundry, could be doing more to ensure this is explicit.

If the authors had highlighted this I would have agreed wholeheartedly and apologized for the current state of affairs. However, this doesn’t really have anything to do with “unintended consequences of existential quantifications”. It’s just a plain case of lack of documentation (not that this is excusable, but the point is that the paper is not titled “lack of documentation in biomedical ontologies”).

Finally, the authors also include a discussion of the use of existential restrictions in conjunction with relations such as lacks_part. This part is fairly reasonable but most of it has been said in other publications. There are actually some subtleties here when it comes to phenotype ontologies, but this is best addressed in a separate post. There is a solution partly in place now involving shortcut relations, but this wasn’t mature when the authors wrote the paper, so fair enough.

Overall I wasn’t convinced by these results. The results were not externally validated (this would have been easy to do – for example, by contacting the ontology authors or pointing out the error on a tracker) and relied on subjective opinions of the authors (even then they largely did not agree). In addition, the relationships were being tested for existential dependence, and it’s no surprise that continuant-stage relationships don’t conform to this, nor is this a problem.

Based on these results, the authors go on to conclude:

Our scrutiny of the OBO Foundry candidate ontologies and cross products yielded a relatively high proportion of inappropriate usages of simple logical constructors. Only focusing on the proper use of existential restriction in class definitions, we found up to 23% of unintended consequences in these constructions. Many Foundry ontologies are widely used throughout the biomedical domain, and therefore such a high error rate seems surprising.

[my emphasis]. To a casual reader the “23%” sounds terrible. But remember:

  • the authors made mistakes in their evaluation – e.g. with GO
  • the authors over-interpreted in many cases, leading to inflation of numbers
  • in the case of roles, the problem is really in adhering to the BFO definition
  • the unintended consequences are largely philosophical consequences regarding existential dependence rather than consequences that would manifest computationally in reasoning.

The last sentence is telling:

Many Foundry ontologies are widely used throughout the biomedical domain, and therefore such a high error rate seems surprising

Indeed, it would be surprising if this were the case. The ZFA is used every day to power gene expression queries on the ZFIN website. Why haven’t any of their users cottoned on to these errors? It’s a mystery. Or perhaps not. Perhaps the authors are seeing errors when there are in fact none.

Most of the existential restrictions are in fact, contrary to what the authors claim, intended to be existential restrictions. In some cases, such as the “smoking may cause cancer” type examples, the problems only exist on a philosophical level, and even then only if you make certain philosophical assumptions. Saying smoking “SubClassOf may_cause some cancer” would be an example of an unintended consequence according to the authors, because it implies that every instance of smoking is existentially dependent on some instance of cancer, which is philosophically problematic (to some). Nevertheless, it’s well known that many people working with OWL ontologies use this idiom because there’s no modal operator in OWL, it’s practical, and it gives the desired inferences. See What causes pneumonia for more discussion.
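To make the all-some reading concrete, here is a minimal Python sketch. It is only an illustration: the function name and the toy instances are hypothetical, and it performs a closed-world check over an explicit set of instances, which is not how an OWL reasoner works under the open-world assumption.

```python
def satisfies_all_some(instances_of_a, links, instances_of_b):
    """Check the all-some reading of "A SubClassOf R some B" on toy data.

    instances_of_a: set of instance names of class A
    links:          {instance: [instances it is R-related to]}
    instances_of_b: set of instance names of class B
    """
    # Every instance of A must be R-related to at least one instance of B.
    return all(
        any(target in instances_of_b for target in links.get(source, ()))
        for source in instances_of_a
    )


# Hypothetical data: two smoking processes, only one linked to a cancer.
smoking = {"smoking_1", "smoking_2"}
cancer = {"cancer_1"}
related = {"smoking_1": ["cancer_1"]}

print(satisfies_all_some(smoking, related, cancer))        # False
print(satisfies_all_some({"smoking_1"}, related, cancer))  # True
```

The first call fails because smoking_2 has no associated cancer. That is the literal existential-dependence reading the authors object to, and it is why the idiom uses a weaker relation such as may_cause rather than causes.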

The authors go on:

We hypothesize that the main and only reason why this has little affected the usefulness of these ontologies up to now is due to their predominant use as controlled vocabularies rather than as computable ontologies. Misinterpretations of this sort can cause unforeseeable side effects once these ontologies are used for machine reasoning, and the use of logic-based reasoning based on biomedical ontologies is increasing with the advent of intelligent tools surrounding the adoption of the OWL language.

In fact it would be easy to test this hypothesis. If it were true, then it should be possible to add biologically justifiable disjointness axioms to the ontology and then use a reasoner to find the unsatisfiable classes that arise from the purported incorrect use of existential restrictions. It is a shame the authors did not take this empirical approach and instead opted for a more subjective ranking approach.
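The kind of test I have in mind can be sketched in a few lines of Python. This is a toy, not a replacement for a real OWL reasoner such as Elk: it only handles named subclass axioms, existential restrictions, relation ranges and pairwise disjointness, and every class name, relation and range in the example is an assumption made up for illustration.

```python
from itertools import combinations


def unsatisfiable_classes(subclass_of, some_values_from, ranges, disjoint_pairs):
    """Toy check: return the classes that cannot have instances.

    subclass_of:      {class: [named superclasses]}
    some_values_from: {class: [(relation, filler)]}, read as
                      "class SubClassOf relation some filler"
    ranges:           {relation: range class}
    disjoint_pairs:   [(class_a, class_b)]
    """
    disjoint = {frozenset(pair) for pair in disjoint_pairs}

    def ancestors(cls):
        # Reflexive-transitive closure over the named subclass axioms.
        seen, stack = {cls}, [cls]
        while stack:
            for parent in subclass_of.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def clashes(types):
        # True if two of the given classes are declared disjoint.
        return any(frozenset(pair) in disjoint for pair in combinations(types, 2))

    classes = set(subclass_of) | set(some_values_from)
    unsat = set()
    changed = True
    while changed:  # repeat until no new unsatisfiable class is found
        changed = False
        for cls in sorted(classes - unsat):
            types = ancestors(cls)
            bad = clashes(types) or bool(types & unsat)
            for ancestor in types:
                for relation, filler in some_values_from.get(ancestor, ()):
                    # The filler individual must also belong to the relation's range.
                    filler_types = ancestors(filler)
                    if relation in ranges:
                        filler_types |= ancestors(ranges[relation])
                    bad = bad or clashes(filler_types) or bool(filler_types & unsat)
            if bad:
                unsat.add(cls)
                changed = True
    return unsat


# Hypothetical mini-ontology; the range of occurs_in is a toy assumption.
subclass_of = {
    "apoptosis": ["process"],
    "neuron": ["cell"],
    "neuron apoptosis": ["apoptosis"],
}
some_values_from = {
    "neuron apoptosis": [("occurs_in", "neuron")],
    "suspect class": [("occurs_in", "apoptosis")],  # deliberately wrong filler
}
ranges = {"occurs_in": "cell"}
disjoint_pairs = [("process", "cell")]

print(unsatisfiable_classes(subclass_of, some_values_from, ranges, disjoint_pairs))
# {'suspect class'}
```

Only the deliberately broken class is flagged. A genuinely misused existential restriction would show up in the same way once sensible disjointness axioms are in place, whereas a restriction that is merely philosophically contentious would not.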

In fact the transition from weakly axiomatized ontologies to strongly axiomatized ones is happening, and this is uncovering a lot of problems through reasoning. But the problems being uncovered are generally not due to unintended consequences of existential quantifications. The authors widely miss the mark on their evaluation of the problem.

But the authors do end with an excellent point:

Another problem that hindered our experiments is the unclear ontological commitment of many classes and relations in OBO ontologies, which makes it nearly impossible to reach consensus about the truth-value of many of their axioms. This involves not only ambiguities in ontological interpretation of the classes included in the ontologies but also the proliferation of relations which were poorly defined. To address this shortcoming, ontologies can rely on more expressive languages and axiom systems in which the intended semantics of the relations used are constrained, as is done for the OBO relation ontology

The only objection I have there is to point out that most OBO ontologies don’t use a proliferation of relations – the authors are referring to some of the cross-product extensions here. But point taken – some relations need better definitions, the cross product files are of variable quality, and known issues should be documented.

If this were the thesis of the paper I would have less of a problem. However, the paper makes a stronger claim, namely that 23% of the existential restrictions are wrong and should be changed to some other logical constructor (with the implication that this is due to ambiguities in obo-format). Two (hopefully unintended) consequences of this paper are muddying the waters on the semantics of obo-format and spreading misinformation about the quality of relational statements in the OBO library.

This needs to be countered:

  • The official semantics of OBO-Format are such that every relationship tag is by default interpreted as a subclass of an existential restriction. This can be overridden in some circumstances, but in practice this is rarely done; see http://oboformat.org/
  • If something is not clear, ask on the obo-format list rather than basing a paper on a misunderstanding
  • Most obo-format ontology authors have sufficient understanding of the all-some semantics such that you should trust the OWL that comes out the other end.
  • If you don’t trust it, then report the problem to the ontology via their tracker.
  • If you think you’ve uncovered systematic errors in either the underlying ontology or in the translation to OWL, verify there really is an error using the appropriate mechanism (e.g. trackers) before jumping to conclusions and writing a paper falsely claiming a 23% error rate. There are in fact many problems with many ontologies, but unintended consequences of existential quantification are not among them, except in a small number of cases (e.g. CHEBI), which have yet to be shown to cause any harm, but nevertheless need better documentation.
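To illustrate the first point in the list above, here is a minimal Python sketch of the default translation: each relationship tag in a term stanza becomes a SubClassOf axiom with an existential restriction. It is a simplification and not the official translator; the function name and the EX: identifiers are made up, and the override cases are not handled.

```python
def obo_stanza_to_manchester(stanza):
    """Translate the is_a and relationship tags of one [Term] stanza.

    Returns a Manchester-style frame. Every relationship tag is read
    with the default all-some interpretation.
    """
    term_id = None
    superclasses = []
    for raw_line in stanza.splitlines():
        line = raw_line.split("!", 1)[0].strip()  # drop trailing "! label" comments
        if ":" not in line:
            continue  # skips the [Term] header and blank lines
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "id":
            term_id = value
        elif tag == "is_a":
            superclasses.append(value)
        elif tag == "relationship":
            parts = value.split()
            if len(parts) >= 2:
                relation, filler = parts[0], parts[1]
                superclasses.append(f"{relation} some {filler}")
    frame = [f"Class: {term_id}"]
    frame += [f"  SubClassOf: {expression}" for expression in superclasses]
    return "\n".join(frame)


# Hypothetical stanza; the identifiers are placeholders, not real terms.
example = """[Term]
id: EX:0000002
name: example part
is_a: EX:0000001 ! example parent
relationship: part_of EX:0000003 ! example whole
"""

print(obo_stanza_to_manchester(example))
# Class: EX:0000002
#   SubClassOf: EX:0000001
#   SubClassOf: part_of some EX:0000003
```

The point is simply that nothing about the relationship line is ambiguous: absent an explicit override, it always comes out as an existential restriction.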