A response to “Unintended consequences of existential quantifications in biomedical ontologies”

In Unintended consequences of existential quantifications in biomedical ontologies, Boeker et al attempt to

…scrutinize the OWL-DL releases of OBO ontologies to assess whether their logical axioms correspond to the meaning intended by their authors

The authors examine existential restriction axioms in a number of ontologies (whose source is in obo-format) and rate them according to the correspondence between the semantics and the presumed author intent. They claim:

  • usability issues with OBO ontologies
  • lack of ontological commitment for several common terms
  • proliferation of domain-specific relations
  • numerous assertions which do not properly describe the underlying biological reality, or are ambiguous and difficult to interpret.

The proposed solution:

The solution is a better anchoring in upper ontologies and a restriction to relatively few, well defined relation types with given domain and range constraints

I think this is an interesting paper, and have great respect for all the authors involved. However, I find some of the claims suspect and in need of countering. I do think the paper shows that we need much better ontology and ontology-technology documentation from the OBO Foundry effort (which I am a part of); however, I think the authors have read far too much into the lack of documentation and consequently muddy the issues on a number of matters.

The initial misunderstanding is presented at the start of the paper:

This extract asserts the relationship part_of between the terms ankle and hindlimb in OBO format.

[Term]
id: MA:0000043
name: ankle
relationship: part_of MA:0000026 ! hindlimb

This assertion does not commit to a semantics in terms of the real world entities which are denoted by the terms. It does not allow us to infer that, e.g., all hindlimbs have ankles, or all ankles are part of a hindlimb. Descriptions at this level require some kind of ontological interpretation for the OBO syntax in terms of OWL axioms, as OWL axioms are explicitly quantified

In fact this is incorrect. There is an ontological interpretation for the OBO syntax in terms of OWL axioms (which the authors provide, falsely stating that it is “one such interpretation”):

Ankle subClassOf part_of some Hindlimb

The authors provide links to official documentation confirming that this is the correct interpretation. They then go on to say:

Our mouse limb example could therefore be alternatively translated into at least the following three OWL expressions:

(i) Ankle subClassOf part_of some Hindlimb

(ii) Ankle subClassOf part_of exactly 1 Hindlimb

(iii) Ankle subClassOf part_of only Hindlimb

In fact there is some legitimate confusion over the interpretation of relations, due to the impedance mismatch between the treatment of time in the 2005 Relations Ontology paper and what is possible in OWL. But positing additional unwarranted interpretations just muddies the waters. Regardless of the time issue, the RO 2005 paper is quite clear that the relations used should be read in an all-some fashion (i.e. interpretation (i)). This is consistent with the Golbreich/Horrocks translation and its current successor, the obof1.4 specification, all of which are cited by the authors.
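As a rough illustration of the default all-some reading, here is a minimal Python sketch that turns the relationship tags of a single stanza into axioms of form (i). It is a toy, not the obof1.4 translation: the function name `stanza_to_axioms` is made up for illustration, class labels stand in for IRIs, and only the name and relationship tags are handled.

```python
STANZA = """
[Term]
id: MA:0000043
name: ankle
relationship: part_of MA:0000026 ! hindlimb
"""

def stanza_to_axioms(stanza):
    """Render each relationship tag of one [Term] stanza as an all-some axiom."""
    name = None
    pairs = []
    for line in stanza.strip().splitlines():
        line = line.strip()
        if line.startswith("name:"):
            name = line[len("name:"):].strip()
        elif line.startswith("relationship:"):
            # Expected shape: 'relationship: REL TARGET_ID ! optional label'
            body, _, label = line[len("relationship:"):].partition("!")
            rel, target_id = body.split()
            pairs.append((rel, label.strip() or target_id))
    # Every relationship tag gets the same default reading:
    # NAME SubClassOf REL some TARGET
    return [f"{name} SubClassOf {rel} some {filler}" for rel, filler in pairs]

print(stanza_to_axioms(STANZA))
# ['ankle SubClassOf part_of some hindlimb']
```

Nothing in the stanza selects between "some", "exactly 1" or "only"; the existential form is simply the default.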

This claimed lack of a standard interpretation informs the main thesis advanced by the authors: the translation of obo-format relationships to existential restrictions is not always what ontology authors intend. In fact they are testing for something stronger, specifically the claim that every such translated existential restriction implies existential dependence, where this is defined:

x dependsG for its existence upon Fs = df

Necessarily, x exists only if some F exists

It is worth noting that the dependence claim they are testing is a strong one: it is stronger than anything in the OWL semantics, and it would be violated by a number of other ontologies, many of them in OWL (such as the NCIt), due to the prevalence of “may_do_X” type relations. This is a subtle point that may escape the casual reader of the paper.

The authors examined axioms in a number of ontologies and evaluated them to see whether there were uses of existential restrictions where this strong dependence claim is not justified. Their test set included ontologies from the OBO library as well as a number of external support ontologies (aka “cross product” ontologies). Most of these ontologies currently use obo-format as their source. They did not invite external domain experts, and they did not check their results with the authors of the ontologies.

The authors provide examples where they believe there are unintended consequences of existential restrictions, based on this strong interpretation. Many of the examples they provide are problematic, as I will illustrate.

They provide this example from the GO:

“Interkinetic nuclear migration SubClassOf

part_of some Cell proliferation in forebrain

The ontological dependence expressed by this assertion is that there are no interkinetic nuclear migration processes without a corresponding cell proliferation in forebrain process. This is obviously false, since interkinetic nuclear migration is a very fundamental cell process, which is not limited to forebrains. An easy fix to this error is the inversion of the expression by using the inverse relationship:

Cell proliferation in forebrain subclassOf

has_part some Interkinetic nuclear migration”

In fact, the GO editors are well aware of the all-some interpretation, and they did intend to say that all instances of IKNM are in a forebrain. This is clear from the textual definition (I have highlighted the relevant part):

[Term]
id: GO:0022027
name: interkinetic nuclear migration
def: “The movement of the nucleus of the ventricular zone cell between the apical and the basal zone surfaces. Mitosis occurs when the nucleus is near the apical surface, that is, the lumen of the ventricle.” [GO_REF:0000021, GOC:cls, GOC:dgh, GOC:dph, GOC:jid, GOC:mtg_15jun06]
is_a: GO:0051647 ! nucleus localization
relationship: part_of GO:0021846 ! cell proliferation in forebrain

The mistake GO has made is giving the class a misleadingly generic label. This kind of thing is not unheard of in the GO – a class is given a label that has a specific meaning to one community when in fact the label is used more generally by a wider community. This is not to understate this kind of mistake – it’s actually quite serious (annotators are meant to always read the definition but unfortunately this rule isn’t always followed). However, the problem is entirely terminological and not in any way related to interpretations of the relationship tag or existential quantification. The creators of this class really did intend to restrict the location to the forebrain (this was confirmed by one of the GO editors listed as provenance for the definition).

The authors are on safer ground with their analysis of structural relations such as has_parent_hydride in CHEBI. I don’t have such a problem here, but it would have been useful to see the claims tested. Can we use a reasoner to find an inconsistency in the ontology (supplemented with additional axioms)? It seems that the problem is less in the computational properties of the existential restriction, and more in the existential dependence claim (which, remember, is stronger than what is claimed by the OWL semantics).

They also cover what they perceive to be a problem with the use of existential restrictions in conjunction with what BFO calls “realizables”:

A statement such as

Anisotropine methylbromide subclassOf has_role some Anti-ulcer drug

in ChEBI asserts that each and every anisotropine methylbromide molecule has the role of an anti-ulcer drug. However, this role may never be realized for a particular molecule instance, since that molecule may play a different role in the treatment of a different disease, or play no role at all. It is thus problematic to assert an existential dependence between the molecule and the realization of the role (in the treatment of an ulcer)

This is a reasonable philosophical analysis. But are there actually any negative consequences for a user of the ontology or for reasoning? Does it lead to any incorrect inferences? I’m not convinced that an existential restriction is so wrong here. The problems uncovered with this example are really to do with some obscure conditions on BFO roles (i.e. all roles are realized – if roles were like dispositions this would not be a problem), and to be fair to the CHEBI people, they might not have been aware of that when they made the axiom (BFO needs better, more user-friendly documentation).

The same “problem” is uncovered with some of the GO MF cross products, but this time the mistake lies with the authors. They say:

This is particularly apparent in the Gene Ontology molecular function ontology. For example, the statement

tRNA sulfurtransferase subClassOf

has_input some Transfer RNA

asserts a dependency of every instance of tRNA sulfurtransferase on some instance of Transfer RNA. Functions include the possibility that the bearer of a function is never involved in any process that realizes the function, thus may never have input molecules. This kind of error predominates in the Cross Product sample, especially in the cross product ‘GO Molecular Function X ChEBI’. Interrater agreement was low here because of two conflicting positions: (1) the assertion is false, because functions can remain unrealized, or (2) the assertion is true, but the categorization as a function is false, as implied by the suffix “activity”.

In fact this latter interpretation (2) is the correct one. The term in the GO is “tRNA sulfurtransferase activity”. Now, the authors do have a good point here – the ontological commitment of GO towards BFO was unclear (this is now made more explicit with an ontology of bridging axioms that make “catalytic activity” a subclass of bfo:process – but note this is still controversial with the BFO people). With the correct interpretation, the authors’ statement that “This kind of error predominates in the Cross Product sample” is not supported. The authors have simply jumped to the conclusion that everything in GO MF must be a BFO function based on the name of the ontology (which was named by biologists, not philosophers) and extrapolated from this that the ontology is full of errors, in particular unintended consequences of existential restrictions.

Interestingly, whilst focusing on the inconsequential problem of violation of existential dependence, they missed the real unintended consequences. In fact there is a lurking closed world assumption with all of these GO MF logical definitions, and OWL is open world! Each of the reactions that’s defined in terms of inputs and outputs should explicitly state the cardinality of all participants, and in addition there needs to be a closure axiom to say there are no additional inputs or outputs! Unlike the philosophical problem of positing existential dependence between a function and a (potential) continuant (which is not even a problem here, as the reactions are intended to be interpreted as bfo:processes), this gives results that are empirically wrong! So a real, serious example of unintended consequences was missed.
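A minimal sketch of how the missing closure can bite, assuming two hypothetical reaction classes whose logical definitions are plain conjunctions of existential restrictions. The class names, the fillers A, B and C, and the helper `inferred_subclass` are all invented for illustration. The check is a simple structural comparison standing in for what a reasoner would conclude; it is not a real OWL reasoner.

```python
# Each hypothetical reaction is defined only by 'some' restrictions,
# written as (relation, filler) pairs: no cardinality, no closure axiom.
DEFS = {
    "reaction_1": {("has_input", "A"), ("has_output", "C")},
    "reaction_2": {("has_input", "A"), ("has_input", "B"), ("has_output", "C")},
}

def inferred_subclass(sub, sup, defs=DEFS):
    """sub is subsumed by sup if it carries every conjunct of sup's definition."""
    return defs[sup] <= defs[sub]

print(inferred_subclass("reaction_2", "reaction_1"))  # True
print(inferred_subclass("reaction_1", "reaction_2"))  # False
```

Here `reaction_2`, which consumes an extra input B, is inferred to be a kind of `reaction_1`, because nothing says that `reaction_1` has no inputs other than A. Exact cardinalities plus a closure axiom such as `has_input only A` on `reaction_1` are the kind of additions that would block that inference.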

It’s important to bear in mind that some of these cross-product files are separate from the main ontology, not fully official, and not of as high quality as the main ontology. The GO BP chebi ones are maintained by the GO editors and are high quality, but the definitions for GO MF reactions were created by me, semi-automatically, and are not as high quality. This draws attention to the need for better documentation here – if the paper had simply criticized the lack of documentation and clear commitment to BFO, the authors would have been spot on, but instead they use this to make spurious claims.

The authors are on shaky ground with anatomy ontologies again:

Time dependencies

These are commonly expressed in ontologies encoding development or other time-dependent processes. Kinds of participation in such time dependent processes can be difficult to pin down as can the exact ontological dependence between the process and the material entities. The start and end relations are intending to express just such time dependencies to do with the development of anatomical structures.

Pharyngeal endoderm subClassOf

end some Pharyngula:Prim-15

Roof plate rhombomere 5 subClassOf

start some Segmentation:10-13 somites

However, the stages of development mentioned may not be complete before the material entity comes fully into existence. They also may not be complete when the material entity stops existing. It is difficult to claim a processual entity (which extends over time) is ontologically necessary for a material entity to exist (the claim of existential dependence) unless the material entity was a clear output of this process. The solution here is, again, to substitute existential restriction by value restriction, such as

Pharyngeal endoderm subClassOf

end only Pharyngula:Prim-15

It’s difficult to see a real problem here. So what if the stages are not completed? The authors’ problem is not with the generated OWL but with the strong existential dependence claim: “It is difficult to claim a processual entity (which extends over time) is ontologically necessary for a material entity to exist (the claim of existential dependence) unless the material entity was a clear output of this process”. To which I would reply: so what? The authors are testing a claim that is too strong. The OWL is correct, and the OWL does not make any claims about existential dependence; that claim is in the authors’ minds. It’s difficult to see any practical problems with the OWL representation of the ZFA relations here. If the problem is purely philosophical, this should be published in a philosophical journal.

Furthermore, the supposed correct solution to this non-problem is terrible: using a universal restriction means fixing at a single artificial level of granularity for the stages (i.e. we can’t have a property chain end o part_of -> end, because combined with the universal restriction it would lead to incorrect inferences).
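The clash can be seen in a toy model. The sketch below assumes, purely for illustration, that a Prim-15 stage instance is part_of a longer Pharyngula period instance. All individual names and both helper functions are hypothetical, and the code checks one hand-built model rather than doing OWL reasoning.

```python
# One hand-built model: individuals, their types, and two relations.
types = {"prim15_a": "Prim-15", "pharyngula_a": "Pharyngula"}
part_of = {("prim15_a", "pharyngula_a")}   # the stage is part of the longer period
end = {("endoderm_a", "prim15_a")}         # the structure ends at the Prim-15 stage

def apply_chain(end, part_of):
    """Add the pairs implied by the property chain: end o part_of -> end."""
    implied = {(x, z) for (x, y) in end for (y2, z) in part_of if y == y2}
    return end | implied

def satisfies_only(individual, relation, cls):
    """Check 'relation only cls': every filler of the individual must be a cls."""
    return all(types[y] == cls for (x, y) in relation if x == individual)

print(satisfies_only("endoderm_a", end, "Prim-15"))                        # True
print(satisfies_only("endoderm_a", apply_chain(end, part_of), "Prim-15"))  # False
```

Once the chain adds the coarser Pharyngula period as a second `end` filler, `end only Pharyngula:Prim-15` no longer holds in this model. The existential form `end some Pharyngula:Prim-15` holds both before and after the chain is applied.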

Again, there are some real problems with some ontologies hidden here:

  • the start and end relations are undefined. There are a number of people working on this, but admittedly it’s lame there is no standard solution in place yet. We should at least have some better documentation for start/end.
  • there are a lot of hidden assumptions in OBO ontologies regarding how applicable each ontology is for the full range of variation found in nature. The FMA has a whole story about “canonical anatomy” here. For many biological ontologies there’s a shared understanding between authors and users that the ontology represents a simplified reference picture, and in fact there may be the occasional zebrafish pharyngeal endoderm that ends a bit after or a bit before the prim-15 stage. We, the OBO Foundry, could be doing more to ensure this is explicit.

If the authors had highlighted this I would have agreed wholeheartedly and apologized for the current state of affairs. However, this doesn’t really have anything to do with “unintended consequences of existential quantifications”. It’s just a plain case of lack of documentation (not that this is excusable, but the point is that the paper is not titled “lack of documentation in biomedical ontologies”).

Finally, the authors also include a discussion of the use of existential restrictions in conjunction with relations such as lacks_part. This part is fairly reasonable but most of it has been said in other publications. There are actually some subtleties here when it comes to phenotype ontologies, but this is best addressed in a separate post. There is a solution partly in place now involving shortcut relations, but this wasn’t mature when the authors wrote the paper, so fair enough.

Overall I wasn’t convinced by these results. The results were not externally validated (this would have been easy to do – for example, by contacting the ontology authors or pointing out the error on a tracker) and relied on the subjective opinions of the authors (and even then they largely did not agree). In addition, the relationships were being tested for existential dependence, and it’s no surprise that continuant-stage relationships don’t conform to this, nor is this a problem.

Based on these results, the authors go on to conclude:

Our scrutiny of the OBO Foundry candidate ontologies and cross products yielded a relatively high proportion of inappropriate usages of simple logical constructors. Only focusing on the proper use of existential restriction in class definitions, we found up to 23% of unintended consequences in these constructions. Many Foundry ontologies are widely used throughout the biomedical domain, and therefore such a high error rate seems surprising.

[my emphasis]. To a casual reader the “23%” sounds terrible. But remember:

  • the authors made mistakes in their evaluation – e.g. with GO
  • the authors over-interpreted in many cases, leading to inflation of numbers
  • in the case of roles, the problem is really in adhering to the BFO definition
  • the unintended consequences are largely philosophical consequences regarding existential dependence rather than consequences that would manifest computationally in reasoning.

The last sentence is telling:

Many Foundry ontologies are widely used throughout the biomedical domain, and therefore such a high error rate seems surprising

Indeed, it would be surprising if this were the case. The ZFA is used every day to power gene expression queries on the ZFIN website. Why haven’t any of their users cottoned on to these errors? It’s a mystery. Or perhaps not. Perhaps the authors are seeing errors when there are in fact none.

Most of the existential restrictions are in fact, contrary to what the authors claim, intended to be existential restrictions. In some cases, such as the “smoking may cause cancer” type examples, the problems only exist on a philosophical level, and even then only if you make certain philosophical assumptions. Saying smoking “SubClassOf may_cause some cancer” would be an example of an unintended consequence according to the authors, because it implies that every instance of smoking is existentially dependent on some instance of cancer, which is philosophically problematic (to some). Nevertheless, it’s well known that many people working with OWL ontologies use this idiom because there’s no modal operator in OWL, it’s practical, and it gives the desired inferences. See What causes pneumonia for more discussion.

The authors go on:

We hypothesize that the main and only reason why this has little affected the usefulness of these ontologies up to now is due to their predominant use as controlled vocabularies rather than as computable ontologies. Misinterpretations of this sort can cause unforeseeable side effects once these ontologies are used for machine reasoning, and the use of logic-based reasoning based on biomedical ontologies is increasing with the advent of intelligent tools surrounding the adoption of the OWL language.

In fact it would be easy to test this hypothesis. If it were true, then it should be possible to add biologically justifiable disjointness axioms to the ontology and then use a reasoner to find the unsatisfiable classes that arise from the purported incorrect use of existential restrictions. It is a shame the authors did not take this empirical approach and instead opted for a more subjective ranking approach.
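The shape of such a test can be sketched in a few lines of Python. This is a toy stand-in for a reasoner that catches only one pattern (an existential restriction whose filler is disjoint from the declared range of the relation). The axioms, class names and relation names are all hypothetical, and a real test would load the ontology into an OWL reasoner such as HermiT.

```python
# Hypothetical axioms: (class, relation, filler) means
# 'class SubClassOf relation some filler'.
restrictions = [
    ("ClassA", "rel_1", "Process"),
    ("ClassB", "rel_1", "Continuant"),
]
ranges = {"rel_1": "Process"}                      # rel_1 Range: Process
disjoint = {frozenset({"Process", "Continuant"})}  # the added disjointness axiom

def unsatisfiable_classes(restrictions, ranges, disjoint):
    """Flag classes whose 'some' filler is disjoint from the relation's range."""
    flagged = []
    for cls, rel, filler in restrictions:
        rng = ranges.get(rel)
        if rng is not None and frozenset({filler, rng}) in disjoint:
            flagged.append(cls)
    return flagged

print(unsatisfiable_classes(restrictions, ranges, disjoint))  # ['ClassB']
```

If a purported misuse of an existential restriction were a real logical error, this is where it would surface: as a class that the reasoner reports as unsatisfiable once the disjointness axioms are in place.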

In fact the transition from weakly axiomatized ontologies to strongly axiomatized ones is happening, and this is uncovering a lot of problems through reasoning. But the problems being uncovered are generally not due to unintended consequences of existential quantifications. The authors widely miss the mark on their evaluation of the problem.

But the authors do end with an excellent point:

Another problem that hindered our experiments is the unclear ontological commitment of many classes and relations in OBO ontologies, which makes it nearly impossible to reach consensus about the truth-value of many of their axioms. This involves not only ambiguities in ontological interpretation of the classes included in the ontologies but also the proliferation of relations which were poorly defined. To address this shortcoming, ontologies can rely on more expressive languages and axiom systems in which the intended semantics of the relations used are constrained, as is done for the OBO relation ontology

The only objection I have there is to point out that most OBO ontologies don’t use a proliferation of relations – the authors are referring to some of the cross-product extensions here. But point taken – some relations need better definitions, the cross product files are of variable quality, and known issues should be documented.

If this were the thesis of the paper I would have less of a problem. However, the paper makes a stronger claim, namely that 23% of the existential restrictions are wrong and should be changed to some other logical constructor (with the implication that this is due to ambiguities in obo-format). Two (hopefully unintended) consequences of this paper are muddying the waters on the semantics of obo-format and spreading misinformation about the quality of relational statements in the OBO library.

This needs to be countered:

  • The official semantics of OBO-Format are such that every relationship tag is by default interpreted as a subclass of an existential restriction. This can be overridden in some circumstances, but in practice is rarely done, see http://oboformat.org/
  • If something is not clear, ask on the obo-format list rather than basing a paper on a misunderstanding
  • Most obo-format ontology authors have sufficient understanding of the all-some semantics such that you should trust the OWL that comes out the other end.
  • If you don’t trust it, then report the problem to the ontology via their tracker.
  • If you think you’ve uncovered systematic errors in either the underlying ontology or in the translation to OWL, verify there really is an error using the appropriate mechanism (e.g. trackers) before jumping to conclusions and writing a paper falsely claiming a 23% error rate. There are in fact many problems with many ontologies, but unintended consequences of existential quantification are not among them, except in a small number of cases (e.g. CHEBI), which have yet to be shown to cause any harm, but nevertheless need better documentation.

 
