A response to “Unintended consequences of existential quantifications in biomedical ontologies”

In “Unintended consequences of existential quantifications in biomedical ontologies”, Boeker et al. attempt to

…scrutinize the OWL-DL releases of OBO ontologies to assess whether their logical axioms correspond to the meaning intended by their authors

The authors examine existential restriction axioms in a number of ontologies (whose source is in obo-format) and rate them according to the correspondence between the semantics and the presumed author intent. They claim:

  • usability issues with OBO ontologies
  • lack of ontological commitment for several common terms
  • proliferation of domain-specific relations
  • numerous assertions which do not properly describe the underlying biological reality, or are ambiguous and difficult to interpret.

The proposed solution:

The solution is a better anchoring in upper ontologies and a restriction to relatively few, well defined relation types with given domain and range constraints

I think this is an interesting paper, and I have great respect for all the authors involved. However, I find some of the claims suspect and in need of countering. I do think the paper shows that we need much better ontology and ontology-technology documentation from the OBO Foundry effort (which I am a part of); however, I think the authors have read far too much into the lack of documentation and consequently muddy the issues on a number of matters.

The initial misunderstanding is presented at the start of the paper:

This extract asserts the relationship part_of between the terms ankle and hindlimb in OBO format.

[Term]
id: MA:0000043
name: ankle
relationship: part_of MA:0000026 ! hindlimb

This assertion does not commit to a semantics in terms of the real world entities which are denoted by the terms. It does not allow us to infer that, e.g., all hindlimbs have ankles, or all ankles are part of a hindlimb. Descriptions at this level require some kind of ontological interpretation for the OBO syntax in terms of OWL axioms, as OWL axioms are explicitly quantified

In fact this is incorrect. There is an ontological interpretation for the OBO syntax in terms of OWL axioms (which the authors provide, falsely stating that it is “one such interpretation”):

Ankle subClassOf part_of some Hindlimb

The authors provide links to official documentation confirming that this is the correct interpretation. They then go on to say:

Our mouse limb example could therefore be alternatively translated into at least the following three OWL expressions:

(i) Ankle subClassOf part_of some Hindlimb

(ii) Ankle subClassOf part_of exactly 1 Hindlimb

(iii) Ankle subClassOf part_of only Hindlimb

In fact there is some legitimate confusion over the interpretation of relations, due to the impedance mismatch between the treatment of time in the 2005 Relations Ontology paper and what is possible in OWL. But positing additional unwarranted interpretations just muddies the waters. Regardless of the time issue, the RO 2005 paper is quite clear that the relations used should be read in an all-some fashion (i.e. interpretation (i)). This is consistent with the Golbreich/Horrocks translation and its current successor, the obof1.4 specification, all of which are cited by the authors.
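To make the difference between the three candidate readings concrete, here is a minimal sketch that checks each one against a tiny set of facts. It is a closed, finite check rather than real OWL (open-world) semantics, and the individuals (`left_hindlimb`, `mouse_1` and so on) are invented purely for illustration.

```python
# Toy, closed-world check of the three readings against a finite set of facts.
# All individual names below are invented for illustration.

def holds_some(successors, fillers):
    """'R some F': at least one R-successor belongs to F."""
    return any(s in fillers for s in successors)


def holds_exactly_one(successors, fillers):
    """'R exactly 1 F': precisely one distinct R-successor belongs to F."""
    return len({s for s in successors if s in fillers}) == 1


def holds_only(successors, fillers):
    """'R only F': every R-successor belongs to F (vacuously true if none)."""
    return all(s in fillers for s in successors)


hindlimbs = {"left_hindlimb", "right_hindlimb"}

# An ankle that is part of a hindlimb and also part of the whole mouse.
typical_ankle = ["left_hindlimb", "mouse_1"]
# A hypothetical ankle that is part of nothing at all.
isolated_ankle = []

for label, parts in [("typical", typical_ankle), ("isolated", isolated_ankle)]:
    print(label,
          holds_some(parts, hindlimbs),
          holds_exactly_one(parts, hindlimbs),
          holds_only(parts, hindlimbs))
```

In this toy world the typical ankle satisfies readings (i) and (ii) but fails (iii), because it is also part of something that is not a hindlimb. The isolated ankle satisfies only (iii), which shows that the universal reading does not even require a hindlimb to exist.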

This claimed lack of a standard interpretation informs the main thesis advanced by the authors: the translation of obo-format relationships to existential restrictions is not always what ontology authors intend. In fact they are testing for something stronger, specifically the claim that every such translated existential restriction implies existential dependence, where this is defined:

x dependsG for its existence upon Fs = df

Necessarily, x exists only if some F exists

It is worth noting that the dependence claim they are testing is a strong one: it is stronger than anything in the OWL semantics, and it would be violated by a number of other ontologies, many of them in OWL, such as the NCIt, due to the prevalence of “may_do_X” type relations. This is a subtle point that may escape the casual reader of the paper.

The authors examined axioms in a number of ontologies and evaluated them to see whether there were uses of existential restrictions where this strong dependence claim is not justified. Their test set included ontologies from the OBO library as well as a number of external support ontologies (aka “cross product” ontologies). Most of these ontologies currently use obo-format as their source. They did not invite external domain experts, and they did not check their results with the authors of the ontologies.

The authors provide examples where they believe there are unintended consequences of existential restrictions, based on this strong interpretation. Many of the examples they provide are problematic, as I will illustrate.

They provide this example from the GO:

“Interkinetic nuclear migration SubClassOf

part_of some Cell proliferation in forebrain

The ontological dependence expressed by this assertion is that there are no interkinetic nuclear migration processes without a corresponding cell proliferation in forebrain process. This is obviously false, since interkinetic nuclear migration is a very fundamental cell process, which is not limited to forebrains. An easy fix to this error is the inversion of the expression by using the inverse relationship:

Cell proliferation in forebrain subclassOf

has_part some Interkinetic nuclear migration”

In fact, the GO editors are well aware of the all-some interpretation, and they did intend to say that all instances of IKNM are in a forebrain. This is clear from the textual definition (I have highlighted the relevant part):

[Term]
id: GO:0022027
name: interkinetic nuclear migration
def: “The movement of the nucleus of the ventricular zone cell between the apical and the basal zone surfaces. Mitosis occurs when the nucleus is near the apical surface, that is, the lumen of the ventricle.” [GO_REF:0000021, GOC:cls, GOC:dgh, GOC:dph, GOC:jid, GOC:mtg_15jun06]
is_a: GO:0051647 ! nucleus localization
relationship: part_of GO:0021846 ! cell proliferation in forebrain

The mistake GO has made is giving the class a misleadingly generic label. This kind of thing is not unheard of in the GO – a class is given a label that has a specific meaning to one community when in fact the label is used more generally by a wider community. This is not to understate this kind of mistake – it’s actually quite serious (annotators are meant to always read the definition but unfortunately this rule isn’t always followed). However, the problem is entirely terminological and not in any way related to interpretations of the relationship tag or existential quantification. The creators of this class really did intend to restrict the location to the forebrain (this was confirmed by one of the GO editors listed as provenance for the definition).

The authors are on safer ground with their analysis of structural relations such as has_parent_hydride in CHEBI. I don’t have such a problem here, but it would have been useful to see the claims tested. Can we use a reasoner to detect an inconsistency in the ontology (supplemented with additional axioms)? It seems that the problem is less in the computational properties of the existential restriction, and more in the existential dependence claim (which, remember, is stronger than what is claimed by the OWL semantics).

They also cover what they perceive to be a problem with the use of existential restrictions in conjunction with what BFO calls “realizables”:

A statement such as

Anisotropine methylbromide subclassOf has_role some Anti-ulcer drug

in ChEBI asserts that each and every anisotropine methylbromide molecule has the role of an anti-ulcer drug. However, this role may never be realized for a particular molecule instance, since that molecule may play a different role in the treatment of a different disease, or play no role at all. It is thus problematic to assert an existential dependence between the molecule and the realization of the role (in the treatment of an ulcer)

This is a reasonable philosophical analysis. But are there actually any negative consequences for a user of the ontology or for reasoning? Does it lead to any incorrect inferences? I’m not convinced that an existential restriction is so wrong here. The problems uncovered with this example are really to do with some obscure conditions on BFO roles (i.e. all roles are realized – if roles were like dispositions this would not be a problem), and to be fair to the CHEBI people, they might not have been aware of that when they made the axiom (BFO needs better, more user-friendly documentation).

The same “problem” is uncovered with some of the GO MF cross products, but this time the mistake lies with the authors. They say:

This is particularly apparent in the Gene Ontology molecular function ontology. For example, the statement

tRNA sulfurtransferase subClassOf

has_input some Transfer RNA

asserts a dependency of every instance of tRNA sulfurtransferase on some instance of Transfer RNA. Functions include the possibility that the bearer of a function is never involved in any process that realizes the function, thus may never have input molecules. This kind of error predominates in the Cross Product sample, especially in the cross product ‘GO Molecular Function X ChEBI’. Interrater agreement was low here because of two conflicting positions: (1) the assertion is false, because functions can remain unrealized, or (2) the assertion is true, but the categorization as a function is false, as implied by the suffix “activity”.

In fact this latter interpretation (2) is the correct one. The term in the GO is “tRNA sulfurtransferase activity”. Now, the authors do have a good point here – the ontological commitment of GO towards BFO was unclear here (this is now made more explicit with an ontology of bridging axioms that make “catalytic activity” a subclass of bfo:process – but note this is still controversial with the BFO people). With the correct interpretation, the authors’ statement that “This kind of error predominates in the Cross Product sample” is not supported. The authors have simply jumped to the conclusion that everything in GO MF must be a BFO function based on the name of the ontology (which was named by biologists, not philosophers) and extrapolated from this that the ontology is full of errors, in particular unintended consequences of existential restrictions.

Interestingly, whilst focusing on the inconsequential problem of violation of existential dependence, they missed the real unintended consequences. In fact there is a lurking closed world assumption with all of these GO MF logical definitions, and OWL is open world! Each of the reactions that is defined in terms of inputs and outputs should explicitly state the cardinality of all participants, and in addition there needs to be a closure axiom to say there are no additional inputs or outputs! Unlike the philosophical problem of positing existential dependence between a function and a (potential) continuant (which is not even a problem here, as the reactions are intended to be interpreted as bfo:processes), this gives results that are empirically wrong! So there was a real, serious example of unintended consequences that was missed.
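A rough sketch of why the missing closure axiom matters: without it, a process with extra inputs still matches a definition that only lists some of them. The code below is a deliberately simplistic set comparison, not a DL reasoner; it treats participants as distinct atomic classes, ignores cardinality, and the participant names are hypothetical placeholders rather than anything taken from GO or ChEBI.

```python
def classifies_under(definition_inputs, observed_inputs, closed):
    """Return True if a process with `observed_inputs` matches the definition.

    Open reading:   has_input some X, for every X in definition_inputs.
    Closed reading: the above, plus has_input only (X1 or X2 or ...).
    """
    required = set(definition_inputs)
    observed = set(observed_inputs)
    has_all_required = required <= observed
    if not closed:
        return has_all_required
    # Closure: no inputs beyond the ones named in the definition.
    return has_all_required and observed <= required


# Hypothetical participants, used only to show the shape of the problem.
two_input_reaction = {"substrate_A", "substrate_B"}
three_input_process = {"substrate_A", "substrate_B", "substrate_C"}

print(classifies_under(two_input_reaction, three_input_process, closed=False))
print(classifies_under(two_input_reaction, three_input_process, closed=True))
```

Under the open reading the three-input process is classified under the two-input reaction; once the closure condition is added it no longer is.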

It’s important to bear in mind that some of these cross-product files are separate from the main ontology, not fully official, and not of as high quality as the main ontology. The GO BP chebi ones are maintained by the GO editors and are high quality, but the definitions for GO MF reactions were created by me, semi-automatically, and are not as high quality. This draws attention to the need for better documentation here – if the paper had simply criticized the lack of documentation and clear commitment to BFO they would have been spot on, but instead they use this to make spurious claims.

The authors are on shaky ground with anatomy ontologies again:

Time dependencies

These are commonly expressed in ontologies encoding development or other time-dependent processes. Kinds of participation in such time dependent processes can be difficult to pin down as can the exact ontological dependence between the process and the material entities. The start and end relations are intending to express just such time dependencies to do with the development of anatomical structures.

Pharyngeal endoderm subClassOf

end some

Pharyngula:Prim-15 Roof plate rhombomere 5

subClassOf

start some Segmentation:10-13 somites

However, the stages of development mentioned may not be complete before the material entity comes fully into existence. They also may not be complete when the material entity stops existing. It is difficult to claim a processual entity (which extends over time) is ontologically necessary for a material entity to exist (the claim of existential dependence) unless the material entity was a clear output of this process. The solution here is, again, to substitute existential restriction by value restriction, such as

Pharyngeal endoderm subClassOf

end only Pharyngula:Prim-15

It’s difficult to see a real problem here. So what if the stages are not completed? The authors’ problem is not with the generated OWL but with the strong existential dependence claim: “It is difficult to claim a processual entity (which extends over time) is ontologically necessary for a material entity to exist (the claim of existential dependence) unless the material entity was a clear output of this process”. To which I would reply: so what? The authors are testing a claim that is too strong. The OWL is correct, and the OWL does not make any claims about existential dependence; that claim is in the authors’ minds. It’s difficult to see any practical problems with the OWL representation of the ZFA relations here. If the problem is purely philosophical, this should be published in a philosophical journal.

Furthermore, the supposed correct solution to this non-problem is terrible: using a universal restriction means fixing at a single artificial level of granularity for the stages (i.e. we can’t have a property chain end o part_of -> end, which would lead to incorrect inferences).
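As a small illustration of the granularity point, the sketch below applies the chain end o part_of -> end to all-some axioms until nothing new is inferred. It assumes, purely for the example, that Prim-15 is part_of the Pharyngula period (suggested by the label “Pharyngula:Prim-15”, but not stated in the quoted axioms); the function name is made up.

```python
def apply_end_chain(end_axioms, part_of_axioms):
    """Apply 'end o part_of -> end' to all-some axioms until a fixpoint.

    end_axioms:     set of (cls, stage) read as  cls SubClassOf end some stage
    part_of_axioms: set of (part, whole) read as part SubClassOf part_of some whole
    """
    inferred = set(end_axioms)
    changed = True
    while changed:
        changed = False
        for cls, stage in list(inferred):
            for part, whole in part_of_axioms:
                if part == stage and (cls, whole) not in inferred:
                    inferred.add((cls, whole))
                    changed = True
    return inferred


end_axioms = {("Pharyngeal endoderm", "Prim-15")}
# Assumption for illustration: the Prim-15 stage is part of the Pharyngula period.
part_of_axioms = {("Prim-15", "Pharyngula")}

result = apply_end_chain(end_axioms, part_of_axioms)
print(sorted(result))
```

With existential restrictions the coarser-grained statement (end some Pharyngula) sits happily alongside the finer-grained one. Under a universal restriction such as end only Prim-15, the same chain would force every end-successor, including the coarser stage, to be a Prim-15.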

Again, there are some real problems with some ontologies hidden here:

  • the start and end relations are undefined. There are a number of people working on this, but admittedly it’s lame there is no standard solution in place yet. We should at least have some better documentation for start/end.
  • there are a lot of hidden assumptions in OBO ontologies regarding how applicable each ontology is for the full range of variation found in nature. The FMA has a whole story about “canonical anatomy” here. For many biological ontologies there’s a shared understanding between authors and users that the ontology represents a simplified reference picture, and in fact there may be the occasional zebrafish pharyngeal endoderm that ends a bit after or a bit before the prim-15 stage. We, the OBO Foundry, could be doing more to ensure this is explicit.

If the authors had highlighted this I would have agreed wholeheartedly and apologized for the current state of affairs. However, this doesn’t really have anything to do with “unintended consequences of existential quantifications”. It’s just a plain case of lack of documentation (not that this is excusable, but the point is that the paper is not titled “lack of documentation in biomedical ontologies”)

Finally, the authors also include a discussion of the use of existential restrictions in conjunction with relations such as lacks_part. This part is fairly reasonable but most of it has been said in other publications. There are actually some subtleties here when it comes to phenotype ontologies, but this is best addressed in a separate post. There is a solution partly in place now involving shortcut relations, but this wasn’t mature when the authors wrote the paper, so fair enough.

Overall I wasn’t convinced by these results. The results were not externally validated (this would have been easy to do – for example, by contacting the ontology authors or pointing out the error on a tracker) and relied on the subjective opinions of the authors (even then they largely did not agree). In addition, the relationships were being tested for existential dependence, and it’s no surprise that continuant-stage relationships don’t conform to this, nor is this a problem.

Based on these results, the authors go on to conclude:

Our scrutiny of the OBO Foundry candidate ontologies and cross products yielded a relatively high proportion of inappropriate usages of simple logical constructors. Only focusing on the proper use of existential restriction in class definitions, we found up to 23% of unintended consequences in these constructions. Many Foundry ontologies are widely used throughout the biomedical domain, and therefore such a high error rate seems surprising.

[my emphasis]. To a casual reader the “23%” sounds terrible. But remember:

  • the authors made mistakes in their evaluation – e.g. with GO
  • the authors over-interpreted in many cases, leading to inflation of numbers
  • in the case of roles, the problem is really in adhering to the BFO definition
  • the unintended consequences are largely philosophical consequences regarding existential dependence rather than consequences that would manifest computationally in reasoning.

The last sentence is telling:

Many Foundry ontologies are widely used throughout the biomedical domain, and therefore such a high error rate seems surprising

Indeed, it would be surprising if this were the case. The ZFA is used every day to power gene expression queries on the ZFIN website. Why haven’t any of their users cottoned on to these errors? It’s a mystery. Or perhaps not. Perhaps the authors are seeing errors when there are in fact none.

Most of the existential restrictions are in fact, contrary to what the authors claim, intended to be existential restrictions. In some cases, such as the “smoking may cause cancer” type examples, the problems only exist on a philosophical level, and even then if you make certain philosophical assumptions. Saying smoking “SubClassOf may_cause some cancer” would be an example of an unintended consequence according to the authors, because it implies that every instance of smoking is existentially dependent on some instance of cancer, which is philosophically problematic (to some). Nevertheless, it’s well known many people working with OWL ontologies use this idiom because there’s no modal operator in OWL, it’s practical and gives the desired inferences. See What causes pneumonia for more discussion.

The authors go on:

We hypothesize that the main and only reason why this has little affected the usefulness of these ontologies up to now is due to their predominant use as controlled vocabularies rather than as computable ontologies. Misinterpretations of this sort can cause unforeseeable side effects once these ontologies are used for machine reasoning, and the use of logic-based reasoning based on biomedical ontologies is increasing with the advent of intelligent tools surrounding the adoption of the OWL language.

In fact it would be easy to test this hypothesis. If it were true, then it should be possible to add biologically justifiable disjointness axioms to the ontology and then use a reasoner to find the unsatisfiable classes that arise from the purported incorrect use of existential restrictions. It is a shame the authors did not take this empirical approach and instead opted for a more subjective ranking approach.
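The shape of such a test can be sketched in a few lines. The code below is a toy, incomplete check and not a substitute for a real OWL reasoner such as HermiT: it only looks at named superclasses, disjointness between named classes, existential restrictions and a range constraint. The class names, the range on has_input and the “suspect” axiom are all hypothetical.

```python
def ancestors(cls, subclass_of):
    """Return cls together with all of its (transitive) named superclasses."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def has_clash(types, subclass_of, disjoint_pairs):
    """True if anything of all the given types would fall under two disjoint classes."""
    closure = set()
    for t in types:
        closure |= ancestors(t, subclass_of)
    return any(a in closure and b in closure for a, b in disjoint_pairs)


def unsatisfiable_classes(classes, subclass_of, restrictions, ranges, disjoint_pairs):
    """Toy fixpoint: a class is unsatisfiable if it clashes directly, inherits from an
    unsatisfiable class, or needs (via 'R some F') a filler that cannot exist."""
    unsat = {c for c in classes if has_clash({c}, subclass_of, disjoint_pairs)}
    changed = True
    while changed:
        changed = False
        for cls in classes:
            if cls in unsat:
                continue
            ancs = ancestors(cls, subclass_of)
            bad = any(a in unsat for a in ancs)
            for sub, rel, filler in restrictions:
                if sub in ancs:
                    # The filler must also satisfy the range of the relation.
                    filler_types = {filler} | ({ranges[rel]} if rel in ranges else set())
                    if filler in unsat or has_clash(filler_types, subclass_of, disjoint_pairs):
                        bad = True
            if bad:
                unsat.add(cls)
                changed = True
    return unsat


# Hypothetical mini-ontology with one suspect existential restriction.
classes = ["process", "material_entity", "some_process",
           "suspect_activity", "child_activity"]
subclass_of = {"some_process": {"process"}, "child_activity": {"suspect_activity"}}
restrictions = [("suspect_activity", "has_input", "some_process")]
ranges = {"has_input": "material_entity"}          # added range constraint
disjoint_pairs = [("process", "material_entity")]  # added disjointness axiom

print(sorted(unsatisfiable_classes(classes, subclass_of, restrictions,
                                   ranges, disjoint_pairs)))
```

Here the suspect restriction demands an input that is both a process and a material entity, so the class carrying it, and its subclass, come out as unsatisfiable. An existential restriction that was genuinely wrong in this sense would show up the same way in a real reasoner run.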

In fact the transition from weakly axiomatized ontologies to strongly axiomatized ones is happening, and this is uncovering a lot of problems through reasoning. But the problems being uncovered are generally not due to unintended consequences of existential quantifications. The authors widely miss the mark on their evaluation of the problem.

But the authors do end with an excellent point:

Another problem that hindered our experiments is the unclear ontological commitment of many classes and relations in OBO ontologies, which makes it nearly impossible to reach consensus about the truth-value of many of their axioms. This involves not only ambiguities in ontological interpretation of the classes included in the ontologies but also the proliferation of relations which were poorly defined. To address this shortcoming, ontologies can rely on more expressive languages and axiom systems in which the intended semantics of the relations used are constrained, as is done for the OBO relation ontology

The only objection I have there is to point out that most OBO ontologies don’t use a proliferation of relations – the authors are referring to some of the cross-product extensions here. But point taken – some relations need better definitions,  the cross product files are of variable quality and known issues should be documented.

If this were the thesis of the paper I would have less of a problem. However, the paper makes a stronger claim, namely that 23% of the existential restrictions are wrong and should be changed to some other logical constructor (with the implication that this is due to ambiguities in obo-format). Two (hopefully unintended) consequences of this paper are muddying the waters on the semantics of obo-format and spreading misinformation about the quality of relational statements in the OBO library.

This needs to be countered:

  • The official semantics of OBO-Format are such that every relationship tag is by default interpreted as a subclass of an existential restriction. This can be overridden in some circumstances, but in practice is rarely done, see http://oboformat.org/
  • If something is not clear, ask on the obo-format list rather than basing a paper on a misunderstanding
  • Most obo-format ontology authors have sufficient understanding of the all-some semantics such that you should trust the OWL that comes out the other end.
  • If you don’t trust it, then report the problem to the ontology via their tracker.
  • If you think you’ve uncovered systematic errors in either the underlying ontology or in the translation to OWL, verify there really is an error using the appropriate mechanism (e.g. trackers) before jumping to conclusions and writing a paper falsely claiming a 23% error rate. There are in fact many problems with many ontologies, but unintended consequences of existential quantification are not among them, except in a small number of cases (e.g. CHEBI), which have yet to be shown to cause any harm, but nevertheless need better documentation.
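For readers who want to see the default translation spelled out, here is a minimal sketch that turns each relationship tag of a single [Term] stanza into the corresponding all-some axiom. It is not a real obo-format parser: it ignores the override mechanisms mentioned above, trailing modifiers and every other tag, and the function name is made up for this example.

```python
def relationship_axioms(stanza):
    """Translate each relationship tag of one [Term] stanza into an all-some axiom."""
    term_id = None
    pairs = []
    for raw in stanza.splitlines():
        tag, sep, value = raw.partition(":")
        if not sep:
            continue  # e.g. the "[Term]" header line
        tag = tag.strip()
        value = value.split("!")[0].strip()  # drop the trailing "! label" comment
        if tag == "id":
            term_id = value
        elif tag == "relationship":
            relation, target = value.split()[:2]
            pairs.append((relation, target))
    return [f"{term_id} SubClassOf {relation} some {target}"
            for relation, target in pairs]


ankle = """[Term]
id: MA:0000043
name: ankle
relationship: part_of MA:0000026 ! hindlimb
"""

for axiom in relationship_axioms(ankle):
    print(axiom)
```

The single relationship line yields a single SubClassOf axiom with an existential (some) restriction, which is interpretation (i) and nothing else.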

 


The size of Richard Nixon’s nose, part II

In part 1 we saw how to encode a “big nose phenotype” in such a way that it was neutral with respect to the path the class expression takes through the object graph, subsuming all of:

  • any entity with a nose that has the characteristic of being big
  • anything that exhibits a bigness that is a characteristic of a nose

Thus masking over the distinctions inherent in a formal ontological representation.

We can take this one step further and make our big nose phenotype encompass the nose itself, and its own bigness characteristic. The simplest way to do this would be to make the relation exhibits reflexive – either with a direct reflexivity characteristic, or a local reflexivity general axiom:

Thing SubClassOf exhibits some Self

Unfortunately this runs afoul of DL expressivity constraints. Fortunately, there is a trick at hand. A really gnarly one, but it works.

First of all we have to declare a “fake” relation – let’s append SELF onto the end:

ObjectProperty: :exhibitsSELF
SubPropertyOf: :exhibits

Now we make this reflexive:

Class: owl:Thing
SubClassOf:
:exhibitsSELF some Self

This is legal, as exhibitsSELF is a “simple” object property. Finally, we add the following:

ObjectProperty: :exhibitsSELF
SubPropertyOf: :exhibits

We have sneaked our reflexivity constraint in via a fake relation. It’s a shame that all this obfuscating machinery is required to do this, it would be nice if there were some OWL syntactic sugar.

We can do the same thing for has_part, which is traditionally reflexive:

ObjectProperty: :has_partSELF
SubPropertyOf: :has_part
ObjectProperty: :has_part
SubPropertyOf: :exhibits
Characteristics: Transitive

With that in place we can revisit our test probe classes from last time:

Class: :test1
EquivalentTo: :exhibits some (:big and :characteristic_of some :nose)

Class: :test2
EquivalentTo: :exhibits some (:has_part some (:nose and :has_characteristic some :big))

Class: :test3
EquivalentTo: :exhibits some (:nose and :has_characteristic some :big)

Class: :test4
EquivalentTo: :has_part some (:nose and :has_characteristic some :big)

Now the inferred hierarchy looks like this:

test1=test2=test3
--test4

And if we examine our 3 individuals, we see they classify as follows:

  • nixon : test1, test2, test3, test4
  • nixons_nose : test1, test2, test3, test4
  • nixons_nose_size: test1, test2, test3

So using the exhibits relation we can encode a very general notion of phenotype, that of exhibiting some characteristic, which classifies either the organism, the affected part, or the characteristic itself.

The machinery is rather arcane though, and does require stepping outside the EL subset of OWL. In general, it is of course better to decide on a single form. Unfortunately, no one form satisfies all purposes.

An organism-centric representation is intuitive and simple. If the instances you’re classifying are organisms (e.g. humans with disorders, mutant fruitflies, rare butterfly specimens) then this works very well. It also makes it easy to represent “composite” phenotypes such as “organism with big nose and sweaty palms”. However, if we take this to the step of equating the phenotype with this representation, then we have the curious situation where the organism is the phenotype rather than the organism has a phenotype or phenotypes. If we conceive of phenotype as entirely a class level thing, then we have one organism instantiating multiple phenotypes,  but we should be clear that in this model the relationship between the phenotype instances and organism instance is identity.

An organism-part centric view is also intuitive and simple. For example “nose and has_characteristic some big”. But note the entailments we get from this – an abnormally big nose is part of an abnormal head, but it’s not a subclass of an abnormal head. This is in contrast to the relation we expect to have between the corresponding phenotypes, which is a subclass relationship (on the evidence of all pre-coordinated phenotype ontologies). So this representation is absolutely fine, but we should be clear that we are representing anatomy (perhaps variant anatomy in particular) rather than phenotypes – the relationship between the two may be trivial, and glossed over using the exhibits pattern above. But for modeling phenotypes ontologically we have to be clear about the distinction.

A characteristic-centric view is perhaps the most unintuitive – it asks us to believe in characteristics/qualities as individuals in the world, which is perfectly fine in the BFO ontology, but people may still have a hard time conceiving of this, in contrast to the more “physical” class expressions above. However, it offers distinct advantages. It allows us to talk directly about the characteristic itself – e.g. the dysplastic characteristic of John’s heart was due to the presence of a particular sequence in his Shh genes. If we try and switch this around we get into trouble; e.g. if we equate the “dysplastic heart” phenotype with a class expression “heart and has characteristic dysplastic”, and then say that this phenotype arises from a Shh mutation, we lose the fact that the “dysplasticity” is the characteristic we care about, rather than any of the other characteristics of John’s heart.

One other advantage of the characteristic-centric view is that it corresponds to a more traditional view of phenotypes as the characteristics of an organism.

We adopted the quality/character-centric view for defining the phenotypes in the MP ontology (see our Genome Biology paper) – this worked fairly well when we tested it by recapitulating asserted subclasses via reasoning. However, it worked less well when we used it for HP, which includes many composite phenotypes – e.g. “large flat nose” – these cannot be equated to any single characteristic, it is in fact two characteristics. We can get around this by equating a phenotype with either an individual characteristic, or a collection of (presumably related) characteristics. More of this on the next post on this matter….

 

Ontologies and Continuous Integration

Wikipedia describes continuous integration (http://en.wikipedia.org/wiki/Continuous_integration) as follows:

In software engineering, continuous integration (CI) implements continuous processes of applying quality control — small pieces of effort, applied frequently. Continuous integration aims to improve the quality of software, and to reduce the time taken to deliver it, by replacing the traditional practice of applying quality control after completing all development.

This description could – or should – apply equally well to ontology engineering, especially in contexts such as the OBO Foundry, where ontologies are becoming increasingly interdependent.

Jenkins is a web-based environment for running integration checks. Sebastian Bauer, in Peter Robinson’s group, had the idea of adapting Jenkins for performing ontology builds rather than software builds (in fact he used Hudson, but the differences between Hudson and Jenkins are minimal). He used Oort as the tool to build the ontology. Oort takes in one or more ontologies in obo or owl, runs some non-logical and logical checks (via your choice of reasoner) and then “compiles” downstream ontologies in obo and owl formats. Converting to obo takes care of a number of stylistic checks that are non-logical and wouldn’t be caught by a reasoner (e.g. no class can have more than one text definition).
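To give a flavour of what a non-logical check looks like, here is a minimal sketch of the “no more than one text definition per class” rule applied to obo-format text. This is not Oort’s implementation, only an illustration of the kind of check involved; the function name and the EX: identifiers are hypothetical.

```python
def terms_with_multiple_definitions(obo_text):
    """Return the ids of [Term] stanzas that carry more than one def: tag."""
    def_counts = {}
    current_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are checked here.
            in_term = (line == "[Term]")
            current_id = None
        elif in_term and line.startswith("id:"):
            current_id = line[len("id:"):].strip()
            def_counts[current_id] = 0
        elif current_id is not None and line.startswith("def:"):
            def_counts[current_id] += 1
    return sorted(t for t, n in def_counts.items() if n > 1)


# Hypothetical ontology fragment used only to exercise the check.
sample = '''
[Term]
id: EX:0000001
name: first example class
def: "One definition." []

[Term]
id: EX:0000002
name: second example class
def: "A definition." []
def: "A second, conflicting definition." []
'''

print(terms_with_multiple_definitions(sample))
```

A check like this needs no reasoner at all, which is why it is useful to run it alongside the logical checks on every commit.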

We took this idea and built our own Jenkins ontology build environment, adding ontologies that were of relevance to the projects we were working on. This turned out to be extraordinarily easy – Jenkins is very easy to install and configure, and help is always just a mouse click away.

Here’s a screenshot of the main Jenkins dashboard. Ontologies have a blue ball if the last build was successful, a red ball otherwise. The weather icon is based on the “outlook” – lots of successful builds in a row gets a sunny icon. Every time an ontology is committed to a repository it triggers a build (we try and track the edit version of the ontology rather than the release version, so that we can provide direct feedback to the ontology developer). Each job can be customized – for example, if ontology A depends on ontology B, you might want to trigger a build of A whenever a new version of B is committed, allowing you to be forewarned if something in B breaks A.

main jenkins view
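The dependency-triggered builds described above boil down to a walk over the dependency graph. Jenkins handles this through its own job configuration; the sketch below only shows the underlying logic, with a made-up function name and hypothetical ontology names A to D.

```python
def builds_to_trigger(changed, depends_on):
    """Return every job that directly or indirectly depends on `changed`.

    depends_on maps a job name to the set of ontologies it imports or uses.
    """
    triggered = []
    seen = {changed}
    queue = [changed]
    while queue:
        current = queue.pop(0)
        for job, deps in sorted(depends_on.items()):
            if current in deps and job not in seen:
                seen.add(job)
                triggered.append(job)
                queue.append(job)
    return triggered


# Hypothetical setup: A depends on B, C depends on A, D is independent.
depends_on = {"A": {"B"}, "C": {"A"}, "D": set()}

print(builds_to_trigger("B", depends_on))
```

A commit to B triggers a build of A, and in turn of C, so a breaking change in B is reported before anyone downstream trips over it.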

Below is a screenshot of the configuration settings for the go-taxon build – this is used to check if there are violations of the GO taxon constraints (dx.doi.org/10.1016/j.jbi.2010.02.002). We also include an external ontology of disjointness axioms (for various reasons it’s hard to include this in the main GO ontology). You can include any shell commands you like – in principle it would be possible to write a Jenkins plugin for building ontologies using Oort, but for now you have to be semi-familiar with the command line and the Oort command line options:

config

Often when a job fails the Oort output can be a little cryptic – generally the protocol is to do detailed investigation using Protege and a reasoner like HermiT to track down the exact problem.

The basic idea is very simple, but works extremely well in practice. Whilst it’s generally better to have all checks performed directly in the editing environment, this isn’t always possible where multiple interdependent ontologies are concerned. The Jenkins environment we’ve built has proven popular with ontology developers, and we’d be happy to add more ontologies to it. It’s also fairly easy to set up yourself, and I’d recommend doing this for groups developing or using ontologies in a mission-critical way.

UPDATE: 2012-08-07

I uploaded some slides on ontologies and continuous integration to slideshare.

UPDATE: 2012-11-09

The article Continuous Integration of Open Biological Ontology Libraries is available on the Bio-Ontologies SIG KBlog site.

The size of Richard Nixon’s nose, part I

Consider a simple model of Richard Nixon:

Individual: :nixon
Types: :Organism
Facts: :has_part :nixons_nose

Individual: :nixons_nose
Types: :nose
Facts: :has_characteristic :nixons_nose_size

Individual: :nixons_nose_size
Types: :big

nixon haspart nose hasquality size

Here are the relations in our background ontology:

ObjectProperty: :has_part
Characteristics: Transitive

ObjectProperty: :has_characteristic
InverseOf:
:characteristic_of

ObjectProperty: :characteristic_of
InverseOf:
:has_characteristic

We have 3 entities: Nixon, his nose, and the characteristic or quality that is Richard Nixon’s nose size. We follow BFO here in individuating qualities: thus even if I had a nose of the “same” size as Richard Nixon, we would not share the same nose-size quality instance, we would each have our own unique nose-size quality instance (for a nice treatment, see Neuhaus et al [PDF]).

Now let’s look at a phenotypic label such as “big nose”. Intuitively we can see that this applies to Richard Nixon. But where exactly in this instance graph is the big nose phenotype? Is it the nose, the size, or Richard himself?

Specifically, if we have a phenotype ontology with a term “increased size of nose” or “big nose”, what OWL class expression do we assign as an equivalent? We have to make a decision as to where to root the path through our instance graph. It might be:

  • The nose: i.e. nose and has_characteristic some big
  • The size: i.e. big and characteristic_of some nose
  • The organism: i.e. has_part some (nose and has_characteristic some big)
  • some unspecified thing that has a relationship to one of the above
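
To make the choice of root concrete, here is a minimal sketch that evaluates the first three expressions over the Nixon instance graph. It is a closed-world lookup over asserted edges, not OWL reasoning; the helper names `instances_of` and `some` are assumptions for illustration, and the `characteristic_of` edge is written out by hand because the inverse is declared in the background ontology.

```python
# Instance graph from the Nixon example, as (subject, property, object) triples.
TRIPLES = {
    ("nixon", "has_part", "nixons_nose"),
    ("nixons_nose", "has_characteristic", "nixons_nose_size"),
    # Written out by hand: follows from the declared inverse of has_characteristic.
    ("nixons_nose_size", "characteristic_of", "nixons_nose"),
}

TYPES = {
    "nixon": {"Organism"},
    "nixons_nose": {"nose"},
    "nixons_nose_size": {"big"},
}


def instances_of(cls):
    """Individuals asserted to have type `cls`."""
    return {ind for ind, classes in TYPES.items() if cls in classes}


def some(prop, fillers):
    """Individuals with at least one `prop` edge into `fillers`."""
    return {s for (s, p, o) in TRIPLES if p == prop and o in fillers}


# nose and has_characteristic some big
nose_rooted = instances_of("nose") & some("has_characteristic", instances_of("big"))
# big and characteristic_of some nose
size_rooted = instances_of("big") & some("characteristic_of", instances_of("nose"))
# has_part some (nose and has_characteristic some big)
organism_rooted = some("has_part", nose_rooted)

print(nose_rooted, size_rooted, organism_rooted)
```

Each expression picks out a different individual in the same small graph, which is why the rooting decision changes what gets classified.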

The structure of the OWL class expression can be visualized as a path through the nixon graph:

Our decision affects the classification we get from reasoning. A big nose is part of a funny face, but in contrast a person with a big nose is a subclass of a person with a funny face. If you then put your reasoner results into a phenotype analysis you might get different results.

To an ordinary common sense person whose brain hasn’t been infected by ontologies, the difference between “a nose that is increased in size”, an “increased size of nose” and a “person with a nose that’s increased in size” is just linguistic fluff, but the distinctions are important from an ontology modeling perspective.

Nevertheless, we may want to formalize the fact that we don’t care about these distinctions – we might want our “big nose” phenotype class to be any of the above.

One way would be to make fugly union classes, but this is tedious.

There is another way. We can introduce a generic “exhibits” relation. We elide a textual definition for now; the idea is that this relation captures the general notion of having a phenotype:

ObjectProperty: :exhibits
SubPropertyChain: :exhibits o :has_part
SubPropertyChain: :exhibits o :has_characteristic
SubPropertyChain: :exhibits o :characteristic_of
Characteristics: Transitive

We make this a super-relation of has_part:

ObjectProperty: :has_part
SubPropertyOf: :exhibits
Characteristics: Transitive

We can see exhibits is very promiscuous – when it connects to other relations, it makes a new exhibits relation.
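
A rough way to see this promiscuity is to saturate the Nixon instance graph with the axioms above. The sketch below is a hand-rolled fixpoint over asserted edges, not a DL reasoner; `exhibits_closure` is a hypothetical helper, and the `characteristic_of` edge is added by hand on the strength of the declared inverse.

```python
# Asserted edges from the Nixon example, plus the inverse edge written out.
ASSERTED = {
    ("nixon", "has_part", "nixons_nose"),
    ("nixons_nose", "has_characteristic", "nixons_nose_size"),
    ("nixons_nose_size", "characteristic_of", "nixons_nose"),
}

# Properties p for which "exhibits o p" is declared a sub-chain of exhibits.
CHAINED = ("has_part", "has_characteristic", "characteristic_of")


def exhibits_closure(triples):
    """Return all (x, y) pairs such that x exhibits y under the axioms above."""
    # has_part SubPropertyOf exhibits
    exhibits = {(s, o) for (s, p, o) in triples if p == "has_part"}
    while True:
        new = set()
        for (x, y) in exhibits:
            # exhibits o p -> exhibits, for each chained property p
            for (s, p, o) in triples:
                if s == y and p in CHAINED:
                    new.add((x, o))
            # exhibits is transitive
            for (y2, z) in exhibits:
                if y2 == y:
                    new.add((x, z))
        if new <= exhibits:
            return exhibits
        exhibits |= new


for pair in sorted(exhibits_closure(ASSERTED)):
    print(pair)
```

Starting from the single has_part edge, nixon ends up exhibiting both his nose and the size quality of that nose.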

Now let’s make some probe classes illustrating the different ways we could define our “don’t care where we root the graph” phenotype:

Class: :test1
EquivalentTo: :exhibits some (:big and :characteristic_of some :nose)

Class: :test2
EquivalentTo: :exhibits some (:has_part some (:nose and :has_characteristic some :big))

Class: :test3
EquivalentTo: :exhibits some (:nose and :has_characteristic some :big)

Class: :test4
EquivalentTo: :has_part some (:nose and :has_characteristic some :big)

After running the reasoner we get the following inferred hierarchy:

-- test1=test3
---- test2
---- test4

So we can see we are collapsing the distinction between “increased size of nose” and “nose that is increased in size” by instead defining a class “exhibiting an increased size of nose”.

If we then try the DL-query tab in Protege, we can see that the individual “nixon” satisfies all of these expressions.

Why is this important? It means we can join and analyze datasets without performing awkward translations. Group 1 can take a quality-centric approach, Group 2 can take an entity-centric approach, the descriptions or data from either of these groups will classify under the common “exhibits phenotype” class.

This works because of the declared inverse between has_characteristic and characteristic_of. Graphically we can think of this as “doubling back”:

Unfortunately, inverses put us outside EL++, so we can’t use the awesome Elk for classification.

Not-caring in ontologies is hard work!

What if we want to care even less, and formally have a “big nose phenotype” class classify either nixon, his nose, or the bigness that inheres in his nose? That’s the subject of the next post, together with some answers to the bigger question of “what is a phenotype”.

Migration of Gene Ontology bridging ontologies to OWL

In the GO, we’ve historically maintained a number of experimental extensions to the ontology as obo-format bridging files in the infamous “scratch” folder. Sometimes sets of these migrate into the main ontology (the regulation logical definitions, as well as the occurs in ones started this way). This has always been harder for axioms that point to external ontologies like chebi (although with this change it should be simpler).

These “bridging ontologies” fall into two main categories:

  • logical definitions [1]
  • taxon constraints [2]

As part of a general move to make more direct use of OWL in GO, we have moved the primary location of some of the bridging ontologies to the new SVN repository, with the primary format being OWL. See the Ontology extensions page on the GO wiki.

This has a number of advantages:

  • the bridging axioms can be edited directly in Protege, in the context of the GO and the external ontology, using an importer ontology
  • we can make use of more expressive class expressions

We will provide translations back to obo but these may be lossy.

Eventually these will move into the main ontology, but there are issues regarding import chains, MIREOTing and public releases that need resolved. For now you have to explicitly import these ontologies plus any dependent ontologies. Note that this is made easier by a combination of svn:externals and catalog xml files – see tips.

Check out the repository like this:

cd
svn co svn+ssh://ext.geneontology.org/share/go/svn/trunk go-trunk
cd go-trunk/ontology/extensions

This should result in a directory structure like this:

ontology/
  external/
    cell-ontology/
    ncbitaxon-slim/
    ..
  editors/
    gene_ontology_write.obo
  extensions/
    x-FOO.owl
    x-FOO-importer.owl
    ...
    catalog-v001.xml

The importer ontologies are designed to be loaded into Protege or programmatically via OWLTools (use the new “--use-catalog” option with OWLTools).

Here is an example importer file. It contains no axioms of its own; it exists as a parent to other ontologies:

    <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/go/extensions/x-metazoan-anatomy-importer.owl">
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/go.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/uberon/composite-metazoan.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/go/extensions/x-cell.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/go/extensions/x-metazoan-anatomy.owl"/>
    </owl:Ontology>

Note that most of these will be fetched from the svn externals directory (if you load from svn, rather than the web).
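
As a rough illustration of how an importer and a catalog file fit together, the sketch below lists the owl:imports of a cut-down importer and looks each one up in an XML catalog. The catalog content and the local path `x-cell.owl` are assumptions made up for the example, not the contents of the real catalog-v001.xml, and `resolve` is a hypothetical helper rather than anything from Protege or OWLTools.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CAT = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Cut-down importer with two of the imports shown above.
IMPORTER = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/go/extensions/x-metazoan-anatomy-importer.owl">
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/go.owl"/>
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/go/extensions/x-cell.owl"/>
  </owl:Ontology>
</rdf:RDF>"""

# Hypothetical catalog: maps one import IRI to an assumed local file.
CATALOG = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="http://purl.obolibrary.org/obo/go/extensions/x-cell.owl" uri="x-cell.owl"/>
</catalog>"""


def imported_iris(rdfxml):
    """Return the IRIs named in owl:imports, in document order."""
    root = ET.fromstring(rdfxml)
    return [el.get(f"{{{RDF}}}resource") for el in root.iter(f"{{{OWL}}}imports")]


def catalog_map(catalog_xml):
    """Return {import IRI: local location} from the catalog's <uri> entries."""
    root = ET.fromstring(catalog_xml)
    return {el.get("name"): el.get("uri") for el in root.iter(f"{{{CAT}}}uri")}


def resolve(iri, mapping):
    """Use the local copy if the catalog lists one, else fall back to the web IRI."""
    return mapping.get(iri, iri)


mapping = catalog_map(CATALOG)
for iri in imported_iris(IMPORTER):
    print(iri, "->", resolve(iri, mapping))
```

Imports with a catalog entry resolve to a local file; anything else is left as its PURL and would be fetched from the web.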

After loading, you should see something like:
protege import chain screenshot
See the wiki for more details.

The bridging axioms have been under-used in the GO. Expect more use of them, as part of ontology development and downstream applications, over the coming year!

[1] Mungall, C. J., Bada, M., Berardini, T. Z., Deegan, J., Ireland, A., Harris, M. A., Hill, D. P., and Lomax, J. (2011). Cross-product extensions of the Gene Ontology. Journal of Biomedical Informatics 44, 80-86. dx.doi.org/10.1016/j.jbi.2010.02.002

[2] Deegan Née Clark, J. I., Dimmer, E. C., and Mungall, C. J. (2010). Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development. BMC Bioinformatics 11, 530. http://www.biomedcentral.com/1471-2105/11/530

Phenotype ontologies on googlecode

For the PATO project we’ve set up a repository on googlecode to collect phenotype ontologies and various bridging axioms:

http://code.google.com/p/phenotype-ontologies

This aggregates together the main phenotype ontologies, together with logical definitions bridging ontologies, as defined in

Mungall, C. J., Gkoutos, G. V., Smith, C. L., Haendel, M. A., Lewis, S. E., and Ashburner, M. (2010). Integrating phenotype ontologies across multiple species. Genome Biology 11, R2. Available at: http://dx.doi.org/10.1186/gb-2010-11-1-r2

You can access the aggregated ontology via this PURL:

http://purl.obolibrary.org/obo/upheno/uberpheno-subq-importer.owl

It may be slow to open this via the web. If you have the phenotype-ontologies repository checked out, you can open the file from the filesystem – external ontologies will be obtained via svn:externals.

I recommend using Elk as the reasoner; others will be too slow with the combination of HP, MP, FMA, MA, PATO, etc. Unfortunately Elk doesn’t yet allow DL queries or explanations of inferences.

The above ontology uses a slightly modified version of the definitions described in the Genome Biology paper – instead of modeling each phenotype as a single quality (e.g. redness of nose), we now model them as aggregates of phenotypes. This tends to work better for HPO, which has many composite phenotypes.

Note also that we’re using a hacked version of the uberon bridging axioms to ZFA, MA and FMA – we treat these as precise equivalents rather than taxonomic equivalents. This is necessary as we mix uberon in with the species ontologies in the logical definitions.