Debugging Ontologies using OWL Reasoning. Part 2: Unintentional Entailed Equivalence

This is part 2 in a series on pragmatic techniques for debugging ontologies. It follows part 1, which covered the basics of reasoner-based debugging with disjointness axioms, using Protege and ROBOT. The goal there was to detect and debug incoherent ontologies.

One potential problem that can arise is the inference of equivalence between two classes, where the equivalence is unintentional. The following example ontology from the previous post illustrates this:

ObjectProperty: part_of
Class: PNS
Class: Nerve SubClassOf: part_of some PNS
Class: PeripheralNerve EquivalentTo: Nerve and part_of some PNS

In this case PeripheralNerve and Nerve are entailed to be mutually equivalent. You can see this in Protege, as the two classes are grouped together with an equivalence symbol linking them:

[Screenshot: Protege showing PeripheralNerve and Nerve grouped together as equivalent classes, with the explanation]

As the explanation shows, the two classes are equivalent because (1) PNs are defined as any nerve in the PNS, and (2) nerve is asserted to be in the PNS.

We assume here that this is not the intent of the ontology developer; we assume they created distinct classes with distinct names as they believe them to be distinct. (Note that some ontologies such as SWEET employ equivalence axioms to denote two distinct terms that mean the same thing, but for this article we assume OBO-style ontology development).

When the ontology developer sees inferences like this, they will likely want to take some corrective action:

  • Under one scenario, the inference reveals to the ontology developer that in fact nerve and peripheral nerve are the same concept, and thus the two classes should be merged, with the label from one being retained as the synonym of the other.
  • Under the other scenario, the ontology developer realizes the concept they have been calling ‘Nerve’ encompasses more general neuron projection bundles found in the CNS; here they may decide to rename the concept (e.g. neuron projection bundle) and to eliminate or broaden the part_of axiom.

So far so good. But the challenge here is that an ontology with entailed equivalencies between pairs of classes is formally coherent: all classes are satisfiable, and there are no inconsistencies. It will not be caught by a pipeline that detects incoherencies such as unsatisfiable classes. This means you may end up accidentally releasing an ontology that has potentially serious biological problems. It also means we can’t use the same technique described in part 1 to make a debug module.

Formally we can state this as there being no unique class assumption in OWL. By creating two classes, c1 and c2, you are not saying that there is something that differentiates these, even if it is your intention that they are different.

Within the OBO ecosystem we generally strive to avoid equivalent named classes (principle of orthogonality). There are known cases where equivalent classes join two ontologies (for example, GO cell and CL cell), but in general, when we find additional entailed pairs of equivalent classes that were not originally asserted, it’s a problem. I would hypothesize this is frequently true of non-OBO ontologies too.

Detecting unintended equivalencies with ROBOT

For the reasons stated above, ROBOT has configurable behavior for when it encounters equivalent classes. This can be controlled via the --equivalent-classes-allowed (shorthand: “-e”) option on the reason command. There are 3 options here:

  • none: any entailed equivalence axiom between two named classes will result in an error being thrown
  • all: permit all equivalence axioms, entailed or asserted
  • asserted-only: permit entailed equivalence axioms only if they match an asserted equivalence axiom, otherwise throw an error

If you are unsure of what to do it’s always a good idea to start stringent and pass ‘none’. If it turns out you need to maintain asserted equivalencies (for example, the GO/CL ‘cell’ case), then you can switch to ‘asserted-only’.

The ‘all’ option is generally too permissive for most OBO ontologies. However, it may be the right choice for some use cases, for example if your ontology imports multiple non-orthogonal ontologies plus bridging axioms and you are using reasoning to find new equivalence mappings.
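The three settings can be read as a small decision rule over the entailed equivalence pairs. The following is a minimal sketch of that rule as described above, not ROBOT's actual implementation; the function name forbidden_equivalences and the class names in the usage note are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Pair = Tuple[str, str]


def forbidden_equivalences(
    mode: str,
    entailed: Iterable[Pair],
    asserted: Iterable[Pair] = (),
) -> Set[FrozenSet[str]]:
    """Return the entailed equivalence pairs that the given mode rejects.

    Pairs are unordered, so ("A", "B") and ("B", "A") count as the same pair.
    """
    entailed_pairs = {frozenset(pair) for pair in entailed}
    asserted_pairs = {frozenset(pair) for pair in asserted}
    if mode == "all":
        # Everything is permitted, entailed or asserted.
        return set()
    if mode == "none":
        # Any equivalence between two named classes is an error.
        return entailed_pairs
    if mode == "asserted-only":
        # Only entailed pairs that match an asserted axiom are permitted.
        return entailed_pairs - asserted_pairs
    raise ValueError(f"unknown mode: {mode!r}")
```

For instance, with entailed pairs ("Nerve", "PeripheralNerve") and ("GO:cell", "CL:cell"), and only the latter asserted, ‘asserted-only’ rejects just the nerve pair, while ‘none’ rejects both.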

For example, on our peripheral nerve ontology, if we run

robot reason -e asserted-only -r elk -i pn.omn

We will get:

ERROR org.obolibrary.robot.ReasonOperation - Only equivalent classes that have been asserted are allowed. Inferred equivalencies are forbidden.
ERROR org.obolibrary.robot.ReasonOperation - Equivalence: <; == <;

ROBOT will also exit with a non-zero exit code, ensuring that your release pipeline fails fast, preventing accidental release of broken ontologies.
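If your pipeline is driven from a script rather than a Makefile, the exit code can be checked directly. This is a minimal sketch that assumes the robot executable is on your PATH and uses only the options shown above; the helper names reason_command and passes_equivalence_check are hypothetical.

```python
import subprocess
from typing import List


def reason_command(
    ontology_path: str,
    mode: str = "asserted-only",
    reasoner: str = "elk",
) -> List[str]:
    """Build the same 'robot reason' invocation shown in the text."""
    return ["robot", "reason", "-e", mode, "-r", reasoner, "-i", ontology_path]


def passes_equivalence_check(ontology_path: str, mode: str = "asserted-only") -> bool:
    """Run ROBOT and report success; a non-zero exit code means failure."""
    result = subprocess.run(reason_command(ontology_path, mode))
    return result.returncode == 0
```

A release script could call passes_equivalence_check("pn.omn") and abort when it returns False.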

Debugging false equivalence

This satisfies the requirement that potentially false equivalence can be detected, but how does the ontology developer debug this?

A typical Standard Operating Procedure might be:

  • IF robot fails with unsatisfiable classes
    • Open ontology in Protege and switch on Elk
    • Go to Inferred Classification
    • Navigate to Nothing
    • For each class under Nothing
      • Select the “?” to get explanations
  • IF robot fails with equivalence class pairs
    • Open ontology in Protege and switch on Elk
    • For each class reported by ROBOT
      • Navigate to class
      • Observe the inferred equivalence axiom (in yellow) and select ?

There are two problems with this SOP, one pragmatic and the other a matter of taste.

The pragmatic issue is that there is a Protege explanation workbench bug that sometimes renders Protege unable to show explanations for equivalence axioms in reasoners such as Elk (see this ticket). This is fairly serious for large ontologies (although for our simple example or for midsize ontologies use of HermiT may be perfectly feasible).

But even in the case where this bug is fixed or circumvented, the SOP above is suboptimal in my opinion. One reason is that it is simply more complicated: in contrast to the SOP for dealing with incoherent classes, it’s necessary to look at reports coming from outside Protege and perform additional search and lookup. The more fundamental reason is that the ontology is formally coherent even though it violates my expectation that it follows the unique class assumption. It is more elegant if we can directly encode my unique class assumption, and have the ontology be entailed to be incoherent when this is violated. That way we don’t have to bolt on additional SOP instructions or additional ad-hoc programmatic operations.

And crucially, it means the same ‘logic core dump’ operation described in the previous post can be used in exactly the same way.

Approach: SubClassOf means ProperSubClassOf

My approach here is to make explicit the assumption: every time an ontology developer asserts a SubClassOf axiom, they actually mean ProperSubClassOf.

To see exactly what this means, it helps to think in terms of Venn diagrams (Venn diagrams are my go-to strategy for explaining even the basics of OWL semantics). The OWL2 direct semantics are set-theoretic, with every class interpreted as a set, so this is a valid approach. When drawing Venn diagrams, sets are circles, and one circle being enclosed by another denotes subsetting. If circles overlap, this indicates set overlap, and if no overlap is shown the sets are assumed disjoint (have no members in common).

Let’s look at what happens when an ontology developer makes a SubClassOf link between PN and N. They may believe they are saying something like this:

[Figure: Venn diagram with the PN circle strictly inside the N circle]

i.e. implicitly indicating that there are some nerves that are not peripheral nerves.

But in fact the OWL SubClassOf operator is interpreted set-theoretically as subset-or-equal-to (i.e. ⊆) which can be visually depicted as:

[Figure: Venn diagram in which the PN and N circles are allowed to coincide]

In this case our ontology developer wants to exclude the latter as a possibility (even if we end up with a model in which these two are equivalent, the ontology developer needs to arrive at this conclusion by having the incoherencies in their own internal model revealed).

To make this explicit, there needs to be an additional class declared that (1) is disjoint from PN and (2) is a subtype of Nerve. We can think of this as a ProperSubClassOf axiom, which can be depicted visually as:

[Figure: Venn diagram with PN and a disjoint sibling class both inside N]
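The same distinction can be played with using ordinary sets. This is a minimal sketch with made-up individuals (n1, n2, n3), treating each class as the set of its members; the function names are hypothetical and nothing here involves an OWL reasoner.

```python
from typing import Set


def is_subclass(sub: Set[str], sup: Set[str]) -> bool:
    """OWL SubClassOf: subset-or-equal-to."""
    return sub <= sup


def is_proper_subclass(sub: Set[str], sup: Set[str]) -> bool:
    """What the developer usually means: a strict subset."""
    return sub < sup


def largest_disjoint_sibling(sub: Set[str], sup: Set[str]) -> Set[str]:
    """Largest set that is inside sup and shares no member with sub."""
    return sup - sub


# Made-up individuals, purely for illustration.
nerve = {"n1", "n2", "n3"}
peripheral_nerve = {"n1", "n2"}

# Intended picture: PN strictly inside N, so the sibling has room for a member.
intended_sibling = largest_disjoint_sibling(peripheral_nerve, nerve)

# Unintended picture: PN coincides with N. SubClassOf still holds, but the
# disjoint sibling is forced to be empty.
collapsed_sibling = largest_disjoint_sibling(nerve, nerve)
```

When the two sets coincide, is_subclass still returns True but the only possible disjoint sibling is the empty set. That is the set-theoretic counterpart of the sibling class becoming unsatisfiable.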

If we encode this on our test ontology:

ObjectProperty: part_of
Class: PNS
Class: Nerve SubClassOf: part_of some PNS
Class: PeripheralNerve EquivalentTo: Nerve and part_of some PNS
Class: OtherNerve SubClassOf: Nerve DisjointWith: PeripheralNerve

We can see that the ontology is inferred to be incoherent. There is no need for an additional post-hoc check: the generic incoherence detection mechanism of ROBOT does not need any special behavior, and the ontology editor sees all problematic classes in red, and can navigate to all problems by looking under owl:Nothing:

[Screenshot: Protege showing the problematic classes in red under owl:Nothing]
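To see why the sibling exposes the problem, here is a toy check over named classes only. It is a sketch under a loud assumption: the equivalence between Nerve and PeripheralNerve is supplied as input (two SubClassOf edges), because this code does not reason over part_of some PNS the way Elk or HermiT would. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def superclass_closure(classes: Set[str], subclass_of: List[Edge]) -> Dict[str, Set[str]]:
    """Reflexive-transitive closure of the SubClassOf edges."""
    supers = {c: {c} for c in classes}
    changed = True
    while changed:
        changed = False
        for sub, sup in subclass_of:
            for c in classes:
                # If c is below sub, it is also below everything above sup.
                if sub in supers[c] and not supers[sup] <= supers[c]:
                    supers[c] |= supers[sup]
                    changed = True
    return supers


def unsatisfiable_classes(subclass_of: Iterable[Edge], disjoint: Iterable[Edge]) -> Set[str]:
    """Classes that end up below both members of some disjoint pair."""
    edges = list(subclass_of)
    pairs = list(disjoint)
    classes = {name for pair in edges + pairs for name in pair}
    supers = superclass_closure(classes, edges)
    return {
        c
        for c, ancestors in supers.items()
        if any(a in ancestors and b in ancestors for a, b in pairs)
    }


# ASSUMPTION: the entailed equivalence is given, as edges in both directions.
ENTAILED = [("PeripheralNerve", "Nerve"), ("Nerve", "PeripheralNerve")]
SIBLING = [("OtherNerve", "Nerve")]
DISJOINT = [("OtherNerve", "PeripheralNerve")]

problem_classes = unsatisfiable_classes(ENTAILED + SIBLING, DISJOINT)
```

With the equivalence present, OtherNerve sits below PeripheralNerve while also being disjoint from it, so it is flagged. Drop the ("Nerve", "PeripheralNerve") edge and nothing is flagged.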

Of course, we don’t want to manually assert this all the time, and litter our ontology with dreaded “OtherFoo” classes. If we can make the assumption that all asserted SubClassOfs are intended to be ProperSubClassOfs, then we can just do this procedurally as part of the ontology validation pipeline.

One way to do this is to inject a sibling for every class-parent pair and assert that the siblings are disjoint.

The following SPARQL will generate the disjoint siblings (if you don’t know SPARQL don’t worry, this can all be hidden from you):

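prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>

# Emit one disjoint sibling per asserted (class, parent) pair
CONSTRUCT {
  ?sibClass a owl:Class ;
    owl:disjointWith ?c ;
    rdfs:subClassOf ?p ;
    rdfs:label ?sibLabel
}
WHERE {
  ?c rdfs:subClassOf ?p .
  # Skip deprecated/obsolete classes
  FILTER NOT EXISTS { ?c owl:deprecated "true"^^xsd:boolean }
  ?c rdfs:label ?clabel .
  BIND(concat("DISJOINT-SIB-OF ", ?clabel) AS ?sibLabel)
  BIND(UUID() AS ?sibClass)
}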

Note that we exclude deprecated/obsolete classes. The generated disjoint siblings are given a random UUID, and the label DISJOINT-SIB-OF X. You could also opt for the simpler “Other X” as in the above example. It doesn’t matter, since only the ontology developer sees this, and only when debugging.
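For readers more comfortable with a scripting language than with SPARQL, the same generation step can be sketched over plain tuples. This is an illustration only: the function name disjoint_sibling_axioms and the tuple representation are hypothetical, and in practice the query is run by ROBOT over the real ontology.

```python
import uuid
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def disjoint_sibling_axioms(
    subclass_of: Iterable[Tuple[str, str]],
    labels: Dict[str, str],
    deprecated: Iterable[str] = (),
) -> List[Triple]:
    """Create one disjoint sibling per asserted (class, parent) pair."""
    skip: Set[str] = set(deprecated)
    triples: List[Triple] = []
    for child, parent in subclass_of:
        # Like the query, skip deprecated classes and classes with no label.
        if child in skip or child not in labels:
            continue
        # Fresh random identifier, playing the role of UUID() in the query.
        sibling = f"urn:uuid:{uuid.uuid4()}"
        triples.extend(
            [
                (sibling, "rdf:type", "owl:Class"),
                (sibling, "owl:disjointWith", child),
                (sibling, "rdfs:subClassOf", parent),
                (sibling, "rdfs:label", "DISJOINT-SIB-OF " + labels[child]),
            ]
        )
    return triples
```

Each qualifying pair yields four triples about a fresh sibling class, so an ontology with n asserted SubClassOf links between labelled, non-deprecated classes gains n throwaway classes for the duration of the test.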

This can be encoded in a workflow, such that the axioms are injected as part of a test procedure. You likely do not want these axioms to leak out into the release version and confuse people.

Future versions of ROBOT may include a convenience function for doing this, but for now you can do this in your Makefile:

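SRC = pn.omn

# Generate the disjoint siblings from the relaxed source ontology
disjoint_sibs.owl: $(SRC)
	robot relax -i $< query --format ttl -c construct-disjoint-siblings.sparql $@

# Merge the siblings into a test-only ontology (not for release)
test.owl: $(SRC) disjoint_sibs.owl
	robot merge -i $< -i disjoint_sibs.owl -o $@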