This is one post in a series of tips on ontology development; see the parent post for more details.
A common mistake is to over-specify an OWL definition (another post will cover under-specification). While not technically wrong, over-specification loses you reasoning power, limiting your ability to auto-classify your ontology. Formally, what I mean by over-specifying here is stating more conditions than are required for correct entailments.
One manifestation of this anti-pattern is the over-specified genus. (This is where I disagree with Seppala et al. on S3.1.1, ‘use the genus proximus’; see the previous post.) I will use a contrived example here, although there are many real examples. GO contains a class ‘Schwann cell differentiation’, with an OWL definition referencing ‘Schwann cell’ from the Cell Ontology (CL). I consider the logical definition to be neither over- nor under-specified:
‘Schwann cell differentiation’ EquivalentTo ‘cell differentiation’ and results-in-acquisition-of-features-of some ‘Schwann cell’
We also have a corresponding logical definition for the parent:
‘glial cell differentiation’ EquivalentTo ‘cell differentiation’ and results-in-acquisition-of-features-of some ‘glial cell’
The Cell Ontology (CL) contains the knowledge that Schwann cells are subtypes of glial cells, which allows us to infer that ‘Schwann cell differentiation’ is a subtype of ‘glial cell differentiation’. So far, so good (if you read the post on Normalization you should be nodding along). This definition does real work for us in the ontology: we infer the GO hierarchy based on the definition and classification of cells in CL.
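To make the idea of a definition doing ‘work’ concrete, here is a minimal Python sketch of structural subsumption over conjunctive definitions. It assumes a drastically simplified model: a definition is a genus plus existential restrictions, and restriction fillers are classified by a hand-written, hypothetical slice of CL. The helper names (`ancestors_or_self`, `normalize`, `subsumed_by`) and the data dictionaries are illustrative only; in practice you would load GO and CL into a real OWL reasoner such as ELK rather than roll your own.

```python
# Toy structural subsumption for conjunctive, OWL-style logical definitions.
# Assumption: a hand-rolled sketch, not a real OWL reasoner. A definition is
# a genus (set of named classes) plus differentiae (existential restrictions
# written as (property, filler) pairs). Genus classes are matched by name only.

RAFO = "results-in-acquisition-of-features-of"

# Hypothetical slice of the Cell Ontology: child -> set of is-a parents.
CL_IS_A = {
    "Schwann cell": {"glial cell"},
    "glial cell": {"cell"},
}

# GO logical definitions: name -> (genus, differentiae).
GO_DEFS = {
    "glial cell differentiation": ({"cell differentiation"}, {(RAFO, "glial cell")}),
    "Schwann cell differentiation": ({"cell differentiation"}, {(RAFO, "Schwann cell")}),
}


def ancestors_or_self(cls, is_a):
    """Reflexive-transitive closure of cls over a told is-a map."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(is_a.get(c, ()))
    return seen


def normalize(name, defs):
    """Flatten a definition, inlining any genus that is itself defined."""
    genus, differentiae = defs[name]
    atoms, restrictions = set(), set(differentiae)
    for g in genus:
        if g in defs:
            sub_atoms, sub_restrictions = normalize(g, defs)
            atoms |= sub_atoms
            restrictions |= sub_restrictions
        else:
            atoms.add(g)
    return atoms, restrictions


def subsumed_by(sub, sup, defs, is_a):
    """True if every conjunct of sup is matched by some conjunct of sub."""
    sub_atoms, sub_restrictions = normalize(sub, defs)
    sup_atoms, sup_restrictions = normalize(sup, defs)
    if not sup_atoms <= sub_atoms:
        return False
    # (p, f) in sup is satisfied by (p, e) in sub whenever e is-a* f (via CL).
    return all(
        any(p == q and f in ancestors_or_self(e, is_a) for q, e in sub_restrictions)
        for p, f in sup_restrictions
    )


# Inferred from CL; the GO definition never mentions glial cells.
print(subsumed_by("Schwann cell differentiation", "glial cell differentiation", GO_DEFS, CL_IS_A))  # True
print(subsumed_by("glial cell differentiation", "Schwann cell differentiation", GO_DEFS, CL_IS_A))  # False
```

The GO parentage falls out of the CL is-a link alone, which is exactly the division of labour the definition is meant to achieve.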
Now, imagine that in fact GO had an alternate OWL definition:
‘Schwann cell differentiation’ EquivalentTo ‘glial cell differentiation’ and results-in-acquisition-of-features-of some ‘Schwann cell’
This is not wrong, but it is far less useful. We want to be able to infer the glial cell parentage rather than assert it. Asserting it violates DRY (the Don’t Repeat Yourself principle), as we implicitly repeat the assertion that Schwann cells are glial cells in GO, when the primary assertion belongs in CL. If one day the community decides that Schwann cells are not glial cells but neurons (OK, so this example is not so realistic…), then we have to change this in two places. Having to change things in two places is definitely a bad thing.
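The DRY problem can be sketched with the same kind of toy model (again an assumption, not a real reasoner) by revising the CL slice so that Schwann cells sit under neurons, as in the hypothetical above. The ‘neuron differentiation’ entry, the two CL versions and all helper names are illustrative additions.

```python
# Toy sketch (assumption: not a real OWL reasoner) contrasting the two GO
# definitions of 'Schwann cell differentiation' when CL changes.

RAFO = "results-in-acquisition-of-features-of"


def ancestors_or_self(cls, is_a):
    """Reflexive-transitive closure of cls over a told is-a map."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(is_a.get(c, ()))
    return seen


def normalize(name, defs):
    """Flatten a definition, inlining any genus that is itself defined."""
    genus, differentiae = defs[name]
    atoms, restrictions = set(), set(differentiae)
    for g in genus:
        if g in defs:
            sub_atoms, sub_restrictions = normalize(g, defs)
            atoms |= sub_atoms
            restrictions |= sub_restrictions
        else:
            atoms.add(g)
    return atoms, restrictions


def subsumed_by(sub, sup, defs, is_a):
    """True if every conjunct of sup is matched by some conjunct of sub."""
    sub_atoms, sub_restrictions = normalize(sub, defs)
    sup_atoms, sup_restrictions = normalize(sup, defs)
    return sup_atoms <= sub_atoms and all(
        any(p == q and f in ancestors_or_self(e, is_a) for q, e in sub_restrictions)
        for p, f in sup_restrictions
    )


def inferred_parents(name, defs, is_a):
    """All other defined classes that subsume name, sorted by label."""
    return sorted(o for o in defs if o != name and subsumed_by(name, o, defs, is_a))


SCD = "Schwann cell differentiation"
PARENTS = {
    "glial cell differentiation": ({"cell differentiation"}, {(RAFO, "glial cell")}),
    "neuron differentiation": ({"cell differentiation"}, {(RAFO, "neuron")}),
}
# Genus is the generic parent; glial parentage must come from CL.
NORMALIZED = {**PARENTS, SCD: ({"cell differentiation"}, {(RAFO, "Schwann cell")})}
# Genus repeats the glial parentage inside GO.
OVER_SPECIFIED = {**PARENTS, SCD: ({"glial cell differentiation"}, {(RAFO, "Schwann cell")})}

CL_TODAY = {"Schwann cell": {"glial cell"}}
CL_REVISED = {"Schwann cell": {"neuron"}}  # the (unrealistic) revision from the text

print(inferred_parents(SCD, NORMALIZED, CL_REVISED))
# ['neuron differentiation']
print(inferred_parents(SCD, OVER_SPECIFIED, CL_REVISED))
# ['glial cell differentiation', 'neuron differentiation']
```

With the normalized definition the GO parent simply follows CL; with the over-specified one the stale glial parentage survives until someone also edits GO.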
I have seen this kind of genus-overspecification in a number of different ontologies; this can be a side-effect of the harmful misapplication of the single-inheritance principle (see ‘Single inheritance considered harmful’, a previous post). This can also arise from tooling limitations: the NCIT neoplasm hierarchy has a number of examples of this due to the tool they originally used for authoring definitions.
Another related form of over-specification is having too many differentiae, which drastically limits the work that a reasoner and your logical axioms can do for you. As a hypothetical example, imagine that we have a named cell type ‘hippocampal interneuron’, conventionally defined and used in the (trivial) sense of any interneuron whose soma is located in a hippocampus. Now let’s imagine that single-cell transcriptomics has shown that these cells always express genes A, B and C (OK, there are many nuances in integrating ontologies with single-cell data, but let’s make some simplifying assumptions for now).
It may be tempting to write a logical definition:
‘hippocampal interneuron’ EquivalentTo
- interneuron AND
- has-soma-location SOME hippocampus AND
- expresses some A AND
- expresses some B AND
- expresses some C
This is not wrong per se (at least in our hypothetical world where hippocampal interneurons always express these genes), but the definition does less work for us. In particular, if we later add a cell type ‘hippocampus CA1 interneuron’, defined as any interneuron in the CA1 region of the hippocampus, we would like it to be classified under ‘hippocampal interneuron’. However, this will not happen unless we redundantly state the gene expression criteria for every class, violating DRY.
The correct thing to do here is to use what is sometimes called a ‘hidden General Class Inclusion (GCI) axiom’, which is just a fancy way of saying that SubClassOf axioms (necessary conditions) can be mixed in with an equivalence axiom / logical definition:
‘hippocampal interneuron’ EquivalentTo interneuron AND has-soma-location SOME hippocampus
‘hippocampal interneuron’ SubClassOf expresses some A
‘hippocampal interneuron’ SubClassOf expresses some B
‘hippocampal interneuron’ SubClassOf expresses some C
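The contrast between the two versions of ‘hippocampal interneuron’ can be sketched in the same toy style. Assumptions: the model and helper names are hypothetical, the ‘CA1 region’ part-of link is hand-written, and soma location is allowed to propagate over part-of, roughly what a property chain would give you in OWL. Only the EquivalentTo axiom drives classification; the SubClassOf axioms are then inherited by whatever gets classified underneath.

```python
# Toy sketch (assumption: hand-rolled, not a real OWL reasoner). Only the
# EquivalentTo definition (sufficient conditions) drives classification;
# SubClassOf axioms (necessary conditions) are inherited by subclasses.

IS_A = {"interneuron": {"neuron"}}
PART_OF = {"CA1 region": {"hippocampus"}}
# Hypothetical simplification: has-soma-location propagates over part-of,
# as the chain has-soma-location o part-of -> has-soma-location would.
CHAIN_PROPS = {"has-soma-location"}
GENES = {("expresses", g) for g in ("A", "B", "C")}


def ancestors_or_self(cls, *maps):
    """Reflexive-transitive closure of cls over one or more parent maps."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            for m in maps:
                stack.extend(m.get(c, ()))
    return seen


def filler_covers(prop, sub_filler, sup_filler):
    """Is sup_filler an is-a (or, for chained props, part-of) ancestor?"""
    maps = (IS_A, PART_OF) if prop in CHAIN_PROPS else (IS_A,)
    return sup_filler in ancestors_or_self(sub_filler, *maps)


def subsumed_by(sub, sup, defs):
    """Every conjunct of sup's definition must be met by sub's definition."""
    sub_genus, sub_diffs = defs[sub]
    sup_genus, sup_diffs = defs[sup]
    genus_ok = all(any(a in ancestors_or_self(b, IS_A) for b in sub_genus) for a in sup_genus)
    return genus_ok and all(
        any(p == q and filler_covers(p, e, f) for q, e in sub_diffs) for p, f in sup_diffs
    )


def inherited_conditions(name, defs, necessary):
    """Necessary conditions on name plus those on every defined ancestor."""
    out = set(necessary.get(name, ()))
    for other in defs:
        if other != name and subsumed_by(name, other, defs):
            out |= necessary.get(other, set())
    return out


CA1 = ({"interneuron"}, {("has-soma-location", "CA1 region")})
OVER_SPECIFIED = {
    "hippocampal interneuron": ({"interneuron"}, {("has-soma-location", "hippocampus")} | GENES),
    "hippocampus CA1 interneuron": CA1,
}
WITH_GCI = {
    "hippocampal interneuron": ({"interneuron"}, {("has-soma-location", "hippocampus")}),
    "hippocampus CA1 interneuron": CA1,
}
NECESSARY = {"hippocampal interneuron": GENES}  # the three SubClassOf axioms

print(subsumed_by("hippocampus CA1 interneuron", "hippocampal interneuron", OVER_SPECIFIED))  # False
print(subsumed_by("hippocampus CA1 interneuron", "hippocampal interneuron", WITH_GCI))  # True
print(sorted(inherited_conditions("hippocampus CA1 interneuron", WITH_GCI, NECESSARY)))
# [('expresses', 'A'), ('expresses', 'B'), ('expresses', 'C')]
```

Under the GCI version the CA1 class is classified without restating anything, and it still picks up the expression facts by inheritance.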
In a later post, I will return to the concept of an axiom doing ‘work’, and provide a more formal definition that can be used to evaluate logical definitions. However, even without a formal metric, the concept of ‘work’ is intuitive to people who have experience using OWL logical definitions to derive hierarchies. Such people usually test things in the reasoner as they go along, rather than simply writing an OWL definition and hoping it will work.
Another sign that you may be over-specifying logical definitions is when a group of similar classes has definitions that do not fit any design pattern template.
For example, the cell differentiation branch of GO used in the examples above fits a standard pattern:
cell differentiation and results-in-acquisition-of-features-of some C
where C is any cell type. The over-specified definition does not fit this pattern.
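One cheap check is to test every definition in a branch against its template and flag the outliers. The sketch below assumes a hypothetical `Definition` record and a hand-written matcher for the cell differentiation pattern; dedicated design-pattern tooling would do this more thoroughly.

```python
# Toy sketch of checking definitions against a design-pattern template.
# Assumption: hypothetical names and a hand-rolled matcher, not a real
# pattern tool. A definition is a genus plus a tuple of (relation, filler).
from dataclasses import dataclass

RAFO = "results-in-acquisition-of-features-of"
CELL_TYPES = {"cell", "glial cell", "Schwann cell", "neuron"}  # stand-in for CL


@dataclass(frozen=True)
class Definition:
    genus: str
    differentiae: tuple


def fits_cell_differentiation_pattern(d: Definition) -> bool:
    """Template: 'cell differentiation' and RAFO some C, with C a cell type."""
    if d.genus != "cell differentiation" or len(d.differentiae) != 1:
        return False
    relation, filler = d.differentiae[0]
    return relation == RAFO and filler in CELL_TYPES


GO_DEFS = {
    "glial cell differentiation": Definition("cell differentiation", ((RAFO, "glial cell"),)),
    "Schwann cell differentiation": Definition("cell differentiation", ((RAFO, "Schwann cell"),)),
    "Schwann cell differentiation (over-specified)": Definition(
        "glial cell differentiation", ((RAFO, "Schwann cell"),)
    ),
}

outliers = [name for name, d in GO_DEFS.items() if not fits_cell_differentiation_pattern(d)]
print(outliers)  # ['Schwann cell differentiation (over-specified)']
```

A definition that fails the template is not necessarily wrong, but it is a good candidate for a second look.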