Ontotip: Avoid the single child anti-pattern

This article is part of the OntoTips series.

A common structure found in many ontologies is the single child pattern. I consider this an anti-pattern, to be avoided.

The most common form is with is_a children (i.e subClassOf between two named classes), but the anti-pattern also applies to other relationship types. We can formalize the single child subclass pattern as:

  • C1 direct SubClassOf P
  • NOT exists some C2, such as C2 is a direct SubClassof P, and C2 != C1

Depicted as:

C1 is the only child of P

One reason this is an anti-pattern is that it is inherently incomplete. i.e. there must be instances of P that are not instances of C1 (otherwise why have two classes – see the reflexive subclass anti-pattern). Following a principle of reasonable completeness (see open world post) we should include sibling terms where appropriate.

Here is a concrete example from a fictional ontology:

A subset of a fictional ontology, where only one subtype of flu (mild flu) is populated

Here there is a single specialization of a disease term, based on severity.

Another example (adapted from an existing ontology):

A fictional example, where a chemical assay is subtyped by a property of that chemical pool (its concentration)

Here there is a specialization of the assay term based on a property of the pool or iron.

A different example (adapted from an existing ontology):

An adapted example, where a specific assay has a single subtype, differentiated by location of sampling

This kind of structure is not uncommon in many OBO ontologies. And there is a reasonable defense: we have limited ontology editing resources, and many terms are added on request. Curators are free to request a more specific term if they feel it is necessary for annotating (e.g a disease that has as phenotype mild flu) but they may have no need for the implicit sibling terms. And ontology developers see no need to do additional work they are not requested to do.

However, this leads to lopsided ontologies that are often confusing for people not deeply immersed the development of these ontologies. It is hard to tell if omissions are intentional or unintentional. And the practice of instantiating single children has bad downstream effects of annotation, this is something we have frequently observed over multiple ontologies.

Consider the flu example above. A new annotator may want to annotate a disease that has a severe flu phenotype. They may make an implicit assumption that choosing the parent term ‘flu’ would communicate ‘severe flu’; if it was mild, they would have selected ‘mild flu’. But this is not the explicit assertion they are making – they are making a closed world assumption that doesn’t hold for the logic of the ontology. While some of this can be obviated with training, and ensuring curators request specific sibling terms rather than trying to let the parent do the work. But many single-child cases are in fact more nuanced that the flu example.

Instead, it is better to take a more prospective approach to ontology development, try and anticipate in advance terms that may be required, and populate them in a balanced fashion – this will result in more balanced annotations. It is much easier to do this if you follow OWL axiomatization and have a formal design pattern system such as DOSDPs. In fact you can use such a system to automate detection of single-child patterns and imbalances.

While it is trivial to detect single is-a children using a SPARQL query encoding the pattern above, it won’t capture the more nuanced cases of single children by a given axis of classification.

Consider this made-up ontology structure, where we have a parent class with only two subclasses explicitly populated:

made-up example, the class ‘animal’ has only two child classes populated, one by a taxonomic axis (vertebrate) another by a property of the animal (its edibility)

In this particular example, both are also single-children via an axis of classification. While on a gross structural level the lower terms each has a sibling, each sibling is clearly classified differently. The first is classified along classical taxonomic/evolutionary descent terms, the second is by a different property.

The above example is made-up and would strike most people as bad design (even if strictly logically coherent). Where is the concept of inedible animals, where are the invertebrates (and indeed edible and inedible vertebrates and invertebrates)?

But in fact this antipattern plagues most OBO ontologies. These are harder to spot, especially if the ontology is unaxiomatized.

For example:

iron assay with two child terms along different axes

Structurally this doesn’t look like a single-child anti-pattern, but it is in fact an example of a single-child-by-axis pattern. And if there are no subclasses, this is an instance of the ragged lattice pattern, which I will cover in a future post.

While these can’t be detected by straightforward SPARQL queries, if you use a system such as DOSDPs you can use this to analyze your ontology for these structures, and proactively guard against them. 

While the above examples all focus on subClassOf/is-a relations, the same guidelines apply to other edge labels. For example, if an anatomy ontology only listed a single part of the head (such as the mouth):

An incomplete ontology with only one part child for head)

Most people would consider this poor design. While of course it’s true it’s unreasonable to expect ontologies to be complete, the reasonable completeness principle should apply, and if for some reason this not unattainable, at the very least the incompleteness needs to be clearly documented.

In closing, as ontology developers it can be tempting to ignore these single child cases – we have limited resources, and must balance this with being able to provide users with terms they request, which may lead to spottiness and incompleteness. But ignoring these just leads to more work downstream, and in some cases it can lead to incomplete annotation. So avoid single is-a children!