Checking ontologies using ROBOT report (with an example from the cephalopod ontology)

ROBOT is an OBO Tool for use in ontology development pipelines, supported by the Open Bio Ontologies services project. If you set up your ontology repository using the Ontology Development Kit then you will automatically get a workflow that uses ROBOT to perform reasoning and release processes on your ontology. I have previously covered use of ROBOT for reasoner-based validation.

Starting with v1.20 of ROBOT, we now include a new report command. You can call this command to get a report card, giving a health check on multiple aspects of your ontology.

ROBOT vs Cephalopod

You can read the online documentation here, let’s take a look at it in action. We’ll try a report on the cephalopod ontology.

Like most robot commands, the input can either be a local file (specified with –input or –i) or a URL (specified with –input-iri or -I).
robot report -I http://purl.obolibrary.org/obo/ceph.owl

This will return a textual report, broken down into a summary and detail section. Here the summary section starts:


Violations: 236
-----------------
ERROR: 8
WARN: 211
INFO: 17

236 violations, bad cephalopod! The violations are broken into 3 categories: ERROR, WARN and INFO. The most serious ones (ERROR) are listed first:


Level Rule Name Subject Property Value
ERROR duplicate_definition head-mantle fusion [CEPH:0000129] definition [IAO:0000115] .
ERROR duplicate_definition tentacle thickness [CEPH:0000261] definition [IAO:0000115] .
ERROR missing_label anatomical entity [UBERON:0001062] label [rdfs:label]
ERROR missing_ontology_description ceph.owl dc11:description
ERROR duplicate_label leucophore [CEPH:0000284] label [rdfs:label] leucophore
ERROR duplicate_label leucophore [CEPH:0001077] label [rdfs:label] leucophore
ERROR missing_ontology_license ceph.owl dc:license
ERROR missing_ontology_title ceph.owl dc11:title

Duplicate definition is considered a serious problem, since every class in an OBO ontology should be unique, and two classes with the same definition indicates the meaning is the same and the classes should be merged.

Similarly, each class should have a label, otherwise a human has no idea what it is. Note in this case the missing label is on an imported class rather than one native to the ontology itself. Duplicate labels are considered serious, since each class in an ontology should have a unique label, so humans can tell them apart.

Some of the violations are due to missing metadata. Here there is no description of the ontology, no license declared and no title. In OBO we strive to have complete metadata for everyone ontology.

After the ERRORs are the WARNings, for example:


WARN annotation_whitespace accessory gland complex [CEPH:0000004] definition [IAO:0000115] The glandular complex in cirrates that forms sperm packets and is a counterpart of the spermatophore-forming complex of other cephalopods.
WARN annotation_whitespace armature of the arms [CEPH:0000018] definition [IAO:0000115] The grappling structures on the oral surfaces the arms and tentacles, including both suckers and hooks.
WARN annotation_whitespace iteroparous [CEPH:0001067] comment [rdfs:comment] Nautiluses are the only cephalopods to present iteroparity

Annotation whitespace means that there is either trailing, leading, or additional internal whitespace (sorry, fans of two spaces after a period) in the value used in an annotation assertion (remember, ROBOT uses OWL terminology, and here “annotation” means a non-logical axiom, typically a piece of textual metadata used by humans). The annotation property is also reported. It helps to memorize a few IAO properties, e.g IAO:0000115 is the property for (text) definition.

These kinds of issues are not usually harmful, but they can confound some lexical operations.

Currently WARN includes a few false positives, e.g.


WARN multiple_asserted_superclasses anal flap [CEPH:0000009] rdfs:subClassOf valve [UBERON:0003978]
WARN multiple_asserted_superclasses anal flap [CEPH:0000009] rdfs:subClassOf ectoderm-derived structure [UBERON:0004121]

It should be stressed that there is NOTHING WRONG with multiple inheritance in an ontology. It is good engineering practice to assert at most one in the editors version and infer the rest, but as far as what the end-user sees it is all the same. In future this check should only apply to the editors version and not to the released or post-reasoned version.


WARN missing_definition transverse row of suckers [CEPH:0000306] definition [IAO:0000115]
WARN missing_definition cephalopod sperm receptacle [CEPH:0001017] definition [IAO:0000115]
WARN missing_definition inner sucker ring [CEPH:0001020] definition [IAO:0000115]
WARN missing_definition circumoral appendage bud [CEPH:0000003] definition [IAO:0000115]

This one is a more serious problem. One of the OBO principles is that classes should have definitions for human users. Without definitions, we must rely on the label or placement in the graph to intuit meaning, and this is often unreliable.

While text definitions are incredibly important, we only count this as a WARNing because currently even well curated ontologies frequently miss some definitions, having every class defined is a high bar.

Following the warnings are the INFOs, for example


INFO lowercase_definition anal photophore [CEPH:0000304] definition [IAO:0000115] photophore at side of anus

This is warning us that the class for anal photophore has a definition that is all lowercase. This is not particularly serious. However, having inconsistent lexical rules in an ontology can look unprofessional, and in OBO we have the convention that all text definitions have the first word capitalized. Many ontologies such as GO end with a period.

Use in ontology pipelines

The report command can be used in ontology pipelines to “fail fast” if the report produces ERRORs. The default behavior is to exit with a non-zero return code (standard in unix for a command that fails).

This can be configured – for example, if you wish to create a report for informative purposes, but you are not yet up to the standard that ROBOT imposes. You can also set ROBOT to be more strict by failing if a single warning is found, like this:

robot report --input cephalopod.owl \
  --fail-on WARN \
  --output report.tsv

Specify “fail-on none” if you never want the report command to fail, no matter what it finds.

As the output is TSV it’s easy to load into Excel or Google sheets (if you like that sort of thing), to transform to a Wiki table, etc. You can also get the report as YAML.

How does this work?

Each type of check in the report is implemented as a SPARQL query. This makes the framework very declarative, transparent, and easy to extend. Many ontology editors who do not code can read SPARQL queries, which makes it easier to figure out which each check is doing.

You can see the queries in the ROBOT repository.

Who decides on the rules?

Hold on, you may be thinking. Who gave these ROBOT people the right to decide what counts as an error? Maybe you have duplicate definitions and you think that is just fine.

This should not prevent you from using ROBOT, as everything is customizable. You can create your own profile where duplicate definitions are not errors.

However, our overall goal is to come up with a set of checks that are agreed across the OBO community. We have made initial decisions based on our combined experience with diverse ontologies, but we want community feedback. Please give us feedback on the ROBOT GitHub issue tracker!

Ultimately we plan to use the default profile to automatically evaluate ontologies within the OBO library, and provide assistance to reviewers of ontologies.

Acknowledgments

The report command and accompanying queries were written by Becky Tauber. Thanks also to James Overton, HyeongSik Kim, and members of the OBO community who provided specifications, directions, and feedback. In particular many checks came from the set developed originally for the Gene Ontology. ROBOT is supported by the OBO services grant to Bjorn Peters and myself.

Advertisements

One Response to Checking ontologies using ROBOT report (with an example from the cephalopod ontology)

  1. Nicole Vasilevsky says:

    This is great, Chris!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: