Introduction to Protege and OWL for the Planteome project

As a part of the Planteome project, we develop common reference ontologies and applications for plant biology.

As an initial phase of this work, we are transitioning from editing standalone ontologies in OBO-Edit to integrated ontologies using increased OWL axiomatization and reasoning. In November we held a workshop that brought together plant trait experts from across the world, and developed a plan for integrating multiple species-specific ontologies with a reference trait ontology.

As part of the workshop, we took a tour of some of the fundamentals of OWL, hybrid obo/owl editing using Protege 5, and using reasoners and template-based systems to automate large portions of ontology development.

I based the material on an earlier tutorial prepared for the Gene Ontology editors. It’s available on the Planteome GitHub repository at:

https://github.com/Planteome/protege-tutorial

The size of Richard Nixon’s nose, part I

Consider a simple model of Richard Nixon:

Individual: :nixon
Types: :Organism
Facts: :has_part :nixons_nose

Individual: :nixons_nose
Types: :nose
Facts: :has_characteristic :nixons_nose_size

Individual: :nixons_nose_size
Types: :big

[Figure: nixon haspart nose hasquality size]

Here are the relations in our background ontology:

ObjectProperty: :has_part
Characteristics: Transitive

ObjectProperty: :has_characteristic
InverseOf:
:characteristic_of

ObjectProperty: :characteristic_of
InverseOf:
:has_characteristic

We have three entities: Nixon, his nose, and the characteristic or quality that is Richard Nixon’s nose size. We follow BFO here in individuating qualities: even if I had a nose of the “same” size as Richard Nixon, we would not share the same nose-size quality instance; we would each have our own unique nose-size quality instance (for a nice treatment, see Neuhaus et al. [PDF]).

Now let’s look at a phenotypic label such as “big nose”. Intuitively we can see that this applies to Richard Nixon. But where exactly in this instance graph is the big nose phenotype? Is it the nose, the size, or Richard himself?

Specifically, if we have a phenotype ontology with a term “increased size of nose” or “big nose”, what OWL class expression do we assign as an equivalent? We have to make a decision as to where to root the path through our instance graph. It might be:

  • The nose: i.e. nose and has_characteristic some big
  • The size: i.e. big and characteristic_of some nose
  • The organism: i.e. has_part some (nose and has_characteristic some big)
  • some unspecified thing that has a relationship to one of the above
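
To make the choice of root concrete, here is a minimal Python sketch that evaluates the first three candidate expressions against the Nixon instance graph. It is a toy closed-world evaluator, not an OWL reasoner. The helper names (has_type, both, some) are made up for illustration, and the characteristic_of edge is written in by hand to stand in for the declared inverse.

```python
# Toy instance graph from the Nixon example (closed-world sketch, not a reasoner).
types = {
    "nixon": {"Organism"},
    "nixons_nose": {"nose"},
    "nixons_nose_size": {"big"},
}
facts = {
    ("nixon", "has_part", "nixons_nose"),
    ("nixons_nose", "has_characteristic", "nixons_nose_size"),
    # Assumption: added by hand because characteristic_of is the declared inverse.
    ("nixons_nose_size", "characteristic_of", "nixons_nose"),
}

def has_type(cls):
    """Named class: x is asserted to be an instance of cls."""
    return lambda x: cls in types.get(x, set())

def both(f, g):
    """Intersection: 'f and g'."""
    return lambda x: f(x) and g(x)

def some(prop, filler):
    """Existential restriction: 'prop some filler'."""
    return lambda x: any(
        s == x and p == prop and filler(o) for s, p, o in facts
    )

big_nose = both(has_type("nose"), some("has_characteristic", has_type("big")))

candidates = {
    "nose-rooted": big_nose,
    "size-rooted": both(has_type("big"), some("characteristic_of", has_type("nose"))),
    "organism-rooted": some("has_part", big_nose),
}

# For each candidate expression, list the individuals that satisfy it.
members = {
    name: sorted(x for x in types if expr(x))
    for name, expr in candidates.items()
}
for name, found in members.items():
    print(name, found)
```

Each expression picks out a different individual: the nose, the size, and nixon respectively. That is exactly why the choice of root matters.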

The structure of each OWL class expression can be visualized as a path through the nixon graph.

Our decision affects the classification we get from reasoning. A big nose is part of a funny face, whereas a person with a big nose is a subclass of a person with a funny face. If you then feed the reasoner results into a phenotype analysis, you may get different results depending on which root you chose.

To an ordinary common sense person whose brain hasn’t been infected by ontologies, the difference between “a nose that is increased in size”, an “increased size of nose” and a “person with a nose that’s increased in size” is just linguistic fluff, but the distinctions are important from an ontology modeling perspective.

Nevertheless, we may want to formalize the fact that we don’t care about these distinctions – we might want our “big nose” phenotype class to be any of the above.

One way would be to make fugly union classes, but this is tedious.

There is another way. We can introduce a generic “exhibits” relation. We elide a textual definition for now; the idea is that this relation captures the general notion of having a phenotype:

ObjectProperty: :exhibits
SubPropertyChain: :exhibits o :has_part
SubPropertyChain: :exhibits o :has_characteristic
SubPropertyChain: :exhibits o :characteristic_of
Characteristics: Transitive

We make this a super-relation of has_part:

ObjectProperty: :has_part
SubPropertyOf: :exhibits
Characteristics: Transitive

We can see exhibits is very promiscuous – when it connects to other relations, it makes a new exhibits relation.
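
One way to see this promiscuity is to apply the axioms by hand until nothing new appears. The following is a rough forward-chaining sketch in Python over the Nixon instance graph, not what an OWL reasoner actually does internally. The function name saturate is invented for illustration, and transitivity of has_part is left out because the toy graph has only one has_part edge.

```python
# Hand-rolled forward chaining over the toy graph; a sketch, not an OWL reasoner.
asserted = {
    ("nixon", "has_part", "nixons_nose"),
    ("nixons_nose", "has_characteristic", "nixons_nose_size"),
}

INVERSE = {
    "has_characteristic": "characteristic_of",
    "characteristic_of": "has_characteristic",
}
# exhibits o q -> exhibits for each q below; q == "exhibits" is transitivity.
CHAINABLE = {"has_part", "has_characteristic", "characteristic_of", "exhibits"}

def saturate(triples):
    """Apply the inverse, sub-property and chain axioms until a fixed point."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in INVERSE:                  # declared inverse properties
                new.add((o, INVERSE[p], s))
            if p == "has_part":               # has_part SubPropertyOf exhibits
                new.add((s, "exhibits", o))
            if p == "exhibits":               # property chains into exhibits
                for s2, q, o2 in facts:
                    if s2 == o and q in CHAINABLE:
                        new.add((s, "exhibits", o2))
        if new <= facts:                      # nothing new: fixed point reached
            return facts
        facts |= new

exhibits = sorted((s, o) for s, p, o in saturate(asserted) if p == "exhibits")
print(exhibits)
```

Starting from a single has_part edge, nixon ends up exhibiting both his nose and the size of his nose.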

Now let’s make some probe classes illustrating the different ways we could define our “don’t care where we root the graph” phenotype:

Class: :test1
EquivalentTo: :exhibits some (:big and :characteristic_of some :nose)

Class: :test2
EquivalentTo: :exhibits some (:has_part some (:nose and :has_characteristic some :big))

Class: :test3
EquivalentTo: :exhibits some (:nose and :has_characteristic some :big)

Class: :test4
EquivalentTo: :has_part some (:nose and :has_characteristic some :big)

After running the reasoner we get the following inferred hierarchy:

-- test1=test3
---- test2
---- test4

So we can see we are collapsing the distinction between “increased size of nose” and “nose that is increased in size” by instead defining a class “exhibiting an increased size of nose”.

If we then try the DL-query tab in Protege, we can see that the individual “nixon” satisfies all of these expressions.

Why is this important? It means we can join and analyze datasets without performing awkward translations. Group 1 can take a quality-centric approach and Group 2 an entity-centric approach; the descriptions or data from either group will classify under the common “exhibits phenotype” class.

This works because of the declared inverse between has_characteristic and characteristic_of. Graphically we can think of this as “doubling back”: the path reaches the size quality and then returns to the nose along the inverse edge.

Unfortunately, inverses put us outside EL++, so we can’t use the awesome Elk for classification.
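
To see what the inverse buys us, here is a small Python sketch that checks nixon against the quality-centric probe (test1) and the entity-centric probe (test3), once with the inverse applied and once without. It is a closed-world toy over the instance data as recorded above (only has_characteristic is asserted), not a substitute for a DL reasoner. The names saturate, some and memberships are hypothetical helpers.

```python
types = {"nixons_nose": "nose", "nixons_nose_size": "big"}
asserted = {
    ("nixon", "has_part", "nixons_nose"),
    ("nixons_nose", "has_characteristic", "nixons_nose_size"),
}
INVERSE = {
    "has_characteristic": "characteristic_of",
    "characteristic_of": "has_characteristic",
}
CHAINABLE = {"has_part", "has_characteristic", "characteristic_of", "exhibits"}

def saturate(triples, use_inverse):
    """Forward-chain the exhibits axioms; the inverse axiom is optional."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if use_inverse and p in INVERSE:
                new.add((o, INVERSE[p], s))
            if p == "has_part":               # has_part SubPropertyOf exhibits
                new.add((s, "exhibits", o))
            if p == "exhibits":               # exhibits o q -> exhibits
                new |= {(s, "exhibits", o2) for s2, q, o2 in facts
                        if s2 == o and q in CHAINABLE}
        if new <= facts:
            return facts
        facts |= new

def some(facts, x, prop, cls, then=None):
    """x has a prop-edge to a cls instance (optionally also satisfying `then`)."""
    return any(s == x and p == prop and types.get(o) == cls
               and (then is None or then(o)) for s, p, o in facts)

def memberships(x, use_inverse):
    f = saturate(asserted, use_inverse)
    # test1: exhibits some (big and characteristic_of some nose)
    test1 = some(f, x, "exhibits", "big",
                 lambda q: some(f, q, "characteristic_of", "nose"))
    # test3: exhibits some (nose and has_characteristic some big)
    test3 = some(f, x, "exhibits", "nose",
                 lambda n: some(f, n, "has_characteristic", "big"))
    return test1, test3

with_inverse = memberships("nixon", use_inverse=True)
without_inverse = memberships("nixon", use_inverse=False)
print(with_inverse, without_inverse)
```

In this sketch, dropping the inverse means nixon still satisfies the entity-centric probe but no longer satisfies the quality-centric one, because the characteristic_of edge is never derived.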

Not-caring in ontologies is hard work!

What if we want to care even less, and formally have a “big nose phenotype” class classify either nixon, his nose, or the bigness that inheres in his nose? That’s the subject of the next post, together with some answers to the bigger question of “what is a phenotype”.

Phenotype ontologies on googlecode

For the PATO project we’ve set up a repository on googlecode to collect phenotype ontologies and various bridging axioms:

http://code.google.com/p/phenotype-ontologies

This aggregates the main phenotype ontologies, together with logical definitions bridging ontologies, as defined in:

Mungall, C. J., Gkoutos, G. V., Smith, C. L., Haendel, M. A., Lewis, S. E., and Ashburner, M. (2010). Integrating phenotype ontologies across multiple species. Genome Biology 11, R2. Available at: http://dx.doi.org/10.1186/gb-2010-11-1-r2

You can access the aggregated ontology via this PURL:

http://purl.obolibrary.org/obo/upheno/uberpheno-subq-importer.owl

It may be slow to open this via the web. If you have the phenotype-ontologies repository checked out, you can open the file from the filesystem – external ontologies will be obtained via svn:externals.

I recommend using Elk as the reasoner; others will be too slow with the combination of HP, MP, FMA, MA, PATO, etc. Unfortunately Elk doesn’t yet allow DL queries or explanations of inferences.

The above ontology uses a slightly modified version of the definitions described in the Genome Biology paper – instead of modeling each phenotype as a single quality (e.g. redness of nose), we now model them as aggregates of phenotypes. This tends to work better for HPO, which has many composite phenotypes.

Note also that we’re using a hacked version of the uberon bridging axioms to ZFA, MA and FMA – we treat these as precise equivalents rather than taxonomic equivalents. This is necessary as we mix uberon in with the species ontologies in the logical definitions.