Creating an ontology project, an update

In a previous post, I recommended some standard ways of managing the various portions of an ontology project using a version control system like GitHub.

Since writing that post, I’ve written a new utility that makes this task even easier. With the ontology-starter-kit you can generate all your project files and get set up for creating your first release in minutes. This script takes into account some changes since the original post two years ago:

  • Travis-CI has become the de-facto standard continuous integration system for performing unit tests on any project managed in GitHub (for more on CI see this post). The starter-kit will give you a default travis setup.
  • Managing your metadata and PURLs on the OBO Library has changed to a GitHub-based system.
  • ROBOT has emerged as a simpler way of managing many aspects of a release process, particularly managing your external imports.

Getting started

To get started, clone or download cmungall/ontology-starter-kit.

Currently, you will need:

  • perl
  • make
  • git (command line client)

For best results, you should also download owltools, oort and robot (in the future we’ll have a more unified system).

You can obtain all these by running the install script:

./INSTALL.sh

This should be run from within the ontology-starter-kit directory.

Then, from within that directory, you can seed your ontology:

./seed-my-ontology-repo.pl  -d ro -d uberon -u obophenotype -t cnidaria-ontology cnido

 

This assumes that you are building some kind of extension to uberon, using the relation ontology (OBO Library ontology IDs must be used here), that you will be placing this in the https://github.com/obophenotype/ organization, that the repo name is obophenotype/cnidaria-ontology, and that IDs will be of the form CNIDO:nnnnnnn.

After running, the repository will be created in the target/cnidaria-ontology folder, relative to where you are. You can move this out to somewhere more convenient.

The script is chatty, and it informs you how it is copying the template files from the template directory into the target directory. It will create your initial source setup, including a makefile, and then it will use that makefile to create an initial release, going so far as to init the git repo, add and commit files (unless overridden). It will not go as far as to create a repo for you on github, but it provides explicit instructions on what you should do next:


EXECUTING: git status
# On branch master
nothing to commit, working directory clean
NEXT STEPS:
0. Examine target/cnidaria-ontology and check it meets your expectations. If not blow it away and start again
1. Go to: https://github.com/new
2. The owner MUST be obophenotype. The Repository name MUST be cnidaria-ontology
3. Do not initialize with a README (you already have one)
4. Click Create
5. See the section under '…or push an existing repository from the command line'
E.g.:
cd target/cnidaria-ontology
git remote add origin git@github.com:obophenotype/cnido.git
git push -u origin master

Note that it also generates a metadata directory for you, with .md and .yml files you can use for your project on obolibrary (of course, you need to request your ontology ID space first, but you can go ahead and make a pull request with these files).

Future development

The overall system may no longer be necessary in the future, if we get a complete turnkey ontology release system with capabilities similar to analogous tools in software development such as maven.

For now, the Makefile approach is the most flexible, and is widely understood by many software developers, but a long-standing obstacle has been the difficulty in setting up the Makefile for a new project. The starter kit provides a band-aid here.

If required, it should be possible to set up alternate templates for different styles of project layouts. Pull requests on the starter-kit repository are welcome!

 

 

Introduction to Protege and OWL for the Planteome project

As a part of the Planteome project, we develop common reference ontologies and applications for plant biology.


As an initial phase of this work, we are transitioning from editing standalone ontologies in OBO-Edit to integrated ontologies using increased OWL axiomatization and reasoning. In November we held a workshop that brought together plant trait experts from across the world, and developed a plan for integrating multiple species-specific ontologies with a reference trait ontology.

As part of the workshop, we took a tour of some of the fundamentals of OWL, hybrid obo/owl editing using Protege 5, and using reasoners and template-based systems to automate large portions of ontology development.

I based the material on an earlier tutorial prepared for the Gene Ontology editors. It is available on the Planteome GitHub repository at:

https://github.com/Planteome/protege-tutorial

A lightweight ontology registry system

For a number of years, I have been one of the maintainers of the registry that underpins the list of ontologies at the Open Biological Ontologies Foundry/Library (http://obofoundry.org). I also built some of the infrastructure that creates nightly builds of each ontology, verifying it and providing versions in both obo format and owl.

The original system grew organically and was driven by an ultra-simple file called “ontologies.txt”, stored on google code. This grew to be supplemented by a collection of other files for maintaining the list of issue trackers, together with additional metadata to maintain the central OBO builds. The imminent demise of google code and the general creakiness and inflexibility of the old system have prompted the search for a new solution. I wanted something that would make it much easier for ontology providers to update their information, but at the same time allow the central OBO group the ability to vet and correct entries. We needed something more sophisticated than a flat key-value list, yet not overly complex. We also wanted something compatible with semantic web standards (i.e. to have an RDF file with a description of every ontology in it, using standard vocabularies and ontologies for the properties and classes). We also wanted it to look a bit nicer than the old site, which was looking decidedly 2000-and-late.


The legacy OBOFoundry site, looking dated and missing key information

What are some of the options here?

  • A centralized wiki, with a page for each ontology, and groups updating their entry on the wiki
  • Each group embeds the metadata about the ontology in a website they maintain. This is then periodically harvested by the central registry. Options for embedding the metadata include microdata and RDFa
  • Each group maintains their own metadata in or alongside their ontology in rdf/owl, and this is periodically harvested
  • Piggy back off of an existing registry, e.g. BioPortal
  • A bespoke registry system, designed from the ground up, with its own relational database underpinning it, etc

These are all good solutions in the appropriate context, but none fitted our requirements precisely. Wikis are best for unstructured or loosely structured narrative text, but attempts to embed structured information inside wikis have been less than satisfactory. The microdata/RDFa approach is interesting, but not practical for us. Microdata is inherently limited in terms of extensibility, and RDFa is complex for many users. Additionally it requires both that groups produce their own web sites (many rely on the OBO Foundry to do this for them), and that we both harvest the metadata and relinquish control. As mentioned previously, it is useful for the OBO repository administrators to have certain fields be filled in centrally (sometimes for policy reasons, sometimes technical). The same concerns underpin the fully decentralized approach, in which every group maintains the metadata directly as part of the ontology, and we harvest this.

Existing registries are built for their own requirements. A bespoke registry system is attractive in many ways, as this can be highly customized, but this can be expensive and we lacked the resources for this.

Solution: GitHub pages and “YAML-LD”

I initially prototyped a solution making use of the GitHub pages framework, driven by YAML files. This can be considered a kind of bespoke system, contradicting what I said above. But rather than roll the entire framework, the system is really just some templates gluing together some existing systems. GitHub’s support for social coding and YAML helped a lot. The system was very quick to develop and it soon morphed into the actual system to replace the old OBO site.

YAML

YAML is a data serialization language that superficially resembles the tag-value stanza format we were previously using, but crucially allows for nesting. Here is an example of a snippet of YAML for a cephalopod ontology:

id: ceph
title: Cephalopod Ontology
contact:
  email: cjmungall@lbl.gov
  label: Chris Mungall
description: An anatomical and developmental ontology for cephalopods
taxon:
  id: NCBITaxon:6605
  label: Cephalopod

Note that certain tags have ‘objects’ as their fields, e.g. contact and taxon.

We stick to the subset of YAML that can be represented in JSON, and we can thus define a JSON-LD context, allowing for a direct translation to RDF, which is nice. This part is still being finalized, but the basic idea is that keys like ‘title’ will be mapped to dc:title, and the taxon CURIE will be expanded to the full PURL for cephalopoda.
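To make the mapping idea concrete, here is a minimal sketch in Python. It is not the actual OBO context (which, as noted, is still being finalized): the `CONTEXT` table, the subject URI and the helper names are assumptions made for illustration. It shows a key such as ‘title’ being mapped to a Dublin Core property, and an OBO-style CURIE being expanded to a PURL.

```python
# Hypothetical context: this key-to-property mapping is an assumption for
# illustration, not the finalized OBO JSON-LD context.
CONTEXT = {
    "title": "http://purl.org/dc/elements/1.1/title",
    "description": "http://purl.org/dc/elements/1.1/description",
}
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def expand_curie(curie):
    """Expand an OBO-style CURIE such as NCBITaxon:6605 to a PURL."""
    prefix, local_id = curie.split(":", 1)
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


def to_triples(subject, record):
    """Yield (subject, predicate, object) tuples for the keys in CONTEXT."""
    for key, value in record.items():
        if key in CONTEXT:
            yield (subject, CONTEXT[key], value)


# Part of the YAML snippet above, written as the equivalent JSON-like structure.
ceph = {
    "id": "ceph",
    "title": "Cephalopod Ontology",
    "taxon": {"id": "NCBITaxon:6605", "label": "Cephalopod"},
}

# The subject URI used here is also just an illustrative choice.
triples = list(to_triples(OBO_PURL_BASE + "ceph.owl", ceph))
taxon_purl = expand_curie(ceph["taxon"]["id"])
print(triples)
print(taxon_purl)
```

A real implementation would hand the context to a JSON-LD processor rather than hand-rolling the expansion; the point here is only that the nested YAML maps cleanly onto triples.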

The basic idea is to manage each ontology’s metadata as a separate YAML file in a GitHub repository. GitHub features nice builtin YAML rendering, and files can be edited via the GitHub web-interface, which is YAML-aware.

The list of metadata files is here. Note that these are markdown files (the .md stands for markdown, not metadata). YAML can actually be embedded in Markdown, so each file is a mini-webpage for the ontology with the metadata embedded right in there. This is in some ways similar to the microdata/RDFa approach but IMHO much more elegant.
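As a rough illustration of how YAML embedded in Markdown can be pulled apart, here is a small Python sketch. It assumes the Jekyll-style convention in which the YAML block sits at the top of the file between two lines containing only `---`; the function name and the example file are made up for this post.

```python
def split_front_matter(text):
    """Return (yaml_block, body) for a Markdown file with YAML front matter.

    Front matter is assumed to be delimited by lines containing only '---'.
    If no complete front matter is present, the YAML block is empty.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", text  # opening delimiter without a closing one


example = """---
id: ceph
title: Cephalopod Ontology
---
Free-form *Markdown* describing the ontology goes here.
"""

yaml_block, body = split_front_matter(example)
print(yaml_block)
print(body)
```

The YAML block can then be passed to any YAML parser, while the body is left for the Markdown renderer.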

GitHub Pages

Each markdown file is rendered attractively through the GitHub interface – for example, here is the md file for the environment ontology, rendered using the builtin GitHub md renderer. Note the yaml block contains structured data and the rest of the file can contain any mixture of markdown and HTML which is rendered on the page. We can do better than this using GitHub pages. Using a simple static site generator and templating system (Jekyll/liquid) we can render each page using our own CSS with our own format. For example here is ENVO again, but rendered using Jekyll. Note that we aren’t even running our own webserver here; this is all a service provided for us, in keeping with our desire to keep things lightweight and resource-light.


The entire system consists of a few HTML templates plus a single python script that derives an uber-metadata file that powers the central table (visible on the front page).

Distributed editing

Where the system really shines is the distributed and social editing model. All of this comes for free when hosted on GitHub (in theory GitLab or some other sites should work). Anyone can come along and fork the OBOFoundry.github.io github repository into their own userspace and make edits – they can even do this without leaving their web browser (see the Edit button on the bottom left of every OBO ontology page).

What’s to stop some vandal trashing the registry? Crucially, any edits made by a non-owner remain in their own fork until they issue a Pull Request. After that, someone from OBO will come along and either merge in the pull request, or close it (giving a reason why they did not merge of course). The version control system maintains a full audit trail of this, premature merges can be rolled back, etc.

The task of the OBO team is made easier thanks to Travis-CI, a Continuous Integration system integrated into GitHub. I configured the OBOFoundry github site with a Travis configuration file that instructs Travis to check every pushed commit using an automated test suite – this ensures that people editing their yaml files don’t make syntax errors, or omit crucial metadata.
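The kind of check such a test suite performs can be sketched in a few lines of Python. This is not the actual OBO test suite: the list of required keys and the function name below are assumptions chosen for illustration.

```python
# Hypothetical set of required keys; the real test suite's rules may differ.
REQUIRED_KEYS = ("id", "title", "contact")


def check_metadata(record):
    """Return a list of human-readable problems found in one metadata record."""
    problems = [f"missing required key: {key}"
                for key in REQUIRED_KEYS if key not in record]
    contact = record.get("contact")
    if contact is not None and "email" not in contact:
        problems.append("contact has no email")
    return problems


good = {"id": "ceph", "title": "Cephalopod Ontology",
        "contact": {"email": "cjmungall@lbl.gov", "label": "Chris Mungall"}}
bad = {"id": "ceph", "contact": {"label": "Chris Mungall"}}

print(check_metadata(good))
print(check_metadata(bad))
```

A CI job would run a check like this over every metadata file and fail the build if any list of problems is non-empty, so the pull request page shows the failure before anyone merges.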


Screenshot of GitHub pull request, showing a passed Travis check

I have previously written about the use of Continuous Integration in ontology development – although CI was developed primarily for software engineering products, it works surprisingly well for ontologies and for metadata. This is perhaps not surprising if we consider these engineered artefacts in the way software is.

The whole end-to-end process is documented in this FAQ entry on the site.

The system has been working extremely well and is popular among the ontology groups that contribute their expertise to OBO – before official launch of the new site, we had 31 closed pull requests. Whereas previously a member of the OBO team would have to coordinate with the ontology provider to enter the metadata (a time consuming process prone to errors and backlogs), now the provider has the ability to enter information themselves, with the benefit of validation from Travis and the OBO team.

Other features

The new site has many other improvements over the last one. It’s now possible to distinguish between the ontology sensu the umbrella entity vs individual ontology products or editions. For example, the various editions of Uberon (basic, core, composite metazoan) can each be individually registered and validated. There are also a growing number of properties that can be associated with the ontology, from a twitter handle to logos to custom browsers. Hopefully some of these features will be useful to the OBO community. Of course, the overall look could still be massively improved easily by someone with some web design chops (it’s very bland generic bootstrap at the moment). But this isn’t really the point of this post, which is more about the application of a certain set of technologies to allow a balance between centralization and distributed editing that suits the needs of the OBO Foundry. Leveraging existing services like GitHub pages, Travis and the GitHub fork-and-pull-request model allows us to get more mileage for less effort.

The future of metadata

The new OBO site was inspired in many ways by the system developed by my colleague Jorrit Poelen for the Global Biotic Interactions database (GloBI), in which simple JSON metadata files describing each interaction dataset are provided in individual GitHub repositories. A central system periodically harvests these into a large searchable index, where different datasets are integrated. This is not so different from common practice among software developers, who provide metadata for their project in the form of pom.xml files and package.json files (not out of their love of metadata, but more because this provides a useful service or is necessary for working in a wider ecosystem, and integrating with other software components). As James Malone points out, it makes far more sense to simply pull this existing metadata rather than force developers to register in a monolithic rigid centralized registry. If there are incentives  for providers of any kind of information artefacts (software, ontologies, datasets) to provide richer metadata at source in large already-existing open repositories such as GitHub then it does away with the need to build separately funded large monolithic registries. The new OBO system and the GloBI approach are demonstrating some of these incentives for ontologies and datasets. The current OBO system still has a large centralized aspect, due in part to the nature of the OBO Foundry, but in future may become more distributed.


Chris Mungall

owljs – a javascript library for OWL hacking

owljs is a javascript library for doing stuff with OWL. It’s available from github:

https://github.com/cmungall/owljs

Whilst it attempts to follow CommonJS, you currently have to use RingoJS (a Rhino engine) as it makes use of JVM calls to the OWLAPI.


 

Why javascript?

Why javascript you may ask? Isn’t that a hacky language run in browsers? In fact javascript is increasingly used on the server side as well as in browsers, as can be seen in the success of node.js. With Java 8 incorporating Nashorn as a scripting engine, it looks like javascript on the server side is here to stay.

Why not just use java? Java can be very verbose and is not the ideal language for writing short ontology processing scripts, especially with the OWL API.

There are a number of other languages better suited to scripting and declarative programming in general, many of which run on the JVM. These include:

  • Groovy – a popular choice for interfacing with the OWL API
  • The Armed Bear flavor of Common Lisp, as used in LSW2.
  • Clojure, a variant of lisp, as used in Phil Lord’s powerful Tawny-OWL framework.
  • Scala, a superbly elegant functional programming language used to great effect in Jim Balhoff’s beautifully elegant scowl.
  • Jython – a popular choice for interfacing with the Brain. And of course, Python is the de facto language for bioinformatics these days.

There are also offerings in non-JVM languages such as my own posh – in addition most languages provide some kind of RDF library, but this can often be low level for working in OWL.

I decided to write a javascript library for a number of reasons. Our group already produces a lot of javascript code, most of which can be run on the server. For example, the golr libraries used in the AmiGO 2 codebase are CommonJS, as are those used for the Monarch API. These APIs all access ontologies through services (and can thus be run on a non-JVM javascript engine like node), and we would not make these APIs depend on a JVM. However, the ability to go the other way is useful: within a powerful ontology processing environment that offers all the features of the OWL API, we can access all kinds of bioinformatics data through ready-made javascript APIs.

Another reason is that JSON is ubiquitous, and having your data format be a subset of the language has some major advantages.

Plus, after an initial period of ambivalence, I have grown to really like javascript. It’s just functional enough to do some cool things.

What can you do with it?

I hope to provide some specific examples later on this blog. For now, take a look at the docs on github. Major features are:

Stay tuned for more information!

 

 

 

 

The perils of managing OWL in a version control system

Background

Version Control Systems (VCSs) are commonly used for the management
and deployment of biological ontologies. This has many advantages,
just as is the case for software development. Standard VCS
environments and hosting solutions like github provide a wealth of
features including easy access to historic versions, branching, forking, diffs, annotation of changes, etc.

VCSs also integrate well with Continuous Integration systems.
For example, a CI system can be configured to run a series of checks and even publish, triggered by a git commit/push.

OBO Format was designed with VCSs in mind. One of the main guiding
principles was that ontologies should be diffable. In order to
guarantee this, the OBO format specifies a recommended tag ordering
ensuring that serialization of an ontology into a file is
deterministic. OBO format was also designed such that ascii-level
diffs were as human readable as possible.

OBO Format is a deprecated format – I recommend groups switch to using
one of the W3C concrete forms of OWL. However, this comes with one
caveat – if the source (editors) version of an ontology is switched
from obo to any other OWL serialization, then human-readable diffs are
lost. Additionally, the non-deterministic serialization of the
ontology results in spurious diffs that not only hamper
human-readability, but also cause bottlenecks in VCS. As an example,
releasing a version of the Uberon ontology can consume over an hour
simply performing SVN operations.

The issue of human-readability is being addressed by a working group
to extend Manchester Syntax (email me for further details). Here I
focus not on readability of diffs, but on the size of diffs, as this
is an important aspect of managing an ontology in a VCS.

Methods

I measured the “diffability” of different OWL formats by taking a
mid-size ontology incorporating a wide range of OWL constructs
(Uberon) and measuring the size of diffs between two ontology
versions in relation to the change in the number of axioms.

Starting with the 2014-03-28 release of Uberon, I iteratively removed
axioms from the ontology, saved the ontology, and measured the size of
the diff. The diff size was simply the number of lines output by
the unix diff command, counted with “wc -l”.

This was done for the following OWL formats: obo, functional
notation (ofn), rdf/xml (owl), turtle (ttl) and Manchester notation
(omn). The number of axioms removed was 1, 2, 4, 8, .. up to
2^16. This was repeated ten times.
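The measurement loop can be sketched as follows. This is not the code used for the experiment, which ran the unix diff command over real serializations of Uberon: as a stand-in it uses Python’s difflib on a toy one-axiom-per-line serialization, and counts only added and removed lines. The function names and the toy ontology are assumptions for illustration.

```python
import difflib
import random


def diff_size(before, after):
    """Count added/removed lines between two serializations (lists of lines)."""
    changed = 0
    for line in difflib.unified_diff(before, after, lineterm="", n=0):
        # Skip file headers and hunk markers. This simple filter assumes no
        # content line itself begins with '--' or '++'.
        if line.startswith(("+++", "---", "@@")):
            continue
        changed += 1
    return changed


def remove_axioms(lines, n, rng):
    """Return a copy of `lines` with n randomly chosen entries removed."""
    doomed = set(rng.sample(range(len(lines)), n))
    return [line for i, line in enumerate(lines) if i not in doomed]


rng = random.Random(0)
# Toy stand-in for a deterministic, one-axiom-per-line serialization.
ontology = [f"SubClassOf(:C{i} :C{i // 2})" for i in range(1, 1025)]

sizes = {}
for n in (1, 2, 4, 8, 16):
    sizes[n] = diff_size(ontology, remove_axioms(ontology, n, rng))
print(sizes)
```

With a serialization that preserves order, the diff size tracks the number of axioms removed, which is the behaviour one hopes for in a source format.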

The OWL API v3 version 0.2.1-SNAPSHOT was used for all serializations,
except for OBO format, which was performed using the 2013-03-28
version of oboformat.jar. OWLTools was used as the command line
wrapper.

Results

The results can be downloaded HERE, and are plotted in the following
figure.

 

Plot showing size of diffs in relation to number of axioms added/removed


As can be seen there is a marked difference between the two RDF
formats (RDF/XML and Turtle) and the dedicated OWL serializations
(Manchester and Functional), which have roughly similar diffability to
OBO format.

In fact the diff size for RDF formats is both constant and large
regardless of the number of axioms changed. This appears to be due to
non-determinism when serializing axiom annotations.

This analysis only considers a single ontology, and a single version of the OWL API.

Discussion and Conclusions

Based on these results, it would appear to be a huge mistake to ever
manage an RDF serialization of OWL in a VCS. Using Manchester or
Functional gives superior diffability, with the size of the diff
proportional to the number of axioms changed. OBO format offers human
readability of diffs as well, but this format is limited in
expressivity.

These recommendations are consistent with the size of the file in each format.

The following numbers are for Uberon:

  • obo 11M
  • omn 28M
  • ofn 37M
  • owl 53M
  • ttl 58M

However, one issue here is that RDF-level tools may not accept a
dedicated OWL serialization such as ofn or omn. Most RDF libraries
will, however, accept RDF/XML or Turtle.

The ontology manager is then faced with a quandary – cut themselves
off from a segment of the semantic web and have diffs that are
manageable (if not readable) or live with enormous spurious diffs for
the benefits of SW integration.

The best solution would appear to be to manage source versions in a
diffable format, and release in a more voluminous RDF/semweb
format. This is not so different from software management – the users
consume a compiled version of the software (jars, object files, etc)
and the software is maintained as diffable source. It’s generally
considered bad practice to check in derived products into a VCS.

However, this answer is not really satisfactory to maintainers of
ontologies, who lack tools as mature as those in the software
realm. We do not yet have the equivalent of Maven, CPAN, NPM, Debian,
etc for ontologies*. Modern ontologies have dependencies managed using
OWL imports that do not mesh well with simple repositories like
Bioportal that treat each ontology as a monolithic unit.

The approach I would recommend is therefore to adapt the RDF/XML
generator of the OWL API such that it is deterministic, or to write an
RDF roundtripper that always produces a deterministic
serialization. This should be coupled with ongoing efforts to add
human-readable class labels as comments to enhance readability of diffs.
Ideally the recommended deterministic serialization order would be formally
specified, such that different software (and different versions of the same
software) could adhere to it.
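The core of the idea fits in a few lines of Python. This is only a toy sketch, not a proposal for the actual ordering or a patch to the OWL API: it treats each axiom as a string and uses plain lexical sorting as the (assumed) canonical order.

```python
import random


def serialize(axioms, deterministic=True):
    """Serialize axioms one per line, sorted when `deterministic` is set."""
    ordered = sorted(axioms) if deterministic else list(axioms)
    return "\n".join(ordered) + "\n"


axioms = [f"SubClassOf(:C{i} :C{i // 2})" for i in range(1, 101)]
shuffled = axioms[:]
random.Random(42).shuffle(shuffled)  # simulate arbitrary in-memory ordering

same_when_sorted = serialize(axioms) == serialize(shuffled)
same_when_unsorted = (serialize(axioms, deterministic=False)
                      == serialize(shuffled, deterministic=False))
print(same_when_sorted, same_when_unsorted)
```

Once the output order depends only on the content, two saves of the same ontology are byte-identical, and a diff shows only the axioms that really changed. A real RDF serializer also has to deal with blank nodes and nested structures, which is where most of the difficulty lies.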

At the same time, we need to be working on analogs of maven and
package management systems in the ontology world.

 

Footnote:

Some ongoing efforts to mavenize ontologies:

Updates:

 

 

 

 

 

Creating an ontology project

UPDATE: see the latest post on this subject.

 

This article describes how to manage all the various files created as part of an ontology project. This assumes some familiarity with the unix command line. The article does not describe how to create the actual content for your ontology. For this I recommend the material from the NESCENT course on building anatomy ontologies, put together by Melissa Haendel and Matt Yoder, as well as the OBO Foundry tutorial from ICBO 2013.

Some of the material here overlaps with the page on making a google code project for an ontology.

Let’s make a jelly project

For our purposes here, let’s say you’re a Cnidaria biologist and you’re all ready to start building an ontology that describes the anatomy and traits of this phylum. How do you go about doing this?

Portuguese man-of-war (Physalia physalis)


Well, you could just fire up Protege and start building the thing, keeping the owl file on your desktop, periodically copying the file to a website somewhere. But this isn’t ideal. How will you track your edits? How will you manage releases? What about imports from other ontologies (you do intend to import parts of other ontologies, don’t you? If the answer is “no” go back and read the course material above!).

It’s much better to start off on the right foot, keeping all your files organized according to a common directory layout, and making use of simple and sensible practices from software engineering.

OORT

As part of your release process you’ll make use of OWLTools and OORT, which can be obtained from the OWLTools google code repository.

Make sure you have OWLTools-Oort/bin/ in your PATH.

The first thing to do is to create a directory on your machine for managing all your files – as an ontology developer you will be managing a collection of files, not just the core ontology file itself. We’ll call this directory the “ontology project”.

To make things easy, owltools comes with a handy script called create-ontology-project  to create a stub ontology project. This script is distributed with OWLTools but is available for download here:

http://owltools.googlecode.com/svn/trunk/OWLTools-Oort/bin/create-ontology-project

The first thing to do is select your ontology ID (namespace). This *must* be the same as the ID space you intend to use. So if your URIs/IDs are to be CNIDO_0000001 and so on, the ontology ID *must* be “cnido”. Note that whilst your IDs will be in SHOUTY CAPITALS, the ontology ID itself is all gentle lowercase, even the first letter. This is actually part of OBO Foundry ID policy.

Running the script

Now, type this on the command line:


create-ontology-project cnido

(you will need to add this to your path – it’s in the OWLTools-Oort/bin directory).

You will see the following output:


SUCCESS!
Directory Listing:
-----------------
cnido
cnido/doc
cnido/doc/README.txt
cnido/images
cnido/images/README.txt
cnido/LICENSE.txt
cnido/README.txt
cnido/src
cnido/src/ontology
cnido/src/ontology/catalog-v001.xml
cnido/src/ontology/CHANGES
cnido/src/ontology/diffs
cnido/src/ontology/diffs/Makefile
cnido/src/ontology/imports
cnido/src/ontology/imports/README.txt
cnido/src/ontology/cnido-edit.owl
cnido/src/ontology/cnido-idranges.owl
cnido/src/ontology/Makefile
cnido/tools
cnido/tools/README.txt

followed by:

What now?
* Create a git or svn project. E.g.:
cd cnido
git init
git add -A
git commit -m 'initial commit'
* Now visit github.com and create project cnido-ontology
* Edit ontology and create first release
cd cnido/src/ontology
make initial-build
* Create a jenkins job

What next?

You may not need all the stub files that are created, but it’s a good idea to have them there from the outset, as you may need them in future.

I recommend your first step is to follow the instructions above to (1) initiate a local git repository by typing the 4 commands above (2) publish this on github. You will need to go to github.com, create an account, create a project, and select “create a project from an existing repository“. (more on this later).

Once this is done, you can start modifying the stub files.

The top level README provides a high level overview of your project. You should describe the content and use cases here. You can edit this in a normal text editor — alternately, if you intend to use github (recommended) then you can wait until you commit and push this file and then edit it via the github web interface.

You will probably spend most of your time in the src/ontology directory, editing cnido-edit.owl.

id-ranges

If you intend to have multiple people editing, then the cnido-idranges.owl file will be essential. You can edit this directly in Protege (but it may actually be easier to edit the file in a text editor). Assign each editor an ID range here (just follow the existing file as an example). Note that currently Protege does not read this file, so this just serves as formal documentation.

In future, id-ranges may be superseded by urigen servers, but for now they provide a useful way of avoiding collisions.
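To illustrate how ranges avoid collisions, here is a toy Python sketch. The range table, editor names and function are all hypothetical; in a real project the ranges live in cnido-idranges.owl and each editor applies them by hand or through their editing tool’s settings.

```python
# Hypothetical ranges; in a real project these come from cnido-idranges.owl.
ID_RANGES = {
    "editor_a": (1, 9999),
    "editor_b": (10000, 19999),
}


def next_id(editor, used, prefix="CNIDO"):
    """Return the lowest unused ID in the editor's range, e.g. CNIDO:0000001."""
    low, high = ID_RANGES[editor]
    for number in range(low, high + 1):
        if number not in used:
            return f"{prefix}:{number:07d}"
    raise ValueError(f"ID range for {editor} is exhausted")


used_numbers = {1, 2, 10000}
print(next_id("editor_a", used_numbers))
print(next_id("editor_b", used_numbers))
```

Because the ranges do not overlap, two editors working offline can never mint the same ID, whatever order their changes are merged in.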

Documentation

If you use github or another hosting solution like google code, you can use their wiki system. You should keep any files associated with the documentation (word docs, presentations, etc) in the doc/ folder. You can link to them directly from the wiki.

Images

You can ask the OBO admins to help you set up a purl redirect such that URLs of the form

http://purl.obolibrary.org/obo/cnido/images/

will redirect to your images/ directory, which is where you will place any pictures of jellyfish or anemone body parts that you want to be associated with classes in the ontology (assuming you have the rights to do this). I recommend Jim Balhoff’s depictions plugin.

Imports

Managing your imports can be a difficult task and deserves its own article.

For now you can browse the ctenophore-ontology project to see an example of a setup, in particular:

This setup uses the OWLAPI for imports, but others prefer to make use of OntoFox.

Releases

You can use OORT to create your release files. The auto-generated Makefile stub should be sufficient to manage a basic release pipeline. In the src/ontology directory, type this:


make all

This should make both cnido.obo and cnido.owl – these will be the files the rest of the world sees. cnido-edit is primarily seen by you and your fellow cnidarian-obsessed editors.

Caveats

Depending on the specific needs of your project, some of the defaults and stubs provided by the create-ontology-project script may not be ideal for you. Or you may simply prefer to create the directory structure manually, which is not very hard and is of course fine. The script is provided primarily to help you get started; hopefully it will prove useful.

Finally, if you know any cnidarian biologists interested in contributing to an ontology, let me know as we are lacking detailed coverage in existing ontologies!

 

GO annotation origami: Folding and unfolding class expressions

With the introduction of Gene Association Format (GAF) v2, curators are no longer restricted to pre-composed GO terms – they can use a limited form of anonymous OWL Class Expressions of the form:

GO_Class AND (Rel_1 some V_1) AND (Rel_2 some V_2)

The set of relationships is specified in column 16 of the GAF file.

However, many tools are not capable of using class expressions – they discard the additional information leaving only the pre-composed GO_Class.

Using OWLTools it is possible to translate a GAF-v2 set of associations and an ontology O to an equivalent GAF-v1 set of associations plus an analysis ontology O-ext. The analysis ontology O-ext contains the set of anonymous class expressions folded into named classes, together with equivalence axioms, and pre-reasoned into a hierarchy using Elk.

See http://code.google.com/p/owltools/wiki/AnnotationExtensionFolding

For example, given a GO annotation of a gene ‘geneA’:

gene: geneA
annotation_class:  GO:0006915 ! apoptosis
annotation_extension: occurs_in(CL:0000700) ! dopaminergic neuron

The folding process will generate a class with a non-stable URI, automatic label and equivalence axiom:

Class: GO/TEMP_nnnn
  Annotations: label "apoptosis and occurs_in some dopaminergic neuron"
  EquivalentTo: 'apoptosis' and occurs_in some 'dopaminergic neuron'
  SubClassOf: 'neuron apoptosis'

This class will automatically be placed in the hierarchy using the reasoner (e.g. under ‘neuron apoptosis’). For the reasoning step to achieve optimal results, the go-plus-dev.owl version should be used (see new GO documentation). A variant of this step is to perform folding to find a more specific subclass than the one used for direct annotation.
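The string-level part of folding can be sketched in Python as below. This is not the OWLTools implementation (which works on OWL objects and then runs the reasoner); the function name and the way the temporary ID is numbered are assumptions for illustration. It only builds the named class and its equivalence axiom from an annotation and its extensions.

```python
def fold(go_label, extensions, temp_number):
    """Build a Manchester-syntax stanza for one annotation plus extensions.

    `extensions` is a list of (relation, filler_label) pairs taken from
    column 16. The TEMP numbering scheme here is a placeholder.
    """
    parts = [f"'{go_label}'"]
    parts += [f"{rel} some '{filler}'" for rel, filler in extensions]
    expression = " and ".join(parts)
    label = expression.replace("'", "")
    return "\n".join([
        f"Class: GO/TEMP_{temp_number:04d}",
        f'  Annotations: label "{label}"',
        f"  EquivalentTo: {expression}",
    ])


stanza = fold("apoptosis", [("occurs_in", "dopaminergic neuron")], 1)
print(stanza)
```

Placement in the hierarchy is then left entirely to the reasoner, which is why the quality of the equivalence axioms in the input ontology matters so much.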

The reverse operation – unfolding – is also possible. For optimal results, this relies on Equivalent Classes axioms declared in the ontology, so make sure to use the go-plus-dev.owl. Here an annotation to a pre-composed complex term (e.g. neuron apoptosis) is replaced by an annotation to a simpler GO term (e.g. apoptosis) with column 16 filled in (e.g. occurs_in(neuron)).

The folding operation allows legacy tools to take some advantage of GO annotation extensions by generating an ‘analysis ontology’ (care must be taken in how this is presented to the user, if at all). Ideally more tools will use OWL as the underlying ontology model and be able to handle c16 annotations directly, ultimately requiring less pre-coordination in the GO.