This is one post in a series of tips on ontology development; see the parent post for more details.
The main premise of this piece is that ontology developers can learn from the experience of software engineers. Ontologists are fond of deriving principles based on abstract concepts or philosophical traditions, whereas more engineering-oriented principles such as those found in software engineering have been neglected, which is to our detriment.
Figure: An appreciation of engineering practice and in particular software development principles is often overlooked by ontologists.
In its decades-long history, software development has matured, encompassing practices such as modular design, version control, design patterns, unit testing, continuous integration, and a variety of methodologies, from waterfall top-down design through to extreme and agile development. Many of these are more relevant to ontology development than you might think. Even if a particular practice is not directly applicable, knowledge of it can help; thinking like a software engineer can be useful. For example, most good software engineers have internalized the DRY principle (Don’t Repeat Yourself), and will internally curse themselves if they end up duplicating chunks of code or logic as expedient hacks. They know that they are accumulating technical debt (for example, necessitating parallel updates in multiple places). The DRY principle and the DRY way of thinking should also permeate ontology development. Similarly, software developers cultivate a sense of ‘code smell’, and will tell you if a piece of code has a ‘bad smell’ (naturally, this is always code written by someone else).
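As a rough illustration of what DRY thinking can look like for ontology content, here is a minimal Python sketch. It assumes a family of terms that all follow the same label and definition pattern; the `TermPattern` class, the `generate_terms` helper, and the example wording are all hypothetical and are not taken from any real ontology or tool. The point is only that the wording is written once and filled in, so a correction is made in one place instead of in every term.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TermPattern:
    """A reusable template for a family of similar terms (hypothetical)."""

    label: str       # template for the term label
    definition: str  # template for the text definition

    def fill(self, **slots: str) -> Dict[str, str]:
        """Return one concrete term by filling the template slots."""
        return {
            "label": self.label.format(**slots),
            "definition": self.definition.format(**slots),
        }


# The wording lives in exactly one place...
PART_DEVELOPMENT = TermPattern(
    label="{part} development",
    definition="The progression of the {part} over time, from formation to maturity.",
)


def generate_terms(pattern: TermPattern, parts: List[str]) -> List[Dict[str, str]]:
    """...and each term is generated from it rather than hand-copied."""
    return [pattern.fill(part=p) for p in parts]


terms = generate_terms(PART_DEVELOPMENT, ["heart", "lung", "kidney"])
for term in terms:
    print(term["label"], "-", term["definition"])
```

If the definition wording needs to change, only `PART_DEVELOPMENT` is edited and every generated term follows; the hand-copied alternative needs the same edit repeated per term, which is exactly the parallel-update debt described above.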
Don’t worry if you don’t have any experience programming, as the equivalent ontology engineering practices and intuitions can be learned through training, experience, the use of appropriate tools, and sharing experience with others. Unfortunately, tools are not yet as mature for ontology engineering as they are for software, but we are trying to address this with the Ontology Development Kit, an ongoing project to provide a framework for ordinary ontology developers to apply standard engineering principles and practice.
Computer scientists and software engineers are also fortunate in having a large body of literature covering their discipline in a holistic fashion; this includes classics such as The Mythical Man-Month, the “Gang of Four” Design Patterns book, Martin Fowler’s blog and his book Refactoring. While there are good textbooks on ontologies, these tend to be less engineering-focused, at best analogs of the (excellent, but sometimes theoretical) Structure and Interpretation of Computer Programs. Exceptions include the excellent, practical engineering-oriented ontogenesis blog.
An incomplete list of transferrable software concepts, principles, and practices includes:
- Version control: use GitHub or equivalent (why? see the top answer here, just replace “code” with “ontology”)
- Use continuous integration (see here), social coding, non-private issue trackers
- Commit early, commit often
- Read about design patterns, and anti-patterns. Recognize and act on bad smells
- Avoid spaghetti logic
- Separate source from compiled product (see ROBOT workflows).
- Be use case and requirements driven. Usability of the ontology is key. Your goal is not to make a perfect ontology to be marveled at by ontologists, but to build something that is usable by people in your domain.
- At the same time, build for extensibility. Understand the concept of technical debt.
- Test, test, test
- Learn debugging tools [see future blog post]
- Provide documentation for users
- Provide documentation for ontology developers (and your future self, you will thank your past self); especially your design patterns and design decisions. Inline documentation is your friend
- Think modular; reuse components from other ontologies, or from modules within your own ontology
- DRY: Don’t Repeat Yourself
- While good engineering is essential, the so-called soft skills are actually harder. Learn to acquire these.
- Ask for help early and often. And work to support inclusivity and a culture where people feel safe to ask questions.
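To make the “Test, test, test” item in the list above a little more concrete, here is a minimal Python sketch of the kind of automated checks that could run on every commit. It assumes the ontology has already been loaded into a plain dictionary; the `ONTOLOGY` data, the `EX:` identifiers, and the `check_ontology` function are hypothetical stand-ins, not the API of any real tool.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical toy ontology: term ID -> metadata. Real data would be
# loaded from the ontology source file instead.
ONTOLOGY: Dict[str, dict] = {
    "EX:0000001": {"label": "organ", "definition": "An anatomical structure.", "parents": []},
    "EX:0000002": {"label": "heart", "definition": "", "parents": ["EX:0000001"]},
    "EX:0000003": {"label": "organ", "definition": "A second term with the same label.", "parents": ["EX:0000009"]},
}


def check_ontology(ontology: Dict[str, dict]) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems: List[str] = []
    label_counts = Counter(term["label"] for term in ontology.values())
    for term_id, term in sorted(ontology.items()):
        label = term["label"]
        # Check 1: every term has a non-empty text definition.
        if not term["definition"].strip():
            problems.append(f"{term_id}: missing definition")
        # Check 2: no two terms share the same label.
        if label_counts[label] > 1:
            problems.append(f"{term_id}: duplicate label '{label}'")
        # Check 3: every parent refers to a term that actually exists.
        for parent in term["parents"]:
            if parent not in ontology:
                problems.append(f"{term_id}: dangling parent {parent}")
    return problems


problems = check_ontology(ONTOLOGY)
for problem in problems:
    print(problem)
```

In a continuous integration setting, a script like this would typically exit with a non-zero status whenever `problems` is non-empty, so that a failing check blocks the change before it reaches a release.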
I hold that all of the above are either directly transferrable or have strong analogies with ontology development. I hope to expand on many of these on this blog and other forums, and encourage others to do so.
Also, I can’t emphasize strongly enough that I’m not saying that engineering principles are more important than other attributes such as an understanding of a domain. Obviously an ontology constructed with either insufficient knowledge of the domain or inattention to users of the ontology will be rubbish. My point is just that a little time spent honing the skills and sensibilities described here can potentially go a very long way to improving sustainability and maintenance of ontologies.
This is a great collection Chris, super job! I have just one question about your list: “Separate source from compiled product (see ROBOT workflows).” The idea of a ‘compiled product’ is a little tricky to relate to the ontology development life cycle. Is it the ontology that gets produced by ROBOT workflow, or the inferred triples that get produced by an inference engine, or something else?
Looking forward to the coming posts!
Very important point. I would also add to the list, or at least think about: static code analysis, test coverage analysis
Good post! There are two that I would question. The first is “modular”: clearly, yes, but the notion of modularity, or at least the OWL support for it, is much weaker than the modularity that we have in programming languages. In OWL it just means “stick your stuff in different files”.
The other one is DRY. This was a mantra in software development for a long time, but I think it has weakened in the last few years, following things like the “leftpad” incident. In software development, I would now say: balance the risk of repeating yourself against the risk of a tangled dependency graph. Of course, I will be the first to admit that BTRRYARTDG is less snappy than DRY, but you can see the point, I am sure.
Thanks! I think all could be questioned; it’s more about building up a shared vocabulary so we can discuss these things clearly and constructively as a community. I will have a lot to say about modularity in future posts…
The leftpad/DRY example is great. I have seen a few leftpad-analog examples in OBO, where someone imports a single class X from ontology O, when said ontology O was abandonware, and X was completely out of scope for O anyway. Furthermore, importing the O module had the side-effect of injecting a bunch of poisonous axioms. Here the ontology developer would have been better off rolling their own X.
In this case the only side-effect was an annoying delay to the release of the ontology while an incoherency arising from the import chain was debugged. But when I look at some dependency chains I worry we’re leaving ourselves open to leftpad-type incidents in the future.
Commentary on leftpad: https://www.davidhaney.io/npm-left-pad-have-we-forgotten-how-to-program/
Of course, the analogy isn’t perfect: for ontologies, reusing IDs avoids any ID mapping, which users will thank you for, and which doesn’t really have a coding analogy AFAIK.
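One way to get a feel for the dependency chains discussed above is to compute everything that a single import transitively pulls in. The following is a minimal Python sketch under stated assumptions: the `IMPORTS` graph and the ontology names in it are made up for illustration, and `import_closure` is a hypothetical helper, not part of any existing ontology tooling.

```python
from typing import Dict, List, Set

# Hypothetical import graph: each ontology maps to the ontologies it
# imports directly. Importing one class from O still drags in O's imports.
IMPORTS: Dict[str, List[str]] = {
    "MY_ONTOLOGY": ["O"],
    "O": ["P", "Q"],
    "P": ["Q"],
    "Q": [],
}


def import_closure(root: str, imports: Dict[str, List[str]]) -> Set[str]:
    """Return every ontology reachable from root via direct or indirect imports."""
    seen: Set[str] = set()
    stack = list(imports.get(root, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; this also guards against import cycles
        seen.add(current)
        stack.extend(imports.get(current, []))
    return seen


closure = import_closure("MY_ONTOLOGY", IMPORTS)
print(sorted(closure))
```

Here a single import of O brings P and Q along with it, so an unmaintained or incoherent ontology anywhere in that set can affect the importing ontology.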