Ontology driven site for exploring fruit fly brains and their neurons:
‘Bitzer,’ said Thomas Gradgrind. ‘Your definition of a horse.’ ‘Quadruped. Graminivorous. Forty teeth, namely twenty-four grinders, four eye-teeth, and twelve incisive. Sheds coat in the spring; in marshy countries, sheds hoofs, too. Hoofs hard, but requiring to be shod with …’
This post is written largely as a response to Phil Lord’s thoughtful response to my previous post on realism, and to a marathon pair of threads on the OBO-discuss e-mail list on realism and ontology building (taster here). I thought the initial discussion in those threads was useful and pointed the way towards some progress and compromise. Unfortunately, and perhaps inevitably, by the end people’s positions appeared more polarized than ever. I’m not sure this is justified.
Anyway, back to my response to Phil. I’ll try to deal first with the philosophy (or my probably naive take on it), and only after that with how that might relate to practical issues of ontology building.
“Most scientists believe in reality so, when faced with realism vs conceptualism, their gut feeling is that the former will be right. They believe in a mind-independent reality so, therefore, conceptualism must be wrong.”
Not only do I believe in mind-independent reality, I believe that science makes claims about mind-independent reality that it is reasonable to believe are true. In my experience, most scientists (certainly most biologists) believe this too. In making claims about mind-independent reality, science makes claims about the regularity of the universe. When a chemist states some conclusion about a reaction involving benzene molecules, she is referring to a class of things with shared properties (something like: 6 carbon atoms arranged in a hexagon around which electrons from the carbon atoms are delocalized, resulting in C-C bonds of even length and strength… (I’m no chemist)) many instances of which exist. I have no problem with someone calling such classes universals as a way of distinguishing them from arbitrary or contingent collections of objects. I, perhaps naively, believe this to be a form of realist position. And it’s a position I hold irrespective of its application to ontology building (of which more later). My views on this largely come from two years I spent arguing with postmodernists who taught a master’s degree in Science Communication I took at Imperial College.
It strikes me that what Phil calls realism seems to be much more specific than this – he at least sometimes gives the impression that a realist position involves accepting the BFO + whatever Barry Smith proposes. My knowledge of philosophy is quite limited, but I’ve read enough to know that this is unwarranted – it seems to stem largely from ongoing arguments about the nature of the OBO Foundry.
I’m also not convinced by Phil’s arguments about mathematical abstraction. Mathematical abstraction is clearly a critical part of science (or at least much of it), and I’m convinced that the current OBO Foundry / BFO approach is fundamentally flawed in its obvious inability to cope with it. But I also think one should be careful about reducing scientific claims to mathematical abstractions alone. Phil quotes Feynman:
The next question was – what makes planets go around the sun? At the time of Kepler some people answered this problem by saying that there were angels behind them beating their wings and pushing the planets around an orbit. As you will see, the answer is not very far from the truth. The only difference is that the angels sit in a different direction and their wings push inward.
Character of Physical Law
— Richard Feynman
And goes on to say: “The statement that g ∝ 1/r² is the same as F_wings-of-angels ∝ 1/r². So long as we agree that the angels behave in a precise, predictable way, there is no deep reason to distinguish between the two, except for simple pragmatism: “gravity” is shorter and easier to say than “the wings of angels”.”
The obvious response is: even if Kepler had got the maths right, is it really irrelevant to our acceptance or rejection of his theory whether he believed that force was exerted on the planets by creatures with wings sprouting from their backs? Even in the most mathematically abstracted areas of science, we can’t completely purge ontological claims.
Why should this be of any relevance to ontology building?
Well, first of all, this position is a useful counterbalance to those (depressingly common) who confuse ontology building with building dictionaries or thesauri. If I tried to take into account every usage of a particular word by biologists, I don’t believe I could build a logically consistent structure, or one that is at all useful for reasoning. It is also a counterbalance to those (once depressingly common, now less so) who think that the job of ontology building involves simply asserting is_a and part_of hierarchies of undefined terms without worrying much about logic. I’ll concede that a conceptualist approach may also provide a counterbalance to these approaches.
Secondly, and more importantly, I care about whether it is reasonable to believe, on the basis of the scientific evidence we have, that instances of the classes I define exist. I care about this because
(a) One major aim of my work is to provide a reference for wild-type anatomy. If I mix in classes of structure that have no instances, my ontology ceases to be a reliable reference.
(b) I see no reason to expect logical consistency if classes lacking instances are allowed.
– I hold the assumption that the real world that our scientific theories make statements about does not contradict itself
I should qualify this by saying that, realistically, only making terms for classes that we have good reason to believe have (or have had) instances is an aspiration. I’m sure I fall short all the time, but I still think it a worthwhile aim. I also have no objection to people building ontologies that include classes we have good reason to believe do not have instances, as long as there is some clear way to label them so that those of us who don’t want to work with these can filter them out.
Now, talk of instances means, as far as I can tell, that reality (or claims about it) is at least sometimes of importance to conceptualists. I have to admit, I’m still puzzled by how one can claim to be a conceptualist and talk of the existence or not of instances, but the fact that they do points to potential compromises:
Can we find ways to mark classes to make this distinction clear? Something along the lines of:
– Class we have good reason to believe has no instances [REFS?]
– Class believed on theoretical grounds alone to have instances [REFS]
– Class for which there is experimental evidence for the existence of instances – evidence summary [REFS]
With these distinctions made, I can choose to only import terms of the third class into the ontologies I build (even then I may set criteria for accepting terms based on the quality of the evidence).
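The filtering described above can be sketched in a few lines of Python. This is a hypothetical illustration – the status labels, class records and `importable` function are my own invention, not part of any real ontology format or tool:

```python
# Hypothetical sketch: tagging ontology classes with an instance-evidence
# status and filtering imports on it. All names here are illustrative.

from dataclasses import dataclass, field

# Three status levels, mirroring the three distinctions in the post.
NO_INSTANCES = "no_instances"            # good reason to believe no instances exist
THEORETICAL = "theoretical_instances"    # instances predicted on theory alone
EXPERIMENTAL = "experimental_instances"  # experimental evidence for instances

@dataclass
class OntologyClass:
    term_id: str
    label: str
    instance_status: str
    refs: list = field(default_factory=list)  # supporting literature

def importable(cls, accepted=(EXPERIMENTAL,)):
    """Keep only classes whose instance status we are willing to accept."""
    return cls.instance_status in accepted

classes = [
    OntologyClass("X:1", "benzene molecule", EXPERIMENTAL),
    OntologyClass("X:2", "Higgs boson", THEORETICAL),
    OntologyClass("X:3", "phlogiston", NO_INSTANCES),
]

imported = [c for c in classes if importable(c)]
# Only 'benzene molecule' survives the default filter; passing
# accepted=(THEORETICAL, EXPERIMENTAL) would also admit the Higgs boson.
```

The point of the sketch is only that, once the evidence status is recorded as data, the choice of how strict to be moves from the ontology builder to the consumer.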
Finally, I see talk of universals as useful only in *guiding* what I do – making me wary of classes that appear arbitrary, contingent, cobbled together for convenience. I don’t see the need to get hung up on drawing a clean line between universals and other classes – in my experience, this is neither practical nor useful.
None of this means I reject engineering considerations in ontology building. In fact, they’re essential to what I do. I lose sleep over whether the ontologies I build are maintainable and scalable – in part because I know the pain of untangling a poorly constructed ontology that has been used for massive numbers of annotations, but also because I’d like to think I’m building a structure others could build on in the future. It is also essential that the ontologies I build can be used with a reasoner to answer particular types of query, and to do so as rapidly as possible. Such queries are running live on our test site. Again, engineering considerations are essential to this.
A major and seemingly endless debate (feud?) within the ontology world is between those who describe themselves as realists, and those who I’ll call conceptualists (I’m not sure many would use this label, but the people I have in mind are united by a love of the word ‘concept’).
Ever since I encountered this argument, I’ve thought of myself as taking a realist approach to ontology building (at least for ontologies that cover scientific knowledge). My reasons for this haven’t really changed much in the face of the various arguments I’ve encountered:
Science makes claims about the world and assumes that world to be consistent. We are trying to build logically consistent structures that are useful for making true inferences – making yet more claims about the world that logically follow from the claims made by science. The results of those inferences will be judged by how they match reality. An inference that is demonstrably false indicates a problem with the initial assertions (or with the inference mechanism). All of this gives us a clear mechanism for judging the quality of an ontology: according to current scientific understanding, are the assertions and inferences of an ontology true?
It seems reasonable to me to claim that, if I build an anatomy ontology that can answer the question “What bones are part of a normal human arm?”, I’m recording assertions about actual (real) anatomy. The ontology is asserting that, if you were to dissect a normal* human arm, you’d find a set of bones with some specific set of characteristics defined in the ontology. To put it another way, an ontology that can answer this question is recording some regularity in the world. It is a statement about the existence of some class of structures. (* To be fair, the term ‘normal’ here is a good place to start challenging the claims I’m making.)
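The kind of query I have in mind can be sketched very simply: asserted part_of relations plus transitivity answer the arm-bones question. This is a toy illustration with a deliberately tiny, simplified anatomy, not a fragment of any real ontology:

```python
# Minimal sketch: answering "what bones are part of the arm?" from
# asserted part_of relations, using the transitivity of part_of.
# The anatomy here is a toy fragment, simplified for illustration.

part_of = {
    "humerus": "upper arm",
    "radius": "forearm",
    "ulna": "forearm",
    "upper arm": "arm",
    "forearm": "arm",
}
is_bone = {"humerus", "radius", "ulna"}

def all_wholes(structure):
    """Everything the structure is (transitively) part_of."""
    wholes = set()
    while structure in part_of:
        structure = part_of[structure]
        wholes.add(structure)
    return wholes

def parts_of(whole, kind):
    """All structures of a given kind that are (transitively) part of the whole."""
    return sorted(s for s in kind if whole in all_wholes(s))

# parts_of("arm", is_bone) -> ['humerus', 'radius', 'ulna']
```

The realist point is that each answer the query returns is checkable against a dissection: a false answer indicates a wrong assertion (or a broken inference mechanism), which is exactly the quality-control loop described above.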
The assertions made in an ontology may, of course, be wrong. But it would be rather silly to claim that infallibility is required in order to follow a realist approach. Only caricature realists (Mr Gradgrind?) believe science to be infallible. To say that we have reason to believe the claims science makes about reality because they are based on evidence is not to say that science’s claims are infallible. Scientists spend their lives judging whether particular claims are reasonable to believe given the evidence. In many cases, the evidence is so overwhelming that we might go further and say that the claims in question are beyond reasonable doubt.
But, who can discount new evidence coming along at some point that overturns such theories? Well, as a realist I think that if a theory is convincingly overturned, we should amend our ontologies accordingly. What would a non-realist do? Should a conceptualist’s ontology of chemistry include phlogiston?
In the meantime, I think we should just record evidence for any assertion we make that might be controversial and link our assertions to the literature that supports them. I see no reason not to extend this to widely believed theoretical predictions. One could make a term for the class Higgs boson and populate it with predicted properties. I don’t see this as anti-realist: the prediction is built on much indirect evidence as well as theory. Experiments at the LHC should soon tell us whether to delete the term, or to associate more evidence and literature with it.
A realist approach distinguishes what I do clearly from those who think the job of ontologies is to model the language that scientists use. Even within small expert communities of scientists, very many terms are not used consistently. Partly this is because scientists learn many words in the same way as the rest of us – not by memorising some definition but by following the example of others. When more precision is needed for an argument, it is usually specified by the context in which a term is used. But ontologies and ontology annotation rip words from their context. While we use words from scientific discourse and try to reflect their meaning as closely as possible, in the end the meaning of an ontology term comes from its definition. And to make ontologies that are useful for reasoning, this definition is almost inevitably tighter than the various meanings of general usage.
So why are so many intelligent people hostile to a realist approach?
One possible reason is a failure of nerve. Many people become quite nervous at talk of truth and reality. They want to build their epistemic uncertainty into their ontologies. Why anyone should consider this a practical approach has never been clear to me.
Perhaps as important as any direct reason are the various things associated with the approaches of some prominent figures in the realist camp.
Perhaps foremost among these is an insistence by certain realists on the existence and importance of Universals. Some would say that only Universals should be allowed in an ontology. But what’s a universal? It could be an Aristotelian natural type. (Anyone who knows some history of biology should be nervous of this). But I see the term simply as a way of talking about the regularities in the world that are both the objects of scientific discourse and the basis of scientific generalizations. This excludes completely arbitrary groupings. A class that includes only laptops and empty coffee cups is unlikely to be the subject of scientific generalizations. The class of benzene molecules is. However, the grey area between these two extremes is large. I suspect that trying to pre-judge some crisp boundary between the two is rather a waste of time and could all too easily end up excluding classes scientists do find it useful to generalize about.
Finally, some realists have an attachment to particular upper ontologies that a number of conceptualists object to – perhaps with good reason. But, given that realists themselves often disagree vigorously about upper ontologies, objection to some particular upper ontology is not necessarily an objection to realism.
So, for now at least, I’m convinced that realism provides a practical approach to building ontologies of science. But, I’d be very interested to hear arguments against those I’ve presented here.
So, what’s this all about?
I’m a biologist (curator) working for a large database that has long maintained a bunch of structured, controlled vocabularies, the largest of which covers anatomy. When I started work at the database, much of my interaction with these structured vocabularies, which I soon learned to refer to as ‘ontologies’, was through a command-line tool written by our eccentric, massively bearded and super-hacky developer – the only developer I’ve ever met who was proud to be on the ‘trailing edge of technology’. This had a bunch of commands with names like categorise, dissect, locate, origin, fate. The names are self-explanatory: categorise showed the various ways a particular structure was classified, dissect displayed its parts, locate showed what it was part of, and origin and fate showed what the structure developed from and into. I thought it was pretty cool – might provide a good way for users to explore anatomy, especially if hooked up to some pretty pictures and some written definitions (it didn’t seem to have many of these).
Pretty cool… but it didn’t always give the right answers, and the answers were very frequently incomplete. So, I got curious about fixing the underlying data structure.
But how? I naively assumed that there must be some clear rules for working with this stuff… rules that would get our hacky little tool – and hopefully a less hacky online tool for our site – to give the right answers. How could I know how to add sufficient part_of relations in the right places for the dissect command to work? How did part_of and is_a interact? And anyway, what does this relation part_of really mean?
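I later learned that the standard answer to the interaction question, under the usual all-some reading of these relations, is that a chain containing any part_of step yields part_of, while a pure is_a chain yields is_a. A sketch, using made-up toy terms (the code and term names are mine, for illustration only):

```python
# Sketch of how is_a and part_of compose under an all-some reading:
#   X is_a Y,    Y part_of Z  =>  X part_of Z
#   X part_of Y, Y is_a Z     =>  X part_of Z
# Toy edges: (subject, relation, object). Not real ontology content.

edges = [
    ("wing vein L3", "is_a", "wing vein"),
    ("wing vein", "part_of", "wing"),
    ("wing", "part_of", "thorax"),
    ("thorax", "is_a", "body segment"),
]

def ancestors(term):
    """Map each reachable ancestor to the set of inferred relations to it."""
    found = {}
    seen = set()
    stack = [(term, "is_a")]  # is_a acts as the identity for composition
    while stack:
        cur, rel = stack.pop()
        for s, r, o in edges:
            if s == cur:
                # Any part_of in the chain makes the whole chain part_of.
                new_rel = "part_of" if "part_of" in (rel, r) else "is_a"
                if (o, new_rel) not in seen:
                    seen.add((o, new_rel))
                    found.setdefault(o, set()).add(new_rel)
                    stack.append((o, new_rel))
    return found
```

So a dissect-style command listing the parts of the thorax should pick up wing vein L3, even though the only directly asserted link from it is an is_a edge – which is exactly why adding part_of relations "in the right places" was not obvious.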
As I started to investigate editing this data structure – not a database, but a large text file with an eccentric syntax – another, related issue struck me. How had the organisation of these files come about? Was it even possible to work with them in any sustainable way?
And then there was the way that we edited these files – via a GUI-type editor that was, at the time at least, more than a little bug-ridden and unstable, especially when running on our Sun workstations (very trailing edge…). The tool was promising, but very frustrating, and not much help in guiding how I should work. Surely there had to be a better way…
So. It’s a few years later and I’m a lot closer to having decent answers to these questions, but I’m certainly not there yet. That original command-line tool, written in archaic Pascal, died with our last batch of Suns. But now I have new toys – OWL reasoners and Protege 4. Along the way, I’ve been to too many meetings. I’ve learnt to code passably (mainly in Perl), picked up some logic, some realist philosophy, read lots of old anatomical texts. I’ve met and argued with philosophers, logicians, computer scientists and various biologists at various stages of the same learning process I was going through. It’s been both fun and frustrating, and I’m still only half convinced this is actually a career. I’m starting this blog as a way to work through some of my ideas about ontologies, their use and their development. If nothing else, the process might help me get some of my ideas straight, but I hope a few people might turn up and comment…