So, what’s this all about?
I’m a biologist (curator) working for a large database that has long maintained a bunch of structured, controlled vocabularies, the largest of which covers anatomy. When I started work at the database, much of my interaction with these structured vocabularies, which I soon learned to refer to as ‘ontologies’, was through a command-line tool written by our eccentric, massively bearded and super-hacky developer – the only developer I’ve ever met who was proud to be on the ‘trailing edge of technology’. The tool had a bunch of commands with names like categorise, dissect, locate, origin and fate. The names are self-explanatory: categorise showed the various ways a particular structure was classified, dissect displayed its parts, locate showed what it was part of, and origin and fate showed what the structure developed from and into. I thought it was pretty cool – it might provide a good way for users to explore anatomy, especially if hooked up to some pretty pictures and some written definitions (it didn’t seem to have many of these).
Pretty cool… but it didn’t always give the right answers, and the answers were very frequently incomplete. So, I got curious about fixing the underlying data structure.
But how? I naively assumed that there must be some clear rules for working with this stuff… rules that would get our hacky little tool – and hopefully a less hacky online tool for our site – to give the right answers. How could I know how to add sufficient part_of relations in the right places for the dissect command to work? How did part_of and is_a interact? And anyway, what does this relation part_of really mean?
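To make those questions concrete, here is a toy sketch in Python of what commands like these might do over a handful of relations. It is not the original tool: the example terms, the develops_from relation name and every function body are assumptions made up for illustration, and each command only looks up relations that were asserted directly.

```python
# Toy data, invented for illustration: (subject, relation, object) triples.
# The relation name develops_from is an assumption; only is_a and part_of
# are named in the text above.
TRIPLES = [
    ("wing", "is_a", "appendage"),
    ("wing", "part_of", "thorax"),
    ("wing vein", "part_of", "wing"),
    ("wing", "develops_from", "wing disc"),
]


def build_index(triples):
    """Index triples by (subject, relation) and by (object, relation)."""
    forward, backward = {}, {}
    for subj, rel, obj in triples:
        forward.setdefault((subj, rel), set()).add(obj)
        backward.setdefault((obj, rel), set()).add(subj)
    return forward, backward


FORWARD, BACKWARD = build_index(TRIPLES)


def categorise(term):
    """Classes the term is directly asserted to be a kind of."""
    return sorted(FORWARD.get((term, "is_a"), set()))


def dissect(term):
    """Structures directly asserted to be part of the term."""
    return sorted(BACKWARD.get((term, "part_of"), set()))


def locate(term):
    """Structures the term is directly asserted to be part of."""
    return sorted(FORWARD.get((term, "part_of"), set()))


def origin(term):
    """Structures the term is directly asserted to develop from."""
    return sorted(FORWARD.get((term, "develops_from"), set()))


def fate(term):
    """Structures directly asserted to develop from the term."""
    return sorted(BACKWARD.get((term, "develops_from"), set()))


if __name__ == "__main__":
    print(dissect("wing"))
    print(dissect("thorax"))
```

In this sketch, dissect("thorax") lists only the wing and not the wing vein: nothing here follows part_of transitively or through is_a, so how complete an answer is depends entirely on which relations were asserted directly.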
As I started to investigate editing this data structure – not a database, but a large text file with an eccentric syntax – another, related issue struck me. How had the organisation of these files come about? Was it even possible to work with them in any sustainable way?
And then there was the way that we edited these files – via a GUI-type editor that was, at the time at least, more than a little bug-ridden and unstable, especially when running on our Sun workstations (very trailing edge…). The tool was promising, but very frustrating, and not much help in guiding how I should work. Surely there had to be a better way…
So. It’s a few years later and I’m a lot closer to having decent answers to these questions, but I’m certainly not there yet. That original command-line tool, written in archaic Pascal, died with our last batch of Suns. But now I have new toys – OWL reasoners and Protege 4. Along the way, I’ve been to too many meetings. I’ve learnt to code passably (mainly in Perl), picked up some logic and some realist philosophy, and read lots of old anatomical texts. I’ve met and argued with philosophers, logicians, computer scientists and various biologists at various stages of the same learning process I was going through. It’s been both fun and frustrating, and I’m still only half convinced this is actually a career. I’m starting this blog as a way to work through some of my ideas about ontologies, their use and their development. If nothing else, the process might help me get some of my ideas straight, but I hope a few people might turn up and comment…