Agrelon
Christopher Guy Yocum
ORCID: 0000-0002-7241-3264
Choosing an ontology is always a difficult task (see Triplestores, Ontologies, and Reasoning). There are many different considerations to make. Among them is: availability, suitability for the dataset, and complexity. There is also the ever present pull of writing an ontology for your data from scratch, which is often best avoided but is always a temptation.
For IrishGen, I decided early on to use the Relationship ontology. This was due to the fact that it covered most of what was necessary and any additional predicates needed or any adjustments could be incorporated in a follow-on ontology. Moreover, the Relationship ontology was built on top of the very well-known and universally used Foaf ontology.
In the course of the development of IrishGen, more predicates needed
to be added to cover some more areas that the Relationship ontology
did not. Additional tweaks were also needed to make the parentOf
relation into a subproperty of the ancestorOf
predicate (and
inversely for the childOf
) as this would not allow transitivity
backwards (also remember that your parents are your ancestors as
well).
In private email conversations with other individuals who are
interested in the Semantic Web and Linked Data, a serious flaw in the
way in which the Relationship ontology uses owl:equivalentClass
was
brought to my attention. They proposed that IrishGen use an ontology
for people and institutions which was developed separately by them and
addressed the problem, called
Agrelon. Then as
now, switching from using the Relationship ontology to using Agrelon
would involve a lot of work so I politely declined. However, I kept
the idea in my mind.
While working with Stardog, you will notice that some queries are just plain slow. There is an example of this with Stardog in Making IrishGen SPARQL Part II: Constructs where a Construct query would take 21 seconds to run when it would not be expected to take that long. This was a constant irritant in other queries as well and I preferred Stardog’s strict adherence to standards. So, in a fit of aggravation, I did a quick and dirty transformation of IrishGen from the Relationship ontology to Agrelon as I remembered the flaw that my email conversation earlier had pointed out. In the case that I was working with, I got a massive speed boost to the query. This boosted my motivation to attempt the integration of Agrelon into IrishGen.
The more interested among you may have noticed that in the utils
directory in the IrishGen repository has gained the Python file
irish_gen_add_agrelon.py
.
This Python script will, with the proper libraries installed (see code for details), add Agrelon to an IrishGen TRiG file. A user can now run this against some or all of IrishGen to add Agrelon predicates to IrishGen. Also, I learned a few things about Agrelon while writing the script. For instance, Agrelon contains many of the relations that the Relationship ontology lacked and needed to be added by IrishGen (including the ancestor tweak mentioned above) which means less work on the part of IrishGen to maintain the special ontologies.
However, the reader may notice that I used a script to add the possibility of using Agrelon rather than adding Agrelon directly to the IrishGen TRiG files. There is a formal limitation to using the Python script to add Agrelon. The formal limitation is that Graphs have no natural ordering. This might not seem like an impediment but IrishGen’s files generally follow the order of the text that they transform. A user should be able to read the IrishGen file and follow along from the original source material. If the script was run and the files checked in directly from that, the the link between the order of the source text and the ordering of the corresponding file would be lost and visual auditing by a human would become prohibitively frustrating and time consuming. Reordering the files by hand after running the script would be a prohibitive amount of work without many more people willing to become involved.
As for the speed issue, when using GraphDB, it is much less of a concern because of the fact that GraphDB does forward-chaining which calculates all the implications of the ontologies and then puts that on disk. This means that Agrelon’s speed boost is only really good for backwards-chaining systems like Stardog, where the implications of the ontologies are only worked out when the query is executed. The Relationship ontology’s flaw still exists, of course, but it does not interfere with the regular use of IrishGen so it less of a concern.
I would recommend giving the Agrelon ontology a try and comparing it with the Relationship ontology.