Agrelon and the Future Direction of IrishGen
Christopher Guy Yocum
ORCID: 0000-0002-7241-3264
With the newest release of IrishGen 20220907.1, the Agrelon ontology is fully incorporated in the IrishGen database. This will allow users to query IrishGen based on either the Relationship Ontology or Agrelon. I would recommend using Agrelon from 20220709.1 onward for reasons set out in this post.
At this point there is not much more that can be done with IrishGen in its current form. To get where it is, I used the CELT transcriptions of LL and Rawl. B502. Other information came from published genealogical sources that were easy to transcribe. For the medieval Irish genealogies, however, that is not the whole corpus. There are many other genealogical corpora. Currently, there are 38 open MS corpora as recorded on the IrishGen project page. To get these into IrishGen, I would need to transcribe them into a file and then translate them into RDF. This is a laborious process that, while valuable in its own right, will take time and money and is overwhelming for a single individual. I started on NLS Adv MS 72.1.6, so I may return to that. In the long run, IrishGen may not grow very quickly without more volunteers or some other solution coming to light.
One last thing that IrishGen needs, and that may be easily done, is to
swap the hash for the slash in the URLs that are created for each
individual. The reason for this is that the hash form leaks that the
database is based on files, as it includes the .trig extension in the
URL. If the database were transferred to the web and put on a server,
slash URLs would be much cleaner for access by browser and easier to
remember.
For instance, to link to an individual instance in an item, someone
currently needs to use a URL like
http://example.com/LL/aisneidem_di_araill.trig#Conchobuir. This is
sometimes called a “leaky abstraction”, as it leaks that there exists a
file called aisneidem_di_araill.trig which contains the item entitled
“Aisneidem Di Araill” and that it is in the TRiG file format. Swapping
the # for a / hides this from someone looking at a URL and makes a few
other things possible. A URL that uses a slash will look like
http://example.com/LL/aisneidem_di_araill/Conchobuir. In this
situation you can easily parse the URL as http://example.com + MS +
Item + Individual Instance. Thus if a user requests
http://example.com/LL/aisneidem_di_araill/, they will get the metadata
about that item back from the server. If they request
http://example.com/LL, they can get the metadata about the manuscript.
This hides the fact that the entire database is a set of flat files on
GitHub, which the user will likely never care about and indeed should
not have to care about. Moreover, the underlying database (if there
were to be one online in the future) would only need to run a SPARQL
DESCRIBE query to return all known information about a requested URL.
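
The scheme above can be sketched in a few lines of Python. This is only
an illustration under stated assumptions, not the IrishGen tooling: the
function names (hash_to_slash, parse_slash_url, describe_query) are
hypothetical, and the sketch assumes every current URL has the shape
base/MS/item.trig#Individual.

```python
from urllib.parse import urlsplit, urlunsplit


def hash_to_slash(url: str) -> str:
    """Rewrite .../item.trig#Name as .../item/Name (assumed URL shape)."""
    parts = urlsplit(url)
    path = parts.path
    # Drop the file extension that leaks the storage format.
    if path.endswith(".trig"):
        path = path[: -len(".trig")]
    # Turn the fragment into a final path segment.
    if parts.fragment:
        path = path.rstrip("/") + "/" + parts.fragment
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


def parse_slash_url(url: str) -> dict:
    """Split a slash URL into MS, item, and individual, as far as present."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    keys = ("manuscript", "item", "individual")
    # zip stops at the shorter input, so a manuscript-only URL yields one key.
    return dict(zip(keys, segments))


def describe_query(url: str) -> str:
    """Build the SPARQL DESCRIBE query a server could run for a requested URL."""
    return f"DESCRIBE <{url}>"


old = "http://example.com/LL/aisneidem_di_araill.trig#Conchobuir"
new = hash_to_slash(old)
print(new)
print(parse_slash_url(new))
print(describe_query(new))
```

Because the manuscript, item, and individual are plain path segments, a
request for a shorter URL such as http://example.com/LL parses the same
way and simply yields fewer components.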
I will concentrate on the hash/slash swap, which will be a bit tricky
as I will need to keep the owl:sameAs links in the files consistent
when the URLs change. I have scripts which will help me detect any
inconsistencies, so it is only a minor annoyance. The next thing after
that is much more daunting: transcribing genealogical corpora which
have not been published. I will probably pick up NLS Adv MS 72.1.6
again, which has some of its contents published here as TEI XML. I am
not sure if it would be worth doing all the context work to attempt a
proper publication out of it (or any of the other genealogical
corpora), but I may explore what the options are once I get that far.
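
Returning to the hash/slash swap: one way to keep owl:sameAs links
consistent is to rewrite every hash-style IRI in a file in a single
pass and then check that none are left over. The sketch below is not
the author's script. It assumes that IRIs are written out in full
between angle brackets in the TriG text, so it would miss prefixed
names or relative IRIs, and the second IRI in the sample is
hypothetical.

```python
import re

# Assumed shape of the current IRIs: <base/MS/item.trig#Individual>
HASH_IRI = re.compile(r"<([^<>\s]+?)\.trig#([^<>\s]+)>")


def rewrite_iris(trig_text: str) -> str:
    """Replace every <...item.trig#Name> with <...item/Name>."""
    return HASH_IRI.sub(r"<\1/\2>", trig_text)


def stale_iris(trig_text: str) -> list[str]:
    """List hash-style IRIs still present, e.g. in a file that was missed."""
    return [m.group(0) for m in HASH_IRI.finditer(trig_text)]


sample = (
    "<http://example.com/LL/aisneidem_di_araill.trig#Conchobuir> "
    "owl:sameAs "
    "<http://example.com/Rawl/some_item.trig#Conchobuir> ."  # hypothetical IRI
)

migrated = rewrite_iris(sample)
print(migrated)
print(stale_iris(migrated))
```

Since subjects and owl:sameAs objects are rewritten by the same rule,
links between files stay aligned as long as every file is processed.
Running stale_iris over each file afterwards flags any that were
skipped.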