TLDR: The Evergreen and Koha integrated library systems now
express their record details in the schema.org vocabulary out of the box using RDFa.
Individual holdings are expressed as Offer instances per the W3C
Schema Bib Extension community group proposal to parallel commercial sales
offers. I have also published a branch that gives the same capabilities to
the VuFind discovery layer.
In the spring of 2012, I took my first steps in the structured data world by
teaching Evergreen 2.2 how to express some record details in the schema.org vocabulary. It was a small step towards taking the
machine-readable data that we had made useful to humans on the record detail
catalogue page and marking it up so that it was once again machine readable. At
that time, Evergreen only knew how to map MARC data to two schema.org types
(Book and MusicRecording--which should have been
MusicAlbum, but I eventually fixed that) and a handful of
attributes: name, ISBN, publisher,
publication date, author, contributor,
and keywords. Pretty barebones, but a start nonetheless.
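To make that initial mapping concrete, here is a minimal Python sketch. It is not Evergreen's actual implementation (Evergreen does this work in its catalogue templates); the `MARC_TO_SCHEMA` table, the tuple-based record shape, and the `map_record` helper are all hypothetical, and the MARC tags are simply the conventional homes for each piece of data.

```python
# Hypothetical mapping from (MARC tag, subfield) to a schema.org property.
# These are conventional MARC 21 locations, not necessarily the fields
# that Evergreen's templates actually consult.
MARC_TO_SCHEMA = {
    ("245", "a"): "name",
    ("020", "a"): "isbn",
    ("260", "b"): "publisher",
    ("260", "c"): "datePublished",
    ("100", "a"): "author",
    ("700", "a"): "contributor",
    ("650", "a"): "keywords",
}


def map_record(fields):
    """Collect schema.org property values from (tag, subfield, value) tuples."""
    mapped = {}
    for tag, subfield, value in fields:
        prop = MARC_TO_SCHEMA.get((tag, subfield))
        if prop is not None:
            # Properties such as contributor and keywords can repeat,
            # so every property holds a list of values.
            mapped.setdefault(prop, []).append(value)
    return mapped


# Made-up sample record; the 999 field is unmapped and gets skipped.
record = [
    ("245", "a", "An example title"),
    ("020", "a", "9780000000002"),
    ("100", "a", "Author, Example"),
    ("650", "a", "Libraries"),
    ("650", "a", "Metadata"),
    ("999", "a", "ignored local field"),
]
print(map_record(record))
```

Anything without an entry in the table is silently dropped, which is roughly what "a handful of attributes" meant in practice.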
I used the HTML5 microdata approach because I was new to structured data and
microdata was what was demonstrated in all of the schema.org examples, so it
seemed like the obvious choice. Over the last year, however, I realized that
RDFa is a W3C standard for
accomplishing the same goals as microdata, bolstered by an open community
standards-making process, and featuring the ability to mix in properties and
types from multiple vocabularies. I touched on this in my Evergreen 2013
conference presentation: Structured
data: making metadata matter for machines. While RDFa Lite is extremely easy to get
started with, I have been diving deeper into RDFa proper to make use of some of
the more advanced properties, such as @about to work around
unwanted chaining introduced by @href attributes.
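For readers who have not seen the two syntaxes side by side, the following Python sketch renders the same properties first as microdata and then as RDFa Lite. The helper names `as_microdata` and `as_rdfa_lite` are hypothetical and exist only for illustration; the attribute names themselves (`itemscope`, `itemtype`, `itemprop` versus `vocab`, `typeof`, `property`) come from the respective specifications.

```python
from html import escape


def _spans(attr, props):
    """Render each property as a <span> carrying the given attribute name."""
    return "".join(
        f'<span {attr}="{escape(name)}">{escape(value)}</span>'
        for name, value in props.items()
    )


def as_microdata(schema_type, props):
    """Render properties using HTML5 microdata attributes."""
    return (
        f'<div itemscope itemtype="http://schema.org/{escape(schema_type)}">'
        f'{_spans("itemprop", props)}</div>'
    )


def as_rdfa_lite(schema_type, props):
    """Render the same properties using RDFa Lite attributes."""
    return (
        f'<div vocab="http://schema.org/" typeof="{escape(schema_type)}">'
        f'{_spans("property", props)}</div>'
    )


book = {"name": "An example title", "isbn": "9780000000002"}
print(as_microdata("Book", book))
print(as_rdfa_lite("Book", book))
```

For simple cases like this the two are nearly interchangeable. Mixing in a second vocabulary would additionally use RDFa's `prefix` attribute, which this sketch leaves out.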
Over the last few weeks, I was able to concentrate on improving the
schema.org mapping for Evergreen--introducing holdings as instances of the
http://schema.org/Offer class, providing much more granular author and
contributor data--and cut over to RDFa. While the tools at RDFa Tools were quite useful for debugging
my efforts, I also have to thank the denizens of the #rdfa IRC channel (Manu Sporny in particular) for
patiently helping me understand some of my rookie mistakes. Ben Shum also kept
me honest by patiently testing multiple iterations of my branches with the
Google Rich Snippets tool and reporting any issues that he encountered; this
led to my realization that using @resource and @about
was necessary in some contexts.
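A rough sketch of the holdings-as-Offers idea, in Python rather than in Evergreen's own templates, might look like the following. Everything here is an assumption made for illustration: the dict keys, the helper names, and the choice of `seller`, `sku`, `serialNumber`, and `availability` from the schema.org Offer vocabulary to carry the library, call number, barcode, and status. The markup Evergreen actually emits may differ.

```python
from html import escape


def holding_as_offer(holding):
    """Render one holding (a plain dict) as a schema.org Offer table row.

    The row is only meaningful inside an element that sets the schema.org
    vocabulary and a typed record; see record_with_offers() below.
    """
    # Assumption: a copy on the shelf is InStock, anything else OutOfStock.
    availability = "InStock" if holding["available"] else "OutOfStock"
    return (
        '<tr property="offers" typeof="Offer">'
        f'<td property="seller">{escape(holding["library"])}</td>'
        f'<td property="sku">{escape(holding["call_number"])}</td>'
        f'<td property="serialNumber">{escape(holding["barcode"])}</td>'
        f'<td><link property="availability" href="http://schema.org/{availability}">'
        f'{escape(holding["status"])}</td>'
        "</tr>"
    )


def record_with_offers(title, holdings):
    """Wrap the holdings in a Book so each Offer attaches to it via offers."""
    rows = "".join(holding_as_offer(h) for h in holdings)
    return (
        '<div vocab="http://schema.org/" typeof="Book">'
        f'<h1 property="name">{escape(title)}</h1>'
        f"<table>{rows}</table>"
        "</div>"
    )


# Made-up sample data.
holdings = [
    {"library": "Main Branch", "call_number": "QA76.9 .E93", "barcode": "30000000000001",
     "available": True, "status": "Available"},
    {"library": "East Branch", "call_number": "QA76.9 .E93", "barcode": "30000000000002",
     "available": False, "status": "Checked out"},
]
print(record_with_offers("An example title", holdings))
```

Each row declares both `property="offers"` and `typeof="Offer"`, so the holding becomes a new typed resource linked from the enclosing Book, which is the parallel with commercial sales offers that the proposal draws.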
Once I had worked out a decent mapping in Evergreen (a library system I have
been contributing to for over six years now), I decided to tackle the VuFind
discovery layer. VuFind uses a straightforward template system, and I was able
to put together a branch that integrated schema.org as RDFa (details at VuFind bug 425), building
on Eoghan Ó Carragáin's initial efforts. Once again I included
holdings-as-Offers, as the Evergreen driver for VuFind made that easy enough to
test. As part of my work, I contributed some enhancements for the Evergreen
driver that have already been integrated into VuFind. The initial reception
from the VuFind community was positive, although my branch arrived too late for
the VuFind 2.1 release; if all goes well, it will be integrated for the VuFind
2.2 release. In the meantime, sites running VuFind that want schema.org
structured data can integrate my branch themselves--and please provide feedback!
As I was on a roll, I also opted to tackle the Koha integrated library
system. With some initial pointers from Galen Charlton and Chris Cormack to the
XSLT-based templating system that Koha uses, I was able to implement schema.org
with holdings-as-Offers in a matter of hours for the first iteration. Jared
Camins then worked patiently with me as I added small commits to address issues
that came up on the Evergreen side, but in under a week from
start to finish the branch was signed off, passed QA, and pushed to master.
(It actually broke the build due to a coding
violation--doh!--but that was quickly cleaned up.)
The upshot? We now have two library systems set to publish rich schema.org
structured data--including holdings--in RDFa, out of the box by default, in
their record detail pages on the Web, and a third system ready to go. Let me
simply say that I love the agility of open source software. So, for
the future, I intend to tackle a few more library systems; digital repositories
seem like they would be worthwhile targets. On that front, I have inquired
on the DSpace developers' list about whether there is still interest in
integrating schema.org (as had been expressed a year ago), but have not
yet received a reply. Perhaps ArchivesSpace, or furthering the existing support
on Islandora? Let me know if you're interested!
I'd be interested in furthering the existing support in Islandora, but it may be a few months out for us. We're about to launch our first collection and I'm still teaching myself how to manage the metadata and forms. Extending RDFa would be a good bootstrapping project.