Saturday, November 8. 2014
How discovery layers have closed off access to library resources, and other tales of schema.org from LITA Forum 2014
At the LITA Forum yesterday, I accused (presentation) most discovery layers of not solving the discoverability problems of libraries, but instead exacerbating them by launching us headlong to a closed, unlinkable world. Coincidentally, Lorcan Dempsey's opening keynote contained a subtle criticism of discovery layers. I wasn't that subtle.
Here's why I believe commercial discovery layers are not "of the web": check out their
User-Agent: * Disallow /
That effectively says "Go away, machines; your kind isn't wanted in these parts." And that, in turn, closes off access to your libraries resources to search engines and other aggregators of content, and is completely counter to the overarching desire to evolve to a linked open data world.
During the question period, Marshall Breeding challenged my assertion as being unfair to what are meant to be merely indexes of library content. I responded that most libraries have replaced their catalogues with discovery layers, closing off open access to what have traditionally been their core resources, and he rather quickly acquiesced that that was indeed a problem.
(By the way, a possible solution might be to simply offer two different URL patterns, something like
Compared to commercial discovery layers on my very handwavy usability vs. discoverability plot, general search engines rank pretty high on both axes; they're the ready-at-hand tool in browser address bars. And they grok schema.org, so if we can improve our discoverability by publishing schema.org data, maybe we get a discoverability win for our users.
But even if we don't (SEO is a black art at best, and maybe the general search engines won't find the right mix of signals that makes them decide to boost the relevancy of our resources for specific users in specific locations at specific times) we get access to that structured data across systems in an extremely reusable way. With sitemaps, we can build our own specialized search engines (Solr or ElasticSearch or Google Custom Search Engine or whatever) that represent specific use cases. Our more sophisticated users can piece together data to, for example, build dynamic lists of collections, using a common, well-documented vocabulary and tools rather than having to dip into the arcane world of library standards (Z39.50 and MARC21).
So why not iterate our way towards the linked open data future by building on what we already have now? As Karen Coyle wrote in a much more elegant fashion, the transition looks roughly like:
That is, by simply tweaking the same mechanism you already use to generate a human readable HTML page from the data you have stored in a database or flat files or what have you, you can embed machine readable structured data as well.
That is, in fact, exactly the approach I took with Evergreen, VuFind, and Koha. And they now expose structured data and generate sitemaps out of the box using the same old MARC21 data. Evergreen even exposes information about libraries (locations, contact information, hours of operation) so that you can connect its holdings to specific locations.
And what about all of our resources outside of the catalogue? Research guides, fonds descriptions, institutional repositories, publications... I've been lucky enough to be working with Camilla McKay and Karen Coyle on applying the same process to the Bryn Mawr Classical Review. At this stage, we're exposing basic entities (Reviews and People) largely as literals, but we're laying the groundwork for future iterations where we link them up to external entities. And all of this is built on a Tcl + SGML infrastructure.
So why schema.org? It has the advantage of being a de-facto generalized vocabulary that can be understood and parsed across many different domains, from car dealerships to streaming audio services to libraries, and it can be relatively simply embedded into existing HTML as long as you can modify the templating layer of your system.
And schema.org offers much more than just static structured data; schema.org Actions are surfacing in applications like Gmail as a way of providing directly actionable links--and there's no reason we shouldn't embrace that approach to expose "SearchAction", "ReadAction", "WatchAction", "ListenAction", "ViewAction"--and "OrderAction" (Request), "BorrowAction" (Borrow or Renew), "Place on Reserve", and other common actions as a standardized API that exists well beyond libraries (see Hydra for a developing approach to this problem).
I want to thank Richard Wallis for inviting me to co-present with him; it was a great experience, and I really enjoy meeting and sharing with others who are putting linked data theory into practice.
Monday, October 13. 2014
My slides from DCMI 2014: schema.org in the wild: open source libraries++.
Last week I was at the Dublin Core Metadata Initiative 2014 conference, where Richard Wallis, Charles MacCathie Nevile and I were slated to present on schema.org and the work of the W3C Schema.org Bibliographic Extension Community Group (#schemabibex). As a first-timer at DCMI, I wasn't sure what kind of an audience to expect: there is a peer-reviewed papers track, and a series of sessions on a truly intimidating topic (RDF Application Profiles), but on the other hand our own topic was fairly basic. As it turned out, there was an invigoratingly mixed set of backgrounds present, and Eric Miller's opening keynote, which gave an oral history of the origins of DCMI and a look towards the future challenges for the organization, reassured me that I wasn't going to be out of my depth.
Special kudos to Eric for his analogy of the Web to a credit card, which offers both human-readable and machine-readable data. A nice, clean image!
Richard, Charles and I opted to structure our 1.5 hour session as a series of short talks followed by a long period of discussion. However, as often happens, the excitement of speaking in front of a room that drew so many attendees that we had to jam with more chairs led to that plan breaking down. I cut my own materials back to illustrating how one of my primary contributions to the #schemabibex effort--representing library holdings using schema.org's GoodRelations-based Product/Offer model--had been implemented in free software library systems, including Evergreen, Koha, and VuFind. I walked from a basic bibliographic record (represented as a Product), through to the associated borrowable items (represented as Offers with a price of $0.00, call numbers as SKUs, and barcodes as serialNumbers), that were offered by a specific Library with its own set of operating hours, address, and contact information... all published out of the box as RDFa in modern Evergreen systems.
I did stray a little to posit that the use case for schema.org is not and should not be limited to "search engine optimization", but that this very simple level of structured data could fairly easily form the basis of an API. In the rather limited discussion that we were able to hold at the end of the session (and encroaching on break time), Charles counselled that libraries shouldn't really bother with dumbing down their beautiful metadata simply to publish schema.org... while I countered that the pursuit of publishing beautiful metadata in the past has generally led librarians to publish no metadata at all, and that schema.org was a great first step towards building a web of cultural heritage metadata meant for machine consumption.
I wish I could have stayed longer at DCMI, but it was Thanksgiving in Canada and there were families to visit and feast with--not to mention children to help take car of--so I had to depart after just a day and a half. I'm encouraged by the steps the organization is taking to renew itself, and I hope to be able to participate again in the future.
Saturday, September 13. 2014
Version 1.91 of the http://schema.org vocabulary was released a few days ago, and I once again had a small part to play in it.
With the addition of the workExample and exampleOfWork properties, we (Richard Wallis, Dan Brickley, and I) realized that examples of these CreativeWork example properties were desperately needed to help clarify their appropriate usage. I had developed one for the blog post that accompanied the launch of those properties, but the question was, where should those examples live in the official schema.org docs? CreativeWork has so many children, and the properties are so broadly applicable, that it could have been added to dozens of type pages.
It turns out that an until-now unused feature of the schema.org infrastructure is that examples can live on property pages; even Dan Brickley didn't think this was working. However, a quick test in my sandbox showed that it _was_ in perfect working order, so we could locate the examples on their most relevant documentation pages... Huzzah!
I was then able to put together a nice, juicy example showing relationships between a Tolkien novel (The Fellowship of the Ring), subsequent editions of that novel published by different companies in different locations at different times, and movies based on that novel. From this librarian's perspective, it's pretty cool to be able to do this; it's a realization of a desire to express relationships that, in most library systems, are hard or impossible to accurately specify. (Should be interesting to try and get this expressed in Evergreen and Koha...)
In an ensuing conversation on public-vocabs about the appropriateness of this approach to work relationships, I was pleased to hear Jeff Young say "+1 for using exampleOfWork / workExample as many times as necessary to move vaguely up or down the bibliographic abstraction layers."... To me, that's a solid endorsement of this pragmatic approach to what is inherently messy bibliographic stuff.
Kudos to Richard for having championed these properties in the first place; sometimes we're a little slow to catch on!
Friday, July 18. 2014
Since returning from my sabbatical, I've felt pretty strongly that one of the things our work place is lacking is open communication about the work that we do--not just outside of the library, but within the library as well. I'm convinced that the more that we know about the demands on each other's time and the goals that we're trying to achieve, the more likely we'll be able to work together towards the same goals and have a better understanding of each other's challenges.
Towards that end, I've decided to try maintaining a work blog so that my colleagues will have a better idea about what I've been up to. I wouldn't be surprised if some of my peers think that I sit in my office all day browsing the internet (which, actually, happens sometimes but I swear I'm doing it to try and find a solution for a problem!), because the day-to-day work of a systems librarian can be pretty esoteric. And when you know that they have many expectations for you to fix the many small annoyances they have to deal with, it might help them to develop some empathy if they understand what you actually are spending your time on.
Anyway, I decided not to mirror the content here because, well, it's probably too site-specific to really be of interest to you, my dear readers. Whoever you are. However, I will link to the two entries that I've cranked out so far; you can decide if you want to follow along from there:
This work is licensed under a Creative Commons Attribution-Share Alike 2.5 Canada License.