Tuesday, January 21. 2014
Last week I was putting the finishing touches on the first serious academic paper I have written in a long time, and decided that I wanted to provide backup for some of the assertions I had made. Naturally, the deadline was tight, so getting any articles via interlibrary loan was out of the question. This was going to be a purely electronic, immediate access affair.
So what does a systems librarian with a vast array of licensed materials (the dark web, as the info lit people like to say) at his university's disposal do when faced with a research problem like this? Well, turn to Google Scholar, naturally.
As it turns out, I was able to find the majority of what I needed through Google Scholar: this will come as a surprise to no-one who has used it, but it's remarkably good at finding electronic copies of articles and conference proceedings. Sometimes they are preprints on the conference website; sometimes they are copies posted in the institutional repository or on the researcher's own website; sometimes they are what appear to be illicit copies* (you can tell by the watermark) posted on random domains. The more recent the article, the more likely I was to find an on-demand copy.
My work is in the intersection of the semantic web and library systems, so it's perhaps not surprising that the library-oriented articles tended to have been published further in the past and were less likely to have a freely available copy, whereas almost anything of interest on the semantic web side was immediately available. I suspect that not many people were thinking about open access to research in the '90s; still, it made me cringe a little to find familiar names amongst the authors of papers on open source library software that I would have liked to cite, but which were locked behind a paywall to which not even my university (with its amazing provincial and federal consortium deals) had licensed access. So, of course, the citations went to papers that were available to me.
Call it anecdotal, call me a lazy researcher, but to me the evolution seems inevitable. If your work is freely available (ideally via a properly legal venue, like publishing in an open access journal, or depositing a copy of your paper in your institutional repository or on your web site--assuming your publication contract allows it) then you are more likely to get citations; and if that pattern continues and coincides with citation counts as one measure of a researcher's effectiveness, what will the incentive be for keeping your work locked behind a paywall?
*A notable example is the seminal article "The Semantic Web" written by Tim Berners-Lee, James Hendler, and Ora Lassila and published in Scientific American in 2001. The official version of the paper is locked behind Scientific American's paywall at http://www.scientificamerican.com/article/the-semantic-web/ and they serve up interstitial ads between searches on their site(!). The primary electronic version offered up by Google Scholar, on the other hand, is a PDF posted at Google Code. Google Code is hardly a notable scholarly publishing site, but I bet it serves up way more copies than SciAm does.**
**Note that if you dig into all of the available copies of the article, there are hundreds scattered across university course pages and semantic web community sites. (Cue Darth Vader: "The infringement is strong in this one.") I assume SciAm knows that the blowback of trying to enforce copyright measures against infringers with this particular high-profile article would be intense; I'm not sure what lesson we're supposed to derive from that.
Thursday, October 31. 2013
I released File_MARC 1.0.1 yesterday after receiving a bug report from the most excellent Mark Jordan about a basic (but data-corrupting) problem that had existed since the very early days (almost seven years ago). If you generate MARC binary output from File_MARC, you should upgrade immediately.
In the MARC binary output code, I was testing a string for the presence of a value--roughly, "if ($value)"--and returning false if no value was present. Which is fine, except when said value was '0': PHP treats the string '0' as false, so the test treated a perfectly legitimate value as missing. Whoops.
It's one of the oldest gotchas in PHP, and it lived for a very long time in this library. Probably because very few people want to generate MARC binary output. But now, that bug is squashed, and a unit test will ensure that it does not come back.
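For anyone who has not been bitten by this yet, here is a minimal sketch of the gotcha. The function names hasValueLoose() and hasValueStrict() are hypothetical and made up for illustration; this is not the actual File_MARC code, just the general shape of the problem and one plausible way to guard against it.

```php
<?php
// Hypothetical helpers for illustration only; not File_MARC's real code.

// Loose truthiness test: PHP treats the string '0' as false,
// so a legitimate value of '0' is reported as "no value".
function hasValueLoose($value)
{
    if ($value) {
        return true;
    }
    return false;
}

// Stricter test: only null and the empty string count as "no value".
function hasValueStrict($value)
{
    return $value !== null && $value !== '';
}

var_dump(hasValueLoose('0'));   // bool(false): the bug
var_dump(hasValueStrict('0'));  // bool(true)
var_dump(hasValueStrict(''));   // bool(false)
```

Whether an empty string should count as a value depends on the data being handled; the strict version above simply assumes that it should not.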
Thursday, October 17. 2013
I submitted the following proposal to the Library Technology Conference 2014 and thought it might be of general interest.
Structuring library data on the web with schema.org: we're on it!
This work is licensed under a Creative Commons Attribution-Share Alike 2.5 Canada License.