Monday, May 16. 2011
The new Google Books API and possibilities for libraries
On the subject of the new Google Books API that was unveiled during the Google IO 2011 conference last week, Jonathan Rochkind states:
"Once you have an API key, it can keep track of # requests for that key - it’s not clear to me if they rate limit you, and if so at what rate."
I can answer that. There's no mystery to how many queries per day you're allowed per API key; as the Google API Console shows, the default limit is 1,000 per day. Note: default - this suggests that Google is willing to be flexible on this front.
Now, I can imagine the immediate grousing response along the lines of "Good luck getting Google to respond", etc. This is one of the reasons I attended Google IO last year and this year: human contact is much more valuable than email, forum posts, or whinging blogs. I have paid for the conference and all expenses out of my own pocket each year because, as a librarian/developer, there aren't many entities that are more relevant to our overall information landscape than Google at the moment.
So, I sat in on the "Integrating to eBooks: APIs to Sell and Read eBooks for Affiliates, Retailers and Device Makers" session and took advantage of the public Q&A session at the end to ask some questions (skip ahead to 31:39 if you want to hear the questions and answers).
The default limit of 1,000 queries per day per API key was a bit of a concern, as one direction that my colleague Art Rhyno has been exploring for the creation of a local federated search solution is the creation of a "bookshelf" in Google Books that represents the entire collection of the University of Windsor. There is no documentation about the limits on the size of this bookshelf, and I was told that this is because there is currently no limit. Good news to these ears. Also, I was told that the limit of 1,000 queries per day was just a starting point that could be upped, given a reasonable request.
Noting the absence of any sort of loaning feature, I asked what plans (if any) Google Books had to offer users the ability to loan purchased books. I received the expected answer ("We can't talk about future plans") but by being present at the session I was able to ensure that the question was impressed not only on the people responsible for Google Books, but also on all of the other attendees and on subsequent viewers of the online session. Baby steps, eh?
Beyond that, I was also able to talk directly with Pratip and Kevin, the speakers at the session, to further describe this particular use case that libraries have for Google Books (enabling full-text search of the bulk of their collection, whether print or electronic) and some of the possible advantages to Google. Despite their session's clear focus on selling books via affiliate links, they appeared to be genuinely open to the possibilities of partnerships with libraries. (Hey, there is even the possibility of libraries acting as affiliate sellers for Google Books and reaping revenue that way; others have done it with Amazon, so as much as I may find the practice distasteful personally, some places find it acceptable.)
So, the conversation has begun, as conversations should - person to person - and I'll report back when / if we make further progress.
Friday, April 29. 2011
Authority support in Evergreen 2.0
I'm at the Evergreen 2011 conference in balmy Decatur, Georgia... which wasn't a sure thing yesterday, given that the day started with an eight-hour delay at the Sudbury airport due to fog - not to mention having to fly through the storm that spawned a tornado in Alabama. After all that, though, it's great to be back in the same physical space as the vibrant Evergreen community!
Yesterday afternoon I gave a presentation on Authorities in Evergreen 2.0, covering (as the title suggests) Evergreen's support for authority records in the 2.0 release (as well as a peek at the future of Evergreen 2.2).
The session appeared to be well-received - yay! - and I tried recording it on my colleague Rick Scott's Sansa Clip+. Hopefully that worked out and I'll be able to update this post with the audio, so you can have the full-on audio and slide experience. The presentation is available under the Creative Commons Attribution Share Alike license, in the hopes that others will be able to use it for training purposes, to extend and improve it, and generally help out with the adoption of Evergreen.
Sunday, March 13. 2011
Evergreen's continuous integration servers - past, present, and future
tldr version: the Evergreen project now has a continuous integration server and build farm and needs testcases to make the best use of that infrastructure to help us provide higher-quality releases in the future.
Evergreen buildbot - past
Back in November 2009, Evergreen developer Shawn Boyette launched the Evergreen buildbot - a continuous integration server that ran basic tests with every commit to the OpenSRF and Evergreen repositories and created nightly tarballs of the code. It was a promising start towards a system that would provide us with instant feedback about the state of our code - at least as much as we had tests for it. Unfortunately, the server ran for only a few months before disappearing when Shawn parted ways with Equinox in early 2010.
I always thought it was a shame we had lost this piece of the development infrastructure, but Equinox had offered accounts on a server for anyone in the Evergreen community interested in taking on the task of setting up a new continuous integration test server - and through the rest of 2010, nobody stepped up to take on that responsibility. Most of us were busy developing and testing Evergreen 2.0, I suspect.
So, in January of 2011, when I had a bit of breathing room, I scoped out the current state of continuous integration frameworks and discovered that the buildbot project (no relation to Shawn's code, other than a serendipitous name) was written in Python and therefore was much more approachable to me than the other leading alternative, Hudson... so I wrote up my findings and a quick proposal.
Evergreen buildbot - present
A few days later I had the buildbot running on the server provided by Equinox,
providing reports on the status of the OpenSRF builds on Ubuntu Lucid. After
putting out a call to the community for build servers to provide coverage for
Evergreen on different operating systems, I had enough responses to focus my
mind on improving the Evergreen build. Evergreen now has the same standard
layout for Perl modules that we adopted a year ago for OpenSRF, along with
some basic sanity tests in Perl (such as
So, thanks to Equinox for providing the testing server that serves as the mothership for controlling all of the build tests. And many thanks to the University of Prince Edward Island Robertson Library and the Georgia Public Library Service for providing build servers for the build farm. We now have Evergreen test coverage on the Ubuntu Lucid and Debian Squeeze Linux distributions (huzzah) and OpenSRF test coverage on Ubuntu Lucid. If you have an interest in getting test coverage for a different distribution and have a server to spare, please feel free to contact me and we can get your server added to the build farm.
Checking build status
You can check the current state of the code for various OpenSRF and Evergreen branches at any time by visiting the Evergreen buildbot page and choosing one of the menu options. "Recent builds" provides a simple list of the success or failure of the 20 most recent builds. "Waterfall", on the other hand, provides the detailed status of every tested combination of Linux distribution and code branch.
Evergreen buildbot - future
We still have work to do to deliver on the promise of the buildbot. Most important, I think, is that a continuous integration server can only run the tests that it has been given - and we have not given it many tests. It kills me that people discovered some fairly fundamental problems with the Evergreen 2.0 release (some recent examples include most identifier searches not working and limitations with Unicode in patron names). Now that we have a continuous integration server, we need a testing framework so that it becomes easy to add tests along the lines of "Import a set of sample bibliographic records, then run the following sets of searches (ISSN and ISBN with and without hyphens; EAN; UPC...) and ensure that the returned results match these expected results".
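As a rough illustration of the kind of check described above, here is a minimal sketch in PHP (chosen only to match the File_MARC example elsewhere on this page; Evergreen's own tests are written in Perl). Everything in it is hypothetical: normalizeIdentifier(), searchByIdentifier(), and the sample records are stand-ins rather than Evergreen code, and the only point is that a search with hyphens and the same search without hyphens should return the same records.

```php
<?php
// Hypothetical sketch, not Evergreen code: identifiers are compared after
// stripping hyphens and whitespace, so hyphenated and plain forms match.

function normalizeIdentifier(string $value): string
{
    // Remove hyphens and whitespace; upper-case a trailing check digit "x".
    return strtoupper(preg_replace('/[\s-]+/', '', $value));
}

// Returns the IDs of the sample records that carry the given identifier.
function searchByIdentifier(array $records, string $query): array
{
    $wanted = normalizeIdentifier($query);
    $hits = array();
    foreach ($records as $id => $identifiers) {
        foreach ($identifiers as $identifier) {
            if (normalizeIdentifier($identifier) === $wanted) {
                $hits[] = $id;
                break; // one match per record is enough
            }
        }
    }
    return $hits;
}

// Made-up sample "bibliographic records": record ID => identifiers.
$sampleRecords = array(
    1 => array('0-306-40615-2'),
    2 => array('0378-5955'),
    3 => array('9780306406157', '0306406152'),
);

// Each search is paired with the record IDs we expect to get back.
$expectations = array(
    array('0306406152', array(1, 3)),
    array('0-306-40615-2', array(1, 3)),
    array('03785955', array(2)),
    array('0378-5955', array(2)),
);

foreach ($expectations as $expectation) {
    list($query, $expected) = $expectation;
    $actual = searchByIdentifier($sampleRecords, $query);
    print ($actual === $expected ? 'ok' : 'FAIL') . " - search for $query\n";
}
```

A real version would run the searches against an Evergreen database loaded with known records rather than an in-memory array, but the expected-results table would look much the same.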
It should be a human's job to set up that automated test once so that we're forever confident in the future that we're not screwing up those basic features, no matter what we change in our database schema or underlying code.
Now, there are very few people who can currently create that sort of a test. There might be none at the moment, in fact, because we need that previously mentioned testing framework to be sorted out and integrated into the buildbot. However, in the short term we can create these testing scenarios so that humans can reproduce them during testing blitzes, until such time as we have the testing framework sorted out and can begin automating these tests. Otherwise, I fear that we'll go into the Evergreen 2.1 alpha/beta/release candidate cycle and get reports from testing that indicate that all is well - but only because some of the more complex tasks haven't actually been attempted - and we'll find ourselves scrambling once again after the release to fix problems that become evident when sites actually start moving to the release.
Beyond tests, we need to teach the buildbot to create cleanly packaged tarballs on a regular basis - although that should arguably be nothing more than, or not much more than, the equivalent of running "make package" rather than pushing all kinds of specialized packaging logic into the buildbot itself. Autotools wizards, your assistance would be greatly appreciated.
Spreading Evergreen buildbot knowledge
To ensure that our project can survive the loss of the current master build server (or me, for that matter!), I've been committing a password-sanitized copy of the buildbot configuration to the examples directory of the OpenSRF repository. In addition to reducing the dependency on one person and one server, it also gives anyone else interested in contributing to the Evergreen buildbot the ability to easily define a build master and build slaves in a local environment.
Thursday, March 3. 2011
Creating a MARC record from scratch in PHP using File_MARC
In the past couple of days, two people have written me email essentially saying: "Dan, this File_MARC library sounds great - but I can't figure out how to create a record from scratch with it! Can you please help me?" Yes, when you're dealing with MARC, you'll quickly get all weepy and get help from anyone you can.
So, first things first - there is a really basic example that you can find in the File_MARC tests directory called marc_record_001.phpt. What, you couldn't find that? I'm not surprised, to be honest. Tests are great but when you install PEAR libraries the tests get separated from the code and you might not even know that there _are_ tests to cadge code from. So instead, here's a whack of code that should provide a good starter for you:
<?php
require 'File/MARC.php';

// Start with an empty record
$marc = new File_MARC_Record();

// 100 field: a single subfield; null indicators come out as blanks
$marc->appendField(new File_MARC_Data_Field('100', array(
        new File_MARC_Subfield('a', 'Doe, John'),
    ), null, null
));

// 245 field: three subfields, again with blank indicators
$marc->appendField(new File_MARC_Data_Field('245', array(
        new File_MARC_Subfield('a', 'Main title: '),
        new File_MARC_Subfield('b', 'subtitle'),
        new File_MARC_Subfield('c', 'author')
    ), null, null
));

// Human-readable dump of the record
print "Yes, we do pretty print\n";
print $marc . "\n";

// Binary MARC21, written to a file
print "Yes, we write MARC21";
$fh = fopen('marcy.mrc', 'w');
fwrite($fh, $marc->toRaw());
fclose($fh);
print "... written.\n\n";

// The other serializations
print "Yes, we write MARCXML\n";
print $marc->toXML() . "\n\n";

print "Yes, we write MARC-in-JSON\n";
print $marc->toJSON() . "\n\n";

print "Yes, we even write the MARC-HASH JSON serialization\n";
print $marc->toJSONHash() . "\n\n";
?>
and here's the not very exciting output...
bash$ php marcy.php
Yes, we do pretty print
LDR
100 _aDoe, John
245 _aMain title:
_bsubtitle
_cauthor
Yes, we write MARC21... written.
Yes, we write MARCXML
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>00099na 2200049 4500</leader>
    <datafield tag="100" ind1=" " ind2=" ">
      <subfield code="a">Doe, John</subfield>
    </datafield>
    <datafield tag="245" ind1=" " ind2=" ">
      <subfield code="a">Main title: </subfield>
      <subfield code="b">subtitle</subfield>
      <subfield code="c">author</subfield>
    </datafield>
  </record>
</collection>
Yes, we write MARC-in-JSON
{"leader":"00099 2200049 4500","fields":[
{"100":{"ind1":" ","ind2":" ","subfields":[
{"a":"Doe, John"}]}},
{"245":{"ind1":" ","ind2":" ","subfields":[
{"a":"Main title: "},
{"b":"subtitle"},
{"c":"author"}]}}
]}
Yes, we even write the MARC-HASH JSON serialization
{"type":"marc-hash","version":[1,0],"leader":"00099 2200049 4500",
"fields":[
["100"," "," ",[["a","Doe, John"]]],
["245"," "," ",[["a","Main title: "],["b","subtitle"],["c","author"]]]
]}
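If you want to consume one of these serializations elsewhere, the MARC-in-JSON output is probably the easiest to pick apart with nothing but PHP's built-in json_decode(). The following is only a sketch: subfieldValues() is a hypothetical helper of my own naming, not part of File_MARC, and the JSON string is simply the output shown above pasted in.

```php
<?php
// Sketch only: subfieldValues() is a hypothetical helper, not a File_MARC
// method. It walks the MARC-in-JSON structure using plain PHP arrays.

function subfieldValues(array $record, string $tag, string $code): array
{
    $values = array();
    foreach ($record['fields'] as $field) {
        // Each field is a one-entry map of tag => contents; skip other
        // tags and anything without a list of subfields.
        if (!isset($field[$tag]['subfields'])) {
            continue;
        }
        foreach ($field[$tag]['subfields'] as $subfield) {
            if (isset($subfield[$code])) {
                $values[] = $subfield[$code];
            }
        }
    }
    return $values;
}

// The MARC-in-JSON output from the example, pasted in as a string.
$json = '{"leader":"00099 2200049 4500","fields":['
      . '{"100":{"ind1":" ","ind2":" ","subfields":[{"a":"Doe, John"}]}},'
      . '{"245":{"ind1":" ","ind2":" ","subfields":['
      . '{"a":"Main title: "},{"b":"subtitle"},{"c":"author"}]}}]}';

// Decode to nested associative arrays rather than objects.
$record = json_decode($json, true);

// Stitch the 245 $a and $b subfields back into a display title.
$title = implode('', array_merge(
    subfieldValues($record, '245', 'a'),
    subfieldValues($record, '245', 'b')
));

print $title . "\n";
print implode('; ', subfieldValues($record, '100', 'a')) . "\n";
```

Asking for a tag that is not in the record just returns an empty array, which keeps calling code free of existence checks.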
Hopefully this helps. Have at it!
About Me
I'm Dan Scott: barista, library geek, and free-as-in-freedom software developer.
I hack on projects such as the Evergreen open-source ILS project and PEAR's File_MARC package.
By day I'm the Systems Librarian for Laurentian University. You can reach me by email at dan@coffeecode.net.
