Thursday, May 26. 2011

Farewell, old Google Books APIs

Since the announcement of the new v1 Google Books API, I've been doing a bit of work with it in Python (following up on my part of the conversation). Today, Google announced that many of their older APIs were now officially deprecated. Included in that list are the Google Books Data API and the Google Books JavaScript API. These APIs will be retired as of December 1, 2011. (Thanks to jgeerdes in the #googleapis IRC channel for the heads up today).

There already has been some outrage expressed over the switch to new APIs; five months is not a lot of time to shift gears if you've built a significant architecture on top of the old APIs. But I have some sympathy for Google, in this case; the new "Discovery" APIs are based on a common, consistent architecture that will be easier for them to document, maintain, manage, and ... monetize, of course. (Good time for full disclosure, I suppose: I am a Google stockholder.)

So far, the only major concern I have with the new v1 Google Books API is one missing function that was available in the Data API: the ability to do a full-text search of a custom bookshelf. Accordingly, I've filed a bug in the AJAX APIs issue tracker. Here's hoping that the deprecation of the old APIs enables Google to focus on their anointed APIs on all fronts: documentation, features, and support. Bug 587 should be a good testcase.

Monday, May 23. 2011

Reducing cached content pain after Evergreen upgrades

If you have been through an Evergreen upgrade, you know that the days after the upgrade can be painful. Users complain that the catalogue doesn't work right, there are mysterious glitches that happen on some machines and not others (even though the browser and operating systems are identical on each machine!), rebooting doesn't help... and then eventually the problem goes away. The problem isn't all that mysterious, really, it's the result of the browser caching content.
Normally, browser caching is a very positive experience: when a browser requests a file from a Web server, the Web server tells the browser how long it should hold onto the file via a Cache-Control directive. This means that if a page on your Web site is made up of dozens or hundreds of images and CSS and JavaScript files, your browser doesn't have to download every one of those files on every page you visit; as long as the file hasn't expired, the browser can just serve it up from the local cache and only the fresh content needs to be fetched from the server. It's how the Web works, and it's really important for performance reasons.

However, if your Web server has told your browser to cache files for a month, and then during that month you upgrade your Web site so that there are new JavaScript and CSS files that your fresh content depends on, then you can run into trouble until those cached files expire. And that is exactly the case that we run into with Evergreen upgrades - only the problem is amplified by how heavily the Evergreen catalogue (which is just a Web site) relies on JavaScript for basic operations. On the user side, you can handle the problem in two ways: do a hard refresh of the page, or empty the browser cache.
Neither of these user-side approaches is particularly convenient. Doing a hard refresh may work for one page, but as the user navigates to a different page that uses different CSS and JavaScript, they will have to do another hard refresh... and so on, which in the case of Evergreen means users will have to refresh around a half-dozen different pages (home page, search results, record details, account, advanced search). Hard refreshes are also not reliable, as resources fetched by XHR are not actually refreshed (this is a long-standing bug with Chrome and Firefox). If you don't know what XHR is, just know that Evergreen uses a lot of them. And emptying the browser cache is both painful (every browser has a different way of emptying browser cache) and overkill (you just want to discard the cache for one site, but most browsers will discard the cache for every site they have visited).

The "right" solution is to have the server tell the browser to fetch a new version of the resource. You could change the caching settings to be very short-lived - for example, change the cache time from one month down to one day for JavaScript and CSS - but unless you upgrade your site very frequently, that would mean that 99% of the time your users' browsers will be making unnecessary requests, and their experience of your catalogue will be that it is slower to load than other sites on the Web. Not so good. The other approach is to change the pathname for the cached resources at upgrade time so that the browser doesn't find a match in its local cache and has to fetch the new version.

There's some good news: some work has been going on in the Evergreen 2.1 release to tackle this problem, but it is not yet complete. And most sites are only looking at moving to 2.0 right now. As it happens, we made the jump from 1.6.1.8 to 2.0.6+ yesterday and boy howdy the browser cache was a problem after the upgrade, as one would expect.
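To make the pathname approach described above a little more concrete, here is a minimal Python sketch that rewrites references to JavaScript and CSS files so that a stamp appears as an extra directory in each path. It is only an illustration of the general idea, not what Evergreen 2.1 does: the regular expression, the function names, and the `/opac/skin/...` paths in the usage note are all assumptions made for the example. The Web server would also have to serve the files at the new paths (for example, by renaming directories to match), which this sketch does not attempt.

```python
import re

# Matches src= or href= attributes that point at a local .js or .css file.
# This pattern is an assumption for illustration; real templates may
# reference resources in other ways (imports, XHR calls, and so on).
RESOURCE_RE = re.compile(
    r'''(?P<attr>\b(?:src|href)=)(?P<quote>["'])(?P<path>/[^"'?#]+\.(?:js|css))(?P=quote)'''
)


def stamp_path(path, stamp):
    """Insert the stamp as a directory just before the file name."""
    directory, _, filename = path.rpartition("/")
    return f"{directory}/{stamp}/{filename}"


def bust_cache(html, stamp):
    """Return html with every matched JS/CSS path rewritten to include stamp."""
    def rewrite(match):
        new_path = stamp_path(match.group("path"), stamp)
        return f"{match.group('attr')}{match.group('quote')}{new_path}{match.group('quote')}"

    return RESOURCE_RE.sub(rewrite, html)
```

For example, calling `bust_cache(page, "20110523")` on a page that references the hypothetical path `/opac/skin/default/js/opac.js` rewrites it to `/opac/skin/default/js/20110523/opac.js`. Because the browser has never seen that path, it has nothing in its cache to match and must fetch the file from the server.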
I took a quick stab at identifying the most likely paths that needed to be refreshed and threw together some shell commands to "munge" our catalogue skins so that browsers would be forced to pick up the new versions of the content. Post-upgrade panic, I refactored those commands into a Perl script named cache-munger.pl (well, more precisely, a Perl script that generates shell commands). The Perl script has two hardcoded variables: a datestamp (which is really any uniquely identifying string that can appear in a directory name and URL) and a list of catalogue skins to munge. When you run the script, it generates a set of shell commands that you should be able to run on your Evergreen 2.0 instance to force browsers to cache the new version of your catalogue's JavaScript and CSS files.

Some limitations: I haven't written a script to convert your skins back to pristine mode (that's mostly a matter of updating the ack-grep commands and reversing the sed commands). And I haven't written a script to update a munged set of skins. And, I'm not 100% sure that I've hit every set of JavaScript and CSS that needs to be refreshed after an upgrade from 1.6 to 2.0. But it's a reasonable start, in my opinion, and hopefully it helps inform the Evergreen 2.1 effort so that we can have a standard, supported, painless means of telling browsers to fetch new resources as an automatic part of any upgrade in the future.

Monday, May 16. 2011

The new Google Books API and possibilities for libraries

On the subject of the new Google Books API that was unveiled during the Google IO 2011 conference last week, Jonathan Rochkind states:

"Once you have an API key, it can keep track of # requests for that key — it’s not clear to me if they rate limit you, and if so at what rate."

I can answer that. There's no mystery to how many queries per day you're allowed per API key; as the Google API Console shows, the default limit is 1,000 per day.
Note: default - this suggests that Google is willing to be flexible on this front. Now, I can imagine the immediate grousing response along the lines of "Good luck getting Google to respond", etc. This is one of the reasons I attended Google IO last year and this year: human contact is much more valuable than email, forum posts, or whinging blogs. I have paid for the conference and all expenses out of my own pocket each year because, as a librarian/developer, there aren't many entities that are more relevant to our overall information landscape than Google at the moment.

So, I sat in on the "Integrating to eBooks: APIs to Sell and Read eBooks for Affiliates, Retailers and Device Makers" session and took advantage of the public Q&A session at the end to ask some questions (skip ahead to 31:39 if you want to hear the questions and answers). The default limit of 1,000 queries per day per API key was a bit of a concern, as one direction that my colleague Art Rhyno has been exploring for a local federated search solution is the creation of a "bookshelf" in Google Books that represents the entire collection of the University of Windsor. There is no documentation about the limits on the size of this bookshelf, and I was told that this is because there currently is no limit. Good news to these ears. Also, I was told that the limit of 1,000 queries per day was just a starting point that could be upped, given a reasonable request.

Noting the absence of any sort of loaning feature, I asked what plans (if any) Google Books had to offer users the ability to loan purchased books. I received the expected answer ("We can't talk about future plans") but by being present at the session I was able to ensure that the question was impressed not only on the people responsible for Google Books, but also on all of the other attendees and on subsequent viewers of the online session. Baby steps, eh?
Beyond that, I was also able to talk directly with Pratip and Kevin, the speakers at the session, to further describe this particular use case that libraries have for Google Books (enabling full-text search of the bulk of their collection, whether print or electronic) and some of the possible advantages to Google. Despite their session's clear focus on selling books via affiliate links, they appeared to be genuinely open to the possibilities of partnerships with libraries. (Hey, there is even the possibility of libraries acting as affiliate sellers for Google Books and reaping revenue that way; others have done it with Amazon, so as much as I may find the practice distasteful personally, some places find it acceptable.)

So, the conversation has begun, as conversations should - person to person - and I'll report back when / if we make further progress.
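Until a higher limit is granted, a client has to live within the default quota of 1,000 queries per day per API key mentioned above. Here is a minimal Python sketch of one way to do that on the client side: build a v1 volumes search URL with a key, and count requests against a daily budget before sending them. The `DailyQuota` class and `build_search_url` function are hypothetical names invented for this example, and resetting the counter when the local date changes is an assumption; the source does not say when Google resets its own count.

```python
import datetime
from urllib.parse import urlencode

# Public search endpoint of the v1 Google Books API.
BOOKS_VOLUMES_URL = "https://www.googleapis.com/books/v1/volumes"


def build_search_url(query, api_key):
    """Build a volumes search URL carrying the query and the API key."""
    return BOOKS_VOLUMES_URL + "?" + urlencode({"q": query, "key": api_key})


class DailyQuota:
    """Client-side counter that refuses requests once a daily limit is hit.

    Assumption: the budget resets when the calendar date changes.
    """

    def __init__(self, limit=1000):
        self.limit = limit
        self.day = None
        self.used = 0

    def try_acquire(self, today=None):
        """Return True and count one request if today's budget allows it."""
        today = today or datetime.date.today()
        if today != self.day:
            # A new day has started: begin a fresh count.
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True
```

A caller would check `quota.try_acquire()` before fetching `build_search_url(...)`, and fall back to cached results or a local index when it returns `False`. The counter lives only in memory, so a long-running service would need to persist it somewhere to survive restarts.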
About Me

I'm Dan Scott: barista, library geek, and free-as-in-freedom software developer. I hack on projects such as the Evergreen open-source ILS project and PEAR's File_MARC package. By day I'm the Systems Librarian for Laurentian University. You can reach me by email at dan@coffeecode.net.
