Coffee|Code: Dan Scott's blog - evergreenhttps://coffeecode.net/2017-08-24T16:00:00-04:00Librarian · DeveloperOur nginx caching proxy setup for Evergreen2017-08-24T16:00:00-04:002017-08-24T16:00:00-04:00Dan Scotttag:coffeecode.net,2017-08-24:/our-nginx-caching-proxy-setup-for-evergreen.htmlDetails of our nginx caching proxy settings for Evergreen
<p>
A long time ago, I experimented with using nginx as a caching proxy in front of
Evergreen but never quite got it to work. Since then, a lot has changed in both
nginx and Evergreen, and Bill Erickson figured out how to get nginx to proxy
the websockets that Evergreen now needs for its web-based staff client. This
spring, as part of my work towards <a href="https://coffeecode.net/evergreen-progressive-web-app.html">building prototype offline support for the Evergreen catalogue's <em>My Account</em> section</a>, I dug in and started
figuring out some of the final pieces that are needed to enable nginx to proxy
most of the static content that Apache (with its bloated processes) would
otherwise have to serve up, and wrote a <a href="http://git.evergreen-ils.org/?p=contrib/Conifer.git;a=blob;f=Open-ILS/src/support-scripts/webserver_config.py;h=0b74e6c2f764961088ae2136793ba845fd1cff17;hb=400ce5b233d33d08badcdc0887e97e748020301b">configuration generator script</a> for the
nginx and Apache pieces. And in July, we went live with the configuration.
</p>
<p>
This post documents what we are currently running (as of August 2017) on our
Evergreen 2.12 server with Ubuntu 16.04. If you have any questions about this
or our corresponding Apache configuration, please let me know and I'll attempt
to answer them!
</p>
<h2>/etc/nginx/sites-enabled/evergreen.conf</h2>
<p>
This is the core configuration for the nginx server:
</p>
<pre>proxy_cache_path /tmp/nginx_cache levels=1:2 keys_zone=my_cache:10m max_size=1g
                 inactive=60m use_temp_path=off;
proxy_cache_key $scheme$http_host$request_uri;

server {
    listen 80;
    server_name clients.concat.ca;
    include /etc/nginx/concat_ssl.conf;
    include /etc/nginx/osrf_sockets.conf;
    location / {
        proxy_pass https://localhost:7443;
        rewrite ^/?$ /updates/manualupdate.html permanent;
        include /etc/nginx/concat_headers.conf;
    }
}</pre>
<ul>
<li>
The <code><a href="http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_path">proxy_cache_path</a></code>
directive tells nginx where to store the data it is caching, what kind
of directory structure it should create (<em>levels</em>), the name of
the shared memory zone to use (<em>keys_zone</em>), the maximum size of
the disk cache (<em>max_size</em>), how long to retain a cached copy of
the file (<em>inactive</em>), and whether to use the value of the
<code><a href="http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_temp_path">proxy_temp_path</a></code>
directive as a parent directory for the cache.
</li>
<li>
The <code><a href="http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_key">proxy_cache_key</a></code>
directive tells nginx to use a combination of the request scheme (typically
HTTP or HTTPS), the hostname, and the full request URI (including GET
arguments) to store and look up the cached data. Apache's response
tells nginx how long the response should be cached: whether it should
expire immediately or, as of
<a href="https://bugs.launchpad.net/evergreen/+bug/1681095">#1681095 "Extend browser cache-busting support"</a>,
be cached for a full year for images, JavaScript, and CSS (at least until
you run <code>autogen.sh</code> again).
</li>
<li>
We currently include one <code>server</code> directive per hostname
that we support, which is quite repetitive. Looking at this with fresh
eyes, we should probably simply use something like <code>server_name
*.concat.ca</code> to cover all of our hostnames on our domain with a
single directive; see the sketch after this list.
</li>
<li>
In this block, we only listen to port 80, which seems odd given that
we're an HTTPS-only site. Read on!
</li>
<li>
<code>include /etc/nginx/concat_ssl.conf;</code> keeps all of the
TLS-related configuration in one place, including listening to port
443. We'll pry open this file later.
</li>
<li>
<code>include /etc/nginx/osrf_sockets.conf;</code> keeps all of the
OpenSRF websockets translator proxy configuration in one place. We'll
also pry open this file later.
</li>
<li>
The <code>location /</code> block handles the proxying. At first I was nervous
and wanted to proxy the actual hostname instead of
<code>localhost</code> to ensure we got the right templates, etc, but
it turns out the proxy headers guide the request to the right host. So
now I'm relaxed and we simply pass the request on to
<code>https://localhost:7443</code>. Be very careful with those trailing
slashes!
</li>
</ul>
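<p>
To make the consolidation concrete, here's a minimal sketch (hypothetical;
we haven't yet run this through our configuration generator) of a single
<code>server</code> block with a wildcard <code>server_name</code>:
</p>
<pre>server {
    listen 80;
    server_name *.concat.ca;
    include /etc/nginx/concat_ssl.conf;
    include /etc/nginx/osrf_sockets.conf;
    location / {
        proxy_pass https://localhost:7443;
        include /etc/nginx/concat_headers.conf;
    }
}</pre>
<p>
Two caveats: the TLS certificate would need to cover the wildcard, and the
per-host <code>rewrite</code> of the bare hostname (to
<code>/updates/manualupdate.html</code> on the staff-client host, versus
<code>/eg/opac/home</code> in <code>concat_headers.conf</code>) would need to
be handled some other way.
</p>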
<h2>/etc/nginx/concat_ssl.conf</h2>
<pre>listen 443 ssl http2;
ssl_certificate /etc/apache2/ssl/server.crt;
ssl_certificate_key /etc/apache2/ssl/server.key;
if ($scheme != "https") {
    return 301 https://$host$request_uri;
}
# generate with openssl dhparam -out dhparams.pem 2048
ssl_dhparam /etc/apache2/dhparams.pem;
# From https://mozilla.github.io/server-side-tls/ssl-config-generator/
ssl_prefer_server_ciphers on;
ssl_session_timeout 1d;
ssl_session_cache shared:SSL:50m;
ssl_session_tickets off;
# intermediate configuration. tweak to your needs.
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_ciphers 'ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES256-SHA:ECDHE-ECDSA-DES-CBC3-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA:!DSS';
# HSTS (ngx_http_headers_module is required) (15768000 seconds = 6 months)
add_header Strict-Transport-Security max-age=15768000;
# OCSP Stapling ---
# fetch OCSP records from URL in ssl_certificate and cache them
ssl_stapling on;
ssl_stapling_verify on;</pre>
<p>
There's a fair bit going on here, but it's almost entirely related to TLS
support and a lot of the content comes either from the
<a href="https://mozilla.github.io/server-side-tls/ssl-config-generator/">Mozilla TLS configuration generator</a>
or from Certbot's configuration plugin for nginx. Perhaps most interesting
is the <code>listen 443 ssl http2;</code> line that enables listening on
the standard HTTPS port and also supports <a href="https://http2.github.io/">HTTP/2</a>
for browsers that support it--effectively a way to use a single connection
from a browser to a server to issue many parallel requests for resources,
amongst other performance enhancements.
</p>
<p>
We also force any HTTP request to use an HTTPS connection using the
<code>if ($scheme != "https") {</code> block.
</p>
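<p>
As an aside, nginx offers a more idiomatic way to express that redirect: a
dedicated catch-all <code>server</code> block for port 80. We use the
<code>if</code> test because it keeps everything for a host in one
<code>server</code> block, but a sketch of the alternative (hypothetical; not
what we actually run) would look like:
</p>
<pre># redirect every plain-HTTP request to its HTTPS equivalent
server {
    listen 80 default_server;
    server_name _;
    return 301 https://$host$request_uri;
}</pre>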
<h2>/etc/nginx/osrf_sockets.conf</h2>
<p>
This is extracted from the <a href="http://git.evergreen-ils.org/?p=OpenSRF.git;a=blob;f=examples/nginx/osrf-ws-http-proxy;h=d079230c62c3580c25435413570a6cc95b4bbd8a;hb=refs/heads/master">sample nginx configuration</a> shipped with OpenSRF:
</p>
<pre>location /osrf-websocket-translator {
    proxy_pass https://localhost:7682;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    # Needed for websockets proxying.
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";

    # Raise the default nginx proxy timeout values to an arbitrarily
    # high value so that we can leverage osrf-websocket-translator's
    # timeout settings.
    proxy_connect_timeout 5m;
    proxy_send_timeout 1h;
    proxy_read_timeout 1h;
}</pre>
<h2>/etc/nginx/concat_headers.conf</h2>
<p>
This is not perfectly named; while we do set up the proxy headers in this
file, we also include some of the other statements we would otherwise have
to repeat inside the <code>server</code> block. Here's what the contents
look like:
</p>
<pre>proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_cache my_cache;
proxy_cache_use_stale error timeout http_500 http_502 http_503 http_504;
proxy_cache_lock on;
rewrite ^/?$ /eg/opac/home permanent;</pre>
<ul>
<li>
The <code><a href="http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_set_header">proxy_set_header</a></code>
directive adds headers to the requests forwarded to Apache, so that Apache
can figure out which host was actually requested, accurately log requests
(instead of saying everything is coming from <code>localhost</code>),
etc. These directives were copied directly from the <a href="http://git.evergreen-ils.org/?p=OpenSRF.git;a=blob;f=examples/nginx/osrf-ws-http-proxy;h=d079230c62c3580c25435413570a6cc95b4bbd8a;hb=refs/heads/master">sample nginx configuration</a> shipped with OpenSRF.
</li>
<li>
<code><a href="http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache">proxy_cache</a></code>
tells this server to use the cache we previously named in our <code>keys_zone</code> parameter (a sketch for verifying that the cache is working follows this list).
</li>
<li>
<code><a href="http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_use_stale">proxy_cache_use_stale</a></code>
tells this server to return stale data (if it has a cached copy) if Apache
returns an error or a timeout or any of the specified HTTP status codes
while trying to fetch a fresh copy.
</li>
<li>
<code><a href="http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_lock">proxy_cache_lock</a></code>
tells this server, when multiple identical requests arrive for data that
needs to be cached or refreshed, to allow only a single request through
to Apache while the other requests wait. This can be one
way to avoid the "someone set a book down on a keyboard and caused 100
identical requests in one second" problem.
</li>
<li>
The <code>rewrite</code> simply directs the request for a bare hostname
(with or without a trailing slash) to the catalogue home page.
</li>
</ul>
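<p>
To verify that the cache is actually doing its job, a handy addition to this
file (a stock nginx facility, though hypothetical for our setup--we haven't
deployed it) is a header exposing nginx's
<code>$upstream_cache_status</code> variable:
</p>
<pre>add_header X-Cache-Status $upstream_cache_status;</pre>
<p>
With that in place, <code>curl -sI</code> against a cacheable URL should
report <code>X-Cache-Status: MISS</code> on the first request and
<code>HIT</code> on subsequent requests, until the cached copy expires.
</p>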
Enriching catalogue pages in Evergreen with Wikidata2017-08-12T16:00:00-04:002017-08-12T16:00:00-04:00Dan Scotttag:coffeecode.net,2017-08-12:/enriching-catalogue-pages-in-evergreen-with-wikidata.htmlAn openly licensed JavaScript widget that enriches library catalogues with Wikidata data
<p>
I'm part of the Music in Canada @ 150 Wikimedia project, organizing wiki
edit-a-thons across Canada to help improve the presence of Canadian music and
musicians in projects like Wikipedia, Wikidata, and Wikimedia Commons. It's
going to be awesome, and it's why I invested time in developing and delivering
the <a href="https://coffeecode.net/wikidata-workshop-for-librarians.html"><em>Wikidata
for Librarians</em></a> presentation at the
<abbr title="Canadian Association of Music Libraries, Archives, and Document Centres">CAML</abbr> preconference.
</p>
<p>
Right now I'm at the Wikimania 2017 conference, because it is being held in
Montréal--just down the road from me when you consider it is an international
affair. The first two days were almost entirely devoted to a massive hackathon
with hundreds of participants and a very welcoming, friendly ambiance.
It was inspiring, and I joined in a couple of activities:
</p>
<ul>
<li>installing Wikibase--the technical foundation for Wikidata--from scratch</li>
<li>an ad-hoc data modelling session with <a href="https://meta.wikimedia.org/wiki/User:Ainali">Jan</a> and
<a href="https://meta.wikimedia.org/wiki/User:smallison">Stacy
Allison-Cassin</a> that resulted in enhancing the periodicals structure on
Wikidata</li>
</ul>
<p>
But I also had the itch to revisit and enhance the JavaScript widget that runs
in our Evergreen catalogue and delivers on-demand cards of additional
metadata about contributors to recorded works. I had originally developed the widget
as a proof-of-concept for the potential value to cultural institutions of
contributing data to Wikidata--bearing in mind a challenge put to the room at
an Evergreen 2017 conference session that asked what tangible value linked open
data offers--but it was quite limited:
</p>
<ul>
<li>it would only show a card for the first listed contributor to the work</li>
<li>it was hastily coded, and thus duplicated code, used shortcuts, and had no comments</li>
<li>the user interface was poorly designed</li>
<li>it was not explicitly licensed for reuse</li>
</ul>
<p>
So I spent some of my hackathon time (and some extra time stolen from various sessions)
fixing those problems--so now, when you look at the <a href="https://laurentian.concat.ca/eg/opac/record/738234">catalogue record
for a musical recording by the most excellent Canadian band Rush</a>, you will find
that each of the contributors to the album has a musical note (♩) which, when clicked,
displays a card based on the data returned from Wikidata using <a href="http://tinyurl.com/ya8gxcpo">a SPARQL query</a> matching the
contributor's name (limited in scope to bands and musicians to avoid too many
ambiguous results).
</p>
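<p>
The shortened link above has the real query; purely as illustration, a
minimal sketch of the general shape of such a lookup (not necessarily the
exact query we run) against the Wikidata Query Service looks like:
</p>
<pre>SELECT ?subject ?subjectLabel ?subjectDescription WHERE {
  ?subject rdfs:label "Geddy Lee"@en .
  # restrict matches to humans (Q5) or bands (Q215380)
  { ?subject wdt:P31 wd:Q5 . } UNION { ?subject wdt:P31 wd:Q215380 . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}</pre>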
<p>
<img src="/uploads/pics/rush_wikidata_enriched.png" />
</p>
<p>
I'm not done yet: the design is still very basic, but I'm happier about the code quality
and it now supports queries for all of the contributors to a given album. It is also
licensed for reuse under the GPL version 2 or later license, so as long as you
can load the script in your catalogue and tweak a few CSS query selector
statements to identify where the script should find contributor names and where
it should place the cards, it should theoretically be usable in any catalogue of musical
recordings. And with the clear <em>"Edit on Wikidata"</em> link, I hope that it
encourages users to jump in and contribute if they find one of their favourite performers
lacks (or shows incorrect!) information.
</p>
<p>
You can find the code on the <a href="https://goo.gl/XvEemr">Evergreen contributor git repository</a>.
</p>
Evergreen as a Progressive Web App?2017-04-14T00:35:00-04:002017-04-14T00:35:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2017-04-14:/evergreen-progressive-web-app.html<p><a class="reference external" href="https://developer.mozilla.org/en-US/Apps/Progressive">Progressive Web Apps</a>
are pretty cool, and for good reason: the idea is to take advantage of the
advanced features of our web browsers to provide capabilities that rival native
apps, while still offering good performance and functionality to users of other
browsers.</p>
<p>However, if you've done much reading about PWAs, you could be forgiven for
thinking they require a client-side JavaScript framework like <a class="reference external" href="https://facebook.github.io/react/">React</a> or <a class="reference external" href="https://angular.io">Angular</a> to
be possible. So last week at the <a class="reference external" href="https://evergreen-ils.org/conference/2017-evergreen-international-conference/">2017 Evergreen International Conference</a>,
I demonstrated that it <em>is</em> possible to graft PWA attributes onto Evergreen's
classic Perl-based Template Toolkit web architecture--to the point of scoring
100/100 on Google's <a class="reference external" href="https://developers.google.com/web/tools/lighthouse/">Lighthouse web site audit tool</a> (from a baseline of
37/100).</p>
<p>You might enjoy my presentation, <a class="reference external" href="https://stuff.coffeecode.net/2017/evergreen-progressive-web-app/">We aim to misbehave - Evergreen: Progressive
Web App</a> (yes,
that's a <em>Firefly</em> reference), or you might enjoy poking around the code I posted
in the <a class="reference external" href="http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/dbs/progressive_web_app_example">corresponding branch</a>.
Check out the new <a class="reference external" href="http://git.evergreen-ils.org/?p=working/Evergreen.git;a=tree;f=Open-ILS/examples/pwa;h=278a2eafa948bbf09d164617f337346127fc7a7a;hb=fc584eddbe695e08befed621802aaea281019c9a">pwa examples directory</a>
for a README and the core examples.</p>
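<p>
The branch above has the real changes; as a generic illustration of the
service-worker piece of a PWA (the cached path is a placeholder, and this is
not the code in the branch), a cache-first worker can be as small as:
</p>
<pre>// a generic cache-first service worker sketch
var CACHE = 'eg-static-v1';
self.addEventListener('install', function (event) {
  // pre-cache an entry point; the path is a placeholder
  event.waitUntil(caches.open(CACHE).then(function (cache) {
    return cache.addAll(['/eg/opac/home']);
  }));
});
self.addEventListener('fetch', function (event) {
  // serve from the cache when we can, fall back to the network
  event.respondWith(
    caches.match(event.request).then(function (cached) {
      return cached || fetch(event.request);
    })
  );
});</pre>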
<p>It's far from perfect at this point, but as a proof of concept, I'm quite
pleased, and I think it offers a possible vision of the way forward, particularly
for the <strong>My Account</strong> section of the public catalogue, which really deserves to
become its own app. If nothing else, it has refocused attention on enhancing
Evergreen's web performance, and that can only be a good thing, right?</p>
Querying Evergreen from Google Sheets with custom functions via Apps Script2016-04-15T18:36:00-04:002016-04-15T18:36:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2016-04-15:/querying-evergreen-from-google-sheets-with-custom-functions-via-apps-script.html<p>Our staff were recently asked to check thousands of ISBNs to find out if
we already have the corresponding books in our catalogue. They in turn
asked me if I could run a script that would check it for them. It makes
me happy to work with people who believe in <em>better living through
automation</em> (and saving their time to focus on tasks that only humans
can really achieve).</p>
<p>Rather than taking the approach that I normally would, which would be to
just load the ISBNs into a table in our Evergreen database and then run
some queries to take care of the task as a one-off, I opted to try for
an approach that would enable others to run these sorts of ad hoc reports
themselves. As with most libraries, I suspect, we work with spreadsheets
a lot--and as our university has adopted Google Apps for Education, we
are slowly using Google Sheets more to enable collaboration. So I was
interested in figuring out how to build a custom function that would
look for the ISBN and then return a simple "Yes" or "No" value according
to what it finds.</p>
<p>Evergreen has a robust SRU interface, which makes it easy to run complex
queries and get predictable output back, and it normalizes ISBNs in the
index so that a search for a 10-digit ISBN will return results for the
corresponding 13-digit ISBN. That made figuring out the lookup part of
the job easy; after that, I just needed to figure out how to create a
custom function in Google Sheets.</p>
<p>As it turns out, there's a dead-simple <a class="reference external" href="https://developers.google.com/apps-script/quickstart/macros">introductory tutorial for
creating a custom function in Apps
Script</a>
which tells you how to create a new function. And to make a call to a
web service, there's the
<a class="reference external" href="https://developers.google.com/apps-script/reference/url-fetch/url-fetch-app">URLFetchApp</a>
class. After that, it's a matter of basic JavaScript. In the end, my
custom function looks like the following:</p>
<div class="highlight"><pre><span></span><span class="cm">/**</span>
<span class="cm">* A custom function that checks for an ISBN in Evergreen</span>
<span class="cm">*</span>
<span class="cm">* Returns "Yes" if there is a match, or "No" if there is no match</span>
<span class="cm">*/</span>
<span class="kd">function</span> <span class="nx">checkForISBN</span><span class="p">(</span><span class="nx">isbn</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">hostname</span> <span class="o">=</span> <span class="s1">'https://example.org'</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">urlBase</span> <span class="o">=</span> <span class="nx">hostname</span> <span class="o">+</span> <span class="s1">'/opac/extras/sru'</span><span class="p">;</span>
<span class="cm">/* Supply a numeric or shortname library identifier</span>
<span class="cm"> * to restrict the search to that part of the organization</span>
<span class="cm"> */</span>
<span class="kd">var</span> <span class="nx">libraryID</span> <span class="o">=</span> <span class="s1">'103'</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">libraryID</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">urlBase</span> <span class="o">+=</span> <span class="s1">'/'</span> <span class="o">+</span> <span class="nx">libraryID</span><span class="p">;</span>
<span class="p">}</span>
<span class="nx">urlBase</span> <span class="o">+=</span> <span class="s1">'?version=1.1&operation=searchRetrieve&maximumRecords=1&query='</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">q</span> <span class="o">=</span> <span class="nb">encodeURIComponent</span><span class="p">(</span><span class="s1">'identifier|isbn:'</span> <span class="o">+</span> <span class="nx">isbn</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">url</span> <span class="o">=</span> <span class="nx">urlBase</span> <span class="o">+</span> <span class="nx">q</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">response</span> <span class="o">=</span> <span class="nx">UrlFetchApp</span><span class="p">.</span><span class="nx">fetch</span><span class="p">(</span><span class="nx">url</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">response</span><span class="p">.</span><span class="nx">getContentText</span><span class="p">().</span><span class="nx">search</span><span class="p">(</span><span class="s1">'1'</span><span class="p">)</span> <span class="o">></span> <span class="o">-</span><span class="mf">1</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="s2">"Yes"</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="s2">"No"</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>Then I just add a column beside the column with ISBN values and invoke
the function as (for example) <tt class="docutils literal">=CheckForISBN(C2)</tt>.</p>
<p><img alt="CheckForISBN() function being invoked in a Google Sheet" class="serendipity-image-center" src="/uploads/pics/check_for_isbn.png" style="width: 462px; height: 346px;" /></p>
<p>Given a bit more time, it would be easy to tweak the function to make it
more robust, offer variant search types, and contribute it as a module
to the <a class="reference external" href="https://chrome.google.com/webstore">Chrome Web Store</a> "Sheet
Add-ons" section, but for now I thought you might be interested in it.</p>
<p><strong>Caveats</strong>: With thousands of ISBNs to check, occasionally you'll get
an HTTP response error ("<tt class="docutils literal">#ERROR</tt>") in the column. You can just paste
the formula back in again and it will resubmit the query. The sheet also
seems to resubmit the request on a periodic basis, so some of your "Yes"
or "No" values might change to "<tt class="docutils literal">#ERROR</tt>" as a result.</p>
Library catalogues and HTTP status codes2014-12-29T16:50:00-05:002014-12-29T16:50:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2014-12-29:/library-catalogues-and-http-status-codes.html<p>I noticed in Google's <a class="reference external" href="https://www.google.com/webmasters/tools/">Webmaster
Tools</a> that our catalogue
had been returning some <em>Soft 404s</em>. Curious, I checked into some of the
URIs suffering from this condition, and realized that Evergreen returns
an HTTP status code of <tt class="docutils literal">200 OK</tt> when it serves up a record details
page for a record that has been deleted. The HTML itself has a nice big
red alert box warning users that the record has been deleted to help
humans realize that what was once there is no longer, but machines
typically don't read English. However, at some point in the past few
months, Google started parsing the HTML and recognizing when HTTP status
codes are misleading.</p>
<p>That led me to wonder what happens when you request a record detail page
by ID for a record that doesn't exist in Evergreen. As it turns out, it
currently returns HTTP status code <tt class="docutils literal">200</tt> with a detail page devoid of
any details. Also not good! Being a good little Evergreen community
member, I <a class="reference external" href="https://bugs.launchpad.net/evergreen/+bug/1406025">opened a
bug</a> and put
together a fairly simple fix so that the catalogue will return a
<tt class="docutils literal">404 Not Found</tt> for non-existent records and <tt class="docutils literal">410 Gone</tt> for deleted
records. Huzzah for HTTP standards compliance. We build a better web one
small step at a time.</p>
<p>That, in turn, led me to wonder what happens when you request record
details for non-existent records in other library systems. Here's what I
found:</p>
<ul class="simple">
<li><strong>Bibliocommons</strong>: Status <tt class="docutils literal">302 Moved temporarily</tt> that then leads
back to an empty search form. Not good.</li>
<li><strong>Blacklight</strong>: Status <tt class="docutils literal">404 Not Found</tt>. Good!</li>
<li><strong>Encore</strong>: N/A - appears to serve up session-based URLs for records.
Really?</li>
<li><strong>III</strong>: Status <tt class="docutils literal">200 OK</tt>. Not good.</li>
<li><strong>Koha</strong>: Status <tt class="docutils literal">302 Found</tt> with a <tt class="docutils literal">Location:</tt> header leading to
a page with a status <tt class="docutils literal">404 Not Found</tt>. That redirect probably makes
it harder for machines to recognize that the resource does not exist
at all than if it directly returned a <tt class="docutils literal">404</tt>.</li>
<li><strong>Polaris</strong>: N/A - it seems that the normal web interface doesn't
link directly to titles; instead it serves up titles in the context
of search results by position. The mobile web interface offers
persistent URLs, but requests for non-existent records return a
status <tt class="docutils literal">302 Found</tt> that redirects back to an empty search form. Not
good.</li>
<li><strong>Primo (using a permalink)</strong>: Status <tt class="docutils literal">302 Found</tt> that then leads
to an empty record details page with a status <tt class="docutils literal">200 OK</tt>. Not good.</li>
<li><strong>Symphony</strong>: N/A - I tried a few systems (Houston Public Library,
Oxnard Public Library) and it seems SirsiDynix still doesn't use
persistent URLs, nor surface permalinks for records in the default
interface.</li>
<li><strong>Voyager</strong>: Status <tt class="docutils literal">200 OK</tt>. Not good.</li>
<li><strong>VuFind</strong>: Status <tt class="docutils literal">404 Not Found</tt>. Good!</li>
<li><strong>WorldCat</strong>: Status <tt class="docutils literal">200 OK</tt>. Not good.</li>
</ul>
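<p>
If you want to repeat this survey against your own catalogue, a quick check
(the hostname and the deliberately non-existent record ID below are just
placeholders) looks like:
</p>
<pre>curl -s -o /dev/null -w "%{http_code}\n" "https://example.org/eg/opac/record/999999999"</pre>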
<p>Overall, this is a pretty dismal picture of the state of some of the
most commonly used library catalogue systems when it comes to compliance
with basic web standards. Kudos to Blacklight and VuFind for getting it
right--and assuming that my branch gets integrated, Evergreen should
join them in the near future.</p>
<p><img alt="404 Library Catalogue Web Standards Compliance Not Found" src="/uploads/files/404-web-standards-compliance.png" /></p>
Putting the "Web" back into Semantic Web in Libraries 20142014-12-04T21:15:00-05:002014-12-04T21:15:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2014-12-04:/putting-the-web-back-into-semantic-web-in-libraries-2014.html<p>I was honoured to lead a workshop and speak at this year's edition of <a class="reference external" href="http://swib.org/swib14">Semantic
Web in Bibliotheken (SWIB)</a> in Bonn, Germany. It was
an amazing experience; there were so many rich projects being described with
obvious dividends for the users of libraries, once again the European library
community fills …</p><p>I was honoured to lead a workshop and speak at this year's edition of <a class="reference external" href="http://swib.org/swib14">Semantic
Web in Bibliotheken (SWIB)</a> in Bonn, Germany. It was
an amazing experience; there were so many rich projects being described with
obvious dividends for the users of libraries. Once again, the European library
community fills me with hope for the future success of the semantic web.</p>
<p>The subject of my talk "Cataloguing for the open web with RDFa and schema.org"
(<a class="reference external" href="/swib14/talk">slides</a> and
<a class="reference external" href="http://www.scivee.tv/node/63282">video recording</a> - <em>gulp</em>) pivoted while
I was preparing materials for the workshop. I was searching library catalogues
around Bonn looking for a catalogue with persistent URIs that I could use for
an example. To my surprise, catalogue after catalogue used session-based URLs;
it took me quite some time before I was able to find ULB, which hosted a
VuFind front end for their catalogue. Even then, the <tt class="docutils literal">robots.txt</tt> restricted
crawling by any user agent. This reminded me rather depressingly of my findings
from current "discovery layers", which entirely restrict crawling and therefore
put libraries into a black hole on the web.</p>
<p>These findings in the wild are so antithetical to the basic principles of
enabling discovery of web resources that, in a conference about the semantic
web, I opted to spend over half of my talk making the argument that libraries
need to pay attention to the old-fashioned web of documents first and foremost.</p>
<p>The basic building blocks that I advocated were, in priority order:</p>
<ul class="simple">
<li>Persistent URIs, on which everything else is built</li>
<li>Sitemaps, to facilitate discovery of your resources</li>
<li>A robots.txt file to filter portions of your website that should not
be crawled (for example, search results pages; see the sketch after this list)</li>
<li>RDFa, microdata, or JSON-LD only after you've sorted out the first
three</li>
</ul>
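<p>
A minimal robots.txt sketch for an Evergreen-style catalogue (the hostname is
a placeholder; the paths mirror Evergreen's <code>/eg/opac</code> structure)
might look like:
</p>
<pre>User-agent: *
# keep crawlers out of search results pages
Disallow: /eg/opac/results
# record detail pages remain crawlable by default
Sitemap: https://example.org/sitemap.xml</pre>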
<p>Only after setting that foundation did I feel comfortable launching into my
rationale for RDFa and schema.org as a tool for enabling discovery on the web:
a mapping of the access points that cataloguers create to the world of HTML and
aggregators. The key point for SWIB was that RDFa and schema.org can enable
full RDF expressions in HTML; that is, we can, should, and must go beyond
surfacing structured data to surfacing linked data through <tt class="docutils literal">@resource</tt>
attributes and <a class="reference external" href="http://schema.org/sameAs">schema:sameAs</a> properties.</p>
<blockquote>
The Semantic Web is an extension of the current web in which information is
given well-defined meaning, better enabling computers and people to work in
cooperation. --Tim Berners-Lee, Scientific American, 2001</blockquote>
<p>I also argued that using RDFa to enrich the document web was, in fact,
truer to Berners-Lee's 2001 definition of the semantic web, and that we should
focus on enriching the document web so that both humans and machines can
benefit before investing in building an entirely separate and disconnected
semantic web.</p>
<p>I was worried that my talk would not be well received; that it would be
considered obvious, or scolding, or just plain off-topic. But to my relief I
received a great deal of positive feedback. And on the next day, both Eric
Miller and Richard Wallis gave talks on a similar, but more refined, theme:
that libraries need to do a much, much better job of enabling their resources
to be found on the web--not by people who already use our catalogues, but by
people who are <em>not</em> library users today.</p>
<p>There were also some requests for clarification, which I'll try to address
generally here (for the benefit of anyone who wasn't able to talk with me, or
who might watch the livestream in the future).</p>
<div class="section" id="when-you-said-anything-could-be-described-in-schema-org-did-you-mean-we-should-throw-out-marc-and-bibframe-and-ead">
<h2>"When you said anything could be described in schema.org, did you mean we should throw out MARC and BIBFRAME and EAD?"</h2>
<p><em>tldr:</em> I intended <strong>and</strong>, not <strong>instead of</strong>!</p>
<p>The first question I was asked was whether there was anything that I had not
been able to describe in schema.org, to which I answered "No"--especially given
the work that the W3C SchemaBibEx group had done to ensure that some of the
core bibliographic requirements were added to the vocabulary. It was not as
coherent or full a response as I would have liked to have made; I blame the
livestream camera <img alt=":-)" class="emoticon" src="/images/smile.png" /></p>
<p>But combined with a part of the presentation where I countered a myth about
schema.org being a very coarse vocabulary by pointing out that it actually
contained 600 classes and over 800 properties, a number of the attendees
interpreted one of the takeaways of my talk as suggesting that libraries should
adopt schema.org as <em>the</em> descriptive vocabulary, and that MARC, BIBFRAME, EAD,
RAD, RDA, and other approaches for describing library resources were no longer
necessary.</p>
<p>This is not at all what I'm advocating! To expand on my response, you <em>can</em>
describe anything in schema.org, but you might lose significant amounts of
richness in your description. For example, short stories and poems would best
be described in schema.org as a <a class="reference external" href="http://schema.org/CreativeWork">CreativeWork</a>. You would have to look at the associated
description or keyword properties to be able to figure out the form of the
work.</p>
<p>What I was advocating was that you should map your rich bibliographic
description into corresponding schema.org classes and properties in RDFa at the
time you generate the HTML representation of that resource and its associated
entities. So your poem might be represented as a
<a class="reference external" href="http://schema.org/CreativeWork">CreativeWork</a>, with <a class="reference external" href="http://schema.org/name">name</a>, <a class="reference external" href="http://schema.org/author">author</a>,
<a class="reference external" href="http://schema.org/description">description</a>, <a class="reference external" href="http://schema.org/keywords">keywords</a>, and <a class="reference external" href="http://schema.org/about">about</a> values
and relationships. Ideally, the <tt class="docutils literal">author</tt> will include at least one link
(either via <a class="reference external" href="http://schema.org/sameAs">sameAs</a>, <a class="reference external" href="http://schema.org/url">url</a>, or <tt class="docutils literal">@resource</tt>) to an entity on the web; and you
could do the same with <tt class="docutils literal">about</tt> if you are using a controlled vocabulary.</p>
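<p>
To make that mapping concrete, here is a minimal RDFa sketch of such a poem
(every URI below is a placeholder, not a real identifier):
</p>
<pre>&lt;div vocab="http://schema.org/" typeof="CreativeWork" resource="#work"&gt;
  &lt;h2 property="name"&gt;An Example Poem&lt;/h2&gt;
  &lt;span property="author" typeof="Person" resource="https://example.org/person/1"&gt;
    &lt;span property="name"&gt;Jane Example&lt;/span&gt;
    &lt;link property="sameAs" href="https://example.org/authorities/jane-example" /&gt;
  &lt;/span&gt;
  &lt;div property="description"&gt;A short poem about examples.&lt;/div&gt;
  &lt;meta property="keywords" content="poetry" /&gt;
  &lt;link property="about" href="https://example.org/subjects/poetry" /&gt;
&lt;/div&gt;</pre>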
<p>If you take that approach, then you can serve up schema.org descriptions of
works in HTML that most web-oriented clients will understand (such as search
engines) and provide basic access points such as name / author / keywords,
while retaining and maintaining the full richness of the underlying
bibliographic description--and potentially providing access to that, too, as
part of the embedded RDFa, via content negotiation, or <tt class="docutils literal">&lt;link rel=""&gt;</tt>, for
clients that can interpret richer formats.</p>
</div>
<div class="section" id="what-makes-you-think-google-will-want-to-surface-library-holdings-in-search-results">
<h2>"What makes you think Google will want to surface library holdings in search results?"</h2>
<p>There is a perception that Google and other search engines just want to sell
ads, or their own products (such as Google Books). While Google certainly does
want to sell ads and products, they also want to be the most useful tool for
satisfying users' information needs--possibly so they can learn more about
those users and put more effective ads in front of them--but nonetheless, the
motivation is there.</p>
<p>Imagine marking up your resources with the Product / Offer portion of
schema.org you are able to provide search engines with availability information
in the same way that Best Buy, AbeBooks, and other online retailers do (as
Evergreen, Koha, and VuFind already do). That makes it much easier for the
search engines to use everything they may know about their users, such as their
current location, their institutional affiliations, their typical commuting
patterns, their reading and research preferences... to provide a link to a
library's electronic or print copy of a given resource in a knowledge graph box
as one of the possible ways of satisfying that person's information needs.</p>
<p>We don't see it happening with libraries running Evergreen, Koha, and VuFind
yet, realistically because the open source library systems don't have enough
penetration to make it worth a search engine's effort to add that to their set
of possible sources. However, if we as an industry make a concerted effort to
implement this as a standard part of crawlable catalogue or discovery record
detail pages, then it wouldn't surprise me in the least to see such suggestions
start to appear. The best proof that we have that Google, at least, is
interested in supporting discovery of library resources is the continued
investment in Google Scholar.</p>
<p>And as I argued during my talk, even if the search engines never add direct
links to library resources from search results or knowledge graph sidebars,
having a reasonably simple standard like the GoodRelations product / offer
pattern for resource availability enables new web-based approaches for building
applications. One example could be a fulfillment system that uses sitemaps to
intelligently crawl all of its participating libraries, normalizes the item
request to a work URI, and checks availability by parsing the offers at the
corresponding URIs.</p>
</div>
Putting the "Web" back into Semantic Web in Libraries 20142014-12-04T21:15:00-05:002014-12-04T21:15:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2014-12-04:/putting-the-web-back-into-semantic-web-in-libraries-2014.html<p>I was honoured to lead a workshop and speak at this year's edition of <a class="reference external" href="http://swib.org/swib14">Semantic
Web in Bibliotheken (SWIB)</a> in Bonn, Germany. It was
an amazing experience; there were so many rich projects being described with
obvious dividends for the users of libraries, once again the European library
community fills …</p><p>I was honoured to lead a workshop and speak at this year's edition of <a class="reference external" href="http://swib.org/swib14">Semantic
Web in Bibliotheken (SWIB)</a> in Bonn, Germany. It was
an amazing experience; there were so many rich projects being described with
obvious dividends for the users of libraries, once again the European library
community fills me with hope for the future success of the semantic web.</p>
<p>The subject of my talk "Cataloguing for the open web with RDFa and schema.org"
(<a class="reference external" href="/swib14/talk">slides</a> and
<a class="reference external" href="http://www.scivee.tv/node/63282">video recording</a> - <em>gulp</em>) pivoted while
I was preparing materials for the workshop. I was searching library catalogues
around Bonn looking for a catalogue with persistent URIs that I could use for
an example. To my surprise, catalogue after catalogue used session-based URLs;
it took me quite some time before I was able to find ULB, who had hosted a
VuFind front end for their catalogue. Even then, the <tt class="docutils literal">robots.txt</tt> restricted
crawling by any user agent. This reminded me rather depressingly of my findings
from current "discovery layers", which entirely restrict crawling and therefore
put libraries into a black hole on the web.</p>
<p>These findings in the wild are so antithetical to the basic principles of
enabling discovery of web resources that, in a conference about the semantic
web, I opted to spend over half of my talk making the argument that libraries
need to pay attention to the old-fashioned web of documents first and foremost.</p>
<p>The basic building blocks that I advocated were, in priority order:</p>
<ul class="simple">
<li>Persistent URIs, on which everything else is built</li>
<li>Sitemaps, to facilitate discovery of your resources</li>
<li>A robots.txt file to filter portions of your website that should not
be crawled (for example, search results pages)</li>
<li>RDFa, microdata, or JSON-LD only after you've sorted out the first
three</li>
</ul>
<p>Only after setting that foundation did I feel comfortable launching into my
rationale for RDFa and schema.org as a tool for enabling discovery on the web:
a mapping of the access points that cataloguers create to the world of HTML and
aggregators. The key point for SWIB was that RDFa and schema.org can enable
full RDF expressions in HTML; that is, we can, should, and must go beyond
surfacing structured data to surfacing linked data through <tt class="docutils literal">@resource</tt>
attributes and <a class="reference external" href="http://schema.org/sameAs">schema:sameAs</a> properties.</p>
<blockquote>
The Semantic Web is an extension of the current web in which information is
given well-defined meaning, better enabling computers and people to work in
cooperation. Tim Berners-Lee, Scientific American, 2001</blockquote>
<p>I also argued that using RDFa to enrich the document web was, in fact,
truer to Berners-Lee's 2001 definition of the semantic web, and that we should
focus on enriching the document web so that both humans and machines can
benefit before investing in building an entirely separate and disconnected
semantic web.</p>
<p>I was worried that my talk would not be well received; that it would be
considered obvious, or scolding, or just plain off-topic. But to my relief I
received a great deal of positive feedback. And on the next day, both Eric
Miller and Richard Wallis gave talks on a similar, but more refined, theme:
that libraries need to do a much, much better job of enabling their resources
to be found on the web--not by people who already use our catalogues, but by
people who are <em>not</em> library users today.</p>
<p>There were also some requests for clarification, which I'll try to address
generally here (for the benefit of anyone who wasn't able to talk with me, or
who might watch the livestream in the future).</p>
<div class="section" id="when-you-said-anything-could-be-described-in-schema-org-did-you-mean-we-should-throw-out-marc-and-bibframe-and-ead">
<h2>"When you said anything could be described in schema.org, did you mean we should throw out MARC and BIBFRAME and EAD?"</h2>
<p><em>tldr:</em> I intended <strong>and</strong>, not <strong>instead of</strong>!</p>
<p>The first question I was asked was whether there was anything that I had not
been able to describe in schema.org, to which I answered "No"--especially since
the work that the W3C SchemaBibEx group had done to ensure that some of the
core bibliographic requirements were added to the vocabulary. It was not as
coherent or full a response as I would have liked to have made; I blame the
livestream camera <img alt=":-)" class="emoticon" src="/images/smile.png" /></p>
<p>But combined with a part of the presentation where I countered a myth about
schema.org being a very coarse vocabulary by pointing out that it actually
contained 600 classes and over 800 properties, a number of the attendees
interpreted one of the takeaways of my talk as suggesting that libraries should
adopt schema.org as <em>the</em> descriptive vocabulary, and that MARC, BIBFRAME, EAD,
RAD, RDA, and other approaches for describing library resources were no longer
necessary.</p>
<p>This is not at all what I'm advocating! To expand on my response, you <em>can</em>
describe anything in schema.org, but you might lose significant amounts of
richness in your description. For example, short stories and poems would best
be described in schema.org as a <a class="reference external" href="http://schema.org/CreativeWork">CreativeWork</a>. You would have to look at the associated
description or keyword properties to be able to figure out the form of the
work.</p>
<p>What I was advocating was that you should map your rich bibliographic
description into corresponding schema.org classes and properties in RDFa at the
time you generate the HTML representation of that resource and its associated
entities. So your poem might be represented as a
href="<a class="reference external" href="http://schema.org/CreativeWork">http://schema.org/CreativeWork</a>">CreativeWork, with a <a class="reference external" href="http://schema.org/name">name</a>, <a class="reference external" href="http://schema.org/author">author</a>,
<a class="reference external" href="http://schema.org/description">description</a>, <a class="reference external" href="http://schema.org/keywords">keywords</a>, and <a class="reference external" href="http://schema.org/about">about</a> values
and relationships. Ideally, the <tt class="docutils literal">author</tt> will include at least one link
(either via <a class="reference external" href="http://schema.org/sameAs">sameAs</a>, <a class="reference external" href="http://schema.org/url">url</a>, or <tt class="docutils literal">@resource</tt>) to an entity on the web; and you
could do the same with <tt class="docutils literal">about</tt> if you are using a controlled vocabulary.</p>
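<p>To make that concrete, here is a minimal sketch of the kind of RDFa markup I have in mind. The entity URIs are invented for illustration; in practice you would use whatever authority or identifier URIs your system knows about:</p>
<pre class="literal-block">
<div vocab="http://schema.org/" typeof="CreativeWork" resource="#work">
  <h1 property="name">The Cremation of Sam McGee</h1>
  <!-- @resource gives the author an identity; sameAs links it to the wider web -->
  <span property="author" typeof="Person" resource="http://example.org/people/robert-w-service">
    <span property="name">Robert W. Service</span>
    (<a property="sameAs" href="https://en.wikipedia.org/wiki/Robert_W._Service">Wikipedia</a>)
  </span>
  <p property="description">A ballad of the Klondike gold rush.</p>
  <meta property="keywords" content="poetry, ballad" />
  <!-- A controlled vocabulary term, linked rather than left as a literal -->
  <span property="about" resource="http://example.org/subjects/yukon">Yukon</span>
</div>
</pre>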
<p>If you take that approach, then you can serve up schema.org descriptions of
works in HTML that most web-oriented clients will understand (such as search
engines) and provide basic access points such as name / author / keywords,
while retaining and maintaining the full richness of the underlying
bibliographic description--and potentially providing access to that, too, as
part of the embedded RDFa, via content negotiation, or <tt class="docutils literal"><link <span class="pre">rel=""></span></tt>, for
clients that can interpret richer formats.</p>
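<p>For instance, a record detail page could advertise a richer serialization to linked data clients with something like the following sketch (the paths and formats are hypothetical):</p>
<pre class="literal-block">
<!-- In the page <head>: point richer clients at full RDF serializations -->
<link rel="alternate" type="text/turtle" href="/record/1234.ttl" />
<link rel="alternate" type="application/rdf+xml" href="/record/1234.rdf" />
</pre>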
</div>
<div class="section" id="what-makes-you-think-google-will-want-to-surface-library-holdings-in-search-results">
<h2>"What makes you think Google will want to surface library holdings in search results?"</h2>
<p>There is a perception that Google and other search engines just want to sell
ads, or their own products (such as Google Books). While Google certainly does
want to sell ads and products, they also want to be the most useful tool for
satisfying users' information needs--possibly so they can learn more about
those users and put more effective ads in front of them--but nonetheless, the
motivation is there.</p>
<p>By marking up your resources with the Product / Offer portion of
schema.org, you are able to provide search engines with availability information
in the same way that Best Buy, AbeBooks, and other online retailers do (as
Evergreen, Koha, and VuFind already do). That makes it much easier for the
search engines to use everything they may know about their users, such as their
current location, their institutional affiliations, their typical commuting
patterns, their reading and research preferences... to provide a link to a
library's electronic or print copy of a given resource in a knowledge graph box
as one of the possible ways of satisfying that person's information needs.</p>
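<p>A rough sketch of that availability markup (the URIs are invented), with each copy expressed as an Offer whose seller is the holding library:</p>
<pre class="literal-block">
<div vocab="http://schema.org/" typeof="Book Product" resource="#record">
  <span property="name">Example title</span>
  <!-- One Offer per copy; seller points at the holding library -->
  <div property="offers" typeof="Offer">
    <link property="availability" href="http://schema.org/InStock" />
    <link property="seller" href="http://example.org/library/main" />
  </div>
</div>
</pre>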
<p>We don't see it happening with libraries running Evergreen, Koha, and VuFind
yet, realistically because the open source library systems don't have enough
penetration to make it worth a search engine's effort to add that to their set
of possible sources. However, if we as an industry make a concerted effort to
implement this as a standard part of crawlable catalogue or discovery record
detail pages, then it wouldn't surprise me in the least to see such suggestions
start to appear. The best proof that we have that Google, at least, is
interested in supporting discovery of library resources is the continued
investment in Google Scholar.</p>
<p>And as I argued during my talk, even if the search engines never add direct
links to library resources from search results or knowledge graph sidebars,
having a reasonably simple standard like the GoodRelations product / offer
pattern for resource availability enables new web-based approaches for building
applications. One example could be a fulfillment system that uses sitemaps to
intelligently crawl all of its participating libraries, normalizes the item
request to a work URI, and checks availability by parsing the offers at the
corresponding URIs.</p>
</div>
How discovery layers have closed off access to library resources, and other tales of schema.org from LITA Forum 20142014-11-08T16:41:00-05:002014-11-08T16:41:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2014-11-08:/how-discovery-layers-have-closed-off-access-to-library-resources-and-other-tales-of-schemaorg-from-lita-forum-2014.html<p>At the LITA Forum yesterday, I accused
(<a class="reference external" href="http://stuff.coffeecode.net/2014/lita_forum">presentation</a>) most
discovery layers of not solving the discoverability problems of
libraries, but instead exacerbating them by launching us headlong to a
closed, unlinkable world. Coincidentally, Lorcan Dempsey's opening
keynote contained a subtle criticism of discovery layers. I wasn't that
subtle.</p>
<p>Here's why I believe commercial discovery layers are not "of the web":
check out their <a class="reference external" href="http://robotstxt.org">robots.txt</a> files. If
you're not familiar with robots.txt files, these are what search engines
and other well-behaved automated crawlers of web resources use to
determine whether they are allowed to visit and index the content of
pages on a site. Here's what the <tt class="docutils literal">robots.txt</tt> files look like for a
few of the best-known discovery layers:</p>
<pre class="literal-block">
User-Agent: *
Disallow: /
</pre>
<p>That effectively says "Go away, machines; your kind isn't wanted in
these parts." And that, in turn, closes off access to your libraries
resources to search engines and other aggregators of content, and is
completely counter to the overarching desire to evolve to a linked open
data world.</p>
<p>During the question period, Marshall Breeding challenged my assertion as
being unfair to what are meant to be merely indexes of library content.
I responded that most libraries have replaced their catalogues with
discovery layers, closing off open access to what have traditionally
been their core resources, and he rather quickly conceded that that
was indeed a problem.</p>
<p>(By the way, a possible solution might be to simply offer two different
URL patterns, something like <tt class="docutils literal">/library/*</tt> for library-owned resources
to which access should be granted, and <tt class="docutils literal">/licensed/*</tt> for resources to
which open access to the metadata is problematic due to licensing
issues, and which robots can therefore be restricted from accessing.)</p>
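<p>In robots.txt terms, that split might look something like this sketch (the path prefixes are hypothetical):</p>
<pre class="literal-block">
User-Agent: *
# /library/* pages remain crawlable by default
Disallow: /licensed/
</pre>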
<p>Compared to commercial discovery layers on my very handwavy usability
vs. discoverability plot, general search engines rank pretty high on
both axes; they're the ready-at-hand tool in browser address bars. And
they grok schema.org, so if we can improve our discoverability by
publishing schema.org data, maybe we get a discoverability win for our
users.</p>
<p>But even if we don't (SEO is a black art at best, and maybe the general
search engines won't find the right mix of signals that makes them
decide to boost the relevancy of our resources for specific users in
specific locations at specific times) we get access to that structured
data across systems in an extremely reusable way. With sitemaps, we can
build our own specialized search engines (Solr or ElasticSearch or
Google Custom Search Engine or whatever) that represent specific use
cases. Our more sophisticated users can piece together data to, for
example, build dynamic lists of collections, using a common,
well-documented vocabulary and tools rather than having to dip into the
arcane world of library standards (Z39.50 and MARC21).</p>
<p>So why not iterate our way towards the linked open data future by
building on what we already have now?</p>
<p>As <a class="reference external" href="http://kcoyle.blogspot.ca/2014/10/schemaorg-where-it-works.html">Karen Coyle
wrote</a>
in a much more elegant fashion, the transition looks roughly like:</p>
<ul class="simple">
<li>Stored data -> transform/template -> human readable HTML page</li>
<li>Stored data -> transform/template (tweaked) -> machine & human
readable HTML page</li>
</ul>
<p>That is, by simply tweaking the same mechanism you already use to
generate a human readable HTML page from the data you have stored in a
database or flat files or what have you, you can embed machine readable
structured data as well.</p>
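<p>As a sketch of how small that tweak can be, compare a plain template fragment with an RDFa-enriched equivalent (the title and author are just placeholders):</p>
<pre class="literal-block">
<!-- Before: human readable only -->
<h1>A Tale of Two Cities</h1>
<p>by Charles Dickens</p>

<!-- After: the same page, now also machine readable -->
<div vocab="http://schema.org/" typeof="Book">
  <h1 property="name">A Tale of Two Cities</h1>
  <p>by <span property="author" typeof="Person"><span property="name">Charles Dickens</span></span></p>
</div>
</pre>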
<p>That is, in fact, exactly the approach I took with Evergreen, VuFind,
and Koha. And they now expose structured data and generate sitemaps out
of the box using the same old MARC21 data. Evergreen even exposes
information about libraries (locations, contact information, hours of
operation) so that you can connect its holdings to specific locations.</p>
<p>And what about all of our resources outside of the catalogue? Research
guides, fonds descriptions, institutional repositories, publications...
I've been lucky enough to be working with Camilla McKay and Karen Coyle
on applying the same process to the Bryn Mawr Classical Review. At this
stage, we're exposing basic entities (<a class="reference external" href="http://schema.org/Review">Reviews</a>
and <a class="reference external" href="http://schema.org/Person">People</a>) largely as literals, but
we're laying the groundwork for future iterations where we link them up
to external entities. And all of this is built on a Tcl + SGML
infrastructure.</p>
<p>So why schema.org? It has the advantage of being a de-facto generalized
vocabulary that can be understood and parsed across many different
domains, from car dealerships to streaming audio services to libraries,
and it can be relatively simply embedded into existing HTML as long as
you can modify the templating layer of your system.</p>
<p>And schema.org offers much more than just static structured data;
schema.org Actions are surfacing in applications like Gmail as a way of
providing directly actionable links--and there's no reason we shouldn't
embrace that approach to expose "SearchAction", "ReadAction",
"WatchAction", "ListenAction", "ViewAction"--and "OrderAction"
(Request), "BorrowAction" (Borrow or Renew), "Place on Reserve", and
other common actions as a standardized API that exists well beyond
libraries (see Hydra for a developing approach to this problem).</p>
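<p>As a speculative sketch, a BorrowAction could be expressed in RDFa with schema.org's Action / EntryPoint pattern (the URL template is invented):</p>
<pre class="literal-block">
<div vocab="http://schema.org/" typeof="Book" resource="#record">
  <div property="potentialAction" typeof="BorrowAction">
    <meta property="name" content="Borrow" />
    <!-- The EntryPoint tells a client where it can invoke the action -->
    <div property="target" typeof="EntryPoint">
      <meta property="urlTemplate" content="https://catalogue.example.org/borrow?copy={copy_id}" />
    </div>
  </div>
</div>
</pre>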
<p>I want to thank Richard Wallis for inviting me to co-present with him;
it was a great experience, and I really enjoy meeting and sharing with
others who are putting linked data theory into practice.</p>
DCMI 2014: schema.org holdings in open source library systems2014-10-14T01:07:00-04:002014-10-14T01:07:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2014-10-14:/dcmi-2014-schemaorg-holdings-in-open-source-library-systems.html<p>My slides from DCMI 2014:
<a class="reference external" href="http://stuff.coffeecode.net/2014/dcmi_schemabibex/#/">schema.org in the wild: open source libraries++</a>.</p>
<p>Last week I was at the
<a class="reference external" href="http://dcevents.dublincore.org/IntConf/dc-2014">Dublin Core Metadata</a>
Initiative 2014 conference, where Richard Wallis, Charles McCathie Nevile and
I were slated to present on schema.org and the work of the W3C Schema.org
Bibliographic Extension Community Group (#schemabibex). As a first-timer at
DCMI, I wasn't sure what kind of an audience to expect: there is a
peer-reviewed papers track, and a series of sessions on a truly intimidating
topic (RDF Application Profiles), but on the other hand our own topic was
fairly basic. As it turned out, there was an invigoratingly mixed set of
backgrounds present, and Eric Miller's opening keynote, which gave an oral
history of the origins of DCMI and a look towards the future challenges for the
organization, reassured me that I wasn't going to be out of my depth.</p>
<p>Special kudos to Eric for his analogy of the Web to a credit card, which offers
both human-readable and machine-readable data. A nice, clean image!</p>
<p>Richard, Charles and I opted to structure our 1.5 hour session as a series of
short talks followed by a long period of discussion. However, as often happens,
the excitement of speaking in front of a room that drew so many attendees that
we had to jam in more chairs led to that plan breaking down. I cut my own
materials back to illustrating how one of my primary contributions to the
#schemabibex effort--representing library holdings using schema.org's
GoodRelations-based Product/Offer model--had been implemented in free software
library systems, including Evergreen, Koha, and VuFind. I walked from a basic
bibliographic record (represented as a
<a class="reference external" href="http://schema.org/Product">Product</a>), through to the associated borrowable
items (represented as <a class="reference external" href="http://schema.org/Offer">Offers</a> with a price of
$0.00, call numbers as <a class="reference external" href="http://schema.org/sku">SKUs</a>, and barcodes as
<a class="reference external" href="http://schema.org/serialNumber">serialNumbers</a>), that were offered by a
specific <a class="reference external" href="http://schema.org/Library">Library</a> with its own set of operating
hours, address, and contact information... all published out of the box as RDFa
in modern Evergreen systems.</p>
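<p>In outline, that holdings markup looks something like the following sketch (the call number, barcode, and library are invented; what Evergreen actually emits is richer):</p>
<pre class="literal-block">
<div vocab="http://schema.org/" typeof="Book Product" resource="#record">
  <span property="name">Example title</span>
  <!-- Each borrowable copy is an Offer with a price of $0.00 -->
  <div property="offers" typeof="Offer">
    <meta property="price" content="0.00" />
    <meta property="priceCurrency" content="USD" />
    Call number: <span property="sku">PS8537 .E73</span>
    Barcode: <span property="serialNumber">31234000123456</span>
    <div property="seller" typeof="Library">
      <span property="name">Example Branch Library</span>
      <meta property="openingHours" content="Mo-Fr 09:00-17:00" />
    </div>
  </div>
</div>
</pre>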
<p>I did stray a little to posit that the use case for schema.org is not and
should not be limited to "search engine optimization", but that this very
simple level of structured data could fairly easily form the basis of an API.
In the rather limited discussion that we were able to hold at the end of the
session (and encroaching on break time), Charles counselled that libraries
shouldn't really bother with dumbing down their beautiful metadata simply to
publish schema.org... while I countered that the pursuit of publishing
beautiful metadata in the past has generally led librarians to publish no
metadata at all, and that schema.org was a great first step towards building a
web of cultural heritage metadata meant for machine consumption.</p>
<p>I wish I could have stayed longer at DCMI, but it was Thanksgiving in Canada
and there were families to visit and feast with--not to mention children to
help take care of--so I had to depart after just a day and a half. I'm
encouraged by the steps the organization is taking to renew itself, and I hope
to be able to participate again in the future.</p>
Cataloguing for the open web: schema.org in library catalogues and websites2014-07-01T20:00:00-04:002014-07-01T20:00:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2014-07-01:/cataloguing-for-the-open-web-schemaorg-in-library-catalogues-and-websites.html<div vocab="http://schema.org/" typeof="Article"><p><em>tldr;</em> my slides are
href="<a class="reference external" href="http://stuff.coffeecode.net/2014/understanding_schema">http://stuff.coffeecode.net/2014/understanding_schema</a>">here, and the
slides from Jenn and Jason are also available from
href="<a class="reference external" href="http://connect.ala.org/node/222959">http://connect.ala.org/node/222959</a>">ALA Connect.</p>
<p>On Sunday, June 29th Jenn Riley, Jason Clark, and I presented at the ALCTS/LITA
jointly sponsored session <a class="reference external" href="http://ala14.ala.org/node/14382">Understanding
schema.org</a>. The build-up to the session was pretty amazing; I was delighted to
learn that Jason and I had been working on pretty much parallel efforts over
the past couple of years. Jenn did a great job of organizing the session, and
by the time we started talking 276 people had indicated their interest in
attending: that was two more than those who had indicated an interest in
attending the BIBFRAME Forum Update scheduled in the same time slot. Our room
was large and quite full.</p>
<p>Jenn started the session out strong by advancing her concept that libraries
need to target <em>discovery elsewhere</em>: that is, that there is no way that
libraries can compete directly with major search engines like Google, Bing, and
Yahoo, either through the discovery tools that we have to offer, our presence
in the consciousness of most of the population as the starting point for
discovery, or in the resources we can direct towards closing the huge gap in
technology, usability, and mindshare that the search engines have opened up
over the past two decades. <em>But</em>, we can take steps to start working with the
search engines to enable our resources to be discovered and accessed more
directly by them.</p>
<p>That led quite naturally to my own part of the session, in which I talked about
my attempt to turn cataloguing's efforts to provide access points in our niche
catalogues into access points for the open web by publishing schema.org
structured data from library catalogues like Evergreen, Koha, and VuFind. I
started things out by pointing out the legacy of restrictive <tt class="docutils literal">robots.txt</tt>
files that still live on in many catalogues today, then worked through some
basics like how sitemaps enable search engines--which strive to provide
relevant, useful results that matter to users in their context at a particular
place and time--to efficiently crawl just the most recently changed pages of
interest. Then I launched into the heart of the talk that showed how catalogues
that publish schema.org structured data can turn an undifferentiated mass of
presentation-oriented HTML and words into machine-comprehensible entities:
classes like <tt class="docutils literal">Book</tt> and <tt class="docutils literal">Organization</tt>, connected by properties like
<tt class="docutils literal">publisher</tt>, and with values for properties like <tt class="docutils literal">author</tt>,
<tt class="docutils literal">datePublished</tt>, and <tt class="docutils literal">isbn</tt>.</p>
<p>For this talk I used visualizations generated by the
<a class="reference external" href="http://rdfa.info/play">RDFa playground</a> to illustrate the structured data
contained in some real examples of a production Evergreen system (thanks to
<a class="reference external" href="http://biblio.org">Bibliomation</a>). Given that I'm normally a text-and-talk
kind of guy, the illustrations seemed to help out--particularly in showing how
holdings map quite readily to the <tt class="docutils literal">Product</tt> / <tt class="docutils literal">Offer</tt> structure more
commonly used by commercial enterprises to reflect the prices, locations, and
availability of their products.</p>
<p>Of course, the evolution from unstructured, to structured, to linked data had
its payoff beginning with the link from holdings to the libraries that hold the
resources. We have plenty more we can and must do, but unlike other efforts
which are still crystallizing and which will require significant architectural
work to happen before libraries can even begin trying out real systems, you can
use schema.org-enabled systems <em>today</em>. And adapting systems to publish
schema.org structured data only requires access to the HTML templates for your
system (which, hopefully, you have: otherwise you have bigger problems to deal
with!) and following the patterns that have already been established by
Evergreen, Koha, and VuFind.</p>
<p>Jason did a great job showing both a broader use case for schema.org, including
work he has led on digital collections such as embedding the <tt class="docutils literal">Recipe</tt> type in
a book of recipes. And he covered some of the evolution of the vocabulary,
including the exciting possibilities introduced by the <tt class="docutils literal">Action</tt> type and
<tt class="docutils literal">potentialAction</tt> property for describing RESTful APIs... which naturally led
to an off-the-top-of-the-head enumeration of such actions as <tt class="docutils literal">BorrowAction</tt>
and <tt class="docutils literal">LendAction</tt> that are perfect for libraries.</p>
<p>Perhaps the best part of the session, however, was the insightful questions
from the audience (along with the genuinely enthusiastic response to our
talks). We had deliberately left 15 minutes for questions, and we were not
disappointed: from questions about how we move from structured data to more
linked data (I riffed on the Dodds/Davis
href="<a class="reference external" href="http://patterns.dataincubator.org/book/progressive-enrichment.html">http://patterns.dataincubator.org/book/progressive-enrichment.html</a>">Progressive
Enrichment linked data pattern, suggesting that we should be able to
href="/archives/278-Broadening-support-for-linked-data-in-MARC.html">store
links for each field or value of interest directly in our MARC records), to
questions about what proprietary systems are doing this with schema.org today
(alas, none that I'm aware of, unless something has changed since
href="/archives/282-Were-not-waiting-for-the-ILS-to-change.html">February).</p>
</div>Linked data interest panel, part 12014-06-28T16:14:00-04:002014-06-28T16:14:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2014-06-28:/linked-data-interest-panel-part-1.html<p>Good talk by Richard Wallis this morning at the ALA Annual Conference on
publishing entities on the web. Many of his points map extremely closely
to what I've been saying and will be saying tomorrow during my own
session (albeit with ten fewer minutes).</p>
<p>I was particularly heartened to hear him talk about the great potential
for disintermediation of discovery of library resources, from
aggregation by national and global providers like OCLC to directly
crawling a library's own data and providing links directly to the
library resources. This was one of the conclusions of the paper I
published earlier this year.</p>
<p>I would have liked to have heard some mention of Evergreen, Koha, VuFind
and other open source systems that are already publishing schema.org
linked data, either in the context of SchemaBibEx where they served as
reference implementations and proofs of concept, or in the context of
system procurement. But you can't win them all!</p>
RDFa introduction and codelabs for libraries2014-06-27T15:06:00-04:002014-06-27T15:06:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2014-06-27:/rdfa-introduction-and-codelabs-for-libraries.html<p>My <a class="reference external" href="http://stuff.coffeecode.net/2014/lld_preconference">RDFa introduction and codelab
materials</a> for
the ALA 2014 preconference on <a class="reference external" href="http://ala14.ala.org/node/14524">Practical linked data with open
source</a> are now online!</p>
<p>And now I've finished leading the RDFa + schema.org codelab that I've
been stressing over and refining for about a month at the American
Library Association annual conference <em>Practical linked data with open
source</em> preconference. Long story short, most people got about as far as
I expected (part-way through the first exercise), but they all got
through the initial hurdles and learned enough to keep learning on their
own. My hopes are that this leads to:</p>
<ul class="simple">
<li>the implementation of structured or even linked data in existing
systems, for those that at least have systems that give them the
ability to edit their HTML templates</li>
<li>the addition of linked data to library web pages the next time they
get refreshed or redesigned (it happens pretty often!)</li>
<li>some patterns of implementation, so that we hopefully arrive at a
relatively standard way of marking up the same metadata (given the
many alternatives that we have just within schema.org for something
like a publisher)</li>
<li>when tweaking templates for display or design purposes, to avoid
mangling existing structured data that a system like Evergreen, Koha,
or VuFind publishes by default</li>
<li>more awesomeness in the world of library metadata!</li>
</ul>
<p>Oh, and for posterity, I temporarily marked up this page to link to our
pizza order form as a really lame short URL service, and as I did that
impishly polluted the schema.org vocabulary with the new type
<tt class="docutils literal">PizzaOrderPreferences</tt>. I don't think that's going to make it into
the official vocab though! The code was:</p>
<pre class="literal-block">
<p vocab="http://schema.org/" typeof="PizzaOrderPreferences"> And <a href="http://doodle.com/exampleblahblah" property="url">order pizza here</a>.</p><p>If our pizza order doesn't get gamed, that just shows how few people visit my blog!
</p>
</pre>
The state of structured data in Evergreen: 2.6 edition2014-03-22T15:23:00-04:002014-03-22T15:23:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2014-03-22:/the-state-of-structured-data-in-evergreen-26-edition.html<p>Yesterday at the <a class="reference external" href="http://evergreen-ils.org/conference/eg14/">2014 Evergreen International
Conference</a> I presented
<a class="reference external" href="http://goo.gl/hDxUep">Structured library data: holdings, libraries, and
beyond</a>--a talk about the work I've done
specifically with Evergreen and making some of the connections with Koha
and VuFind's capabilities. Lots of attendees seemed happy with the talk
and the direction that we're going with Evergreen, and have hope for the
future relevance of our libraries' resources within normal search
engines, as well as all of the possibilities opened up by exposing this
open data about our libraries (locations, hours, branch relationships,
contact information) and their resources in a much more consumable form.</p>
<p>There was so much energy in the room, I could have talked for another
hour... I love the Evergreen community!</p>
We're not waiting for the ILS to change2014-02-21T15:30:00-05:002014-02-21T15:30:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2014-02-21:/were-not-waiting-for-the-ils-to-change.html<p>Over at the <strong>Metadata Matters</strong> blog, Diane Hillmann wrote <a class="reference external" href="http://managemetadata.com/blog/2014/02/18/why-are-we-waiting-for-the-ils-to-change/">Why Are We
Waiting for the ILS to
Change?</a>,
asking (in the context of the difficulties libraries experience in
making their systems work with RDA):</p>
<blockquote>
What I saw underlying that conversation was the assumption that the
only way change could happen was if the ILS’s themselves changed; in
other words if the ILS vendors decided to lead rather than follow.
The situation now is that system vendors say they’ll build RDA
compliant systems when their customers ask for them, and libraries
say that they’ll use ‘real’ RDA when there are systems that can
support it. This is a dance of death, and nobody wins.</blockquote>
<p>I took this as a jumping-off point to discuss the state of linked data
support in library systems and discovery software and posted the
following comment (currently awaiting moderation):</p>
<blockquote>
<div class="line-block">
<div class="line">Who's waiting? Sweden's LIBRIS took essentially the approach you
suggested back in 2007, and Bibliothèque Nationale de France and
Deutsche Nationalbibliothek have also followed similar paths.</div>
</div>
<div class="line-block">
<div class="line">On the smaller-scale, traditional library "integrated" side of
things Evergreen and Koha, and on the "disintegrated discovery
layer" side VuFind and Blacklight, have integrated RDFa or
microdata to publish structured data using schema.org. Here's
hoping these open source systems can spur the proprietary
alternatives to start competing and doing better.</div>
</div>
<div class="line-block">
<div class="line">Ross Singer mentioned that Capita Prism offers linked data in N3 /
Turtle / RDF/XML / JSON from record details pages like
<a class="reference external" href="http://capitadiscovery.co.uk/surrey-ac/items/1173856">http://capitadiscovery.co.uk/surrey-ac/items/1173856</a>, so happily
there is at least one proprietary catalogue in the smaller-scale
library space doing work in this field.</div>
</div>
</blockquote>
<p>Jumping from RDA to linked data might be a bit of a stretch, but the
lack of movement by proprietary vendors in particular hit a sore point
that I developed during some of our early W3C Schema.org Bibliographic
Extension Community Group discussions. I had asked if anyone else was
trying to actually implement what we were discussing. A response from
one of the proprietary software representatives was "No, we're waiting
to see what develops..." -- which is exactly the attitude that leads to
the "dance of death" that Diane described. It can also lead to decisions
that are suboptimal, ambiguous, or unimplementable because nobody
actually tried to put theory into practice.</p>
<p>Thankfully, a small investment of effort into modifying open source
systems to serve as reference implementations can provide a significant
amount of insight into flaws or possibilities with otherwise theoretical
directions, as well as delivering practical benefits to everyone who
uses that software if those modifications are accepted by the parent
projects. Here's hoping that the more agile options like Koha,
Evergreen, VuFind, and Blacklight continue to push the evolution of
their proprietary competitors.</p>
Mapping library holdings to the Product / Offer mode in schema.org2014-02-03T18:35:00-05:002014-02-03T18:35:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2014-02-03:/mapping-library-holdings-to-the-product-offer-mode-in-schemaorg.html<p>Back in August, I
<a class="reference external" href="%20/archives/271-RDFa-and-schema.org-all-the-library-things.html">mentioned</a>
that I taught Evergreen, Koha, and VuFind how to express library
holdings in schema.org via the <tt class="docutils literal"><span class="pre">http://schema.org/Offer</span></tt> class. What I
failed to mention was how others can do the same with their own library
systems (well, okay, I linked to the <a class="reference external" href="http://www.w3.org/community/schemabibex/wiki/Holdings_via_Offer">W3C Schema.org Bibliographic
Extension Community Group proposal for representing holdings via
Offer</a>
but didn't focus on how one would go about doing that). This might have
led to Diane Hillman recently <a class="reference external" href="http://managemetadata.com/blog/2014/02/03/talking-points-report/">finding the wrong, abandoned holdings
proposal</a>
(thankfully Richard Wallis helped clear things up!). So, better late
than never, here is a quick summary:</p>
<ol class="arabic">
<li><p class="first">Each copy that the library holds is marked up as an individual
<tt class="docutils literal">`Offer</tt> <<a class="reference external" href="http://schema.org/Offer">http://schema.org/Offer</a>>`__.</p>
</li>
<li><p class="first">The <tt class="docutils literal">`itemOffered</tt> <<a class="reference external" href="http://schema.org/itemOffered">http://schema.org/itemOffered</a>>`__ property
attaches an <tt class="docutils literal">Offer</tt> to a corresponding
<tt class="docutils literal">`Product</tt> <<a class="reference external" href="http://schema.org/Product">http://schema.org/Product</a>>`__ that contains the main
description of the goods. In most library systems, this is going to
be the title of the item, list of creators, abstract, subject
classifications, etc; that which we generally refer to as the
bibliographic record. While this will probably have its own type
already (<tt class="docutils literal">Book</tt> or <tt class="docutils literal">Movie</tt> or <tt class="docutils literal">MusicAlbum</tt> or the like), you
can also include <tt class="docutils literal">Product</tt> as a secondary type (either via a
whitespace-delimited list or via the schema.org <tt class="docutils literal">additionalType</tt>
property).</p>
</li>
<li><p class="first">Mapping more familiar library terminology to the pertinent properties
from <tt class="docutils literal">Offer</tt> goes something like this:</p>
</p><ul class="simple">
<li>Library = <tt class="docutils literal">seller</tt> - the range of <tt class="docutils literal">seller</tt> is <tt class="docutils literal">Organization</tt>, which includes
<tt class="docutils literal">Library</tt> as a child type, so you can link to a highly
structured description of the library including hours of
operation, contact information, location... and that's exactly
what we now do in Evergreen</li>
<li>Call number / shelf number = <tt class="docutils literal">sku</tt> - because a stock-keeping
unit number is "a merchant-specific identifier for a product or
service", and what is a call number if not a means by which you
identify stock on the shelf?</li>
<li>Barcode = <tt class="docutils literal">serialNumber</tt> - the unique "alphanumeric identifier
of a particular product", am I right?</li>
<li>Shelving location = <tt class="docutils literal">availableAtOrFrom</tt> - "[t]he place(s) from
which the offer can be obtained"; with a range of <tt class="docutils literal">Place</tt> this
really should be linked to sub-units of the <tt class="docutils literal">Library</tt> type you
pointed to for the <tt class="docutils literal">seller</tt> property, but schema.org does accept
reality and the inevitability that some plain text values are
going to be supplied where a typed range was indicated.</li>
<li>Item status = <tt class="docutils literal">availability</tt></li>
<li>Borrowing terms = <tt class="docutils literal">businessFunction</tt> - another enumeration, for
which the most likely value for a library is
<tt class="docutils literal"><span class="pre">http://purl.org/goodrelations/v1#LeaseOut</span></tt>. After all, what is
a library loan other than a lease with a limited period during
which the price is $0.00?</li>
<li>Price = <tt class="docutils literal">price</tt> - while theoretically unnecessary, explicitly
specifying a price of $0.00 may satisfy search engines that always
expect to see a price attached to an offer (I'm looking at you,
<a class="reference external" href="http://www.google.com/webmasters/tools/richsnippets">Google Structured Data Testing
Tool</a>!)</li>
</ul>
</li>
</ol>
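<p>To make the list above concrete, here is a minimal RDFa sketch of a
single copy marked up as an <tt class="docutils literal">Offer</tt>. All of the values (title,
call number, barcode, library URL) are invented for illustration, and I
chain from the record down to the copy via the <tt class="docutils literal">offers</tt> property;
<tt class="docutils literal">itemOffered</tt> is simply the inverse direction:</p>
<pre class="literal-block">
<div vocab="http://schema.org/" typeof="Book Product">
  <span property="name">An Example Title</span>
  <!-- one Offer per copy; every value below is invented -->
  <div property="offers" typeof="Offer">
    <link property="businessFunction" href="http://purl.org/goodrelations/v1#LeaseOut">
    <link property="seller" href="http://example.org/library/main">
    <span property="sku">QA76.9 .E94 2014</span>
    <span property="serialNumber">31234000123456</span>
    <span property="availableAtOrFrom">Main stacks</span>
    <link property="availability" href="http://schema.org/InStock">
    <span property="price" content="0.00">$0.00</span>
  </div>
</div>
</pre>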
<p>The language for some of the terminology may seem a little overly
commercial right now, but the next iteration of the schema.org standard
will adopt language that more broadly supports non-commercial
activities... and this broadening of a number of schema.org definitions
is also an outcome of the Schema BibEx community efforts. I'm pretty
happy with the results of the group over the last six months! Hopefully
this sheds some long-overdue light on some of the results of our
efforts, and helps other systems adopt our group's recommended practices
for exposing metadata via schema.org.</p>
<h1>What would you understand if you read the entire world wide web?</h1>
<p><em>2014-02-03</em></p>
<div vocab="http://schema.org/"><p>On Tuesday, February 4th, I'll be participating in Laurentian
University's Research Week lightning talks. Unlike most five-minute
lightning talk events in which I've participated, the time limit for
each talk tomorrow will be <strong>one</strong> minute. Imagine 60 different
researchers getting up to summarize their research in one minute each,
and you have what is likely to be a brain-melting hour. Should be fun!</p>
<p>Here's a rough draft of what I'm planning to say (which, when read at an
even cadence with decent intonation, comes out to exactly one minute):</p>
<blockquote>
<p>What would you understand if you read the <em>entire</em> world wide web?</p>
<p>As humans, we would understand a lot: we can rely on the
context, structure, and significance of elements of web pages to
derive meaning.</p>
<p>The algorithms behind search engines adopt a similar approach, but
struggle with ambiguity; when a web page mentions "Dan Scott", is
it:</p>
<ul class="simple">
<li>"Dan Scott" the character from the One Tree Hill TV show</li>
<li>"Dan Scott" the artist from the Magic the Gathering card game</li>
<li>"Dan Scott" the Ontario academic professor from the University of
Waterloo</li>
<li>"Dan Scott" the Ontario academic librarian from Laurentian
University</li>
</ul>
<p>schema.org is a vocabulary for embedding explicit meaning and intent
within web pages that offers a way to disambiguate those entities.</p>
<p>My research is a collaborative effort--under the auspices of the
World Wide Web Consortium--to define bibliographic extensions for
schema.org where necessary, and best practices based on concrete
implementations in three different library systems.</p>
</blockquote>
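<p>A tiny, purely illustrative RDFa sketch of the kind of markup that
makes this disambiguation possible; the <tt class="docutils literal">sameAs</tt> URL below is a
placeholder rather than a real identifier:</p>
<pre class="literal-block">
<p vocab="http://schema.org/" typeof="Person">
  <span property="name">Dan Scott</span>,
  <span property="jobTitle">librarian</span> at
  <span property="affiliation">Laurentian University</span>
  <!-- an authoritative identifier lets machines tell this Dan Scott
       apart from the others; placeholder URL for illustration -->
  <link property="sameAs" href="https://viaf.org/viaf/0000000000">
</p>
</pre>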
</div>
<h1>Ups and downs</h1>
<p><em>2014-01-30</em></p>
<p>Tuesday was not the greatest day, but at least each setback resulted in
a triumph...</p>
<p>First, the <a class="reference external" href="http://www.w3.org/community/schemabibex/wiki/Article">periodical proposal for
schema.org</a>--that I
have poured a good couple of months of effort into--took a step closer to
reality when Dan Brickley <a class="reference external" href="http://lists.w3.org/Archives/Public/public-vocabs/2014Jan/0180.html">announced on the public-vocabs list</a> that
he had created a test build that incorporated the RDFS that I had written up.
Excitement rapidly turned to horror, though, as I realized that I had made a
classic copy/paste error, in which I had changed the displayed name of the
<tt class="docutils literal">domainIncludes</tt> value but had not changed the actual URI... Long story
short, the test build looked nothing like what the schemabibex group had agreed
on, and I was terribly embarrassed.</p>
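<p>For anyone who hasn't poked at the vocabulary source: schema.org terms
are defined as RDFS expressed in RDFa, so a hypothetical property
definition (names and URIs invented here) looks roughly like the
following; note how easily the human-readable link text and the
machine-readable URI can silently diverge, which is exactly the mistake
I made:</p>
<pre class="literal-block">
<div typeof="rdf:Property" resource="http://schema.org/exampleProperty">
  <span class="h" property="rdfs:label">exampleProperty</span>
  <span property="rdfs:comment">An invented property for illustration.</span>
  <!-- the link text was edited to read "Article", but the href still
       points at CreativeWork -- the classic copy/paste mismatch -->
  <a property="http://schema.org/domainIncludes"
     href="http://schema.org/CreativeWork">Article</a>
</div>
</pre>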
<p>Luckily, after I fixed the RDFS, Dan was able to put together a revised test
build later that day that actually reflected our intentions. So that can
continue moving forward...</p>
<p>Second, our Evergreen instance started acting up rather badly. All of the
connections to the database server were being gobbled up, and we were
scrambling to figure out why. While I'm on sabbatical I'm not really supposed
to be involved in the day-to-day operations, but when a core service stops
running it's okay for research to wait for a little bit... Eventually I tracked
down a fix for a potential denial of service problem (
<a class="reference external" href="https://bugs.launchpad.net/evergreen/+bug/1200770">Search result rendering can crush the system</a>) that hadn't been merged
into our production system (the fix came out after the start of my sabbatical),
and shortly after I put that into production we were back up and running.</p>
<p>Third, after the Evergreen problem was resolved, Bill Dueber pinged me
innocently on IRC. He had run into a problem with File_MARC; when serializing
MARC as MARC-in-JSON format, fields with a subfield <tt class="docutils literal">$0</tt> were getting
trashed. Data corrupting bugs are one of the most serious classes of bugs for
any package maintainer, so I jumped on this problem too... After a little bit
of analysis, I figured out that PHP's type coercion for integer-like keys when
creating arrays and its <a class="reference external" href="http://php.net/json_encode">json_encode()</a>
implementation were combining to ruin the MARC-in-JSON serialization in this
one particular case. Faced with rewriting the entire serialization logic, I did
what any (in)sane programmer would and ended up running a regex against the
result of <tt class="docutils literal">json_encode()</tt> to turn the array-ified subfield <tt class="docutils literal">$0</tt> back into a
key/value pair. File_MARC 1.1.1 is now available at your nearest PEAR
mirror...</p>
<h1>A slice of sabbatical</h1>
<p><em>2014-01-21</em></p>
<p>Yesterday I tested, signed off, and pushed a
<a class="reference external" href="https://bugs.launchpad.net/evergreen/+bug/1047485">bunch</a>
<a class="reference external" href="https://bugs.launchpad.net/evergreen/+bug/803817">of</a>
<a class="reference external" href="https://bugs.launchpad.net/evergreen/+bug/1235474">bug</a>
<a class="reference external" href="https://bugs.launchpad.net/evergreen/+bug/1238240">fixes</a>
<a class="reference external" href="https://bugs.launchpad.net/evergreen/+bug/1192058">for</a> the
Evergreen library system. Not going to lie; I'm hoping that by clearing
up some of the backlog, a few of my own code contributions (like <a class="reference external" href="https://bugs.launchpad.net/evergreen/+bug/1261939">"Add
per-library info pages with schema.org structured data
support"</a> and
<a class="reference external" href="https://bugs.launchpad.net/evergreen/+bug/1267231">Enhanced title
display</a>) might
get some attention... both branches go a long way towards improving the
state of <a class="reference external" href="http://schema.org">schema.org structured data</a> support in
Evergreen.</p>
<p>Today, I took the W3C Schema.org Bibliographic Extension <a class="reference external" href="http://www.w3.org/community/schemabibex/wiki/Article">proposal for
adding support for
periodicals</a>
and converted it from wiki format into the RDF Schema format desired by
the <a class="reference external" href="http://www.w3.org/wiki/WebSchemas">W3C Web Schemas</a> group. That
draft lives
<a class="reference external" href="https://github.com/dbs/schemabibex/blob/master/schema.org/ext/periodicals.html">here</a>
and doesn't look like much. Funny to think that that represents a few
months of committee work (two hundred emails or thereabouts, with three
conference calls in the mix as well).</p>
<p>I also pushed updated versions of the Perl MARC::Charset and
MARC::Record packages to the Fedora Linux distribution. We library types
need our tools in top condition, and I had let the packages lag behind
the released versions for a while. Nice to clear that off my plate.</p>
<h1>RDFa and schema.org all the library things</h1>
<p><em>2013-08-30</em></p>
<p><em>TLDR</em>: The <a class="reference external" href="http://evergreen-ils.org">Evergreen</a> and
<a class="reference external" href="http://koha-community.org">Koha</a> integrated library systems now express
their record details in the schema.org vocabulary out of the box using RDFa.</p>
<p>Individual holdings are expressed as <a class="reference external" href="http://schema.org/Offer">Offer</a>
instances per the
<a class="reference external" href="http://www.w3.org/community/schemabibex/wiki/Holdings_via_Offer">W3C Schema Bib Extension community group proposal</a> to
parallel commercial sales offers. <em>And</em> I have published a branch to give the
same capabilities to the <a class="reference external" href="http://vufind.org">VuFind</a> discovery layer, as
well.</p>
<p>In the spring of 2012, I took my first steps in the structured data world by
teaching Evergreen 2.2 how to express some record details in
<a class="reference external" href="http://schema.org">schema.org</a>. It was a small step towards taking the
machine-readable data that we had made useful to humans on the record detail
catalogue page and marking it up so that it was once again machine readable. At
that time, Evergreen only knew how to map MARC data to two schema.org types
(<tt class="docutils literal">Book</tt> and <tt class="docutils literal">MusicRecording</tt>--which should have been <tt class="docutils literal">MusicAlbum</tt>, but I
eventually fixed that) and a handful of attributes: <tt class="docutils literal">name</tt>, <tt class="docutils literal">ISBN</tt>,
<tt class="docutils literal">publisher</tt>, <tt class="docutils literal">publication date</tt>, <tt class="docutils literal">author</tt>, <tt class="docutils literal">contributor</tt>, and
<tt class="docutils literal">keywords</tt>. Pretty barebones, but a start nonetheless.</p>
<p>I used the HTML5 microdata approach because I was new to structured data and
microdata was what was demonstrated in all of the schema.org examples, so it
seemed like the obvious choice. Over the last year, however, I realized that
<a class="reference external" href="http://www.w3.org/TR/rdfa-syntax/">RDFa</a> is a W3C standard for
accomplishing the same goals as microdata, bolstered by an open community
standards-making process, and featuring the ability to mix in properties and
types from multiple vocabularies. I touched on this in my Evergreen 2013
conference presentation: <a class="reference external" href="/archives/264-Structured-data-making-metadata-matter-for-machines.html">Structured data: making metadata matter for machines</a>.
While <a class="reference external" href="http://www.w3.org/TR/rdfa-lite/">RDFa Lite</a> is extremely easy to get
started with, I have been diving deeper into RDFa proper to make use of some of
the more advanced properties, such as <tt class="docutils literal">@about</tt> to work around unwanted
chaining introduced by <tt class="docutils literal">@href</tt> attributes.</p>
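<p>For readers who haven't hit this yet, here is a minimal sketch (with
invented values) of the chaining behaviour in question. With
<tt class="docutils literal">@typeof</tt> and <tt class="docutils literal">@href</tt> together, the linked resource becomes the
subject for everything nested inside the link, while <tt class="docutils literal">@about</tt> pins a
property back onto an explicit subject when that chaining isn't what you
want:</p>
<pre class="literal-block">
<!-- "name" below describes /authors/42, because @typeof + @href make
     the link target the subject for the link's contents -->
<a property="author" typeof="Person" href="/authors/42">
  <span property="name">Jane Doe</span>
</a>
<!-- @about states the subject explicitly, so this triple attaches to
     the record no matter what markup surrounds it -->
<span about="/record/123" property="datePublished">2013</span>
</pre>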
<p>Over the last few weeks, I was able to concentrate on improving the schema.org
mapping for Evergreen--introducing holdings as instances of the
<a class="reference external" href="http://schema.org/Offer">http://schema.org/Offer</a> class, providing much more granular author and
contributor data--and cut over to RDFa. While the tools at
<a class="reference external" href="http://rdfa.info/tools">RDFa Tools</a> were quite useful for debugging my
efforts, I also have to thank the denizens of the
<a class="reference external" href="irc://irc.w3.org:6665/#rdfa">#rdfa IRC channel</a> (and
<a class="reference external" href="http://manu.sporny.org/about">Manu Sporny</a> in particular) for patiently
helping me understand some of my rookie mistakes. Ben Shum also kept me honest
by patiently testing multiple iterations of my branches with the Google Rich
Snippets tool and reporting any issues that he encountered; this led to my
realization that using <tt class="docutils literal">@resource</tt> and <tt class="docutils literal">@about</tt> was necessary in some
contexts.</p>
<p>Once I had worked out a decent mapping in Evergreen (a library system I have
been contributing to for over six years now), I decided to tackle the VuFind
discovery layer. VuFind uses a straightforward template system, and I was able
to put together a branch that integrated schema.org as RDFa (details at
<a class="reference external" href="http://vufind.org/jira/browse/VUFIND-425">VuFind bug 425</a>), building on
Eoghan Ó Carragáin's initial efforts. Once again I included holdings-as-Offers,
as the Evergreen driver for VuFind made that easy enough to test. As part of my
work, I contributed some enhancements for the Evergreen driver that have
already been integrated into VuFind. The initial reception from the VuFind
community was positive, although my branch arrived too late for the VuFind 2.1
release; if all goes well, it will be integrated for the VuFind 2.2 release. In
the mean time, sites running VuFind that want schema.org structured data can
integrate my branch themselves--and please provide feedback!</p>
<p>As I was on a roll, I also opted to tackle the Koha integrated library system.
With some initial pointers from Galen Charlton and Chris Cormack to the
XSLT-based templating system that Koha uses, I was able to implement schema.org
with holdings-as-Offers in a matter of hours for the first iteration. Jared
Camins then worked patiently with me as I added small commits to address issues
that came up on the Evergreen side, but in under a week from start to finish
the branch was signed off, passed QA, and pushed to master.</p>
<p>(It actually broke the build due to a coding violation--<strong>doh!</strong>--but that
was quickly cleaned up.)</p>
<p>The upshot? We now have two library systems set to publish rich
schema.org structured data--including holdings--in RDFa, out of the box by
default, in their record detail pages on the Web, and a third system ready to
go.</p>
<p>Let me simply say that I <em>love</em> the agility of open source software. So, for
the future, I intend to tackle a few more library systems; digital repositories
seem like they would be worthwhile targets. On that front, I have
<a class="reference external" href="http://sourceforge.net/mailarchive/message.php?msg_id=31245519">inquired</a>
on the DSpace developers' list about whether there is still interest in
integrating schema.org (as had been expressed a year ago), but have not yet
received a reply. Perhaps ArchivesSpace, or furthering the existing support on
Islandora? Let me know if you're interested!</p>
<h1>Making the Evergreen catalogue mobile-friendly via responsive CSS</h1>
<p><em>2013-04-22</em></p>
<p>Back in November the Evergreen community was discussing the desire for a
mobile catalogue, and <a class="reference external" href="http://markmail.org/message/tdvihpd63lu6ksbs">expressed a strong
opinion</a> that the right
way forward would be to teach the current catalogue to be
mobile-friendly by applying principles of responsive design. In fact, I
stated:</p>
<blockquote>
<p>Almost all of this can be achieved via CSS, possibly with some
changes to the underlying HTML (e.g. tables to divs or whatever so
that "Place Hold" appears under the bib info instead of way over to
the right).</p>
</blockquote>
<p>I have this bad habit of talking more than doing. So when I saw the
Beanstalk mobile catalogue resurrected again at the Evergreen 2013
lightning talks, it bugged me that I still hadn't put any effort into a
proof of concept of what was possible with <a class="reference external" href="https://developer.mozilla.org/en-US/docs/CSS/Media_queries">CSS media
queries</a>.
Thus, today, on the last day of my holidays, I spent a few hours trying
things out on our development server and came up with <a class="reference external" href="http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/dbs/responsive_tpac">this *rough*
branch</a>
to work towards making the exact same HTML that we serve up for desktops
provide an experience similar to that of the Beanstalk generation of
catalogues for mobile, just via CSS.</p>
<p>As you can see from the commits, I made one change to the HTML to define
a viewport, and added one set of CSS rules wrapped in a media query; in
essence:</p>
<pre class="literal-block">
...<head>...<meta content="initial-scale=1.0,width=device-width" name="viewport"><style>@media only screen and (max-width: 600px) { #header { padding: 0px; margin: 0px; } .facet_sidebar { display: none; } ...}</style><head>...
</pre>
<div class="section" id="results-and-trade-offs">
<h2>Results and trade-offs</h2>
<p>Here are a few example URLs from our test server (which is slow, and
might get wiped any day, so test them quickly if you have a mobile
device around!):</p>
<ul class="simple">
<li><a class="reference external" href="http://laurentian-test.concat.ca/eg/opac/results?query=open+source&qtype=keyword&locg=103&detail_record_view=1">Search
results</a>
- sacrificed facets, per-item actions, and the language picker</li>
<li><a class="reference external" href="https://laurentian-test.concat.ca/eg/opac/record/729926?query=open%20source;qtype=keyword;locg=103;detail_record_view=1">Record
details</a>
- sacrificed per-item actions, flattened the item table vertically</li>
</ul>
<p>In general, I removed a lot of the frippery from the header, while
trying to retain the most valuable pieces. However, some bits are
broken: <strong>Another Search</strong> doesn't actually let you do another search
because the search bar is totally hidden. Other bits haven't been
touched (<strong>Advanced search</strong> is still overwhelming, and <strong>My Account</strong>,
while functional, could be much prettier).</p>
<p>What I've done so far is oriented towards our 2.3-ish lightly customized
Laurentian skin (we force full details in search results, for example)
but the principles should be applicable to an out-of-the-box Evergreen
catalogue. In working through some of the challenges, I've determined
that I was pretty much on target back in November; with a few HTML
tweaks that would improve the layout for desktops as well, we could keep
the per-item actions and facets around, but just move them to a
different location.</p>
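<p>As a purely illustrative sketch (not from the branch above), moving
rather than hiding the facets could be as simple as letting the sidebar
drop out of its column on narrow screens, using the same class the
branch already targets:</p>
<pre class="literal-block">
<style>
/* On narrow viewports, let the facet sidebar flow below the results
   instead of hiding it outright; illustrative rules only */
@media only screen and (max-width: 600px) {
  .facet_sidebar {
    float: none;   /* drop out of the sidebar column */
    width: auto;   /* fill the narrow viewport */
  }
}
</style>
</pre>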
</div>
<div class="section" id="less-talk-more-action">
<h2>Less talk, more action</h2>
<p>So who's with me? What we have to gain is a single set of HTML to
maintain for TPAC, and a single set of CSS, all available from the same
URL, rather than trying to maintain overlays and monkeying about with
mobile-vs-desktop URLs and the like. Feel free to dig in and start
pushing branches with improvements over my rough attempts and let's make
this thing happen for Evergreen 2.5.</p>
</div>
<div class="section" id="with-thanks-to-firefox">
<h2>With thanks to Firefox...</h2>
<p>I would be remiss if I did not mention the marvellous <a class="reference external" href="https://developer.mozilla.org/en-US/docs/Tools/Responsive_Design_View">Responsive Design
View</a>
introduced in Firefox 15, along with the <a class="reference external" href="https://developer.mozilla.org/en-US/docs/Tools/Style_Editor">Style
Editor</a>;
together, these tools (built into Firefox) made my developing and
testing work <em>so</em> much easier.</p>
<p>If you want to live on the cutting edge of Firefox, you want Aurora - go
and get it <img alt=":-)" class="emoticon" src="/images/smile.png" /></p>
<p><a class="reference external image-reference" href="http://affiliates.mozilla.org/link/banner/36536"><img alt="Download Aurora" src="http://affiliates.mozilla.org/media/uploads/banners/6f0132062588b248d44968734668226f9c19d994.png" /></a></p>
</div>
<h1>Structured data: making metadata matter for machines</h1>
<p><em>2013-04-12</em></p>
<p><strong>Update 2013-04-18:</strong> Now with <a class="reference external" href="https://archive.org/details/Microdata">video of the
presentation</a>, thanks to the
awesome #egcon2013 volunteers!</p>
<p>I've been attending the <a class="reference external" href="http://eg2013.evergreen-ils.org">Evergreen 2013
Conference</a> in beautiful Vancouver.
This morning, I was honoured to be able to give a presentation on some
of the work I've been doing on implementing linked data via
<a class="reference external" href="http://schema.org">schema.org</a> in Evergreen. I <em>think</em> I did a good
job of explaining the potential value of linked data and arguing for
improving Evergreen's schema.org publishing ninja skills.</p>
<p>My slides, with a reasonable number of useful speaker notes to provide
context, are available in <a class="reference external" href="/uploads/talks/2013/structured_data_matters.odp">LibreOffice
format</a>.<a class="reference external" href="#fn1">[1]</a></p>
<p>In addition, the amazing organizers of the conference also streamed
most<a class="reference external" href="#fn2">[2]</a> of the talk and the recording will be available on
the <a class="reference external" href="http://eg2013.evergreen-ils.org">conference web site</a> in a week
or two.</p>
<div class="section" id="footnotes">
<h2>Footnotes</h2>
<ol class="arabic">
<li><div class="first"><div id="fn1"></div></div><p>I felt pretty dirty not using HTML5 + RDFa Lite to actually mark the
whole thing up; there was some question close to the time of the
conference as to whether anything but PPT or perhaps PDF would be an
acceptable format... a concern that was subsequently removed, but a
little too late to make changing course worthwhile.</p>
</li>
<li><div class="first"><div id="fn2"></div></div><p>The room was standing-room only (well, sitting-on-the-floor-room
only), and one of the organizers accidentally sat on and unplugged
the Ethernet cable, so something like ten minutes were lost. Heh!</p>
</li>
</ol>
</div>
Introducing SQL to Evergreen administrators, round two2013-02-16T02:32:00-05:002013-02-16T02:32:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2013-02-16:/introducing-sql-to-evergreen-administrators-round-two.html<p><a class="reference external" href="/archives/212-Introduction-to-SQL-for-Evergreen-administrators.html">Three years ago</a> I was
asked to create and deliver a two-day course introducing SQL to Evergreen
users. Things went well and I was able to share the resulting materials with
the Evergreen and PostgreSQL community. Perhaps one of my happiest moments at
the Evergreen conference last year was when …</p><p><a class="reference external" href="/archives/212-Introduction-to-SQL-for-Evergreen-administrators.html">Three years ago</a> I was
asked to create and deliver a two-day course introducing SQL to Evergreen
users. Things went well and I was able to share the resulting materials with
the Evergreen and PostgreSQL community. Perhaps one of my happiest moments at
the Evergreen conference last year was when one of the participants in that
course told me that many of his fellow participants were still successfully
writing SQL queries and getting work done. Huzzah!</p>
<p>Time goes by and another group, <a class="reference external" href="http://www.ohionet.org">OHIONET</a>, was
running into difficulties getting started with PostgreSQL and Evergreen. They
asked me if I would be willing to give the same sort of training I had given a
few years back. "Sure", I said, thinking it would be a great opportunity to
polish the materials and add some updates to cover new features in PostgreSQL
and Evergreen. We also opted to skip the travel and do an entirely virtual
training session via Google Hangouts, which worked out rather nicely (but
that's a different story).</p>
<p>As it turned out, I probably ended up putting about four days' worth of effort
(crammed into lots of late nights, weekends, and vacation days) into
overhauling the instruction materials. But the results were worth it, in my
opinion; I'm rather proud of the content, and while I believe it stands up on
its own, the guidance that I was able to provide during the live instruction
sessions was well-received by the participants.</p>
<p>Thus, I am pleased to be able to offer to the broader community the latest
version of the Introduction to SQL for Evergreen Administrators, under a
Creative Commons Attribution-ShareAlike 3.0 (Unported) license.</p>
<ul class="simple">
<li>Reference documentation--30 pages introducing SQL with examples drawn
from the Evergreen schema (see the sample query after this list):
(<a class="reference external" href="http://bzr.coffeecode.net/intro_to_sql/v2/introduction_to_sql.html">HTML</a>)
(<a class="reference external" href="http://bzr.coffeecode.net/intro_to_sql/v2/introduction_to_sql.pdf">PDF</a>)
(<a class="reference external" href="http://bzr.coffeecode.net/intro_to_sql/v2/introduction_to_sql.epub">ePub</a>)
(<a class="reference external" href="http://bzr.coffeecode.net/intro_to_sql/introduction_to_sql.txt">AsciiDoc</a>)</li>
<li>Presentation:
(<a class="reference external" href="http://bzr.coffeecode.net/intro_to_sql/SQL_instruction.odp">LibreOffice Impress</a>)
(<a class="reference external" href="http://bzr.coffeecode.net/intro_to_sql/v2/SQL_instruction.pdf">PDF</a>)</li>
<li>Solutions to exercises:
(<a class="reference external" href="http://bzr.coffeecode.net/intro_to_sql/solutions_day_1.txt">Day 1</a>)
(<a class="reference external" href="http://bzr.coffeecode.net/intro_to_sql/solutions_day_2.txt">Day 2</a>)</li>
</ul>
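<p>To give a taste of the approach (the examples are all drawn from real
Evergreen tables), here is a minimal sketch of a first query, run against a
hypothetical local database; your connection details will certainly differ:</p>
<pre class="literal-block">
# list a handful of patrons from the core actor.usr table
psql -U evergreen -d evergreen -c "SELECT id, usrname, family_name FROM actor.usr LIMIT 10;"
</pre>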
<p>So, a huge thanks to OHIONET for giving me the impetus to overhaul this
material, and for giving me a chance to introduce them to the wonders of SQL
with PostgreSQL, and to the inner workings of the Evergreen schema. It was a
blast! And thanks for agreeing to let me share these materials with the broader
community.</p>
Leaving SELinux in enforcing mode with Evergreen on Fedora 172012-09-02T05:26:00-04:002012-09-02T05:26:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2012-09-02:/leaving-selinux-in-enforcing-mode-with-evergreen-on-fedora-17.html<p>Ever since I switched over to Fedora a few years back (hi Fedora 13!),
I've been guilty of a dirty secret: to run Evergreen, I've had to run
<tt class="docutils literal">setenforce 0</tt> to disable the most excellent SELinux security policies
before I could start up the Apache web server to serve up …</p><p>Ever since I switched over to Fedora a few years back (hi Fedora 13!),
I've been guilty of a dirty secret: to run Evergreen, I've had to run
<tt class="docutils literal">setenforce 0</tt> to disable the most excellent SELinux security policies
before I could start up the Apache web server to serve up the Evergreen
goodness. This worked for development purposes, but tonight something
snapped and I decided that it was no longer acceptable to throw away a
great layer of operating system security simply for the sake of hacking
on Evergreen. So... I stepped into the world of what had formerly seemed
to be inscrutable SELinux concepts, and came out with something that
seems to work (at least for my fairly limited purposes thus far of
searching the TPAC catalogue).</p>
<p>This was a pretty iterative process that involved trying to start the
<strong>httpd.service</strong>, then checking <tt class="docutils literal">/var/log/messages</tt> and
<tt class="docutils literal">/var/log/audit/audit.log</tt> for clues as to why httpd.service was
either not starting, or (once I passed that hurdle) was simply returning
internal server errors.</p>
<p>First, due to my recent experience with running a web.py script under
Fedora, I had learned that the httpd SELinux policy had a number of
booleans for enforcing or allowing particular behaviours, so I
immediately ran the following command to enable httpd to connect to the
network:</p>
<pre class="literal-block">
# add -P if you want the boolean to persist across reboots
setsebool httpd_can_network_connect on
</pre>
<p>I then needed to change the labels on many of the OpenSRF and Evergreen
files that were installed and which Fedora gave a default type of
<tt class="docutils literal">unconfined_t</tt>, which is understandably restrictive:</p>
<pre class="literal-block">
# Mark web content as, well, web content
chcon -R --type=httpd_sys_content_t /openils/lib/javascript
chcon -R --type=httpd_sys_content_t /openils/var/web
chcon -R --type=httpd_sys_content_t /openils/var/templates*
chcon -R --type=httpd_sys_content_t /openils/var/data
chcon -R --type=httpd_sys_content_t /openils/var/xsl
chcon --type=httpd_sys_content_t /openils/conf/opensrf_core.xml
chcon --type=httpd_sys_content_t /openils/conf/fm_IDL.xml

# Mark the custom Apache modules
chcon --user=system_u --type=httpd_modules_t /usr/lib64/httpd/modules/mod_xmlent.so
chcon --user=system_u --type=httpd_modules_t /usr/lib64/httpd/modules/osrf_*

# Mark the dynamic libraries we need to load
# "-h" changes the context of symlinks as well as files
chcon -h --type=lib_t /openils/lib/*

# Mark executable scripts
chcon -t httpd_sys_script_exec_t /openils/bin/openurl_map.pl
chcon -t httpd_sys_script_exec_t /openils/bin/offline-blocked-list.pl

# Might not have been necessary
chcon -R --user=system_u /usr/local/share/perl5/
chcon --user=system_u /etc/httpd/conf.d/eg.conf
chcon --user=system_u /etc/httpd/startup.pl
chcon --user=system_u /etc/httpd/eg_vhost.conf
chcon -R --user=system_u /etc/httpd/ssl/
</pre>
<p><strong>Note:</strong> I'm aware that simply running <tt class="docutils literal">chcon</tt> won't survive a
relabelling of the files. We really need to turn this into a policy, or
alternatively use <tt class="docutils literal">semanage</tt> to make the changes permanent...</p>
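<p>For the record, a minimal sketch of what the persistent equivalent might
look like with <tt class="docutils literal">semanage</tt>, shown for just one of the
directories above (repeat for the others; the regular-expression form is how
SELinux file-context rules are written):</p>
<pre class="literal-block">
# record a persistent file-context rule for the web content...
semanage fcontext -a -t httpd_sys_content_t "/openils/var/web(/.*)?"
# ...then relabel the tree so the rule takes effect immediately
restorecon -Rv /openils/var/web
</pre>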
<p>Next, I opted to finally start running Apache as the stock apache:apache
user/group rather than as the <tt class="docutils literal">opensrf</tt> user. This turned out to
require only a few steps:</p>
<ol class="arabic">
<li><p class="first">Change the <tt class="docutils literal">User</tt> setting in <tt class="docutils literal">/etc/httpd/conf/httpd.conf</tt> back to
<tt class="docutils literal">apache</tt>, reverting the change we made following the default
install documentation.</p>
</li>
<li><p class="first">To avoid errors writing to the <tt class="docutils literal">/openils/var/log</tt> directory, cut
over to using syslog - which, on Fedora, is provided by <strong>rsyslogd</strong>.</p>
</p><ol class="arabic simple">
<li>Copy the very handy <tt class="docutils literal"><span class="pre">Open-ILS/examples/evergreen-rsyslog.conf</span></tt>
file that Bill Erickson created into <tt class="docutils literal">/etc/rsyslog.d/</tt></li>
<li>Restart the <tt class="docutils literal">rsyslogd</tt> service with
<tt class="docutils literal">systemctl restart rsyslog.service</tt>.</li>
<li>Edit <tt class="docutils literal">/etc/httpd/eg_vhost.conf</tt> and
<tt class="docutils literal">/openils/conf/opensrf_core.xml</tt> to use syslog instead of
writing to log files.</li>
<li>Restart the OpenSRF services.</li>
</ol>
</li>
<li><p class="first">One more restart of the httpd service and I was in business.</p>
</li>
</ol>
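<p>Condensed into shell form, the rsyslog steps amount to something like the
following (a sketch that assumes you are sitting in the root of an Evergreen
source tree; the edits to <tt class="docutils literal">eg_vhost.conf</tt> and
<tt class="docutils literal">opensrf_core.xml</tt> still have to be made by hand):</p>
<pre class="literal-block">
# install the example rsyslog rules and pick up the change
cp Open-ILS/examples/evergreen-rsyslog.conf /etc/rsyslog.d/
systemctl restart rsyslog.service
# after pointing the Evergreen configs at syslog, bounce Apache
systemctl restart httpd.service
</pre>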
<p>So this is a start. I think this has broader implications than for just
Fedora; we should stop using the <tt class="docutils literal">opensrf</tt> user to run the Apache
service in the default configuration on all distributions (we've
discussed this several times in the past, but never really done anything
about it).</p>
<p>I hope to update the README accordingly, and I also hope to take the
SELinux work a step further to provide a modified policy so that Fedora
and Red Hat (and derivative) distributions can offer a more secure
environment for running Evergreen.</p>
<p>Oh, and some handy resources:</p>
<ul class="simple">
<li><a class="reference external" href="http://wiki.centos.org/HowTos/SELinux">CentOS SELinux HOWTO</a></li>
<li><a class="reference external" href="http://wiki.centos.org/TipsAndTricks/SelinuxBooleans">CentOS SELinux
Booleans</a></li>
<li><a class="reference external" href="http://docs.fedoraproject.org/en-US/Fedora/13/html/SELinux_FAQ/">Fedora SELinux
FAQ</a></li>
</ul>
Enabling mod_wsgi with LDAP access under Fedora 172012-07-11T14:23:00-04:002012-07-11T14:23:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2012-07-11:/enabling-mod_wsgi-with-ldap-access-under-fedora-17.html<p>Continuing my path of <em>new problem to solve = opportunity to try
something new</em>, I opted to give <a class="reference external" href="http://webpy.org">web.py</a> a shot as
a Web front-end for an existing script I had put together to <a class="reference external" href="http://git.evergreen-ils.org/?p=contrib/Conifer.git;a=blob;f=tools/patron-load/ldap_osrf_sync">provision
users in our Evergreen library system based on their LDAP
entry</a>.
The goal was to …</p><p>Continuing my path of <em>new problem to solve = opportunity to try
something new</em>, I opted to give <a class="reference external" href="http://webpy.org">web.py</a> a shot as
a Web front-end for an existing script I had put together to <a class="reference external" href="http://git.evergreen-ils.org/?p=contrib/Conifer.git;a=blob;f=tools/patron-load/ldap_osrf_sync">provision
users in our Evergreen library system based on their LDAP
entry</a>.
The goal was to provide access to the functionality of the script,
without having me as a single point of failure... something I have
intended to put in place for a long time, but which jumped up in
priority once I went on vacation and received a few requests (surprise,
surprise).</p>
<p>Creating a web.py front end was easy enough. It was a bit more
challenging to put it into production, because my production box for
this task runs Fedora 17, and that means SELinux. In the past, my
knee-jerk reaction during development would be to <tt class="docutils literal">setenforce 0</tt> and
be done with it, but exposing it to more than just me at the terminal
means taking some care. So, fortunately, it was pretty easy to sort out
(thanks largely to the assistance gleaned from <a class="reference external" href="http://www.packtpub.com/article/selinux-secured-web-hosting-python-based-web-applications">this Packtpub.com
article</a>,
minus the compiling mod_wsgi from source bits).</p>
<p>The pertinent bits for my case were:</p>
<ol class="arabic simple">
<li>Install mod_wsgi and web.py: <tt class="docutils literal">yum install mod_wsgi <span class="pre">python-webpy</span></tt></li>
<li>Configure <tt class="docutils literal">/etc/httpd/conf/httpd.conf</tt> to include the appropriate
<tt class="docutils literal">WSGIScriptAlias</tt> line (see the sketch after this list)</li>
<li>Fix the SELinux label on the WSGI files:
<tt class="docutils literal">chcon <span class="pre">-R</span> <span class="pre">-t</span> httpd_user_content_t <span class="pre">patron-load</span></tt></li>
<li>Allow Apache to connect to an LDAP server:
<tt class="docutils literal">setsebool <span class="pre">-P</span> httpd_can_connect_ldap=1</tt></li>
</ol>
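<p>For step 2, the directive ends up looking something like this sketch (the
<tt class="docutils literal"><span class="pre">/patron-load</span></tt> mount point and
script path are purely illustrative; web.py scripts work under mod_wsgi
because <tt class="docutils literal">app.wsgifunc()</tt> supplies the WSGI entry
point):</p>
<pre class="literal-block">
# drop a one-line Apache config into conf.d, then restart Apache
echo 'WSGIScriptAlias /patron-load /var/www/patron-load/app.py' | tee /etc/httpd/conf.d/patron-load.conf
systemctl restart httpd.service
</pre>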
<p>And poof: my server still has the protection of SELinux, and my desired
functionality works. Yay!</p>
Running libraries on PostgreSQL: PGCon 2012 talk2012-05-20T17:57:00-04:002012-05-20T17:57:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2012-05-20:/running-libraries-on-postgresql-pgcon-2012-talk.html<p>On Friday, May 18th I gave <a class="reference external" href="http://www.pgcon.org/2012/schedule/events/465.en.html">a talk</a> at the PGCon 2012
conference on the use of PostgreSQL by the Evergreen project. My talk fell in
the <em>case study</em> track, which meant that I had been asked to describe to
PostgreSQL developers what Evergreen was, why it was a project …</p><p>On Friday, May 18th I gave <a class="reference external" href="http://www.pgcon.org/2012/schedule/events/465.en.html">a talk</a> at the PGCon 2012
conference on the use of PostgreSQL by the Evergreen project. My talk fell in
the <em>case study</em> track, which meant that I had been asked to describe to
PostgreSQL developers what Evergreen was, why it was a project they might want
to care about, enumerate the advantages that Evergreen gets from using
PostgreSQL, and where our project has some difficulties with PostgreSQL.</p>
<p>I have given a lot of talks before, but I’m used to being on the developer side
of the discussion. In this case, the tables were turned; with noted PostgreSQL
contributors like Josh Berkus, Chris Brown, Simon Riggs, and Robert Treat in
the audience, I was a user talking to the developers of something that I was
very much dependent on and which I understood at a much more basic level than
they did. This was both liberating <em>and</em> humbling; it definitely adds some
perspective to my experiences as a developer in the Evergreen project.</p>
<p>Along with my slides, the whole talk has been professionally recorded - both
video and audio - thanks to Heroku’s sponsorship, so you will be able to relive
each and every word if you really want to. I’ll summarize the main points that
I wanted to convey to the PostgreSQL developers:</p>
<ul class="simple">
<li>I was quite candid that most libraries can’t afford dedicated database
administrators, and that therefore the more that PostgreSQL can provide
reasonable out-of-the-box configuration settings, the better. For example,
results from <a class="reference external" href="http://evergreen-ils.org/~denials/postgresql_survey.html">the survey that I sent out at the last minute</a> (THANK YOU to
the nine sites that responded!) showed many sites running with a default
statistics target of 50, whereas the default had been increased to 100 back
in PostgreSQL 8.4, and much higher settings are often recommended to help the
planner make its decisions. That said, my survey didn’t ask for table-level
statistics settings (did you <strong>know</strong> that you could change the statistics
for particular tables? see the sketch after this list), so perhaps some sites
are using higher statistics levels for particular tables and a lower default threshold.</li>
<li>It was probably hokey, but I noted that, as libraries are often called the
heart of their community, PostgreSQL was effectively the heart of
Evergreen — and I invited the PostgreSQL community to help our heart beat
faster. With the Evergreen Oversight Board contemplating a strategic
investment fund for initiatives that will have a long-term benefit to
Evergreen, this might be an avenue for getting PostgreSQL experts to assist
us on areas that represent particular bottlenecks (beyond helping us out of
the goodness of their own hearts). As well, I invited the PostgreSQL
community to join in advocacy efforts to get their local libraries to
consider adopting Evergreen.</li>
<li>I described, at a high-level, many of the PostgreSQL features that Evergreen
relies on (full-text search, stored procedures, Hstore, inheritance) and
tried to convey why our schema takes up 355 tables (and counting) to deal
with what, from outside a library perspective, must seem like a relatively
simple problem to deal with. And of course I gave most of the credit for
Evergreen’s PostgreSQL-savviness on multiple levels to Mike Rylander.</li>
</ul>
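<p>Since I mentioned per-table statistics above, here is a sketch of what that
looks like in practice; the table is a real Evergreen one, but the column and
target value are purely illustrative:</p>
<pre class="literal-block">
# raise the sampling target for one column, then refresh the statistics
psql -d evergreen -c "ALTER TABLE actor.usr ALTER COLUMN home_ou SET STATISTICS 500;"
psql -d evergreen -c "ANALYZE actor.usr;"
</pre>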
<p>The talk was well-received, based on a number of people who approached me
afterward to continue the discussion. Josh called it one of the first times he
had seen a presentation designed to solicit assistance directly from the
developers in attendance (I probably overplayed the "help us poor harried
library system administrators" hand) and thought that it hit the mark for a
case study; similarly, Simon was quite interested in Evergreen’s adoption
patterns with (I suspect) an eye towards offering possible consulting in
administration and optimization efforts.</p>
<p>On the "immediate takeaways" from that talk:</p>
<ul class="simple">
<li>For straightforward connection pooling, pgbouncer is the current
recommendation over the more flexible but more complicated pgpool-II.</li>
<li>Recent versions of Slony have lifted limitations that bit us in the
past, like the inability to replicate a TRUNCATE command.</li>
<li>Solr, as a potential alternative to PostgreSQL’s full-text search, is
seen as fast but brittle to manage, and adds overhead to maintain
consistency with the contents of the database. (I’m not so sure about the
brittleness, given HathiTrust’s ability to run a massive Solr index, but it
is worth following up on…)</li>
<li>Streaming replication in 9.1 has improved significantly over 9.0,
although you’ll still want to have WAL archiving in case of disaster.</li>
</ul>
<p>I have a lot more to say about the intersection of the PostgreSQL and Evergreen
communities in general, but on the whole I think that a closer relationship has
been long overdue. I was delighted that Ben Shum and Robin Isard were both able
to attend the conference, and I firmly believe that building more PostgreSQL
development and administration expertise within the Evergreen community is
critical to our long-term success. While I have long been an advocate of
pointing community members to the documentation of the underlying
infrastructure components for specific administration recommendations, I
believe that effective PostgreSQL tuning and administration is so critical to
the successful implementation of a production Evergreen site that we should add
a section to the Evergreen documentation containing a small set of
considerations and/or processes for going into production—and I hope to start
that relatively soon.</p>
The State / Stats of Evergreen development: 2011-20122012-04-30T00:56:00-04:002012-04-30T00:56:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2012-04-30:/the-state-stats-of-evergreen-development-2011-2012.html<p>On Thursday, April 26, I was part of <strong>The State of Evergreen</strong> talk,
organized by Grace Dunbar, that also included sections by the dynamic
combo of Kathy Lussier, Ben Hyman, and <a class="reference external" href="http://tararobertson.ca">Tara
Robertson</a>. We opened the <a class="reference external" href="http://evergreen2012.org/">Evergreen 2012
conference</a> and led into the day's
featured keynote speaker <a class="reference external" href="http://jonobacon.org">Mr. Jono Bacon …</a></p><p>On Thursday, April 26, I was part of <strong>The State of Evergreen</strong> talk,
organized by Grace Dunbar, that also included sections by the dynamic
combo of Kathy Lussier, Ben Hyman, and <a class="reference external" href="http://tararobertson.ca">Tara
Robertson</a>. We opened the <a class="reference external" href="http://evergreen2012.org/">Evergreen 2012
conference</a> and led into the day's
featured keynote speaker <a class="reference external" href="http://jonobacon.org">Mr. Jono Bacon</a> (who,
by the way, gave a good talk about community at an important time in
Evergreen's growth).</p>
<p>My assigned mission was, with a time limit of 5 minutes, to give the
audience an update on the progress in Evergreen development since the
2011 conference. Naturally, I turned to
<a class="reference external" href="http://code.google.com/p/gource/">gource</a> to generate a
visualization of the <a class="reference external" href="http://archive.org/details/Evergreen2011-2012SourceCodeVisualization">changes committed to the Evergreen git
repository</a>
since April 2011.</p>
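<p>If you want to try this at home, the recipe is roughly the following sketch;
the <tt class="docutils literal"><span class="pre">--start-date</span></tt> option and
the ffmpeg settings are from memory, so check <tt class="docutils literal">gource
<span class="pre">--help</span></tt> against your version:</p>
<pre class="literal-block">
# render a year of git history and pipe the frames through ffmpeg
gource --start-date "2011-04-01" -s 0.5 -1280x720 --title "Evergreen 2011-2012" -o - | ffmpeg -y -r 60 -f image2pipe -vcodec ppm -i - -vcodec libx264 evergreen.mp4
</pre>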
<p>With the visualization running in the background, I ran over the
following numbers (<em>statistics</em> is probably too strong a word) with
the audience...</p>
<hr class="docutils" />
<div id="header" class="slide"><p class="rubric" id="state-of-evergreen-development-2012">State of Evergreen Development, 2012</p>
<div class="line-block">
<div class="line">Dan Scott</div>
</div>
<p></div><div id="preamble" class="slide"><div class="sectionbody" style="max-width:45em"><div class="paragraph"><p>Let’s go with <strong>*Stats</strong> of Evergreen development*</p>
</div>
<p></div>
<p></div><div class="sect1 slide"><p class="rubric" id="code-contributors">Code contributors</p>
<div class="sectionbody" style="max-width:45em"><div class="paragraph"><p>Over the past year, we have seen:</p>
</div><ul>
<li><div class="first"></p></div><p>2209 commits from a total of <strong>29</strong> different authors (8 active core
committers)</p>
<p></li>
<li><div class="first"></p></div><p>9 contributors outside of the core committer group with 5 or more
commits:</p>
<ul>
<li><div class="first"></p></div><p><em>Jason Stephenson</em> - 48</p>
<p></li>
<li><div class="first"></p></div><p><em>Michael Peters</em> - 26</p>
<p></li>
<li><div class="first"></p></div><p><em>Scott Prater</em> - 20</p>
<p></li>
<li><div class="first"></p></div><p><em>Joseph Lewis</em> - 19</p>
<p></li>
<li><div class="first"></p></div><p><em>James Fournie</em> - 16</p>
<p></li>
<li><div class="first"></p></div><p><em>Robin Isard</em> - 12</p>
<p></li>
<li><div class="first"></p></div><p><em>Liam Whalen</em> - 6</p>
<p></li>
<li><div class="first"></p></div><p><em>Ben Shum</em> - 6</p>
<p></li>
<li><div class="first"></p></div><p><em>Steven Callender</em> - 5</p>
<p></li>
</ul>
</p>
<p></li>
<li><div class="first"></p></div><p>One female contributor - <em>Sarah Chodrow</em> (More, please!)</p>
<p></li>
</ul>
<div class="paragraph"><p><a class="reference external" href="http://archive.org/details/Evergreen2011-2012SourceCodeVisualization">Source code
visualization</a></p>
</div>
<p></div>
<p></div><div class="sect1 slide"><p class="rubric" id="features">Features</p>
<div class="sectionbody" style="max-width:45em"><ul class="simple">
<li>Autosuggest for searches</li>
<li>TPAC - a sane, fast, functional catalogue
- Print & email & SMS record details
- Opt-in circulation & hold history</li>
<li>Authentication proxy - with example support for LDAP authentication in JSPAC</li>
<li>Custom library hierarchies, library visibility, and copy location
groups</li>
<li>Staff client enhancements: secondary sorting columns, row numbers,
double-clickery, configurable toolbars</li>
<li>Patron statistical categories: defaults, freetext control,
required-ness</li>
<li>Acquisitions, MARC Batch Import/Export, and serials UI enhancements</li>
<li>Circulation limits</li>
</ul>
<p class="rubric" id="policies-and-procedures">Policies and procedures</p>
<div class="sectionbody" style="max-width:45em"><ul>
<li><div class="first"></p></div><p><em>Master is always stable</em></p>
<ul>
<li><div class="first"></p></div><p>To avoid time-wasting regressions, every commit must be reviewed</p>
<p>and tested by a second developer</p>
<p></li>
</ul>
</p>
<p></li>
<li><div class="first"></p></div><p><em>Timed releases</em> - for predictability</p>
<ul>
<li><div class="first"></p></div><p>One major release every six months, starting with 2.2.0</p>
<p></li>
<li><div class="first"></p></div><p>Patch releases - no timed policy as of yet</p>
<p></li>
</ul>
</p>
<p></li>
<li><div class="first"></p></div><p><em>Community support policy</em></p>
<ul>
<li><div class="first"></p></div><p>Each major release gets 12 months of full support, followed by 3</p>
<p>months of security patches</p>
<p></li>
<li><div class="first"></p></div><p>Therefore, sites should plan on one major upgrade per year</p>
<p></li>
</ul>
</p>
<p></li>
<li><p class="first">Database upgrade script sanity</p>
</li>
</ul>
<p class="rubric" id="communication">Communication</p>
<ul>
<li><p class="first"><a class="reference external" href="http://libmail.georgialibraries.org/mailman/listinfo/open-ils-dev">Developer mailing
list</a>
- 970 messages</p>
</li>
<li><p class="first"><a class="reference external" href="http://evergreen-ils.org/irc.php">Internet relay chat (IRC)
channel</a>
- 76,476 lines <a class="reference external" href="http://goo.gl/E0fxd">and other stats</a></p>
<blockquote>
<ul class="simple">
<li><strong>tsbere</strong> and <strong>dbs</strong> in a neck-and-neck race with 13,474 and 12,062 lines, respectively</li>
<li>26 people averaged more than one lines per day</li>
</ul>
</blockquote>
</li>
<li><p class="first"><a class="reference external" href="http://evergreen-ils.org/dokuwiki/doku.php?id=dev:meetings">Developer IRC
meetings</a>
- 19 meetings held</p>
</li>
</ul>
<p class="rubric" id="documentation">Documentation</p>
<div class="sectionbody" style="max-width:45em"><div class="paragraph"><p>Since last year:</p>
</div><ul>
<li><div class="first"></p></div><p>12 meetings</p>
<p></li>
<li><div class="first"></p></div><p>200 commits, covering 2.0, 2.1, and 2.2</p>
<p></li>
<li><div class="first"></p></div><p>Conversion from DocBook to AsciiDoc</p>
<p></li>
<li><div class="first"></p></div><p>Single sourcing install documentation and release notes</p>
<p></li>
</ul>
<div class="paragraph"><p>Kudos to:</p>
</div><ul>
<li><div class="first"></p></div><p>Karen Collier for direction and organization</p>
<p></li>
<li><div class="first"></p></div><p>Robert Soulliere for tirelessly formatting and publishing</p>
<p></li>
<li><p class="first">Yamil Suarez for picking up the torch from Karen</p>
</li>
<li><p class="first">Many other members of the Documentation Interest Group (<em>DIG</em>)</p>
</li>
</ul>
<p class="rubric" id="releases">Releases</p>
<div class="sectionbody" style="max-width:45em"><ul class="simple">
<li><strong>2.0 series</strong></li>
</ul>
<blockquote>
<ul>
<li><p class="first"><em>April 2011</em> - 2.0.5</p>
</li>
<li><p class="first"><em>May 2011</em> - 2.0.6</p>
</li>
<li><p class="first"><em>June 2011</em> - 2.0.7</p>
<p></li>
<li><div class="first"></p></div><p><em>August 2011</em> - 2.0.8, 2.0.9</p>
<p></li>
<li><div class="first"></p></div><p><em>October 2011</em> - 2.0.10, 2.0.10a</p>
<p></li>
</ul>
</p>
<p></blockquote>
<ul>
<li><div class="first"></p></div><p><strong>2.1 series</strong></p>
<ul>
<li><div class="first"></p></div><p><em>October 2011</em> - 2.1.0, 2.1.0a</p>
<p></li>
<li><div class="first"></p></div><p><em>November 2011</em> - 2.1.1</p>
<p></li>
</ul>
</p>
<p></li>
<li><div class="first"></p></div><p><strong>2.2 series</strong></p>
<ul>
<li><div class="first"></p></div><p><em>November 2011</em> - 2.2 alpha1</p>
<p></li>
<li><div class="first"></p></div><p><em>March 2012</em> - 2.2 alpha2, 2.2 alpha3</p>
<p></li>
<li><div class="first"></p></div><p><em>April 2012</em> - 2.2 beta1, 2.2 beta2</p>
<p></li>
</ul>
</p></li>
</ul>
</div>
<p></div>Why I donated to the Software Freedom Conservancy2011-12-26T14:15:00-05:002011-12-26T14:15:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2011-12-26:/why-i-donated-to-the-software-freedom-conservancy.html<p>A few days ago I made a small donation to the <a class="reference external" href="http://sfconservancy.org">Software Freedom Conservancy</a>, a 501(c)(3) non-profit organization registered
in the United States. There are many organizations to which I could have
donated, and indeed Lynn and I have donated to a number of charities again this
year …</p><p>A few days ago I made a small donation to the <a class="reference external" href="http://sfconservancy.org">Software Freedom Conservancy</a>, a 501(c)(3) non-profit organization registered
in the United States. There are many organizations to which I could have
donated, and indeed Lynn and I have donated to a number of charities again this
year, but I felt it was important to direct some funds to the Conservancy for a
number of reasons - which I will attempt to describe and hopefully convince you
as well.</p>
<p>First, for those who know that the Evergreen open source integrated library
system is a member project of the Conservancy and the project on which I
invest much of my professional and personal time, an obvious question might be:
"Why didn't you just <a class="reference external" href="http://evergreen-ils.org/sfc.php">donate to
Evergreen?</a>". Donating to Evergreen does result in a small percentage of those
funds being directed to the Conservancy. Currently, Evergreen directs 5% of its
income to the Conservancy, but I feel that even with $20,000 passing through
the project's hands for the purposes of the 2012 Evergreen conference, that
$1,000 that goes to the Conservancy is far below the value our project has
received in return in the form of Conservancy services. One of those services
is the provision of a trusted third-party home for project assets such as the
aforementioned finances, but also including domain names, trademarks, logos,
and (if desired) copyright. While distributed ownership of these assets is not
a problem for projects when everything is going fine, personal disputes, a
change of business strategy, or new ownership of a contributing company can
lead to severe difficulties for a project. Evergreen's sister project, <a class="reference external" href="http://koha-community.org">Koha</a>, found itself forced to change its domain name
and fight trademark battles over its very name when one company adopted an
aggressive business strategy.</p>
<p>Another service from which Evergreen has thus far derived great benefit is
access to legal counsel familiar with software freedom issues. In September the
Conservancy <a class="reference external" href="http://sfconservancy.org/news/2011/sep/30/general-counsel/">added Tony Sebro as General Counsel</a> to offer basic
legal assistance to its member projects. The Conservancy was most recently
involved in a discussion about Evergreen documentation licensing that evolved
from an unfortunately adversarial position to, shortly after the Conservancy
became involved, a mutually satisfactory agreement. I believe this result was
due not only to the Conservancy's legal expertise and familiarity with the specific
licenses in question and the general mechanism of granting licenses, but also
to their ability to understand the goals of the project and its participants
in helping to guide all parties to their desired goals.</p>
<p>The Conservancy also has a wealth of experience to draw upon to offer guidance
and expertise on many matters that free software projects have in common, but which
each project tends to rediscover on its own. For example, the Evergreen project
has been able to run conferences on an annual basis for the past three years,
but has historically relied on Equinox's willingness to assume the financial
risks when signing venue contracts. This year, due to the positive results of
the previous conferences, the Conservancy was able to provide the deposit for
the Evergreen 2012 conference in Indiana. While personally I deeply appreciate
the role that Equinox has played in helping to build such a core part of our
community experience, it is an important step for our project that the
Conservancy be able to assume this role.</p>
<p>In addition, the Conservancy's experience with various conference management
packages and the payment fees associated with online financial services such as
Google Checkout and PayPal provided some important guidance early on in the
Evergreen conference 2012 planning process. That advice probably paid for
itself!</p>
<p>I expect that the Evergreen project will continue to benefit from our
membership in the Software Freedom Conservancy as we work towards a mechanism
for electing members of the Evergreen Oversight Board and continue growing and
evolving the project. The $1,000 or so that the Conservancy has earned as a
result of the 5% of revenue that Evergreen directs its way is far below the
value that we have derived from our relationship thus far, and that is why I
have chosen to donate to the Conservancy again this year.</p>
<p>P.S. Because the Conservancy is a 501(c)(3) non-profit, donations to it are
tax-deductible for American citizens. As a Canadian, this particular benefit does not apply to
me - however, the rest of the benefits that the Conservancy provides to free
software projects are international in scope and deserve to be supported.</p>
Current state of academic reserves support for Evergreen2011-09-08T03:09:00-04:002011-09-08T03:09:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2011-09-08:/current-state-of-academic-reserves-support-for-evergreen.html<p>One of the relatively frequent questions that I run into with Evergreen
is "Does Evergreen have an academic reserves module?" And the answer is:
well, yes, and no. There is no official academic reserves module that is
part of the standard Evergreen package that you download and install
from <a class="reference external" href="http://evergreen-ils.org">http://evergreen-ils.org</a>.</p>
<p>However, I am aware of two free-and-open-source modules that are
available as extensions to Evergreen:</p>
<ol class="arabic">
<li><p class="first">A relatively simple, straightforward module, written by my colleague
Kevin Beswick, is in use at Laurentian University and recently was
adopted by the <strong>emily carr university of art + design</strong>. It builds
on Evergreen's bookbags feature to organize reserves of physical
items by class code and instructor name. The module for that code--a
mix of PHP, Dojo, and SQLite--is available on
<a class="reference external" href="https://github.com/kbeswick/library/tree/master/reserves">Github</a>,
and you can see it in action at <a class="reference external" href="http://biblio.laurentian.ca/reserves/">Laurentian
University</a>.</p>
<ul class="simple">
<li><strong>UPDATE 2012-12-21</strong>: See the version I forked at
<a class="reference external" href="https://github.com/dbs/library/tree/master/reserves">https://github.com/dbs/library/tree/master/reserves</a> with updates
supporting TPAC integration</li>
</ul>
</li>
<li><p class="first"><strong>Syrup</strong> is a more sophisticated reserve system (you know it's a
serious project when it has a name!), which supports all kinds of
features - such as mixes of electronic and physical materials,
organizing course content by arbitrary groupings (e.g. readings per
week), limiting user access to the content of specific courses based
on LDAP integration, and much much more. You can see a running
instance at the <a class="reference external" href="http://reserves.uwindsor.ca/syrup/browse/">University of
Windsor</a> and the code
(primarily written in Python) is freely available from the <a class="reference external" href="http://git.evergreen-ils.org/?p=Syrup.git;a=summary">Syrup git
repository</a>
on Evergreen's git server. If you need help getting up and running,
Syrup's <a class="reference external" href="http://groups.google.com/group/syrup-reserves-discuss">mailing
list</a> is
probably a good place to start.</p>
</li>
</ol>
<p>So, there are at least two choices for academic reserves for Evergreen.
Go ahead and pick the one that meets your needs!</p>
The wonderful new OpenLibrary Read API and Evergreen integration2011-06-02T20:06:00-04:002011-06-02T20:06:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2011-06-02:/the-wonderful-new-openlibrary-read-api-and-evergreen-integration.html<p>Back in early May, I was in San Francisco for Google I/O. I had booked an extra
day with the hopes of either doing some sightseeing or meeting up with the
<a class="reference external" href="http://openlibrary.org">OpenLibrary</a> team. After firing off an email to
find out if anyone there was interested in working on some tighter integration
between OpenLibrary and Evergreen, the answer from George Oates was an
enthusiastic "Yes!". So, we spent a beautiful sunny day inside the Internet
Archive headquarters discussing possible directions for this integration.
Alcatraz, you can wait for my next trip...</p>
<p>As it turned out, the timing was great. I had spent a day hacking on the
OpenLibrary "added content" module for Evergreen during the Evergreen hackfest
(which I spent in an airport due to an eight-hour fog delay... different
story), so I was quite familiar with the existing OpenLibrary Book API and
their patterns of use were fresh in my brain. The biggest problem with the
existing Book API, from my perspective, was that I had to make two calls for
each work that I was interested in retrieving information about; one call
returned the <em>data</em> (stable elements) and one call returned the <em>details</em>
(unstable, but quite interesting elements like the table of contents, excerpts,
etc).</p>
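<p>For the curious, that two-call pattern looked roughly like the following
sketch (the <tt class="docutils literal">bibkeys</tt>, <tt class="docutils literal">jscmd</tt>, and <tt class="docutils literal">format</tt> parameters are the Book
API's real knobs, but the ISBN here is purely illustrative):</p>
<pre>import json
import urllib.request

BOOKS_API = "https://openlibrary.org/api/books"

def books_api_call(isbn, jscmd):
    # One HTTP round trip per jscmd value: "data" for the stable
    # elements, "details" for the unstable-but-interesting ones.
    url = "%s?bibkeys=ISBN:%s&jscmd=%s&format=json" % (BOOKS_API, isbn, jscmd)
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read())

isbn = "0596156715"                         # illustrative ISBN
stable = books_api_call(isbn, "data")       # call #1
unstable = books_api_call(isbn, "details")  # call #2</pre>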
<p>The OpenLibrary team had this in their sights as well - but they wanted to
tackle a bigger target. Rather than making one or more calls per work, they
wanted to expose an API that would let users request info for multiple works in
one shot: the <em>Shotgun API</em> (known amongst more polite company as the <em>Read</em>
API). Loosely modelled on the Hathitrust API, it would also focus on exposing
URLs for reading or borrowing exact matches or similar editions (using the
relatively recent OpenLibrary borrowing program). It sounded great, and we
spent the afternoon fleshing out how we wanted it to look and work. My role was
largely that of the third-party developer - the API customer - and we had great
discussions.</p>
<div class="section" id="working-code-wins">
<h2>Working code wins</h2>
<p>Of course, discussions are one thing, and working code is another. OpenLibrary
developer Mike McCabe was riding shotgun on the development of the Read API,
and once he had enough working code in place, he contacted me to ask me to
start developing against it. It was the usual development process: I started
with a hard-coded sample JSON output, then as Mike pushed more functionality
into a server environment I was able to test and expand my client-side code.</p>
<p>So where are we now? I can vouch that working with the all-in-one Read API, as
a developer, is sweet. All of the data elements are readily visible in sweet,
sweet JSON, in a single call, and it is utterly simple to pull the bits that
you want to expose. I had been trying to pull together ebook links and the like
from the Books API, and the use of the <tt class="docutils literal">items</tt> list makes that absolutely
painless for developers. Kudos!</p>
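<p>To give a flavour of what that looks like from the client side, here is a
minimal sketch of a multi-work lookup (the request format - identifiers joined
with semicolons within a request, and pipes between requests - reflects my
understanding of the Read API at the time of writing; the identifiers
themselves are just examples):</p>
<pre>import json
import urllib.request

READ_API = "https://openlibrary.org/api/volumes/brief/json/"

def read_api_multi(requests):
    # Each inner list holds the identifiers for one work; the whole
    # batch goes out in a single HTTP call.
    path = "|".join(";".join(ids) for ids in requests)
    with urllib.request.urlopen(READ_API + path) as response:
        return json.loads(response.read())

results = read_api_multi([
    ["isbn:0385472579"],
    ["lccn:50006784", "olid:OL6179000M"],
])
for request_key, match in results.items():
    # "records" carries the matching metadata; "items" carries the
    # read/borrow URLs along with their availability status.
    for item in match.get("items", []):
        print(request_key, item.get("status"), item.get("itemURL"))</pre>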
<p>Evergreen has a largely rewritten OpenLibrary added content module built
against the Read API sitting in the Evergreen working repository
<tt class="docutils literal"><span class="pre">user/dbs/openlibrary-read-api</span></tt> branch. As the <strong>Borrow</strong> and <strong>Read</strong>
functions depend on IP address range matching, I have added the ability to
proxy the Read API requests via the Evergreen server - so that if an Evergreen
institution has special access rights to the OpenLibrary collection, their
patrons will see the appropriate levels of access in the catalogue. Oh yes, the
catalogue; as we were already using OpenLibrary by default for cover art,
tables of content, and excerpts in Evergreen since the 2.0 release, the major
difference that will be visible to Evergreen users will be in search results:</p>
<p><a class="reference external image-reference" href="/uploads/files/openlibrary-evergreen.png"><img alt="Search results showing OpenLibrary Read integration" class="serendipity-image-left" src="/uploads/files/openlibrary-evergreen.serendipityThumb.png" style="width: 110px; height: 66px;" /></a></p>
<p>As you can see, if you have left the <tt class="docutils literal">OpenLibraryLinks</tt> variable turned on in
the <tt class="docutils literal">result_common.js</tt> file, Evergreen will search for a matching record in
OpenLibrary and tell you if an online version is available. It tells you
whether the online version is an exact match, or similar, and will also expose
items that you can borrow from OpenLibrary. Given the preponderance of print
materials that still remains in our collections, and our users' general
preference for anything electronic, I think this will be an extremely popular
feature.</p>
</div>
<div class="section" id="moving-forward">
<h2>Moving forward</h2>
<p>There are a number of areas that could use more polish and tender loving
care.</p>
<p>First and foremost, OpenLibrary supports matching based on ISBNs, LCCNs,
OCLC numbers, and OpenLibrary IDs; right now, the Evergreen support is based
strictly on ISBNs, which of course don't exist for many of the older materials
in our collections. So a fruitful direction would be to take the regular dump
of data that OpenLibrary thoughtfully provides (yay for open data) and use that
to augment our records to include OpenLibrary ID numbers to use as a match
point.</p>
<p>There is the small matter of merging these changes back into Evergreen
proper.</p>
<p>I developed against the Evergreen 2.0 branch because I wanted to be able to put
this code into production as soon as possible, so there will be a tiny bit of
merging pain to get this into master and backported properly. However, the
changes are quite localized and should be agreeable, so hopefully this will not
sit in a branch for too long.</p>
<p>At this early stage in the Read API's release, I have also found that it can be
a bit slow to respond to requests containing a number of identifiers (or
perhaps a large number of records and items). It is to be expected that
functionality comes first and optimization comes later, so I have great hopes
for improved performance once the Read API settles down.</p>
<p>Of course, once you have the Read API, you need a Write API - and I hope to be
able to help pilot that as well, because the potential communal benefit of a
Write API for library systems that have integrated with OpenLibrary is huge.
Imagine a system where, when you ask for added content based on a given
identifier, if the system says "Huh, I don't know anything about that
identifier" it follows up with "Hey, can you POST what you know about it to
this URL?".</p>
<p>OpenLibrary could then run its algorithms and either add an edition to an
existing work or generate a new work. We should also be able to expose
OpenLibrary's metadata editing tools for our users, so they can flag bad cover
art, or add a table of contents to works that they are passionate about, or
post a favourite excerpt... Enabling a bi-directional give and take between
systems has the potential to quickly make OpenLibrary a huge knowledgebase of
open data. It would be a great boon for libraries, and I hope we can make it
happen.</p>
<p><strong>Update 2011-06-02 21:54 EDT</strong>: The omission of Mike McCabe's name has been
corrected. Also, I forgot to thank my employer, Laurentian University, and the
University of Windsor for allowing me to invest some of my time on
strengthening Evergreen's ties to OpenLibrary. I believe this is the beginning
of a solid, mutually beneficial partnership!</p>
</div>
Reducing cached content pain after Evergreen upgrades2011-05-23T15:16:00-04:002011-05-23T15:16:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2011-05-23:/reducing-cached-content-pain-after-evergreen-upgrades.html<p>If you have been through an Evergreen upgrade, you know that the days
after the upgrade can be painful. Users complain that the catalogue
doesn't work right, there are mysterious glitches that happen on some
machines and not others (even though the browser and operating systems
are identical on each machine!), rebooting doesn't help... and then
eventually the problem goes away.</p>
<p>The problem isn't all that mysterious, really, it's the result of the
browser caching content. Normally, browser caching is a very positive
experience: when a browser requests a file from a Web server, the Web
server tells it how long the browser should hold onto the file via a
<tt class="docutils literal"><span class="pre">Cache-control</span></tt> directive. This means that if a page on your Web site
is composed of dozens or hundreds of images and CSS and JavaScript files, your
browser doesn't have to download every one of those files on every page
you visit; as long as the file hasn't expired, the browser can just
serve it up from the local cache and only the fresh content needs to be
fetched from the server. It's how the Web works, and it's really
important for performance reasons.</p>
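<p>As a concrete (hypothetical) example, a response carrying headers like the
following tells the browser that it may reuse its cached copy of a stylesheet
for thirty days without asking the server again:</p>
<pre>HTTP/1.1 200 OK
Content-Type: text/css
Cache-Control: public, max-age=2592000</pre>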
<p>However, if your Web server has told your browser to cache files for a
month, and then during that month you upgrade your Web site so that
there are new JavaScript and CSS files that your fresh content depends
on, then you can run into trouble until those cached files expire. And
that is exactly the case that we run into with Evergreen upgrades - only
the problem is amplified by how heavily the Evergreen catalogue (which
is just a Web site) relies on JavaScript for basic operations.</p>
<p>On the user side, you can handle the problem a few ways:</p>
<ul class="simple">
<li>Doing a <em>hard refresh</em> to force the browser to fetch fresh versions
of all the files in its cache. You can force a hard refresh on most
browsers by holding down the <tt class="docutils literal">Shift</tt> key and clicking the
<tt class="docutils literal">Refresh</tt> or <tt class="docutils literal">Reload</tt> button.</li>
<li>Emptying the browser cache.</li>
</ul>
<p>Neither of these user-side approaches is particularly convenient. Doing
a hard refresh may work for one page, but as the user navigates to a
different page that uses different CSS and JavaScript, they will have to
do another hard refresh... and so on, which in the case of Evergreen
means users will have to refresh around a half-dozen different pages
(home page, search results, record details, account, advanced search).
Hard refreshes are also not reliable, as resources fetched by XHR are
not actually refreshed (this is <a class="reference external" href="http://code.google.com/p/chromium/issues/detail?id=37711">a long-standing bug with Chrome and
Firefox</a>).
If you don't know what XHR is, just know that Evergreen uses a lot of
them. And emptying the browser cache is both painful (every browser has
a different way of emptying browser cache) and overkill (you just want
to discard the cache for one site, but most browsers will discard the
cache for every site they have visited).</p>
<p>The "right" solution is to have the server tell the browser to fetch a
new version of the resource. You could change the caching settings to be
very short-lived - for example, change the cache time from one month
down to one day for JavaScript and CSS - but unless you upgrade your
site very frequently, that would mean that 99% of the time your users'
browsers will be making unnecessary requests, and their experience of
your catalogue will be that it is slower to load than other sites on the
Web. Not so good.</p>
<p>The other approach is to change the pathname for the cached resources at
upgrade time so that the browser doesn't find a match in its local cache
and has to fetch the new version. There's some good news: some work has
been going on in the Evergreen 2.1 release to tackle this problem, but
it is not yet complete. And most sites are only looking at moving to 2.0
right now. As it happens, we made the jump from 1.6.1.8 to 2.0.6+
yesterday and boy howdy the browser cache was a problem after the
upgrade, as one would expect. I took a quick stab at identifying the
most likely paths that needed to be refreshed and threw together some
shell commands to "munge" our catalogue skins so that browsers would be
forced to pick up the new versions of the content.</p>
<p>Once the post-upgrade panic subsided, I refactored those commands into <a class="reference external" href="http://git.evergreen-ils.org/?p=contrib/Conifer.git;a=blob;f=tools/migration-scripts/cache-munger.pl;h=aa2a49a030e9b4d9aeb1213562609dc640d3e453;hb=master">a Perl script
named
cache-munger.pl</a>
(well, more precisely, a Perl script that generates shell commands). The
Perl script has two hardcoded variables: a datestamp (which is really
any uniquely identifying string that can appear in a directory name and
URL) and a list of catalogue skins to munge. When you run the script, it
generates a set of shell commands that you should be able to run on your
Evergreen 2.0 instance to force browsers to fetch the new version of
your catalogue's JavaScript and CSS files.</p>
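<p>If you would rather roll your own, the core idea is small enough to sketch
in a few lines of Python (this is an illustration of the approach, not
cache-munger.pl itself; the skin path and datestamp are hypothetical):</p>
<pre>import os
import re

DATESTAMP = "20110523"  # any unique string will do
SKIN_DIR = "/openils/var/web/opac/skin/default"  # hypothetical path

# 1. Move the cacheable assets into datestamped directory names...
for subdir in ("js", "css"):
    os.rename(os.path.join(SKIN_DIR, subdir),
              os.path.join(SKIN_DIR, "%s-%s" % (subdir, DATESTAMP)))

# 2. ...then rewrite every reference in the skin's templates so that
# browsers see brand-new URLs that cannot be satisfied from cache.
for root, dirs, files in os.walk(SKIN_DIR):
    for name in files:
        if not name.endswith((".html", ".xml")):
            continue
        path = os.path.join(root, name)
        with open(path) as fh:
            text = fh.read()
        text = re.sub(r"/(js|css)/", r"/\1-%s/" % DATESTAMP, text)
        with open(path, "w") as fh:
            fh.write(text)</pre>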
<p>Some limitations: I haven't written a script to convert your skins back
to pristine mode (that's mostly a matter of updating the ack-grep
commands and reversing the sed commands). And I haven't written a script
to update a munged set of skins. And, I'm not 100% sure that I've hit
every set of JavaScript and CSS that needs to be refreshed after an
upgrade from 1.6 to 2.0. But it's a reasonable start, in my opinion, and
hopefully it helps inform the Evergreen 2.1 effort so that we can have a
standard, supported, painless means of telling browsers to fetch new
resources as an automatic part of any upgrade in the future.</p>
Authority support in Evergreen 2.02011-04-29T15:09:00-04:002011-04-29T15:09:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2011-04-29:/authority-support-in-evergreen-20.html<p>I'm at the Evergreen 2011 conference in balmy Decatur, Georgia... which wasn't
a sure thing yesterday, given that the day started with an eight hour delay at
the Sudbury airport due to fog - not to mention having to fly through the storm
that spawned a tornado in Alabama. After all that, though, it's great to be back
in the same physical space as the vibrant Evergreen community!</p>
<p>Yesterday afternoon I gave a presentation on <a class="reference external" href="http://bzr.coffeecode.net/eg2011_authorities">Authorities in Evergreen 2.0</a>, covering (as the title
suggests) Evergreen's support for authority records in the 2.0 release (as well
as a peek at the future of Evergreen 2.2).</p>
<p>The session appeared to be well-received - yay! - and I tried recording it on
my colleague Rick Scott's Sansa Clip+. Hopefully that worked out and I'll be
able to update this post with the audio, so you can have the full-on audio and
slide experience.</p>
<p>The presentation is available under the <a class="reference external" href="http://creativecommons.org/licenses/by-sa/2.5/ca/">Creative Commons Attribution Share
Alike license</a>, in the
hopes that others will be able to use it for training purposes, to extend and
improve it, and generally help out with the adoption of Evergreen.</p>
Evergreen's continuous integration servers - past, present, and future2011-03-14T02:13:00-04:002011-03-14T02:13:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2011-03-14:/evergreens-continuous-integration-servers-past-present-and-future.html<p><em>tldr</em> version: the Evergreen project now has a <a class="reference external" href="http://testing.evergreen-ils.org">continuous integration
server and build farm</a> and needs
testcases to make the best use of that infrastructure to help us provide
higher-quality releases in the future.</p>
<div class="section" id="evergreen-buildbot-past">
<h2>Evergreen buildbot - past</h2>
<p>Back in November 2009, Evergreen developer Shawn Boyette <a class="reference external" href="http://markmail.org/message/j6hd6634oimpum6x">launched</a> the Evergreen <em>buildbot</em> -
a continuous integration server that ran basic tests with every commit to the
OpenSRF and Evergreen repositories and created nightly tarballs of the code. It
was a promising start towards a system that would provide us with instant
feedback about the state of our code - at least as much as we had tests for it.
Unfortunately, the server ran for only a few months before disappearing when
Shawn parted ways with Equinox in early 2010.</p>
<p>I always thought it was a shame we had lost this piece of the development
infrastructure, but Equinox had offered accounts on a server for anyone in the
Evergreen community interested in taking on the task of setting up a new
continuous integration test server - and through the rest of 2010, nobody
stepped up to take on that responsibility. Most of us were busy developing and
testing Evergreen 2.0, I suspect. So, in January of 2011, when I had a bit of
breathing room, I scoped out the current state of continuous integration
frameworks and discovered that the <a class="reference external" href="http://buildbot.net">buildbot</a> project
(no relation to Shawn's code, other than a serendipitous name) was written in
Python and therefore was much more approachable to me than the other leading
alternative, Hudson... so I wrote up <a class="reference external" href="http://markmail.org/message/2ke455rplbrpcxuv">my findings and a quick proposal</a>.</p>
</div>
<div class="section" id="evergreen-buildbot-present">
<h2>Evergreen buildbot - present</h2>
<p>A few days later I had the buildbot running on the <a class="reference external" href="http://testing.evergreen-ils.org">server provided by Equinox</a>, providing reports on the status of the
OpenSRF builds on Ubuntu Lucid. After putting out a call to the community for
build servers to provide coverage for Evergreen on different operating systems,
I had enough responses to focus my mind on improving the Evergreen build.
Evergreen now has the same standard layout for Perl modules that we adopted a
year ago for OpenSRF, along with some basic sanity tests in Perl (such as are
there any syntax errors in this module?).</p>
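<p>For anyone curious about what drives those builds, a build factory in the
master configuration looks roughly like this (a sketch against the buildbot
0.8-era API; the repository URL and steps are illustrative rather than a copy
of our actual master.cfg):</p>
<pre># Sketch of a buildbot build factory: check out OpenSRF and run the
# configure step plus the basic sanity tests on every commit.
from buildbot.process.factory import BuildFactory
from buildbot.steps.source import Git
from buildbot.steps.shell import ShellCommand

factory = BuildFactory()
factory.addStep(Git(repourl="git://git.evergreen-ils.org/OpenSRF.git",
                    branch="master"))
factory.addStep(ShellCommand(command=["autoreconf", "-i"],
                             description="autoreconf"))
factory.addStep(ShellCommand(command=["./configure"],
                             description="configure"))
factory.addStep(ShellCommand(command=["make", "check"],
                             description="run tests"))</pre>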
<p>So, thanks to <a class="reference external" href="http://esilibrary.com">Equinox</a> for providing the testing
server that serves as the mothership for controlling all of the build tests.
And many thanks to the University of Prince Edward Island Robertson Library and
the <a class="reference external" href="http://georgialibraries.org">Georgia Public Library Service</a> for providing
build servers for the build farm. We now have Evergreen test coverage on the
Ubuntu Lucid and Debian Squeeze Linux distributions (huzzah) and OpenSRF test
coverage on Ubuntu Lucid. If you have an interest in getting test coverage for
a different distribution and have a server to spare, please feel free to
<a class="reference external" href="mailto:dan@coffeecode.net">contact me</a> and we can get your server
<a class="reference external" href="http://evergreen-ils.org/dokuwiki/doku.php?id=dev:testing:buildbot#setting_up_the_buildbot_slave">added to the build farm</a>.</p>
</div>
<div class="section" id="checking-build-status">
<h2>Checking build status</h2>
<p>You can check the current state of the code for various OpenSRF and Evergreen
branches at any time by visiting the <a class="reference external" href="http://testing.evergreen-ils.org">Evergreen buildbot page</a> and choosing one of the menu options.</p>
<p><a class="reference external" href="http://testing.evergreen-ils.org/buildbot/one_line_per_build">Recent builds</a> provides
a simple list of the success or failure of the 20 most
recent builds.</p>
<p><a class="reference external" href="http://testing.evergreen-ils.org/buildbot/waterfall">Waterfall</a>, on the
other hand, provides the detailed status of every tested combination of Linux
distribution and code branch.</p>
</div>
<div class="section" id="evergreen-buildbot-future">
<h2>Evergreen buildbot - future</h2>
<p>We still have work to do to deliver on the promise of the buildbot. Most
important, I think, is that a continuous integration server can only run the
tests that it has been given - and we have not given it many tests.</p>
<p>It kills me that people discovered some fairly fundamental problems with the
Evergreen 2.0 release (some recent examples include most identifier searches
not working and limitations with Unicode in patron names). Now that we have a
continuous integration server, we need a testing framework so that it becomes
easy to add tests along the lines of "Import a set of sample bibliographic
records, then run the following sets of searches (ISSN and ISBN with and
without hyphens; EAN; UPC...) and ensure that the returned results match these
expected results". It should be a human's job to set up that automated test
<em>once</em> so that we're forever confident in the future that we're not screwing up
those basic features, no matter what we change in our database schema or
underlying code.</p>
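<p>To make that concrete, here is the shape of the kind of testcase I have in
mind, as a hypothetical Python sketch (the <tt class="docutils literal">opac_search</tt> helper and the
expected record IDs are placeholders for whatever the eventual framework
provides):</p>
<pre>import unittest

def opac_search(search_class, term):
    # Placeholder: the eventual framework would load the sample
    # bibliographic records and run a real catalogue search here.
    raise NotImplementedError

class IdentifierSearchTest(unittest.TestCase):
    # Hypothetical record IDs that the sample data set would contain.
    EXPECTED = [42]

    def test_isbn_with_and_without_hyphens(self):
        for isbn in ("978-0-19-852011-5", "9780198520115"):
            self.assertEqual(opac_search("isbn", isbn), self.EXPECTED)

if __name__ == "__main__":
    unittest.main()</pre>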
<p>Now, there are very few people that can currently create that sort of a test.
There might be none at the moment, in fact, because we need that previously
mentioned testing framework to be sorted out and integrated into the buildbot.</p>
<p>However, in the short term we <em>can</em> create these testing scenarios so that
humans can reproduce them during testing blitzes, until such time as we have
the testing framework sorted out and can begin automating these tests.</p>
<p>Otherwise, I fear that we'll go into the Evergreen 2.1 alpha/beta/release
candidate cycle and get reports from testing that indicate that all is well -
but only because some of the more complex tasks haven't actually been attempted
- and we'll find ourselves scrambling once again after the release to fix
problems that become evident when sites actually start moving to the release.</p>
<p>Beyond tests, we need to teach it to create cleanly packaged tarballs on a
regular basis - although that should arguably be nothing more than, or not much
more than, the equivalent of running <tt class="docutils literal">make package</tt> rather than pushing all
kinds of specialized packaging logic into the buildbot itself.</p>
<p>Autotools wizards, your assistance would be greatly appreciated.</p>
</div>
<div class="section" id="spreading-evergreen-buildbot-knowledge">
<h2>Spreading Evergreen buildbot knowledge</h2>
<p>To ensure that our project can survive the loss of the current master build
server (or me, for that matter!), I've been committing a password-sanitized
copy of the buildbot configuration to the examples directory of the OpenSRF
repository. In addition to reducing the dependency on one person and one
server, it also gives anyone else interested in contributing to the Evergreen
buildbot the ability to easily define a build master and build slaves in a
local environment.</p>
</div>
Evergreen 2.0.0: What it has (and does not have)2011-02-05T15:08:00-05:002011-02-05T15:08:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2011-02-05:/evergreen-200-what-it-has-and-does-not-have.html<p>Back in early 2010, I responded to the call for proposals for the OLA
SuperConference with the following
<a class="reference external" href="http://www.accessola.com/superconference2011/showSession.php?lsession=620&usession=620">proposal</a>
for a session called <strong>Evergreen 2.0: What doesn't it have?</strong>:</p>
<blockquote>
<p>The first release of the Evergreen library system in September 2006
brought circulation, cataloguing, reports, and a modern OPAC.
Evergreen 2.0, expected in early 2011, promises deep support for
acquisitions, serials, telephony, and more. The range of features
will be highlighted and weaknesses exposed.</p>
</blockquote>
<p>Talk about great timing! My talk was accepted and scheduled for February
3rd, and of course Evergreen 2.0.0 was released exactly one week before
that! So not only was I able to accurately predict when Evergreen 2.0
would be available, I was actually able to deliver a presentation based
on reality. I believe I provided a balanced look at Evergreen's current
strengths and weaknesses, and as with my sessions in previous years at
the OLA SuperConference, all of the seats in the room (25 or so) were
filled. There were unfortunately a number of people who poked their
heads in the door and, seeing the lack of available seats, moved on to
some other presentation. So, interest in Evergreen remains strong
amongst the Ontario crowd - and maybe next year I can swing a larger
venue! I was also really fortunate to meet several people after the
session who expressed interest in contributing to the Evergreen project
in various ways; I'm always eager to welcome new members to the
community of Evergreen contributors, so here's hoping that works out
<img alt=":-)" class="emoticon" src="/images/smile.png" /></p>
<p>If you were one of the people who couldn't get a seat, or you're just
interested in catching up with the state of Evergreen at the 2.0.0
release, the presentation itself is available in <a class="reference external" href="http://bzr.coffeecode.net/ola_2011/ola_2011_slidy.html">HTML
form</a> (ahh
Slidy). I have also made the <a class="reference external" href="http://bzr.coffeecode.net/ola_2011/ola_2011_slidy.txt">ASCIIDOC
source</a> and
<a class="reference external" href="http://bzr.coffeecode.net/ola_2011/images">screenshots</a> for the
presentation available in a <a class="reference external" href="http://bzr.coffeecode.net/ola_2011/">Bazaar repository</a>. The
presentation is licensed under the Creative Commons
Attribution-ShareAlike license in the hope that others in the
Evergreen community may find the material useful for learning and
sharing with their own libraries, and may want to fill in some areas
where I may have left gaps (feel free to fork the repository and send
patches my way!). It would be great if we could collectively pull
together a kick-butt presentation for Evergreen advocacy, and I would be
delighted if my material served as a starting point for that effort.</p>
Standard Social Sharing and Aggregation on the Go: Access 2010 presentation2010-10-17T14:44:00-04:002010-10-17T14:44:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-10-17:/standard-social-sharing-and-aggregation-on-the-go-access-2010-presentation.html<p>Earlier this week, I had the honour of speaking at the <a class="reference external" href="http://access2010.lib.umanitoba.ca/">Access 2010
conference</a> in Winnipeg,
Manitoba. The title of my talk was rather unwieldy, but what it boiled
down to was:</p>
<ul class="simple">
<li>An environmental scan of how libraries are currently offering users
of their services the ability to share their thoughts and to connect
with one another around library activities</li>
<li>A brief overview of some relevant emerging standards for
socially-enabled applications (<a class="reference external" href="http://activitystrea.ms">Activity
Streams</a>, <a class="reference external" href="http://gmpg.org/xfn/">XHTML Friends Network
(XFN)</a>, and the HTML5 browser geolocation
API)</li>
<li>Some of my thoughts about how library software could adopt these
standards to knit together experiences across library system
boundaries, and outside of library systems altogether</li>
<li>Some findings from an <a class="reference external" href="http://markmail.org/message/nb2w7fslmsi33x33">initial
implementation</a> of
one of these standards (Activity Streams) in the <a class="reference external" href="http://evergreen-ils.org">Evergreen library
system</a></li>
</ul>
<p>Here are the slides
(<a class="reference external" href="/uploads/talks/2010/social_sharing.odp">OpenDocument</a>,
<a class="reference external" href="/uploads/talks/2010/social_sharing.pdf">PDF</a>)
and the accompanying recording (<a class="reference external" href="/uploads/talks/2010/sharing_talk.ogg">OGG
Vorbis</a>,
<a class="reference external" href="/uploads/talks/2010/sharing_talk.mp3">MP3</a>).
Thanks to Bill Denton for the use of his recorder for the audio!</p>
<p>One quick reflection is that, in the interest of using a familiar
example, I think I focused too much on sharing and aggregating <em>objects</em>
(such as reviews) between libraries and didn't make a good argument for
the value of enabling connections between <em>people</em> based on their
activities.</p>
On avoiding accusations of forking a project2010-09-29T02:16:00-04:002010-09-29T02:16:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-09-29:/on-avoiding-accusations-of-forking-a-project.html<p>Sometimes forking a project is necessary to reassert community control
over a project that has become overly dominated by a single corporation:
see <a class="reference external" href="http://openindiana.org/">OpenIndiana</a> and
<a class="reference external" href="http://www.documentfoundation.org/download/">LibreOffice</a> for recent
examples. And in the world of distributed version control systems,
forking is viewed positively; it's a form of evolution, where
experimental branches that lead to new features or a stabler system or
better performance get grafted back onto the accepted authoritative
branch.</p>
<p>Yet a negative connotation can also be associated with forking a
project, particularly if the word is whispered behind closed doors as an
accusation of the behaviour of one or more parties in the community.
Particularly in a small community, where development resources for a
project built on the principles of software freedom from the ground up
are relatively scarce, the spectre of a development effort based on that
project that is not publicly visible can be troubling and opens the door
to the accusation: <strong>FORK</strong>! Organizations that have staked their
customers' satisfaction, and their own reputation, on free software that
they expected to see flourish as others joined in the development
effort, fret and worry that they'll be left behind with just the base
for another organization's project and no easy way to reconcile the two.</p>
<p>In the Evergreen community, we're fortunate that we're small enough that
we should be able to avoid these concerns. The Evergreen "trunk" code
repository has been hopping; just take a peek at the <a class="reference external" href="http://svn.open-ils.org/trac/ILS/log/trunk">revision
log</a> to see the rather
torrid pace of development. Some <a class="reference external" href="http://evergreen-ils.org/dokuwiki/doku.php?id=faqs:evergreen_roadmap">major
features</a>
are taking shape, such as acquisitions with EDI support, first-class
serials management, outbound telephony, and more - evident in the
<a class="reference external" href="http://evergreen-ils.org/downloads.php">Evergreen 2.0 alpha 3
release</a> that the development
team put together today. This is not a minor release!</p>
<p>And yet, and yet... during the exciting KCLS migration live-blog, Lori
Ayre felt it necessary <a class="reference external" href="http://rscel.evergreen-ils.org/node/1526">to
write</a>:</p>
<blockquote>
<p>The answer is that much of it is already in trunk, and if it isn't
there now, it will be very soon. None of this work is being held
back. There is no KCLS fork. This is all Evergreen and anyone who
knows how to download from trunk will be able to get at this code in
very short order.</p>
</blockquote>
<p>Well, I know that everyone involved with the KCLS enhancements are good
people, and that it is certainly their intention to make any of the
enhancements available, and there is no intention to fork Evergreen. I
know! Ironically enough, however, due to the prior actions of
proprietary companies such as Relais' "<a class="reference external" href="http://www.relais-intl.com/relais/home/Relais%20Open%20Source%20Update%20Feb%208_2010.pdf">announce that we will open
source our ILL product in 2008; freeze the market; announce in 2010 that
maybe we'll have something by the end of
2010</a>"
strategy, the broader library community has become more skeptical and
susceptible to disinformation and FUD. I can't imagine who would want to
sow discontent among the community of a rapidly maturing ILS project,
other than perhaps proprietary competitors who have forgotten how to
compete on the merits of their product rather than
<a class="reference external" href="http://thebookpile.wordpress.com/2010/04/01/sirsidynix-opensource-paper-pdf/">negative</a>
<a class="reference external" href="http://www.galecia.com/sirsidynix-and-the-fud-factor/">marketing</a>.
(Just a guess, mind you!)</p>
<p>Still: until the code for any remaining enhancements is available under
an open source license, the possibility remains that those whispering,
Saruman-like voices could be right. My
suggested remedy, and the easiest way to dispel those concerns, now and
in the future for any project (Evergreen or otherwise), is to simply
develop in the open:</p>
<ul class="simple">
<li>Create a public repository - SVN (<a class="reference external" href="http://svn.open-ils.org/trac/ILS-Contrib">Evergreen
contributions</a>), or
Bazaar (<a class="reference external" href="https://code.launchpad.net/evergreen">Evergreen
LaunchPad</a>), or git
(<a class="reference external" href="http://gitorious.org">Gitorious</a>), or what have you. Put a
README in the top directory of the repository specifying that the
contents are licensed under the "GPL v2 or later" or GPL-compatible
license.</li>
<li>Announce the repository on the Evergreen development mailing list. If
you tuck your repository in an obscure location and don't tell
anybody about it, it might technically be open, but that's not really
the spirit of openness. You're also depriving your effort of possible
collaborators, and possibly duplicating effort if somebody else is
working on the same feature.</li>
<li>Watch the rumours disappear and the fame, glory, and accolades roll
in. (Oh, and don't forget to invite us to integrate the fruit of your
labour into the core of Evergreen!)</li>
</ul>
<p>Sure, there might be some material that you don't want to share:
trademarked institutional logos or the like. But the bulk of what we
collectively create should be able to be openly shared, not just when
things are perfectly baked, but all the way through the process. Release
early, release often, and keep the spooky whisperers at bay.</p>
Responding to the Evergreen "research" article in Information Technology and Libraries2010-09-20T02:25:00-04:002010-09-20T02:25:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-09-20:/responding-to-the-evergreen-research-article-in-information-technology-and-libraries.html<p><strong>Update 2010-09-28</strong>: Fixed link</p>
<hr class="docutils" />
<p>The <a class="reference external" href="http://www.ala.org/ala/mgrps/divs/lita/ital/italinformation.cfm">home page for <em>Information Technology and
Libraries</em></a>
states:</p>
<blockquote>
<p><strong>Information Technology and Libraries</strong> (<em>ITAL</em>) (ISSN
0730-9295) is a refereed journal published quarterly by the Library
and Information Technology Association (LITA), a division of the
American Library Association.</p>
</blockquote>
<p>The September 2010 issue of ITAL contained an article by Sharon Q. Yang
and Melissa A. Hofmann called "The Next Generation Library Catalog: A
Comparative Study of the OPACs of Koha, Evergreen, and Voyager". As an
Evergreen developer, I wonder just how much refereeing happened before
this article was published. Certainly I am biased, but there are a
number of problems with the study from my perspective:</p>
<ol class="arabic">
<li><p class="first">The article stated "The latest releases at the time of the study was
Koha 3.0, Evergreen 2.0, WebVoyage 7.1." Grammatical problems with
that sentence aside, the first alpha release of Evergreen 2.0 was
created on August 23, 2010. For an article published in September
2010, I find it highly unlikely that the authors were able to find
any running instances of this version of Evergreen on which to base
their information. Which leads to a problem with the methodology:</p>
</li>
<li><p class="first">The stated methodology in the article was</p>
<blockquote>
<p>The OPACs used in this study included three examples from each
system. They may have been product demos and live catalogs
randomly chosen from the user list on the product websites. ...
In case of discrepancies between product descriptions and
reality, we gave precedence to reality over claims. In other
words, even if the product documentation lists and describes a
feature, this study does not include it if the feature is not in
action either in the demo or live catalogs.</p>
</blockquote>
<p>This sounds like a thorough, pragmatic approach. But the product
versions associated with each of the chosen examples are not listed.
So while the article mentions the latest releases of each product,
the actual reported experience might be based on an outdated version
of the product. In the case of Evergreen, one of the chosen examples
is two major versions behind the actual current stable release of
1.6.1.2, and another of the chosen examples is one major version
behind the current stable release. In addition, one of the desired
features of modern OPACs is customizability: not just the ability to
turn features on and off, but also the ability to change the user
experience significantly as a small matter of programming. Depending
on which example OPACs were chosen for each system, the features the
authors were looking for might not have been turned on or exposed.</p>
</li>
<li><p class="first">On the "Single Point of Entry for All Library Information" feature,
the authors state:</p>
<blockquote>
<p>While WebVoyage and Evergreen only display journal-holdings
information in their OPACs, Koha links journal titles from its
catalog to ProQuest’s Serials Solutions, thus leading users to
full- text journals in the electronic databases.</p>
</blockquote>
<p>As far as I can tell, however, this is not a special integration
feature of Koha; it appears to just be the use of an 856 with a URL
that points to a link resolver for a lookup of a given ISSN. While it
is a reasonable cataloguing practice, any other library system should
be capable of that; Evergreen certainly is. However, check out this
link for an example of how one can <a class="reference external" href="http://ur1.ca/1oj6f">make an OPAC work harder by
bringing resolver results right into the OPAC
display</a>. I built an Evergreen service,
called <tt class="docutils literal"><span class="pre">open-ils.resolver</span></tt>, for caching resolver requests for ISSNs
and used that service as the basis of <a class="reference external" href="http://www.accessola.com/superconference2010/showSession.php?lsession=8&usession=8">a developer tutorial for
writing Evergreen
services</a>.
The idea isn't new; <a class="reference external" href="http://bibwild.wordpress.com/2007/03/04/online-coveragelink-info-in-your-opac-via-sfx/">Jonathan Rochkind wrote about doing
this</a>
back in 2007. But having a caching server-side implementation freely
available for your library system is relatively novel. We've been
using it since the summer of 2009. If you use Evergreen, then you can
add this feature to your system too; it is written up in the
<a class="reference external" href="http://evergreen-ils.org/~denials/workshop.html">developer
workshop</a> and is
licensed under the GPL v2 or later, but if there's interest I can add
it to Evergreen's core.</p>
</li>
<li><p class="first">On "Enriched Content", the authors found that Evergreen offered only
cover art. Of course, enriched content depends heavily on the content
supplier and the chosen item. Since launching in 2006, Evergreen has
provided enriched content such as cover art, abstracts, author notes,
reviews, and tables of contents from Syndetic (requiring a Syndetic
subscription, of course). In addition, Evergreen has offered Google
Books integration in the form of partial & full previews (if
available) inline in the detail page since Evergreen 1.6.0.0, thanks
to the initial efforts of Alexander O'Neill at the University of
Prince Edward Island. And as of Evergreen 2.0, the default content
provider for cover art and tables of content will be
<a class="reference external" href="http://openlibrary.org/">OpenLibrary</a>. Here's <a class="reference external" href="http://ur1.ca/1ojdy">an
example</a> from our catalogue that brings in
enriched content including cover art and a book review from Syndetic
and a Google Preview.</p>
</li>
<li><p class="first">On "RSS Feeds", the authors make the bold statement: "Koha provides
RSS feeds, while Evergreen and WebVoyage do not". In the case of
Evergreen, that's a laughable statement, because significant parts of
the OPAC are built on RSS feeds. For example, in any Evergreen
system, click on the "Basic catalogue" link and you'll find that it
is nothing more than an RSS feed with a simple search form. If you
are using Internet Explorer or Firefox on an Evergreen site, you
might notice the search source selector widget is highlighted; that's
because Evergreen is an OpenSearch provider, so you can easily add an
Evergreen site to your browser as a search source. The OpenSearch
results, of course, are built on RSS/Atom just like the examples in
the <a class="reference external" href="http://www.opensearch.org/Specifications/OpenSearch/1.1#OpenSearch_response_elements">OpenSearch description
document</a>.
The default format that a user's custom bookbags are exposed in is
also an RSS feed. I suppose the authors didn't find an RSS feed icon
lighting up in the search results in the dynamic Evergreen OPAC and
made the assumption that no RSS feeds were provided. To address this
gap, I have added the one line of JavaScript to the default OPAC skin
that adds the Atom feed link necessary to make the RSS feed icon
light up (see the markup sketch after this list). Not that many humans actually use RSS feeds directly - but
it will help make it easier to find for future feature comparison
articles.</p>
</li>
<li><p class="first">On "Relevancy", while Evergreen does not currently use circulation
data or "popularity" to affect relevancy rankings, I would happily
argue that the out-of-the-box relevancy ranking algorithm is good
enough to keep relevancy as the default sort option, while the
relevancy algorithm of our previous ILS was simply terrible. Combine
that with your ability to <a class="reference external" href="/archives/218-Adjusting-relevancy-rankings-in-Evergreen-1.6,-some-explorations.html">customize the relevancy
algorithm</a>,
and I think an argument could be made that, while "Relevancy has not
worked well in OPACs", it works well in this one.</p>
</li>
</ol>
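<p>To expand on the feed-discovery change mentioned in point 5: a browser's
feed icon lights up when the page advertises a feed via a
<tt class="docutils literal">&lt;link&gt;</tt> element in the document head - markup along these
lines (the feed URL here is hypothetical):</p>
<pre>&lt;link rel="alternate" type="application/atom+xml"
      title="Search results (Atom)"
      href="/opac/extras/opensearch/1.1/-/atom-full?searchTerms=evergreen" /&gt;</pre>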
<p>As the article mentioned the latest release of Evergreen was 2.0, let me
show you a screenshot of the default OPAC in Evergreen 2.0 as of the
upcoming alpha2 release. Notice a few things:</p>
<ul class="simple">
<li>Those facets on the left are much closer to what the world has come
to expect from a faceting interface. You get a narrowing effect on
your current search results, rather than firing off a brand new
search. There are pros and cons to this, but oh well.</li>
<li>Notice that the RSS feed icon is lit up in the URL bar. Yes,
Virginia, Evergreen has RSS feeds for search results, amongst many
other things.</li>
<li>The inline advanced search interface shows that Evergreen 2.0 offers
an <strong>OR</strong> option, and clearly labels the relationships between the
search terms.</li>
<li>The OpenSearch source has been added to the list of Firefox search
sources in the top right box, just by clicking on the icon and
selecting "Add Evergreen catalogue"</li>
</ul>
<div class="serendipity_imageComment_left" style="width: 970px"><div class="serendipity_imageComment_img"><p><img alt="Evergreen 2.0 inline advanced search interface showing AND and OR options" class="serendipity-image-left" src="/uploads/files/Evergreen2_advanced.png" style="width: 970px; height: 709px;" /></p>
</div><div class="serendipity_imageComment_txt"><p>Evergreen 2.0 inline advanced search interface showing AND and OR
options</p>
</div></div>Evergreen on FLOSS Weekly: the aftermath!2010-09-01T02:50:00-04:002010-09-01T02:50:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-09-01:/evergreen-on-floss-weekly-the-aftermath.html<p><strong>Update 2010-09-28</strong>: pedantic XHTML fix</p>
<hr class="docutils" />
<p>The recorded version of the <a class="reference external" href="http://twit.tv/floss132">Evergreen episode of the FLOSS
Weekly</a> show was released over the weekend.
I'm happy to say that Lynn watched it without looking too pained at any
given point, and the Evergreen project has already had several responses
to our plea for assistance so far, particularly on the packaging front,
which is fantastic! Just having one more skilled helping hand makes all
the preparation for and stress about the show worth it.</p>
<p>Several points that amused me about the show as I glanced over Lynn's
shoulder:</p>
<ul class="simple">
<li>In Randal's introduction, he said that I "worked for Coffee|Code",
full-stop. Aside to Leila Wallenius, the University Librarian of
<a class="reference external" href="http://laurentian.ca">Laurentian University</a>: no, there's nothing
I need to tell you, I'm still a full-time employee at the University
and I'm not planning on going anywhere! (That said, <strong>Coffee|Code
Consulting</strong> is a registered sole proprietorship that provides small
blocks of consulting services for Evergreen software in my spare
time).</li>
<li>For the first half of the show, my affiliation was shown as the
(misspelled) <a class="reference external" href="http://coffecode.net">http://coffecode.net</a>. So of course I immediately ran out
and bought that domain.</li>
<li>Co-host Dan Lynch expressed a wish that his own show, <a class="reference external" href="http://linuxoutlaws.com/">Linux
Outlaws</a>, had a guest list like FLOSS
Weekly. Oddly enough, some time ago when the subject of librarians
and their fanatical devotion to open access to information came up on
Linux Outlaws, I had submitted a feedback form on their site saying
(essentially) "hey, if you want to talk to a Linux-loving free
software-developing librarian some time, I'm around..." but I think
that comment went into the ether.</li>
</ul>
<p>If I ever do a video interview like this again, I'm going to try to:</p>
<ul class="simple">
<li>Prop my laptop up on a couple of LCSH books or attach a separate web
cam at a proper height so it doesn't appear that I'm looking down on
the viewer.</li>
<li>Cut down on the "uhm"s and "ahh"s and stare a bit more robotically at
the camera instead of rolling my eyes as I rack my brains to come up
with an answer</li>
<li>Stop rambling and trust the interviewers to take the show in the
direction that their audience will be interested in instead of trying
to jam in points that I think are important or interesting</li>
<li>Ensure that my erstwhile partner in crime has a better Internet
connection</li>
</ul>
Non-stop Evergreen, or "What I'm doing on my summer vacation"2010-08-26T00:40:00-04:002010-08-26T00:40:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-08-26:/non-stop-evergreen-or-what-im-doing-on-my-summer-vacation.html<p>Last week, I started my summer vacation with a weekend at a friend's
cottage. By Tuesday I was deeply engrossed in some Evergreen enhancement
work for the <a class="reference external" href="http://www.iisg.nl/">International Institute of Social
History</a>. I'm building an authorities management
user interface that properly exposes Evergreen's powerful authority
support in the 2.0 …</p><p>Last week, I started my summer vacation with a weekend at a friend's
cottage. By Tuesday I was deeply engrossed in some Evergreen enhancement
work for the <a class="reference external" href="http://www.iisg.nl/">International Institute of Social
History</a>. I'm building an authorities management
user interface that properly exposes Evergreen's powerful authority
support in the 2.0 release: browsing authority lists, editing
authorities and having the updates ripple through to the bibliographic
records with controlled fields, merging and deleting authorities...
here's a screenshot of the interface in progress:
<a class="reference external" href="/uploads/pics/Authority_record_list.png">|image0|</a>.
The numbers represent the number of bibliographic records linked to each
authority record. These are still early days, but I think there are some
cataloguers that are going to be pretty excited about this functionality
when they get their hands on it.</p>
<p>This week, I'm on location in the <a class="reference external" href="http://library.upei.ca/">Robertson Library at the University
of Prince Edward Island</a> doing some
Evergreen consulting work for them. The good people at UPEI have put my
family and me up in a nice cottage on the island, so I'm toiling away at
improving Evergreen during the day while my family explores the island.
Melissa Belvadi and Grant Johnson have put together a list of pain
points that they would like me to address that happen to mesh nicely
with general pain points that have come up over the years on the
Evergreen mailing lists. My first priority has been to make working with
spine labels a little less aggravating. I'm happy to say that after a
day and a half, I've been able to teach the spine label editor how to
(<em>gasp</em>) move up and down with the arrow keys and (<em>ooh-ahh</em>) insert
and delete new lines and (<em>w00t</em>) have the spine label defaults come
from library settings that only have to be set once instead of being
individually set by each cataloguer. Oh, and I've added font size, font
weight, and font family to those settings so that you can have 20 pt.
bold Helvetica spine labels if you want them.</p>
<p>All of this code is being committed to Evergreen trunk as I hit
functionality milestones; much of the authority work has made its way
into the Evergreen 2.0 alpha release that was cut on Monday (although
not yet announced officially). On Monday I also cut the OpenSRF
1.6.0-alpha release and uploaded a virtual image built on Debian Squeeze
reflecting the OpenSRF/Evergreen alpha releases to
<a class="reference external" href="http://evergreen-ils.org/~denials/Evergreen_trunk_2010_08_23.zip">http://evergreen-ils.org/~denials/Evergreen_trunk_2010_08_23.zip</a> (note
that it's 500 MB, and does not come with X installed, so it's primarily
aimed at users that are already familiar with Evergreen and just want to
see the new stuff without having to go through the entire install
process).</p>
<p>I did take some time off of Evergreen development this afternoon, as I
was honoured to be one of the two guests on the <a class="reference external" href="http://twit.tv/floss">FLOSS Weekly
podcast</a>. Mike Rylander and I were there to
discuss Evergreen with the hosts, <a class="reference external" href="http://twitter.com/merlyn">Randal
Schwartz</a> and <a class="reference external" href="http://danlynch.org/">Dan
Lynch</a>. Unfortunately for Mike, me, and the
audience, Mike's Skype connection kept dropping and I had to do the bulk
of the talking. Despite missing the contributions from Mike's massive
brain, I'm told that the show went well. So if you're interested in
hearing a bit about Evergreen and why I do what I do, keep an eye open
for the interview at <a class="reference external" href="http://twit.tv/floss132">http://twit.tv/floss132</a> - it should be edited and
online by Friday, August 27th at the latest. I tried not to swear too
often so they wouldn't have to do much editing work - heh.</p>
<p>Finally, somewhere in there I celebrated another birthday. Oh yeah!
Older? Yes! Wiser? Probably not.</p>
Classification scheme-aware call number sorting in Evergreen2010-08-09T01:37:00-04:002010-08-09T01:37:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-08-09:/classification-scheme-aware-call-number-sorting-in-evergreen.html<p>As a librarian who works at a library that primarily uses the <a class="reference external" href="http://www.loc.gov/catdir/cpso/lcco/">Library
of Congress classification
scheme</a>, I have been interested
for <a class="reference external" href="http://svn.open-ils.org/trac/ILS/ticket/51">a long time</a> in
teaching Evergreen to be aware of call number schemes other than Dewey.
The problem, in a nutshell, is that Evergreen simply applies an
alphabetical sort against the uppercased version of the call number
when generating call number browser displays - resulting in LC call
numbers that sort incorrectly, like:</p>
<ul class="simple">
<li>K 215 .E53 W37 1997</li>
<li>K 22 .U748 v.18</li>
</ul>
<p>When the subject recently came up on the <a class="reference external" href="http://article.gmane.org/gmane.education.libraries.open-ils.general/2891">open-ils-general mailing
list</a>,
I decided to follow up with some code. So, <a class="reference external" href="http://svn.open-ils.org/trac/ILS/changeset/17130">as of this
weekend</a>, Evergreen
trunk now has a generalized infrastructure for generating sort keys for
call numbers. The broad strokes of the current implementation are:</p>
<ul class="simple">
<li>The classification scheme is set at the level of the call number.</li>
<li>Classification schemes are defined in the
<tt class="docutils literal">asset.call_number_classification</tt> table with a pointer to a
database function to call to generate a normalized sort key for the
given call number.</li>
<li>Three classification schemes are available out of the box:<ul>
<li><em>Generic</em> (the default) - a simple normalization approach that
produces reasonable results in the absence of special rules for
Cutters, etc</li>
<li><em>Dewey (DDC)</em> - a normalization routine taken from the <a class="reference external" href="http://git.koha-community.org/gitweb/?p=koha.git;a=blob;f=C4/ClassSortRoutine/Dewey.pm;h=b4ba92199e7d425e3c4cfdb5082a4f36b486e3c9;hb=HEAD">Koha
C4::ClassSortRoutine::Dewey</a>
Perl module</li>
<li><em>Library of Congress (LC)</em> - a normalization routine that simply
wraps Bill Dueber's excellent
<a class="reference external" href="http://code.google.com/p/library-callnumber-lc/">Library::CallNumber::LC</a>
Perl module</li>
</ul>
</li>
<li>and adding more classification schemes is just a matter of adding
another row to the <tt class="docutils literal">asset.call_number_classification</tt> table and the
appropriate sortkey-generating database function (see the sketch after
this list).</li>
</ul>
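<p>To make that concrete, here's a minimal sketch of registering a new
scheme. Fair warning: I'm assuming a <tt class="docutils literal">name</tt>
column and a <tt class="docutils literal">normalizer</tt> column that holds
the sort key function's name - check the actual columns of
<tt class="docutils literal">asset.call_number_classification</tt> on your
own system first - and the normalizer itself is deliberately naive:</p>
<pre class="literal-block">
-- Sketch only: a trivial sort key function that uppercases the label
-- and collapses runs of whitespace; a real scheme needs smarter rules
CREATE OR REPLACE FUNCTION asset.label_normalizer_example(TEXT) RETURNS TEXT AS $$
    SELECT UPPER(REGEXP_REPLACE(BTRIM($1), E'\\s+', ' ', 'g'));
$$ LANGUAGE SQL IMMUTABLE;

-- Register the scheme; the column names here are assumptions
INSERT INTO asset.call_number_classification (name, normalizer)
    VALUES ('Example scheme', 'asset.label_normalizer_example');
</pre>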
<p>Note that this is the first time, to my knowledge, that Koha code has
been adopted directly by Evergreen. I included attribution for the
copyright holders in both the Generic and Dewey normalization functions.
I wrote the Generic implementation in Evergreen from scratch shortly
after taking a look at Koha's approach, so in some corners my work would
be considered a "derived work". Koha's Dewey normalization function was
(somewhat surprisingly) the only open-source implementation that I could
find for Dewey, so it made perfect sense to me to adopt that for use in
Evergreen. Many thanks to Koha for their use of the GPL v2 or later
licence!</p>
<p>There are still some limitations and low-hanging fruit that I hope to
address in the near future:</p>
<ul class="simple">
<li>Right now you can only manipulate classification schemes via SQL. The
<strong>Holdings Maintenance</strong> dialogue needs to give cataloguers the
ability to set the classification scheme for each call number,
because I'm sure they don't want to drop down to the command line.
This setting should probably be sticky during a given session, so
that if they're processing a cart of government docs, they won't have
to change the scheme from the default to CODOC for each item.</li>
<li>Speaking of defaults, each library needs to be able to define a
default classification scheme - so your consortium can have a Dewey
library and an LC library and a SUDOC library, and their preferences
won't trample each other. This can just be a simple org-unit setting
(see the sketch below this list).</li>
<li>Following on Mike Rylander's
<a class="reference external" href="http://svn.open-ils.org/trac/ILS/ticket/51">advice</a>, the
<tt class="docutils literal">asset.call_number_classification</tt> table should gain a new column
that lists the field/subfield combinations used to find the
appropriate call number (if any) for each scheme in a given
bibliographic record. Then the <strong>Holdings Maintenance</strong> dialogue can
offer the appropriate call number based on the classification scheme.</li>
</ul>
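<p>For the org-unit setting idea, a purely hypothetical sketch - the
setting name is invented for illustration, and
<tt class="docutils literal">actor.org_unit_setting</tt> stores its values
as JSON-encoded text:</p>
<pre class="literal-block">
-- Hypothetical: give org unit 4 a default classification scheme of 2
-- (whatever setting name we actually ship would be decided on-list)
INSERT INTO actor.org_unit_setting (org_unit, name, value)
    VALUES (4, 'cat.default_classification_scheme', '2');
</pre>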
Authorities in Evergreen: an Amsterdam trip report2010-07-19T18:52:00-04:002010-07-19T18:52:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-07-19:/authorities-in-evergreen-an-amsterdam-trip-report.html<p>As part of the informal partnership between the <a class="reference external" href="http://iisg.nl">International Institute
of Social History (IISH)</a> and <a class="reference external" href="http://projectconifer.ca">Project
Conifer</a>, I was pleased to be able to spend
the last two weeks in Amsterdam, working side-by-side with one of the
Institute's developers, Ole Kerpel, on augmenting the support for MARC21
authorities in Evergreen. To prepare for the work session, I had posted
a
<a class="reference external" href="https://blueprints.launchpad.net/evergreen/+spec/respect-my-authorities">blueprint</a>
for the authorities work on the Evergreen Launchpad instance and
circulated <a class="reference external" href="http://evergreen-ils.org/dokuwiki/doku.php?id=dev:proposal:authorities">the list of
requirements</a>
we had been asked to address to the broader Evergreen development
community. We were fortunate to have the attention of Mike Rylander on
the proposal, who not only supplied suggestions for how to implement
some of the items, but also committed significant code contributions to
the effort that greatly assisted our efforts. Here is a summary of the
goals we accomplished in the current development branch of Evergreen
(targeted for the 2.0 release), followed by a list of the outstanding
items and my finger-in-the-air estimate of how much more time it would
take to accomplish each of the tasks:</p>
<div class="section" id="accomplishments">
<h2>Accomplishments</h2>
<ul>
<li><p class="first">Controllable control numbers</p>
<p>While not, strictly speaking, a requirement for authority control in
and of itself, the ability to ensure that the behaviour of the
001/003/035 fields all conformed to the MARC21 specifications was an
important requirement for IISH. They plan to provide external access
to their authority and bibliographic records, so making the official
identifier fields linkable based on the underlying record ID was an
important aspect of the work. We implemented this feature as an
optional database-level trigger to ensure that the control numbers
and control number identifiers are always perfectly in sync with the
internal identifier of the particular system on which the records are
stored.</p>
</li>
<li><p class="first">Links</p>
<p>Where having Mike Rylander participate in your review process pays
off, part one... Before I even arrived in Amsterdam, Mike implemented
a tricky database trigger that tracks the links between a given
bibliographic record and the authority records to which it links. The
links are tracked at the database level, as well as directly in one
or more <tt class="docutils literal">0</tt> subfields in each field that is controlled by an
authority record. Yes, a given field in a bibliographic record can be
controlled by two authority records and it all works. Nice, Mike!</p>
</li>
<li><p class="first">Syncs</p>
<p>Where having Mike Rylander participate in your review process pays
off, part two... Mike also implemented the bulk of the logic for
automatically updating bibliographic records that are linked to a
given authority record when that authority record is modified. Yes,
folks, when you add a death date to an authority record, it will
automatically appear in the corresponding bib records.</p>
</li>
<li><p class="first">Control an uncontrolled set of bibliographic records</p>
<p>You may have dealt with library systems in the past that use some
sort of string matching to implement authority support. As noted
above, Evergreen is not like that. However, this means that many of
us, when migrating to Evergreen, have bibliographic records lacking
the <tt class="docutils literal">0</tt> subfields that are required for full authority support.
Towards that end, I wrote <a class="reference external" href="http://svn.open-ils.org/trac/ILS/browser/trunk/Open-ILS/src/support-scripts/authority_control_fields.pl">a
script</a>
that will walk through a set of bibliographic records, search for
matching authority records for each controllable field in each
bibliographic record, and add the required <tt class="docutils literal">0</tt> subfields to the
bibliographic records. It certainly won't be a fast solution, but you
should only need to do it once, and it worked on the limited test
cases that we had ready at hand.</p>
</li>
<li><p class="first">Teach the MARC editor about authority records</p>
<p>The MARC editor knew all about fixed fields for bibliographic
records, and provided a handy grid for editing those fields. However,
it didn't even know how to recognize authority records, and presented
a fixed field grid that was absolutely meaningless. I spent a chunk
of time laboriously transcribing the fixed field rules from MARC
documentation into the MARC editor and now the MARC editor presents a
reasonable fixed field grid for your editing convenience.</p>
</li>
<li><p class="first">Merge authority records</p>
<p>Something that often happens in a library is that two authority
records are created that identify the same thing. Eventually somebody
notices the problem and wants to merge the authority records
together. Towards this end, I added a database-level stored procedure
that supports the merging of authority records, such that the linked
bibliographic records will automatically point to the winning
authority record (a sketch of the invocation follows this list).</p>
</li>
<li><p class="first">Authority browse interfaces</p>
<p>Where having Mike Rylander participate in your review process pays
off, part the third... Mike also implemented basic browse interfaces
that presents a series of authority records in MARCXML format
matching your requested authority type (author, title, subject,
topic) and the matching substring at the <tt class="docutils literal">/opac/extras/browse</tt> and
<tt class="docutils literal">/opac/extras/startwith</tt> URL entry points. While still raw at this
point, these can provide the basis for classic authority browse
interfaces for those who desperately desire them.</p>
</li>
</ul>
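<p>To give a flavour of the merge support, here's a sketch of an
invocation. I haven't pinned down the procedure's exact name and
signature in this post, so treat both as assumptions and check the
schema on your own system:</p>
<pre class="literal-block">
-- Assumed signature: merge the second (losing) authority record into the
-- first (winning) one; linked bib records follow the winner automatically
SELECT authority.merge_records(1234, 5678);
</pre>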
</div>
<div class="section" id="remaining-to-do-items">
<h2>Remaining to-do items</h2>
<p><em>Note that any estimates are based on how long I think it would take me
to implement, based on my own familiarity with MARC and Evergreen and
all things Perl and JavaScript and PostgreSQL, and provided with the
granularity of no less than one day. Actual implementation times may
vary, of course; if related work items are worked on consecutively, then
it is likely to take less time to achieve than if the items are tackled
sporadically.</em></p>
<ul>
<li><p class="first">Add an authority in the flow</p>
<p>When you're working in the MARC Editor and you find that there is no
match for an entry that you really think should be controlled, IISH
wants to make it easy for a cataloguer to add an authority record for
that entry. We thought that there might be two options that we would
want to expose - a direct "create an authority record from this
field" option that takes no further input, and a "create an authority
record from this field and open it in another MARC editor to let me
tweak it" option. <strong>Estimate</strong>: 2 person days</p>
</li>
<li><p class="first">Highlight controlled fields</p>
<p>This is really a two-part problem. First, for uncontrolled fields, we
want to teach the <strong>Validate</strong> button to offer the kind of automatic
matching that the script does and add the required <tt class="docutils literal">0</tt> subfield.
Second, we want to highlight fields that are explicitly controlled by
authority records with a <tt class="docutils literal">0</tt> subfield differently from fields that
simply match an authority record, but which are not controlled by it.
<strong>Estimate:</strong> 1 person day</p>
</li>
<li><p class="first">Simplify authority record selection</p>
<p>This two-part requirement would mask many of the fields that are
currently offered as options when you right-click on an uncontrolled
subfield to display matching authority records. For example, it is a
little weird to offer a "See from" heading to a cataloguer; we're
trying to avoid adding new records with those headings, right? Heh.
Second, we want to introduce the ability to invoke the authority
browse list in this interface so that the cataloguer can see a given
set of headings in context and select the heading to apply from
there. <strong>Estimate:</strong> 2 person days</p>
</li>
<li><p class="first">Delete authority record</p>
<p>There is currently no cataloguer-friendly way to delete authority
records. We need to expose a list of authority records (probably
reusing that browse list again) and make it possible for cataloguers
to delete an authority record. When that record is deleted, all
bibliographic records that link to it need to have their links
removed - and ideally, the cataloguer would be able to tell how many
bibliographic records link to that authority before the delete takes
place. <strong>Estimate:</strong> 1 person day</p>
</li>
<li><p class="first">Edit and merge authority records</p>
<p>Although the database-level support now exists for merging authority
records, we need to expose a means for cataloguers to select the
authority records that they want to edit or merge. This could just be
a slightly evolved version of the "Delete" interface. <strong>Estimate:</strong> 1
person day</p>
</li>
<li><p class="first">Expose authority records via SRU/Z39.50/crawlable interface</p>
<p>One of the goals of the IISH is to be able to share their authority
records with other institutions. One of the standard methods is SRU +
Z39.50 server support; we should be able to build on the SRU/Z39.50
server support for bibliographic records in Evergreen to provide a
basic solution for authority records. Interest has also been
expressed in having a crawlable implementation that would give the
linked data crowd something to play with. <strong>Estimate:</strong> 2 person days
for an SRU/Z39.50 server, 1 person day for a very basic crawlable
linked-data implementation</p>
</li>
</ul>
<p>In summary - hurray for Mike Rylander for helping us out to such an
extent, and many thanks, again, to IISH for giving me an opportunity to
focus on Evergreen development for an extended period of time, and to
Laurentian University for supporting my efforts. I hope that between Ole
and myself that it will be possible to finish the rest of these work
items prior to the Evergreen 2.0 release. It has been exhilarating to
see how far Evergreen's authority support has come in less than a month, and
given a little more time I suspect that Evergreen's authority support
will be the envy of other library systems.</p>
</div>
Got funds for enhancing Evergreen? Looking for places to spend it?2010-07-02T19:37:00-04:002010-07-02T19:37:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-07-02:/got-funds-for-enhancing-evergreen-looking-for-places-to-spend-it.html<p>As an Evergreen developer, I believe our project has a few significant
gaps that projects like <a class="reference external" href="http://rscel.org">RSCEL</a> might be able to
help address for the overall good of the community by bringing in
outside resources to the project. Or perhaps there are skills within the
community that don't feel like they've been called on yet; when I say
that we lack skills, I'm basing that on the lack of patches and offers
of assistance that I've seen in these areas. I would be delighted to be
proven wrong! Either way, I submit this for the community's
consideration.</p>
<ul class="simple">
<li><strong>3rd party security audit</strong>: Before Conifer adopted Evergreen, I had
hoped that we would be able to fund a security audit of the code by a
trusted and competent 3rd party like
<a class="reference external" href="http://omniti.com/does/web-application-security">OmniTI</a> (from a
previous life, I believe that OmniTI employs some of the best people
in the business, thus the plug - but there are certainly other
options out there). As developers, we try our best to avoid
vulnerabilities, but as the <a class="reference external" href="http://evergreen-ils.org/blog/?p=406">recently disclosed vulnerability in
open-ils.pcrud</a> attests,
we're not experts in security. An audit of the public-facing
interfaces (the catalogue, feeds, etc) would be a great help to the
project. I would expect a prioritized list of areas that need to be
addressed, along with recommendations on how to address those
problems (whether they be cross-site scripting, session fixation
attacks, authentication encryption attacks, etc). Our community's
process (or lack thereof) for reporting and addressing security
vulnerabilities might be an appropriate subject for an audit as well.</li>
<li><strong>Testing framework</strong>: Our project is woefully short on tests, either
human-powered or automated, for determining the state of the code at
any given point in a release cycle. Thus, we have put out release
after release that either won't install cleanly, or won't upgrade
successfully from a previous release. The trunk version of the code
had an error that meant that Evergreen couldn't be compiled; that
problem existed for three weeks before somebody noticed and fixed it.
I'm not pointing fingers, here; if I did that, I wouldn't have enough
fingers to point back at myself for all of the problems I've
introduced that other people have had to fix. Johnathan Nightingale
in <a class="reference external" href="http://blog.johnath.com/2008/07/02/the-most-important-thing/">The Most Important Thing … or How Mozilla Does Security and What
You Can
Steal</a>
provides a great overview of Mozilla's philosophy about and approach
to testing. There is all kinds of goodness in this presentation, but
one of the most interesting points is that "money can be exchanged
for services" -- that is to say, if your existing development team
doesn't have the skills or time to implement a testing
infrastructure, there are companies that do have the ability to put
together a test infrastructure for a given project. Once that
infrastructure is in place, it tends to get extended and used by the
existing development team because it makes their lives easier; they
don't need to manually test a given code path every time in the
future, or deal with regressions that aren't noticed until months in
the future when the changes they were making are no longer fresh in
their minds. It sometimes requires a culture change, though.</li>
<li><strong>Continuous integration</strong>: Hand in hand with a testing framework is
a continuous integration server that provides testing feedback on
every commit to the Evergreen repository for a given set of branches.
Even without a testing framework, it is possible to have a continuous
integration server run through the process of installing all
prerequisites, configuring the code, building and installing the
code, and creating the database schema to at least determine whether
the basics can be accomplished successfully to confirm that a branch
is ready for release. This arguably also goes hand in hand with a
team's process for addressing a security vulnerability: if you have a
continuous integration server that can tell you if a given fix does
not introduce basic build and install errors, then you can get a new
release out with much more confidence that you're not going to be
encouraging your users to jump to a broken package. Note that Equinox
ran a continuous integration server for OpenSRF and Evergreen trunk
<a class="reference external" href="http://markmail.org/message/j6hd6634oimpum6x">for a while</a>, but
that was <a class="reference external" href="http://markmail.org/message/evjm4rdsricwz5qh">killed</a>
and replaced by a <a class="reference external" href="http://markmail.org/message/zfqnec2b6n6wjcsk">call for volunteers to build a new continuous
integration service</a>
(I can't find a more to-the-point call for volunteers, so perhaps it
just hasn't been advertised widely enough - or again, perhaps we lack
the skills in the community to get a standard CI service like
<a class="reference external" href="http://hudson-ci.org/">Hudson</a> running.)</li>
<li><strong>Packaging</strong>: To decrease the difficulty of installing and
configuring Evergreen, we need more investment in packaging Evergreen
and all of its unpackaged dependencies. The idea is that a user
should be able to run "aptitude install evergreen" or "yum install
evergreen" and have the entire system installed and configured, and
then run "aptitude upgrade" or "yum upgrade" to have newer versions
installed. Right now the process is still rather onerous and requires
a great deal of manual effort, although it has improved significantly
since the early days of 2007. Again, this requires a particular set
of skills that the Evergreen community does not appear to possess in
depth: autoconf, automake, APT and RPM packaging - and perhaps some
redesign of elements like skins to make local customizations easier
to incorporate and keep up to date. This would be a natural
complement to a continuous integration service, but much of the
effort could also be done on its own.</li>
</ul>
Random useful Evergreen database queries2010-07-02T16:20:00-04:002010-07-02T16:20:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-07-02:/random-useful-evergreen-database-queries.html<p>Occasionally I drop down to the database level to generate some
reporting information. You could probably get the same information
through the reporter but I like the precision of SQL. Here are a couple
of queries that I've put together recently.</p>
<div class="section" id="list-titles-for-periodicals-published-by-human-kinetics-with-subscriptions-owned-by-library-id-osul">
<h2>List titles for periodicals published by "Human Kinetics" with subscriptions owned by library ID "OSUL"</h2>
<pre class="literal-block">
SELECT rsr.id, rsr.title
  FROM metabib.full_rec mfr
  INNER JOIN metabib.rec_descriptor mrd ON mfr.record = mrd.record
  INNER JOIN asset.call_number acn ON acn.record = mrd.record
  INNER JOIN reporter.super_simple_record rsr ON rsr.id = mrd.record
  INNER JOIN actor.org_unit aou ON aou.id = acn.owning_lib
  WHERE mfr.tag = '260' AND mfr.subfield = 'b'
    AND mfr.value ILIKE 'Human Kinetics%'
    AND mrd.bib_level = 's'
    AND aou.shortname = 'OSUL';
</pre>
</div>
<div class="section" id="strip-out-urls-for-an-online-resource-to-which-we-no-longer-subscribe">
<h2>Strip out URLs for an online resource to which we no longer subscribe</h2>
<p>Occasionally we drop subscriptions to an online resource that we
happened to catalogue with an inline 856 field. Our new approach relies
on just-in-time results from our link resolver to display accurate
access to online resources (or at least consistent representations of
what we have access to!), but our legacy records placed all of that
information directly in the 856 field in the corresponding bibliographic
record. The PostgreSQL
<a class="reference external" href="http://www.postgresql.org/docs/current/static/functions-string.html">regexp_replace()</a>
function lets you use regular expressions to match subsets of the MARC
record and replace it with... well... nothing, in this case.</p>
<p>As we want to subsequently reingest the MARC records, and we're not
running Evergreen trunk yet in which a reingest will automatically be
triggered by an update to the biblio.record_entry table, I first push
the list of affected IDs into a scratch table. This also lets me put
limits on the MARC records that I'm going to touch, so that I don't
inadvertently destroy content in another library's set of bibliographic
records.</p>
<pre class="literal-block">
CREATE TABLE scratchpad.urls_to_delete (id BIGINT);

INSERT INTO scratchpad.urls_to_delete
  SELECT acn.record
    FROM asset.uri au
    INNER JOIN asset.uri_call_number_map aucnm ON au.id = aucnm.uri
    INNER JOIN asset.call_number acn ON aucnm.call_number = acn.id
    INNER JOIN actor.org_unit aou ON acn.owning_lib = aou.id
    WHERE au.href ILIKE '%/search.ebscohost.com/direct.asp?db=rch%'
      AND aou.shortname = 'OSUL';

BEGIN;

UPDATE biblio.record_entry
  SET marc = regexp_replace(
    marc,
    E'<datafield tag="856" ind1="4" ind2="0"><subfield code="z">Available online from Ebsco.*?search.ebscohost.com/direct.asp\\?db=rch.*?</datafield>',
    ''
  )
  WHERE id IN (SELECT id FROM scratchpad.urls_to_delete);
</pre>
<p>Note that the UPDATE statement is preceded by a BEGIN statement so that
we can check our results and issue a ROLLBACK if we inadvertently
changed too much, or created mangled records. Once you check your work
with a SELECT statement or two, you can issue a COMMIT statement to make
the changes take effect.</p>
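<p>For example, the pre-commit check can be as simple as eyeballing a few
of the affected records (a sketch - pick whatever fields help you judge):</p>
<pre class="literal-block">
-- Spot-check a handful of the records we just touched
SELECT id, marc FROM biblio.record_entry
    WHERE id IN (SELECT id FROM scratchpad.urls_to_delete) LIMIT 5;

COMMIT; -- or ROLLBACK; if the 856 surgery went wrong
</pre>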
</div>
OpenSRF article in code4lib Journal has been published2010-06-22T18:44:00-04:002010-06-22T18:44:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-06-22:/opensrf-article-in-code4lib-journal-has-been-published.html<p>Many of you have undoubtedly seen previous drafts of <a class="reference external" href="http://journal.code4lib.org/articles/3284">this
article</a> as I worked on it
over the past three or four months, but I'm pleased to say that the
<em>Easing Gently into OpenSRF</em> article has now been officially published
by the <a class="reference external" href="http://journal.code4lib.org">code4lib Journal</a>. The goal of
the article is to introduce the OpenSRF infrastructure for building
applications on a scale-out architecture -- which is a high-faluting
mouthful -- using a 10-line Perl module that implements a standalone
OpenSRF service as the entry point. Along the way, the article covers a
little bit of the Evergreen-specific functionality that is built on top
of OpenSRF; hopefully enough to act as a teaser for follow-on articles
in the future. My naked desire is to get more development talent to join
us at the OpenSRF + Evergreen tables. The buffet is rich and the food
(and available tasks) are plentiful!</p>
<p>I would be remiss if I did not profusely thank my editors, <a class="reference external" href="http://bibwild.wordpress.com/">Jonathan
Rochkind</a> and <a class="reference external" href="http://rc98.net/">Gabriel
Farrell</a>, for their probing questions, requests for
more content and examples, and suggestions. They helped shape a much
more comprehensive and useful article than I would have produced on my
own.</p>
Building more informative record displays in Evergreen with BibTemplate2010-04-23T11:14:00-04:002010-04-23T11:14:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-04-23:/building-more-informative-record-displays-in-evergreen-with-bibtemplate.html<p><strong>Update: 2011-04-24</strong> Just noticed the link to the Laurentian detail
file was broken - that's what I get for posting early in the morning
under the influence of a cold, eh? All fixed up now, though.</p>
<p>This is a quick link to the updated version of the presentation
<a class="reference external" href="/uploads/talks/2010/BibTemplate_EG2010.odp">(OpenOffice.org)</a>
<a class="reference external" href="/uploads/talks/2010/BibTemplate_EG2010.pdf">(PDF)</a>
I'll be giving in a few hours (no, for real this time!).</p>
<p>Also of interest is the <a class="reference external" href="http://svn.open-ils.org/trac/ILS-Contrib/browser/conifer/branches/rel_1_6_1/web/opac/skin/lul/xml/rdetail/rdetail_summary.xml">customized
rdetail_summary.xml</a>
file used in the Laurentian University catalogue - which, with one minor
change to the ISSN display (you don't really want to be displaying
electronic holdings for Laurentian in your catalogue, do you?) should be
a drop-in replacement for any Evergreen 1.6 site.</p>
Evergreen self-serve password reset interface coming in 1.6.1.02010-04-22T19:28:00-04:002010-04-22T19:28:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-04-22:/evergreen-self-serve-password-reset-interface-coming-in-1610.html<p><strong>Update: 2010-04-22 16:24:56</strong>: Evidently, "in a few minutes" means
tomorrow morning... avid Coffee|Code readers get the early scoop.</p>
<p>I'm going to give a lightning talk in a few minutes about the self-serve
password reset mechanism that I added to Evergreen last month, that
should see the light of day in the Evergreen 1.6.1.0 release in May
2010. Here's the presentation in <a class="reference external" href="/uploads/talks/2010/PasswordResetLightningTalk.odp">OpenOffice.org Impress
format</a></p>
Setting up secure self-check connections using SIP tunneled through SSH2010-04-16T19:17:00-04:002010-04-16T19:17:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-04-16:/setting-up-secure-self-check-connections-using-sip-tunneled-through-ssh.html<p><strong>Updated 2010-04-27</strong>: Fix corrupted characters introduced by copying
from my GroupWise client. Thanks to Joe Atzberger for pointing that out.</p>
<p>I set up a secure SIP connection from our self-check machine to our
Evergreen server located about 450km away, and thought I would put
together a quick blog post on how things are working in production with
SIP in <a class="reference external" href="http://projectconifer.ca">Conifer</a>. It seems a lot of sites
run SIP without a secured connection, based on how our self-check sales
rep and technical support person talked to me on the phone as though
they were talking to someone with two heads when I mentioned my concerns
about security - and they had no advice to offer on setting up an
encrypted connection. So I guess the subject doesn't come up too often.</p>
<p>That doesn't excuse us as proper systems librarians from protecting as
much patron information from exposure as possible. So here's how we do
things at <a class="reference external" href="http://laurentian.ca">Laurentian University</a> - some
hostnames / IP addresses changed to protect the innocent:</p>
<ol class="arabic">
<li><p class="first">The SIP server runs on one of our Evergreen server boxes; let's call
it carbon.example.com. carbon itself has no direct access to or from
the Internet.</p>
</li>
<li><p class="first">carbon has been set up with an iptables rule allowing access via port
6001 from starburst.example.com. starburst lives out in the
demilitarized zone of our ISP.</p>
</li>
<li><p class="first">starburst has been set up to allow access via port 22 from two
specific addresses at our library - no VPN connection required. We're
keeping this as locked down as possible, hence the source IP address
restriction. We opted for no VPN connection because most VPN clients
require manual steps to authenticate, and we need the self-check to
make the connection automatically when it boots up. Don't worry,
we'll get to the encryption part.</p>
</li>
<li><p class="first">From the self-check machine, we set up port-forwarding of carbon:6001
to localhost:6001 via the sipuser user on starburst. I have set up a
hostname called "sip.example.com" that points at
starburst.example.com; our ISP sysadmin has added a local user on
starburst named "sipuser". We have then set up the SSH
authorized_keys file so that SSH logins can't actually log in, and
in fact the only thing they can do is forward port 6001 on carbon.</p>
<p>In /home/sipuser/.ssh/authorized_keys, each entry should therefore
begin with:</p>
<pre class="literal-block">
command="/bin/false",no-port-forwarding,no-agent-forwarding,permitopen="carbon:6001" <key-type> <key> <name>
</pre>
<ol class="arabic">
<li><p class="first">On the self-check machine, I used ssh-keygen to generate an SSH
key and then appended the public key to
/home/sipuser/.ssh/authorized_keys on starburst to enable logins
without using the UNIX password.</p>
</li>
<li><p class="first">On the self-check machine, the SSH command looks like:</p>
</p><pre class="literal-block">
ssh -f -N -L 6001:carbon.example.com:6001 sipuser@sip.example.com
</pre>
</li>
<li><p class="first">Our self-check machine is running Windows Vista inside, so I've
actually implemented it using Cygwin's "run" command in a shortcut
and dropped it into the user's Start folder so that it
automatically sets up the connection at startup time. The shortcut
command is:</p>
</p><pre class="literal-block">
C:\cygwin\bin\run.exe -p /bin ssh -f -N -L 6001:carbon.example.com:6001 sipuser@sip.example.com
</pre>
</li>
</ol>
</li>
<li><p class="first">We're using SIP2 over raw sockets to communicate. We found that we
had to supply the SIP username and password in the 3M self-check
software. Apparently authentication is unnecessary for Unicorn's SIP
implementation, and also apparently no library has ever been
concerned about SIP2 being a clear-text protocol before!</p>
</li>
<li><p class="first">And all of that has worked exactly once so far, starting from a cold
boot. I'm going to be giving it a bunch of tests tomorrow, but I'm
very excited to have an end-to-end encrypted connection working out
of the box.</p>
</li>
</ol>
<p>Well - that was the substance of the email I wrote four months ago.
Since then, the self-check has been turned off every night and has
connected flawlessly every morning - with the exception of one weekend
when we brought down Evergreen for system maintenance and someone
<em>cough our vendor cough</em> forgot to start the SIP server again. I'm
happy with the results, and it's really not <em>that</em> complicated. If your
library uses self-check machines and runs SIP over the network in
clear-text, isn't it time your library beefed up its security?</p>
Adjusting relevancy rankings in Evergreen 1.6, some explorations2010-03-17T21:13:00-04:002010-03-17T21:13:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-03-17:/adjusting-relevancy-rankings-in-evergreen-16-some-explorations.html<p><strong>Update: 2010-03-18</strong> - I just realized that now that I have a separate
keyword index for titles, I can assign a stronger boost in general to
keywords that appear in the title for a keyword search; see <a class="reference external" href="#updated_index_weight">update:
index weight</a> below.</p>
<p>One of my colleagues just asked me:</p>
<blockquote>
<p>So, how does relevancy ranking work in Evergreen, anyway?</p>
</blockquote>
<p>I've been poking around in the area recently, as one of our users
complained about the relevance of some results with a basic keyword
search, so I thought I would throw my thoughts out there. It might give
other people a good jumping off point, and it provides a bit more of an
answer to questions <a class="reference external" href="http://markmail.org/message/t32c5zwkzp2sicgg">like
these</a> on the Evergreen
mailing lists. There are a number of factors, but cover density plays a
significant role - how often the terms you're looking for appear within
the target index, where index = keyword, author, title, subject, or
series (at least, those are the indexes that Evergreen supplies you with
out of the box). Then there are a number of tweakable boosts that appear
in the <tt class="docutils literal">search.relevance_ranking</tt> table:</p>
<ul class="simple">
<li><tt class="docutils literal">full_match</tt>: for an exact match of the terms you're looking for,
from beginning to end, in the target index</li>
<li><tt class="docutils literal">first_word</tt>: for a match of the first search term with the first
term in the target index</li>
<li><tt class="docutils literal">word_order</tt>: for a match between the order of the search terms and
the order of the terms in the target index</li>
</ul>
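<p>If you want to see which bumps are live on your system before tweaking
anything, a quick query does it (a sketch; the columns match the
<tt class="docutils literal">search.relevance_adjustment</tt> inserts shown
later in this post):</p>
<pre class="literal-block">
-- List the active relevancy bumps and their multipliers
SELECT field, bump_type, multiplier
    FROM search.relevance_adjustment
    WHERE active IS TRUE;
</pre>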
<p>The problem with searching the out of the box "keyword" index is that
there's no way of boosting the ranking for terms appearing in, say, the
title or subject, because out of the box there's just one
<strong>keyword|keyword</strong> index. For a keyword search, you can't tell
Evergreen that terms that appear in the title should be more relevant
than terms that appear in something like the content notes. In
comparison, the <strong>title</strong> index is actually composed of a number of
separate indexes: <strong>title|proper</strong>, <strong>title|uniform</strong>,
<strong>title|alternative</strong>, <strong>title|translated</strong>, etc, that collectively
form the title index. You can see this in the <tt class="docutils literal">config.metabib_field</tt>
table.</p>
<p>Given some relatively horrible results for a keyword search like
"programming languages" that returns <em>Regular expression recipes for
Windows developers</em> as the most relevant hit (are you <em>kidding</em> me? No,
it's because "Programming languages" appears in the subjects about 10
times... sigh), on our test server I added a <strong>keyword|title</strong> index
that is identical to the <strong>title|proper</strong> index, and then added some
entries to the <tt class="docutils literal">search.relevance_adjustment</tt> table to modify the
relevancy ranking accordingly, as follows:</p>
<pre class="literal-block">
-- Clone the title|proper index to create a keyword|title index
-- 6 = the title|proper index
INSERT INTO config.metabib_field (field_class, name, xpath, weight, format, search_field, facet_field)
  SELECT 'keyword', 'title', xpath, weight, format, search_field, facet_field
    FROM config.metabib_field WHERE id = 6;

-- Populate the keyword|title index with a set of index entries cloned
-- from the metabib.title_field_entry table
-- 6 = the title|proper index
INSERT INTO metabib.keyword_field_entry (source, field, value)
  SELECT source, 17, value
    FROM metabib.title_field_entry WHERE field = 6;

-- Bump the relevance when the first search term appears first in the
-- title in a keyword search
-- 17 = our new keyword|title index
INSERT INTO search.relevance_adjustment (active, field, bump_type, multiplier)
  VALUES (true, 17, 'first_word', 5);
</pre>
<p>It feels dirty, because we're creating such a massively duplicated set
of rows. But it works... at least the <tt class="docutils literal">first_word</tt> relevance
adjustment works. When I tried using a multiplier of 1000 for the
<tt class="docutils literal">word_order</tt> relevance adjustment, it did not affect the search
results in the least. Perhaps there's a bug there?</p>
<p>In any case, by combining some of the findings of this post with my
previous post on <a class="reference external" href="/archives/217-More-granular-identifier-indexes-for-your-Evergreen-SRU-Z39.50-servers.html">adding more granular
indexes</a>,
perhaps this will help people get deeper into customizing the search
experience for their Evergreen installations.</p>
<p><strong>Update: adjusting search weight of terms in title in
general</strong>: So, now that we have the <tt class="docutils literal">keyword|title</tt> index, we can
boost the relevancy ranking for records in which the search terms appear
in the <tt class="docutils literal">keyword|title</tt> index rather than the general
<tt class="docutils literal">keyword|keyword</tt> index. Here's how to shake things up:</p>
<pre class="literal-block">
-- Boost the relevance for search terms appearing in the title in general
-- 17 = our new keyword|title index
UPDATE config.metabib_field SET weight = 10 WHERE id = 17;
</pre>
<p>Some quick testing suggests that a weight of 10 works reasonably well...
but that is obviously going to be subject to further testing and
tweaking. But hey: we have the ability to tweak now! Yay!</p>
More granular identifier indexes for your Evergreen SRU / Z39.50 servers2010-03-10T04:27:00-05:002010-03-10T04:27:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-03-10:/more-granular-identifier-indexes-for-your-evergreen-sru-z3950-servers.html<p>In June of 2009 I was
<a class="reference external" href="/archives/194-SFX-target-parser-for-Evergreen-and-some-thoughts-about-searching-identifiers.html">moaning</a>
about how “Evergreen, by default, has no identifier index for limiting
searches by ISBN / ISSN / LCCN / OCLCnum” and that “if [fixing this
problem] requires work from me, it will probably be 2010 before any of
it happens”. Due to some of the tools our consortium relies on, we
really needed a solution for identifier searches in Z39.50 that was
better than just a general keyword search: we were returning too many
false positives that caused extra work and frustration for everyone.</p>
<p>Well, here it is, 2010, and as of today Conifer's Evergreen server now
has a very handy identifier index. Most of the required pieces were
already there, in one form or another, but they all needed to be brought
together. This blog post is going to try to do that (and serve as
documentation for my ever-decaying brain, too). At the time of this
post, we're running a 1.6.0.4-ish Evergreen system; you'll need to be
running 1.6.0.4 to get ISSN searching to work properly, too.</p>
<p>First, we need to create the identifier index. Evergreen comes with the
following indexes out of the box:</p>
<ul class="simple">
<li><tt class="docutils literal">author</tt></li>
<li><tt class="docutils literal">title</tt></li>
<li><tt class="docutils literal">series</tt></li>
<li><tt class="docutils literal">subject</tt></li>
<li><tt class="docutils literal">keyword</tt></li>
</ul>
<p>Pretty standard. With the exception of <tt class="docutils literal">keyword</tt>, each of these
indexes is composed of more granular indexes; for example, the <tt class="docutils literal">title</tt>
index is composed of the following specific indexes, with the XML format
that the MARCXML is converted to and then the XPath expression that
extracts the text from the pertinent XML format:</p>
<ul class="simple">
<li><tt class="docutils literal">abbreviated</tt> - MODS32 -
<tt class="docutils literal">//mods32:mods/mods32:titleInfo[mods32:title and <span class="pre">(@type='abbreviated')]</span></tt></li>
<li><tt class="docutils literal">translated</tt> - MODS32 -
<tt class="docutils literal">//mods32:mods/mods32:titleInfo[mods32:title and <span class="pre">(@type='translated')]</span></tt></li>
<li><tt class="docutils literal">alternative</tt> - MODS32 -
<tt class="docutils literal">//mods32:mods/mods32:titleInfo[mods32:title and <span class="pre">(@type='alternative')]</span></tt></li>
<li><tt class="docutils literal">uniform</tt> - MODS32 -
<tt class="docutils literal">//mods32:mods/mods32:titleInfo[mods32:title and <span class="pre">(@type='uniform')]</span></tt></li>
<li><tt class="docutils literal">proper</tt> - MODS32 -
<tt class="docutils literal">//mods32:mods/mods32:titleInfo[mods32:title and <span class="pre">(@type='proper')]</span></tt></li>
</ul>
<p><strong>Aside</strong>: You can search against these more granular indexes in the
Evergreen OPAC, by the way, by appending the granular index name to the
index class name with a <tt class="docutils literal">|</tt> as a delimiter. For example, a search
query of <tt class="docutils literal">title|uniform: canada</tt> will search only the uniform titles
for the term "canada". Okay, sorry for that detour, but I bet you
weren't aware of that - we haven't done a good job of exposing, in the
OPAC interface, some of the magic that has been in Evergreen for a long
time.</p>
<p>Back to understanding the configuration - as you can see above, the
conversion to <a class="reference external" href="http://www.loc.gov/standards/mods/">MODS</a> does the
heavy lifting in pulling out the fields of interest to us from the
MARCXML. The full set of indexed fields and their definitions is visible
in the database via the query:</p>
<pre class="literal-block">
SELECT * FROM config.metabib_field;
</pre>
<p>For our purposes, we're interested in pulling the raw 010 (LCCN), 020
(ISBN), and 022 (ISSN) <tt class="docutils literal">a</tt> subfields directly from the MARCXML source.
Our first step is to add an entry to the <tt class="docutils literal">config.metabib_field</tt> table
defining our new index. We'll create a new granular index under the
"keyword" index class and call it "identifier", because that's what it
is, right? That's as easy as:</p>
<pre class="literal-block">
INSERT INTO config.metabib_field (field_class, name, xpath, weight, format, search_field, facet_field)
  VALUES ('keyword', 'identifier',
    '//marcxml:datafield[@tag="010" or @tag="020" or @tag="022"]/marcxml:subfield[@code="a"]',
    1, 'marcxml', true, false);
</pre>
<p>Next, we need to restart the <tt class="docutils literal"><span class="pre">open-ils.storage</span></tt> and
<tt class="docutils literal"><span class="pre">open-ils.ingest</span></tt> services to make them aware of this new entry. Go
ahead, I'll wait while you run <tt class="docutils literal">osrf_ctl.sh <span class="pre">-a</span> restart_perl</tt> or use
<tt class="docutils literal"><span class="pre">opensrf-perl.pl</span></tt> to restart the services individually. Done? Good.</p>
<p>We have to make up for lost time, now, as all of the bibliographic
records in your system didn't have this definition in place when they
were first ingested. The easiest thing to do is to just pull the
pertinent data directly from the <tt class="docutils literal">metabib.full_rec</tt> view (which is a
shredded version of the source MARCXML from your bibliographic records,
with one tag/subfield value per row). Ergo:</p>
<pre class="literal-block">
-- Get the ID from the row that you just inserted for the new index;
-- we'll use this in the INSERT statement
SELECT id FROM config.metabib_field
  WHERE field_class = 'keyword' AND name = 'identifier';

-- Let's say the ID was 18; we'll use that to identify the index
INSERT INTO metabib.keyword_field_entry (field, source, value)
  SELECT 18, record, agg_text(value)
    FROM metabib.full_rec
    WHERE tag IN ('010', '020', '022') AND subfield = 'a'
    GROUP BY 1, 2;
</pre>
<p>All right! Now you can run some test searches in the OPAC for ISSNs,
ISBNs, and LCCNs in your OPAC using the
<tt class="docutils literal">keyword|identifier: some_identifier</tt> prefix. Cool. So that's part
one, mostly lifted from the <a class="reference external" href="http://evergreen-ils.org/dokuwiki/doku.php?id=scratchpad:random_magic_spells#how_to_include_a_specific_marc_field_with_a_specific_search_class">"magic
spell"</a>
in the Evergreen wiki.</p>
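<p>Before moving on, you may want to confirm that the backfill actually
landed; here's a quick sanity check (assuming, as in the example above,
that your new <tt class="docutils literal">config.metabib_field</tt> row got ID 18):</p>
<pre class="literal-block">
SELECT COUNT(*) FROM metabib.keyword_field_entry WHERE field = 18;
</pre>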
<p>Part two is configuring SRU to use the new identifier index. The bulk of
the Evergreen SRU implementation is contained in the Perl module
OpenILS::WWW::SuperCat (located in your install directory at
<tt class="docutils literal">/openils/lib/perl5/OpenILS/WWW/SuperCat.pm</tt>). Get out your
patch tool or open up the Perl module in a text editor, we're going to
make a few changes. The pertinent diff follows:</p>
<pre class="literal-block">
--- old/OpenILS/WWW/SuperCat.pm 2010-03-09 17:26:20.000000000 -0500
+++ new/OpenILS/WWW/SuperCat.pm 2010-03-10 00:11:58.000000000 -0500
@@ -1410,6 +1410,7 @@
     'bib.titlealternative' => 'title',
     'bib.titleseries' => 'series',
     'eg.series' => 'title',
+    'eg.identifier' => 'keyword|identifier',
 
     # Author/Name class:
     'eg.author' => 'author',
@@ -1438,7 +1439,7 @@
     'srw.serverchoice' => 'keyword',
 
     # Identifiers:
-    'dc.identifier' => 'keyword',
+    'dc.identifier' => 'keyword|identifier',
 
     # Dates:
     'bib.dateissued' => undef,
@@ -1497,6 +1498,7 @@
         subject => ['subject'],
         keyword => ['keyword'],
         series => ['series'],
+        identifier => ['keyword|identifier'],
     },
     dc => {
         title => ['title'],
@@ -1504,7 +1506,7 @@
         contributor => ['author'],
         publisher => ['keyword'],
         subject => ['subject'],
-        identifier => ['keyword'],
+        identifier => ['keyword|identifier'],
         type => [undef],
         format => [undef],
         language => ['lang'],
</pre>
<p>Essentially, we've defined a new qualifier (<tt class="docutils literal">eg.identifier</tt>) and
pointed it and the <tt class="docutils literal">dc.identifier</tt> indexes at the new, more specific
<tt class="docutils literal">keyword|identifier</tt> index. Once the updated file is in place, reload
your Apache configuration (<tt class="docutils literal">/etc/init.d/apache2 reload</tt>) and SRU
requests using those qualifiers will now point at the identifier index.
FABULOUS.</p>
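<p>For example - assuming the standard SRU searchRetrieve GET parameters
and a database named FOOBAR, as in the simple2zoom configuration below -
a request along these lines will now search the identifier index:</p>
<pre class="literal-block">
http://localhost/opac/extras/sru/FOOBAR/holdings?version=1.1&operation=searchRetrieve&query=eg.identifier=some_identifier
</pre>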
<p>Our last step is to teach our simple2zoom-based Z39.50 configuration
about the new index by mapping the corresponding BIB-1 attributes to the
new <tt class="docutils literal">eg.identifier</tt> qualifier, like so:</p>
<pre class="literal-block">
<database name="FOOBAR">
    <zurl>http://localhost/opac/extras/sru/FOOBAR/holdings</zurl>
    <option name="sru">get</option>
    <charset>marc-8</charset>
    <search>
        <querytype>cql</querytype>
        <map use="4"><index>eg.title</index></map>
        <map use="7"><index>eg.identifier</index></map>
        <map use="8"><index>eg.identifier</index></map>
        <map use="9"><index>eg.identifier</index></map>
        <map use="21"><index>eg.subject</index></map>
        <map use="1003"><index>eg.creator</index></map>
        <map use="1018"><index>eg.publisher</index></map>
        <map use="1035"><index>eg.keyword</index></map>
        <map use="1016"><index>eg.keyword</index></map>
    </search>
</database>
</pre>
<p>Kill your simple2zoom processes and restart simple2zoom and you should
be in heaven - farewell, false positive matches! Oh, and about that <a class="reference external" href="/archives/194-SFX-target-parser-for-Evergreen-and-some-thoughts-about-searching-identifiers.html">SFX
target parser for
Evergreen</a>;
now you can remove all of the gimmickry around exact searches and
worrying about ISSNs that contain an 'X' and just point at the
identifier index. For example:</p>
<pre class="literal-block">
if (defined($ISSN)) {
    $searchString .= "keyword|identifier: $ISSN";
} elsif (defined($ISBN)) {
    $ISBN =~ s/-//g; # Most of our ISBNs are normalized to no hyphens
    $searchString .= "keyword|identifier: $ISBN";
}
</pre>
<p>Things still aren't perfect in Evergreen identifier-land: we still need
to do some work to normalize hyphenation of our ISBNs, for example, and
ensure we have 10-digit & 13-digit ISBN equivalents. But we're a lot
closer to perfection now - and with the work that Mike Rylander is doing
in trunk, normalization of that kind should be relatively
straightforward to implement on both the indexing and query-parsing
side.</p>
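<p>As a teaser, the arithmetic for deriving a 13-digit ISBN from a
10-digit ISBN is simple enough to sketch as a PostgreSQL function. This
is just my illustration, not the normalization work underway in trunk:</p>
<pre class="literal-block">
-- Sketch: prefix "978" to the first nine digits of the ISBN-10,
-- then recompute the check digit with the ISBN-13 1-3-1-3... weights
CREATE OR REPLACE FUNCTION isbn10_to_isbn13(isbn10 TEXT) RETURNS TEXT AS $$
DECLARE
    core TEXT;
    total INT := 0;
BEGIN
    core := '978' || substring(regexp_replace(isbn10, '-', '', 'g') from 1 for 9);
    FOR i IN 1..12 LOOP
        total := total + substring(core from i for 1)::int
            * CASE WHEN i % 2 = 1 THEN 1 ELSE 3 END;
    END LOOP;
    RETURN core || ((10 - (total % 10)) % 10)::text;
END
$$ LANGUAGE plpgsql;

-- isbn10_to_isbn13('0-306-40615-2') => 9780306406157
</pre>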
Evergreen 1.6: Z39.50 target servers for academics2010-03-05T02:21:00-05:002010-03-05T02:21:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-03-05:/evergreen-16-z3950-target-servers-for-academics.html<p><strong>UPDATE 2010-03-05</strong> I just backported Warren's patch for sorting
Z39.50 servers to rel_1_6_0 (it counts as a bug fix), so expect to
see it in the Evergreen 1.6.0.4 release. Yay!</p>
<p>In Evergreen 1.6, Z39.50 target server configuration (for
copy-cataloguing targets) moves …</p><p><strong>UPDATE 2010-03-05</strong> I just backported Warren's patch for sorting
Z39.50 servers to rel_1_6_0 (it counts as a bug fix), so expect to
see it in the Evergreen 1.6.0.4 release. Yay!</p>
<p>In Evergreen 1.6, Z39.50 target server configuration (for
copy-cataloguing targets) moves into the database. This makes it pretty
easy for sites to share their Z39.50 target servers with one another.</p>
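<p>Under the hood, each target is now just a row in
<tt class="docutils literal">config.z3950_source</tt> (plus per-target rows in
<tt class="docutils literal">config.z3950_attr</tt>), so adding one by hand looks something like the
following - a sketch using the stock Library of Congress settings, with
the column list trimmed to the essentials:</p>
<pre class="literal-block">
INSERT INTO config.z3950_source (name, label, host, port, db, auth)
    VALUES ('loc', 'Library of Congress', 'z3950.loc.gov', 7090, 'Voyager', FALSE);
</pre>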
<p>I recently added a number of target servers to our configuration, and
thought that other academic Evergreen sites might be interested in our
set (because we're primarily pointing at other academic libraries) -
particularly if they haven't added many of their own yet. You can find a
PostgreSQL dump of our current configuration in the ILS-Contrib
repository at
<a class="reference external" href="http://svn.open-ils.org/trac/ILS-Contrib/browser/conifer/branches/rel_1_6_0/tools/config/config_z3950.sql">conifer/branches/rel_1_6_0/tools/config/config_z3950.sql</a>.</p>
<p>I generated this dump of the data using the following command:</p>
<pre class="literal-block">
pg_dump --data-only --table config.z3950_source --table config.z3950_attr evergreen > config_z3950.sql
</pre>
<p>(where <em>evergreen</em> is the name of the Evergreen database, naturally!).
You should be able to load the data into a clean Evergreen database via
<tt class="docutils literal">psql</tt> inside a transaction as follows:</p>
<pre class="literal-block">
BEGIN;
\i config_z3950.sql
COMMIT;
</pre>
<p>If you already have other Z39.50 servers in your database configuration,
you might need to adjust the ID values in the <tt class="docutils literal">config.z3950_attr</tt>
rows. Just prepending a <tt class="docutils literal">1</tt> to them ought to do the trick, unless you
have masses of Z39.50 servers. In which case, you probably don't need
ours!</p>
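<p>If you'd rather do that adjustment in SQL than in a text editor,
here's a minimal sketch - where <tt class="docutils literal">scratch.z3950_attr</tt> stands in for a
hypothetical staging copy of the dumped rows, not a real Evergreen
table:</p>
<pre class="literal-block">
-- Prepend a "1" to each imported attribute ID so it can't collide
-- with the IDs already in config.z3950_attr
UPDATE scratch.z3950_attr SET id = ('1' || id::text)::bigint;
</pre>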
<p>Oh, one final tip: when you start adding a bunch of Z39.50 target
servers, you'll notice that the order in the <strong>Import from Z39.50</strong>
screen is random; it will drive your cataloguers crazy. Quite some time
ago, <a class="reference external" href="http://thebookpile.wordpress.com/">Warren Layton</a> from Natural
Resources Canada submitted a patch for sorting the servers
alphabetically that has been committed to trunk and the 1.6 branch, but
which hasn't made its way into a 1.6.0 release yet. If, at the time
you're reading this, you're on a 1.6 release but your list isn't sorted,
get <a class="reference external" href="http://svn.open-ils.org/trac/ILS/browser/branches/rel_1_6/Open-ILS/xul/staff_client/server/cat/z3950.js">the
file</a>
and drop it into <tt class="docutils literal">/openils/var/web/xul/server/cat/z3950.js</tt> - your
cataloguers will thank you. You, in turn, can thank Warren.</p>
Fun with Evergreen and SQL: representative record samples2010-03-04T04:35:00-05:002010-03-04T04:35:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-03-04:/fun-with-evergreen-and-sql-representative-record-samples.html<p>Let's pretend your national library asked you to submit a set of records
with holdings representing all of the various formats in your library
system. Let's also pretend that you're really lucky and you're running
Evergreen. Here's what you would do to get one example of each
combination of item …</p><p>Let's pretend your national library asked you to submit a set of records
with holdings representing all of the various formats in your library
system. Let's also pretend that you're really lucky and you're running
Evergreen. Here's what you would do to get one example of each
combination of item type, item form, bibliographic level, literary form,
cataloguing form, and video recording format into a scratch table for a
given library (ID = 103) in your system:</p>
<pre class="literal-block">
CREATE TABLE scratchpad.osul_export (record BIGINT);

INSERT INTO scratchpad.osul_export
    SELECT record FROM (
        SELECT DISTINCT ON (mrd.item_type, mrd.item_form, mrd.bib_level,
                mrd.lit_form, mrd.cat_form, mrd.vr_format)
            mrd.record, mrd.item_type, mrd.item_form, mrd.bib_level,
            mrd.lit_form, mrd.cat_form, mrd.vr_format
          FROM biblio.record_entry bre
          INNER JOIN asset.call_number acn ON acn.record = bre.id
          INNER JOIN asset.copy ac ON ac.call_number = acn.id
          INNER JOIN metabib.rec_descriptor mrd ON mrd.record = bre.id
          WHERE bre.deleted IS FALSE
            AND acn.deleted IS FALSE
            AND ac.deleted IS FALSE
            AND acn.owning_lib = 103
          ORDER BY mrd.item_type, mrd.item_form, mrd.bib_level,
            mrd.lit_form, mrd.cat_form, mrd.vr_format
    ) AS formats
    ORDER BY record;
</pre>
<p>And then, because you were asked to provide a total of 2000 records for
this representative sample, you might fill up the remaining 1800 records
as follows:</p>
<pre class="literal-block">
INSERT INTO scratchpad.osul_export
    SELECT bre.id
      FROM biblio.record_entry bre
      INNER JOIN asset.call_number acn ON acn.record = bre.id
      INNER JOIN asset.copy ac ON ac.call_number = acn.id
      INNER JOIN reporter.super_simple_record rsr ON rsr.id = bre.id
      WHERE bre.deleted IS FALSE
        AND acn.deleted IS FALSE
        AND ac.deleted IS FALSE
        AND acn.owning_lib = 103
        AND bre.id NOT IN (
            SELECT record FROM scratchpad.osul_export
        )
        AND substring(bre.id::text from (length(bre.id::text)) for 1)::int = 8
        AND bre.id % 17 = 0
      ORDER BY rsr.author DESC
      LIMIT 1800;
</pre>
<p>... which, of course, gives you the records with a record ID ending in
'8' and (to whittle it down further) records where record ID <em>modulo</em> 17
is 0 - and finally, just the first 1800 records ordered by author name
in descending order.</p>
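<p>If you want a feel for just how selective those two predicates are in
combination, you can test them against a stand-in sequence (using
<tt class="docutils literal">generate_series()</tt> here in place of real record IDs):</p>
<pre class="literal-block">
SELECT id FROM generate_series(1, 2000) AS id
  WHERE substring(id::text from length(id::text) for 1)::int = 8
    AND id % 17 = 0;
-- 68, 238, 408, 578, ... : one qualifying ID in every 170
</pre>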
<p>All of this will give you 2000 record IDs in <tt class="docutils literal">scratchpad.osul_export</tt>
that you can then extract into a text file and feed into Evergreen's
<tt class="docutils literal"><span class="pre">Open-ILS/src/support-scripts/marc_export</span></tt> script to dump the MARC
records with holdings in the 852 field from your system. Beautiful, eh?</p>
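<p>In case you're wondering how to get those IDs out into the text file,
psql's <tt class="docutils literal">\copy</tt> is one way to do it (the output path here is just an
example):</p>
<pre class="literal-block">
\copy (SELECT record FROM scratchpad.osul_export ORDER BY record) TO '/tmp/osul_records.txt'
</pre>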
Wrap-up: Evergreen developer workshop at OLA SuperConference 20102010-03-01T23:48:00-05:002010-03-01T23:48:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-03-01:/wrap-up-evergreen-developer-workshop-at-ola-superconference-2010.html<p>To summarize the results of the Evergreen developer workshop at the OLA
SuperConference, I think things went pretty well. The primary focus this
time was on the nuts and bolts of building a minimal OpenSRF service and
I saw the lights go on in a number of faces as I …</p><p>To summarize the results of the Evergreen developer workshop at the OLA
SuperConference, I think things went pretty well. The primary focus this
time was on the nuts and bolts of building a minimal OpenSRF service and
I saw the lights go on in a number of faces as I broke it down. Things
got a little hand-wavy in the final half-hour when I leapt into the Dojo
JavaScript widgets that have been custom-built for Evergreen interfaces
such as the administration and acquisitions functionality. In
retrospect, the first half of the session deserves its own half-day, and
the second half of the session similarly deserves its own half-day, and
something had to give this time around.</p>
<p>I focused on getting hands-on, and for the most part I think it was a
success - even though I had packaged up a virtual image, we
still ran into some problems getting it running on some laptops. And due
to some communications problems, about half of the participants weren't
ready for a hands-on session (read: no laptop, or a netbook that
couldn't handle a virtual image). I have real hopes that we'll see some
contributions in the next few months from some of the participants,
which would be a <strong>huge</strong> win for Evergreen.</p>
<p>Without any further ado, here are the materials for the session (all of
which are made available to you under a <a class="reference external" href="http://creativecommons.org/licenses/by-sa/2.5/ca/">Creative Commons By
Attribution-Share-Alike Canada 2.5
license</a>):</p>
<ul class="simple">
<li>Slides: <a class="reference external" href="/uploads/talks/2010/OLA_2010_slides.odp">(OpenOffice.org
Impress)</a>
<a class="reference external" href="/uploads/talks/2010/OLA_2010_slides.pdf">(PDF)</a></li>
<li>Workshop tutorial:
<a class="reference external" href="http://evergreen-ils.org/~denials/workshop.html">(HTML)</a>
<a class="reference external" href="http://evergreen-ils.org/~denials/workshop.pdf">(PDF)</a></li>
<li>JavaScript and Perl files:
<a class="reference external" href="/uploads/talks/2010/OLA_2010_files.zip">OLA_2010_files.zip</a></li>
</ul>
OLA SuperConference 2010 - Evergreen developer workshop update2010-02-23T03:31:00-05:002010-02-23T03:31:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-02-23:/ola-superconference-2010-evergreen-developer-workshop-update.html<p>Hey all - if you're coming to the <a class="reference external" href="/archives/211-Evergreen-developer-workshop-at-OLA-SuperConference,-February-24,-2010.html">Evergreen developer workshop at the
OLA SuperConference
2010</a>,
there's one thing you can do to prepare. As this is a hands-on workshop
(how else can you learn!), I'm hoping many or most of you will have
laptops. And ideally, your laptop will have …</p><p>Hey all - if you're coming to the <a class="reference external" href="/archives/211-Evergreen-developer-workshop-at-OLA-SuperConference,-February-24,-2010.html">Evergreen developer workshop at the
OLA SuperConference
2010</a>,
there's one thing you can do to prepare. As this is a hands-on workshop
(how else can you learn!), I'm hoping many or most of you will have
laptops. And ideally, your laptop will have a current version of
<a class="reference external" href="http://www.virtualbox.org/wiki/Downloads">VirtualBox</a> or VMWare
installed on it, as I plan to bring a virtual image for the attendees to
use.</p>
<p>I'm hoping the virtual image will sidestep the configuration hassles
people seem to run into with installing OpenSRF / Evergreen natively and
enable us to just focus on the code and architecture during the limited
time we will have together. *sniff*</p>
Introduction to SQL for Evergreen administrators2010-02-20T10:16:00-05:002010-02-20T10:16:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-02-20:/introduction-to-sql-for-evergreen-administrators.html<p>I've been a bit quiet for the last two weeks, ostensibly because I've
been on vacation. However, much of the time I was preparing to deliver a
two-day introduction to SQL for Evergreen to the good people at
<a class="reference external" href="http://www.biblio.org/">Bibliomation</a>. On Wednesday I flew down to
Middlebury, CT - Bibliomation central - and …</p><p>I've been a bit quiet for the last two weeks, ostensibly because I've
been on vacation. However, much of the time I was preparing to deliver a
two-day introduction to SQL for Evergreen to the good people at
<a class="reference external" href="http://www.biblio.org/">Bibliomation</a>. On Wednesday I flew down to
Middlebury, CT - Bibliomation central - and on Thursday and Friday of
this week, I led nine great people* through the ropes of SQL: from
understanding the basics of how SQL databases operate all the way
through inner and outer joins and set operations. I also walked through a
set of SQL queries I had developed to help Bibliomation with the
recurring reports they need to provide to their member libraries.</p>
<p>Other than an episode of grievous illness on Thursday night that led to
zero food intake and very little sleep on my part, I think things went
well; it was gratifying to see lights go on in people's heads as we
worked through hands-on exercises and tackled the same problem with
different (but valid) approaches, and (with a few minor adjustments) the
canned SQL queries seemed to meet their requirements. The feedback I
received was positive, and by the time I left I had the sense that they
had significantly increased their confidence in their ability to
understand the queries I had written for them and to create their own
queries. The major remaining learning curve is understanding how all of
the pieces of the Evergreen database schema fit together; over the two
days I tried to tie together pieces like the user tables, the library
tables, the circulation and holds tables, and the record / call number /
copy tables to help them find the right tables to meet their needs.</p>
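<p>To give a flavour of the kind of joins involved, here's a sketch in
the spirit of those exercises (not one of the canned Bibliomation
reports): counting circulations per owning library by walking from the
circulation table through copies and call numbers to the org unit:</p>
<pre class="literal-block">
SELECT aou.shortname, COUNT(circ.id) AS circ_count
  FROM action.circulation circ
  INNER JOIN asset.copy ac ON ac.id = circ.target_copy
  INNER JOIN asset.call_number acn ON acn.id = ac.call_number
  INNER JOIN actor.org_unit aou ON aou.id = acn.owning_lib
  GROUP BY aou.shortname
  ORDER BY circ_count DESC;
</pre>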
<p>I am happy to say that Bibliomation agreed to my condition that I be
allowed to release the materials for this workshop under a CC-BY-SA
license, so others can take these materials, adapt or enhance them, and
deliver similar training to other Evergreen libraries (as long as the
attribution remains and the materials are offered under the same
share-alike license). Many thanks to Bibliomation for this contribution
to the community! Without further ado, here are the materials:</p>
<ul class="simple">
<li>Reference documentation (25-ish pages introducing SQL, ending with
the canned SQL queries Bibliomation required):
<a class="reference external" href="http://bzr.coffeecode.net/intro_to_sql/introduction_to_sql.html">(HTML)</a>
<a class="reference external" href="/uploads/files/introduction_to_sql.pdf">(PDF)</a></li>
<li>Presentation: <a class="reference external" href="/uploads/files/SQL_instruction.odp">(OpenOffice.org
Impress)</a>
<a class="reference external" href="/uploads/files/SQL_instruction.pdf">(PDF)</a></li>
</ul>
<p>* Including people like Kate Sheehan, Melissa Lefebvre, and Benjamin
Shum who I previously only knew from the Evergreen mailing lists and
other online presences</p>
Evergreen developer workshop at OLA SuperConference, February 24, 20102010-01-28T20:45:00-05:002010-01-28T20:45:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-01-28:/evergreen-developer-workshop-at-ola-superconference-february-24-2010.html<p>Given the <a class="reference external" href="/archives/210-Conifer-garners-two-awards-from-the-Ontario-Library-Association.html">the
awards</a>
that Project Conifer will be presented with at the OLA SuperConference,
this might be a good opportunity to mention the <a class="reference external" href="http://www.accessola.com/superconference2010/showSession.php?lsession=8&usession=8">Customizing and
Extending Evergreen: a guide for
geeks</a>
workshop that I'll be giving on Wednesday, February 24th. The workshop
description promises:</p>
<blockquote>
Together, we will break OpenSRF …</blockquote><p>Given the <a class="reference external" href="/archives/210-Conifer-garners-two-awards-from-the-Ontario-Library-Association.html">the
awards</a>
that Project Conifer will be presented with at the OLA SuperConference,
this might be a good opportunity to mention the <a class="reference external" href="http://www.accessola.com/superconference2010/showSession.php?lsession=8&usession=8">Customizing and
Extending Evergreen: a guide for
geeks</a>
workshop that I'll be giving on Wednesday, February 24th. The workshop
description promises:</p>
<blockquote>
Together, we will break OpenSRF down into its constituent parts
(JSON, XMPP) and put it back together again in Perl, Python, and
JavaScript so that you can define new services, or integrate
existing services into other applications and websites. You will
learn how PostgreSQL underpins Evergreen's search indices and how to
access and modify any data in the system with permission-based
storage APIs; plus we will build new interfaces with the Dojo
JavaScript framework Evergreen extensions.</blockquote>
<p>That's a hefty agenda for a half-day workshop, but I promise to do my
best to deliver on that promise... <img alt=":-)" class="emoticon" src="/images/smile.png" /></p>
Conifer garners two awards from the Ontario Library Association2010-01-28T20:28:00-05:002010-01-28T20:28:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2010-01-28:/conifer-garners-two-awards-from-the-ontario-library-association.html<p>The Ontario Library Association (OLA) announced its <a class="reference external" href="http://www.accessola3.com/index.php?app=blog&module=display&section=blog&blogid=9&showentry=681">2010 OLA and OLA
Divisional Award
winners</a>
today, and to my great surprise Project Conifer was named the winner of
two awards:</p>
<ol class="arabic simple">
<li>The Ontario College and University Library Association (OCULA)
Special Achievement Award</li>
<li>The Ontario Library Information Technology Association (OLITA) Award
for Technical …</li></ol><p>The Ontario Library Association (OLA) announced its <a class="reference external" href="http://www.accessola3.com/index.php?app=blog&module=display&section=blog&blogid=9&showentry=681">2010 OLA and OLA
Divisional Award
winners</a>
today, and to my great surprise Project Conifer was named the winner of
two awards:</p>
<ol class="arabic simple">
<li>The Ontario College and University Library Association (OCULA)
Special Achievement Award</li>
<li>The Ontario Library Information Technology Association (OLITA) Award
for Technical Innovation</li>
</ol>
<p>All of the libraries in the Project Conifer consortium have been listed
in the award announcement, and for good reason: everyone using the
<a class="reference external" href="http://evergreen-ils.org">Evergreen</a> library system <a class="reference external" href="/archives/191-Conifer-lives-Ontario-launches-a-consortial-academic-library-system-built-on-Evergreen.html">since May
2009</a>
has contributed to the project, be it by bug reports, or suggestions for
enhancement, or sharing approaches to solving problems, or contributing
code. This has been a real team effort, and make no mistake: the road
has been bumpy at times, and there's a <strong>lot</strong> of road left to travel
before we get to our destination. <em>Dan furtively glances at the open
list of requested enhancements on the Conifer ticket system and gets
back to finishing off this blog post...</em> The continuing support of staff
and librarians across the consortium has been critical to keeping things
moving in a very positive direction, and I'm delighted that they're
being recognized for their efforts.</p>
Doing useful things with periodical holdings, part 2: comparing with print holdings in Evergreen2009-11-17T17:05:00-05:002009-11-17T17:05:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2009-11-17:/doing-useful-things-with-periodical-holdings-part-2-comparing-with-print-holdings-in-evergreen.html<div class="section" id="doing-interesting-things-with-evergreen-serials-data">
<h2>Doing interesting things with Evergreen serials data</h2>
<p><strong>Update: 2010-05-31</strong> Running through the process again, I found a few
typos in the <tt class="docutils literal">pg_dump</tt> commands, so I fixed those up.</p>
<p>I'm working on a project to compare our electronic journal holdings with
our print journal holdings. This is probably a task …</p></div><div class="section" id="doing-interesting-things-with-evergreen-serials-data">
<h2>Doing interesting things with Evergreen serials data</h2>
<p><strong>Update: 2010-05-31</strong> Running through the process again, I found a few
typos in the <tt class="docutils literal">pg_dump</tt> commands, so I fixed those up.</p>
<p>I'm working on a project to compare our electronic journal holdings with
our print journal holdings. This is probably a task that most academic
libraries have been working on over the past few years, as collection
space dwindles, the duplication of holdings in electronic and print
formats increases, and electronic delivery and 24/7 access becomes the
default expectation of our patrons.</p>
<p><a class="reference external" href="/archives/205-Doing-useful-things-with-the-TXT-dump-of-SFX-holdings,-part-1-database.html">In my previous
post</a>,
I worked through the hoops required to get our SFX holdings into a
usable database for query purposes. In this post, I'll walk through the
steps required to get the serials holdings from Evergreen into the same
database so that we can generate reports based on the authoritative
sources for both our electronic and print holdings.</p>
<p>We'll start by dumping the schema for the biblio.record_entry and
serial.record_entry tables from our Evergreen database. In the previous
post, we could have added the tables from the SFX export to the
Evergreen database, but I don't like mixing these more experimental
projects with our production system - so we'll work with a database
named <strong>periodicals</strong> instead.</p>
<pre class="literal-block">
pg_dump --no-owner --schema-only --table biblio.record_entry \
    --table serial.record_entry evergreen > bre_sre_schema.dump
</pre>
<p>We have to munge the schema so that it doesn't create the indexes on
the tables, which should lead to faster loads. Also, remove any triggers
that point at objects that don't exist in this limited subset of data.
Then create the schema in our periodicals holdings database:</p>
<pre class="literal-block">
psql -f bre_sre_schema.dump -d periodicals
</pre>
<p>Now dump the data for those tables from the Evergreen database. If you
have a large set of bibliographic records like we do, make sure you have
a few gigabytes of space available in the output location.</p>
<pre class="literal-block">
pg_dump --no-owner --data-only --table biblio.record_entry \
    --table serial.record_entry evergreen \
    > bre_sre_data.dump
</pre>
<p>Okay, now you can load the data into your serials holdings database:</p>
<pre class="literal-block">
psql -f bre_sre_data.dump -d periodicals
</pre>
<p>And now we add the indexes that we previously culled from the schema.
You can be more selective in the indexes you create, if you know what
you're doing.</p>
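<p>For example - a minimal sketch only, since the authoritative list is
whatever you culled from the schema dump, and the index name here is my
own invention - these two are the ones the joins below lean on:</p>
<pre class="literal-block">
ALTER TABLE biblio.record_entry ADD PRIMARY KEY (id);
CREATE INDEX sre_record_idx ON serial.record_entry (record);
</pre>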
<p>For some reason, I opted to play with PostgreSQL's support for XML as a
native column type and converted the plain text marc column into an XML
column:</p>
<pre class="literal-block">
ALTER TABLE biblio.record_entry ALTER COLUMN marc SET DATA TYPE XML USING marc::XML;
</pre>
<p>Now we add the Evergreen holdings to the <tt class="docutils literal">holdings.conifer</tt> table. We
use the <tt class="docutils literal">xpath()</tt> function to retrieve the desired values from the
MARC XML in biblio.record_entry, and wrap the results in the
<tt class="docutils literal">unnest()</tt> function to return the nodeset as a plain text string,
rather than an array of values. The WHERE clause restricts the holdings
to those owned by the library in which I am interested.</p>
<pre class="literal-block">
CREATE TABLE holdings.conifer (
    record BIGINT,
    issn TEXT,
    coverage TEXT,
    call_number TEXT
);

INSERT INTO holdings.conifer (record, issn)
    SELECT bre.id,
        UNNEST(XPATH('//*[local-name()="datafield"][@tag="022"]'
            || '/*[local-name()="subfield"][@code="a"]/text()', bre.marc))
      FROM biblio.record_entry bre
      INNER JOIN serial.record_entry sre ON sre.record = bre.id
      WHERE sre.owning_lib = 103;
</pre>
<p>We'll populate the call number based on the 852 field in the serial
record. We could pull this from the <tt class="docutils literal">asset.call_number</tt> table, but
this will be good enough for the first pass.</p>
<pre class="literal-block">
UPDATE holdings.conifer SET call_number = UNNEST(
    XPATH(
        '//*[local-name()="datafield"][@tag="852"]/'
            || '*[local-name()="subfield"][@code="h"]/text()',
        (
            SELECT sre.marc::xml
              FROM serial.record_entry sre
              INNER JOIN holdings.conifer hc ON hc.record = sre.record
              WHERE hc.record = holdings.conifer.record
              LIMIT 1
        )
    )
);
</pre>
<p>Now we need to generate usable holdings statements for the print.
Evergreen includes a great MFHD parsing library written in Perl, and
PostgreSQL thankfully enables you to create functions written in Perl,
but to get the following to work on a non-Evergreen machine, I had to
copy <tt class="docutils literal"><span class="pre">Open-ILS/src/perlmods/OpenILS/Utils/MFHD/*</span></tt> to
<tt class="docutils literal">/usr/local/share/perl/5.10.0</tt> and edit the occurrences of
<tt class="docutils literal"><span class="pre">OpenILS::Utils::MFHD::*</span></tt> to *.</p>
<pre class="literal-block">
CREATE OR REPLACE FUNCTION holdings.parse_mfhd ( xml TEXT ) RETURNS TEXT AS $_$
    use MARC::Record;
    use MARC::File::XML;
    use MFHD;

    my $xml = shift;
    my $text;
    my $captions;

    my $marc = MARC::Record->new_from_xml( $xml );
    my $mfhd = MFHD->new($marc);

    foreach my $field ($marc->field('866')) {
        my $holdings = $field->subfield('a');
        if ($holdings) {
            my $public_note = $field->subfield('z');
            if ($public_note) {
                $text .= "$holdings - $public_note";
            } else {
                $text .= "$holdings";
            }
        }
    }

    foreach my $cap_id ($mfhd->captions('853')) {
        my @curr_holdings = $mfhd->holdings('863', $cap_id);
        next unless scalar @curr_holdings;
        foreach (@curr_holdings) {
            if ($captions) {
                $captions .= ', ';
            }
            $captions .= $_->format();
        }
    }

    if ($text and $captions) {
        $text = "$text / $captions";
    } else {
        $text = "$text$captions";
    }

    return $text;
$_$ LANGUAGE PLPERLU;
</pre>
<p>And update the table:</p>
<pre class="literal-block">
UPDATE holdings.conifer SET coverage = (
    SELECT holdings.parse_mfhd(marc)
      FROM serial.record_entry
      WHERE serial.record_entry.record = holdings.conifer.record
      LIMIT 1
);
</pre>
<p>That almost works, but it only retrieves the coverage from a single
serial holdings record for a given bibliographic record, even though
there might be multiple serial holdings records. To amend that, we'll
create a PL/pgSQL function that concatenates all of the coverage
statements from all of the pertinent serial holdings records for a given
bibliographic record:</p>
<pre class="literal-block">
CREATE OR REPLACE FUNCTION holdings.print_coverage(marc_record BIGINT) RETURNS TEXT AS $$
DECLARE
    r RECORD;
    coverage TEXT;
BEGIN
    -- If coverage is NULL to begin with, then concatenating to it results in NULL
    coverage := '';
    -- RAISE NOTICE 'marc_record = %', marc_record;

    -- Loop over the serial records attached to the targeted bib record
    FOR r IN SELECT marc FROM serial.record_entry WHERE record = marc_record ORDER BY id LOOP
        coverage := coverage || holdings.parse_mfhd(r.marc);
        -- RAISE NOTICE 'r.marc = %', r.marc;
    END LOOP;
    -- RAISE NOTICE 'coverage = %', coverage;

    RETURN coverage;
END
$$ LANGUAGE 'plpgsql';
</pre>
<p>And we'll use this fancy new function to update the print holdings
statements again with the more complete coverage:</p>
<pre class="literal-block">
UPDATE holdings.conifer SET coverage = (
    SELECT holdings.print_coverage(record)
      FROM serial.record_entry
      WHERE serial.record_entry.record = holdings.conifer.record
      LIMIT 1
);
</pre>
<p>Now the payoff: generating a list of matching ISSNs from the electronic
holdings and our print holdings, with the coverage statements for each,
for a subset of the SFX collections to which we have access:</p>
<pre class="literal-block">
-- Set the display to expanded format for easy reading
\x

-- Basic report for perusal
SELECT hsfx.issn AS "ISSN", hsfx.title AS "Title",
        hsfx.collection AS "SFX Collection",
        hsfx.coverage AS "Electronic Coverage",
        hc.coverage AS "Print Coverage",
        hc.call_number AS "Call Number"
    FROM holdings.sfx hsfx
    INNER JOIN holdings.conifer hc ON hsfx.issn = hc.issn
    WHERE (hsfx.collection ILIKE '%JStor%' OR hsfx.collection LIKE '%Scholars%')
        AND hc.coverage > ''
    LIMIT 5;
</pre>
<p>That results in:</p>
<pre class="literal-block">
-[ RECORD 1 ]-------+--------------------------------------------------------------------------
ISSN                | 0142-2774
Title               | Journal of Occupational Behavior
SFX Collection      | JSTOR Arts and Sciences 4
Electronic Coverage | Available from 1980 until 1987.
Print Coverage      | Vol. 1 No. - Vol. 8 No. 4 (1980-1987)
Call Number         | DESM-PER
-[ RECORD 2 ]-------+--------------------------------------------------------------------------
ISSN                | 0741-6261
Title               | The Rand Journal of Economics
SFX Collection      | JSTOR Arts and Sciences 2
Electronic Coverage | Available from 1984 until 2006.
Print Coverage      | V.17 (1986) - v.23 (1992)
Call Number         | DESM-PER
-[ RECORD 3 ]-------+--------------------------------------------------------------------------
ISSN                | 0002-8614
Title               | Journal of the American Geriatrics Society
SFX Collection      | Scholars Portal
Electronic Coverage | Available from 2001 volume: 49 issue: 1 until 2009 volume: 57 issue: 10.
Print Coverage      | Vol. 1 - 37 (1953-1989)
Call Number         | DESM-PER
-[ RECORD 4 ]-------+--------------------------------------------------------------------------
ISSN                | 0023-7639
Title               | Land Economics
SFX Collection      | JSTOR Arts and Sciences 7
Electronic Coverage | Available from 1948 until 2005.
Print Coverage      | v.62 (1986) - v.68 (1992)
Call Number         | DESM-PER
-[ RECORD 5 ]-------+--------------------------------------------------------------------------
ISSN                | 0090-2616
Title               | Organizational dynamics
SFX Collection      | Scholars Portal
Electronic Coverage | Available from 1995 volume: 23 issue: 3 until 2009 volume: 38 issue: 3.
Print Coverage      | Vol. 15 No. - Vol. 23 No. 5 (Summer 1986-Spring 1995)
Call Number         | DESM-PER
</pre>
<p>Looks pretty good to these eyes. Okay, now we'll get serious and dump
the output to a tab-delimited file so we can easily open it in
OpenOffice.org Calc or another spreadsheet:</p>
<pre class="literal-block">
-- Set delimiter to TAB (type CTRL-V then TAB to get a literal tab)
\f '	'
-- Set the output to being unaligned
\a
-- Dump the output to a file
\o /tmp/periodicals.tsv
-- Generate URLs for quick catalogue lookups
SELECT 'http://laurentian.concat.ca/opac/en-CA/skin/lul/xml/rdetail.xml?r='
        || hc.record || '&l=105&d=1' AS "URL",
        hsfx.issn AS "ISSN", hsfx.title AS "Title",
        hsfx.collection AS "SFX Collection",
        hsfx.coverage AS "Electronic Coverage",
        hc.coverage AS "Print Coverage",
        hc.call_number AS "Call Number"
    FROM holdings.sfx_complete hsfx
    INNER JOIN holdings.conifer hc ON hsfx.issn = hc.issn
    WHERE (hsfx.collection ILIKE '%JStor%' OR hsfx.collection LIKE '%Scholars%')
        AND hc.coverage > '';
</pre>
<p>And that's it. It might seem complex, but I've found that investing the
effort into learning how to lean on PostgreSQL to do the hard work pays
plenty of dividends. This exploration should help me contribute more
functionality to Evergreen core; for example, I hope to use my
experiments with the pl/Perl function to start populating the
<tt class="docutils literal">serial.bib_summary</tt> tables using an INSERT/UPDATE/DELETE trigger on
<tt class="docutils literal">serial.record_entry</tt> so that we don't have to generate the summaries
for every item details request in the catalogue.</p>
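<p>For the curious, the trigger side of that idea might look something
like the following - purely a sketch, as the column names on
<tt class="docutils literal">serial.bib_summary</tt> are guesses on my part:</p>
<pre class="literal-block">
CREATE OR REPLACE FUNCTION serial.refresh_bib_summary() RETURNS TRIGGER AS $$
DECLARE
    bib BIGINT;
BEGIN
    -- On DELETE there is no NEW row to inspect, so fall back to OLD
    IF TG_OP = 'DELETE' THEN
        bib := OLD.record;
    ELSE
        bib := NEW.record;
    END IF;
    UPDATE serial.bib_summary
       SET summary = holdings.print_coverage(bib)
     WHERE record = bib;
    RETURN NULL; -- return value is ignored for AFTER triggers
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER refresh_bib_summary
    AFTER INSERT OR UPDATE OR DELETE ON serial.record_entry
    FOR EACH ROW EXECUTE PROCEDURE serial.refresh_bib_summary();
</pre>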
</div>
FSOSS 2009: Project Conifer update2009-11-10T04:00:00-05:002009-11-10T04:00:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2009-11-10:/fsoss-2009-project-conifer-update.html<p><strong>Update: 2009-11-24</strong> James Forrester of the Ontario Academy of Art and
Design has posted a <a class="reference external" href="http://www.archive.org/details/DanScottpresentationEvergreeninOntarioMigratingAcademicLibraries">short
video</a>
(Internet Archive) of the presentation. Thanks, James!</p>
<p>On Friday, October 30th, I presented a status update on Project Conifer
at the <a class="reference external" href="http://fsoss.ca">Free Software Open Source Symposium
(FSOSS)</a>. This was a follow-up to the …</p><p><strong>Update: 2009-11-24</strong> James Forrester of the Ontario Academy of Art and
Design has posted a <a class="reference external" href="http://www.archive.org/details/DanScottpresentationEvergreeninOntarioMigratingAcademicLibraries">short
video</a>
(Internet Archive) of the presentation. Thanks, James!</p>
<p>On Friday, October 30th, I presented a status update on Project Conifer
at the <a class="reference external" href="http://fsoss.ca">Free Software Open Source Symposium
(FSOSS)</a>. This was a follow-up to the talk I gave
with John Fink at <a class="reference external" href="/archives/170-Evergreen-deOSSification-of-library-software.html">last year's
FSOSS</a>,
with the hopefully interesting twist that instead of talking about what
we were going to do, I talked about what we had done, and the lessons
learned along the way.</p>
<p>This was a slightly modified version of the talk I gave at the
<a class="reference external" href="/archives/201-Presentation-at-the-Lyrasis-Open-Source-in-Your-Library-conference.html">Lyrasis/NELINET open source
conference</a>
earlier in October, aimed at a more general audience. The talk was
recorded and will be posted online at the FSOSS site at some point.</p>
<p>Here are the slides in
<a class="reference external" href="/uploads/talks/2009/FSOSS_Conifer_update.odp">(ODP)</a>
and
<a class="reference external" href="/uploads/talks/2009/FSOSS_Conifer_update.pdf">(PDF)</a>
format. The speaker notes on the slides will give you the meat of the
content.</p>
Evergreen development workshop at FSOSS 20092009-10-30T17:24:00-04:002009-10-30T17:24:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2009-10-30:/evergreen-development-workshop-at-fsoss-2009.html<p><strong>Update 2009-11-24</strong> Robert Soulliere has also made the videos
available via the <a class="reference external" href="http://www.archive.org/details/EvergreenDeveloperSeries-Fsoss2009%20">Internet
Archive</a>
- thanks again, Robert!</p>
<p><strong>Update 2009-11-09</strong> As promised, Robert Soulliere has <a class="reference external" href="http://markmail.org/message/4ntfbshzhhmjn2tw">posted the
video recordings</a> he
made of the workshop - thanks, Robert!</p>
<p>Yesterday, I led a three-hour Evergreen development workshop at the
<a class="reference external" href="http://fsoss.ca">Free Software Open Source Symposium …</a></p><p><strong>Update 2009-11-24</strong> Robert Soulliere has also made the videos
available via the <a class="reference external" href="http://www.archive.org/details/EvergreenDeveloperSeries-Fsoss2009%20">Internet
Archive</a>
- thanks again, Robert!</p>
<p><strong>Update 2009-11-09</strong> As promised, Robert Soulliere has <a class="reference external" href="http://markmail.org/message/4ntfbshzhhmjn2tw">posted the
video recordings</a> he
made of the workshop - thanks, Robert!</p>
<p>Yesterday, I led a three-hour Evergreen development workshop at the
<a class="reference external" href="http://fsoss.ca">Free Software Open Source Symposium</a>. I had
promised Nick Ruest from McMaster that it wouldn't be three hours of me
talking... but in prepping for the workshop, I ran out of time putting
together the virtual image that was going to include all of the tutorial
materials... and therefore, ended up talking for almost three hours. Not
ideal. Interestingly, there were a number of non-library-world attendees
who were interested in OpenSRF, so I was able to spend most of the first
hour covering that framework and (I think) managed to successfully keep
their attention for that period of time. I wasn't surprised to see them
leave once we hit more library-centric content <img alt=":-)" class="emoticon" src="/images/smile.png" /></p>
<p>That said, there is a stake in the ground now for developers who are
relatively new to Evergreen. The assumption is that the developer is
already comfortable with basic install and configuration of OpenSRF and
Evergreen, at least as far as following the <a class="reference external" href="http://evergreen-ils.org/dokuwiki/doku.php?id=server:1.6.0:install">install
instructions</a>,
and that the developer is comfortable writing one or both of Perl or
JavaScript. I posit that such a person should be able to work through
the <a class="reference external" href="http://evergreen-ils.org/~denials/workshop.html">workshop
tutorial</a> and follow
the workshop slides through the evolution of a CGI program to an OpenSRF
service that eventually taps into the Evergreen IDL (see <a class="reference external" href="http://evergreen-ils.org/~denials/workshop.tar.gz">workshop
tarball</a>).</p>
<p>In writing this down and trying to provide basic examples that can be
building blocks for bigger applications, I surprised myself by how much
I had to re-learn or in some cases learn for the first time. But now
it's written down, and the re-learning path (because my brain is full
and constantly rids itself of even painfully learned lessons) will be
much shorter. And I hope that this makes it easier for others to become
productive OpenSRF and Evergreen developers as well.</p>
<p>This content will continue to evolve and improve over time, as I'm
betting that my fellow Evergreen developers will suggest improvements to
the materials. Note that I'm delivering a four-hour workshop covering
much of the same material at the OLA SuperConference in 2010. The extra
hour should give us time to complete some hands-on exercises, and I'll
incorporate the feedback that I've received from the FSOSS workshop for
the OLA workshop. (Your feedback is always welcome, either in comments
to this post or via email at <a class="reference external" href="mailto:dan@coffeecode.net">dan@coffeecode.net</a>). It would be great to
see other people take these materials and improve and deliver them as
well - they're under a CC-BY-SA license - so if there's interest, I'll
be happy to check them into a public source repository (hmm, maybe a bzr
branch at the <a class="reference external" href="http://code.launchpad.net/evergreen">Evergreen
Launchpad</a> project).</p>
<p>Oh! And Robert Soulliere from Mohawk College recorded the entire
workshop and plans to make it available online. So if you need some
sleep, those video segments will be available!</p>
Presentation at the Lyrasis "Open Source in Your Library" conference2009-10-10T17:34:00-04:002009-10-10T17:34:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2009-10-10:/presentation-at-the-lyrasis-open-source-in-your-library-conference.html<p>On Friday, October, 9th, I had the pleasure of (along with Joe Lucia and
Karen Coombs) speaking at the Lyrasis "Open Source in Your Library"
conference at the Olin College of Engineering in Needham, MA. First, a
note about Olin College - it is a very modern campus that makes an …</p><p>On Friday, October, 9th, I had the pleasure of (along with Joe Lucia and
Karen Coombs) speaking at the Lyrasis "Open Source in Your Library"
conference at the Olin College of Engineering in Needham, MA. First, a
note about Olin College - it is a very modern campus that makes an
excellent venue for a single-track conference (New England
#code4libbers, take note!). Second, this had originally been a NELINET
conference, but as of last week NELINET had merged with Lyrasis to
create a regional library non-profit organization that spans most of the
East Coast of the United States.</p>
<p>My presentation slides (with copious speaker notes) are available in
<a class="reference external" href="/uploads/talks/2009/NELINET_2009_Developing.odp">OpenOffice.org
Impress</a>,
<a class="reference external" href="/uploads/talks/2009/NELINET_2009_Developing.pdf">PDF</a>,
and <a class="reference external" href="/uploads/talks/2009/NELINET_2009_Developing.ppt">PowerPoint
format</a></p>
<p>I had been asked to talk about Conifer's experiences implementing
Evergreen, as there is certainly some interest on the part of Lyrasis
member organizations in open source library systems. I chose to tell the
unvarnished story of Conifer: how we decided to build a consortial
academic library system on Evergreen, what steps we have taken in the
past two years, and probably more importantly what missteps we have
taken over the past two years. I told some cautionary tales that were
hopefully useful to others considering the same path, and then discussed
the state of the Evergreen community.</p>
<p>As a quick recap, the biggest challenges we hit on the road to adopting
Evergreen were:</p>
<ul class="simple">
<li>Finding skilled developer resources that could commit time to help us
develop solutions for some of our requirements was challenging, even
when we did have financial resources.</li>
<li>Our largest founding partner withdrew from the project months before
we were set to go live.</li>
<li>Due to the effects of the recession on provincial and therefore
university finances, and the increased burden on the remaining
Conifer partners for the shared costs that weren't reduced after the
partner's withdrawal, our collective budget was slashed and we ended
up having to pay opportunity costs by focusing on migrating our own
data rather than outsourcing that role and focusing several months of
effort on development.</li>
</ul>
<p>I noted that our efforts to build a reserves system
(<a class="reference external" href="http://svn.open-ils.org/trac/ILS-Contrib/wiki/SyrupReserves">Syrup</a>)
have thus far resulted in a loosely coupled reserves system that none of
us have been able to use - but that for the time being Evergreen's
bookbags have served as a reasonable replacement for lists of
monographic reserve items, and that the discussion about how to more
tightly couple Syrup with Evergreen has resumed (and is currently
waiting on me for a response)... so there's hope that we might be able
to deploy the all-singing, all-dancing reserves system next term.</p>
<p>I confessed that we're using spreadsheets to track acquisitions while
Evergreen's native acquisitions system solidifies (although, given the
current state of our budget, spreadsheets are all that we need for the
time being - sigh). Joe Lucia had remarked during his own presentation
that an acquisitions system that can handle the rather complex
requirements of academic institutions was a showstopper for his library.
In the Evergreen 1.6 release, you can see that the acquisitions system
is almost ready; we loaded six years of historical acquisitions data
into a test server and were able to do most of what we need, subject to
some refinements. I think it has been an extremely challenging balancing
act for Bill Erickson to juggle the requirements of academic libraries
with those of large consortial public library systems to come up with
something that can make everyone happy (as happy as you can possibly be
with acquisitions), but the progress over the summer has been
encouraging.</p>
<p>On a more positive note, one of the great advantages of adopting a
consortial library system is that I was able to take two months of
parental leave and not worry about the state of the system at all. We
have shared responsibilities across the consortial partners, such that I
can actually turn off my cell phone when it's not my turn to respond to
problem reports. And during my absence, my colleagues (Art Rhyno, Robin
Isard, Kevin Beswick) all gained a lot of confidence in their own
understanding of the system. This shared responsibility should also pay
dividends when we put together processes for reporting records to our
various consortial catalogues (such as AMICUS): rather than each of us
having to rediscover the process on our own, we can collaborate and
improve upon each other's work. It's a lot less lonely being a systems
librarian in a consortial library system, let me tell you!</p>
<p>I also shared our positive experiences with Evergreen's uptime and with
<a class="reference external" href="http://esilibrary.com">Equinox</a> as a support provider. The few times
that we have had outages, they have been relatively brief and when we
have opened a problem ticket with Equinox, they have responded quickly.
Robin measured our uptime over the last two months at 99.5% - which
isn't five nines, but is still far better than the 75% (maximum) that we
had with our previous system due to the six hours it was down every
night for backups. We also chalk up some of the downtime so far to
learning experiences; we're refining the configuration of the system and
improving our own knowledge of how to maintain the system without
incurring an outage. So, I expect that we'll eke our way back up over
the next few months to an even better uptime percentage.</p>
<p>On the topic of the Evergreen community, I compared several
commonly-used objective measures of the health of a given open source
community, such as mailing list volume, number of contributors and
contributing organizations, and release frequency with Evergreen's track
record. We're doing reasonably well on the mailing list front, and we've
seen a small increase in the number of patch contributors, but I think
we need to make the on-ramp to Evergreen development slightly easier to
ascend. This is why I'm trying to create a set of tutorials for new
developers, starting with basic OpenSRF, extending through database
access methods such as open-ils.cstore and open-ils.pcrud, rounding off
with the IDL-aware custom Dojo widgets that Bill Erickson has put
together, and perhaps giving people enough XUL to know how to add a new
menu entry to the staff client. (I really can't tackle XUL, too, in just
one half-day workshop!) If our community has a broader set of developers
capable of contributing to the project, then we can expect to see more
customization and extensions available - and possibly more committers.</p>
<p>On the release front, I got a rueful laugh from the audience when I said
that the Evergreen 1.6 release was expected within a few days - "just
like we [the developers] said at the Evergreen International
Conference". I acknowledged that we've had trouble getting high quality
releases out the door - that it took months, and five point releases,
before the 1.4 release was really usable out of the box, and that it had
taken even longer to get 1.6 out for a release. But I also promised that
we (the core committers) had been discussing ways that we can improve
the release process; for example, Mike Rylander had committed resources
from Equinox to help build a suite of regression tests so that we could
have automated nightly builds with known pass/fail rates, and on the
mailing list we had been discussing different approaches to bug-tracking
and development (including the possibility of using distributed version
control systems to do feature development in branches instead of trunk).</p>
<p>On the state of the community, I applauded the Evergreen Documentation
Interest Group (DIG) for leading the charge in taking a team-based
approach to tackling a problem. I pointed to this as a sign the
community was maturing beyond its origins of a core set of contributors
who did everything from maintaining servers to creating Web site content
to development, to a set of more focused teams that would be able to
achieve more through close collaboration on their objectives. We're
seeing that in discussions about a Quality Assurance (QA) team, as well,
that would be responsible for tracking and verifying bugs in a public
repository and (probably) enhancing the tests that let us measure the
quality of the project code at any given time. I can imagine other
possible teams charged with Web site design and content maintenance,
perhaps as a more focused spin-off of the DIG; an internationalization
team, focused on enabling translations and managing contributed
translations; and an infrastructure team responsible for maintaining the
health of the project servers.</p>
<p>Speaking of the community, this is probably a good time to suggest <a class="reference external" href="http://www.artofcommunityonline.org/2009/09/18/the-art-of-community-now-available-for-free-download/">The
Art of
Community</a>
by Jono Bacon (Ubuntu Community Manager) as an excellent read - at least
based on the first half of the book that I've managed to get through
during my travels.</p>
<p>So, with that, I head back home (thanks <a class="reference external" href="http://bpl.org">Boston Public
Library</a> for the free wifi). We have challenges to
tackle in both Project Conifer and in the growth of the Evergreen
community, but knowing the people involved in both of these efforts, I'm
confident that we're going to make a huge amount of progress over the
next few months.</p>
Using nginx to serve static content with Evergreen2009-10-04T04:51:00-04:002009-10-04T04:51:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2009-10-04:/using-nginx-to-serve-static-content-with-evergreen.html<p><strong>Update 2009-10-04</strong> Added a title to the post; oops!</p>
<p>A long time ago, when I discovered that Evergreen was chewing up and
spitting out Apache backends at a furious pace because Apache was being
used to serve up static content like CSS, JavaScript, and image files, I
<a class="reference external" href="http://markmail.org/message/ndzjweq4luenjvzj">suggested</a> that using …</p><p><strong>Update 2009-10-04</strong> Added a title to the post; oops!</p>
<p>A long time ago, when I discovered that Evergreen was chewing up and
spitting out Apache backends at a furious pace because Apache was being
used to serve up static content like CSS, JavaScript, and image files, I
<a class="reference external" href="http://markmail.org/message/ndzjweq4luenjvzj">suggested</a> that using
<a class="reference external" href="http://nginx.org/">nginx</a> to serve up the static content and
proxying the dynamic requests to Apache would be a good solution to a
number of problems we were facing. Here we are, five months later, and
I've managed to put in a few hours tonight (amidst stomach-wrenching
laughter at SNL's "Threw it on the ground" tune) to get a proof of
concept configuration working on the Ubuntu 9.10 beta release.</p>
<p>The following nginx configuration hasn't been tested in a production
environment yet, and isn't tuned beyond the defaults that ship with
Ubuntu Karmic, but it works on my laptop in a virtual image for both
regular HTTP and SSL requests - so what could possibly go wrong?</p>
<p>Steps to get this working on Ubuntu Karmic, assuming that nginx and
Apache are running on the same server:</p>
<ol class="arabic simple">
<li>Install nginx: <tt class="docutils literal">sudo aptitude install nginx</tt></li>
<li>Copy the <a class="reference external" href="http://evergreen-ils.org/dokuwiki/doku.php?id=server_installation:nginx_proxy">configuration
file</a>,
changing "192.168.69.107" to match your server's IP address or host
name, into a file called <tt class="docutils literal"><span class="pre">/etc/nginx/sites-available/evergreen</span></tt> and
create a symbolic link to the file at
<tt class="docutils literal"><span class="pre">/etc/nginx/sites-enabled/evergreen</span></tt></li>
<li>Modify <tt class="docutils literal">/etc/apache2/ports.conf</tt> to change port 80 to 9080 and port
443 to 9443.</li>
<li>Modify <tt class="docutils literal">/etc/apache2/eg_vhost.conf</tt> to change the "Listen 443"
directive to "Listen 9443"</li>
<li>Restart nginx and Apache to put the new configuration in place</li>
<li>Enjoy!</li>
</ol>
<p>As I said, there's probably plenty of room for improvement; I have only
a few hours of experimentation with nginx under my belt at this point.
But assuming no showstoppers turn up after further testing, I would
expect to see this going into production in Conifer sooner rather than
later, and potentially becoming a standard part of any production
Evergreen system.</p>
Evergreen Developer Basics Workshop at FSOSS 20092009-10-03T01:04:00-04:002009-10-03T01:04:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2009-10-03:/evergreen-developer-basics-workshop-at-fsoss-2009.html<p>If you're working on or interested in working on the
<a class="reference external" href="http://evergreen-ils.org">Evergreen</a> open source library system, and
you can be in the Toronto area on October 29th, 2009, you might want to
spend $75 and register for the <a class="reference external" href="http://fsoss.senecac.on.ca/2009/">Free Software Open Source Symposium
(FSOSS)</a> to be held at the
<a class="reference external" href="mailto:Seneca@York">Seneca@York …</a></p><p>If you're working on or interested in working on the
<a class="reference external" href="http://evergreen-ils.org">Evergreen</a> open source library system, and
you can be in the Toronto area on October 29th, 2009, you might want to
spend $75 and register for the <a class="reference external" href="http://fsoss.senecac.on.ca/2009/">Free Software Open Source Symposium
(FSOSS)</a> to be held at the
<a class="reference external" href="mailto:Seneca@York">Seneca@York</a> campus. You'll get a three hour workshop introducing you to
Evergreen development out of the deal, plus your choice of another
workshop on the 29th and the ability to attend all of the FSOSS
presentations on the 30th. I attended FSOSS last year for the first time
and was stunned at the high quality of the conference.</p>
<p>I apologize for the late notice that means that you missed out on the
$30 early registration special; I did not hear until this morning that
my workshop proposal had been accepted. This seems in keeping with this
year's edition of FSOSS, as the conference Web site also seems to be a
bit behind where one would expect with only four weeks to go (heh). The
late notice will also mean that most of my spare minutes will be soaked
up for the rest of the month preparing the workshop materials, but
building a collection of Evergreen development tutorials for the
community is high on my personal list of goals, so it will definitely be
worth it. Expect a high-energy presentation!</p>
<p>Here are the particulars for the workshop:</p>
<p><strong>Workshop title</strong>: Evergreen Library System Development Basics</p>
<p><strong>Workshop description</strong>: Over the past year, Evergreen has been
adopted by a number of libraries in Ontario. While it is built on a
flexible, scalable architecture and offers an impressive set of
features, the Evergreen community needs a broader base of developers
who are able to contribute to the base functionality and create
customized Evergreen instances. This workshop will provide developers
with the tools they need to contribute to the Evergreen project and
better serve their libraries, tackling subjects such as creating a new
OpenSRF service, accessing data with permission-based methods,
customizing the database schema and IDL, and building AJAX interfaces
with the OpenILS Dojo widgets.</p>
Two podcasts of potential interest to Evergreen fans2009-09-15T20:59:00-04:002009-09-15T20:59:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2009-09-15:/two-podcasts-of-potential-interest-to-evergreen-fans.html<p>Most recently, the latest <a class="reference external" href="http://www.softwarefreedom.org/podcast/">Software Freedom Law
Show</a> focuses on the subject
of how to choose a license for your software project's documentation.
The episode was a direct response to a
<a class="reference external" href="http://identi.ca/notice/8976792">dent</a> I had sent to one of the
hosts, Bradley Kuhn, suggesting the subject. I thought the <a class="reference external" href="http://www.evergreen-ils.org/dokuwiki/doku.php?id=evergreen-docs:dig">Evergreen
Documentation Interest
Group</a>
might find it a useful treatment from two of the most knowledgeable
folks in the free software licensing world. As a bonus, when I started
listening to the episode today, I was pleased to hear Bradley lead in
with a very positive mention of Evergreen. Many thanks, Bradley, both
for the show and for the shout-out to Evergreen!</p>
<p>Also, back in July, I had the opportunity to travel to <a class="reference external" href="http://algomau.ca">Algoma
University</a> in Sault Ste. Marie to spend a few
days locked in a room with my fellow Conifer propeller-heads (Art,
Kevin, and Robin) to dump the Evergreen-related content of my brain out
onto the table in preparation for my parental leave. As part of the
visit, we joined in the <a class="reference external" href="http://tangentialconvergence.blogspot.com">Tangential
Convergence</a> crew to put
together a <a class="reference external" href="http://tangentialconvergence.blogspot.com/2009/08/episode-17-searching-for-evergreens.html">podcast about Conifer and
Evergreen</a>
in the standard Tangential Convergence style: having a few beers while
sitting around a table in Dave Brodbeck's backyard. We ended up veering
off onto other subjects rather quickly, but such is the nature of the
show!</p>
<p><strong>Addendum @ 20:44</strong></p>
<ol class="arabic simple">
<li>In the SFLC podcast, Bradley was riffing about my role in Evergreen
based on his memory of my FSOSS presentation from almost a year ago,
so to set the record straight - I'm a relative newcomer to Evergreen,
having joined the project in 2007 after Mike Rylander, Bill Erickson,
and Jason Etheridge had already accomplished the miracle of
delivering the first release of Evergreen to the public libraries of
the state of Georgia.</li>
<li>Also, in the opening moments of the SFLC podcast, there's a mention
of how Evergreen filled a gap in the free software universe (library
systems); one should note that <a class="reference external" href="http://koha.org">Koha</a> tackled
that gap a lot earlier (starting in 1999) and is also a thriving
project today.</li>
</ol>
SFX target parser for Evergreen and some thoughts about searching identifiers2009-06-29T17:00:00-04:002009-06-29T17:00:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2009-06-29:/sfx-target-parser-for-evergreen-and-some-thoughts-about-searching-identifiers.html<p><strong>UPDATE 2010-03-10</strong> See <a class="reference external" href="/archives/217-More-granular-identifier-indexes-for-your-Evergreen-SRU-Z39.50-servers.html">More granular identifier indexes for your
Evergreen SRU / Z39.50
servers</a>
for some recommended enhancements to the target parser and Evergreen's
identifier index capabilities</p>
<p><a class="reference external" href="http://laurentian.ca">Laurentian University</a> is part of the <a class="reference external" href="http://www.ocul.on.ca/">Ontario
Council of University Libraries (OCUL)</a>, and
a user of the centrally hosted <a class="reference external" href="http://scholarsportal.info">Ontario Scholars
Portal</a> SFX link resolver, so one of the
things we needed when we migrated to Evergreen was a target parser for
our link resolver. This is the target associated with <em>Search the
library catalogue</em> that is the last resort when the resolver fails to
turn up any full-text resources for a given OpenURL - so hopefully it
won't need to be invoked too often, as we have a very rich set of
full-text electronic resources at Laurentian University.</p>
<div class="section" id="the-code">
<h2>The code</h2>
<p>Here is a quick implementation of a target parser that generates search
URLs based on ISSN, ISBN, book title, or journal title. Pretty
impoverished from an OpenURL perspective, but it maintains the same
level of functionality from our previous system. In
<strong>TargetParser/Evergreen/Conifer.pm</strong> I created a target parser called
Evergreen::Conifer that implements a subset of the Parsers::TargetParser
API for SFX as follows:</p>
<pre class="literal-block">
package Parsers::TargetParser::Evergreen::Conifer;

use Parsers::TargetParser;
use base qw(Parsers::TargetParser);
use strict;

sub getHolding {
    my ($this, $genRequestObj) = @_;

    my $objectType = $genRequestObj->{'objectType'};
    my $ISBN = $genRequestObj->{'ISBN'};
    my $eISBN = $genRequestObj->{'eISBN'};
    my $ISSN = $genRequestObj->{'ISSN'};
    my $eISSN = $genRequestObj->{'eISSN'};
    my $CODEN = $genRequestObj->{'CODEN'};
    my $bookTitle = $genRequestObj->{'bookTitle'};
    my $journalTitle = $genRequestObj->{'journalTitle'};

    # Canonical search results URL for simple searches:
    # http://laurentian.concat.ca/opac/en-CA/skin/lul/xml/rresult.xml?rt=keyword&tp=keyword&t=0895-2779&l=105&d=2&f=&av=
    my $svc = $this->{svc};
    my $egHost = $svc->parse_param('eg_host');
    my $egLocale = $svc->parse_param('eg_locale');
    my $egSkin = $svc->parse_param('eg_skin');
    my $egOrgUnit = $svc->parse_param('eg_org_unit');
    my $egDepth = $svc->parse_param('eg_depth');

    my $path = "http://${egHost}/opac/${egLocale}/skin/${egSkin}/xml/rresult.xml?l=${egOrgUnit}&d=${egDepth}";
    my $searchString = '&rt=keyword&tp=keyword&t=';

    if (defined($ISSN)) {
        if ($ISSN =~ m/x/i) {
            # Current indexer doesn't deal well with ISSNs containing
            # an X, so break it up
            $ISSN =~ s/^(\d{4})-?(\d+)x/$1 -$2 x/i;
            $searchString .= $ISSN;
        } else {
            $searchString .= "\"$ISSN\""; # format 9999-9999 for MARC
        }
    } elsif (defined($ISBN)) {
        # Evergreen doesn't force ISBNs to be stripped of hyphens, so take whatever
        $searchString .= "\"$ISBN\"";
    } elsif (defined($journalTitle)) {
        # Restrict searches to title index, with bibliographic level = s
        $searchString .= "ti:${journalTitle}&bl=s";
    } elsif (defined($bookTitle)) {
        # Restrict searches to title index, with bibliographic level = m
        $searchString .= "ti:${bookTitle}&bl=m";
    }

    return ($path . $searchString);
}

1;
</pre>
<p>And here's the help that I added to the corresponding <strong>Conifer.hlp</strong>
file:</p>
<div class="line-block">
<div class="line"><strong>General Information</strong></div>
</div>
<div class="line-block">
<div class="line">Target - LOCAL_CATALOGUE_EVERGREEN_CONIFER</div>
</div>
<div class="line-block">
<div class="line">Service - getHolding</div>
</div>
<div class="line-block">
<div class="line">Parser - Evergreen::Conifer</div>
</div>
<div class="line-block">
<div class="line"><strong>Information needed in the Target Service:</strong></div>
</div>
<div class="line-block">
<div class="line">In the PARSE_PARAM field, replace the following information:</div>
</div>
<div class="line-block">
<div class="line">eg_host = $$LOCAL_CATALOGUE_SERVER</div>
</div>
<div class="line-block">
<div class="line">eg_locale = Locale (en-US, en-CA, fr-CA, etc)</div>
</div>
<div class="line-block">
<div class="line">eg_skin = algoma, default, lul, nohin, uwin</div>
</div>
<div class="line-block">
<div class="line">eg_org_unit = 103, 1, etc</div>
</div>
<div class="line-block">
<div class="line">eg_depth = 0, 1, 2, 3, etc</div>
</div>
</div>
<div class="section" id="findings-and-wishlists">
<h2>Findings and wishlists</h2>
<p>While it's quite easy to set up Evergreen as a searchable resource,
thanks to its straightforward URL syntax, one of the things that leaps
out at me is that Evergreen, by default, has no identifier index for
limiting searches by ISBN / ISSN / LCCN / OCLCnum. Ideally, we would
disable full-text indexing on this index so that we can more accurately
search for ISSNs that include an <strong>x</strong>. Right now we have to split ISSNs
with an "x" into constituent parts and generate searches on those parts,
which results in false hits from across the database. This would also be
useful for limiting Z39.50 searches.</p>
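<p>To make that concrete, here is a minimal sketch - in Python rather than
the parser's Perl, with names of my own invention, so treat it as an
illustration rather than anything in Evergreen - of the normalization and
check-digit validation an identifier index could apply, so that a trailing
<strong>x</strong> is handled as a legitimate check character instead of
being split apart:</p>
<pre class="literal-block">
# issn_normalize.py: hypothetical illustration, not Evergreen code
def normalize_issn(issn):
    """Return a canonical 9999-999X form, or None if validation fails."""
    raw = issn.replace('-', '').strip().upper()
    if len(raw) != 8 or not raw[:7].isdigit():
        return None
    # ISSN check digit: weights 8..2 over the first seven digits;
    # check = (11 - sum mod 11) mod 11, with 10 written as 'X'
    total = sum(int(c) * w for c, w in zip(raw[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    if raw[7] != ('X' if check == 10 else str(check)):
        return None
    return raw[:4] + '-' + raw[4:]

print normalize_issn('0895-2779')   # 0895-2779
print normalize_issn('0378-5955')   # 0378-5955
</pre>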
<p>I would also like to teach Evergreen about ISBN-10/ISBN-13 equivalence,
to broaden the search while maintaining precision. And I would like to
automatically normalize ISSN and ISBN formats so that I don't have to
worry about whether a cataloguer entered hyphens or not - and the same
for incoming search terms.</p>
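<p>The ISBN-10 to ISBN-13 mapping itself is mechanical enough to sketch in
a few lines of Python (again, hypothetical illustration code, not anything
shipping in Evergreen): prefix the nine significant digits with 978,
recompute the ISBN-13 check digit, and the two forms can be indexed and
compared as one:</p>
<pre class="literal-block">
# isbn_equiv.py: hypothetical illustration, not Evergreen code
def isbn10_to_isbn13(isbn10):
    """Convert a (possibly hyphenated) ISBN-10 to its ISBN-13 form."""
    # Keep the nine significant digits; the old check digit is discarded
    core = '978' + isbn10.replace('-', '').strip()[:9]
    # ISBN-13 check digit: alternating weights of 1 and 3,
    # then (10 - sum mod 10) mod 10
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(core))
    return core + str((10 - total % 10) % 10)

print isbn10_to_isbn13('0-306-40615-2')   # 9780306406157
</pre>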
<p>Finally, to support services like
<a class="reference external" href="http://www.worldcat.org/affiliate/webservices/xisbn/app.jsp">xISBN</a>
that search for multiple formats and editions of a given work by
generating a shotgun blast of ISBNs for all known representations, I
would love to teach Evergreen how to accept a list of identifiers as
search input.</p>
<p>Don't ask me when these things will happen, though; if it requires work
from me, it will probably be 2010 before any of it happens.</p>
</div>
Globalization presentation at Evergreen International Conference 20092009-06-05T02:12:00-04:002009-06-05T02:12:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2009-06-05:/globalization-presentation-at-evergreen-international-conference-2009.html<p>I was fortunate to be invited to give a talk (<a class="reference external" href="/uploads/talks/2009/Globalization1.odp">OpenOffice.org
Impress</a> / <a class="reference external" href="/uploads/talks/2009/Globalization1.pdf">PDF</a>)
on Evergreen's progress on the
globalization front at the first ever <a class="reference external" href="http://evergreen-ils.org/dokuwiki/doku.php?id=eg09:main">Evergreen International
Conference</a>. My friend <a class="reference external" href="http://www.eifl.org/cps/sections/services/eifl-foss/foss-blog/2009_05_29_first-evergreen">Tigran
Zargaryan</a> from the <a class="reference external" href="http://www.flib.sci.am">Fundamental Science Library of the National Academy of
Sciences of the Republic of Armenia</a> gave a talk at almost the same
time about his library's progress in adopting Evergreen. Tigran himself
was responsible for the translation of the Evergreen catalogue and staff
client into Armenian, and he confided that he also expected to make
significant progress towards a Russian translation during the lengthy
layovers at airports that are part of his normal travel routine.</p>
<p>So, my goal was to provide an overview of the progress we have made in
taking Evergreen from its American English roots and enabling it to
support not just translated interfaces, but properly localized content
display - and to provide some pointers towards where we need to go next.
We have been making progress towards a more formalized translation
process, so keep an eye out for a call for translations in the next week
or two when the Evergreen 1.6 release candidate is made available for
testing. We currently sport Armenian, Canadian English, Canadian French,
and Czech translations, and welcome both new translations and revisions
to our current translations.</p>
<p>To make it easier for translators to collaborate, we need to take our
<a class="reference external" href=":8080">Pootle translation server</a> from a beta service running on my
poor little VPS to a real server. We have some technical challenges to
overcome - providing translation support for the Template::Toolkit
framework, for example. And we have some basic grunt work to do to
replace the hard-coded display of numbers, currencies, dates, and times
with localized variations throughout our code.</p>
<p>I was pleasantly surprised by the number of people attending the
session; I hadn't expected such an interest in the topic, despite it
nominally being an international conference. My only regret was that I
rushed off the stage without taking questions, in the mistaken belief
that I had used up all of my time and was eating into my successor's
presentation timeslot; as it turned out, there was a built-in 15-minute
buffer that I had overlooked. Ah well. Thanks to everyone who came out;
for everyone who wasn't able to make it to the session, I hope you'll
find the slides a good introduction to the state of globalization in
Evergreen. And if you have the skills to contribute, please consider
pitching in to the globalization enablement effort!</p>
Evergreen International Conference hackfest results: Evergreen serials support2009-05-27T18:23:00-04:002009-05-27T18:23:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2009-05-27:/evergreen-international-conference-hackfest-results-evergreen-serials-support.html<p>Yes, all of a sudden and rather quietly, Evergreen has serials support.</p>
<p>A few weeks ago, I finished hooking up a rudimentary serials holdings
display based on <a class="reference external" href="http://lisletters.fiander.info/">David Fiander's</a>
<a class="reference external" href="http://svn.open-ils.org/trac/ILS/browser/trunk/Open-ILS/src/perlmods/OpenILS/Utils/MFHD.pm">MFHD parsing
code</a>
to our production instance of Evergreen. We loaded our MFHD records from
our legacy system into Evergreen and that gave us enough breathing room
to keep working on the problem. By rudimentary I mean:</p>
<ul class="simple">
<li>limited to displaying one MFHD record per bibliographic record (a
problem for journals for which you have separate sets of holdings in
microfiche, print, etc)</li>
<li>serials holdings were displayed for a given bibliographic record no
matter what library scope you were searching in (more of a problem in
theory than in practice as we currently have one copy of a given
bibliographic record per library... that will change over time...)</li>
<li>no way to edit the MFHD records, which is a problem as the issues we
have received since migrating to Evergreen three weeks ago are
starting to pile up</li>
<li>limited to English labels in the interface</li>
</ul>
<p>Here's the rudimentary serials holdings display: <a class="reference external image-reference" href="/uploads/talks/2009/serials_display.png"><img alt="image0" class="serendipity-image-left" src="/uploads/talks/2009/serials_display.serendipityThumb.png" style="width: 110px; height: 89px;" /></a></p>
<p>The operative phrase is <em>was rudimentary</em>. In the past two weeks, things
have come a long way in Evergreen. The primary result of my afternoon of
work at the Evergreen International Hackfest, with lots of help from
Mike Rylander and Bill Erickson in navigating the impressive new <a class="reference external" href="http://dojotoolkit.org">Dojo
toolkit</a>-based Evergreen JavaScript widgets
and services in the upcoming Evergreen 1.6 release, was to add an
<strong>Edit</strong> button to the holdings display that shows up when the record is
viewed in the staff client. When pressed, the Edit button invokes a MARC
editor so that you can copy an 86[345] field and fill in the pertinent
information; or collapse holdings in the 86[678] fields, etc. It seems
like a minor victory, but it was a real result from the hackfest, and
that cannot be discounted!</p>
<p>Here's the MARC editor in action: <a class="reference external image-reference" href="/uploads/talks/2009/MFHD_editor.png"><img alt="image1" class="serendipity-image-left" src="/uploads/talks/2009/MFHD_editor.serendipityThumb.png" style="width: 110px; height: 89px;" /></a></p>
<p>Since then, I've been on fire... or maybe on a slow burn, as I put a few
hours in here and there, and am happy to say that when Evergreen 1.6 is
released, serials support will feature:</p>
<ul class="simple">
<li>support for displaying an unlimited number of MFHD records per bibliographic record</li>
<li>holdings display scoped by library search context - so you'll only
see holdings for the part of the library hierarchy that you're
searching, rather than the whole consortium</li>
<li>the <strong>Edit</strong> button for editing the raw MFHD record</li>
<li>internationalization support for interface labels, based on Dojo
string substitution</li>
</ul>
<p>I have already committed these features to the Evergreen trunk, but I
hope to add a few more pieces to the mix before the Evergreen 1.6
release is cut. We need to display the 852 field contents to identify
the location of each set of holdings, and we need to give cataloguers
the ability to edit some of the attributes (such as owning library).</p>
<p>Here are <a class="reference external" href="/uploads/talks/2009/Hackfest_results.pdf">the
slides</a>
I presented (largely screenshots of the serials display and edit button)
for the hackfest results lightning talk that I gave with Jeff Godin of
<a class="reference external" href="http://www.tadl.org/">Traverse Area District Library</a>. Jeff did some
interesting work in his own right on generating feeds for recently added
titles based on copy location during the hackfest.</p>
Conifer lives: Ontario launches a consortial academic library system built on Evergreen2009-05-11T21:21:00-04:002009-05-11T21:21:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2009-05-11:/conifer-lives-ontario-launches-a-consortial-academic-library-system-built-on-evergreen.html<p>I awoke around 4:48 am today. At the time, I thought it was just our baby kicking away excitedly. However, later this afternoon, I realized that it had been almost exactly a week ago, around 4:30 am on Monday, May 4th, that I sent a broadcast email message to librarians and staff at 24 different libraries. The Conifer consortial library system, built on the solid foundations of the Evergreen open-source library system, had gone live - and I was exhausted after a long weekend of migrating all of that data. I was proud to see the <a class="reference external" href="http://laurentian.concat.ca">Laurentian catalogue</a> sporting a completely different look and new functionality - reviews! book covers! sharable book bags! format & edition grouping! - and excited by the promise of more to come.</p>
<p>Conifer represents the first flowering of an effort that began back in July 2007 with a hand-shake agreement between <a class="reference external" href="http://laurentian.ca">Laurentian University</a>, <a class="reference external" href="http://mcmaster.ca">McMaster University</a>, and the <a class="reference external" href="http://uwindsor.ca">University of Windsor</a> to build a provincial, primarily academic, library system on Evergreen. The system is centrally hosted by the top-notch IT team at the <a class="reference external" href="http://www.uoguelph.ca/ccs/">University of Guelph</a>.</p>
<p>Things change, and along the way <a class="reference external" href="http://algomau.ca">Algoma University</a> and the <a class="reference external" href="http://nosm.ca">Northern Ontario School of Medicine</a> joined us as full partners, and McMaster University opted to continue contributing to the common development effort but withdrew from the centrally hosted system.</p>
<p>As noted, we went live on Monday, May 4th and we survived the first day. On Tuesday, May 5th we corrected a problem in our configuration that had caused some instability (thanks to Mike Rylander for providing the patch that set things straight). Since then, we have been slowly refining aspects of the system - setting up circulation rules, migrating records and items that had been missed over the weekend, polishing the Z39.50 server, fine-tuning the permissions scheme - but the core of the system is solid. We have a consortial system that stretches from the southern-most tip of Ontario to the north-west corner of the province (hello, Thunder Bay!), and so far connectivity seems good and the reliability of the system - which, upon launch, has probably become the second largest Evergreen implementation by number of bibliographic records - has been superb.</p>
<p>A few interesting statistics about Conifer... (have I mentioned how much I love that Evergreen is built on PostgreSQL because it becomes so simple to generate basic reports in plain SQL?):</p>
<div class="section" id="number-of-staff-and-user-accounts-per-library-in-conifer">
<h2>Number of staff and user accounts per library in Conifer</h2>
<pre class="literal-block">
conifer=# SELECT aou.name, count(au.id)
FROM actor.org_unit aou
INNER JOIN actor.usr au
ON aou.id = au.home_ou
GROUP BY aou.name
ORDER BY 2 DESC;
name | count
-------------------------------------------+-------
Leddy Library | 19468
J.N. Desmarais Library | 11921
Algoma University, Wishart Library | 2431
University of Sudbury | 1100
Hearst, Bibliothèque Maurice-Saulnier | 1043
Huntington College Library | 834
Paul Martin Law Library | 592
Northern Ontario School of Medicine (West) | 284
HRSRH Health Sciences Library | 261
Northern Ontario School of Medicine (East) | 224
Xstrata Process Support Centre Library | 122
NOHIN | 121
Instructional Media Centre | 9
Laboratoire de didactiques, E.S.E. | 7
Vale Inco | 4
Mines Library, Willet Green Miller Centre | 2
Art Gallery of Sudbury | 1
Curriculum Resource Centre | 1
Sault Area Hospital | 1
Centre Franco-Ontarien de Folklore | 1
Conifer | 1
(21 rows)
</pre>
</div>
<div class="section" id="number-of-copies-held-per-library-in-conifer">
<h2>Number of copies held per library in Conifer</h2>
<pre class="literal-block">
conifer=# SELECT aou.name, count(ac.barcode)
FROM actor.org_unit aou
INNER JOIN asset.copy ac
ON aou.id = ac.circ_lib
GROUP BY aou.name
ORDER BY 2 DESC;
name | count
-------------------------------------------+---------
Leddy Library | 1373197
J.N. Desmarais Library | 614380
Paul Martin Law Library | 229391
Algoma University, Wishart Library | 115156
University of Sudbury | 42154
Hearst, Bibliothèque Maurice-Saulnier | 34276
Huntington College Library | 12517
Laboratoire de didactiques, E.S.E. | 10284
Mining and the Environment Database | 9940
HRSRH Health Sciences Library | 7512
Music Resource Centre | 7511
Xstrata Process Support Centre Library | 5477
Centre Franco-Ontarien de Folklore | 4365
Northern Ontario School of Medicine (East) | 3779
Northern Ontario School of Medicine (West) | 3301
NOHIN | 2647
Mines Library, Willet Green Miller Centre | 2617
Curriculum Resource Centre | 2583
Sault Area Hospital | 2515
Art Gallery of Sudbury | 2237
Hearst Timmins, Centre de Ressources | 2202
Hearst Kapuskasing, Centre de Ressources | 2007
Vale Inco | 1106
Instructional Media Centre | 1095
(24 rows)
</pre>
</div>
<div class="section" id="what-about-acquisitions-serials-and-reserves">
<h2>What about acquisitions, serials, and reserves?</h2>
<p>One of the reasons we had a hard migration date of early May was that it matches nicely with the fiscal year-end for those institutions who were running a traditional acquisitions system on their legacy ILS. We normally shut down all purchases for a period of weeks while we roll over the encumbrances into the next fiscal year and set up our budgets. This year, we're migrating all of the old financial data twice: first, and foremost, into the most sophisticated set of spreadsheets you'll ever see attached to a library system (as pulled together by the inestimable Art Rhyno); and second, into the Evergreen acquisitions system that will launch with Evergreen 1.6.</p>
<p>The first migration of a given set of data is always the hardest part, so once we have the fund / order / provider data in spreadsheets, the migration into Evergreen proper will be trivial. This will give us the summer to use both systems side-by-side and refine what we need from Evergreen. We have migrated all of our serials data from the legacy system; I just haven't enabled the display of that data in our live system. A prototype was running on my laptop for a few days until I accidentally blew it away - ah well, anything worth doing is better the second time around anyway. This, too, will be part of the Evergreen 1.6 release, and will feature full MFHD compliance built on the code that David Fiander has been writing on behalf of Equinox. I should note that this first cut at serials is in some ways relatively basic; while the system in Evergreen 1.6 will be fully MFHD compliant, down to the point of letting you edit an MFHD record to "check in" a new issue by adding a new 863 field, it won't associate barcodes with individual issues. Most of the database schema exists to support that, but there's still a large amount of code to be written on top of the schema, and we need Something That Works Right Now <img alt=":-)" class="emoticon" src="/images/smile.png" />. I'm confident that that's coming not too far down the road, though.</p>
<p>Finally, what would an academic library be without reserves? Art Rhyno (again!) has been working with Graham Fawcett for the past six months on <a class="reference external" href="http://svn.open-ils.org/trac/ILS-Contrib/wiki/SyrupReserves">Syrup</a> - a really impressive melding of the world of electronic reserves and traditional physical library system reserves that uses SIP and Z39.50 to talk to Evergreen. Syrup is just about at a full boil now, so in a few more weeks we should have it deployed so that we can savour its sweetness through the relatively slow summer months before ensuring that the taste is just right for all of our incoming students and faculty in the fall.</p>
</div>
Evergreen iPhone application? Unnecessary!2009-04-13T04:29:00-04:002009-04-13T04:29:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2009-04-13:/evergreen-iphone-application-unnecessary.html<p>This Easter weekend I had the opportunity to play with someone's iPod
Touch. Of course, the only thing I tried was the Evergreen 1.4 catalogue
interface. Lo and behold, it came up just fine on Safari in all of its
heavily dynamic JavaScript and less-than-XHTML-compliant glory - even
sporting several Dojo widgets. Nice. So we don't have to worry about
writing an iPhone-specific application to access Evergreen; users of
such devices can just use the normal dynamic catalogue with full
functionality.</p>
<p>Evergreen doesn't fare quite as well with Microsoft's rather decrepit
<em>PocketExplorer</em> browser on my HTC Touch smartphone (it's a Windows
Mobile monstrosity, sigh), but it does work well with the <a class="reference external" href="http://www.opera.com/mobile/">Opera
Mobile</a> 9.5 beta browser. I eagerly
anticipate the first good release of
<a class="reference external" href="https://wiki.mozilla.org/mobile">Fennec</a> for Windows Mobile (<a class="reference external" href="http://starkravingfinkle.org/blog/2009/03/fennec-windows-mobile-update/">coming
soon!</a>),
as I'm confident that's going to improve my mobile Web browsing
experience even further.</p>
<p>I predict that in another year or two the idea of building
mobile-specific Web portals to complement your full-function Web site
will be pretty passé. I already get really irritated when Web sites
think they're being helpful by automatically redirecting my smartphone
to an extremely limited interface; in most cases, the full site runs
fine. Give me the option, sure, but don't force me down that path. As
hardware costs continue to drop, and 3G networks expand, and more people
upgrade to more capable mobile devices, one full-function Web site will
be all we need--as long as that site is written in (X)HTML and CSS and
JavaScript.</p>
<p>Those sites that decide to push core functionality into Flash or
SilverLight, on the other hand, can go straight to hell,
thankyouverymuch. I'm looking at you,
<a class="reference external" href="http://ptonthenet.com">PTOnTheNet</a>. This is a site to which Lynn has
been a paying customer for years. It recently announced that it was
revising the Web site, which is all well and good. What's not so good is
that they adopted SilverLight: not just for pretty effects here and
there, but as a core technology. Problem: Lynn has been using Linux at
home since I introduced her to it somewhere around eight years ago, and
last year bought one of the early models of the Linux-based Asus EEE
netbook. Not only did the site redesign destroy the personal training
programs she had set up for her clients over the years (breaking site
redesign rule #1: <em>Thou shalt not destroy your clients' data</em>), but it
also renders her netbook useless for that site.</p>
<p>Even with the <a class="reference external" href="http://www.go-mono.com/moonlight/">Moonlight plugin</a>
installed, it looks like the cretinous site developers are using
detection scripts to prevent the plugin from even trying to render the
content. With Linux-based netbooks on the rise--and with netbooks being
the right form factor and price for personal trainers who want to throw
them into their backpacks and not weep too bitterly if their netbook
suffers the misfortune of being knocked around or sweated to death--this
seems very much like a technology choice that was not based on the needs
of the customers. Worst of all, they <a class="reference external" href="http://www.ptonthenet.com/techhelp.aspx">deliberately chose to exclude
Linux</a>, when a (X)HTML, CSS,
and JavaScript platform would have supported almost any modern platform:
not just Linux netbooks, but other mobile devices like the iPhone and
smartphones that are so well-suited to the personal trainer. So, at
least one customer is going to be walking away, and if there's a
competing Web site out there that caters to a broader clientele, I bet
there will be far more customers moving in that direction.</p>
One big library, one little device: Evergreen staff client on Nokia N8102009-03-02T05:23:00-05:002009-03-02T05:23:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2009-03-02:/one-big-library-one-little-device-evergreen-staff-client-on-nokia-n810.html<div class="serendipity_imageComment_left" style="width: 480px"><div class="serendipity_imageComment_img"><p><img alt="image0" class="serendipity-image-left" src="/uploads/pics/n810.jpg" style="width: 480px; height: 379px;" /></p>
</div><div class="serendipity_imageComment_txt"><p>It's hard to take good photos of these devices</p>
</div></div><p>Almost exactly a year ago, Jason Etheridge (the primary developer of the
Evergreen staff client) and I managed to get our hands on a developer
edition of the <a class="reference external" href="http://www.nseries.com/products/n810/#l=products,n810">Nokia
N810</a> Internet
tablet device. It's a nifty little handheld computer that packs 128 MB
of memory, a touch screen, and a beautiful 800x480 screen, and I've had
my hands on it from almost the beginning. The primary rationale of the
Nokia developer program was to encourage developers to put together
useful applications for their platform, of course... and as the months
ticked by and I did nothing of interest, my guilt slowly grew.</p>
<p>Well, today I feel a little bit better. Here's what happened: when I was
attending <a class="reference external" href="http://fsoss.senecac.on.ca/2008/">FSOSS 2008</a> at Seneca
College, I ran into <a class="reference external" href="http://madhava.com/egotism/">Madhava Enros</a>.
Madhava and I had worked together on some help UI designs back when we
were both DB2 employees; since then, he had joined the Mozilla
Foundation and was working on
<a class="reference external" href="https://wiki.mozilla.org/Mobile/Fennec">Fennec</a>, the mobile version
of Firefox targeting the N810 device (to begin with, at least). The
first alpha of Fennec had been released to coincide with FSOSS 2008, so
I gave it a shot a few days later. Madhava's team made some great
innovative decisions for Fennec's UI, but what really caught my eye was
that they had packaged a port of XULRunner-1.9 to the N810.</p>
<p>See, the Evergreen staff client is built on XUL, the same
XML/JavaScript/CSS foundation as Firefox and Thunderbird and Fennec -
and to run XUL, you need XULRunner. At the time, though, the Evergreen
staff client needed the 1.8 version of XULRunner; it simply wouldn't
work with 1.9. So, I stuffed the N810 back into its case and forgot
about it for a few more months while I focused on other things like the
never-ending effort to improve Evergreen's internationalization support.</p>
<p>Over the last few weeks, though, Jason has been steadily enhancing the
staff client in Evergreen trunk - and the comment for one of his <a class="reference external" href="http://svn.open-ils.org/trac/ILS/changeset/12275">recent
commits</a> was “we're
kicking xulrunner 1.8 to the curb with trunk”. I had a spare hour or two
on my hands today, so I copied a staff client build from Conifer's
Evergreen trunk test box to the N810, kicked off the XULRunner command,
and waited... expecting failure. Instead, I found that the staff client
worked almost exactly as it does on my laptop - the major difference
being that some of the default function key mappings on the staff client
conflict with the mappings of special buttons on the N810 (like the full
screen toggle gets mapped to F6 - Record In-House Use on the staff
client). Otherwise, the client did a great job of adjusting to the
available screen width, and even Dojo-based interfaces like the Vandelay
MARC batch importer/exporter and the pop-up calendar worked. Very cool!</p>
<p>So, if I can find a barcode scanner with a mini-USB attachment, I could
have a nice little inventory tool on my hands. Or a mobile circulation
station. All because the Evergreen developers made the decision years
ago to build on XUL as a cross-platform framework... this should be
sweet confirmation that they made a good choice. XUL continues to be
ported to more platforms, and anyone using the Evergreen staff client
benefits from the optimizations and bug fixes that go into XULRunner.
Nice. When we cut a release from Evergreen trunk that supports XULRunner
1.9, I'll do my best to package up a version of the staff client for the
N810, and some of my guilt will be assuaged. Yes!</p>
<p><strong>Updated 2009-03-02 10:35 am</strong>: Correcting Madhava's name; I shouldn't
write past midnight without proof-reading! Sorry, Madhava.</p>
Unicorn to Evergreen migration: rough notes2009-02-08T21:32:00-05:002009-02-08T21:32:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2009-02-08:/unicorn-to-evergreen-migration-rough-notes.html<p><strong>Updated 2009-02-25 00:29 EST</strong>: Corrected setuptools installation
step.</p>
<p><strong>Updated 2009-02-08 23:39 EST</strong>: Trimmed width of some of the <pre>
code sections for better formatting. Created bzr repository for
unicorn2evergreen scripts at <a class="reference external" href="http://bzr.coffeecode.net/unicorn2evergreen">http://bzr.coffeecode.net/unicorn2evergreen</a></p>
<p>I did this once a long time ago for the <a class="reference external" href="http://library.upei.ca/">Robertson
<p>I did this once a long time ago for the <a class="reference external" href="http://library.upei.ca/">Robertson
Library</a> at the University of Prince Edward
Island. For our own migration to Evergreen, I have to load a
representative sample of records from our Unicorn system onto one of our
test servers. This has been a good refresher of the process... and a
reminder to myself to post the other part of the Unicorn to Evergreen
migration scripts in a publicly available location. Okay, they're posted
to this bzr repository called
<a class="reference external" href="http://bzr.coffeecode.net/unicorn2evergreen">unicorn2evergreen</a></p>
<ol class="arabic">
<li><p class="first">Export bibliographic records from Unicorn using Unicorn's catalog key
(basic sequential accession number) as the unique identifier (I
plopped the catalog key into the 935a field/subfield combo). I use
the catalog key because the "flexkey" is not guaranteed to be unique
within a single Unicorn instance - and because the catalog key makes
it easy for us to match call numbers and copies.</p>
</li>
<li><p class="first">For each item, export call number / barcode / owning library /
current location / home location / item type using the catalog key as
the identifier.</p>
</li>
<li><p class="first">Set up the organization unit hierarchy on your Evergreen system. You
can dump it from an existing Evergreen system into a file named
"orgunits.dump" like so:</p>
<pre class="literal-block">
pg_dump -U evergreen --data-only --table actor.org_unit_type \
    --table actor.org_unit > orgunits.sql
</pre>
<p>Then drop all of the existing org_units and org_unit_types and
load your custom data in a psql session:</p>
<pre class="literal-block">
BEGIN;
SET CONSTRAINTS ALL DEFERRED;
DELETE FROM actor.org_unit;
DELETE FROM actor.org_unit_type;
\i orgunits.sql
COMMIT;
</pre>
</li>
<li><p class="first">Import bibliographic records using the <a class="reference external" href="http://open-ils.org/dokuwiki/doku.php?id=evergreen-admin:importing:bibrecords">standard marc2bre.pl /
direct_ingest.pl / pg_loader.pl
process</a>.
Point the --idfield / --idsubfield and --tcnfield / --tcnsubfield
options for marc2bre.pl at 935a (yes, this sucks for title control
numbers, but as noted above they are not guaranteed to be unique in
Unicorn and we need uniqueness in Evergreen). We need the
bibliographic record entry ID field to be the catalog key to set up
subsequent call number/barcode matches.</p>
</li>
<li><p class="first">Enable the subsequent addition of new bibliographic records by
setting the sequence object values to avoid conflicting ID / TCN
values by issuing the following SQL statements:</p>
<pre class="literal-block">
SELECT setval('biblio.autogen_tcn_value_seq',
    (select max(id) from biblio.record_entry) + 100);
SELECT setval('biblio.record_entry_id_seq',
    (select max(id) from biblio.record_entry) + 100);
</pre>
</li>
<li><p class="first">Process holdings records.</p>
<ol class="arabic">
<li><p class="first">Call numbers might have MARC8 encoded characters, so process'em
and convert to UTF8. Theoretically "yaz-iconv -f MARC-8 -t UTF-8 <
holdings.lst > holdings_utf8.lst" should do it, but instead it
eats linefeeds and creates an unusable field. Ugh. We use a little
Python script instead (a minimal sketch appears after this list); it
requires pymarc, which in turn requires a version of setuptools (0.6c5)
newer than Debian Etch's packaged version (0.6c3). So:
<pre class="literal-block">
wget http://pypi.python.org/packages/2.4/s/setuptools/setuptools-0.6c9-py2.4.egg
sudo sh setuptools-0.6c9-py2.4.egg
sudo easy_install pymarc
</pre>
</li>
<li><p class="first">Now actually generate the 'holdings_utf8.lst' file.</p>
<pre class="literal-block">
cat holdings.lst | python marc8_to_utf8.py > holdings_utf8.lst
</pre>
</li>
<li><p class="first">Adjust parse_unicorn.py to match up the holdings fields (added
flexkey to the start). Then parse the holdings_utf8.lst to
generate an SQL file (holdings_eg.sql) that we can load into the
import staging table.</p>
<pre class="literal-block">
python parse_unicorn.py
</pre>
<p>Note that the holdings data for the item with barcode
30007007751786 didn't process cleanly and won't load. Weird -
possibly a corrupt character in the item data? Augh, no - there
are flexkeys and callnumbers that contain '|' characters (16
occurrences for "|z", 37 for "|b"), which is of course also what
we are using as our delimiters. ARGH. I deleted it for now with:</p>
<pre class="literal-block">
grep -v '|z' holdings_utf8.lst > holdings_clean.lst
grep -v '|b' holdings_clean.lst > holdings_clean.lst2
mv holdings_clean.lst2 holdings_clean.lst
</pre>
<p>Adjust parse_unicorn.py to match the new input name and generate
a new holdings_eg.sql.</p>
</li>
</ol>
</li>
<li><p class="first">Create the import staging table:</p>
<pre class="literal-block">
psql -f Open-ILS/src/extras/import/import_staging_table.sql
</pre>
</li>
<li><p class="first">Load the items into the import staging table:</p>
<pre class="literal-block">
psql -f holdings_eg_clean.sql
</pre>
<p>We discover that some more of our data sucks - for example, one item
("Research in autism spectrum disorders", HIRC PER-WEB) has a create
date of '0', which is not a valid date format, because the barcode
field contains "1750-9467|21". For now, grep it out as above and reload.</p>
</li>
<li><p class="first">Investigate possibilities of collapsing unnecessary duplicate item
types:</p>
<pre class="literal-block">
SELECT item_type, COUNT(item_type) AS item_count
FROM staging_items
GROUP BY item_type
ORDER BY item_type;

 item_type  | item_count
------------+------------
 ATLAS      |        162
 AUDIO      |        792
 AUD_VISUAL |       1790
 AV         |         69
 AV-EQUIP   |        182
 BOOK       |        996
 BOOKS      |     581592
 BOOK_ART   |          1
 BOOK_RARE  |       4949
 BOOK_SHWK  |          5
 BOOK_WEB   |      49163
 COMPUTER   |         33
...
(40 rows)
</pre>
<p>How about locations?</p>
<pre class="literal-block">
SELECT location, COUNT(location)
FROM staging_items
GROUP BY location
ORDER BY location;

  location  | count
------------+-------
 ALGO-ACH   |    13
 ALGO-ATLAS |   148
 ALGO-AV    |  1837
...
(212 rows)
</pre>
<p>Now we can collapse categories pretty simply inside the staging
table. For example, if we want to collapse all of the BOOK types into
a single type of BOOK:</p>
<pre class="literal-block">
UPDATE staging_items
SET item_type = 'BOOK'
WHERE item_type IN ('BOOKS', 'BOOK_ART', 'BOOK_RARE',
                    'BOOK_SHWK', 'BOOK_WEB', 'REF-BOOK');
</pre>
</li>
<li><p class="first">Update legacy library names to new Evergreen library short names
(we're using OCLC codes where possible). Some will be straightforward
old names to new names. Others will require a little more logic based
on location + legacy library name; we're splitting the DESMARAIS
collection into multiple org-units (Music Resource Centre, Hearst
locations, hospital locations, etc).</p>
<pre class="literal-block">
-- Laurentian Music Resource Centre
UPDATE staging_items
SET owning_lib = 'LUMUSIC'
WHERE location = 'DESM-MRC';

-- Hearst - Kapuskasing location
UPDATE staging_items
SET owning_lib = 'KAP'
WHERE location LIKE 'HRSTK%';

-- Hearst - Timmins location
UPDATE staging_items
SET owning_lib = 'TIMMINS'
WHERE location LIKE 'HRSTT%';
</pre>
</li>
<li><p class="first">Generate the copies in the system:</p>
<pre class="literal-block">
psql -f generate_copies.sql
</pre>
</li>
<li><p class="first">Make the metarecords:</p>
<pre class="literal-block">
psql -f quick_metarecord_map.sql
</pre>
</li>
</ol>
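<p>For reference, here is a minimal sketch of what the marc8_to_utf8.py
conversion step could look like, assuming one record per line on stdin and
the Python 2 / pymarc stack described in step 6; the real script lives in
the unicorn2evergreen repository, so treat this as an illustration rather
than the canonical version:</p>
<pre class="literal-block">
# marc8_to_utf8.py (sketch): convert MARC8-encoded lines on stdin to UTF-8
# Usage: cat holdings.lst | python marc8_to_utf8.py > holdings_utf8.lst
import sys

from pymarc import marc8_to_unicode

for line in sys.stdin:
    # Decode the MARC8 byte sequence to a Unicode string, then write it
    # back out as UTF-8, preserving the record-per-line structure that
    # yaz-iconv mangled
    sys.stdout.write(marc8_to_unicode(line.rstrip('\n')).encode('utf-8') + '\n')
</pre>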
<p>Ah, recognize that any electronic resources (which don't have associated
copies) won't appear. Check for 856 40 and change the bre source to a
transcendent one mayhaps?</p>
<pre class="literal-block">
-- Create a new transcendant resource;
-- this autogenerates an ID of 4 in a default, untouched system
INSERT INTO config.bib_source (quality, source, transcendant)
VALUES (10, 'Electronic resource', 't');

-- Make the electronic full text resources (856 40) transcendant
-- by setting their bib record source to the new bib_source value of 4
UPDATE biblio.record_entry
SET source = 4
WHERE id IN (
    SELECT DISTINCT(record)
    FROM metabib.full_rec
    WHERE tag = '856'
      AND ind1 = '4'
      AND ind2 = '0'
);
</pre>
<p>And no transcendence. Hmm. Oh well, worry about that later.</p>
Evergreen Exposed: introduction to Evergreen development (OLA 2009)2009-02-01T20:21:00-05:002009-02-01T20:21:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2009-02-01:/evergreen-exposed-introduction-to-evergreen-development-ola-2009.html<p><strong>Update 2009-02-19</strong>: uploaded diffs from Evergreen 1.4.0.2
(<a class="reference external" href="/uploads/files/EG_exposed.tar.gz">EG_exposed.tar.gz</a>)
for adding details to record summary; and Bill Erickson's slides and
code examples are <a class="reference external" href="http://acq.open-ils.org/~erickson/berick_ola.zip">also available for
download</a></p>
<p>The slides: <a class="reference external" href="/uploads/talks/2009/Evergreenexposed.odp">Evergreen exposed, part
1</a>
(OpenOffice)</p>
<p>My second presentation at the OLA SuperConference 2009 was <strong>Evergreen
Exposed: hacking the open library system</strong>, which promised to “take
attendees on a tour of the architecture and source code of the
<a class="reference external" href="http://evergreen-ils.org">Evergreen library system</a>”. I was very
fortunate to have Bill Erickson, one of the original Evergreen
developers, agree to join me as a co-presenter. Given the
hour-and-fifteen-minute time slot that we were allotted, we opted to
take an incremental approach to introducing parts of Evergreen to the
audience, starting with basic tasks and working up to more complex
customisations. We also tried to focus on answering questions that had
been posted to the <a class="reference external" href="http://evergreen-ils.org/listserv.php">Evergreen mailing
lists</a> to ensure that we would
satisfy our target audience's interests.</p>
<div class="section" id="dan-starts-with-the-basics">
<h2>Dan starts with the basics</h2>
<p>I started the session with an introduction of how to create a different
skin for the catalogue, starting with text, CSS, JavaScript, and images
and extending to the translation and customization framework. We talked
about how to future-proof your customizations against future upgrades
and how consortia can use skins to provide not just different
look-and-feel, but different functionality, for each member of the
consortium. Not much more than XML entities defined by DTDs, massaged
via Apache server side includes (SSI), but it's an important conceptual
building block for both the catalogue and the staff client.</p>
<p>I then ran through the exercise of <a class="reference external" href="/archives/181-Adding-a-new-metadata-format-to-Evergreen-in-a-dozen-lines-of-code.html">adding a new metadata export
format</a>
that brought the Federal Geographic Data Committee's Content Standard
for Geospatial Data Metadata (<a class="reference external" href="http://www.fgdc.gov/metadata/csdgm/">FGDC
CSGDM</a>) format to Evergreen's
existing list of supported formats. On the one hand: big deal, another
metadata format. Hold that thought in that one hand; we'll come back to
it later.</p>
<p>I also walked through two other common requests on the mailing lists:
<em>how do I define a new index or tweak the behaviour of an existing
index</em> and <em>how do I hide or show more information on the detailed
record display page</em>? I'll follow up with separate posts for each of
these pieces to augment what you have before you in the slides; suffice
to say that there's a lot of
<a class="reference external" href="http://www.loc.gov/standards/mods">MODS</a>, a little bit of
JavaScript, a smidgin of XPath, a dollop of Evergreen's interface
definition language (IDL), and a slice of Perl mixed together. Along the
way, I peeled back the covers to show a bit of OpenSRF in operation,
setting up Bill's part of the show...</p>
</div>
<div class="section" id="bill-leads-us-into-the-promised-land">
<h2>Bill leads us into the promised land</h2>
<p><strong>Note</strong> I'll update this with a link to Bill's slides when he manages
to post them!</p>
<p>Bill gave a quick "big picture" view of how OpenSRF operates, including
a much clearer overview of Evergreen's object-relational IDL that maps
objects to relational tables. He also covered the cstore OpenSRF
application that offers access to the underlying database without
requiring SQL but still with support for full transactions
(commit/rollback) and sub-transactions (savepoints). During Bill's
demonstrations of these features, he exercised srfsh in a way that was
new to me - he used the <strong>introspect</strong> command with a partial method
name to perform a left-anchored search for matching method names. Cool!</p>
<p>Oh, and he also showed that if OpenSRF would normally return a reference
to an object defined in the IDL, you can ask it to <em>flesh</em> the object
in-place with its complete set of attributes instead; and of course if
any of those attributes are object references, you have the option of
fleshing those as well. It's a lovely way to cut down on chattiness in
your application.</p>
<p>From there, Bill whipped out DojoSRF, the OpenSRF-aware extensions for
<a class="reference external" href="http://dojotoolkit.org">dojo, the JavaScript toolkit</a> that Evergreen
adopted as its core JavaScript framework in release 1.4. In 90 lines of
HTML and JavaScript code, he implemented a basic but workable catalogue
- and then, with a few more lines of code, he gave the audience the
payoff for that FGDC CSGDM (geographic metadata) format that I had
earlier hacked into Evergreen. As part of the transform separates out
the geographic coordinates of the subject matter (in the case of our
demo data, maps of Northern California), Bill was able, in just a few
more lines of code, to easily extract the coordinates from the FGDC
CSGDM representation of the bibliographic material and plot the bounding
box for the coverage area on a Google Map image. Very cool.</p>
<p>We had about 15 to 20 people attend our session, and I was happy with
that attendance given the extremely technical content and relatively
niche product. If as a result we end up adding just one more developer
to the Evergreen community, that would be a great outcome. And for
myself, I was forced to learn much more of Evergreen - just in time for
Project Conifer, I hope <img alt=":-)" class="emoticon" src="/images/smile.png" /></p>
</div>
Project Conifer update session at OLA SuperConference 20092009-01-30T16:28:00-05:002009-01-30T16:28:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2009-01-30:/project-conifer-update-session-at-ola-superconference-2009.html<p><strong>Updated</strong> 2009-02-02 to add PDF-formatted slides because the <a class="reference external" href="http://go-oo.org">free
and libre formats</a> just aren't good enough for some
people - heh</p>
<p>The slides, up front and center:</p>
<ul class="simple">
<li><a class="reference external" href="/uploads/talks/2009/OLA2009Coniferupdate.odp">OpenOffice Impress format
(ODP)</a></li>
<li><a class="reference external" href="/uploads/talks/2009/OLA2009Coniferupdate.pdf">Portable Document Format
(PDF)</a></li>
</ul>
<p>Last year I <a class="reference external" href="/archives/149-The-State-of-Evergreen-OLA-Presentation.html">gave a
presentation</a>
at the OLA SuperConference 2008 on <em>The State of Evergreen</em>. Yesterday,
<a class="reference external" href="http://libgrunt.blogspot.com">John Fink</a> and I gave an update on the
state of <a class="reference external" href="http://conifer.mcmaster.ca">Project Conifer</a>, the
partnership between Algoma University, Laurentian University, Northern
Ontario School of Medicine, and the University of Windsor to mount a
consortial instance of Evergreen for our respective academic libraries.</p>
<p>McMaster University (John Fink's employer) is another Project Conifer
institutional partner, albeit with a slightly different relationship.
They are contributing resources towards development of academic
features, but working towards their own Evergreen instance on their own
timeline. Their relationship in the project changed the week before our
presentation, so John and I had a fun time adjusting our presentation to
match the new reality <img alt=":-)" class="emoticon" src="/images/smile.png" /></p>
<p>In comparison to last year, which was largely an introduction to
Evergreen and the state of its various features, this session was much
more focused on Project Conifer. John gave the background of the project
and the importance of having an open source library system at the core
of our academic libraries, particularly given the short-term challenges
that most of the Project Conifer participants face with their/our
current library systems. I focused on the challenges and lessons learned
in managing the project, with most of the challenges being the
difficulty of getting skilled resources to work on our development
requirements, and most of the lessons learned being in working out
cost-sharing agreements and priority-setting procedures early on in the
project.</p>
<p>The session was well-attended, and there is clearly growing interest in
Evergreen as a viable option, as well as a bit of frustration at the pace
of development of some of the features that academics in particular are
interested in. These are "interesting times" for academic libraries -
this week an announcement has been rippling through the Ontario library
community that the <a class="reference external" href="http://bibliocentre.ca">BiblioCentre</a> consortial
library system that has served many Ontario college libraries since 2003
is being shut down. If Evergreen's academic features were already in
place, it would have been a slam-dunk to put together a business case
for a centrally hosted Evergreen system to serve the same constituency.
As those features are still in active development, it's not quite as
easy to make that business case.</p>
<p>Happily, Art Rhyno and Graham Fawcett have taken support for academic
reserves - managing both print and electronic materials - from ground
zero to a reasonable interface in just a few months. They expect to
start wiring in direct Evergreen support over the next few months so
that we will have a functioning reserves system that goes far beyond our
current library system's capabilities ("our" being Laurentian
University, in this case).</p>
<p>After an exciting drive from Buffalo on a very snowy Wednesday
afternoon, Bill Erickson of <a class="reference external" href="http://esilibrary.com">Equinox Software
Incorporated</a> gave Project Conifer
participants a demo of the current state of acquisitions on Wednesday
night, and it's not too far from meeting our base requirements. Equinox
has hired a second developer to contribute to acquisitions development,
documentation is being concurrently produced, and one of Project
Conifer's contractors is working on adding EDI support. So we're
optimistic that a functioning base acquisitions system will be in place
in May - although, as one of our collection development librarians has
wryly noted, our budgets might not have any room for book purchases in
the coming fiscal year in any case.</p>
<p>A highlight of the session was when I asked Susan Downs, CEO of the
<a class="reference external" href="http://www.innisfil.library.on.ca/tsuga/">Innisfil Public Library</a>,
to talk about their success story. In October 2008, Innisfil announced
to the library world that they had migrated to Evergreen without any
vendor assistance - certainly the first known instance in Ontario, and
possibly the first self-migrated and self-supported public library on
Evergreen in the world. It was great to meet the people behind that
project and I was glad to let Susan share some of her energy,
enthusiasm, and insights with our audience.</p>
<p>I had some feedback from one attendee who was happy to see a
presentation on an in-process project, with warts and all exposed,
rather than the usual post-project stories that quickly put the rough
patches behind them (or forget them entirely). I'm happy to do as good a
job as I can to represent an objective look at the project - for one
thing, it's my job as project manager - and I hope that in some small
way I've been able to help others prepare for similar projects.</p>
Adding a new metadata format to Evergreen in a dozen lines of code2009-01-26T05:29:00-05:002009-01-26T05:29:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2009-01-26:/adding-a-new-metadata-format-to-evergreen-in-a-dozen-lines-of-code.html<p>Just like my <a class="reference external" href="/archives/180-Fetching-item-availability-from-Evergreen-using-the-OpenSRF-HTTP-gateway.html">last
entry</a>,
this is a preview of one part of my upcoming session at the OLA
SuperConference, <a class="reference external" href="http://www.accessola.com/superconference2009/showSession.php?lsession=1017&usession=1017">Evergreen Exposed: Hacking the open source library
system</a>.
We know from the last entry that Evergreen internally converts MARC21 to
MODS to support item display; and in fact it also includes support for
exposing records as OAI, RDF, SRW, and HTML. Today, we're going to be
looking at adding support for an entirely new metadata format to
Evergreen.</p>
<p><a class="reference external" href="http://article.gmane.org/gmane.education.libraries.open-ils.devel/2366/match=fgdc">Back in November,
2008</a>,
George Duimovich requested "I would like to hear from anyone on the
process for adding an additional supported format" in the specific
context of the <a class="reference external" href="http://www.fgdc.gov/">FGDC</a> metadata format for
digital geospatial data. George did a great thing to support his request
and included links to the metadata format itself, along with a pointer
to an <a class="reference external" href="http://ir.library.oregonstate.edu/dspace/handle/1957/16">XSLT
stylesheet</a>
that the inestimable <a class="reference external" href="http://oregonstate.edu/~reeset/">Terry Reese</a>
had written and published for converting MARC21 to FGDC XML. His request
has been burning at the back of my mind since then, partially because I
had quickly responded with the oh-so-helpful:</p>
<blockquote>
<p>Assuming that we can get over the licensing hump, it should be a
relatively straightforward matter of dropping the transform into
Open-ILS/src/perlmods/OpenILS/Application/SuperCat.pm and
Open-ILS/src/perlmods/OpenILS/WWW/SuperCat/Feed.pm (using something
like MODS32 as a template).</p>
</blockquote>
<p>Simple and straightforward, right? Well... yes and no. I had just gone
through the process of adding MODS 3.2 support because I needed the more
granular treatment of URLs to fix an item display problem, so I was
pretty comfortable with the code at the time. After a few months, that
familiarity goes away and one gets to go through the discovery process
again. (Oh, and about a week after the MODS 3.2 support went in and Mike
Rylander went the extra mile to update all of the indexes to use MODS
3.2, MODS 3.3 was released to the world. Sigh).</p>
<p>Without further ado, following are the diffs required to roughly support
FGDC as a SuperCat format:</p>
<pre class="literal-block">
dbs@dbs-laptop:~/source/Evergreen-rel_1_4$ svn diff Open-ILS/src/perlmods/
Index: Open-ILS/src/perlmods/OpenILS/Application/SuperCat.pm
===================================================================
--- Open-ILS/src/perlmods/OpenILS/Application/SuperCat.pm (revision 11952)
+++ Open-ILS/src/perlmods/OpenILS/Application/SuperCat.pm (working copy)
@@ -143,6 +143,18 @@
     # and stash a transformer
     $record_xslt{rss2}{xslt} = $_xslt->parse_stylesheet( $rss_xslt );
 
+    # parse the FGDC xslt ...
+    my $fgdc_xslt = $_parser->parse_file(
+        OpenSRF::Utils::SettingsClient
+            ->new
+            ->config_value( dirs => 'xsl' ).
+        "/MARC21slim2FGDC.xsl"
+    );
+    # and stash a transformer
+    $record_xslt{fgdc}{xslt} = $_xslt->parse_stylesheet( $fgdc_xslt );
+    $record_xslt{fgdc}{docs} = 'http://www.fgdc.gov/metadata/csdgm/index_html';
+    $record_xslt{fgdc}{schema_location} = 'http://www.fgdc.gov/metadata/fgdc-std-001-1998.xsd';
+
     register_record_transforms();
 
     return 1;
</pre>
<p>If you're still with me after that whack of code, and you're counting,
that's about 12 lines of code. Okay, I'm cheating - the diff doesn't
include the MARC21 to FGDC stylesheet - for one thing, I'm still waiting
to see a version of the stylesheet with a license attached to it. For
another, do you <em>really</em> want to see all that XSL? After you patch
your copy of OpenILS::Application::SuperCat.pm, copy the MARC21 to FGDC
stylesheet into /openils/var/xsl, and restart the Evergreen Perl
services, you'll be able to take advantage of the new functionality.
That's it!</p>
<p>What's going on in this code? This patch against
Open-ILS/src/perlmods/OpenILS/Application/SuperCat.pm enables SuperCat
(and therefore unAPI) support for the new format. We just add an entry
to the hash of XSLT stylesheets that SuperCat knows about, and the rest
is visible in URLs like:</p>
<ul class="simple">
<li><a class="reference external" href="http://localhost/opac/extras/supercat/formats/record">http://localhost/opac/extras/supercat/formats/record</a> - list of
supported record formats</li>
<li><a class="reference external" href="http://localhost/opac/extras/supercat/retrieve/fgdc/record/1">http://localhost/opac/extras/supercat/retrieve/fgdc/record/1</a> -
display record #1 in FGDC format</li>
<li><a class="reference external" href="http://localhost/opac/extras/unapi?id=tag:localhost,2009:biblio-record_entry/1">http://localhost/opac/extras/unapi?id=tag:localhost,2009:biblio-record_entry/1</a>
- display the record formats that unAPI can return</li>
<li><a class="reference external" href="http://localhost/opac/extras/unapi?id=tag:localhost,2009:biblio-record_entry/1&format=fgdc">http://localhost/opac/extras/unapi?id=tag:localhost,2009:biblio-record_entry/1&format=fgdc</a>
- return record #1 in FGDC format via unAPI</li>
</ul>
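<p>If you prefer scripts to browsers for this kind of smoke test, a trivial
(hypothetical) Perl check might look like the following - the only assumptions
are that LWP::Simple is installed and that your patched Evergreen is answering
on localhost:</p>
<pre class="literal-block">
#!/usr/bin/perl
# Smoke test: fetch record #1 in the newly registered FGDC format
use strict;
use warnings;
use LWP::Simple;

my $fgdc = get('http://localhost/opac/extras/supercat/retrieve/fgdc/record/1')
    or die "No FGDC response - is the transform registered and Apache restarted?\n";
print $fgdc;
</pre>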
<p>So who cares about this? Well, George cares, and (I'm guessing wildly
here), perhaps it's because his organization has tools that can import
FGDC but also wants to maintain the data in their library catalogue
because they love MARC. That might be sufficient reason. Another
reasonable use case would be to use the FGDC transform to populate
spatial data tables built on the geospatial extensions offered by
<a class="reference external" href="http://www.postgis.org">PostGIS</a> and index these for lightning-fast
retrieval of maps and map data that cover a given range of coordinates.</p>
<p>I'm sure the same approach could be used for other specialized metadata
formats. This is just one example of why I'm sold on Evergreen's
capability as a platform for the future of our library.</p>
Fetching item availability from Evergreen using the OpenSRF HTTP gateway2009-01-20T15:57:00-05:002009-01-20T15:57:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2009-01-20:/fetching-item-availability-from-evergreen-using-the-opensrf-http-gateway.html<p>This is a preview of one part of my upcoming session at the OLA
SuperConference, <a class="reference external" href="http://www.accessola.com/superconference2009/showSession.php?lsession=1017&usession=1017">Evergreen Exposed: Hacking the open source library
system</a>.
In the Conifer implementation of Evergreen, at least one of the partners
plans to use a decoupled discovery layer rather than the Evergreen OPAC.
So we needed to answer the typical question "How do I retrieve the
availability of copies for a given work at my institution?" Note that
this mini-tutorial is based entirely on OpenSRF 1.0 / Evergreen 1.4;
OpenSRF 0.9 will generate different JSON output, and the URL for the
OpenSRF gateway will be different.</p>
<div class="section" id="learning-from-the-old-masters-how-the-evergreen-opac-does-it">
<h2>Learning from the old masters: how the Evergreen OPAC does it</h2>
<p>The Evergreen OPAC itself relies heavily on JavaScript to dynamically
flesh out item details and retrieve item status, so it's actually pretty
easy to work out how to do this without even delving too deeply into
OpenSRF. First, let's use the <a class="reference external" href="http://www.getfirebug.com/">Firebug</a>
Mozilla extension to follow network requests for a given "title details"
page in the OPAC search results for the title: <a class="reference external" href="http://dev.gapines.org/opac/en-US/skin/default/xml/rdetail.xml?r=8526&t=beer&tp=keyword&d=0&hc=33&rt=keyword">The new world guide to
beer</a>.
Open up Firebug, enable network monitoring for the OPAC site, and watch
the requests flood past for the title details page. We can see that
there are a number of POST requests to
<a class="reference external" href="http://dev.gapines.org/osrf-gateway-v1">http://dev.gapines.org/osrf-gateway-v1</a>:</p>
<ul>
<li><p class="first"><strong>POST request #1 parameters</strong></p>
<ul class="simple">
<li>method = open-ils.search.biblio.record.mods_slim.retrieve</li>
<li>service = open-ils.search</li>
<li>locale = en-US</li>
<li>param = 8526</li>
</ul>
<p>This is how we retrieve the title / author / ISBN and other
bibliographic details of interest for display; as we're talking about
a decoupled discovery layer, we won't need to worry about this piece
of the puzzle.</p>
</li>
<li><p class="first"><strong>POST request #2 parameters</strong></p>
<ul class="simple">
<li>method = open-ils.search.config.copy_status.retrieve.all</li>
<li>service = open-ils.search</li>
<li>locale = en-US</li>
</ul>
<p>This is how we retrieve the list of all possible copy statuses that
have been configured for this Evergreen system; here's the response
(truncated for legibility):</p>
<pre class="literal-block">
{ "status" : 200, "payload" : [ [ { "__c" : "ccs", "__p" : [ null, null, null, "f", 3, "Lost", "f" ] }, { "__c" : "ccs", "__p" : [ null, null, null, "t", 0, "Available", "t" ] }, { "__c" : "ccs", "__p" : [ null, null, null, "t", 1, "Checked out", "t" ] }, { "__c" : "ccs", "__p" : [ null, null, null, "f", 2, "Bindery", "t" ] } ] ]}
</pre>
<p>We're getting a response in JavaScript Object Notation
(<a class="reference external" href="http://www.json.org">JSON</a>) format - the nice, compact,
easy-to-read data interchange format that almost every programming
language under the sun can interpret and generate. Yay!</p>
</li>
<li><p class="first"><strong>POST request #3 parameters</strong></p>
<ul class="simple">
<li>method = open-ils.search.biblio.copy_counts.summary.retrieve</li>
<li>service = open-ils.search</li>
<li>locale = en-US</li>
<li>param = 8526</li>
<li>param = 1</li>
<li>param = 0</li>
</ul>
<p>This is how we retrieve the call numbers, copies, and copy status for
a given title. We pass in the TCN input parameter ("8526"), the
numeric ID of the organization being searched ("1" = "every branch"),
and the depth of the organization ("0" = top of the hierarchy). The
response for this request is:</p>
<pre class="literal-block">
{ "status" : 200, "payload" : [ [ [ "127", "663.42 JACKSON, MICHAEL", { "0" : 1 } ], [ "130", "663.42 JACKSON, MICHAEL", { "0" : 1 } ], [ "125", "663.42 JACKSON, MICHAEL", { "0" : 1 } ], [ "34", "R 641.23 JACKSON, MICHAEL", { "0" : 1 } ] ] ]}
</pre>
</li>
</ul>
</div>
<div class="section" id="interpreting-the-http-requests-and-responses">
<h2>Interpreting the HTTP requests and responses</h2>
<p>Okay, so we've found a couple of requests that are pertinent to our
goal. And you might be able to guess that the fifth element of the
<strong>__p</strong> entry in the copy status response is the numeric identifier
for the copy status, while the sixth element is the copy status name
(which, as of OpenSRF 1.0 / Evergreen 1.4, can be returned as a
translated value if you pass a different <strong>locale</strong> value).</p>
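<p>By way of illustration, here is a minimal Perl sketch that replays POST
request #2 against the public dev.gapines.org test server and builds a status
ID-to-name lookup table from those positions. LWP::UserAgent and JSON::XS are
simply my choice of HTTP client and JSON parser, not anything Evergreen
requires:</p>
<pre class="literal-block">
#!/usr/bin/perl
# Replay POST request #2: map each copy status ID to its name
use strict;
use warnings;
use LWP::UserAgent;
use JSON::XS;

my $res = LWP::UserAgent->new->post('http://dev.gapines.org/osrf-gateway-v1', [
    service => 'open-ils.search',
    method  => 'open-ils.search.config.copy_status.retrieve.all',
    locale  => 'en-US',
]);
die $res->status_line unless $res->is_success;

# __p position 4 is the status ID, position 5 is the status name
my %status_name = map { $_->{__p}[4] => $_->{__p}[5] }
    @{ decode_json($res->content)->{payload}[0] };
print "$_ => $status_name{$_}\n" for sort { $a <=> $b } keys %status_name;
</pre>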
<p>You might even be able to guess that the response from the
copy_counts.summary request returns an array of responses consisting of
the organization ID, the call number, and a hash of copy status and the
respective counts for each copy status. And you would be guessing
correctly. But why guess, when you can get an authoritative
interpretation by looking up the class hint (the <strong>__c</strong> value in the
copy_status response of "ccs") in Evergreen's interface definition
language file <strong>/openils/conf/fm_IDL.xml</strong>:</p>
<pre class="literal-block">
<class id="ccs" controller="open-ils.cstore" oils_obj:fieldmapper="config::copy_status" oils_persist:tablename="config.copy_status"> <fields oils_persist:primary="id" oils_persist:sequence="config.copy_status_id_seq"> <field name="isnew" oils_obj:array_position="0" oils_persist:virtual="true" /> <field name="ischanged" oils_obj:array_position="1" oils_persist:virtual="true" /> <field name="isdeleted" oils_obj:array_position="2" oils_persist:virtual="true" /> <field name="holdable" oils_obj:array_position="3" oils_persist:virtual="false" reporter:datatype="bool"/> <field name="id" oils_obj:array_position="4" oils_persist:virtual="false" reporter:selector="name" reporter:datatype="id"/> <field name="name" oils_obj:array_position="5" oils_persist:virtual="false" reporter:datatype="text" oils_persist:i18n="true"/> <field name="opac_visible" oils_obj:array_position="6" oils_persist:virtual="false" reporter:datatype="bool"/> </fields>
</pre>
<p>So now, by taking our first steps into Evergreen's object persistence
model, we can determine authoritatively that the order of values in the
<strong>__p</strong> array maps to "isnew", "ischanged", "isdeleted", "holdable",
"id", "name", and "opac_visible". As for the response from the
copy_counts.summary call, well, these are not Evergreen objects (they
don't have a <strong>__c</strong> class hint) - but you can use the OpenSRF shell
"srfsh" introspect command to view the documentation for the applicable
method:</p>
<pre class="literal-block">
bash$ srfsh
srfsh# introspect open-ils.search
... (truncated for legibility) ...
Received Data: {
  "__c":"OpenILS_Application",
  "__p":{
    "api_level":1,
    "stream":0,
    "object_hint":"OpenILS_Application_Search_Biblio",
    "package":"OpenILS::Application::Search::Biblio",
    "remote":0,
    "api_name":"open-ils.search.biblio.copy_counts.summary.retrieve",
    "signature":{
      "params":[ ],
      "desc":"returns an array of these: [ org_id, callnumber_label, , , ... ] where statusx is a copy status name. the statuses are sorted by id.",
      "return":{ "desc":null, "type":null, "class":null }
    },
    "server_class":"open-ils.search",
    "notes":"\treturns an array of these:\n\t\t[ org_id, callnumber_label, , , ... ] \n\t\twhere statusx is a copy status name. the statuses are sorted\n\t\tby id.\n",
    "method":"copy_count_summary",
    "argc":0
  }
</pre>
<p>The introspect output is a bit rough - it's really intended for the
<a class="reference external" href="http://dev.gapines.org/opac/extras/docgen.xsl?service=open-ils.search&param=%22copy_counts.summary.retrieve%22">doxygen API help
interface</a>
- but it's good enough for our purposes. If we want to dig into what's
going on under the covers, we can follow the "package" value
"OpenILS::Application::Search::Biblio" to read the source code for the
<a class="reference external" href="http://svn.open-ils.org/trac/ILS/browser/branches/rel_1_4/Open-ILS/src/perlmods/OpenILS/Application/Search/Biblio.pm">OpenILS::Application::Search::Biblio</a>
Perl module, and look up the method "copy_count_summary" as indicated
by the "method" value in the introspect output. That reveals that the
input arguments are "($self, $client, $rid, $org, $depth)". Every
OpenSRF method automatically receives $self and $client as the first two
arguments, so $rid (record ID), $org (organization unit ID), and $depth
(organization unit depth) are the variables over which we have control.</p>
</div>
<div class="section" id="zeroing-in-on-the-copies-for-a-particular-library-or-library-system">
<h2>Zeroing in on the copies for a particular library or library system</h2>
<p>If we want to retrieve the visible copies for just a single organization
unit in the entire Evergreen system, we just have to adjust the values
of the organization unit ID and organization unit depth parameters
accordingly. If we ask for the visible copies for <a class="reference external" href="http://dev.gapines.org/osrf-gateway-v1?service=open-ils.search&method=open-ils.search.biblio.copy_counts.summary.retrieve&locale=en-US&param=8526&param=125&param=2">just org_unit ID
"125" at depth
"2"</a>,
we narrow down our results to a single hit:</p>
<pre class="literal-block">
{ "status" : 200, "payload" : [ [ [ "125", "663.42 JACKSON, MICHAEL", { "0" : 1 } ] ] ]}
</pre>
<p>So, with all of that ammunition at your disposal, you can write an
Evergreen copy status lookup in any decoupled discovery layer that
supports HTTP POST or GET requests. Which should be pretty much any
discovery layer, right?</p>
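<p>Here is one minimal (and purely illustrative) way to do it in Perl, reusing
the demo values from above; as before, LWP::UserAgent and JSON::XS are just my
module choices:</p>
<pre class="literal-block">
#!/usr/bin/perl
# Copy availability lookup: call number plus per-status copy counts
use strict;
use warnings;
use LWP::UserAgent;
use JSON::XS;

my ($record, $org_unit, $depth) = (8526, 1, 0);  # demo values from above

my $res = LWP::UserAgent->new->post('http://dev.gapines.org/osrf-gateway-v1', [
    service => 'open-ils.search',
    method  => 'open-ils.search.biblio.copy_counts.summary.retrieve',
    locale  => 'en-US',
    param   => $record, param => $org_unit, param => $depth,
]);
die $res->status_line unless $res->is_success;

# payload[0] is an array of [ org_id, callnumber, { status_id => count } ]
for my $row (@{ decode_json($res->content)->{payload}[0] }) {
    my ($org, $callnum, $counts) = @$row;
    print "org $org / $callnum: ",
        join(', ', map { "$counts->{$_} copies with status $_" } sort keys %$counts), "\n";
}
</pre>
<p>Swap in the status ID-to-name map from the earlier sketch and you have a
human-readable availability display.</p>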
</div>
<div class="section" id="frequently-used-tools-and-methods-for-evergreen-opensrf-hacking">
<h2>Frequently used tools and methods for Evergreen / OpenSRF hacking</h2>
<p>Note, the first: you can easily play with different parameter values for
the HTTP POST requests using the
<a class="reference external" href="http://curl.haxx.se/">curl</a> command. If you
have a recent version of the Perl JSON::XS module installed, you can pipe the
output from curl to the <strong>json_xs</strong> command to pretty print the
JSON response:</p>
<pre class="literal-block">
curl -d service=open-ils.search -d locale=en-US \
  -d method=open-ils.search.biblio.copy_counts.summary.retrieve \
  -d param=8526 -d param=1 -d param=0 \
  http://dev.gapines.org/osrf-gateway-v1 | json_xs -t json-pretty
</pre>
<p>Note, the second: the OpenSRF gateway also supports GET requests; simply
concatenate the request parameters in <a class="reference external" href="http://dev.gapines.org/osrf-gateway-v1?service=open-ils.search&method=open-ils.search.biblio.copy_counts.summary.retrieve&locale=en-US&param=8526">a single URL like
this</a>.</p>
</div>
Evergreen 1.4.0.0 RC2 and OpenSRF 1.0.1 are out2008-11-21T03:41:00-05:002008-11-21T03:41:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2008-11-21:/evergreen-1400-rc2-and-opensrf-101-are-out.html<p>As I announced on the Evergreen mailing lists last night:</p>
<div style="margin-left: 3em;"><p>One month after the first release candidate of Evergreen 1.4.0.0, the</p>
<p>Evergreen development team is pleased to announce the availability of</p>
<p>Evergreen 1.4.0.0, release candidate 2, from</p>
<p><a class="reference external" href="http://open-ils.org/downloads.php">http://open-ils.org/downloads.php</a></p>
<p>A partial overview of the changes since 1.4.0.0 RC1:</p>
<ul>
<li><p class="first">MARC importer / exporter enhancements</p>
</li>
<li><p class="first">Improved support for marking long overdue items</p>
</li>
<li><p class="first">Z39.50 client enhancements</p>
</li>
<li><p class="first">An interface for switching locales in the staff client</p>
</li>
<li><p class="first">Localization in every interface - although we have undoubtedly</p>
</p>
<p><p>missed a few strings</p>
</li>
<li><p class="first">Bundled Armenian and French (Canadian) translations</p>
</li>
<li><p class="first">Performance improvements for new and changed item feeds</p>
</li>
<li><p class="first">Various staff client, build, and source tree fixes</p>
</li>
</ul>
<p>The complete change log between 1.4.0.0 RC1 and 1.4.0.0 RC2 can be</p>
<p>found here:
<a class="reference external" href="http://open-ils.org/downloads/ChangeLog-1.4.0.0rc1-1.4.0.0rc2">http://open-ils.org/downloads/ChangeLog-1.4.0.0rc1-1.4.0.0rc2</a></p>
<p>Please help us reach a solid 1.4.0.0 final release by testing out</p>
<p>1.4.0.0 RC2 with the freshly released OpenSRF 1.0.1 and reporting</p>
<p>problems, sending patches for improvements or fixes, or sending new or</p>
<p>updated translations to the Evergreen Development mailing list.</p>
<p>Coming soon for the 1.4.0.0 RC2 release:</p>
<ul>
<li><p class="first">Windows staff client</p>
</li>
<li><p class="first">Updated install instructions at</p>
</p>
<p><p><a class="reference external" href="http://open-ils.org/dokuwiki/doku.php?id=server:1.4.0.0:ubuntu804:install">http://open-ils.org/dokuwiki/doku.php?id=server:1.4.0.0:ubuntu804:install</a></p>
</li>
<li><p class="first">VMWare image
<p></p>
</li>
</ul>
</div><p>This release has been a long time in the making, and we'd love to have
your help in testing it and flushing out bugs. Also, if you would like
to contribute a translation, this is your chance to step up! We already
have Brazilian Portuguese (pt_BR), Georgian (ka), and Canadian English
(en_CA) translations in the works, along with a commitment to update
the Canadian French (fr_CA) translation. As this is the first real
round of translations for Evergreen, I fully expect that there will be
some work ahead of us to smooth out the translation process - but we
have to take the plunge some time. Many thanks to Tigran Zargaryan and
Natural Resources Canada for their respective contributions of the
Armenian (hy_AM) and Canadian French (fr_CA) translations this summer;
their willingness to be early guinea pigs for the translation process
helped immensely.</p>
<p><strong>Update:</strong> I noticed that the speedy Warren Layton <a class="reference external" href="http://thebookpile.wordpress.com/2008/11/20/evergreen-14-rc2/">beat me to the
punch</a>
on the blog announcement of the releases. Warren's been very helpful
with testing and suggestions for improvements to the documentation, so I
don't mind being scooped at all <img alt=":-)" class="emoticon" src="/images/smile.png" /></p>
An Evergreen track at the OLA SuperConference 2009?2008-10-28T20:10:00-04:002008-10-28T20:10:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2008-10-28:/an-evergreen-track-at-the-ola-superconference-2009.html<p>Just poked at the <a class="reference external" href="http://www.accessola.com/superconference2009/">OLA SuperConference
2009</a> schedule
(January 28 - 31, 2009) and found four sessions listed that are all
about Evergreen. Wow! Check this out:</p>
<table style="border: solid black; border-width: 0px 0px 1px 1px; border-collapse: collapse;">
<th style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;" ead>
<tr>
<th style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p>Date</p>
</th>
<th style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p>Time</p>
</th>
<th style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p>Title</p>
</th>
<th style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p>Description (may be abridged)</p>
</th>
<th style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p>Presenters</p>
</th>
</tr>
</p>
<p>
</thead>
</p>
<p>
<tbody>
</p>
<p>
<tr>
</p>
<p>
<td style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p>Thursday, January 29</p>
</td>
</p>
<p>
<td style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p>9:05 am</p>
</td>
</p>
<p>
<td style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p><a class="reference external" href="http://www.accessola.com/superconference2009/showSession.php?lsession=410&usession=410">It.s Just a Little Bit of Programming Isn.t
It?</a></p>
</td>
</p>
<p>
<td style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;">
</p>
<p><p>“Follow the progress of the Library @ Mohawk.s development of the open
source ILS Evergreen. Hear the trials and tribulations and learn from
the mistakes and successes that have occurred along the way . we are
truly a learning organization on this project. We went live in summer
2008 . come and hear about where we.ve been, where we are and where we
hope to be soon.”</p>
</td>
</p>
<p>
<td style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p>Robert Soulliere, Systems Librarian; Cynthia Williamson, Collection &
Access Librarian, Mohawk College of Applied Arts and Technology</p>
</td>
</p>
<p>
</tr>
</p>
<p>
<tr>
<td style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p>Thursday, January 29</p>
</td>
</p>
<p>
<td style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p>3:45 pm</p>
</td>
</p>
<p>
<td style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;">
</p>
<p><p><a class="reference external" href="http://www.accessola.com/superconference2009/showSession.php?lsession=614&usession=614">Project Conifer: Evergreen library system for Ontario
Universities</a></p>
</td>
</p>
<p>
<td style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p>“Find out how the Evergreen open source library system, originally
developed for a public library consortium, is being adapted for academic
libraries by three Ontario universities. Discussion will focus on the
challenges, successes and mistakes (err, .learning opportunities.) of
the project.”</p>
</td>
</p>
<p>
<td style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p>John Fink, Digital Technologies Development Librarian, McMaster
University; Dan Scott, Systems Librarian, Laurentian University</p>
</td>
</p>
<p>
</tr>
</p>
<p>
<tr>
</p>
<p>
<td style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p>Friday, January 30</p>
</td>
</p>
<p>
<td style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p>9:05 am</p>
</td>
</p>
<p>
<td style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p><a class="reference external" href="http://www.accessola.com/superconference2009/showSession.php?lsession=1017&usession=1017">Evergreen exposed: hacking the open source library
system</a></p>
</td>
</p>
<p>
<td style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p>“Join an Evergreen developer on a tour of the architecture and source
code of the Evergreen library system [...] Get ready to get your hands
dirty with Evergreen . this will be a session filled with code!”</p>
</td>
</p>
<p>
<td style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p>William Erickson, Vice President, Software Development & Integration,
Equinox Software Inc; Dan Scott, Systems Librarian, Laurentian
University</p>
</td>
</p>
<p>
</tr>
</p>
<p>
<tr>
</p>
<p>
<td style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p>Saturday, January 31</p>
</td>
</p>
<p>
<td style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p>10:40 am</p>
</td>
</p>
<p>
<td style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p><a class="reference external" href="http://www.accessola.com/superconference2009/showSession.php?lsession=1808&usession=1808">Multilingual Language Issues of Open Source
ILS</a></p>
</td>
</p>
<p>
<td style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p>“Discover the Chinese version of Evergreen along with various
multilingual issues related MARC standards, encoding, indexing,
searching, and sorting especially associated with Chinese language.”</p>
</td>
</p>
<p>
<td style="border: solid black; border-width: 1px 1px 0px 0px; margin: 0px; padding: 4px;"><p>Jason Zou, Systems Librarian, Lakehead University; Guoying (Grace) Liu,
Systems Librarian, Leddy Library, University of Windsor</p>
</td>
</p>
<p>
</tr>
</p>
<p>
</tbody>
</p>
<p>
</table>
</p><p>I was responsible for the sole Evergreen presentation at OLA
SuperConference 2008 - it's awesome to see a lot more people jumping in
this year! I'm keenly anticipating this conference - we'll have to set
up at least one Evergreen "Birds of a Feather" session.</p>
Evergreen: deOSSification of library software2008-10-23T17:45:00-04:002008-10-23T17:45:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2008-10-23:/evergreen-deossification-of-library-software.html<p>In a few minutes I'll be giving a talk with John Fink at the <a class="reference external" href="http://www.fsoss.ca">Free
Software Open Source Symposium</a> at Seneca
College on <a class="reference external" href="http://fsoss.senecac.on.ca/2008/?q=node/32">Evergreen: an enterprise-strength OSS solution for library
ossification</a>. I'm
jazzed!</p>
<p>Here are the slides: (<a class="reference external" href="/uploads/talks/2008/Evergreen_OSSification.odp">ODP
format</a>)
(<a class="reference external" href="/uploads/talks/2008/Evergreen_OSSification.pdf">PDF
format</a>).</p>
Access 2008 hackfest report: Zotero vs Evergreen2008-10-07T04:36:00-04:002008-10-07T04:36:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2008-10-07:/access-2008-hackfest-report-zotero-vs-evergreen.html<p><strong>Update: 2008-10-07</strong> As of <a class="reference external" href="http://svn.open-ils.org/trac/ILS/changeset/10774">changeset
10774</a>, the
detailed record view in Evergreen's dynamic catalog is now recognized by
Zotero.</p>
<p>I really like Zotero. And it works really well with Evergreen's current
"basic search" because it embeds <a class="reference external" href="http://unapi.info">unAPI</a>
links that enable Zotero to consume
<a class="reference external" href="http://loc.gov/mods">MODS</a> representations of the
underlying bibliographic records and generate a complete citation based
on that.</p>
<p>However, Zotero doesn't work with Evergreen's current "dynamic search"
interface - which is a problem, because it is the default search
interface. Evergreen embeds a link to the unAPI server, and fills in the
unAPI link via an AJAX call after the underlying XHTML has been loaded -
but it seems that
<a class="reference external" href="http://forums.zotero.org/discussion/4069/detecting-unapi-in-dynamic-content/">Zotero</a>
doesn't recognize that the DOM has been changed by the AJAX event and
never discovers the unAPI link. So... I had submitted a challenge to
Hackfest to fix this, because I really want to be able to use Zotero
with Evergreen when Project Conifer launches.</p>
<p>And, as with every other Hackfest I have attended, I end up working on
my own challenge.</p>
<p>In discussing the problem with William from
<a class="reference external" href="http://canadiana.org">canadiana.org</a> and
Walter Lewis from
<a class="reference external" href="http://www.knowledgeontario.ca">Knowledge Ontario</a>,
I described how the dynamic interface doesn't use any templating (apart
from entity substitution for localization support), that there wasn't
really any way to inject content server side into the underlying XHTML,
and that I really didn't want to have to dig into the guts of Zotero to
enable it to parse the DOM after events had completed. William asked "so
you can't even do a server side include?", which ended up breaking the
problem wide open - because yes, we already use server side includes to
identify which DTD to load for localization purposes.</p>
<p>Step 1 was to modify the detailed record display to put the unAPI link
template in place, and to modify the Apache configuration to pass in
hardcoded values for each of the SSI variables. A quick test and - it
didn't work. Uh oh.</p>
<p>That led to much scratching of the head. Was Zotero getting tripped up
by the masses of XHTML elements in the dynamic template that are simply
hidden? Did it give up after trying to parse 100K or so of content? Were
there differences in the content types being served up by Apache? The
next step was to compare the content of the "basic search" output
against the "dynamic search" output - and that led to one seemingly
innocent difference.</p>
<p>The unAPI server link in the "basic search" output included an absolute
link to the server, while the corresponding link in the "dynamic search"
output used a relative link to point to the root of the server. I didn't
think that would be a problem, but eliminating variables is always good -
and when I tested with a hardcoded server link, the Zotero hint icon lit
up and the mystery was solved. Between enabling the record unAPI link to
appear in the static XHTML via SSI and changing the unAPI server link to
use an absolute value, Zotero and Evergreen could work together in
harmony.</p>
<p>I haven't committed the fix for this yet to the repository, as I haven't
finalized the exact SSI incantations that will be needed to embed the
record ID in the unAPI link. But now you know the solution, and could
tackle the problem yourself if you get tired of waiting for me and feel
inspired. And once the problem is fixed, I'll update the post to let you
know what version of Evergreen carries the fix.</p>
<p>Oh, and my hackfest report slides <a class="reference external" href="/uploads/talks/2008/Cite_me_bite_me.pdf">are
attached</a>,
in case anyone cares.</p>
Access 2008 presentation: Project Conifer report2008-10-04T23:13:00-04:002008-10-04T23:13:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2008-10-04:/access-2008-presentation-project-conifer-report.html<p>On Friday, October 3rd, I had the honour of presenting the progress of
Project Conifer with my colleague <a class="reference external" href="http://libgrunt.blogspot.com">John
Fink</a> to my peers at Access 2008.
Project Conifer is the effort to bring the
<a class="reference external" href="http://evergreen-ils.org">Evergreen</a> open source library system to a
consortium of academic libraries in Ontario (Algoma, Laurentian,
McMaster, Northern Ontario School of Medicine, and Windsor).</p>
<p>I'm just going to link quickly to the slides for now, as I'm a little
bit brain-dead after the conference. John led off the talk with an
overview of what Conifer is all about and why we were motivated to
tackle such a large project - he has <a class="reference external" href="http://www.slideshare.net/adr/access2008-presentation-v3-presentation">posted his
slides</a>
via the SlideShare thingy. Editorial comment: I really enjoy John's
presentation style and content. He's a hard act to follow!</p>
<p>And then I rambled on with an overview of the ups and downs of the
project so far, the resources we have invested in the project, our
progress towards our target go-live date (May 2009), and some sneak
previews of the goodies that are included in the
any-day-now-if-I-would-just-stop-going-to-conferences-and-apply-myself-for-a-few-days-dangit
Evergreen 1.4 release. Well - they're not really sneak previews, because
of course you could check the code out of the repository and build it
yourself - but it's so much easier when somebody else already has it
running, right?</p>
<p>Anyway, my slides are available in both <a class="reference external" href="/uploads/talks/2008/Access2008Conifer.odp">OpenOffice.org Impress
format</a>
and
<a class="reference external" href="/uploads/talks/2008/Access2008Conifer.pdf">PDF</a>.</p>
Heating up Evergreen search2008-08-25T16:23:00-04:002008-08-25T16:23:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2008-08-25:/heating-up-evergreen-search.html<p>So, after loading 3.7 million records into the Project Conifer test
server, we have found that search can be slow. Not really a big
surprise, because I've spent very little time tuning the database beyond
running a VACUUM FULL and tweaking just a few parameters. But one of the
extremely useful hints that Mike Rylander gave me about PostgreSQL a
long time back is that it relies primarily on file system caching to
cache access to data, from the reasonable perspective that your file
system already knows which files are being accessed most often.
PostgreSQL's data is stored in files that map back to individual tables
and indexes; unlike some other database systems that I've worked with,
you don't dedicate system memory specifically to caching those database
files (hello, DB2 buffers!); instead, you just trust the file system to
know what's best.</p>
<p>That caching approach works great on a system that's in production and
getting a steady stream of queries reflecting what users actually search
for on a day to day basis. However, if you've just loaded a test system,
then it doesn't have much opportunity to cache and the first dozen (or
hundreds, or thousands!) of queries will be slow as your database goes
out and loads up files from disk. Even worse, if you have a system like
ours where backups have temporarily been set up as "tar czf
/backups/backup.tar.gz /", then on a nightly basis your file system
cache is going to be filled with all kinds of irrelevant data.</p>
<p>So what are we to do? Well, actually, another extremely useful hint that
Mike Rylander gave me was to just run the pertinent data files through
/dev/null to load up the file system cache. On the surface, it seems
like a dirty hack, but it's a smart one, and we can even make it
elegant. Let's walk through the process:</p>
<ol class="arabic">
<li><p class="first">You need to know where your data files are. You (or your system
installer) will have created a PostgreSQL cluster. In my case (on
Debian Etch), I can find it at /var/lib/postgresql/main/base. Then,
by running "du -hs /var/lib/postgresql/main/base" I can see that one
of our databases (represented by a directory name that's just an
integer - "16385") weighs in at 60GB. That's our 3.7 million record
baby. If you run an "ls" command on that directory, you'll see that
it's filled with hundreds of files of differing sizes, most of them
with just plain integers for their names. This is where the data is
stored.</p>
</li>
<li><p class="first">You need to know the base filenames that you want to use to warm up
the file system cache. For my first stab at this, I decided to warm
up the cache with the full-text search indexes, as I know those are
frequently used by Evergreen's search. To figure out the base
filenames for these indexes, we can query PostgreSQL's catalog of its
own objects:</p>
<pre class="literal-block">
evergreen=# SELECT relfilenode, relname, relpages
evergreen-# FROM pg_class WHERE relname LIKE '%vector%';
 relfilenode |                    relname                    | relpages
-------------+-----------------------------------------------+----------
      648864 | authority_full_rec_index_vector_idx           |    59282
      649137 | metabib_title_field_entry_index_vector_idx    |    29766
      649149 | metabib_author_field_entry_index_vector_idx   |    20125
      649161 | metabib_subject_field_entry_index_vector_idx  |    23481
      649173 | metabib_keyword_field_entry_index_vector_idx  |    90709
      649185 | metabib_series_field_entry_index_vector_idx   |     8682
      649210 | metabib_full_rec_index_vector_idx             |   452980
(7 rows)
</pre>
<p><strong>relfilenode</strong> is the basename of the files that we want to load
into the file system cache.</p>
</li>
<li><p class="first">The maximum size of your file system cache cannot be more than the
physical RAM installed on your system, so you'll want to tally up the
size of the index data files to ensure that their total is less than
the total amount of your physical RAM. Note that in the example from
our system, below, I'm using "*" because database objects with lots
of data will be split between multiple files with extensions like
".1" and ".2" in sequential order:</p>
</p><pre class="literal-block">
# cd /var/lib/postgresql/main/base/16385# du -hs 649185* 649210* 1065608*68M 6491851.1G 6492101.1G 649210.11.1G 649210.21.1G 10656081.1G 1065608.11.1G 1065608.2842M 1065608.3# du -hs 649207*1.1G 6492071.1G 649207.11.1G 649207.2467M 649207.3
</pre>
</p><p>Adding all of this up, we're getting close to the 16GB of RAM
installed on our database server. If we add any more data, we will
want to add more RAM to the system.</p>
<p></li>
<li><p class="first">Now we warm up the cache by outputting the contents of each file into
/dev/null.</p>
<pre class="literal-block">
# cd /var/lib/postgresql/main/base/16385
# cat 648864* > /dev/null
# cat 649137* > /dev/null
# cat 649149* > /dev/null
# cat 649161* > /dev/null
# cat 649173* > /dev/null
# cat 649185* > /dev/null
# cat 649210* > /dev/null
</pre>
</li>
</ol>
<p>After running through this relatively simple exercise, searches were
definitely much snappier on our test system. I plan to automate the
process so it runs after every one of those cache-killing backups. If
there is interest, I could package it into a simple Perl script that
other sites could use to assist with their testing - or to help warm up
the file system cache after a large data load, for example.</p>
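<p>In the meantime, here's a rough sketch of what that Perl script might look
like - a starting point only, with the cluster path, database name, and
credentials as assumptions you would need to adjust for your own system, and
DBI / DBD::Pg as my module choices:</p>
<pre class="literal-block">
#!/usr/bin/perl
# Hypothetical cache warmer: find the full-text index files and read
# them so the file system cache is primed after a backup or data load
use strict;
use warnings;
use DBI;

# Assumption: the directory for your database within the cluster
my $base = '/var/lib/postgresql/main/base/16385';

my $dbh = DBI->connect('dbi:Pg:dbname=evergreen', 'evergreen', '',
    { RaiseError => 1 });

# Step 2 from above: look up the base filenames of the index files
my $nodes = $dbh->selectcol_arrayref(
    "SELECT relfilenode FROM pg_class WHERE relname LIKE '%vector%'"
);

# Step 4 from above: read each file (and its .1, .2, ... extents);
# note that a bare prefix glob could also match longer OIDs, which is
# good enough for a sketch but worth tightening in a real script
for my $node (@$nodes) {
    for my $file (glob("$base/$node*")) {
        open my $fh, '<', $file or next;
        1 while read $fh, my $buf, 1 << 20;  # 1MB chunks into the void
        close $fh;
    }
}
</pre>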
Academic reserves for Evergreen: request for comments2008-07-12T20:02:00-04:002008-07-12T20:02:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2008-07-12:/academic-reserves-for-evergreen-request-for-comments.html<p>I've posted a second revision of the <a class="reference external" href="http://open-ils.org/dokuwiki/doku.php?id=feature:academic_reserves">"academic reserves" requirements
RFC</a>.
I'm not looking to boil the ocean with the first iteration of academic
reserves for Evergreen (that's what third-party systems like
<a class="reference external" href="http://reservesdirect.org">ReservesDirect</a> and Ares are for), but I
am hoping that by engaging the community in a discussion we can ensure
that we build something that satisfies the core set of requirements for
academic institutions in the area of reserves. My lack of familiarity
with what other institutions are doing - whether with more capable
systems, local workarounds, or third-party reserves systems - makes me
nervous that I'm missing something obvious. So if you feel like weighing in on
the discussion, please address your comments to the <a class="reference external" href="http://open-ils.org/listserv.php">Evergreen General
mailing list</a>, add a comment here,
or send me email if you prefer to keep your comments private.</p>
<p>The biggest change in the second revision of the RFC is the inclusion of
a base set of requirements for electronic reserves. For physical items
alone, the requirements expressed in the RFC go far beyond the
capabilities of the ILS we currently use at Laurentian; getting even
basic support for electronic reserves in Evergreen would be a huge win
for us when we migrate.</p>
<p>That said, I'll probably start working on implementing a subset of the
requirements real soon now; it should be easy enough to make a course
correction should something significant turn up during the second round
of comments.</p>
(unofficial) bzr repositories for Evergreen branches2008-07-12T19:46:00-04:002008-07-12T19:46:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2008-07-12:/unofficial-bzr-repositories-for-evergreen-branches.html<p>I wrote a long blog post about the distributed version control workflow
that the two Laurentian students working on
<a class="reference external" href="http://open-ils.org">Evergreen</a> (Kevin Beswick and Craig Ricciuto)
are using successfully this summer, only to lose the post to a session
timeout and my own lack of caution (note to self: if writing directly in
the browser text field, CTRL-A CTRL-C before hitting preview!). So the
gist of the blog post was:</p>
<ul class="simple">
<li><a class="reference external" href="http://bazaar-vcs.org">bzr</a>, with the <a class="reference external" href="http://bazaar-vcs.org/BzrSvn">bzr-svn
plugin</a>, works quite well for
cloning and updating from a centralized Subversion repository like
Evergreen's; just watch out for memory consumption issues due to
memory leaks in the Python bindings for Subversion
(<a class="reference external" href="http://jelmer.vernstok.nl/blog/archives/218-bzr-svn-now-with-its-own-Subversion-Python-bindings.html">fixed</a>
in the development version of bzr-svn)</li>
<li>there's no compelling reason for Evergreen to move to a different
version control system; it's easy to use a distributed version
control workflow with the Evergreen Subversion repository as-is</li>
<li>you can tar up a bzr branch and untar it wherever you like and "bzr
up" will immediately and happily work (which is how I worked around the
severe memory constraints on this server that ended up repeatedly
running into the Linux out of memory killer when I was trying to
create a bzr-svn checkout from scratch)</li>
<li>it's a hell of a lot faster to check out or branch from a bzr
repository than it is from a Subversion repository, so if you're
going to take this approach set up one clean bzr repository using
bzr-svn and check out or branch from that using bzr, rather than
repeatedly using bzr-svn to create new branches</li>
</ul>
<p>To enable you to get a bzr repo of Evergreen quickly, I've set up
(unofficial, of course, but updated hourly) bzr repositories of the most
useful Evergreen branches as follows:</p>
<p><strong>UPDATE 2009-10-14:</strong> I've stopped updating these repositories because
the version of bzr-svn on my server is too old and decrepit to be able
to handle the updates. Sorry <img alt=":-(" class="emoticon" src="/images/sad.png" /></p>
<ul class="simple">
<li><a class="reference external" href="http://bzr.coffeecode.net/ILS/trunk">Evergreen trunk</a></li>
<li><a class="reference external" href="http://bzr.coffeecode.net/ILS/acq-experiment">Evergreen
acq-experiment</a>
(acquisitions and serials branch)</li>
<li><a class="reference external" href="http://bzr.coffeecode.net/OpenSRF/trunk">OpenSRF trunk</a></li>
</ul>
<p>Enjoy!</p>
eIFL-FOSS ILS workshop on Evergreen, day one2008-06-24T00:34:00-04:002008-06-24T00:34:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2008-06-24:/eifl-foss-ils-workshop-on-evergreen-day-one.html<p>The following summary is taken almost directly from an email I wrote to
one of the would-be participants who was, sadly, prevented from making
it to Yerevan due to travel complications. I meant to clean this up
earlier and post it, but have not yet found the time - so I …</p><p>The following summary is taken almost directly from an email I wrote to
one of the would-be participants who was, sadly, prevented from making
it to Yerevan due to travel complications. I meant to clean this up
earlier and post it, but have not yet found the time - so I might as
well just post it as is with most names obfuscated and possibly some
additional editorial comments. Those who are new to installing and
configuring Evergreen might find this useful; and reading through it, I
remembered a few challenges I planned to tackle <img alt=":-)" class="emoticon" src="/images/smile.png" /></p>
<hr class="docutils" />
<p>Shortly after I arrived on Monday, I was able to try out the install of
Evergreen 1.2.1.4 that A. and G. from the Fundamental Science Library (FSL)
had completed with only two email exchanges with me. I was very happy to see
that they had successfully completed the install! There was only one minor
problem with the structure of the "organizational unit" hierarchy that I had
to fix. After that, we confirmed that we were able to import bibliographic
records from Z39.50 and attach call numbers and copies to those records.
Finally, we tried searching for the records in the catalogue and were
delighted to see that everything was working as we had hoped. That allowed me
to sleep well on Monday, in preparation for the first day of the workshop on
Tuesday.</p>
<p>After the introductions of the workshop participants on Tuesday, I gave
the introduction to Evergreen presentation and Henri Damien Laurent of
BibLibre demonstrated Koha. Both Henri Damien Laurent and I showed our
respective library systems running with an Armenian interface, thanks to the
translation efforts of Tigran! Then we broke into separate Koha and Evergreen
groups to work together on our respective library systems.</p>
<p>Of the attendees of the workshop, E. was the most interested in migrating
his library (with 40,000 volumes) to Evergreen. A., from one of the 29
branches of the American University of Armenia (AUA), also attended most of
the Evergreen session. Even though his institution is mostly interested in
Koha, he wanted to be able to compare the two systems. Albert's colleague S.
attended the Koha training session so they would be able to compare their
experiences later. Our group also had R. from the Netherlands and A., G., and
A. from FSL -- apparently Tigran is considering running Evergreen as a union
catalogue, so his IT people are very interested in learning more.</p>
<p>Our first exercise was to model the organizational unit hierarchy using
the configuration bootstrap interfaces in /cgi-bin/config.cgi. We began by
drawing the hierarchy on a whiteboard. The "Yerevan Consortium" represented
the Evergreen system as a whole; we added the FSL, MSU, and AUA systems as
children of the Yerevan Consortium, and then added specific branches as
children of each of these systems. While we were creating this hierarchy, I
showed the participants how the organization unit type defines the labels
used in the catalogue as well as the respective depth in the hierarchy for
each type.</p>
<p>We then ensured that the systems and branches in the hierarchy had the
right types, and that the types were defined with valid parent-child
relationships. We found a few types that were children of themselves, which
causes a problem in searching. There was also some confusion about the
relationship of types to organization units, resulting in the creation of
types with labels like "FSL" rather than "Library System". After a few
minutes of explanation and working through correcting the exercises, I think
the participants were better able to understand the relationship between
types and organization units.</p>
<p>After we were satisfied with the structure of the organization unit
hierarchy, I ran the autogen.sh script to update the catalogue and staff
client representations of the hierarchy. Well, first I demonstrated how
search in the catalogue will quickly be broken if you do not run the
autogen.sh script <img alt=":-)" class="emoticon" src="/images/smile.png" /></p>
<p>Our next step was to register new users with the Evergreen staff client.
This helped introduce the participants to the staff client, as well as giving
them a quick introduction to some parts of Evergreen that still need to be
localized to allow regional variations on postal code formats, telephone
numbers, and forms of identification. The default Evergreen staff client
still enforces American conventions, but fortunately I have had to create
patches for Evergreen to support my own country's standards, so I can assure
you that it is relatively easy to change or remove these format checks. In
the future, it would be wonderful to include a localization pack for each
locale interested in using Evergreen that supports regional variations on
date formats, phone number patterns, etc. The participants were pleased with
the feedback mechanism in the staff client that summarized all of the
remaining problems with the current patron record (missing address, invalid
phone number, etc.) and made it easy to switch between screens without losing
any of the data they had already entered.</p>
<p>Once we had registered new users for each of our branches, we went to work
importing new bibliographic records and attaching call numbers and copies to
those records. This gave us a good opportunity to see how changing the scope
of a search in Evergreen from "Everywhere" down to a specific branch changes
the search results, and demonstrated how the organization type labels are
displayed in the catalogue. As an aside, I should point out that in Evergreen
1.4 (due by the end of this summer), the labels are internationalized so that
different labels can be displayed depending on the locale in which you are
using the catalogue or staff client. Good news for those of us who work in
bilingual or multilingual libraries!</p>
<p>Now that we had records with copies attached and patrons registered in our
Evergreen instance, we were able to use the catalogue's "My Account" features
to try out sharable bookbags, account preferences, and the account summary.
Users also have the ability to specify their own user names and to log in
with those instead (which means that they can simply remember their unique
nickname rather than, say, a 14-digit barcode). The first feature that the
participants discovered, of course, was the strong password enforcement
feature. When a patron is registered, the system automatically generates a
random 4-digit password; however, this is not considered to be a safe
password, so when they log in they are forced to change it to a longer
password containing both numbers and letters.</p>
<p>At this point, we also discovered a data validation bug: in the staff
client, it is possible to enter a user barcode that consists of letters and
numbers. However, in the catalogue, user barcodes containing letters are
considered invalid and the system will not even attempt to log that user in;
it simply rejects the barcode. I plan to ask E. to report this bug to the
Evergreen mailing list; it would be an excellent outcome of the workshop if
participants felt comfortable reporting problems to the mailing list, and
reporting this problem in particular would help improve the quality of
Evergreen.</p>
<p>Things were going reasonably well, but we noticed that the system was
running into a problem if you tried to edit a bibliographic record after you
had already created or imported the record. I had rather fortunately already
experienced this problem (it is a result of different behaviour regarding XML
namespaces between different versions of LibXML2) and knew that it had been
fixed in 1.2.2.1. So rather than trying to fix the problem with the installed
version of 1.2.1.4, I decided to try upgrading our Evergreen system to the
recently released 1.2.2.1 to demonstrate to the participants that the upgrade
process was fast, reasonably well documented, and not nearly as complicated
as the install process. This was, by the way, something Randy had urged me to
do, so I blame him for the subsequent problems we experienced (hah!).</p>
<p>The first problem is that the change from 1.2.1.x to 1.2.2.x requires the
installation of a new Perl module from CPAN (JSON::XS). This is not much of a
problem in itself, as the module is very easy to install and compile;
however, given our internet connection I had to wait a long time for the CPAN
repository metadata to be downloaded. The participants were still able to use
the system while this was happening, but we ended up hitting the coffee break
still waiting for CPAN to finish. (As an aside, Irakli and I were discussing
the possibility of having the eIFL-FOSS coordinators investigate setting up
local mirrors of FOSS resources like CPAN to speed up access to frequently
used resources.)</p>
<p>When we returned from the coffee break, the JSON::XS install had finished
but the participants were having problems searching and using the staff
client. I checked the logs (starting with the "grep ERR /openils/var/log/*"
command) and saw that our database connections were dying for some reason. On
a hunch, I checked the system logs ("dmesg") and discovered that the Linux
"out of memory" (OOM) killer had started killing random processes to try to
free up memory. It was killing the PostgreSQL processes, the Evergreen
processes - anything! I was lucky, because I had been reading about the OOM
killer on Linux after hearing about a Linux user who had run into a similar
problem, and knew that the way to disable it was to prevent Linux from
overcommitting memory to processes in the first place. Wondering why our
system had started running out of memory at all, I ran "free" and saw that it
had been set up with no swap space; I confirmed this by running fdisk to see
that there were no swap partitions.</p>
<p>Here, however, I made a mistake. I ran "echo '2' >
/proc/sys/vm/overcommit_memory" to prevent Linux from overcommitting memory
to new processes and to prevent the OOM killer from killing any more random
processes. But this also meant that I was immediately unable to launch any
new programs - so I could not safely shut down PostgreSQL and Evergreen, and
we had to turn the power off to the system.</p>
<p>Fortunately, the system started up cleanly again (hurray for journalled
filesystems) and I was able to complete the upgrade before the rest of our
hands-on session for the day was finished. A few things that are missing in
the current upgrade instructions:</p>
<ol class="arabic">
<li><p class="first">You have to compile the new version of Evergreen. The easiest way to
do</p>
</p><p>this is to copy install.conf over from your previous version of
Evergreen and</p>
<p>run "make config" to ensure that all of the settings are still
correct, then</p>
<p>run "make" to build the new version of Evergreen.</p>
<p></li>
<li><p class="first"><strong>Very important</strong>: Before installing the new version of Evergreen,
you must</p>
</p><p>prevent the database schema from being completely recreated or it
will destroy</p>
<p>any data that is already in your system. One way of doing this is,
during the</p>
<p>"make config" step, to list all of the Evergreen targets _except
for_</p>
<p>openils_db. I am simply incapable of remembering all of those
targets, so my</p>
<p>dirty workaround is to open Open-ILS/src/Makefile in an editor and
modify the</p>
<p>"install: " make target by removing the "storage-bootstrap" make
target. What</p>
<p>we really need is an "upgrade" target for "make config" that simply
installs</p>
<p>everything except for the database schema.</p>
<p></li>
<li><p class="first">Confirm that the new version of Evergreen has been installed by
running</p>
</p><p>the srfsh command "request open-ils.storage open-ils.system.version".</p>
<p></li>
</ol>
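<p>For those who have not used srfsh before, that version check looks roughly
like the following - the prompt and output format here are from memory, so
they may differ slightly between releases:</p>
<pre class="literal-block">
$ srfsh
srfsh# request open-ils.storage open-ils.system.version

Received Data: "1.2.2.1"

------------------------------------
Request Completed Successfully
Request Time in seconds: 0.042
------------------------------------
</pre>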
<p>For tomorrow (today, by the time you receive this), A. and G. are going to
create a swap file to enable the system to swap memory to disk if need be;
the system has 1 GB of RAM, which is enough for a small Evergreen system, but
when one is compiling programs at the same time as running Evergreen, swap
space really is necessary. This was a very good lesson learned for all of
us!</p>
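<p>For reference, creating and enabling a swap file on a Linux system of that
era looks something like this (run as root; the 2 GB size and the /swapfile
path are arbitrary examples):</p>
<pre class="literal-block">
# create a 2 GB file, format it as swap, and enable it
dd if=/dev/zero of=/swapfile bs=1M count=2048
mkswap /swapfile
swapon /swapfile
# to survive reboots, add this line to /etc/fstab:
# /swapfile none swap sw 0 0
</pre>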
<ol class="upperalpha simple" start="5">
<li>also interested in learning more about basic Linux</li>
</ol>
<p>administration. His institution currently runs on an entirely Windows</p>
<p>infrastructure, so the requirement to learn Linux is a fairly high
hurdle.</p>
<p>I'm hoping that the eIFL-FOSS list will be a good resource for him to
start</p>
<p>that journey. He has also asked to go over the step-by-step instructions
for</p>
<p>installing Evergreen, so I'm considering starting that in a VMWare
session so</p>
<p>that we can run through the steps. Our major goal for tomorrow is to
migrate</p>
<p>some data from FSL's legacy system into Evergreen. Wish us luck!</p>
<p><em>Editorial comment:</em> The combination of Armenian and Russian MARC
records refused to load into the Evergreen 1.2.2.1 system, but on the
flight home I confirmed that they loaded perfectly and were searchable
on my Evergreen development system. As the development version will
become this summer's 1.4 "internationalization" release, we are in good
shape.</p>
<p><em>Editorial comment 2:</em> On the second day, while running in circles
trying to figure out why the records were refusing to load into the
1.2.2.1 system, I decided to try the
<a class="reference external" href="irc://chat.freenode.net/#openils-evergreen">#openils-evergreen</a> IRC
channel. Yerevan is 9 hours ahead of the Toronto/Atlanta time zone, so
at noon Yerevan time I was hardly expecting any of the current core
Evergreen developers to be online - yet, to our amazement, Mike Rylander
responded. This was a pretty convincing demonstration to the attendees
that the core developers really aren't far away or hard to contact at
all.</p>
Get out of jail, go free, part I2008-06-16T20:38:00-04:002008-06-16T20:38:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2008-06-16:/get-out-of-jail-go-free-part-i.html<p>As Mark Leggott mentioned in <a class="reference external" href="http://loomware.typepad.com/loomware/2008/05/vendor-to-open.html">Vendor to Open Source ILS in 1 Month
#1</a>,
I had the pleasure of assisting the migration of the University of
Prince Edward Island library system from Unicorn to Evergreen. <a class="reference external" href="/archives/123-Evergreen-and-the-business-case-for-choosing-an-open-source-ILS.html">A little
over a year
ago</a>,
in discussing the business case for open source library …</p><p>As Mark Leggott mentioned in <a class="reference external" href="http://loomware.typepad.com/loomware/2008/05/vendor-to-open.html">Vendor to Open Source ILS in 1 Month
#1</a>,
I had the pleasure of assisting the migration of the University of
Prince Edward Island library system from Unicorn to Evergreen. <a class="reference external" href="/archives/123-Evergreen-and-the-business-case-for-choosing-an-open-source-ILS.html">A little
over a year
ago</a>,
in discussing the business case for open source library systems, I
stated that one of the problems we faced with migrations is that the
license for a proprietary system often inhibits the open sharing of
information about how to export data from those systems in
machine-usable formats. Thus, the open source library community needs to
encourage the development of "migration ninjas". Little did I know that
I would soon join the guild of ninjas and become <em>deadly and silent, and
unspeakably violent</em>(1)(2).</p>
<p>As a result, I have created a utility script that should be of
assistance to SirsiDynix Unicorn or Symphony sites who are interested in
exploring the possibilities offered by other library systems. The rather
dryly named "export_unicorn.pl" script was added to the <a class="reference external" href="http://sirsiapi.org">Unicorn API
repository</a> as entry # 228 today under a GPL v2
license(3). As the script uses the Unicorn/Symphony API, however, I am
sadly (to the best of my knowledge) not free to simply share the script
with anyone. Therefore, to gain access to the script you must be an
API-certified Unicorn or Symphony customer. Still, by making an export
script available to SirsiDynix customers that provides the raw data in a
relatively standard output format, it should ease the effort required by
the migration ninjas for open source systems to massage the data into
the needed input formats, and to avoid the
<a class="reference external" href="http://www.google.ca/search?q=define%3Atetsubishi">tetsu-bishi</a>
scattered by the proprietary systems in defence of "their" data(4)(5).</p>
<ol class="arabic simple">
<li><a class="reference external" href="http://www.bnlmusic.com">Barenaked Ladies</a>, "The Ninjas". <em>Their
website is horrible Flash and JavaScript overkill but damnit Jim,
they're musicians, not webmasters; the "Snacktime" album is
especially recommended if you have kids.</em></li>
<li><em>Although I have to say I'm nowhere near as violent as Mike Rylander,
who with his PostgreSQL-fu can carve seemingly any piece of data into
the shape needed for import into Evergreen.</em></li>
<li><em>Thanks to Mark Leggott for insisting that I retain copyright over
the scripts created during the UPEI migration and for allowing me to
share those scripts in the appropriate avenues. It's another weapon
(shuriken? ninja-to?) in the migration ninja arsenal.</em></li>
<li><em>This data does, after all, belong to the libraries who license a
library system, but at least one company reportedly has a pattern of
repeatedly removing interfaces that enable easy machine-readable
access to library data...</em></li>
<li><em>I find myself being thankful that Unicorn does provide an API for
generating machine-readable data exports; all that it cost our
library was a week of my life and the associated training fees and
travel expenses</em></li>
</ol>
Introduction to Evergreen at eIFL-FOSS ILS workshop2008-06-16T20:04:00-04:002008-06-16T20:04:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2008-06-16:/introduction-to-evergreen-at-eifl-foss-ils-workshop.html<p>I was in Armenia last week, leading a <a class="reference external" href="http://www.eifl.net/cps/sections/services/eifl-foss/ils/ils-project-workshop">workshop on open source library
systems</a>
along with Henri Damien Laurent from <a class="reference external" href="http://biblibre.com">BibLibre</a>.
My charge was to introduce Evergreen and lead participants in two days
of hands-on experience with the system; Henri took on the same task for
Koha. I cannot say …</p><p>I was in Armenia last week, leading a <a class="reference external" href="http://www.eifl.net/cps/sections/services/eifl-foss/ils/ils-project-workshop">workshop on open source library
systems</a>
along with Henri Damien Laurent from <a class="reference external" href="http://biblibre.com">BibLibre</a>.
My charge was to introduce Evergreen and lead participants in two days
of hands-on experience with the system; Henri took on the same task for
Koha. I cannot say enough good things about our host for the workshop,
the <a class="reference external" href="http://www.sci.am">Fundamental Library of the National Academy of Sciences of
Armenia</a> headed up by Tigran Zargaryan; nor can I
offer enough compliments to Randy Metcalfe on his skills in ensuring
that everything ran smoothly; nor can I express how rewarding it was to
meet representatives of so many different countries and how much I
enjoyed their company! I look forward to helping the pilot sites succeed
with their implementations.</p>
<p>So, for the short term, I'll simply link to the "Introduction to
Evergreen" presentation that I gave at the start of the workshop in
<a class="reference external" href="/uploads/talks/2008/Evergreen-eIFL-FOSS.odp">OpenOffice</a>
and
<a class="reference external" href="/uploads/talks/2008/Evergreen-eIFL-FOSS.ppt">PowerPoint</a>
formats (as I promised to the participants). In the next day or two I plan to
post a summary of the workshop activities; some of the lessons learned; and
where I think I'll focus my attention next.</p>
Weeding 2.02008-05-11T04:07:00-04:002008-05-11T04:07:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2008-05-11:/weeding-20.html<p>Okay, this is definitely a lame thing to be thinking about at midnight
on a Saturday, but I was just playing with the shelf browser in the
<a class="reference external" href="http://open-ils.org">Evergreen</a> representation of our 780,000
bibliographic records (okay, that is definitely the wrong thing to be
<em>doing</em> at midnight on a Saturday …</p><p>Okay, this is definitely a lame thing to be thinking about at midnight
on a Saturday, but I was just playing with the shelf browser in the
<a class="reference external" href="http://open-ils.org">Evergreen</a> representation of our 780,000
bibliographic records (okay, that is definitely the wrong thing to be
<em>doing</em> at midnight on a Saturday). For some reason, I was wandering
through the subject collection pertinent to librarians (pray for my
soul), noticed a book that probably should have been discarded years
ago, and thought "Gee, i don't want to deal with this right now, but
wouldn't it be nice if I could just mark this <strong>Weed me</strong> and forget
about it until Monday?"</p>
<p>Then I realized that that wouldn't be a stretch at all. In Evergreen,
users have "bookbags" to which they can add items. These bookbags can be
shared as RSS feeds and otherwise easily exported into other formats. If
we were running Evergreen for real, I could create a "Weed me!" bookbag,
add in the suspect along with a bunch of other festering tomes, and send
the RSS feed to a student to perform the manual labour. Or perhaps the
RSS feed gets aggregated with other weeders' feeds and a weeding list
gets generated on a monthly basis for efficient labour practices. You
get the idea.</p>
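<p>As a purely hypothetical sketch (the URL pattern is from my memory of
Evergreen's feed support, and the bookbag ID is invented), harvesting a
shared "Weed me!" bookbag could be as simple as:</p>
<pre class="literal-block">
# fetch bookbag 42 as an Atom feed for the weeding student to work from
curl "http://biblio.example.ca/opac/extras/feed/bookbag/atom-full/42"
</pre>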
<p>Of course, you would really want to have more information than just the
stock shelf browsing interface at hand when making weeding decisions.
For example, you would need a tally of recorded uses displayed beside
the item, with the ability to drill down for totals by year. If you
participate in a consortial "last copy standing" program, you would want
a quick check to see if any other institutions still hold a copy of the
resource. So, an enhanced interface would be needed to provide an
experience that combines the traditional weeding approach of roaming the
stacks and generating reports of items matching some minimum age and
minimum usage criteria.</p>
<p>Think about it a little further though (I'm sure you're thinking a lot
faster than me at this point; you're probably having the luxury of
reading this at the beginning of the day, coffee in hand, invigorated
after an early morning run in the lingering late spring chill... or
not), and there are points in our institutional workflows where we could
naturally introduce weeding activities. How do we get to the point of
having three editions of a given text on the shelf? If I have the 1995,
2003, and 2007 editions of a text, I can assure you that when I ordered
the 2007 edition I had already checked our ILS to see if we had a copy
of that edition already, and would have noticed the previous editions.
At that point, I should have the ability to say "Oh - get rid of the
1995 edition <strong>now</strong> and once the 2007 edition is processed and on the
shelf, cull the 2003 edition to boot." If I was designing an
acquisitions module today, that's certainly something I would consider
as a nice-to-have. Ahem.</p>
<p>Weeding 2.0 may not be a sexy subject.
<a class="reference external" href="http://www.google.ca/search?q=%22weeding+2.0%22">Google</a> and
<a class="reference external" href="http://search.yahoo.com/search?p=%22weeding+2.0%22">Yahoo</a> each turn
up exactly four hits, none of them related to libraries, which is
remarkable in this overly-hyped everything 2.0 world. But it's something
we should consider in the design and tailoring of our library systems;
and while it's not going to rank in my top level of priorities for
Evergreen, it will work its way in there somewhere, sometime. Hopefully
before the stacks in my subject areas buckle under the weight of unused,
out-of-date books.</p>
Tuning PostgreSQL for Evergreen on a test server2008-04-14T18:48:00-04:002008-04-14T18:48:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2008-04-14:/tuning-postgresql-for-evergreen-on-a-test-server.html<p><strong>Update 2008-05-01</strong>: Fixed a typo for sysctl: -a parameter simply
shows all settings; -w parameter is needed to write the setting. Duh.</p>
<p>Once you have decided on and acquired your <a class="reference external" href="http://www.coffeecode.net/archives/155-Test-server-strategies.html">test hardware for
Evergreen</a>,
you need to think about tuning your PostgreSQL database server. Once you
start loading bibliographic records …</p><p><strong>Update 2008-05-01</strong>: Fixed a typo for sysctl: -a parameter simply
shows all settings; -w parameter is needed to write the setting. Duh.</p>
<p>Once you have decided on and acquired your <a class="reference external" href="http://www.coffeecode.net/archives/155-Test-server-strategies.html">test hardware for
Evergreen</a>,
you need to think about tuning your PostgreSQL database server. Once you
start loading bibliographic records, you might notice that after 100,000
records or so, your search response times aren't too snappy. Don't snarl at
Evergreen. By default, PostgreSQL ships with very conservative settings
(suited to a machine with something like 256 MB of RAM!), so if you don't
tune those settings you're getting a false representation of your system's
capabilities.</p>
<p>The "right" settings for PostgreSQL depend significantly on your
hardware and deployment context, but in almost any circumstance you will
want to bump up the settings from the delivered defaults. To give you an
idea of what you need to consider, I thought I would share the settings
that we're currently using on our Evergreen test server at Laurentian
University. You might be able to use these as a starting point and
adjust them accordingly once you've run some representative load tests
against your configuration. And it's useful documentation for me to fall
back on in a few months, when all of this has escaped my grasp <img alt=":-)" class="emoticon" src="/images/smile.png" /></p>
<div class="section" id="the-defaults-as-shipped-in-debian-etch">
<h2>The defaults (as shipped in Debian Etch)</h2>
<p>The defaults in Debian Etch are quite conservative. Consider that our
test server has 12GB of RAM. The default only allocates 1MB of RAM to
work memory (which is critical for sorting performance) and only 8MB of
RAM to shared buffers. Following are the defaults set in
/etc/postgresql/8.1/main/postgresql.conf:</p>
<pre class="literal-block">
# - Memory -
#shared_buffers = 1000            # min 16 or max_connections*2, 8KB each
#temp_buffers = 1000              # min 100, 8KB each
#max_prepared_transactions = 5    # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
#work_mem = 1024                  # min 64, size in KB
#maintenance_work_mem = 16384     # min 1024, size in KB
#max_stack_depth = 2048           # min 100, size in KB

# - Free Space Map -
#max_fsm_pages = 20000            # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000         # min 100, ~70 bytes each
</pre>
</div>
<div class="section" id="our-test-server-settings">
<h2>Our test server settings</h2>
<p>Our test server has 12 GB of RAM. Assuming that the PostgreSQL defaults
were set for a system with 1 GB of RAM, we should be able to multiply
the memory-based settings by at least a factor of 12. We're a little bit
more aggressive than that in our settings. Note, however, that this is a
single-server install of Evergreen, so we're also running memcached,
ejabberd, Apache, and all of the Evergreen services as well as the
database - oh, and a test instance of an institutional repository, among
other apps - so we're not nearly as aggressive as we would be in a
dedicated PostgreSQL server configuration. Please note that I'm making
no claims that this is the optimal set of configuration values for
PostgreSQL even on our own hardware!</p>
<pre class="literal-block">
# shared_buffers: much of our performance depends on sorting, so we'll set it 100X the default
# some tuning guides suggest cranking this up to as much as 30% of your available RAM
shared_buffers = 100000           # 8K * 100000 = ~0.8 GB

# work_mem: how much RAM each concurrent process is allowed to claim before swapping to disk
# your workload will probably have a large number of concurrent processes
work_mem = 524288                 # 512 MB

# max_fsm_pages: increased because PostgreSQL demanded it
max_fsm_pages = 200000
</pre>
<p>After you change these settings, you will need to restart PostgreSQL to
make the settings take effect.</p>
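<p>On Debian Etch, that restart and a quick sanity check look something like
the following - note that the init script name carries the major version, and
the psql invocation assumes the default local authentication setup:</p>
<pre class="literal-block">
/etc/init.d/postgresql-8.1 restart
# confirm that the new value took effect
psql -U postgres -c 'SHOW work_mem;'
</pre>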
</div>
<div class="section" id="kernel-tuning">
<h2>Kernel tuning</h2>
<p>In addition to PostgreSQL complaining about max_fsm_pages not being
high enough, your operating system kernel defaults for SysV shared
memory might not be high enough to support the amount of RAM PostgreSQL
demands as a result of your modifications. In one of our test
configurations, we had cranked up work_mem to 8GB; Debian complained
about an insufficient SHMMAX setting, so we were able to adjust that by
running the following command as root to set the kernel SHMMAX to 8GB
(8*1024^3 = 8,589,934,592 bytes):</p>
<pre class="literal-block">
sysctl -w kernel.shmmax=8589934592
</pre>
<p>To make this setting sticky through reboots, you can simply modify
/etc/sysctl.conf to include the following line:</p>
<pre class="literal-block">
# Set SHMMAX to 8GB for PostgreSQL
kernel.shmmax=8589934592
</pre>
</div>
<div class="section" id="other-measures">
<h2>Other measures</h2>
<p>Debian Etch comes with PostgreSQL 8.1. The first version of PostgreSQL
8.1 was released in November 2005. That's a long time in computer years.
Version 8.2, which was released less than a year later, "adds many
functionality and performance improvements" (according to the <a class="reference external" href="http://www.postgresql.org/docs/8.2/static/release-8-2.html">release
notes</a>).
If you're not getting the performance you expect from your hardware with
Debian Etch, perhaps a <a class="reference external" href="%20http://packages.debian.org/etch-backports/postgresql-8.2">backport of PostgreSQL
8.2</a>
would help out.</p>
</div>
<div class="section" id="further-resources">
<h2>Further resources</h2>
<p>This is just a shallow dip into PostgreSQL tuning for Evergreen -
hopefully enough to alert you to some of the factors you need to
consider if you're putting Evergreen into a serious testing environment
or production environment. Here are a few places to dig deeper into the
art of PostgreSQL tuning:</p>
<ul class="simple">
<li>PostgreSQL manual, resource consumption section of server
configuration: <a class="reference external" href="http://www.postgresql.org/docs/8.1/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY">version
8.1</a>
and <a class="reference external" href="http://www.postgresql.org/docs/8.2/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY">version
8.2</a></li>
<li>An annotated version of the 8.0 parameters with more explicit advice
is available at
<a class="reference external" href="http://www.powerpostgresql.com/Downloads/annotated_conf_80.html">http://www.powerpostgresql.com/Downloads/annotated_conf_80.html</a></li>
<li>Some good advice is buried about halfway down <a class="reference external" href="http://cbbrowne.com/info/postgresql.html">Christopher Browne's
page</a> under the heading
"Tuning PostgreSQL", along with links to further resources</li>
<li>The "Performance Whack-A-Mole" presentation at
<a class="reference external" href="http://www.powerpostgresql.com/Docs">PowerPostgreSQL</a> is a great
tutorial for holistic system tuning</li>
</ul>
</div>
Test server strategies2008-04-10T00:39:00-04:002008-04-10T00:39:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2008-04-10:/test-server-strategies.html<p>Occasionally on the <a class="reference external" href="http://open-ils.org/irc.php">#OpenILS-Evergreen IRC
channel</a>, a question comes up about what kind
of hardware a site should buy if they're getting serious about trying
out Evergreen. I had exactly the same chat with Mike Rylander back in
December, so I thought it might be useful to share the strategy we …</p><p>Occasionally on the <a class="reference external" href="http://open-ils.org/irc.php">#OpenILS-Evergreen IRC
channel</a>, a question comes up about what kind
of hardware a site should buy if they're getting serious about trying
out Evergreen. I had exactly the same chat with Mike Rylander back in
December, so I thought it might be useful to share the strategy we
developed in case other organizations are interested in piggy-backing on
our research. We came up with three different scenarios, depending on
the funding available to the organization and how serious the
organization is about testing, developing, and deploying Evergreen.</p>
<p>You can also look at the scenarios as stages, as the scenarios enable
progressively more realistic testing. An organization can always start with a
single server and add more servers over time; if you can swing a significant
discount for buying in bulk, however, it might make sense to bite the bullet
early.</p>
<p>Some pertinent facts about our requirements: we will eventually be
loading around 5 million bibliographic records onto the system. We're an
academic organization, so concurrent searching and circulation loads
will be low relative to public libraries.</p>
<div class="section" id="scenario-1-a-single-bargain-basement-testing-server">
<h2>Scenario 1: A single bargain-basement testing server</h2>
<p>In this scenario, the organization purchases a single server for the
short term, and configures it to run the entire Evergreen + OpenSRF stack:</p>
<ul class="simple">
<li>database</li>
<li>Web server</li>
<li>Jabber messaging</li>
<li>memcached</li>
<li>OpenSRF applications</li>
</ul>
<p>This server needs to have powerful CPUs, large amounts of RAM, and many
fast (10K RPM or higher) hard drives in a striped RAID configuration (the
latter because database performance typically gets knee-capped by disk
access). A "higher education" quote online from a reputable big-name vendor
for a rack-mounted 2U database server with 2x4-core CPU, 16GB RAM, and 6x73GB
RAID 5 drives comes in at approximately $7000.</p>
<p>This scenario is fine for development and testing with a limited number of
users, but if you intend to do any sort of stress testing with this server or
throw it open to the public, performance will likely grind to a halt.
<strong>Note:</strong> This is close to the system that we're currently
running at <a class="reference external" href="http://biblio-dev.laurentian.ca">http://biblio-dev.laurentian.ca</a> - 12 GB of RAM, 2
dual-core CPUs - with 800K bibliographic records and pretty snappy search
performance. It's certainly nothing to sneeze at.</p>
</div>
<div class="section" id="scenario-2-one-database-server-one-network-server">
<h2>Scenario 2: one database server, one network server</h2>
<p>In this scenario, you purchase a database server and a network server.
We'll use the same specs from scenario 1 for the database server, and a CPU +
RAM-oriented server for the network server (disk access isn't a factor for
the network apps, so you just buy two small mirrored drives). The stock
higher education quote for a rack-mounted 1U network server with 2x4-core
CPU, 16GB RAM, and 2x73GB RAID 1 drives is approximately $5250.</p>
<p>This scenario will support development and testing, as well as enable you
to perform relatively representative stress-testing runs with a significant
number of simultaneous users.</p>
</div>
<div class="section" id="scenario-3-two-database-servers-two-or-three-network-servers">
<h2>Scenario 3: two database servers, two or three network servers</h2>
<p>In this scenario, you purchase two database servers - so that you can test
database replication and split database loads between search and reporting -
and two or three network servers to test different distributions of the
caching and network apps across the servers, to determine the configuration
that best meets your expected demands. The cost of the five servers adds up
to less than $30,000 - less than a single traditional proprietary UNIX server
- and would be less if you can negotiate a bulk discount.</p>
<p>The third scenario supports development and testing, and will give you
practical experience with a configuration that would approximate your
production deployment of servers. When you go live, you could move one of the
database servers and all but one of the network servers over to the
production cluster, and revert back to scenario one for your ongoing test and
development environment.</p>
</div>
<div class="section" id="the-conifer-approach">
<h2>The Conifer approach</h2>
<p>We opted to go with the third scenario to build a serious test cluster
for our consortium. However, the "scenarios as stages" approach ended up
being our strategy as our original choice of Dell servers came with RAID
controllers that do not work well under Debian. After returning the
servers to Dell, we were forced to press one of our backup servers into
service as a scenario-one style server while waiting for our new order
from HP to arrive.</p>
</div>
Progress with Project Conifer2008-03-27T02:15:00-04:002008-03-27T02:15:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2008-03-27:/progress-with-project-conifer.html<p>Project Conifer is the effort by McMaster University, University of
Windsor, and Laurentian University to put together a consortial instance
of Evergreen. <a class="reference external" href="http://conifer.mcmaster.ca/node/15">A few weeks back</a>,
we agreed that May 2009 would be our go-live date. So the clock is
ticking quite loudly in my ears.</p>
<p>Today I got an …</p><p>Project Conifer is the effort by McMaster University, University of
Windsor, and Laurentian University to put together a consortial instance
of Evergreen. <a class="reference external" href="http://conifer.mcmaster.ca/node/15">A few weeks back</a>,
we agreed that May 2009 would be our go-live date. So the clock is
ticking quite loudly in my ears.</p>
<p>Today I got an <a class="reference external" href="http://biblio-dev.laurentian.ca">Evergreen test
server</a> up and running, loaded with
the records from the consortium of Laurentian partners. I hit a few
bumps on the road, but eventually successfully loaded about 800,000
bibliographic records and about 500,000 items. I also turned on the
Syndetics enrichment data, so some items offer cover images, tables of
contents, reviews, and author information. The response time is pretty
snappy (it's running on a 4-core server with 12GB of RAM).</p>
<p>Things that made my task harder than it probably should have been:</p>
<ul>
<li><p class="first">yaz-marcdump generated invalid XML when I converted our MARC records
from MARC21 to MARC21XML format. Maybe this problem is fixed in later
versions of yaz-marcdump (I was using the stable Debian Etch version,
2.1.56, which is <em>crazy</em> old), or I could have tried
<a class="reference external" href="http://marc4j.tigris.org/">marc4j</a> or
<a class="reference external" href="http://oregonstate.edu/~reeset/marcedit/html/index.html">MarcEdit</a>
instead to try for better results, but I didn't, and it cascaded into
problems with...</p>
</li>
<li><p class="first">Dumping all of the holdings as part of the bibliographic records
threw things off when some of the records had so many holdings
attached (think a weekly periodical that a library circulates and
therefore each issue has its own barcode) that they spilled over
MARC's record length limit, resulting in multiple MARC records just
to hold the holdings - which causes some problems for the basic
import process. I eventually punted on trying to parse the MARC21XML
for holdings and just dumped the data I needed directly from Unicorn
in pipe-delimited format.</p>
</li>
<li><p class="first">Not tuning PostgreSQL <em>before</em> starting to load data into the
database was just plain stupid. The defaults for PostgreSQL are
incredibly conservative, and must be modified to handle large
transactions and to perform. Here are the tweaks I made for our 12GB
machine, starting with the Linux kernel memory settings:</p>
<pre class="literal-block">
# -- in /etc/sysctl.conf --# Set SHMMAX to 8GB for PostgreSQLkernel.shmmax=8589934592
</pre>
</p><pre class="literal-block">
# -- in /etc/postgresql/8.1/main/postgresql.conf --# Crank up shared_buffers and work_memshared_buffers = 10000work_mem=8388608 # 8 GB, equal to our kernel.shmmaxmax_fsm_pages = 200000
</pre>
</li>
<li><div class="first"></p></div><p>Evergreen depends on accurate fixed fields to determine the format of
an item. Unfortunately, many of our electronic resources appear not
to have been coded as such... so we have some data clean-up to do.</p>
<p></li>
</ul>
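<p>For anyone attempting the same conversion, the basic yaz-marcdump
invocation looks something like this - the option names below are taken from
recent yaz documentation, and the ancient 2.1.56 release may spell them
differently:</p>
<pre class="literal-block">
# convert binary MARC21 records (MARC-8 encoding) to MARC21XML in UTF-8
yaz-marcdump -f MARC-8 -t UTF-8 -i marc -o marcxml records.mrc > records.xml
</pre>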
<p>Ah well: as Jerry Pournelle used to say in his Chaos Manor column, "I do
these things so that you don't have to." Hopefully it makes a smoother
path for others to get to Evergreen.</p>
Evergreen Acquisitions at VALE's Next Generation Academic Library System Symposium2008-03-15T16:34:00-04:002008-03-15T16:34:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2008-03-15:/evergreen-acquisitions-at-vales-next-generation-academic-library-system-symposium.html<p>On Wednesday, I was fortunate enough to join a distinguished panel
of speakers and a crowded music hall at <a class="reference external" href="http://www.valenj.org/newvale/ols/symposium2008/">VALE's Next Generation Academic
Library System
Symposium</a> at <a class="reference external" href="http://www.tcnj.edu">The
College of New Jersey</a>. I had been invited to
present an update on the state of acquisitions support in Evergreen, as
well …</p><p>On Wednesday, I was fortunate enough to join a distinguished panel of
speakers and a crowded music hall at <a class="reference external" href="http://www.valenj.org/newvale/ols/symposium2008/">VALE's Next Generation Academic
Library System Symposium</a> at <a class="reference external" href="http://www.tcnj.edu">The College of New Jersey</a>. I had been
invited to present an update on the state of acquisitions support in
Evergreen, as well as to provide a brief overview of Project Conifer (the
collaboration between Laurentian University, McMaster University, and the
University of Windsor to create a consortial implementation of Evergreen).</p>
<p>To summarize what I intended to be the main points of my presentation
(which may or may not have come through in real life):</p>
<ul>
<li><p class="first">Project Conifer is an existing effort to create a shared consortial
implementation of Evergreen for academic institutions; we would be
delighted to have others join forces with us</p>
</li>
<li><p class="first">If acquisitions isn't as far along as we would have hoped by now,
it's because:</p>
<ul>
<li><p class="first">We (the Project Conifer institutions) haven't contributed enough
development resource to the effort thus far - although we are planning to
correct this problem in the near term by hiring one or more developers to
work on the requirements that we, as academic institutions, need for a
successful Evergreen experience. If you're interested in a position as an
Evergreen developer for Project Conifer,
<a class="reference external" href="mailto:dan@coffeecode.net">let's talk</a>.</p>
</li>
<li><p class="first">Creating an enterprise-grade acquisitions system demands much more
effort and attention to detail than creating a simplistic acquisitions
system that would be acceptable for a small library. If it took two years
to build Evergreen's circulation, cataloging, reporting, and OPAC
functionality from scratch, it's not unreasonable that it should take a
year or more to build an acquisitions system to the same standards as the
rest of Evergreen.</p>
</li>
</ul>
</li>
<li><p class="first">Evergreen acquisitions has made significant progress since December
2007, and at this pace we expect a complete set of basic functionality to
be in place by the end of April. By "basic functionality" I mean that the
manual acquisitions mode should be supported with a minimalist user
interface. MARC order record batch loading, EDI send/receive support, and a
more polished user interface will take some more time - probably
September-ish 2008. You can see the in-development, regularly updated
bare-bones interface at <a class="reference external" href="http://acq.open-ils.org/oils/acq/base/index">http://acq.open-ils.org/oils/acq/base/index</a>.</p>
</li>
</ul>
<p>I have to say that Equinox is making incredible progress considering that
they're still doing the bulk of the work with the same amount of development
resource that they had before Georgia PINES went live on Evergreen, and they
started their own company, and they started bringing BC PINES on line, and
they began receiving an onslaught of requests for visits and presentations
and conference calls... imagine what we could do with Evergreen, together, if
a few more sites or consortiums were able to devote human or financial
resources to enhancing Evergreen.</p>
<p>Here are my slides in
<a class="reference external" href="/uploads/talks/2008/Evergreen_acquisitions_VALE.odp">OpenOffice</a>
and
<a class="reference external" href="/uploads/talks/2008/Evergreen_acquisitions_VALE.ppt">PowerPoint</a>
format. If you're going to look at my slides, I highly recommend reading the
presenter notes that I wrote; I've recently realized that presenter notes are
as much for the benefit of a disconnected audience as they are useful
preparation material for the presenter. In the absence of a full paper on the
subject matter at hand, presenter notes should help flesh out the brevity
forced by slideware.</p>
<p>A huge thanks to Ed Corrado, Anne Hoang, and Kurt Wagner for making the
overall experience so enjoyable. I was honoured to be part of such a
high-quality panel of speakers.</p>
<p>Oh, and as an aside - the entire symposium was videotaped, and the
presentations and question and answer sessions will be made available from
the VALE Web site. I will update this post when those become available. I
wonder if Ed got this idea from code4lib... in any case, I certainly applaud
the initiative.</p>
<p><strong>Update:</strong> Umm, more polished acquisitions will likely be available in
Sept. 2008, not 2007... thanks to Brad Lajeunesse for pointing out that
time travel would be required to make that happen</p>
Evergreen workshop at code4lib 20082008-02-26T13:44:00-05:002008-02-26T13:44:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2008-02-26:/evergreen-workshop-at-code4lib-2008.html<p>Yesterday morning we (Bill Erickson, Sally Murphy <em>aka</em> "Murph", and I)
ran an <a class="reference external" href="http://open-ils.org/dokuwiki/doku.php?id=advocacy:evergreen_workshop">Evergreen
workshop</a>
(rough agenda, presentation, and links to associated resources from that
page) for the code4lib 2008 preconference session. My personal goals
were:</p>
<ol class="arabic simple">
<li>Walk people through a simple Evergreen install</li>
<li>Get a small set of bib records …</li></ol><p>Yesterday morning we (Bill Erickson, Sally Murphy <em>aka</em> "Murph", and I)
ran an <a class="reference external" href="http://open-ils.org/dokuwiki/doku.php?id=advocacy:evergreen_workshop">Evergreen
workshop</a>
(rough agenda, presentation, and links to associated resources from that
page) for the code4lib 2008 preconference session. My personal goals
were:</p>
<ol class="arabic simple">
<li>Walk people through a simple Evergreen install</li>
<li>Get a small set of bib records and holdings imported</li>
<li>Attract some more developers to the project by demonstrating how
seductively simple it is to add a new service to Evergreen at the
OpenSRF layer and then expose it in the catalogue or staff client</li>
<li>Show off some of the great features of Evergreen that haven't had
nearly enough exposure (reports, "fresh meat" feeds, exporter
interface)</li>
</ol>
<div class="section" id="problems">
<h2>Problems</h2>
<p id="problem1">Problem #1: I started organizing the pre-conference too late. To
save time on the install section, I asked attendees to prepare by
setting up a VMWare image or bootable Debian or Ubuntu partition and get
a bunch of the prerequisite packages installed ahead of time. But by the
time I sent my request out, the attendees only had a few days to prepare
- and many of them probably hadn't worked with VMWare before, so they
suddenly had another learning barrier to overcome. I wasn't too
surprised when only about 25% of the room had been able to "do their
homework".</p>
<p>Problem #2: I lost at least six hours of preparation time when, due to
my own stupidity, I left my passport in a hotel in Atlanta and ended up
having to drive across the border from Vancouver to Portland, Oregon.
Six hours, man... that's almost a full day thrown away, which is
critical when you've left things too late (see <a class="reference internal" href="#problem1">problem1</a>).
Continuing on the negative side, all I could listen to during the drive
was completely formulaic rock stations and political rhetoric worthy of
10-year-olds as I drove through Washington. If radio is a dying medium,
I have a very good idea why...</p>
<p>Problem #3: We ran into bizarre projector problems that, for some
reason, prevented us from being able to see our laptop screens at the
same time as the projected screen. This laptop worked fine with the
projector at the OLA Superconference just a few weeks ago, and Bill was
afflicted by the same problem - so it really put a crimp in my ability
to switch from the presentation to the live install image. My neck was
wrecked from constantly twisting around to peer up at the screen while
trying to do some minor mousing around.</p>
<p id="problem4">Problem #4: I severely underestimated how long the install
process would take when trying to support a whole group of people at
once; you're guaranteed to have a question on almost every step. When we
were preparing for the workshop, we had this idea that we would take a
hard line and spend no more than one or two minutes on each step - which
certainly would have saved a lot of time. But when you've made a
connection with the audience, and people have made it through the first
dozen steps, it suddenly becomes a lot, lot harder to simply abandon
them with the promise that you'll help them later. So we ended up
spending something like 2 hours on the install (including a break)
rather than the 45 minutes we had been aiming for.</p>
<p>Problem #5: We were overly optimistic about how much we could get done
in 2.5 hours. Even without the severe compounding of our time crunch by
<a class="reference internal" href="#problem4">problem4</a>, in retrospect it's clear we would still have been rushing through
all of the other pieces. I think we knew that anyways, but we were just so
excited about showing off Evergreen that we wanted to show off as much as
possible.</p>
<p>It's not really all that bleak though. There were successes, too.</p>
</div>
<div class="section" id="successes">
<h2>Successes</h2>
<p>Success #1: We have at least one person who successfully made it through
the install phase and who successfully imported the bib records and
holdings, and several others who feel they are <em>very</em> close to
finishing. I'm hoping that we can spend a few minutes over the course of
the conference to help them reach that finish line.</p>
<p>Success #2: We have a real example of <a class="reference external" href="http://open-ils.org/dokuwiki/doku.php?id=importing:holdings:import_via_staging_table">how to import
holdings</a>
into Evergreen now. This is something that people have been asking for
on the list, and I'm really happy to have been able to package up what
Mike Rylander provided with a set of sample records and a sample "parse
holdings" script that hopefully others will be able to adopt to their
own needs.</p>
<p>Success #3: I had feedback from a number of people who, even though they
weren't trying to go through the install, still felt it was worthwhile
getting an explanation of all the pieces that OpenSRF and Evergreen
depend on and how they fit together. I think it was clear that the
complexity involved in installing Evergreen isn't so much OpenSRF or
Evergreen themselves as it is a few finicky details involving networking
- largely ejabberd and Net::Domain's insistence on specific and
sometimes conflicting definitions of hostnames.</p>
<p>Success #4: Bill did get to quickly demonstrate <a class="reference external" href="http://open-ils.org/dokuwiki/doku.php?id=advocacy:evergreen_workshop#customizing_evergreennew_service">how to add a new
OpenSRF
service</a>
("reset my password and email it to me") and how to integrate that into
the catalogue. It was rough and dirty code, but at approximately one
page of Perl code and about 10 lines of JavaScript I think it was a
convincing demonstration of how easy it is to extend Evergreen.</p>
<p>Success #5: We have laid the groundwork for an Evergreen workshop now,
and having gone through the experience once we'll be able to refine the
concept for future events. One idea that we've already kicked around is
to split it into several tracks so that attendees can self-select what
they're interested in and so that we can give enough time to each
section. Say, two (or three) hours for an installfest; two hours for
"exploring the dark corners of Evergreen"; and two hours on developing
and extending Evergreen (OpenSRF, catalogue, staff client). Or we could
have spent the entire pre-conference day on Evergreen.</p>
</div>
<div class="section" id="reflection">
<h2>Reflection</h2>
<p>I think it might have been really cool if we had worked with LibraryFind
and Zotero to set up an ongoing theme throughout the three
pre-conference sessions. We could have collaborated on pre-requisites,
so that the LibraryFind install could go on top of the same image as the
Evergreen install, and then the newly installed Evergreen image could
have been added as a LibraryFind source during the LibraryFind
administration section. Then, during the Zotero session, Evergreen and
LibraryFind could have been added as new sources for capturing citation
information (by making Evergreen and LibraryFind generate COInS objects
that Zotero understands or giving Zotero the ability to understand the
various formats that Evergreen offers via unAPI).</p>
<p>Of course, it also would have required a heck of a lot of pre-conference
planning. A suggestion I would make for next year's pre-conference
organizers would be to communicate as much as possible ahead of time to
set expectations and help your attendees determine what your agenda
should be. We could have just thrown out the entire Evergreen install
section, had people get comfortable with a pre-installed VMWare image ahead of
time, and focused most of the session on developing and exposing OpenSRF
services, for example, if that's what our attendees wanted.</p>
</div>
<h2>The State of Evergreen: OLA Presentation (2008-02-02)</h2>
<p>Well, despite getting less than four hours of broken sleep before my
9:00 am presentation, I think I successfully delivered an update on
<strong>Evergreen: State of the Open ILS</strong> to approximately 45 people at the
<a class="reference external" href="http://www.accessola.com/superconference2008">OLA Super Conference</a>
today. There were some great questions from the audience that kept me on
my toes. Thank heavens David Fiander was there to provide colour
commentary and solid advice. Overall, the talk seemed to be well
received.</p>
<p>Perhaps the most pleasant surprise of the session was when I discovered
that one of the libraries close to my old home town has been working for
the last six months on migrating to Evergreen. Marvelous!</p>
<p>If you want the slides from my presentation, I've licensed them under a
Creative Commons 2.5 By Attribution license. Presentations available
below in two different formats:</p>
<ul class="simple">
<li><a class="reference external" href="/uploads/talks/2008/TheStateofEvergreen.odp">Evergreen:State of the Open ILS (OpenOffice
Impress)</a></li>
<li><a class="reference external" href="/uploads/talks/2008/TheStateofEvergreen.ppt">Evergreen:State of the Open ILS (Microsoft
PowerPoint)</a></li>
</ul>
<h2>As if you didn't see it coming... (2008-01-08)</h2>
<p>My employer, Laurentian University, <a class="reference external" href="http://laurentian.ca/Laurentian/Home/News/Evergreen+library+system+08jan08.htm">issued a press
release</a>
today announcing that we have selected
<a class="reference external" href="http://open-ils.org">Evergreen</a> as our future library system. I
wrote more about this on the <a class="reference external" href="http://open-ils.org/blog/?p=111">Evergreen
blog</a>, but what I didn't say was ...
yay!</p>
<p>We still have a long road ahead of us, but knowing that we'll be
migrating to a system that I can poke with a sharp stick and make it do
my bidding goes a long way towards making me feel warm and fuzzy inside.</p>
<p>I predict that we'll see a few more announcements from universities and
colleges in North America joining the Evergreen development effort /
adoption process in 2008. Outside of Ontario, I know about the
<a class="reference external" href="http://blog.benostrowsky.com/2007/10/04/my-new-job-horizon-vs-evergreen-cage-match/">University of
Utah</a>'s
interest and interest from a <a class="reference external" href="http://valenews.wordpress.com/2007/12/03/conference-agenda/">New Jersey consortium of academic
institutions</a>
(see session h. "Open Library Systems and NJ: From Vision to
Transformation")... are there other academics who have made public
statements of interest in Evergreen that I'm missing out on?</p>
<h2>Ain't no way to treat your CODI (2007-11-16)</h2>
<p>Wow. Eileen R. Kontrovitz, a board member of CODI (Customers of Dynix,
Inc.) wrote, as part of <a class="reference external" href="http://letterman.oplib.org/blog/?p=26">her summary of the recent CODI
conference</a>:</p>
<blockquote>
Many very nice things happened at the conference but the buzz, the
thing everyone who was not there wants to hear about is the
unannounced, invitation only meeting with Martin Taylor from Vista
Equity Partners about open source. Mr. Taylor began the meeting in a
very cordial way and with a charm about him that belied the basic
message he proceeded to give. That message was, we know more than
you do and if you don't like it you can go to some other "happy
place". He actually said that on more than one occasion.</blockquote>
<p>I guess Vista feels quite threatened by open source library systems,
even though those systems currently account for <a class="reference external" href="http://lisnews.org/node/22251">only 1% of the US public library
market</a> today, if Vista is willing to
have their top brass lay on the fear, uncertainty, and doubt campaign to
their top customers behind closed doors. It also seems that Mr. Taylor's
tactics have backfired, at least in this case; after praising the common
SirsiDynix employees, Eileen goes on to say that Vista's attitude has
almost ensured that her library (Ouachita Parish Public Library) will
not move to Symphony. This, even though Eileen says:</p>
<blockquote>
...I happen to agree with his assessment of open source for a
library ILS at this point in time and the many problems with the
very nature of the beast without some kind of regulating body...</blockquote>
<p>Perhaps this should be the subject of another blog post, but I believe
there are actually a number of ways Evergreen is regulated. First is that
an open source project is not an "anything goes" project: the committers
for the project act as a level of quality control for Evergreen. If code
doesn't further <a class="reference external" href="http://open-ils.org/mission.php">the Evergreen
mission</a> by contributing towards
stability, robustness, flexibility, security, or user-friendliness, then
it's simply not going to go into the project proper. Second is that at
least one company (<a class="reference external" href="http://esilibrary.com">Equinox</a>) is staking its
success on Evergreen, and others are starting to build up business
around Evergreen. They're not going to sit back and rest easy; they know
that they have to enhance Evergreen beyond its current core strengths if
they want to build inroads into markets like academic libraries. Third
is that Evergreen's open source license also acts as a regulator - the
code that comprises Evergreen can never be pulled from the market, so
the future of Evergreen is always in the hands of the community.</p>
<h2>Laurentian goes ever greener (2007-10-12)</h2>
<p>This is slightly in advance of our official press release, which is
currently in translation, but I will be giving / have given a lightning
talk at Access 2007 on this subject and have decided to make the
following materials available:</p>
<ul class="simple">
<li>Report: <a class="reference external" href="http://www.coffeecode.net/uploads/reports/Assessing_Evergreen.pdf">Assessing Evergreen for an academic bilingual
library</a></li>
<li>Evergreen Business Readiness Rating:
(<a class="reference external" href="http://www.coffeecode.net/uploads/reports/BRR-Evergreen.ods">OpenOffice</a>)
(<a class="reference external" href="http://www.coffeecode.net/uploads/reports/BRR-Evergreen.xls">Excel</a>)
- see <a class="reference external" href="http://openbrr.org">OpenBRR</a> for more information about the
Open Business Readiness Rating templates</li>
<li>Presentation: <a class="reference external" href="http://www.coffeecode.net/uploads/talks/2007/Ever_greener_at_LU.pdf">Lightning talk: Going Ever Greener at Laurentian
University</a></li>
</ul>
<h2>Committing to Evergreen (2007-09-09)</h2>
<p>Yesterday, over on the Evergreen blog, Mike announced that I am now <a class="reference external" href="http://open-ils.org/blog/?p=100">a
full committer</a> to the Subversion
repository for Evergreen. (It was blog post #100 for Evergreen, by the
way - two milestones in one!). The road to getting here was pretty
standard fare for open-source projects: submit patches that do useful
things (like simplify build processes or add i18n support); listen to
feedback about those patches and incorporate those lessons learned into
the next patches; and repeat, as described in Evergreen's <a class="reference external" href="http://open-ils.org/documentation/contributing.html">contribution
process</a>:</p>
<blockquote>
<p>From time to time, and as individual community members become more
familiar and skilled with the complete codebase of Evergreen, some
individuals may be asked to join the core team. We see this as both
an honor and a responsibility, as this group is charged with being
the final quality control mechanism for the source code, as well as
helping other less experienced community members come up to speed.
It is not simply a way to get code into Subversion, but also about
mentoring new contributors and helping to keep the overall vision of
the project in focus, tempered by the history and evolution of the
code and lessons learned from past successes and failures.</p>
</blockquote>
<p>I'm not just tooting my own horn, here. I think it's important to
emphasize that the Evergreen community is healthy, welcoming to
newcomers, and growing. I am honoured to join the Evergreen team (as
Mike says, "again"), this time as a committer - and I look forward to
helping the Evergreen community continue to grow.</p>
<p>If you're interested, there are plenty of ways to help us - through your
contribution of use cases, documentation, graphics and design, patches,
translations, testing... Hmm. I'm <a class="reference external" href="/archives/137-Open-source-in-libraries-community-strength.html">repeating
myself</a>
a bit here <img alt=":-)" class="emoticon" src="/images/smile.png" /> See you on the lists / IRC!</p>
<h2>Open source in libraries: community = strength (2007-08-31)</h2>
<p>Karen G. Schneider has a great post on <a class="reference external" href="http://www.techsource.ala.org/blog/2007/08/enterprise-open-source.html">Enterprise Open
Source</a>
on the <a class="reference external" href="http://techsource.ala.org">ALA TechSource</a> blog:</p>
<blockquote>
<p>But the truly significant activity in LibraryLand technology hasn't
been vendor-driven. It has been the maturation of what I call
"enterprise open source": products such as Evergreen and Koha that
are robust, well-implemented library automation packages with strong
development communities and equally strong funded-support models.</p>
</blockquote>
<p>Hear hear! Karen examines the value of open source, and finds that it's
not so much in that it's a lower-cost alternative (although that can be
a persuasive argument), and not so much that you have the ability to
modify the code (although that can also be a persuasive argument), but
that it depends on the strength of the community to continue to exist
and improve. And that makes it a very good match for libraries, because
we seem to do "community" better than most other industries.</p>
<p>So let me take a different tack than Karen, and assume that if you've
read this far that you're interested in supporting open source for your
library, but maybe you don't have a programmer on staff. How can you
help?</p>
<p>Well, there are many ways other than programming to contribute to an
open source community. <a class="reference external" href="http://open-ils.org">Evergreen</a>, for example,
just posted a call on its development mailing list for <a class="reference external" href="http://list.georgialibraries.org/pipermail/open-ils-dev/2007-August/001696.html">help in defining
and prioritizing the requirements for its acquisitions and serials
modules</a>.
If you have experience with these areas, and have blue-sky ideas for how
you could build a better system, this is a great opportunity to step
into the conversation. There's a bit of a parallel here to proprietary
systems, although with a proprietary system it's called a "request for
enhancement" and most of those tend to get filed in the distant future.
With Evergreen's invitation for discussion, you <em>know</em> the developers
are happy to listen to the ideas for making the best possible product.
They don't have prior baggage holding them back, so they really can
start from square one - and they have a huge incentive to do better than
the existing options, because they want to convince you to pick
Evergreen the next time you're thinking about your next ILS.</p>
<p>Or you can contribute your hard-won experience and knowledge to the
documentation wiki.
<a class="reference external" href="http://open-ils.org/dokuwiki/doku.php">Evergreen</a> and
<a class="reference external" href="http://wiki.koha.org">Koha</a> both have wikis to which you can
contribute. Interestingly, there is a parallel here to at least one
proprietary vendor, which set up a wiki (behind a password-protected
site) after many requests from their user group. It boggles <em>my</em> mind,
but some of these same customers have also argued that they (the
customers) should pool their efforts and write a new set of manuals for
the product for which they are paying support and licensing fees. I'm
sure the Evergreen and Koha projects would really appreciate your
assistance in writing a good set of manuals, and they won't charge you
for the privilege, either.</p>
<p>You can also participate simply by joining in the conversations on the
mailing lists or chat rooms (#OpenILS-Evergreen on Freenode for
Evergreen, #koha on Freenode for Koha). You'll take some time to get
familiar with the products, no doubt, but once you've climbed over the
brick walls (with the help of the others on the mailing list), you will
have the opportunity to pay it back a dozen times over as others face
the same walls that you faced. I see this same principle on our
proprietary vendor's mailing lists. The customers do a far better job of
supporting each other than the vendor to whom they're paying support and
licensing fees.</p>
<p>And it feels good, working together to build something that belongs to
an entire community. For the little bits that I've been able to do, I
get a huge sense of satisfaction. It's a nice little addiction.</p>
<h2>Wrapping up the AcqFest (2007-07-24)</h2>
<p>Well, I'm finally back from Atlanta and the Evergreen AcqFest. I'll
apologize right off the top for not providing more blog updates over the
course of the weekend, but the requirements and design discussions were
pretty intense so I didn't want to risk continuously missing subtle but
important details and not being able to participate intelligently by
live-blogging the event. After each full day of work, we "unwound" with
a serious meal--which, after some socializing, usually involved slipping
back into kicking more design and implementation ideas, problems, and
potential solutions around. By the time I got back to the hotel room, I
was either completely wiped out, or itching to commit something to the
group document or play with some code. So I hope you understand (all
three of you that are reading this!).</p>
<p>On top of everything else, it was a bit of a gruelling trip back. In
order to save a few hundred dollars and make a greener transportation
choice for the final leg of my journey, I took the bus back to Sudbury.
Hello, five-hour layover in Toronto and a packed six-hour bus ride
(thanks to Hwy 401 construction) back home! It was quite a relief to get
back and see the family.</p>
<p>Anyways, here is a mini-summary of what we accomplished:</p>
<ul class="simple">
<li>Agreement on some realistic time frames for acquisitions and serials
development: call the stages one philosopher, two philosopher, and
three philosophers</li>
<li>After kicking around the left-of-field idea of using a calendar server
to handle serials schedules, coverage information, predictions, and
claiming events for a day or so, and clearly provoking the concern of
at least one library blogger, Mike had a brilliant idea for how to
represent all of this natively in PostgreSQL. He's going to take a
few days to work through a proof-of-concept to ensure that it's as
solid as it sounds, so I won't give away the details just yet...</li>
<li>Agreement on the requirements for basic item-at-a-time acquisition
workflow support (to be implemented first) and more advanced
acquisitions support (batch orders via MARC record import, integrated
vendor discovery API support, EDI support)</li>
<li>Agreement on adding internationalization support to OpenSRF. Right
now OpenSRF (the messaging infrastructure on which Evergreen depends)
knows nothing about locales. We've been able to use URL tricks to
support translation of the catalog interfaces thus far, but Mike
worked through the changes that will be required to pass locale as a
property of each session. This will enable the service being invoked
to "do the right thing" if locale is of a concern to whatever output
it returns.</li>
<li><em>Almost</em> agreement on how to add internationalization support to
Evergreen. We worked through a number of different scenarios for
supporting translation of dynamic strings (library names, for
example) that reside in the database, from a single table that holds
all of the translated strings, to an i18n schema that holds tables
that parallel any table in another schema that holds translatable
content. We settled on the latter approach (see the sketch after this
list). I say "almost agreement"
because until something gets committed to code, I have a feeling that
this is still subject to change a little bit <img alt=":-)" class="emoticon" src="/images/smile.png" /></li>
<li>Exposure to some parts of Evergreen that many of us hadn't seen
before -- in particular, the reporting interface that Evergreen
provides is extremely powerful and well-designed. It even supports
basic line and bar charts for adding punch to your presentations.</li>
<li>Art introduced us to OFBiz and OpenTaps via an online training video,
and later on Art and Ed successfully played around with the Java
OpenSRF client via BeanShell. My takeaway lesson about OFBiz if I
ever need to customize something built on it: it's all in
controller.xml!</li>
</ul>
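<p>
To make the parallel-table idea concrete, here is a minimal sketch of what
such an i18n schema might look like in PostgreSQL. To be clear, this is my
own illustration and not the design we actually committed; the schema,
table, and column names here are assumptions for the sake of the example:
</p>
<pre>-- Hypothetical source table holding a translatable string in the
-- default locale.
CREATE SCHEMA example;
CREATE TABLE example.org_unit (
    id   SERIAL PRIMARY KEY,
    name TEXT NOT NULL
);

-- A parallel table in an i18n schema: one row per translated string,
-- keyed by the source row and a locale code.
CREATE SCHEMA i18n;
CREATE TABLE i18n.org_unit (
    id     SERIAL PRIMARY KEY,
    source INTEGER NOT NULL REFERENCES example.org_unit (id),
    locale TEXT NOT NULL,   -- e.g. 'fr-CA', as passed on the session
    name   TEXT NOT NULL,   -- translated value of example.org_unit.name
    UNIQUE (source, locale)
);

-- Retrieve the name for org unit 1 in the session locale, falling back
-- to the default-locale value when no translation exists:
SELECT COALESCE(t.name, o.name) AS name
  FROM example.org_unit o
  LEFT JOIN i18n.org_unit t
    ON t.source = o.id AND t.locale = 'fr-CA'
 WHERE o.id = 1;</pre>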
<p>We covered a lot more than that, but for now I have to
get some shut-eye. For any skeptics out there, the actual acquisitions
and serials workflows employed at our constituent libraries were used as
testbed scenarios for the discussions about Evergreen's serials and
acquisitions requirements and design. I'm feeling good about the work we
accomplished, I think we found some elegant solutions for some of the
age-old problems in these areas, and I think we have a common
understanding of the path forward.</p>
<h2>On the road again: Evergreen acqfest (2007-07-18)</h2>
<p>So I'm taking off tomorrow for Atlanta to spend four days deeply
immersed in discussing, designing, planning, and implementing
Evergreen's <a class="reference external" href="http://open-ils.org/dokuwiki/doku.php?id=scratchpad:acq_serials">acquisitions and
serials</a>
support. At least that's the plan. In our spare time (heh), we're going
to tackle the internationalization infrastructure as well. The spirit of
the event is modelled loosely after the Access conference hackfests, and
has therefore been dubbed <strong>Acqfest</strong>. Unlike hackfest, however, where
the journey itself is usually the goal, with Acqfest there's a much
stronger emphasis on actually getting things done. I may, err, <em>acquire</em>
a slight southern accent after a few days, but I mostly hope to increase
my understanding of Evergreen while kicking in some design suggestions,
code, and documentation here and there.</p>
<p>Laurentian is covering my travel--at this point in our evaluation of the
future of our systems, it's in the library's interests to give me an
opportunity to stare deep into the heart of Evergreen--and my local
arrangements are being covered by BC Public Libraries, Georgia PINES,
and Equinox Software. I'm contributing my time and, uh, expertise. All
round, I think the whole community is going to benefit from the
Evergreen Acqfest. Assuming I have a few minutes, I'll try to post some
updates on our progress over the next few days.</p>
<h2>Evergreen 1.2.0-rc1 is out! And so is the Gentoo VMWare image... (2007-07-07)</h2>
<p>So, yesterday afternoon Mike Rylander from the Evergreen (a.k.a.
Open-ILS) project pushed out the <a class="reference external" href="http://open-ils.org/blog/?p=96">first release candidate of Evergreen
1.2.0</a>. Hurrah! If you tried
installing Evergreen before, but got hung up on some of the build,
install, or configuration steps, I think you'll find this release a lot
easier to deal with. For example, there's one less configuration file to
deal with now -- bootstrap.conf is a thing of the past.</p>
<p>I'm happy to point out that I've updated my Gentoo-based VMWare image of
Evergreen as well: <a class="reference external" href="http://open-ils.org/~denials/Evergreen_1.2.0-rc1_Gentoo_x86.zip">Evergreen
1.2.0-rc1</a>
(479M). Along with that, I've updated my instructions for <a class="reference external" href="http://open-ils.org/dokuwiki/doku.php?id=installing_prerequisites_on_gentoo">installing
Evergreen on
Gentoo</a>
to reflect the newer, simpler install process.</p>
<h2>Know your sources: Evergreen / Koha comparisons (2007-06-24)</h2>
<p><strong>Correction update: 2007/06/26</strong> Wow. I am incredibly embarrassed.
Somehow, I made a very stupid mistake in my summary of the State Library
of Ohio ILS Options Discussion Meeting Minutes - April 24, 2007. The
mistake was that I incorrectly attributed Joshua Ferraro of LibLime with
making statements about Evergreen at that meeting when he was not even
present. All of the statements about Evergreen should have been
attributed to Stephen Hedges. I apologize profusely to Josh for this
mistake, and will repeat this correction and apology in the section of
the blog entry closer to the text.</p>
<p>I have been in the process of gathering information about the possible
future of our library system, with a focus on SirsiDynix's Rome, Evergreen,
and Koha, for a number of months now. This results in having to sift through
claims from a number of different sources about the capabilities (present
and future) for all of these systems. In the context of a recent post on the
open-ils.org blog (<a class="reference external" href="http://open-ils.org/blog/?p=90">Lies, Damned Lies, and Library Automation
Software</a>),
as well as all of the shilling that will undoubtedly be going on on the
exhibit floor and in the hospitality rooms of ALA, and finally building on
Karen Coombs' post on <a class="reference external" href="http://www.librarywebchic.net/wordpress/2007/06/22/bias-objectivity-and-authority/">Bias, Objectivity and
Authority</a>, I would like to make a point that ideally shouldn't need to
be made (especially for librarians!), but sadly seems to be necessary in the
context of discussions about Evergreen and Koha.</p>
<p>The point? "Know your sources." And "Check your facts." When you've been
given information about</p>
<p>something, you don't blindly accept the information as given - you check
the</p>
<p>references and determine the authority of the source. This is par for
the</p>
<p>course for reference librarians educating patrons performing research in
their</p>
<p>libraries, but oddly enough seems to be a common blind spot when it
comes to</p>
<p>performing evaluations of the software that powers your libraries.</p>
<p><strong>Problem #1</strong>: Evaluating information about a given product
from the company or organization that stands to benefit from your adoption
of that product.</p>
<p>This is especially hard when dealing with companies offering proprietary
products that won't hand you an evaluation copy to try out in your own
organization; or that run closed mailing lists; or that don't make their
documentation or support infrastructure openly available.</p>
<p>But it can also be hard with companies or organizations offering open
source products or support for open source products. The company or
organization may point you at an online demo of their product, but that demo
may reflect a heavily customized, bleeding-edge version of the product that
mere mortals cannot install - or even more insidiously, may include code
that is not currently included in the open source repository.</p>
<p>The good news on the open source front is that independent contributors
have made VMWare images of some of the most popular library systems
available (<a class="reference external" href="http://open-ils.org/cvs.php">Evergreen</a>,
<a class="reference external" href="http://kylehall.info/index.php/projects/koha/koha-virtual-appliance/">Koha</a>)
for download that reflect a standard install of the product directly from
the open source repository.</p>
<p>Note that a common approach to marketing a product is to provide a
feature list - basically a checklist of features. A naive decision maker
might assume that more is better, which often results in products breaking
down the features that they do well into many sub-features. It's a form of
checklist inflation - but as long as you've got your eyes open, at least
it's more information rather than less. For each feature you're actually
interested in, you have to ask a couple of additional questions:</p>
<ul>
<li><p><em>How many other libraries are currently using this feature?</em>
It may be great that the software you're looking at includes LDAP
authentication as an option, but if there's only one other library using the
feature, it's unfortunately likely that they will be using a different LDAP
directory product than you (Novell eDirectory vs. MS ActiveDirectory vs.
OpenLDAP vs. IBM Directory) and they will be using it in a different
way.</p></li>
<li><p><em>Is this feature part of the base package, or is it an optional
extra that's going to cost me more?</em> Not such a problem with the open
source options, although it depends in that case on whether you're buying
commercial support for the product. The model that the company uses may be
all-inclusive or a menu of different costed support options.</p></li>
<li><p><em>Is this a massive feature that hasn't been broken down into
sub-features?</em> The danger here is that, for the purposes of looking good
in feature comparisons, a product may have added a number of "features" that
really just scratch the surface of what comparable products offer.</p></li>
</ul>
<p>For example, if a library systems product says it has a serials module
and an acquisitions module, you have to dig into what that really means.
Does the "serials module" just mean that it will spit out a routing list
whenever you check in a new issue of a given serial? Or does it mean that it
handles predictions, claiming, holdings, etc., in the way that meets your
library needs? By "acquisitions module", does the product mean that it
simply records the cost of each item that you have acquired? Does it allow
you to make on-order items visible in the catalogue, with the ability to
place holds? Does it include EDI capabilities? Does it provide a complete
fiscal management system with funds and reporting and electronic record
import / export hooks for the ERP system that your university or
municipality uses, so that costs and invoices don't have to be manually
entered multiple times in multiple systems?</p>
<p>Perhaps most importantly, does the system have the flexibility to adapt
to your needs, or does the system require you to adapt to its needs? Can you
live with the 80% of functionality that most sites need, or does your site
live in the long tail of requirements? In the case of serials, for example,
do you need the ability to specify any pattern, or can you just treat
irregular patterns as exceptions?</p>
<p><strong>Problem #2</strong>: Evaluating information about a given product
from a company or organization that offers a competitive product.</p>
<p>Sales people make it their business to know their competitors so that
they can accomplish two goals:</p>
<ol class="arabic">
<li><div class="first"></p></div><p>Focus attention on (and often embellish) their own product's
strengths,</p>
<p>and know how to spin responses to their own product's weaknesses that
might</p>
<p>be identified by a competitor.</p>
<p></li>
<li><div class="first"></p></div><p>Identify the weaknesses of their competitor's product, particularly
when</p>
<p>their own product has comparable strengths in that same area. Note
that these</p>
<p>weaknesses don't necessarily have to be real, they just have to be
believable</p>
<p>and hard to disprove.</p>
<p></li>
</ol>
<p>Among the proprietary options, without having access to a hands-on test
system, the system documentation, or the product mailing lists, it is
incredibly hard to verify claims about a product's strengths or weaknesses.
Even for claims about the future development of a proprietary product that
you already have access to, the company cannot be held liable if plans
change. Companies can, for example, cancel an entire product even after beta
versions of the product have been released into the wild for "development
partners." Horizon 8, anyone?</p>
<div class="section" id="development-partners-test-our-beta-for-us">
<h2>Development partners: test our beta for us!</h2>
<p>Oh - on the topic of "development partners" - this is typically a
euphemism for "we'll give you a discount on product XXX if you put it into
production and report all the bugs you find." Companies love this approach
because it gives them visibility in the marketplace ("Look, we have already
deployed product XXX to five sites! It's proven and ready for you!") while
enabling them to effectively continue development on product XXX and hope to
have a polished product ready in time for the bulk of their potential
customer base to actually adopt it. In the past, Microsoft very effectively
used the "product announce" to prevent customers from purchasing a
competitor's product that offered compelling features, stalling the decision
long enough to then develop and bring their own product to market.</p>
</div>
<div class="section" id="disinformation-and-open-source-projects">
<h2>Disinformation and open source projects</h2>
<p>Surprisingly, the disinformation approach works even in open source
projects. For example, I have read and heard claims about Evergreen like:
"Oh, Evergreen is just for massive consortiums / it needs 40 servers to run
/ it doesn't scale down to just a single library." You can see how this
could be believable if you don't push too hard on the claims, because
Evergreen was developed for a consortial library system and much has been
made about the impressive server cluster that GPLS runs Evergreen on --
however, having run Evergreen on a single VMWare machine on my laptop, I can
personally attest to its ability to scale down to a single server (or
portion thereof). And you can run that same VMWare image on your own laptop
or spare desktop machine and disprove that claim yourself; but many of the
decision makers do not have the technical skills, time, or interest to get
hands-on with products like Evergreen. So they have to trust what they read
or hear, hopefully from the most trustworthy of sources.</p>
<p>Another swipe at Evergreen is that it is not a true open source project;
that its history as a top-down project initiated by GPLS means that there is
no real development community around Evergreen. If you've followed the
Evergreen development mailing list, you wouldn't believe a claim like this,
and you would proclaim it a blatant lie. To disprove this claim, you just
need to browse through the open-ils-dev mailing list and look for the emails
with the subject keyword "PATCH" and you'll see that some of us have indeed
been contributing patches to the source code. Beyond that, you'll also see
that there are many volunteer contributors for install and configuration
support, documentation, and the creation of VMWare images. So how could
someone make such a claim about Evergreen and get away with it?</p>
<p>It's all about trusting "authorities", not checking sources, and
integrity (or perhaps a lack thereof). Here's an excerpted quote about
Evergreen from the State Library of Ohio ILS Options Discussion Meeting
Minutes - April 24, 2007:</p>
<blockquote>
<p>The documentation for the process is very poor, which is typical because
it is the last thing developers are thinking about. ... The source code is
open but they don't really follow the "playground" rules for the open
source production process.</p>
</blockquote>
<p>Here's where you need to really know your sources and check your
references. Note that the claims about the nature of Evergreen as not being
a true open source project are credited to the introductory speaker, Stephen
Hedges. Who is Stephen Hedges? He was the director of Nelsonville Public
Library (NPL) when he worked with Joshua Ferraro to install Koha as the NPL
integrated library system. In addition, he is listed as the contact for Koha
documentation submissions. It seems, then, that he has a fairly significant
personal stake in the success of Koha, and if the meeting minutes accurately
capture his statements about Evergreen, it sounds like he was interested in
dissuading attendees from seriously considering Evergreen as an option.
<em>Subjectivity alert</em>: as one of the volunteer contributors of code,
documentation, install assistance, and a VMWare image of Evergreen from
outside GPLS, this quote got me pretty hot under the collar; I've
contributed to other open source projects, such as the Linux
Documentation Project and PHP, and you always have to prove that you
understand the project before being granted commit access.</p>
<p><strong>Correction update: 2007/06/26</strong> Wow. In the following paragraph, I
somehow made a very stupid mistake by incorrectly attributing Joshua
Ferraro of LibLime with making statements about Evergreen at that
meeting when he was not even present. All of the statements about
Evergreen should have been attributed to Stephen Hedges. I apologize
profusely to Josh and LibLime for this mistake.</p>
<p>Who was Stephen introducing as the guest speaker of honour on the
subject of open source ILS options in libraries? The speaker was Joshua
Ferraro, president of LibLime, the company best known for offering
commercial support for Koha. LibLime did announce that they would offer
commercial support for Evergreen, and have added sections about Evergreen to
their Web site, so it would on the surface seem to be a logical choice to
invite a LibLime employee as a one-speaker-fits-all host to cover both Koha
and Evergreen. However, LibLime has a rather unusual relationship with
Evergreen. It seems that LibLime has positioned Evergreen among their other
offerings as such a high-end product that only a handful of potential
customers would qualify for that market:</p>
<blockquote>
<div class="line-block">
<div class="line"><strong>Evergreen</strong></div>
<div class="line">For consortia who need:</div>
<div class="line">* Scalability to hundreds of libraries, tens of millions of
records</div>
</div>
<p>[<a class="reference external" href="http://liblime.com/products">LibLime Products</a>]</p>
</blockquote>
<p>It sounds impressive, but way too high-end for the vast majority of
libraries. So of course people browsing the LibLime Web site will focus on
the Koha options instead. It seems like a deliberate bait-and-switch move to
attract libraries interested in Evergreen after the successful launch in
Georgia, but to get them to buy support for Koha instead. Consider: LibLime
has not contributed a single patch to the Evergreen development
(open-ils-dev) mailing list. LibLime has not contributed a single line of
documentation to the Evergreen wiki. LibLime does not include Evergreen
among their demos. LibLime hasn't made an Evergreen sale. So I think it's a
fair question to ask how committed LibLime really is to Evergreen - is
LibLime's claim to support Evergreen just a means to get people in the door,
in hopes that they'll walk out with a copy of Koha under their arms? I think
so. You can come to your own conclusions.</p>
<p>In case you think that Ohio quote was just an unfortunate one-off, and
that I'm making a big deal about nothing, here's a more recent quote from
the Open Source Session Q&amp;A of the "Everything You Ever Wanted to Know
about Open Source" conference held on June 6th, 2007 that caught my
attention (and which apparently no-one in attendance at the meeting was
capable of providing a rebuttal to):</p>
<blockquote>
<div class="line-block">
<div class="line">Q. Contrast Koha &amp; Evergreen?</div>
</div>
<p>A. Major difference: Koha was grassroots: started w/rural libraries,
distributed organization, bottom-up decision making. Evergreen:
PINES library system; top-down decision making. Koha: 800 libs
worldwide, 8 years old; Evergreen: 1 year old, 1 consortium.</p>
</blockquote>
<p>So there's the comment about the "top-down" nature of Evergreen again,
and this time Evergreen is being attacked for being immature and not very
widely used. (Note: on that very day, the British Columbia Ministry of
Education announced the <a class="reference external" href="http://pines.bclibrary.ca/">BC PINES Website</a> - so
another consortium is getting on board the Evergreen express.) If there
really are 800 libraries using Koha, I'm shocked at how many basic install,
config, and runtime problems are being reported on the Koha mailing lists
with the current 2.2.9 release... but I'm getting off-topic. The speaker was
<del>once again</del> Joshua Ferraro, who:</p>
<blockquote>
<p>... talked with us about open source integrated library systems,
specifically Koha and Evergreen, and about his company, LibLime...
[<a class="reference external" href="http://blogs.umass.edu/ealling/2007/06/06/open-source-session-reflections/">reference</a>]</p>
</blockquote>
<p>If your definition of "talking about" is "praise the product that pays
your bills and criticize the product that represents a major threat", then
mission accomplished. You can't blame the speaker for being in a perfect
position to pitch his product at the expense of a competing product, while
being credited with being an objective authority on both products. But I
suspect the audience actually wanted a balanced presentation about the two
products.</p>
<p>So what's my point? <em>Know your sources</em>. If you invite someone to
speak on a broad topic, such as the State Library of Ohio meeting, where
"[t]he invitation was expanded to include any library interested in the
possibility of open source integrated library systems (ILS)", you might want
to ensure that any personal biases are very much out in the open for your
audience (both the in-person audience and the audience reading the meeting
minutes at home). If you're the speaker in such a situation, you should
reveal any such biases.</p>
<p>If you're a company selling a product or services related to a product,
perhaps it's inevitable that the profit motive is going to override ethics
in such opportunities - but I can dream. If you read the full minutes from
the State Library of Ohio meeting, you can see that in terms of an open
source ILS option, Evergreen is given only the most cursory coverage and the
major focus is on selling Koha.</p>
</div>
<div class="section" id="getting-a-fair-comparison">
<h2>Getting a fair comparison</h2>
<p>For a fair comparison of Koha and Evergreen, please consider either
hosting two separate presentations (you wouldn't consider asking SirsiDynix
to give a balanced presentation on all of the proprietary ILS options, would
you?), or try to find an independent speaker who can provide a more
objective analysis of the products at hand. Ask the speaker if they have any
financial ties to the products at hand. Heck, has anybody asked Marshall
Breeding and Andrew Pace if they've had any financial ties to ILS companies?
I assume the answer is no, but our community relies so much on their
analysis of the overall library systems landscape, with so much financial
implication for the companies in question, that it would be comforting to
have a positive assertion accompany any "state of the ILS landscape"
articles in the future.</p>
<p>Ideally, you would find a member of the development community for each of
Evergreen and Koha. At the moment, I'm afraid that I can only qualify as a
member of the Evergreen community, but I plan to become more familiar with
Koha's codebase over the course of the summer - so maybe I can grow into
that position. Of course, then you would have to trust me. C'mon, you can
trust me! <em>grin</em></p>
</div>
<div class="section" id="make-technology-not-war">
<h2>Make technology, not war</h2>
<p>What would I like to avoid? I would really like to avoid negative energy
being invested in a Koha vs. Evergreen or LibLime vs. Equinox battle royale.
That doesn't interest me, but I'm sure it greatly interests the companies
offering proprietary products. Instead, I hope that this energy can continue
to be invested in making both Koha and Evergreen better by those with the
technical skills. Let's have a competition on product design and
implementation, rather than on marketing spin or dirty tricks. Everyone
benefits from strong open source library systems - even if you don't adopt
an open source system, it raises the bar for the proprietary systems to
differentiate themselves.</p>
</div>
<h2>Evergreen and the business case for choosing an open source ILS (2007-04-22)</h2>
<p>Due to a sad event, <a class="reference external" href="http://infoservices.uwindsor.ca/ils/">Art Rhyno</a>
asked me to be his co-presenter at the <a class="reference external" href="http://odyssey2007.wordpress.com">OLITA Digital Odyssey
2007</a>. Our broad subject was
<a class="reference external" href="http://open-ils.org">Evergreen</a>, more specifically introducing the
Evergreen ILS to an audience that was aware of Evergreen's existence but
wanted to know more about it from both a technical and a business
perspective. I had two days' notice to prepare for the presentation, so
I split my time between polishing the <a class="reference external" href="/archives/122-Evergreen-VMWare-image-oh-so-close!.html">VMWare image of
Evergreen</a>
and creating the <a class="reference external" href="/uploads/talks/EG_business.pdf">slides for my presentation
(PDF)</a>.</p>
<p>Art gave a general introduction to open source development, told the
story of how Evergreen came about, and described its architecture and
the capabilities currently demonstrated on the in-production system at
PINES. Perhaps of most interest to the audience, Art talked a bit about
the direction that he's taking
<a class="reference external" href="http://open-ils.org/dokuwiki/doku.php?id=scratchpad:acq_serials">Woodchip</a>,
the serials and acquisitions module based on <a class="reference external" href="http://ofbiz.apache.org">Apache
OFBiz</a> that the University of Windsor <a class="reference external" href="http://open-ils.org/blog/?p=80">has
agreed to develop</a> for Evergreen. No
pressure, Art <img alt=":-)" class="emoticon" src="/images/smile.png" /></p>
<p>Then the presentation was handed off to me. I started by asking for
demographic information from the audience; to no surprise, about half of
the audience of approximately 60 ran Horizon systems. Many of the
attendees in the audience paid more than $20,000 annually for support
and licensing costs. Most of the sites had the equivalent of one
full-time position devoted to the care and feeding of their current
library system.</p>
<p>The goals of my presentation were to:</p>
<ol class="arabic simple">
<li>Demonstrate that the library community has a strong culture of
self-support with respect to library systems (based on the volume of
email on our closed library systems mailing lists)</li>
<li>Suggest that the quality of official support we receive from our
closed library systems does not warrant the annual support fees we
pay</li>
<li>Point out that we already devote personnel to the care and feeding of
our closed library systems, so the refrain of "open source is like
getting a free kitten" is fine given that we're currently paying for
a dog of a closed system</li>
<li>Urge the audience to consider what a waste of money and time it is to
train staff to learn the proprietary API, templating language, etc
for a closed system when that knowledge can become useless if the
system is pulled from the market -- while investing money and time in
learning the API and templating language for Evergreen results in
reusable skills for your personnel because those are based on open
standards.</li>
<li>Let the community know the preliminary results of my evaluation of
the internationalization support offered by Evergreen</li>
<li>List some of the challenges that we face in achieving a wide adoption
of Evergreen</li>
<li>Notify the community that my VMWare image is really, truly, close to
being released and suggest that it would be a great way to get
started with Evergreen</li>
<li>Run a quick live demo with the VMWare image to prove that a full
install of Evergreen can scale down to running in a virtual machine
with 512M of RAM</li>
</ol>
<p>My self-assessment? I did not want to come across as an open source
zealot; rather, I wanted to point out where our current relationships
with our vendors are failing us and how open source can fill in some of
those gaps. Unfortunately, I feel that I probably veered a little too
much towards the rant side of the continuum a couple of times -- my
passion for this subject came through, no doubt, but it was perhaps a
little too strong.</p>
<p>I knew my presentation was text-heavy, but I didn't beat myself up too
much because a good visual presentation needs more than just a couple of
days to come together and I didn't have a variation of this already in
the can somewhere... this was brand new content. I was pleased that I
came up with and shared the visual image of <strong>migration ninjas</strong>. As the
closed vendors' licensing terms might prevent us from openly sharing
migration kits or migration how-tos, the “migration ninjas” would be the
community's system gurus who would slip into a library and perform the
secret, inhuman feats necessary to migrate from a closed system to an
open system.</p>
<p>I wasn't at all happy with my live demo. First, I failed to arrange with
the conference hosts to obtain an Internet connection, so the cover art
in the catalog and the Z39.50 copy cataloging in the staff client -- both
facets of the demo -- were a bust. Second, while I knew it would be an exploratory
live demo, given that I had just achieved a full working install a few
days prior to the session, it's not very impressive for an audience to
watch a presenter fumbling around the command line in response to a
question about the API. Third, I failed to show off some really cool
features of Evergreen such as the shelf-browser (although without cover
art it wouldn't have been nearly as impressive). I tried firing up the
reports Web interface and failed. So, now that I have a working install,
I'll be able to prepare a much better live demo in the future - I just
hope that our audience didn't take away a bad impression from our
session on Friday.</p>
<div class="section" id="questions-from-the-audience">
<h2>Questions from the audience</h2>
<p>We had some good questions from the audience; here's what I can
remember. Please add more to the comments on this post, if you have
them!</p>
<div class="section" id="why-is-there-so-much-interest-in-evergreen-and-why-aren-t-we-hearing-much-about-koha">
<h3>Why is there so much interest in Evergreen and why aren't we hearing much about Koha?</h3>
<p><strong>Dan</strong> said something about how his first investigation of Koha
revealed evidence of classic MySQL dependencies and assumptions in the
codebase that, as a former product planner for IBM DB2 relational
database, made him cringe. Evergreen, in comparison, is built on
PostgreSQL which was reassuring. <em>I failed to note at the time that
Evergreen has been developed so that it can support other databases,
although some work would be required to convert to the SQL dialect and
full-text search required by the target database.</em></p>
<p><strong>Art</strong> mentioned that while Koha had been quite popular internationally
for the past number of years, it had not been as popular in North
America. Part of that reason may have been a severe scalability problem
that kicked in somewhere around 450,000 records. Dan suggested that
problem could be traced directly to MySQL 3 / 4, but that it might have
been alleviated in MySQL 5 (which Koha does not yet support). Art noted
that Koha ZOOM, using indexdata.dk's Zebra indexing engine, overcame
that performance problem but some extra care was required to commit
updates to the index.</p>
</div>
<div class="section" id="what-about-the-dangers-of-someone-forking-the-code">
<h3>What about the dangers of someone forking the code?</h3>
<p>In my opinion, we didn't really answer this question well. Art didn't
think that a fork was likely as Evergreen had been built with the
best-of-breed components and plenty of input from the PINES library
staff and community. What I should have added was that the ability to
create a fork of a project is actually a wonderful feature of open
source - it enables communities to route around projects that become
overly bureaucratic, closed to new developers, or uninterested in
outside input and new directions.</p>
</div>
<div class="section" id="you-dan-talked-a-lot-about-the-benefits-of-a-system-built-on-standards-can-you-show-us-what-the-web-templating-language-looks-like">
<h3>You (Dan) talked a lot about the benefits of a system built on standards. Can you show us what the Web templating language looks like?</h3>
<p>I fumbled this one badly. I quickly brought up footer.xml, but that
doesn't contain any dynamic content so it was a bad example. I then
suffered from presentation brain and couldn't remember the word
“introspect” to demonstrate srfsh's ability to introspect its objects.
Finally I (lamely) showed an example of a srfsh API request.</p>
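<p>For the record, here's roughly the kind of thing I could have shown: an
srfsh session that introspects a service and then makes a simple API request.
Consider this a sketch from memory rather than a verbatim transcript -- it
uses OpenSRF's stock <code>opensrf.math</code> test service rather than a real
Evergreen service, and the output is paraphrased.</p>
<pre># srfsh is the interactive OpenSRF shell that ships with Evergreen

# ask a service to describe the methods it publishes
srfsh# introspect opensrf.math

# call one of those methods; parameters are comma-separated JSON values
srfsh# request opensrf.math add 1,2
Received Data: 3</pre>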
</div>
</div>
<div class="section" id="summary">
<h2>Summary</h2>
<p>I believe that a solid business case needs to be developed on a
library-by-library basis or on a consortial basis for migrations to
Evergreen. I think that my presentation provided some useful input to
those business cases, but in and of itself is not enough. Certainly, as
our own library considers its options in the coming years, we're going
to need a much more solid set of criteria before we can make any
decision. I encourage you to take what you can from the presentation and
improve, polish, and contribute your own analysis back to the Evergreen
community so that you can help other libraries make an informed
decision.</p>
</div>
Evergreen VMWare image -- oh so close!2007-04-18T00:11:00-04:002007-04-18T00:11:00-04:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2007-04-18:/evergreen-vmware-image-oh-so-close.html<p>Many of you know that I have been working on step-by-step instructions
for <a class="reference external" href="http://open-ils.org/dokuwiki/doku.php?id=installing_prerequisites_on_gentoo">installing Evergreen on
Gentoo</a>
on the official Evergreen documentation wiki. At the same time, I have
been working on using that documentation to create a VMWare image of
Evergreen -- this avocation dates all the way back to the <a class="reference external" href="http://infoservices.uwindsor.ca/ils/?p=29">ILS
Symposium</a> hosted by the
University of Windsor in November, 2006. I owe endless thanks to miker,
berick, bradl, and phasefx from the Evergreen development team for all
of their assistance with my annoying questions over the past months.</p>
<p>Obligatory defining of terms: <em>What is a VMWare image, and why should I
care?</em> VMWare is a virtualization product (the "VM" stands for "virtual
machine"). <em>Virtualization</em> is a technology that allows you to run one
or more "guest" operating systems on top of a "host" operating systems.
So, let's say you're really interested in trying out Evergreen, but
don't have a spare computer to install Linux on, or don't have the time
or interest in learning how to compile packages from source on Linux, or
don't have much Linux experience -- you can install the free (zero
dollar, but not open source) <a class="reference external" href="http://vmware.com/products/server/">VMWare
Server</a> on any Windows computer,
download an Evergreen VMWare image to your computer, and start up the
Evergreen image. In less than an hour (assuming you have good bandwidth
to download VMWare Server and the Evergreen image), you can have
Evergreen-running-on-Linux, running in a virtual machine on top of
Windows. That's the basic testing / evaluation use case for
virtualization, anyway. For some small libraries, this may in fact be
all that they need for a production library system -- but that's a
discussion for another blog post.</p>
<p>One more note on virtualization technology: there are other
virtualization options, like <a class="reference external" href="http://www.xensource.com/">Xen</a> or
<a class="reference external" href="http://bochs.sourceforge.net/">Bochs</a>. But VMWare is the 900-pound
gorilla on the scene, and it's what I happen to have the most experience
and success using, so that's why I'm working with it. But it's an open
community, so if you've got the skills to create images for other
virtualization software, go for it!</p>
<p>The good news is that Evergreen appears to be running cleanly on my
system. The OPAC works, albeit without any bibliographic entries at the
moment as I'm still pestering <strong>miker</strong> with questions about the MARC
record and holdings import process. But getting a working install seemed
like the more important first task. Importing holdings and patron
information is going to require different steps depending on which ILS
you are currently using, so this should be a reasonable starting point
for an image.</p>
<p>In my documentation, I haven't attached the exact set of configuration
files that I have used in the VMWare image, but I can do that if people
indicate that they are desired. If you have questions about anything
that seems missing from my documentation or why I made certain choices,
I would be glad to share that information with you and correct the docs.
But rather than supplying just the docs and config files, I suspect the
whole VMWare image would be more generally useful in the short term. I'm
guessing that most libraries interested in kicking the tires of
Evergreen don't want to spend a large chunk of their evaluation period
working out the installation kinks, but just want to get right to the
hands-on portion of the evaluation.</p>
<p>So, Sunday night I uploaded my first version of the image and shared the
URL with a few close contacts, asking them to flush out any bugs. Kudos
to <strong>dmcmorris</strong> for indirectly leading me to discover that I had missed
a minor dependency. Another upload last night, and I'm anxiously
awaiting the feedback from my comrades in arms. If all goes well, a
VMWare image of Evergreen should be available for download by the end of
the week. <em>Crossing fingers...</em></p>
Evergreen internationalization chat2006-11-17T05:11:00-05:002006-11-17T05:11:00-05:00dan@coffeecode.net (Dan Scott)tag:coffeecode.net,2006-11-17:/evergreen-internationalization-chat.html<p>I managed to corner Mike Rylander after Brad Lajeunesse waved his hands
in surrender and offered Mike up as a sacrifice to my questions about
Evergreen's support for internationalization. If you're travelling to
Canada to tout a piece of (or multiple components of) software, you can
be sure that somebody in the crowd is going to be interested in knowing
how capable that software is of supporting a bilingual community. As
Laurentian University is a bilingual institution, I took it upon myself
to be "that guy" and grill Mike a bit on that topic. The good news is
that he survived the grilling, and didn't earn the nickname "pork chop";
the better news is that it sounds like Evergreen hits most of the
internationalization requirements on the head.</p>
<ul>
<li><p class="first">OPAC interface can be multilingual; Georgia has a large Spanish
community and PINES is in the process of translating the OPAC
interface into Spanish</p>
<ul>
<li><p class="first">Sorting results alphabetically (for browsing by author / title) is
problematic, however:</p>
</p><ul class="simple">
<li>PostgreSQL doesn't have a good locale implementation for
collating sequences</li>
<li>Probably not as much of an issue for French / English as it
would be for Finnish</li>
</ul>
</li>
</ul>
</li>
<li><p class="first">Search currently ignores diacritics (e == é == è), but this setting
can be changed in TSearch2</p>
</li>
<li><p class="first">Subject heading equivalency is possible for the simple use case of
"when I search for History--United States--19th century, also show me
records with Histoire--États-Unis--19e siècle" (or whatever the
real LCSH/RVM equivalence would be)</p>
</p><ul class="simple">
<li>This possibility is based on authority records containing both
sets of headings -- we can probably rely on, or possibly
participate in, the EU project to generate equivalences among
LCSH, RVM, and German subject headings to seed this data</li>
</ul>
</li>
<li><p class="first">Staff client is mostly multilingual-ready (hasn't been a priority
requirement for PINES):</p>
</p><ul class="simple">
<li>Most strings are contained in XML files, but there are still
pockets of hardcoded strings</li>
<li>Switching the locale would immediately load the new strings in the
staff client interface</li>
<li>"JavaScript doesn't have a good sprintf() implementation" -- check
to see whether this suggests that token order can't be rearranged.
LibX seems to manage to be able to do this.</li>
</ul>
</li>
<li><p class="first">Forgot to ask about boolean operators (e.g. AND / ET, OR / OU)</p>
</li>
</ul>
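<p>A concrete sketch of the diacritics point above: TSearch2 was eventually
folded into PostgreSQL's built-in full-text search, and the modern way to make
e, é, and è either match or not match is to add (or omit) the
<code>unaccent</code> filtering dictionary in the text search configuration.
The example below uses today's built-in syntax rather than TSearch2's old
table-driven configuration, so treat it as illustrative only.</p>
<pre>-- unaccent ships in PostgreSQL contrib and strips diacritics
CREATE EXTENSION unaccent;

-- clone the French configuration and put unaccent ahead of stemming
CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
ALTER TEXT SEARCH CONFIGURATION fr
  ALTER MAPPING FOR hword, hword_part, word
  WITH unaccent, french_stem;

-- with unaccent in the chain, 'Hôtels' matches a search for 'hotel'
SELECT to_tsvector('fr', 'Hôtels') @@ to_tsquery('fr', 'hotel');</pre>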