Monday, February 9. 2009
I was tagged by Lukas for the "7 things" meme, and meant to do something about it, but I've been kind of preoccupied with the new baby and the sprinting toddler and work. Anyway, it seems like a heck of a lot more reasonable than the evil Facebook's "25 things" meme, so I'm going to take a few minutes to try to play along.
- I was an early riser until I was around 15 or 16 years old and discovered the surrealists. At that point, I began experimenting with sleep deprivation as a means of stimulating my prose and poetry. This is also when I began drinking coffee. After about a month, I was no longer capable of being an early riser--and the fruit of writing experiments was, uh, not too impressive.
- Rather than going directly to university after high school, I elected to take what is now termed a gap year. No trips to Europe for me, though; the goal was to refine my bass-playing and music-reading skills and head to a post-secondary music program. I recorded a few prog-rock tracks in a studio with a fantastic couple of guys (hey Pete and Mike!), but ultimately didn't put enough effort into my bass to carry out the plan. Let me assure you that a year of working night shifts at a convenience store in the entertainment district of a small city is not a waste of time; I can't count the number of experiences that I'm thankful for having had during that time.
- Although I roast and grind my own coffee, I'm not a coffee snob. In fact, I possess almost no sense of smell and I suspect that my sense of taste is limited in comparison to most people, and I'm quite happy to drink diner coffee. I cannot stand the taste of Starbucks coffee, however.
- The first time I was able to run a full kilometre without walking was when I was eighteen. Since then I've run a couple of 5K races and a sprint duathlon.
- I'm pretty sure I was destined to become a systems librarian. When I was 10, I used to hang out at the local college's computer room until the students would log me onto a completely restricted Gandalf mainframe account so I could pretend to be Matthew Broderick in WarGames. My first real job, when I was 14, was as a "computer page" at the Barrie Public Library Children's Annex. It was my responsibility to oversee the use of the bank of Commodore 64s that the library made available to children, luring them in with games but requiring them to complete their allotment of educational software first. Oh the power.
- I occasionally wrote reviews for random CDs that came into the campus newspaper office. Nobody else wanted to review this orange CD called Tragic Kingdom by some West Coast band, so I took it on. I gave it a savage review; I wasn't impressed with faux-ska and couldn't stand the lead singer's voice. Six months later No Doubt's "Spiderwebs" was in high rotation on every radio station in North America (look, folks, that song is repetitive enough without being played twice an hour!). I'm sure that my negative review still gnaws at Gwen Stefani today as she weeps bitterly in her platinum mansion.
- In grade one, my report card read "Dan is too critical of his classmates." In my defence, if they weren't so stupid--come on, sound it out buddy--I wouldn't have been critical. Okay, not much of a defence.
- I am not a very demanding friend. I (almost) never call, (almost) never write, and (almost) never visit. Okay, scratch that: I'm a crappy friend. Most of my close friends found out that we were expecting a second child only through Lynn's Facebook account. I called one couple shortly after Arik was born and his quasi-namesake (one of the Erics in our life who bring honour to the noble name) asked me after a few minutes: "So, uhh... did we know that you were expecting a baby?" No, no you didn't, and that's not your fault. Man, I suck.
- I'm really good at arithmetic.
- Link your original tagger(s), and list these rules on your blog.
- Share seven facts about yourself in the post — some random, some weird.
- Tag seven people at the end of your post by leaving their names and the links to their blogs.
- Let them know they’ve been tagged by leaving a comment on their blogs and/or Twitter.
Wow, that was fun. Lemme see, I'm going to break the rules and just tag two people: Helmut, because he's one of the only other people who worked on the ibm_db2 PHP driver out of passion rather than as a job assignment. And Gabriel because I like his style.
Sunday, February 8. 2009
Updated 2009-02-25 00:29 EST: Corrected setuptools installation step.
Updated 2009-02-08 23:39 EST: Trimmed width of some of the <pre> code sections for better formatting. Created bzr repository for unicorn2evergreen scripts at http://bzr.coffeecode.net/unicorn2evergreen
I did this once a long time ago for the Robertson Library at the University of Prince Edward Island. For our own migration to Evergreen, I have to load a representative sample of records from our Unicorn system onto one of our test servers. This has been a good refresher of the process... and a reminder to myself to post the other part of the Unicorn to Evergreen migration scripts in a publicly available location. Okay, they're posted to this bzr repository called unicorn2evergreen.
- Export bibliographic records from Unicorn using Unicorn's catalog key (basic sequential accession number) as the unique identifier (I plopped the catalog key into the 935a field/subfield combo). I use the catalog key because the "flexkey" is not guaranteed to be unique within a single Unicorn instance - and because the catalog key makes it easy for us to match call numbers and copies.
- For each item, export call number / barcode / owning library / current location / home location / item type using the catalog key as the identifier.
- Set up the organization unit hierarchy on your Evergreen system. You can dump it from an existing Evergreen system into a file named "orgunits.sql" like so:
pg_dump -U evergreen --data-only --table actor.org_unit_type \
--table actor.org_unit > orgunits.sql
Then drop all of the existing org_units and org_unit_types and load your custom data in a psql session:
BEGIN;
SET CONSTRAINTS ALL DEFERRED;
DELETE FROM actor.org_unit;
DELETE FROM actor.org_unit_type;
\i orgunits.sql
COMMIT;
- Import bibliographic records using the standard marc2bre.pl / direct_ingest.pl / pg_loader.pl process. Point the --idfield / --idsubfield and --tcnfield / --tcnsubfield options for marc2bre.pl at 935a (yes, this sucks for title control numbers, but as noted above they are not guaranteed to be unique in Unicorn and we need uniqueness in Evergreen). We need the bibliographic record entry ID field to be the catalog key to set up subsequent call number/barcode matches.
- Enable the subsequent addition of new bibliographic records by setting the sequence object values to avoid conflicting ID / TCN values by issuing the following SQL statements:
SELECT setval('biblio.autogen_tcn_value_seq',
(select max(id) from biblio.record_entry) + 100);
SELECT setval('biblio.record_entry_id_seq',
(select max(id) from biblio.record_entry) + 100);
- Process holdings records.
- Call numbers might have MARC8 encoded characters, so process'em and convert to UTF8. Theoretically "yaz-iconv -f MARC-8 -t UTF-8 < holdings.lst > holdings_utf8.lst" should do it, but instead it eats linefeeds and creates an unusable field. Ugh. We use a little Python script instead that requires pymarc, which in turn requires a version of setuptools (0.6c5) newer than Debian Etch's packaged version (0.6c3). So:
wget http://pypi.python.org/packages/2.4/s/setuptools/setuptools-0.6c9-py2.4.egg
sudo sh setuptools-0.6c9-py2.4.egg
sudo easy_install pymarc
- Now actually generate the 'holdings_utf8.lst' file.
cat holdings.lst | python marc8_to_utf8.py
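For reference, here is a minimal sketch of what a line-by-line MARC-8 to UTF-8 filter could look like. It is not the marc8_to_utf8.py script from the unicorn2evergreen repository. It assumes pymarc's marc8_to_unicode helper, assumes every line of holdings.lst can be decoded on its own, and is written for current Python 3 rather than the Python 2.4 used above. The function names are made up for illustration.

```python
import sys


def convert_lines(lines, decode):
    """Decode each MARC-8 line separately so that linefeeds survive intact."""
    for raw in lines:
        # Strip the line terminator before decoding and add one back
        # afterwards, so the decoder never sees (or eats) the linefeed.
        body = raw.rstrip(b"\r\n")
        yield decode(body) + "\n"


def pymarc_decoder():
    """Return pymarc's MARC-8 decoder (imported lazily; requires pymarc)."""
    from pymarc.marc8 import marc8_to_unicode
    return marc8_to_unicode


def main():
    """Read MARC-8 lines on stdin and write holdings_utf8.lst as UTF-8."""
    decode = pymarc_decoder()
    with open("holdings_utf8.lst", "w", encoding="utf-8") as out:
        for line in convert_lines(sys.stdin.buffer, decode):
            out.write(line)
```

Calling main() with holdings.lst piped to stdin would mirror the command above. Passing the decoder in as an argument keeps convert_lines easy to try out without pymarc installed.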
- Adjust parse_unicorn.py to match up the holdings fields (added flexkey to the start). Then parse the holdings_utf8.lst to generate an SQL file (holdings_eg.sql) that we can load into the import staging table.
python parse_unicorn.py
Note that the holdings data for the item with barcode 30007007751786 didn't process cleanly and won't load. Weird - possibly a corrupt character in the item data? Augh, no - there are flexkeys and call numbers that contain '|' characters (16 occurrences for "|z", 37 for "|b"), which is of course also what we are using as our delimiter. ARGH. I deleted the affected lines for now with:
grep -v '|z' holdings_utf8.lst > holdings_clean.lst
grep -v '|b' holdings_clean.lst > holdings_clean.lst2
mv holdings_clean.lst2 holdings_clean.lst
Adjust parse_unicorn.py to read the new input file (holdings_clean.lst) and generate a new SQL file, holdings_eg_clean.sql.
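Grepping for specific subfield markers only catches the stray delimiters you already know about. A more general check is to count the fields on each line and set aside anything that doesn't match. The sketch below assumes seven pipe-delimited fields per line; that number and the sample data are placeholders, so adjust them to whatever your export actually produces.

```python
def split_by_field_count(lines, expected_fields, delimiter="|"):
    """Separate lines into (clean, suspect) by their number of fields."""
    clean, suspect = [], []
    for line in lines:
        record = line.rstrip("\n")
        if len(record.split(delimiter)) == expected_fields:
            clean.append(record)
        else:
            # Too many or too few fields: probably a stray delimiter
            # inside a call number, flexkey, or barcode.
            suspect.append(record)
    return clean, suspect


# Made-up sample lines; the second has a '|z' inside the call number.
sample = [
    "k1|QA76 .B3|3000|LIB|LOC|HOME|BOOK\n",
    "k2|QA76 |z extra|3001|LIB|LOC|HOME|BOOK\n",
]
clean, suspect = split_by_field_count(sample, expected_fields=7)
```

Writing the suspect lines to their own file, instead of discarding them, leaves a list of items to repair by hand later.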
- Create the import staging table:
psql -f Open-ILS/src/extras/import/import_staging_table.sql
- Load the items into the import staging table:
psql -f holdings_eg_clean.sql
We discover that some more of our data sucks - for example, one item ("Research in autism spectrum disorders", HIRC PER-WEB) ends up with a create date of '0', which is not a valid date. The cause is the barcode, "1750-9467|21": the embedded '|' delimiter shifts every field after it by one. For now, grep it out as above and reload.
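This kind of problem can also be caught before the load by checking that each create date actually parses. The sketch below is only an illustration: the "createdate" and "barcode" keys, the YYYY-MM-DD format, and the sample rows are all assumptions, not something taken from the real scripts or the real export.

```python
from datetime import datetime


def invalid_create_dates(rows, date_format="%Y-%m-%d"):
    """Return the rows whose 'createdate' value does not parse as a date."""
    bad = []
    for row in rows:
        try:
            datetime.strptime(row["createdate"], date_format)
        except ValueError:
            bad.append(row)
    return bad


# Made-up rows; the second mimics a record whose fields were shifted.
rows = [
    {"barcode": "30007000000001", "createdate": "2008-05-01"},
    {"barcode": "1750-9467", "createdate": "0"},
]
bad = invalid_create_dates(rows)
```

Anything returned in bad is worth inspecting before loading, since a date of '0' usually means the fields did not line up.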
- Investigate possibilities of collapsing unnecessary duplicate item types:
SELECT item_type, COUNT(item_type) AS item_count
FROM staging_items
GROUP BY item_type
ORDER BY item_type;
item_type | item_count
------------+------------
ATLAS | 162
AUDIO | 792
AUD_VISUAL | 1790
AV | 69
AV-EQUIP | 182
BOOK | 996
BOOKS | 581592
BOOK_ART | 1
BOOK_RARE | 4949
BOOK_SHWK | 5
BOOK_WEB | 49163
COMPUTER | 33
...
(40 rows)
How about locations?
SELECT location, COUNT(location)
FROM staging_items
GROUP BY location
ORDER BY location;
location | count
------------+--------
ALGO-ACH | 13
ALGO-ATLAS | 148
ALGO-AV | 1837
...
(212 rows)
Now we can collapse categories pretty simply inside the staging table. For example, if we want to collapse all of the BOOK types into a single type of BOOK:
UPDATE staging_items
SET item_type = 'BOOK'
WHERE item_type IN
('BOOKS', 'BOOK_ART', 'BOOK_RARE', 'BOOK_SHWK', 'BOOK_WEB', 'REF-BOOK');
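If there are many groups to collapse, the UPDATE statements can be generated from a list of values instead of typed by hand. This is a rough sketch, assuming the staging_items table and item_type column used above; the helper names are invented, and the output should be reviewed before running it through psql.

```python
def sql_literal(value):
    """Quote a string as an SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def collapse_sql(column, target, sources, table="staging_items"):
    """Build an UPDATE that rewrites every value in sources to target."""
    in_list = ", ".join(sql_literal(s) for s in sorted(sources))
    return (
        f"UPDATE {table}\n"
        f"   SET {column} = {sql_literal(target)}\n"
        f" WHERE {column} IN ({in_list});"
    )


statement = collapse_sql("item_type", "BOOK", ["BOOKS", "BOOK_ART"])
print(statement)
```

The table and column names are interpolated directly, so this only suits trusted, hard-coded input such as a migration script.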
- Update legacy library names to new Evergreen library short names (we're using OCLC codes where possible). Some will be straightforward old names to new names. Others will require a little more logic based on location + legacy library name; we're splitting the DESMARAIS collection into multiple org-units (Music Resource Centre, Hearst locations, hospital locations, etc).
-- Laurentian Music Resource Centre
UPDATE staging_items
SET owning_lib = 'LUMUSIC'
WHERE location = 'DESM-MRC';
-- Hearst - Kapuskasing location
UPDATE staging_items
SET owning_lib = 'KAP'
WHERE location LIKE 'HRSTK%';
-- Hearst - Timmins location
UPDATE staging_items
SET owning_lib = 'TIMMINS'
WHERE location LIKE 'HRSTT%';
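The same location-based rules can be written as a small lookup table, which is handy for checking the mapping against a handful of sample locations before touching the staging table. The three rules come from the statements above; the sample location names in the usage lines (other than DESM-MRC) are invented.

```python
# (pattern, match kind, new owning library), checked in order.
RULES = [
    ("DESM-MRC", "exact", "LUMUSIC"),   # Laurentian Music Resource Centre
    ("HRSTK", "prefix", "KAP"),         # Hearst - Kapuskasing
    ("HRSTT", "prefix", "TIMMINS"),     # Hearst - Timmins
]


def owning_lib_for(location, current):
    """Return the new owning library, or `current` if no rule matches."""
    for pattern, kind, org_unit in RULES:
        if kind == "exact" and location == pattern:
            return org_unit
        if kind == "prefix" and location.startswith(pattern):
            return org_unit
    return current


print(owning_lib_for("DESM-MRC", "DESMARAIS"))
print(owning_lib_for("HRSTK-REF", "DESMARAIS"))
print(owning_lib_for("DESM-CIRC", "DESMARAIS"))
```

Like the SQL LIKE 'HRSTK%' pattern in PostgreSQL, startswith is case-sensitive, so both approaches should select the same rows.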
- Generate the copies in the system:
psql -f generate_copies.sql
- Make the metarecords:
psql -f quick_metarecord_map.sql
Ah, recognize that any electronic resources (which don't have associated copies) won't appear. Check for 856 40 and change the bre source to a transcendent one mayhaps?
-- Create a new transcendant resource;
-- this autogenerates an ID of 4 in a default, untouched system
INSERT INTO config.bib_source (quality, source, transcendant)
VALUES (10, 'Electronic resource', 't');
-- Make the electronic full text resources (856 40) transcendant
-- by setting their bib record source to the new bib_source value of 4
UPDATE biblio.record_entry
SET source = 4
WHERE id IN (
SELECT DISTINCT(record)
FROM metabib.full_rec
WHERE tag = '856' AND ind1 = '4' AND ind2 = '0'
);
And no transcendence. Hmm. Oh well, worry about that later.
Sunday, February 1. 2009
Update 2009-02-19: uploaded diffs from Evergreen 1.4.0.2 (EG_exposed.tar.gz) for adding details to record summary; and Bill Erickson's slides and code examples are also available for download
The slides: Evergreen exposed, part 1 (OpenOffice)
My second presentation at the OLA SuperConference 2009 was Evergreen Exposed: hacking the open library system, which promised to take attendees on a tour of the architecture and source code of the Evergreen library system. I was very fortunate to have Bill Erickson, one of the original Evergreen developers, agree to join me as a co-presenter. Given the hour-and-fifteen-minute time slot that we were allotted, we opted to take an incremental approach to introducing parts of Evergreen to the audience, starting with basic tasks and working up to more complex customisations. We also tried to focus on answering questions that had been posted to the Evergreen mailing lists to ensure that we would satisfy our target audience's interests.
Dan starts with the basics
I started the session with an introduction of how to create a different skin for the catalogue, starting with text, CSS, JavaScript, and images and extending to the translation and customization framework. We talked about how to future-proof your customizations against future upgrades and how consortia can use skins to provide not just different look-and-feel, but different functionality, for each member of the consortium. Not much more than XML entities defined by DTDs, massaged via Apache server side includes (SSI), but it's an important conceptual building block for both the catalogue and the staff client.
I then ran through the exercise of adding a new metadata export format that brought the Federal Geographic Data Committee's Content Standard for Digital Geospatial Metadata (FGDC CSDGM) format to Evergreen's existing list of supported formats. On the one hand: big deal, another metadata format. Hold that thought in that one hand; we'll come back to it later.
I also walked through two other common requests on the mailing lists: how do I define a new index or tweak the behaviour of an existing index and how do I hide or show more information on the detailed record display page? I'll follow up with separate posts for each of these pieces to augment what you have before you in the slides; suffice to say that there's a lot of MODS, a little bit of JavaScript, a smidgin of XPath, a dollop of Evergreen's interface definition language (IDL), and a slice of Perl mixed together. Along the way, I peeled back the covers to show a bit of OpenSRF in operation, setting up Bill's part of the show...
Bill leads us into the promised land
Note: I'll update this with a link to Bill's slides when he manages to post them!
Bill gave a quick "big picture" view of how OpenSRF operates, including a much clearer overview of Evergreen's object-relational IDL that maps objects to relational tables. He also covered the cstore OpenSRF application that offers access to the underlying database without requiring SQL but still with support for full transactions (commit/rollback) and sub-transactions (savepoints). During Bill's demonstrations of these features, he exercised srfsh in a way that was new to me - he used the introspect command with a partial method name to perform a left-anchored search for matching method names. Cool!
Oh, and he also showed that if OpenSRF would normally return a reference to an object defined in the IDL, you can ask it to flesh the object in-place with its complete set of attributes instead; and of course if any of those attributes are object references, you have the option of fleshing those as well. It's a lovely way to cut down on chattiness in your application.
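To make the idea of fleshing concrete, here is a toy model in plain Python. It does not use OpenSRF or cstore, and the class names, fields, and data are invented; the LINKS dictionary merely stands in for the relationships that the IDL describes.

```python
# Which fields are references, and to which class (a stand-in for the IDL).
LINKS = {
    "copy": {"call_number": "call_number"},
    "call_number": {"record": "record"},
}

# A pretend database, keyed by class name and then by object ID.
STORE = {
    "copy": {101: {"id": 101, "barcode": "X1", "call_number": 11}},
    "call_number": {11: {"id": 11, "label": "QA76", "record": 1}},
    "record": {1: {"id": 1, "title": "Example"}},
}


def retrieve(cls, obj_id, flesh=0):
    """Fetch an object, replacing references with objects `flesh` levels deep."""
    obj = dict(STORE[cls][obj_id])
    if flesh > 0:
        for field, target in LINKS.get(cls, {}).items():
            obj[field] = retrieve(target, obj[field], flesh - 1)
    return obj


plain = retrieve("copy", 101)             # call_number is just the ID
one_level = retrieve("copy", 101, 1)      # call_number is a full object
two_levels = retrieve("copy", 101, 2)     # ...and so is its record
```

Without fleshing, a client would need three separate requests to get from the copy to the record title; with it, one request returns the whole nested structure.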
From there, Bill whipped out DojoSRF, the OpenSRF-aware extensions for Dojo, the JavaScript toolkit that Evergreen adopted as its core JavaScript framework in release 1.4. In 90 lines of HTML and JavaScript code, he implemented a basic but workable catalogue - and then, with a few more lines of code, he gave the audience the payoff for that FGDC CSDGM (geographic metadata) format that I had earlier hacked into Evergreen. Because part of the transform separates out the geographic coordinates of the subject matter (in the case of our demo data, maps of Northern California), Bill was able to extract the coordinates from the FGDC CSDGM representation of the bibliographic material and plot the bounding box for the coverage area on a Google Map image. Very cool.
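Bill's demo was written in JavaScript and is not reproduced here. As a rough illustration of the extraction step only, the sketch below pulls a bounding box out of an FGDC metadata fragment in Python. The element names (idinfo, spdom, bounding, westbc, eastbc, northbc, southbc) follow the FGDC standard; the coordinates are invented, and the exact shape of Evergreen's FGDC output is an assumption.

```python
import xml.etree.ElementTree as ET

# Invented sample record; the coordinates are placeholders.
SAMPLE = """<metadata><idinfo><spdom><bounding>
<westbc>-124.4</westbc><eastbc>-119.9</eastbc>
<northbc>42.0</northbc><southbc>37.0</southbc>
</bounding></spdom></idinfo></metadata>"""


def bounding_box(fgdc_xml):
    """Return the west/east/north/south bounds as floats, or None if absent."""
    root = ET.fromstring(fgdc_xml)
    bounding = root.find("./idinfo/spdom/bounding")
    if bounding is None:
        return None
    return {
        side: float(bounding.findtext(side + "bc"))
        for side in ("west", "east", "north", "south")
    }


box = bounding_box(SAMPLE)
```

The four values are what a mapping library needs to draw a rectangle over the coverage area. A record with a partial bounding element would raise an error here, so real code would want to handle that case.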
We had about 15 to 20 people attend our session, and I was happy with that attendance given the extremely technical content and relatively niche product. If as a result we end up adding just one more developer to the Evergreen community, that would be a great outcome. And for myself, I was forced to learn much more of Evergreen - just in time for Project Conifer, I hope.