Tuesday, October 31. 2006
Final entry in publishing my own hastily jotted Access 2006 conference notes--primarily for my own purposes, but maybe it will help you indirectly find some real content relating to your field of interest at the official podcast/presentation Web site for Access 2006. Contents include:
Consortium update
ASIN Overview
- 17 Atlantic academic libraries
- 300 - 18,000 students
- 2 unilingual francophone sites
- Sirsi, Ex Libris, and Innovative
Why our users hate us
- choose format over subject
- learn multiple database interfaces
- citations presented in confusing formats
Addressing the problems
- a la carte user authentication
- EZProxy servers
- SingleSearch federated search tool over 400 resources
(including 100+ open access)
- 1Cate OpenURL resolvers
- Relais ILL
- RefWorks/RefShare
Principles
- Click, don't type
- when you have it, show it
- when you don't have it, make it easy to get
- focus on appropriate links rather than click counts
- let the user determine the appropriate copy from the
available formats
Stephen Sloan
- Missing ingredient -- enabling subject choice for users,
rather than format
- working with SirsiDynix on a consortium version of EPS
Rooms CMS
- production version to be available in 1st quarter of
2007
- Rooms is basically a portal environment, with different
defaults/scoping for each subject (so that single search
Outstanding challenges
- Federated search connectors based on screen scraping will
break
- Citations from certain resources cannot be linked to
Resolver
- Cookie pushing in a public environment
- Implementation of the NISO Metasearch standard to improve
federated searching
Recognizing our differences
- Local customization of interfaces
- Emulating local default search options--everyone uses EBSCO,
but everyone has configured different behaviour
- Relying on local expertise at each site
COPPUL Overview
- ANTS: Using Open Source, Social Software (in the COPPUL
consortium)
- sharing and updating animated tutorials that were believed
to be a better option than long information literacy
tutorials
- make it easy to locate and use these tutorials (central
location and explicit copyright / reuse statement)
- Make sharing easy and desirable through quality standards,
help, and the allowance for local customization
How does it work?
- Project is hosted at http://brandonu.ca/Library/coppul
- ask each institution to take responsibility for a certain
set of databases so that they can be updated along with the
user interface
- wiki enables institutions to update database list with
status of development, whenever they create a tutorial, or add
a new database to the list
- rss feeds enable you to track which tutorials have been
updated or created
- tutorials are housed within a single institutional
repository, licensed under CC licenses with options to the
creators
- Other organizations (like LU) are welcome to
participate!
Quebec Digital Infrastructure: The Year in Review
Main players
- BAnQ - Bibliotheque et archives nationales du Quebec
- CREPUQ - Conference of Rectors and Principals of Quebec
Universities
- Erudit
- Museums
- Quebec Gov.
- SRC and other media
BAnQ
- BNQ started in 1967
- April 2005, opening of la Grande Bibliotheque of the
BNQ
- Jan 2006 - Merger of ANQ and BNQ; mandate to acquire and
disseminate collections
- October 2006 - Second meeting on digital national
library
- 1996 - beginning of digitization activities
- 2003 - permanent digitization program
- 3.2 million pages of digital materials (newspapers, etc)
currently in the collection; 62000 images
Meanwhile in the World
- Dec 2004 - Google print project: 15 M ebooks by 2010
- Jan 2005 - CEO of BNF Jeanneney reacts in Le Monde with "Quand
Google defie l'Europe", resulting in the proposal to create a
European Digital Library
- 2010 - European DL expected to hold 6M books
- February 2006 - Francophone network of digital libraries was
formed (including France and Quebec)
Meanwhile in Canada
- Quebec is participating in Alouette Canada, hoping that
nobody is reinventing the wheel
Erudit
- 18000 scholarly articles from 48 journals
- 150000 backfiles projected
- Erudit schema adopted by www.persee.fr and www.cens.cnrs.fr
= francophone interoperability
- $3,000 annually to join
Virtual International Authority File (VIAF)
- Link national authority records
- Build on their authority work
- Move towards universal bibliographic control, while
allowing local variations to exist
- Deutsche Nationalbibliothek, LoC, and OCLC -- hoping for the
BNF (French) national file
- OCLC is responsible for the actual coding for the
project
Matching variations
- In the LCNAF and PND authority files:
  - Same name, same person
  - Same name, different people
  - Different names, same person
  - Missing person in one file
Enhancing the authorities
- Bibliographic record -> Derived authority -> Enhanced
authority
- Authority record -> Enhanced authority
Weaker attributes
- Only one of birth/death dates
- Subject area of works
- Format
- Language
- Publisher
- Partial title match
Even weaker attributes
- Date of publication
- Country
- Role
- Format
Compute it
- Standard approach:
  - Generate keys and data
  - Load information into a database
  - Index it
  - Extract fields needed
- Map/reduce approach (adopted from Google)
  - Split the database up
  - Run parallel jobs against those pieces of the database
  - Bring information together through map/reduce
Map/Reduce
- Map
  - Read in source file (e.g., MARC21)
  - Write out key + data
- Reduce
  - Read in array of data for each unique key
  - Write out key + data
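
A minimal sketch of the map and reduce steps listed above, in Python since that is the language the notes mention for the implementation. This is not the VIAF code: the record layout, the normalize_name key function, the record ids, and the sample records are all invented for illustration, and a real job would parse MARC21 and spread the work across nodes rather than run in one process.

```python
from collections import defaultdict


def normalize_name(name):
    """Hypothetical key function: lowercase, drop punctuation, collapse spaces."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    return " ".join(cleaned.split())


def map_record(record):
    """Map: read one source record and write out key + data."""
    key = normalize_name(record["name"])
    return key, {"source": record["source"], "id": record["id"]}


def reduce_group(key, values):
    """Reduce: read the array of data for one unique key, write out key + data."""
    sources = sorted({value["source"] for value in values})
    ids = sorted(value["id"] for value in values)
    return key, {"sources": sources, "ids": ids}


def map_reduce(records):
    """Run map over every record, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for record in records:
        key, data = map_record(record)
        grouped[key].append(data)
    return dict(reduce_group(key, values) for key, values in grouped.items())


# Invented sample records standing in for entries from two authority files.
records = [
    {"source": "LCNAF", "id": "lc-1", "name": "Twain, Mark"},
    {"source": "PND", "id": "pnd-7", "name": "twain  mark"},
    {"source": "LCNAF", "id": "lc-2", "name": "Austen, Jane"},
]
result = map_reduce(records)
```

Keys that end up with records from more than one source are candidate matches across files; the attribute comparisons noted earlier would then decide whether they really describe the same person.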
Map/Reduce implementation
- Written in Python
- Uses ssh and XML-RPC for control and communication
- Map/Reduce seems to add around 10% overhead
- Earlier implementation ran on a 48 CPU cluster
- Current VIAF cluster is a 12 CPU cluster on 4 nodes
- Running Linux and 64-bit Python (no need to worry about 2GB
memory limit)
VIAF matching code
- 17 modules
- 1,100 lines of code
- 600 lines of configu