About Me
I'm Dan Scott, barista, library geek, and open source dabbler. You may know me from such projects as PHP (PEAR's File_MARC package and PDO documentation), Apache Derby, and the Evergreen open-source ILS project. I'm the Systems Librarian for Laurentian University. You can reach me by email at dan@coffeecode.net.

License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.5 Canada License.

Wednesday, January 16. 2008
Oooh... looks like I've got (even more) work cut out for me

PHP is getting a native doubly-linked list structure. This is fabulous news; when I wrote the File_MARC PEAR package, I ended up having to implement a linked list class in PEAR to support it. File_MARC does its job today (even though I haven't taken it out of alpha yet), but due to its reliance on userspace data structures it's an order of magnitude slower than packages like marc4j, so it's not the best choice for processing hundreds of thousands of MARC records... today. It hurts a little that the VuFind project has to use a non-PHP solution for populating its Solr indices - although I'm delighted that they have started using File_MARC for some on-demand processing.

Now, when I get a chance (insert raucous mocking laughter here), I hope to be able to make File_MARC use SplDoublyLinkedList and see how it fares with 500K records. Should be good fun! After that, it just needs to be taught how to convert MARC8 to UTF-8, and we'll have ourselves a fully featured standard MARC package for PHP.

Thursday, March 15. 2007
FacBackOPAC: making Casey Durfee's code talk to Unicorn

For the past couple of days, I've been playing with Casey Durfee's code that uses Solr and Django to offer a faceted catalogue. My challenge? Turn a dense set of code focused on Dewey and the Horizon ILS into a catalogue that speaks LC and Unicorn. Additionally, I want it to serve both as a proof of several technologies (Solr for faceted searching and Django as a Web application framework) to my colleagues and as a reasonable backup catalogue for when our main catalogue fails (as it all too often does).

I emailed Casey today to tell him that I had a number of patches to contribute as a result of my experiments. It turns out that he's not really interested in pursuing this particular project much further, so he gave me his blessing to take his throwaway code and do whatever I want with it.

Thus, the emergence of the FacBackOPAC project on code.google.com. If there's a grant out there for worst project name ever, this project's in the running...

Anyways, I have contorted Casey's code so that it supports both Dewey and LC, and with a bit more torture it should be flexible enough to support both Horizon and Unicorn. Right now I've twisted it all the way to meet my Unicorn needs and consequently have broken Horizon support, but it won't take much to make it support Horizon again - or any other ILS, for that matter. The main requirement is that you have to be able to get your MARC records and holdings out of your ILS. A secondary requirement is to know how to create links to detailed item views in your current catalogue, because this thing does not yet have any current awareness about item status.

There. My itch has been scratched for the time being. Go play with the FacBackOPAC project -- I even have (very) rough documentation on how to get the pieces installed and the MARC records indexed, although you'll have to dig through the source in the Django catalog tree to overcome some hardcoded strings and URLs for the time being. Don't worry, pulling that hardcoded stuff out of the templates is high on the list of priorities.

So, a huge thank you to Casey for freeing this code and making this possible. For something he considers throwaway code, I've learned a lot from walking through it and making it start to meet my needs. I hope it helps you, too!

Update 2007-03-18: Edited links to point to the FacBackOPAC project page, rather than the wiki (which is subject to change, and which did -- breaking the dang links in the original version of this story. Argh!)

Wednesday, February 28. 2007
Lightning talk: File_MARC for PHP

I gave a lightning talk at the code4lib conference today on File_MARC for PHP. The funny thing is that I had originally pitched a full session talk on this subject for the conference, and it didn't make the cut of the ruthless democracy that is code4lib.
In retrospect, even a twenty-minute thunder talk probably would have been too much information for anybody but the most die-hard PHP and MARC coder out there; the lightning talk was a perfect format for the talk. I hope to let the documentation do most of the talking in the future.

Here are the slides from the talk in OpenOffice.org and PDF format. Enjoy!

/me notes that UnAPI would be useful here... Gotta try and submit a patch to the s9y repository...

Tuesday, January 2. 2007
Reflections at the start of 2007

2006 was a year full of change - wonderful, exhausting change. Here's a month-by-month summary of the highlights of 2006:
So, all in all, it was a pretty full year of geekdom, some regular exercise, a bit too much poker, a ton of travel, and a whole lot of change. There wasn't nearly enough Amber (of course there can never be enough), even though I have her all to myself a couple of mornings each week. But I'm living with the people that I love, doing fulfilling work, and that's all I can really ask for.

Wednesday, December 27. 2006
The state of PHP security (LWN article)

One of my favourite online publications, the Linux Weekly News, recently published an article called The state of PHP security. Given Stefan's departure, the great taint debate, the addition of ext/filter in 5.2.0 and all of the associated security changes in both the 5.2.x and the 6 branches, I settled down to enjoy a nice pre-Christmas read. I was hoping for some provocative thoughts about the direction that PHP has been taking for the last six months or so in the arena of security.

Unfortunately, I was greatly disappointed. Beyond using Stefan's departure as a kicking-off point for the article, the author didn't even mention any of these issues. Instead, he simply rehashed the history of PHP design missteps (magic_quotes, register_globals, allowing URLs in include) and noted that many PHP tutorials rely on dangerous practices.

What bothered me the most, however, was the author's decision to paraphrase a quote Rasmus gave in an interview from 2002 without explicitly noting that the quote was from 2002. The sentence in the article, talking about register_globals, is:

"It is an extremely dubious feature, but one that PHP creator, Rasmus Lerdorf, seems to think should have been left on by default."

Would it have been too much for the author to have actually asked Rasmus if he might have changed his mind in the past five years? Or perhaps the author could have done a little more research and dug up the PHP 6 planning meeting minutes that state that register_globals and magic_quotes were going to be removed entirely from the language. Instead, the author concludes with the following statement:

"Security seems to fall somewhere below simplicity in the minds of the PHP language developers; that makes it more difficult to have secure PHP applications. Security is a hard problem and any attempt to 'dumb down' a language is likely to run into security issues. Encouraging amateur programmers to write web applications is unlikely to produce secure code in any language, but by providing tutorials and examples that have glaring security issues and by not concentrating on teaching secure coding, PHP makes it that much worse. A great deal of useful code has been written on the PHP platform; it would be nice to find a way to keep that code coming while simultaneously making it more secure."

The first sentence in that statement is the most damning of PHP developers, but it entirely ignores the evidence exhibited in the changes we've seen in PHP 5.2.0 and that are in the works for PHP 6. The third sentence, oddly enough, attributes the existence of "tutorials and examples that have glaring security issues" to PHP itself, as though the language itself or the core developers of the language have the ability to prevent insecure tutorials from being published.

So I launched into the fray and attempted to right those injustices, perhaps a bit too passionately -- but so be it. I've been pretty quiet in the PHP world for the past while, outside of my little PEAR projects, but I still care about the language.

If I can glean anything from this article, it suggests that it might be a good idea to revamp the php.net landing page and documentation a bit to try to highlight tutorials that teach developers how to write secure PHP applications. Right now the landing page is largely a bulletin board for events.
It might benefit, say, from a prominent and permanent link to the PHP Security Consortium (if that project is actually still alive--the last posted article dates back to March 2005). We may also want to improve the visibility of the security chapter of the manual (although briefly revisiting the section on SQL injection suggests that we need to revise it to encourage the use of PDO and placeholders).

Tuesday, November 14. 2006
PEAR File_MARC 0.1.0 alpha officially released

Just a short note to let y'all know that I received the thumbs-up from my fellow PEAR developers to add File_MARC as an official PEAR package. What does this mean? Well, assuming you have PHP 5.1+ and PEAR installed, you can now download and install File_MARC and its prerequisite with a simple command:

    pear install File_MARC-alpha

I've also imported the File_MARC source into the PEAR CVS repository, so you can poke and prod and provide patches.

Before moving to a 1.0 release, I have to write up some user-oriented documentation. I have a hankering to provide MARCXML support as well, so that will probably work its way into the package before 1.0. I'd love some more testing and feedback from other library geeks; now that installation is so simple, I'm hoping to see the

Oh yes: a big thanks to the PEAR developers who have given me some excellent suggestions along the way, from my first proposal all the way through to this alpha release. File_MARC wouldn't be what it is today without your help!

Thursday, October 19. 2006
Serendipity (s9y) blog: Security release

Folks, if you use Serendipity, I thought you should know they just released a security update to fix an XSS issue in the administration backend. Unfortunately, s9y.org itself appears to be very ill at the moment: I kept getting 500 - Internal Server Error. However, the new release with the security fix (1.0.2) is available for download from http://prdownloads.sourceforge.net/php-blog/ -- I recommend you go forth and upgrade.

Tuesday, October 17. 2006
Double-barreled PHP releases

I'm the proud parent of two new releases over the past couple of days: one official PEAR release for linked list fans, and another revision of the File_MARC proposal for library geeks.

Structures_LinkedList

A few days ago marked the first official PEAR release of the Structures_LinkedList package. Yes, it's only at 0.1.0-alpha, but I'm pretty damned happy with the code at this stage and unless something drastic happens the only significant change I foresee between now and 1.0-stable is the addition of some user-oriented documentation.

This code got a severe workout at the Access 2006 Hackfest, where I ran headlong into some significant limitations in parsing huge files. A few days later, after misdirecting some precious #php.pecl brainpower (sorry, sorry, sorry Wez, Ilia, and Tony) on the wrong problem, I discovered the reason writing your own __destruct() methods can be very, very necessary. If you don't clean up variables that PHP doesn't know how to deal with--say, nodes in a doubly-linked list that look like circular reference hell to PHP--then you're going to be in for a world of hurt for anything but the smallest of test scenarios. This particular problem has had a stake put through its heart in Structures_LinkedList as of the 0.1.0-alpha release. Go forth and create linked lists!

File_MARC

On the library geek front, I pushed out File_MARC 0.0.9 via the PEAR Proposal process today. This new release repairs another embarrassing problem, one that I originally blamed for the breakdown during our Hackfest work. You see, I hadn't touched emilda.org's php-marc core routine for parsing MARC files, and it happened to call file() to read the entire target MARC file into memory as an array of lines before enabling you to start parsing the individual MARC records.
That's nifty if you just want to count all of the MARC records in a given file, but it doesn't scale up very well when you've brought, oh, a single file with a half-million MARC records to parse. In fact, PHP kind of gets very upset with you.

The solution, as Dan Chudnov suggested on the fly during my Hackfest interview, was to go with streams. It turns out that stream_get_line() was perfectly suited to the task: given a file pointer, it sucks in the contents of that file until it reaches a maximum length or a given string, then waits until you ask it to suck in the next chunk of data. It was a breeze to convert the code to the following approach:

    const END_OF_RECORD = "\x1D";      // MARC record terminator
    const MAX_RECORD_LENGTH = 99999;   // largest record length MARC allows
    ...
    $this->source = fopen($source, 'rb');
    ...
    // Read from the open file pointer, not the file name
    $record = stream_get_line($this->source, self::MAX_RECORD_LENGTH, self::END_OF_RECORD);

That change solved the "big file" problem, but as File_MARC represents MARC records as linked lists (fields) containing linked lists (subfields), the big file issue was just covering up the slightly more twisted memory management issue in the Structures_LinkedList library. However, after those two changes, testing out the same code I had hastily written at Hackfest shows that the script to parse a 512M MARC file now never takes more than 0.8% of my system memory.

So, library geeks -- this is a last call for significant comments on the File_MARC API. In a couple of days, I plan to put this proposal up for a vote to become an official PEAR package. Of course, if you want to test it out right now, I have high confidence in the code: you can grab it from marc.coffeecode.net. And yes, if you visit that site, I am grasping for the worst throwback HTML design award ever, thank you very much!

Update 2006-10-19: Correct XHTML syntax errors. Heh.

Tuesday, September 19. 2006
Why quasi-open source doesn't work

A few minutes ago, I gave you some examples of how open source works. Now I'm going to give you an example of how quasi-open source doesn't work.

Looking for a tool to help visualize the development of our library's new Web site, I came across a reference in the Atlas of Cyberspaces to:

"a neat Java application for dynamically constructing interactive visual maps of Web sites"

The "Since 1996, alphaWorks has succeeded in helping IBM connect with innovative developers to lead the development of promising new standards, products, and open-source technology. To date, forty percent of technologies posted to the alphaWorks Web site have been incorporated into IBM products or licensed to third-party developers."

So, other than the remarkably commercial bent of the goals for the site projects, what surprised me was the text displayed when I followed the link to Mapuccino:

"Mapuccino has been retired."

Yup, that's it. No link to a download. No source code. Just the bios of six researchers who probably laboured over this project for months--not to mention any input from the

My quest for a good Web site visualization tool continues, by the way. Recommendations are welcomed.

Why open source works...

A couple of recent examples in the PHP community have reaffirmed my faith in the open source development model: the PEAR proposal process, and the delay in the 5.2.0 release.

Friday, August 25. 2006
File_MARC and Structure_Linked_List: new alpha releases

Earlier in the month I asked for feedback on the super-alpha MARC package for PHP. Most of the responses I received were along the lines of "Sounds great!" but there hasn't been much in the way of real suggestions for improvement. In the meantime, I've figured out (with Lukas and Pierre's assistance, merci beaucoup) how to make use of PEAR::ErrorStack for error handling. I've also decided to split my linked-list-in-PHP implementation into a separate package; first, because it might be useful for someone else; second, because as a separate package the PHP gurus that care deeply about things like returning references can go over it with a fine-toothed comb without having to worry about all of the MARC stuff.

So, once again I'm interested in your comments -- but this time I'm looking for comments on two different packages:
Next steps are to build real PEAR packages for these beasties and put together PEAR proposals for consideration of the community... but don't feel that you have to wait until the proposal to offer any suggestions!

I will put up .phps versions of the examples for each package, but for some reason I'm having problems getting my host to accept my .htaccess file... look for an update on the respective pages after I contact my hosting support team.

Monday, August 14. 2006
Super-alpha MARC package for PHP: comments requested

Okay, I've been working on this project (let's call it PEAR_MARC, although it's not an official PEAR project yet) in my spare moments over the past month or two. It's a new PHP package for working with MARC records. The package tries to follow the PEAR project standards (coding, documentation, error handlers, etc.) in the hopes that, when I put a proposal forward, it will be accepted as a true PEAR package. For now, I'm most interested in getting feedback from coders for libraries on the usability of the API that I've designed -- is it easy enough to use and does it offer the functionality that you require for your day-to-day work?

The core MARC decoding routine was taken from the php-marc package that Christoffer Landtman coded for the Emilda open source library management system. The decoding routine was based on the algorithm contained in Perl's MARC::Record package. Christoffer generously relicensed php-marc under an LGPL license so that I could use it as the basis of a (hopefully, eventually) official PEAR package. PEAR_MARC itself will therefore be licensed under LGPL.

Some of the major differences that users will see between php-marc and PEAR_MARC are:

You can find the latest version of the PEAR_MARC package posted at http://marc.coffeecode.net. Please append any comments as replies to this post, or email me at dan@coffeecode.net.

Tuesday, June 13. 2006
In-depth _and_ official DB2 and PHP documentation

I should have mentioned this before, but now that I noticed Chris Jones' post on the Underground PHP and Oracle Manual, I felt obliged to point out that one of the final fruits of my labours at IBM is now visible in the DB2 "Viper" Information Center -- a set of task-oriented documentation that describes how to do all of the things that you really need to do with DB2 and PHP, using either the ibm_db2 or PDO_ODBC modules.

By "task-oriented" I mean that, instead of documenting a set of objects and methods, the docs take the perspective of a developer and describe how to accomplish specific tasks (like "Connecting to a DB2 database from PDO" or "Calling a stored procedure" or "Retrieving multiple result sets"). I hope it works as both a good introduction to PHP development for DB2 users, and a good introduction to DB2 for PHP developers. And, of course, the same approach will work for Apache Derby databases as well.

I find it interesting that Oracle has positioned their PHP documentation as "underground", while IBM has chosen to incorporate their PHP documentation into their official set of DB2 documentation. Oracle gets the points for coolness, but IBM's approach will make the pointy-headed types a bit more comfortable.
Update: Corrected bad XHTML (unescaped ampersand in URL). Bad Dan. And corrupted an intermediate version with garbage from another posting. Even worse.

Monday, February 27. 2006
Some apologies, some reassurances

As I mentioned in a previous post, I'm leaving IBM for a new opportunity at Laurentian University. Over the past year and a half a lot of my personal and professional effort has gone into the PHP community: contributing documentation, acting as the release lead for the ibm_db2 and PDO_INFORMIX extensions, giving conference sessions, and writing the occasional article. When I made the decision to leave IBM, I also had to back down from some related commitments that I had made (formally and informally) to a few different people and groups. I want to apologize publicly for letting you down, and hope that you'll forgive me:

Now, a few of you have wondered on IRC whether this means the end of my contributions to the PHP community... the answer is an emphatic *heck no*! First of all, my new position will give me some latitude in deciding upon the technologies we use to solve the problems we face, and the Library is already using PHP... so we're likely to continue to use it. And on a personal level, there are a number of projects that I want to continue to be involved with:

So I'll still be around; you just won't see me at the conferences this year, and I'll probably be even less productive over the next few months than normal as I adjust to the new role at the University. Oh, not to mention my new role as a father, which I am also expected to take on in a few months. But over time, I'm sure you'll hear more from me, and you'll start seeing submissions for conferences and articles grounded in my experiences at the library.

Friday, February 10. 2006
ADOdb: getting good support for IBM DB2, Cloudscape, and Apache Derby

The stable release of the ibm_db2 PECL extension for IBM DB2, Cloudscape, and Apache Derby brought a high performing, highly functional database connectivity alternative to Unified ODBC for PHP 4 and 5 users. However, in and of itself a database extension does not enable you to use the many PHP applications that you might want to use. You either have to add a specific driver for each application that implements its own portability layer (such as phpMyFAQ), or if the application relies on one of the standard database abstraction layers (PEAR DB, MDB2, or ADOdb), then a driver needs to be added to the corresponding database abstraction layer.

To date, the standard database abstraction layers have offered support for DB2 only through the Unified ODBC extension (and despite substantial overlap in names, MDB2 does not offer support for DB2 at all). Due to some limitations of the Unified ODBC extension, access to DB2 would seem slow and buggy -- and access to Apache Derby or Cloudscape would be fraught with minefields, as Unified ODBC does not provide a way of differentiating between the databases to which you are connected and their corresponding features. The ibm_db2 extension offers the db2_server_info() function which can tell you whether you are connected to DB2 on Linux, DB2 on a zSeries machine, or an Apache Derby database, and let your application or database abstraction layer perform the appropriate workarounds.
Now, however, as part of Larry Menard's efforts to enable Gallery 2, an ADOdb driver built on top of the ibm_db2 extension will, in all probability, be made available as part of a future ADOdb release. Undoubtedly there will be further testing to do, and tweaks and performance optimizations in the future code--for example, differentiating between the capabilities of Apache Derby and DB2--but this is a huge first step! Thanks to Larry and the Gallery 2 team for making this contribution.
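
The db2_server_info() check described in this post can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the ADOdb driver: the helper name classifyDbms() and its return labels are hypothetical, and the DBMS_NAME strings it matches (a name containing "Derby", or one starting with "DB2") are assumptions about what a given server reports, so check them against your own servers.

```php
<?php
/**
 * Classify a server from the object returned by db2_server_info().
 * Hypothetical helper: the labels and the matched strings are assumptions.
 */
function classifyDbms(object $info): string
{
    // DBMS_NAME is one of the properties on the db2_server_info() result.
    $name = strtoupper($info->DBMS_NAME ?? '');

    if (strpos($name, 'DERBY') !== false) {
        return 'derby';   // Apache Derby (assumed to report a name containing "Derby")
    }
    if (strpos($name, 'DB2') === 0) {
        return 'db2';     // DB2 on any platform (assumed to report a name starting with "DB2")
    }
    return 'unknown';
}

// Real usage needs the ibm_db2 extension and a live connection, e.g.:
//   $conn = db2_connect($database, $user, $password);
//   $info = db2_server_info($conn);
//   if ($info !== false && classifyDbms($info) === 'derby') { /* Derby workarounds */ }

// Stand-in object so the sketch runs without a database.
$fake = new stdClass();
$fake->DBMS_NAME = 'Apache Derby';
echo classifyDbms($fake), "\n";
```

Keeping the classification in a small function that only looks at the returned object means an abstraction layer can decide on its workarounds in one place, and the logic can be exercised without a database connection.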


