Thursday, October 18. 2012
Triumph of the tiny brain: Dan vs. Drupal / Panels
A while ago I inherited responsibility for a Drupal 6 instance and a rather out-of-date server. (You know it's not good when your production operating system is so old that it no longer gets security updates.) I'm not a Drupal person. I dabbled with Drupal years and years ago when I was heavily into PHP, but it never stuck with me: every time I poked around at the database schema, with serialized objects stuck inside columns, I found something else that I wanted to work on instead. So inheriting a Drupal instance wasn't something I had been looking forward to.
As this production server was running a number of different services used by our library, I went through several trial runs to ensure that upgrading the base packages wouldn't introduce regressions or outages. Fast-forward past a reasonably successful early-morning upgrade from Debian Lenny to Squeeze, and I was able to turn to the Drupal instance, which was also approximately 18 months out of date.
Initially, after I worked out the how-to of Drupal upgrades (in short: upgrade just Drupal core, then upgrade the modules), I thought all was well. I even got over the hump of realizing that our instance had had all of its modules dumped into Drupal's core directory rather than sites/all/modules, and (even more impressively) got over the problem that the core bluemarine theme had been hacked directly rather than separated out into a new custom theme. After working through those learning pains, I realized that somewhere in all of the Drupal and module upgrades, something had become "more secure" and started truncating IMG links to files with spaces in their names at the first space. So "foo%20bar.jpg" was becoming "foo.jpg", and we were getting 404s everywhere. Did I mention that I didn't notice this until I upgraded our production instance?
Oh yes, I went through iteration after iteration of upgrades on the test server, and dutifully fixed up the problems that I found in the subset of content that I was testing against. I discovered and fixed problems like the production server content linking directly to the test server (slight copy-and-paste errors on the part of the content creators, I suppose). But I didn't notice all of the 404s, because who uploads images with spaces in their filename? Turns out, everyone else in my library does that. Of course! And from what I was able to piece together via Google and browsing drupal.org, there was supposed to be some sanitization of the incoming filenames so that spaces would be normalized, etc. But either that wasn't introduced until well after our content had been created, or my predecessor had lightly hacked one of the modules, or Drupal itself, and hadn't bothered to use a source code repository to track those customizations. So, realizing that I needed to make some bulk changes, I went at it with a two-step plan:
If you use Drupal, or Drupal with the Panels module, you might know that its database schema suffers from some fairly horrible tricks being played on it. In this case, the Panels module creates a panels_pane table with a configuration TEXT column. Based on the name alone, it might seem odd that this column is used to store the HTML content of the corresponding panel. Even odder to me is that this is not just a TEXT column; it's a column that expects a very particular structure (PHP's serialize() format), something like:
a:5:{s:11:"admin_title";s:5:"RACER";s:5:"title";s:0:"";s:4:"body";s:639:"<p><img width="225" height [...]}
Ah, nothing like storing an object within a single database column. Of particular interest was what happened when I tested updating the column value with a basic "replace(configuration, '%20', '_')": the panel showed only n/a, presumably because the length declared for the "body" string (the number in its s:639 prefix) no longer matched the actual length of the text, since each %20 replaced by _ shortens it by two bytes. That would be an instance of http://drupal.org/node/926448, so okay, clearly I had to change tactics and update the entire object. I tried quickly finding the Drupal way to do this: surely there's an API, and there must be some simple way to retrieve an object, change its values, and save it so that the serialized object gets stored in the database and Drupal is happy. However, I couldn't find a simple tutorial, and asking in #drupal on Freenode was unfortunately fruitless as well (a few people did suggest running REPLACE() at the database level, which was nice of them, but they didn't recognize that doing so would actually damage things significantly). So... out came the Perl, and here's what I hacked together:
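```perl
#!/usr/bin/perl
# Turns tab-separated (configuration, pid) rows from panels_pane into UPDATE
# statements: rewrites the "body" string and recomputes its s:N length prefix.
# Caveat: COPY's text format escapes backslashes, tabs and newlines, and this
# script writes those escapes back verbatim.
use strict;
use warnings;

while (my $line = <DATA>) {
    chomp $line;
    my ($configuration, $pid) = split /\t/, $line;

    # Break the serialized array at each string boundary: ...";s:<len>:"...
    my @chunks = split /";s:/, $configuration;
    my @fixed  = (shift @chunks);    # a:N:{s:<len>:"<first key>
    my $body   = 0;

    foreach my $chunk (@chunks) {
        if ($chunk =~ m/"body/) {
            # This chunk holds the "body" key; the next one holds its value
            $body = 1;
        }
        elsif ($body) {
            my ($content) = $chunk =~ m/^\d+:"(.+)$/;
            # Turn every %20 in a /pictures/ file name into "_",
            # repeating until no substitutions are left
            1 while $content =~ s{(/pictures/[^./]+?)%20}{${1}_}g;
            # Any other %20 becomes "+"
            $content =~ s{%20}{+}g;
            # Recompute the length prefix; length() counts bytes here
            # because the input is read without decoding
            $chunk = length($content) . ":\"$content";
            $body  = 0;
        }
        push @fixed, $chunk;
    }

    # $ashes$ dollar-quoting avoids escaping the quotes inside the value
    print 'UPDATE panels_pane SET configuration = $ashes$'
        . join('";s:', @fixed) . '$ashes$' . " WHERE pid = $pid;\n";
}
__DATA__
```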
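The script above splits on the ";s: separator, which works as long as no string value happens to contain that sequence. A sturdier variant, sketched below as a hypothetical alternative rather than the script that was actually run, reads each string by its declared byte length (the way PHP's unserialize() does), hands every array value to a callback along with its key, and re-serializes with fresh length prefixes. The function names (read_bytes, read_key, rewrite_value, rewrite_strings) are illustrative; the sketch assumes raw, undecoded bytes and handles only the s, i, d, b, N and a types, so serialized objects (O:) are out of scope. The sample configuration value is made up.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Read exactly $len bytes at the current position, then expect the closing ";
sub read_bytes {
    my ($s, $len) = @_;
    my $start = pos($$s) // 0;
    die "truncated string at offset $start\n" if $start + $len > length($$s);
    my $bytes = substr($$s, $start, $len);
    pos($$s) = $start + $len;
    $$s =~ /\G";/gc or die "length prefix mismatch at offset $start\n";
    return $bytes;
}

# Array keys are integers or strings; return (plain key, serialized key)
sub read_key {
    my ($s) = @_;
    return ($1, "i:$1;") if $$s =~ /\Gi:(-?\d+);/gc;
    if ($$s =~ /\Gs:(\d+):"/gc) {
        my $key = read_bytes($s, $1);
        return ($key, 's:' . length($key) . ":\"$key\";");
    }
    die "unsupported array key at offset " . (pos($$s) // 0) . "\n";
}

# Re-serialize one value, passing every string value to $fix->($string, $key)
sub rewrite_value {
    my ($s, $fix, $key) = @_;
    if ($$s =~ /\Gs:(\d+):"/gc) {
        my $val = $fix->(read_bytes($s, $1), $key);
        return 's:' . length($val) . ":\"$val\";";
    }
    return $1 if $$s =~ /\G(i:-?\d+;|d:[^;]*;|b:[01];|N;)/gc;
    if ($$s =~ /\Ga:(\d+):\{/gc) {
        my $count = $1;
        my $out   = "a:$count:{";
        for (1 .. $count) {
            my ($plain, $serialized_key) = read_key($s);
            $out .= $serialized_key . rewrite_value($s, $fix, $plain);
        }
        $$s =~ /\G\}/gc or die "unterminated array at offset " . pos($$s) . "\n";
        return "$out}";
    }
    die "unsupported value at offset " . (pos($$s) // 0) . "\n";
}

sub rewrite_strings {
    my ($serialized, $fix) = @_;
    my $out = rewrite_value(\$serialized, $fix, undef);
    die "trailing data\n" if pos($serialized) != length($serialized);
    return $out;
}

# Hypothetical sample value shaped like a panels_pane.configuration entry
my $config = 'a:3:{s:11:"admin_title";s:5:"RACER";s:5:"title";s:0:"";'
           . 's:4:"body";s:38:"<p><img src="/pictures/foo%20bar.jpg">";}';

my $fixed = rewrite_strings($config, sub {
    my ($value, $key) = @_;
    return $value unless defined $key && $key eq 'body';
    # Same substitutions as the script above
    1 while $value =~ s{(/pictures/[^./]+?)%20}{${1}_}g;
    $value =~ s{%20}{+}g;
    return $value;
});
print "$fixed\n";
```

Parsing by the declared lengths means a body that happens to contain ";s: can't derail the parse, and a stale length prefix will most likely surface as a die() here rather than as a panel that quietly shows n/a. Wiring rewrite_strings() into the COPY/UPDATE loop is left out to keep the sketch short.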
Against the trusty database (I ♥ PostgreSQL!), I ran COPY (SELECT configuration, pid FROM panels_pane WHERE configuration ~ '%20') TO 'conf_pids.out';, then slapped the Perl code on top of the output (so that the rows followed __DATA__) and generated a load of UPDATE statements. It's far from my best Perl code, but it worked, and once I gave up on doing things the Drupal way I was able to put it together in a handful of minutes. I now have a functional Drupal 6 instance again, with no known security vulnerabilities in either core or the modules we're using, and no broken image links.
And now I need to begin working towards either grokking Drupal or finding a content management system that my tiny brain can comprehend, because I don't want to go through these kinds of contortions again with future upgrades... Suggestions welcome!
Tuesday, September 18. 2012
Seek and ye shall find: full-text search in PostgreSQL
I'm at PostgresOpen in Chicago, and just gave my talk on Implementing full-text search in PostgreSQL. The goal was to give novice users the understanding and examples they needed to build a workable search solution using PostgreSQL's full-text search. And it went (in my opinion) well: an almost full room, lots of audience interaction (thanks Bruce Momjian, Jonathan Scott, Jonathan Katz, et al.), a lot of nodding heads, and nobody running out of the room screaming. So... yay! A few takeaways from prepping for the presentation:
Also, PostgresOpen has had a great vibe so far: a relatively small but very high-quality conference with lots of knowledgeable, friendly participants. Selena (one of the organizers) aimed to create an environment similar to PgCon, and from my limited experience of attending one PgCon and one PostgresOpen, I would say that she and the rest of the conference team have done a great job!
Sunday, September 2. 2012
Leaving SELinux in enforcing mode with Evergreen on Fedora 17
Ever since I switched over to Fedora a few years back (hi Fedora 13!), I've been guilty of a dirty secret: to run Evergreen, I've had to run setenforce 0 (switching SELinux into permissive mode) to sidestep the most excellent SELinux security policies before I could start the Apache web server to serve up the Evergreen goodness. This worked for development purposes, but tonight something snapped and I decided that it was no longer acceptable to throw away a great layer of operating system security simply for the sake of hacking on Evergreen. So I stepped into the world of what had formerly seemed to be inscrutable SELinux concepts, and came out with something that seems to work (at least for my fairly limited purposes so far: searching the TPAC catalogue). This was a pretty iterative process: try to start httpd.service, then check /var/log/messages and /var/log/audit/audit.log for clues as to why httpd.service was either not starting or (once I passed that hurdle) simply returning internal server errors.
First, thanks to my recent experience running a web.py script under Fedora, I had learned that the httpd SELinux policy has a number of booleans for enforcing or allowing particular behaviours, so I immediately ran the following command to allow httpd to make network connections:
```shell
setsebool httpd_can_network_connect on
```
I then needed to change the labels on many of the OpenSRF and Evergreen files that had been installed, which Fedora had given a default type of unconfined_t, which is understandably restrictive:
```shell
# Mark web content as, well, web content
chcon -R --type=httpd_sys_content_t /openils/lib/javascript
chcon -R --type=httpd_sys_content_t /openils/var/web
chcon -R --type=httpd_sys_content_t /openils/var/templates*
chcon -R --type=httpd_sys_content_t /openils/var/data
chcon -R --type=httpd_sys_content_t /openils/var/xsl
chcon --type=httpd_sys_content_t /openils/conf/opensrf_core.xml
chcon --type=httpd_sys_content_t /openils/conf/fm_IDL.xml

# Mark the custom Apache modules
chcon --user=system_u --type=httpd_modules_t /usr/lib64/httpd/modules/mod_xmlent.so
chcon --user=system_u --type=httpd_modules_t /usr/lib64/httpd/modules/osrf_*

# Mark the dynamic libraries we need to load
# "-h" (--no-dereference) relabels symlinks themselves rather than their targets
chcon -h --type=lib_t /openils/lib/*

# Mark executable scripts
chcon -t httpd_sys_script_exec_t /openils/bin/openurl_map.pl
chcon -t httpd_sys_script_exec_t /openils/bin/offline-blocked-list.pl

# Might not have been necessary
chcon -R --user=system_u /usr/local/share/perl5/
chcon --user=system_u /etc/httpd/conf.d/eg.conf
chcon --user=system_u /etc/httpd/startup.pl
chcon --user=system_u /etc/httpd/eg_vhost.conf
chcon -R --user=system_u /etc/httpd/ssl/
```
Note: I'm aware that simply running chcon won't survive a relabelling of the files. We really need to turn this into a policy, or alternatively use semanage to make the changes permanent...
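As a starting point for the semanage route, here is a minimal Perl sketch; it is hypothetical and has not been checked against Evergreen's full set of files. For a few of the paths above (an incomplete, illustrative mapping) it prints semanage fcontext rules, which are stored in the local policy and survive a relabel, then a restorecon to apply them; it also uses setsebool -P, which makes the boolean persist across reboots. It only prints the commands, so they can be reviewed before being run as root.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical subset of the chcon commands above: [path, type, recursive?]
my @rules = (
    [ '/openils/lib/javascript',     'httpd_sys_content_t',     1 ],
    [ '/openils/var/web',            'httpd_sys_content_t',     1 ],
    [ '/openils/var/xsl',            'httpd_sys_content_t',     1 ],
    [ '/openils/conf/fm_IDL.xml',    'httpd_sys_content_t',     0 ],
    [ '/openils/bin/openurl_map.pl', 'httpd_sys_script_exec_t', 0 ],
);

# semanage fcontext matches paths with regular expressions, so escape
# metacharacters such as the "." in file names; "(/.*)?" extends a rule
# to everything beneath a directory.
sub path_regex {
    my ($path, $recursive) = @_;
    (my $re = $path) =~ s/([.^\$*+?()\[\]{}|\\])/\\$1/g;
    return $recursive ? "$re(/.*)?" : $re;
}

# -P writes the boolean to the policy so it survives a reboot
my @commands = ('setsebool -P httpd_can_network_connect on');
for my $rule (@rules) {
    my ($path, $type, $recursive) = @$rule;
    push @commands, sprintf("semanage fcontext -a -t %s '%s'",
        $type, path_regex($path, $recursive));
}
# Relabel existing files according to the stored rules
push @commands, 'restorecon -R -v /openils';

print "$_\n" for @commands;
```

One design caution: restorecon resets every label under /openils to whatever the stored rules say, so any chcon label without a matching rule (for example the lib_t libraries) would be lost; the mapping would need to cover all of them before running the generated commands.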
Next, I opted to finally start running Apache as the stock apache:apache user/group rather than as the opensrf user. This turned out to require only a few steps:
So this is a start. I think this has broader implications than just Fedora: we should stop using the opensrf user to run the Apache service in the default configuration on all distributions (we've discussed this several times in the past, but never actually done anything about it). I hope to update the README accordingly, and I also hope to take the SELinux work a step further by providing a modified policy so that Fedora and Red Hat (and derivative) distributions can offer a more secure environment for running Evergreen. Oh, and some handy resources:
Wednesday, August 1. 2012
Finding DRM-free books on the Google Play store
John Mark Ockerbloom recently said, while trying to buy a DRM-free copy of John Scalzi's Redshirts on the Google Play Store: "At the publisher's request, this title is being sold without Digital Rights Management software (DRM) applied."
That led to the realization that if you search for the magic phrase "this title is being sold without Digital Rights Management software", you'll find plenty of books for sale on Google Play without DRM. When you buy one, you can then download the ePub from the How to Read tab for the book. Pretty straightforward, no?
About Me
I'm Dan Scott: barista, library geek, and free-as-in-freedom software developer. I hack on projects such as the Evergreen open-source ILS project and PEAR's File_MARC package.
By day I'm the Systems Librarian for Laurentian University. You can reach me by email at dan@coffeecode.net.

