Test server strategies

Posted on Thu 10 April 2008 in misc

Occasionally on the #OpenILS-Evergreen IRC channel, a question comes up about what kind of hardware a site should buy if it's getting serious about trying out Evergreen. I had exactly the same chat with Mike Rylander back in December, so I thought it might be useful to share the strategy we developed, in case other organizations are interested in piggy-backing on our research. We came up with three different scenarios, depending on the funding available to the organization and how serious the organization is about testing, developing, and deploying Evergreen.

You can also look at the scenarios as stages, as the scenarios enable progressively more realistic testing. An organization can always start with a single server and add more servers over time; if you can swing a significant discount for buying in bulk, however, it might make sense to bite the bullet early.

Some pertinent facts about our requirements: we will eventually be loading around 5 million bibliographic records onto the system. We're an academic organization, so concurrent searching and circulation loads will be low relative to public libraries.

Scenario 1: A single bargain-basement testing server

In this scenario, the organization purchases a single server for the short term, and configures it to run the entire Evergreen + OpenSRF stack:

  • database
  • Web server
  • Jabber messaging
  • memcached
  • OpenSRF applications

This server needs to have powerful CPUs, large amounts of RAM, and many fast (10K RPM or higher) hard drives in a striped RAID configuration (the latter because database performance typically gets knee-capped by disk access). A "higher education" quote online from a reputable big-name vendor for a rack-mounted 2U database server with 2x4-core CPUs, 16GB RAM, and 6x73GB drives in RAID 5 comes in at approximately $7000.
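Because disk access is the usual database bottleneck, it's worth sanity-checking whatever RAID configuration you end up with before loading millions of records onto it. Here's a minimal sketch (plain Python, not Evergreen-specific; the function name and sizes are my own) that times a sequential write with an fsync to get a rough MB/s figure:

```python
import os
import tempfile
import time

def sequential_write_mb_per_s(size_mb=256, block_kb=1024):
    """Crude sequential-write benchmark: write size_mb of zeroes in
    block_kb chunks, fsync to force the data to disk, report MB/s."""
    block = b"\0" * (block_kb * 1024)
    fd, path = tempfile.mkstemp()
    try:
        start = time.time()
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mb * 1024 // block_kb):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # don't trust the OS write cache
        return size_mb / (time.time() - start)
    finally:
        os.remove(path)

if __name__ == "__main__":
    print(f"Sequential write: {sequential_write_mb_per_s():.1f} MB/s")
```

This only measures sequential throughput; real database loads are dominated by random I/O, so treat the number as a floor check on the controller and stripe set rather than a prediction of query performance.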

This scenario is fine for development and testing with a limited number of users, but if you intend to do any sort of stress testing with this server or throw it open to the public, performance will likely grind to a halt. Note: this is close to the system that we're currently running at http://biblio-dev.laurentian.ca - 12 GB of RAM, two dual-core CPUs - with 800K bibliographic records and pretty snappy search performance. It's certainly nothing to sneeze at.

Scenario 2: One database server, one network server

In this scenario, you purchase a database server and a network server. We'll use the same specs from scenario 1 for the database server, and a CPU- and RAM-oriented server for the network server (disk access isn't a factor for the network apps, so you just buy two small mirrored drives). The stock higher education quote for a rack-mounted 1U network server with 2x4-core CPUs, 16GB RAM, and 2x73GB drives in RAID 1 is approximately $5250.

This scenario will support development and testing, as well as enable you to perform relatively representative stress-testing runs with a significant number of simultaneous users.

Scenario 3: Two database servers, two or three network servers

In this scenario, you purchase two database servers, so that you can test database replication and split the database load between search and reporting, and two or three network servers, so that you can test different distributions of the caching and network apps across the servers and determine the configuration that best meets your expected demands. The cost of the five servers adds up to less than $30,000 - less than a single traditional proprietary UNIX server - and would be even less if you can negotiate a bulk discount.
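The arithmetic behind that total, using the per-server quotes from the earlier scenarios (variable names are mine, for illustration):

```python
# Prices from the higher-education quotes quoted above
db_server = 7000    # 2U database server (scenario 1 spec)
net_server = 5250   # 1U network server (scenario 2 spec)

# Scenario 3 at its largest: two database + three network servers
total = 2 * db_server + 3 * net_server
print(total)  # 29750, i.e. just under the $30,000 figure
```

With only two network servers, the total drops to $24,500, which leaves some room in the budget for spare drives or an extra year of support.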

The third scenario supports development and testing, and will give you practical experience with a configuration that approximates your eventual production deployment. When you go live, you could move one of the database servers and all but one of the network servers over to the production cluster, and revert to scenario one for your ongoing test and development environment.

The Conifer approach

We opted to go with the third scenario to build a serious test cluster for our consortium. However, the "scenarios as stages" approach ended up being our strategy anyway, as our original choice of Dell servers came with RAID controllers that did not work well under Debian. After returning the servers to Dell, we were forced to press one of our backup servers into service as a scenario-one style server while waiting for our new order from HP to arrive.