Monday, April 14. 2008
Tuning PostgreSQL for Evergreen on a test server
Update 2008-05-01: Fixed a typo for sysctl: the -a parameter simply shows all settings; the -w parameter is needed to write the setting. Duh.
Once you have decided on and acquired your test hardware for Evergreen, you need to think about tuning your PostgreSQL database server. Once you start loading bibliographic records, you might notice after 100,000 records or so that your search response times aren't too snappy. Don't snarl at Evergreen. By default, PostgreSQL ships with very conservative settings (aimed at something like machines with 256 MB of RAM!), so if you don't tune those settings you're getting a false representation of your system's capabilities.
The "right" settings for PostgreSQL depend significantly on your hardware and deployment context, but in almost any circumstance you will want to bump up the settings from the delivered defaults. To give you an idea of what you need to consider, I thought I would share the settings that we're currently using on our Evergreen test server at Laurentian University. You might be able to use these as a starting point and adjust them accordingly once you've run some representative load tests against your configuration. And it's useful documentation for me to fall back on in a few months, when all of this has escaped my grasp.
The defaults (as shipped in Debian Etch)
The defaults in Debian Etch are quite conservative. Consider that our test server has 12GB of RAM. The default only allocates 1MB of RAM to work memory (which is critical for sorting performance) and only 8MB of RAM to shared buffers. Following are the defaults set in /etc/postgresql/8.1/main/postgresql.conf:

    # - Memory -
    #shared_buffers = 1000 # min 16 or max_connections*2, 8KB each
    #temp_buffers = 1000 # min 100, 8KB each
    #max_prepared_transactions = 5 # can be 0 or more
    # note: increasing max_prepared_transactions costs ~600 bytes of shared memory
    # per transaction slot, plus lock space (see max_locks_per_transaction).
    #work_mem = 1024 # min 64, size in KB
    #maintenance_work_mem = 16384 # min 1024, size in KB
    #max_stack_depth = 2048 # min 100, size in KB
    # - Free Space Map -
    #max_fsm_pages = 20000 # min max_fsm_relations*16, 6 bytes each
    #max_fsm_relations = 1000 # min 100, ~70 bytes each

Our test server settings
Our test server has 12 GB of RAM. Assuming that the PostgreSQL defaults were set for a system with 1 GB of RAM, we should be able to multiply the memory-based settings by at least a factor of 12. We're a little bit more aggressive than that in our settings.
Note, however, that this is a single-server install of Evergreen, so we're also running memcached, ejabberd, Apache, and all of the Evergreen services as well as the database - oh, and a test instance of an institutional repository, among other apps - so we're not nearly as aggressive as we would be in a dedicated PostgreSQL server configuration. Please note that I'm making no claims that this is the optimal set of configuration values for PostgreSQL even on our own hardware!

    # shared_buffers: much of our performance depends on sorting, so we'll set it 100X the default
    # some tuning guides suggest cranking this up to as much as 30% of your available RAM
    shared_buffers = 100000 # 8K * 100000 = ~ 0.8 GB
    # work_mem: how much RAM each concurrent process is allowed to claim before swapping to disk
    # your workload will probably have a large number of concurrent processes
    work_mem=524288 # 512 MB
    # max_fsm_pages: increased because PostgreSQL demanded it
    max_fsm_pages = 200000

After you change these settings, you will need to restart PostgreSQL to make the settings take effect.
Kernel tuning
In addition to PostgreSQL complaining about max_fsm_pages not being high enough, your operating system kernel defaults for SysV shared memory might not be high enough to support the amount of RAM PostgreSQL demands as a result of your modifications. In one of our test configurations, we had cranked up work_mem to 8GB; Debian complained about an insufficient SHMMAX setting, so we adjusted that by running the following command as root to set the kernel SHMMAX to 8GB (8*1024^3 bytes):

    sysctl -w kernel.shmmax=8589934592

To make this setting sticky through reboots, you can simply modify /etc/sysctl.conf to include the following lines (the setting itself must not be commented out):

    # Set SHMMAX to 8GB for PostgreSQL
    kernel.shmmax=8589934592

Other measures
Debian Etch comes with PostgreSQL 8.1. The first version of PostgreSQL 8.1 was released in November 2005. That's a long time in computer years.
Version 8.2, which was released just over a year later (December 2006), "adds many functionality and performance improvements" (according to the release notes). If you're not getting the performance you expect from your hardware with Debian Etch, perhaps a backport of PostgreSQL 8.2 would help out.
Further resources
This is just a shallow dip into PostgreSQL tuning for Evergreen - hopefully enough to alert you to some of the factors you need to consider if you're putting Evergreen into a serious testing environment or production environment. Here are a few places to dig deeper into the art of PostgreSQL tuning:
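Separately, as a sanity check on the unit arithmetic used in this post, here is a minimal Python sketch. It assumes the PostgreSQL 8.1 conventions quoted in the config comments above (shared_buffers counted in 8 KB pages, work_mem in KB). The function names and the figure of 20 concurrent sorts are hypothetical, chosen only for illustration; they are not part of PostgreSQL or Evergreen.

```python
# Unit conversions for the postgresql.conf and sysctl values quoted in this post.
# Assumption: PostgreSQL 8.1 units (shared_buffers in 8 KB pages, work_mem in KB).

PAGE_BYTES = 8 * 1024  # one shared buffer page
KB = 1024
GIB = 1024 ** 3


def shared_buffers_bytes(pages):
    """Bytes of shared memory requested by a shared_buffers value."""
    return pages * PAGE_BYTES


def work_mem_bytes(kb):
    """Bytes a single sort may claim before spilling to disk."""
    return kb * KB


def worst_case_work_mem_bytes(work_mem_kb, concurrent_sorts):
    """Hypothetical worst case: every concurrent sort claims its full work_mem."""
    return work_mem_bytes(work_mem_kb) * concurrent_sorts


def shmmax_for_gib(gib):
    """Value to pass to kernel.shmmax for a given number of GiB."""
    return gib * GIB


if __name__ == "__main__":
    print("default shared_buffers:", shared_buffers_bytes(1000), "bytes")
    print("tuned shared_buffers:  ", shared_buffers_bytes(100000), "bytes")
    print("tuned work_mem:        ", work_mem_bytes(524288), "bytes")
    # 20 concurrent sorts is an illustrative guess, not a measurement.
    print("20 sorts at full work_mem:",
          worst_case_work_mem_bytes(524288, 20) / GIB, "GiB")
    print("kernel.shmmax for 8 GiB:", shmmax_for_gib(8))
```

The worst-case line is the one worth staring at: because work_mem applies per sort rather than per server, a large value multiplied by many concurrent processes can approach the 12 GB of RAM on a box like ours, which is one reason to load test before settling on a number.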
Wednesday, April 9. 2008
Test server strategies
Occasionally on the #OpenILS-Evergreen IRC channel, a question comes up about what kind of hardware a site should buy if they're getting serious about trying out Evergreen. I had exactly the same chat with Mike Rylander back in December, so I thought it might be useful to share the strategy we developed in case other organizations are interested in piggy-backing on our research.
We came up with three different scenarios, depending on the funding available to the organization and how serious the organization is about testing, developing, and deploying Evergreen. You can also look at the scenarios as stages, as the scenarios enable progressively more realistic testing. An organization can always start with a single server and add more servers over time; if you can swing a significant discount for buying in bulk, however, it might make sense to bite the bullet early.
Some pertinent facts about our requirements: we will eventually be loading around 5 million bibliographic records onto the system. We're an academic organization, so concurrent searching and circulation loads will be low relative to public libraries.
Scenario 1: A single bargain-basement testing server
In this scenario, the organization purchases a single server for the short term, and configures it to run the entire Evergreen + OpenSRF stack:
This server needs to have powerful CPUs, large amounts of RAM, and many fast (10K RPM or higher) hard drives in a striped RAID configuration (the latter because database performance typically gets knee-capped by disk access). A "higher education" quote online from a reputable big-name vendor for a rack-mounted 2U database server with 2x4-core CPU, 16GB RAM, 6x73GB RAID 5 drives comes in at approximately $7000.
This scenario is fine for development and testing with a limited number of users, but if you intend to do any sort of stress testing with this server or throw it open to the public, performance will likely grind to a halt.
Note: This is close to the system that we're currently running at http://biblio-dev.laurentian.ca - 12 GB of RAM, 2 dual-core CPUs - with 800K bibliographic records and pretty snappy search performance. It's certainly nothing to sneeze at.
Scenario 2: one database server, one network server
In this scenario, you purchase a database server and a network server. We'll use the same specs from scenario 1 for the database server, and a CPU + RAM-oriented server for the network server (disk access isn't a factor for the network apps, so you just buy two small mirrored drives). The stock higher education quote for a rack-mounted 1U network server with 2x4-core CPU, 16GB RAM, 2x73GB RAID 1 drives is approximately $5250.
This scenario will support development and testing, as well as enable you to perform relatively representative stress testing runs with a significant number of simultaneous users.
Scenario 3: two database servers, two or three network servers
In this scenario, you purchase two database servers so that you can test database replication and split database loads between search and reporting, plus two or three network servers to test different distributions of the caching and network apps across the servers to determine the configuration that best meets your expected demands.
The cost of the five servers adds up to less than $30,000 - less than a single traditional proprietary UNIX server - and would be less if you can negotiate a bulk discount.
The third scenario supports development and testing, and will give you practical experience with a configuration that would approximate your production deployment of servers. When you go live, you could move one of the database servers and all but one of the network servers over to the production cluster, and revert back to scenario one for your ongoing test and development environment.
The Conifer approach
We opted to go with the third scenario to build a serious test cluster for our consortium. However, the "scenarios as stages" approach ended up being our strategy as our original choice of Dell servers came with RAID controllers that do not work well under Debian. After returning the servers to Dell, we were forced to press one of our backup servers into service as a scenario-one style server while waiting for our new order from HP to arrive.
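For anyone adapting the three scenarios to their own quotes, the cost arithmetic can be sketched in a few lines of Python. The two prices are the approximate higher-education quotes given above; the function name and the scenario labels are hypothetical, and your own vendor quotes will differ.

```python
# Rough cost tally for the three test-server scenarios described in this post.
# Prices are the approximate quotes from the post; substitute your own.

DB_SERVER_USD = 7000       # 2U database server quote
NETWORK_SERVER_USD = 5250  # 1U network server quote


def scenario_cost(db_servers, network_servers):
    """Total list price in USD for a mix of database and network servers."""
    return db_servers * DB_SERVER_USD + network_servers * NETWORK_SERVER_USD


# (database servers, network servers) for each scenario
SCENARIOS = {
    "scenario 1": (1, 0),
    "scenario 2": (1, 1),
    "scenario 3, two network servers": (2, 2),
    "scenario 3, three network servers": (2, 3),
}

if __name__ == "__main__":
    for name, (db, net) in SCENARIOS.items():
        print(f"{name}: ${scenario_cost(db, net):,}")
```

With these quotes, the five-server version of scenario 3 comes to $29,750 before any bulk discount, which is where the "less than $30,000" figure comes from.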
About Me
I'm Dan Scott: barista, library geek, and free-as-in-freedom software developer. I hack on projects such as the Evergreen open-source ILS project and PEAR's File_MARC package. By day I'm the Systems Librarian for Laurentian University. You can reach me by email at dan@coffeecode.net.
