Hi all,
I'm trying to load LDAP with several LDIF files. The biggest one, containing 500k records of approx. 700 bytes each, took slapadd a little over 14 hours to load. Would this be considered normal given the number of records and their size?
I am including my DB_CONFIG and slapd.conf files to see if anyone here can help me figure out if I made a mistake while setting them up.
The command I use for loading is:
slapadd -q -f slapd.conf -l myfile.ldif
One thing I did notice is that using bdb instead of hdb increased the load time to 19 hours.
Anyone have any recommendations?
Thanks,
Diego.
--------- DB_CONFIG ---------
set_lg_max 209715200
set_lg_bsize 52428800
set_tmp_dir /data/ldap/tmp
set_cachesize 0 209715200 2
set_lk_max_locks 4000
set_lk_max_lockers 4000
set_lk_max_objects 4000
---------- slapd.conf ----------
allow bind_v2
include /etc/ldap/schema/core.schema
include /etc/ldap/schema/cosine.schema
include /etc/ldap/schema/nis.schema
include /etc/ldap/schema/inetorgperson.schema
include /etc/ldap/schema/my.schema
pidfile /var/run/slapd/build-slapd.pid
argsfile /var/run/slapd/build-slapd.args
modulepath /usr/lib/ldap
moduleload back_hdb
password-hash {SSHA}
disallow bind_anon
backend hdb
database hdb
suffix "dc=mydomain,dc=com"
directory "/var/lib/build-ldap"
lastmod on
--On May 15, 2009 2:10:25 PM -0400 Diego Figueroa dfiguero@yorku.ca wrote:
Hi all,
I'm trying to load LDAP with several LDIF files. The biggest one, containing 500k records of approx. 700 bytes each, took slapadd a little over 14 hours to load. Would this be considered normal given the number of records and their size?
DB_CONFIG
set_lg_max 209715200
set_lg_bsize 52428800
set_tmp_dir /data/ldap/tmp
set_cachesize 0 209715200 2
set_lk_max_locks 4000
set_lk_max_lockers 4000
set_lk_max_objects 4000
I'd guess your cachesize is minuscule compared to what you need. What was the resulting size of the database (du -c -h *.bdb)? Your cachesize needs to be at least that big.
Second, don't use multiple BDB cache segments, it slows things down.
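For example, if du reports a database of around 2GB, a single-segment cache a bit larger than that would look like the line below in DB_CONFIG (the 2GB figure is only an assumption here - substitute your actual du number):

set_cachesize 2 268435456 1

That is 2GB plus 256MB in one cache segment; your current 200MB split across 2 segments is both too small and unnecessarily divided.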
slapd.conf
allow bind_v2
include /etc/ldap/schema/core.schema
include /etc/ldap/schema/cosine.schema
include /etc/ldap/schema/nis.schema
include /etc/ldap/schema/inetorgperson.schema
include /etc/ldap/schema/my.schema
pidfile /var/run/slapd/build-slapd.pid
argsfile /var/run/slapd/build-slapd.args
modulepath /usr/lib/ldap
moduleload back_hdb
password-hash {SSHA}
disallow bind_anon
backend hdb
database hdb
suffix "dc=mydomain,dc=com"
directory "/var/lib/build-ldap"
lastmod on
You should configure the tool-threads option to match how many real cores your system has. Then it can multi-thread any indices.
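For instance, on a machine with 4 real cores (4 is only an example - check /proc/cpuinfo or nproc for the real count), the slapd.conf line would be:

tool-threads 4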
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
Thanks for your input Quanah,
I also just noticed that top is reporting 50-90% I/O waiting times. I might have to look at my disks to further improve things.
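I'll probably start by watching the drives with something like iostat from the sysstat package (the 5-second interval is arbitrary):

iostat -x 5

and see which device the waits are piling up on.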
Thanks,
Diego.
Hi Diego
On 15 May 2009, at 20:54, Diego Figueroa wrote:
Thanks for your input Quanah,
I also just noticed that top is reporting 50-90% I/O waiting times. I might have to look at my disks to further improve things.
That can be an over-simplification - you may be right, but high I/O wait doesn't necessarily mean the disks themselves are at fault.
Random seeks will always create a performance slowdown on physical disks. If you optimise the DB so that you reduce the number of random seeks, you'll get dramatically faster performance.
Realise that if your db is, say, 200MB, you could probably write the whole file contiguously in 3-4 seconds on most server PCs. But if you do 1 seek per object in your 500k item database with reasonable seek-time disks (say 6.5ms), you'll be doing 500000 seeks * 6.5ms = 3,250,000 ms = 3250 seconds = roughly 54 minutes.
http://www.oracle.com/technology/documentation/berkeley-db/db/ref/transapp/t... says that every write can do the following seeks:
1. Disk seek to database file
2. Database file read
3. Disk seek to log file
4. Log file write
5. Flush log file information to disk
6. Disk seek to update log file metadata (for example, inode information)
7. Log metadata write
8. Flush log file metadata to disk
So, what to do? Well, if you increase the cache settings, you'll get fewer reads. If you assume each item above takes equal wall-clock time, you could remove the first 3 items and speed things up by 37.5% on every cache hit.
You could also put the log file on a separate disk. Or you could perhaps put the log file on a ramdisk for your build, and move it to a stable disk after it completes. I'm assuming you don't have sufficient ram to store the whole db on a ramdisk, which would be the ideal for the build process.
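Relocating the logs is a one-line change in DB_CONFIG - the path below is only an example, point it at whatever separate spindle (or tmpfs mount) you set aside:

set_lg_dir /otherdisk/ldap-logs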
You can also mount your filesystems with noatime, which will help by removing step 6. Note you'll have to check whether this breaks other things on your system.
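For example, assuming the database directory is a filesystem of its own (the mount point here is just the directory from the slapd.conf above):

mount -o remount,noatime /var/lib/build-ldap

Adding noatime to the matching /etc/fstab entry makes it stick across reboots.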
You could also try fiddling with the DB_TXN_WRITE_NOSYNC and DB_TXN_NOSYNC flags. I've not done that, and you'd have to be 100% sure that once your db goes live this flag is turned off, or you risk disaster if your db server reboots. I wonder if it's possible for slapadd to turn these on automatically for the load process (perhaps it already does - I'm ignorant on that fact, unfortunately).
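If you do experiment, in DB_CONFIG that would be a line like the one below - and again, it has to come out before the server goes live:

set_flags DB_TXN_NOSYNC

(or DB_TXN_WRITE_NOSYNC for the slightly safer variant).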
If you're feeling brave, and are building on a throwaway system (where you can reinstall due to filesystem corruption), you could also use something like hdparm under linux to change the disks so that they always return writes as successful immediately, even if the data hasn't been written to disk. I don't recommend this, but I've been known to do it when testing on a dev system. I don't have any stats on how much it'd help.
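Under linux that would be something along the lines of (the device name is only an example, and only on a box you can afford to rebuild):

hdparm -W1 /dev/sda

which turns on the drive's volatile write cache.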
Another thing: I read an article a while back where someone found that innodb file fragmentation on mysql dbs created a massive slowdown over time with random small writes to the file. The solution was fairly simple - move the files to a different directory, copy them back into the original directory, and start the db again running off the copy. The new files will be written contiguously with very little fragmentation. It's not possible to do this mid-stream in the load on a new DB, but it may be good practice once you have a very large complete DB file that's been built over time.
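The procedure amounts to roughly the following (paths are examples, and the database must be shut down first):

mv /var/lib/build-ldap /var/lib/build-ldap.fragmented
cp -a /var/lib/build-ldap.fragmented /var/lib/build-ldap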
I'd be really interested in which of those items helps the most. Let people know if you play and find something interesting. Hopefully it helps someone else!
Oskar
Oskar Pearson wrote:
Hi Diego
On 15 May 2009, at 20:54, Diego Figueroa wrote:
Thanks for your input Quanah,
I also just noticed that top is reporting 50-90% I/O waiting times. I might have to look at my disks to further improve things.
That can be an over-simplification - you may be right, but high I/O wait doesn't necessarily mean the disks themselves are at fault.
Random seeks will always create a performance slowdown on physical disks. If you optimise the DB so that you reduce the number of random seeks, you'll get dramatically faster performance.
Realise that if your db is, say, 200MB, you could probably write the whole file contiguously in 3-4 seconds on most server PCs. But if you do 1 seek per object in your 500k item database with reasonable seek-time disks (say 6.5ms), you'll be doing 500000 seeks * 6.5ms = 3,250,000 ms = 3250 seconds = roughly 54 minutes.
http://www.oracle.com/technology/documentation/berkeley-db/db/ref/transapp/t... says that every write can do the following seeks:
1. Disk seek to database file
2. Database file read
3. Disk seek to log file
4. Log file write
5. Flush log file information to disk
6. Disk seek to update log file metadata (for example, inode information)
7. Log metadata write
8. Flush log file metadata to disk
So, what to do? Well, if you increase the cache settings, you'll get fewer reads. If you assume each item above takes equal wall-clock time, you could remove the first 3 items and speed things up by 37.5% on every cache hit.
You could also put the log file on a separate disk.
That is standard practice, as recommended in the BDB docs.
Or you could perhaps put the log file on a ramdisk for your build, and move it to a stable disk after it completes. I'm assuming you don't have sufficient ram to store the whole db on a ramdisk, which would be the ideal for the build process.
Putting the logfile on a ramdisk or other volatile storage completely defeats the purpose of the logfile...
You can also mount your filesystems with noatime, which will help by removing step 6. Note you'll have to check whether this breaks other things on your system.
You could also try fiddling with the DB_TXN_WRITE_NOSYNC and DB_TXN_NOSYNC flags. I've not done that, and you'd have to be 100% sure that once your db goes live this flag is turned off, or you risk disaster if your db server reboots. I wonder if it's possible for slapadd to turn these on automatically for the load process (perhaps it already does - I'm ignorant on that fact, unfortunately).
When using slapadd -q the transaction subsystem is disabled, so no synchronous writes/flushes are performed by the main program. However, on some versions a background thread may be spawned off to perform trickle syncs, which may also be causing some seek traffic.
If you're feeling brave, and are building on a throwaway system (where you can reinstall due to filesystem corruption), you could also use something like hdparm under linux to change the disks so that they always return writes as successful immediately, even if the data hasn't been written to disk. I don't recommend this, but I've been known to do it when testing on a dev system. I don't have any stats on how much it'd help.
It wouldn't help at all since disks have such small on-board caches. Once the drive cache fills, it's forced to wait for some queued I/O to complete anyway before it can proceed.
Another thing: I read an article a while back where someone found that innodb file fragmentation on mysql dbs created a massive slowdown over time with random small writes to the file. The solution was fairly simple - move the files to a different directory, copy them back into the original directory, and start the db again running off the copy. The new files will be written contiguously with very little fragmentation. It's not possible to do this mid-stream in the load on a new DB, but it may be good practice once you have a very large complete DB file that's been built over time.
Fragmentation is not an issue with back-bdb/hdb when creating new databases. I don't think it's much of an issue on heavily used databases either, due to the way BDB manages data.
The biggest factor is simply to configure a large enough BDB cache to prevent internal pages of the Btrees from getting swapped out of the cache.

The other factor to consider is that BDB uses mmap'd files for its cache, by default. On some OSes (like Solaris) the default behavior for mmap'd regions is to aggressively sync them to the backing store. So whenever BDB touches a page in its cache, it gets immediately written back to disk. On Linux the default behavior is usually to hold the updates in the cache, and only flush them at a later time. This allows much higher throughput on Linux since it will usually be flushing a large contiguous block instead of randomly seeking to do a lot of small writes.

However, on BDB 4.7 it seems the default behavior on Linux is also to do synchronous flushes of the cache. As such, one approach to getting consistent performance is to configure the backend to use shared memory for the BDB cache instead of mmap'd files. That way incidental page updates don't sync to anything, and the BDB library has full control over when pages get flushed back to disk.
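In back-bdb/back-hdb that means setting shm_key in the database section of slapd.conf (the key value below is arbitrary - it just has to be unique among BDB environments on the machine):

shm_key 42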
--On Sunday, May 17, 2009 11:56:35 PM -0700 Howard Chu hyc@symas.com wrote:
However, on BDB 4.7 it seems the default behavior on Linux is also to do synchronous flushes of the cache. As such, one approach to getting consistent performance is to configure the backend to use shared memory for the BDB cache instead of mmap'd files. That way incidental page updates don't sync to anything, and the BDB library has full control over when pages get flushed back to disk.
We can confirm that setting shm_key on 4.7 dramatically affects the load time. In our initial tests of 4.7 we just pulled our old 4.2 BDB parameters forward. We never saw the slapadd of a 4.6 gbyte database complete. We killed it after 4 hours. After changing the shm_key setting the load time dropped to the more normal 30 minutes.
Bill
--On Monday, May 18, 2009 7:55 AM -0700 Bill MacAllister whm@stanford.edu wrote:
However, on BDB 4.7 it seems the default behavior on Linux is also to do synchronous flushes of the cache. As such, one approach to getting consistent performance is to configure the backend to use shared memory for the BDB cache instead of mmap'd files. That way incidental page updates don't sync to anything, and the BDB library has full control over when pages get flushed back to disk.
We can confirm that setting shm_key on 4.7 dramatically affects the load time. In our initial tests of 4.7 we just pulled our old 4.2 BDB parameters forward. We never saw the slapadd of a 4.6 gbyte database complete. We killed it after 4 hours. After changing the shm_key setting the load time dropped to the more normal 30 minutes.
To expand on this slightly: for small databases, BDB shows no performance difference between a disk-backed and an shm-backed cache. It's only once your database is past some 6GB in size that shm vs disk starts to make a difference on Linux.
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
On Sunday 17 May 2009 11:31:40 Oskar Pearson wrote:
Hi Diego
On 15 May 2009, at 20:54, Diego Figueroa wrote:
Thanks for your input Quanah,
I also just noticed that top is reporting 50-90% I/O waiting times. I might have to look at my disks to further improve things.
The fact that the disks can't keep up doesn't always mean you need faster disks ... it may mean you need to reduce the number of writes you are doing to them.
You could also try fiddling with the DB_TXN_WRITE_NOSYNC and DB_TXN_NOSYNC flags. I've not done that, and you'd have to be 100% sure that once your db goes live this flag is turned off, or you risk disaster if your db server reboots. I wonder if it's possible for slapadd to turn these on automatically for the load process (perhaps it already does - I'm ignorant on that fact, unfortunately).
The -q flag to slapadd disables transactions, so it's best not to fool around with these flags (as you risk leaving them in after the import).
Regards, Buchan