OpenLDAP Technical folk,
We have inherited an OpenLDAP farm that was deployed using OpenLDAP v2.3.27.
We have been testing a newly compiled v2.4.11, built with the same compile flags, as a possible replacement because of some replication errors we have seen, but we have discovered other, bigger problems with the new instance.
We believe the issue may be related to the in-memory cache not working as expected, or to 2.4.11 not using the hdb backend as efficiently as before. Can anyone confirm a performance regression between these versions, or an issue with the cache? We are seeing significant differences in the db_stat output, with orders of magnitude more attempted reads against the backend cache. I assume these are unusual and that the in-memory entry cache would normally prevent this traffic from reaching the BDB cache. I assume we simply have something wrong in our configuration, but I don't see an obvious explanation. If anyone has a moment to review, we would appreciate your feedback.
Here is the process we followed, with supporting config info:
We have a SLAMD benchmark test based on a real world use case where 400 clients make a "near" simultaneous connection to the directory and execute a search like the following:
ldapsearch -h server1 -x -b "ou=myou,dc=mydc,dc=com" "objectclass=*"
There are nearly 70,000 objects in this ou with 5 attributes each ( 3 of which are objectclass ), and nearly 210,000 objects in the entire directory. We have an objectclass index.
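For anyone who wants to reproduce the connection burst without SLAMD, here is a minimal sketch of the access pattern described above. It is not our actual SLAMD job: the names run_burst and ldapsearch_once are made up for illustration, and the only real command involved is the ldapsearch invocation quoted earlier.

```python
import subprocess
import threading


def ldapsearch_once():
    """Run the search from this message once; raises if ldapsearch fails."""
    subprocess.run(
        ["ldapsearch", "-h", "server1", "-x",
         "-b", "ou=myou,dc=mydc,dc=com", "objectclass=*"],
        check=True,
        stdout=subprocess.DEVNULL,
    )


def run_burst(search, clients=400):
    """Start `clients` threads, release them together, count outcomes.

    `search` is any zero-argument callable that raises on failure.
    Returns a (succeeded, failed) tuple.
    """
    barrier = threading.Barrier(clients)
    outcomes = [False] * clients

    def worker(index):
        barrier.wait()  # hold each thread until all of them are ready
        try:
            search()
            outcomes[index] = True
        except Exception:
            outcomes[index] = False

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    succeeded = sum(outcomes)
    return succeeded, clients - succeeded
```

Calling run_burst(ldapsearch_once) would approximate the 400-client case; passing a stub callable lets the harness itself be checked without a directory server.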
2.4.11 tests were performed on instances compiled on SLES9.3 64bit, 4 way dual core procs, 16GB RAM, using hoard memory manager, bdb 4.6, and cyrus-sasl-2.1.22 .
2.3.27 tests were performed on SLES9.3 64bit, 2 way single core proc, 8GB RAM, using standard memory manager, and standard bdb ( 4.2 ).
On the v2.3.27 instances, we see all 400 clients get a connection, and get their results.
On the new v2.4.11 instance, we see around 150-175 clients get a connection, and the rest get a failure that they cannot reach the server. After more benchmarking, tcpdump, and loglevel -1 we know that the client traffic is getting to the box, but the openldap listener thread does not pick up the connection. We also see high numbers of processes waiting in the CPU run queue.
Reducing the number of objects in the directory to 100 results in successful connections for all 400 clients, which led us to believe the issue might be due to differences in read performance between the instances. The same DB_CONFIG was used in both cases, and the slapd.conf was the same apart from some minor tweaks due to slightly different cache configuration options between the versions. Please see the DB_CONFIG and the hdb backend stanza from the slapd.conf file included below.
We then ran some basic single-query tests against both instances and looked at the logs with loglevel -1 and the db_stat output. What we saw was a major difference between the two instances in the db_stat results. As mentioned in the summary above, we don't have a good explanation for the difference, although it is significant and reproducible across multiple test iterations. Please see the db_stat output shown below. It also seems very unusual that the initial db cache stats would be so high on the new version.
slapd.conf:
2.4.11 hdb stanza (we also tested this with the same cache numbers as the 2.3.27 instance below, with no difference; we reduced them to more reasonable levels because the old version's config seemed overkill):
database hdb
directory /local/mnt/ldap.2.4.11/cache-data
threads 32
suffix "dc=mydc,dc=com"
rootdn <<snip>>
rootpw <<snip>>
cachesize 500000
dncachesize 1000000
idlcachesize 30000000
sizelimit 10000000
loglevel stats sync
dirtyread
include /opt/ldap/indexes/my.indexes
2.3.27 hdb stanza
database hdb
directory /local/mnt/ldap/cache-data
threads 32
suffix " dc=mydc,dc=com "
rootdn <<snip>>
rootpw <<snip>>
cachesize 20971520
dbcachesize 20971520 ( not a typo - this one is "dBcachesize". The other is "dNcachesize" )
idlcachesize 20971520
sizelimit 10000000
loglevel stats sync
dirtyread
include /opt/ldap/indexes/my.indexes
DB_CONFIG ( same for both instances ):
set_cachesize 1 1048576000 12
set_flags DB_LOG_AUTOREMOVE
set_lg_bsize 2097512
set_lg_dir /local/mnt/ldap/cache-data ( this value points to correct directory in both instances )
set_flags DB_TXN_NOSYNC
set_lg_regionmax 500000
set_lk_max_locks 30000
set_lk_max_lockers 30000
set_lk_max_objects 30000
set_tmp_dir /dev/shm
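As a sanity check on the set_cachesize line, here is a small sketch of the arithmetic: the three arguments are gigabytes, additional bytes, and the number of cache regions. The helper name cache_layout is made up for illustration, and the even split across regions is a simplification of whatever Berkeley DB does internally.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
KIB = 1024


def cache_layout(gbytes, extra_bytes, ncache):
    """Return (total_bytes, bytes_per_region) for a set_cachesize line.

    Assumes the total is split evenly across regions; Berkeley DB may
    round or add overhead, so treat the per-region figure as approximate.
    """
    total = gbytes * GIB + extra_bytes
    return total, total // ncache


# Values taken from the DB_CONFIG in this message.
total, per_region = cache_layout(1, 1048576000, 12)
print(total // GIB, "GB", (total % GIB) // MIB, "MB total")
print(per_region // MIB, "MB", (per_region % MIB) // KIB, "KB per region (approx.)")
```

The total matches the "1GB 1000MB Total cache size" that db_stat reports for both instances. The per-region figure comes out a few KB below the reported "168MB 688KB", presumably because of rounding inside Berkeley DB; that explanation is a guess on my part.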
After startup with no client test ( previous database instance was completely deleted and recreated using slapadd ), here is the db_stat -m output. I excluded some of the index db info for brevity:
Version 2.4.11
1GB 1000MB Total cache size
12 Number of caches
12 Maximum number of caches
168MB 688KB Pool individual cache size
0 Maximum memory-mapped file size
0 Maximum open file descriptors
0 Maximum sequential buffer writes
0 Sleep after writing maximum sequential buffers
0 Requested pages mapped into the process' address space
15M Requested pages found in the cache (99%)
24 Requested pages not found in the cache
9225 Pages created in the cache
24 Pages read into the cache
9244 Pages written from the cache to the backing file
0 Clean pages forced from the cache
0 Dirty pages forced from the cache
0 Dirty pages written by trickle-sync thread
9247 Current total page count
9247 Current clean page count
0 Current dirty page count
393252 Number of hash buckets used for page location
14M Total number of times hash chains searched for a page (14773760)
9 The longest hash chain searched for a page
14M Total number of hash chain entries checked for page (14764487)
0 The number of hash bucket locks that required waiting (0%)
0 The maximum number of times any hash bucket lock was waited for (0%)
0 The number of region locks that required waiting (0%)
0 The number of buffers frozen
0 The number of buffers thawed
0 The number of frozen buffers freed
9309 The number of page allocations
0 The number of hash buckets examined during allocations
0 The maximum number of hash buckets examined for an allocation
0 The number of pages examined during allocations
0 The max number of pages examined for an allocation
0 Threads waited on page I/O

Pool File: dn2id.bdb
4096 Page size
0 Requested pages mapped into the process' address space
1005002 Requested pages found in the cache (99%)
2 Requested pages not found in the cache
3062 Pages created in the cache
2 Pages read into the cache
3064 Pages written from the cache to the backing file

Pool File: id2entry.bdb
16384 Page size
0 Requested pages mapped into the process' address space
419925 Requested pages found in the cache (99%)
2 Requested pages not found in the cache
2967 Pages created in the cache
2 Pages read into the cache
2969 Pages written from the cache to the backing file
Version 2.3.27
1GB 1000MB Total cache size.
12 Number of caches.
168MB 688KB Pool individual cache size.
0 Requested pages mapped into the process' address space.
22738 Requested pages found in the cache (99%).
285 Requested pages not found in the cache.
0 Pages created in the cache.
285 Pages read into the cache.
0 Pages written from the cache to the backing file.
0 Clean pages forced from the cache.
0 Dirty pages forced from the cache.
0 Dirty pages written by trickle-sync thread.
285 Current total page count.
285 Current clean page count.
0 Current dirty page count.
393252 Number of hash buckets used for page location.
23308 Total number of times hash chains searched for a page.
12 The longest hash chain searched for a page.
22738 Total number of hash buckets examined for page location.
46616 The number of hash bucket locks granted without waiting.
0 The number of hash bucket locks granted after waiting.
0 The maximum number of times any hash bucket lock was waited for.
641 The number of region locks granted without waiting.
0 The number of region locks granted after waiting.
297 The number of page allocations.
0 The number of hash buckets examined during allocations
0 The max number of hash buckets examined for an allocation
0 The number of pages examined during allocations
0 The max number of pages examined for an allocation

Pool File: dn2id.bdb
4096 Page size.
0 Requested pages mapped into the process' address space.
13076 Requested pages found in the cache (99%).
132 Requested pages not found in the cache.
0 Pages created in the cache.
132 Pages read into the cache.
0 Pages written from the cache to the backing file.

Pool File: id2entry.bdb
16384 Page size.
0 Requested pages mapped into the process' address space.
9659 Requested pages found in the cache (99%).
138 Requested pages not found in the cache.
0 Pages created in the cache.
138 Pages read into the cache.
0 Pages written from the cache to the backing file.
After 1 client query:
Version 2.4.11
1GB 1000MB Total cache size
12 Number of caches
12 Maximum number of caches
168MB 688KB Pool individual cache size
0 Maximum memory-mapped file size
0 Maximum open file descriptors
0 Maximum sequential buffer writes
0 Sleep after writing maximum sequential buffers
0 Requested pages mapped into the process' address space
15M Requested pages found in the cache (99%)
24 Requested pages not found in the cache
9244 Pages created in the cache
24 Pages read into the cache
9263 Pages written from the cache to the backing file
0 Clean pages forced from the cache
0 Dirty pages forced from the cache
0 Dirty pages written by trickle-sync thread
9266 Current total page count
9266 Current clean page count
0 Current dirty page count
393252 Number of hash buckets used for page location
14M Total number of times hash chains searched for a page (14753673)
9 The longest hash chain searched for a page
14M Total number of hash chain entries checked for page (14744381)
0 The number of hash bucket locks that required waiting (0%)
0 The maximum number of times any hash bucket lock was waited for (0%)
0 The number of region locks that required waiting (0%)
0 The number of buffers frozen
0 The number of buffers thawed
0 The number of frozen buffers freed
9328 The number of page allocations
0 The number of hash buckets examined during allocations
0 The maximum number of hash buckets examined for an allocation
0 The number of pages examined during allocations
0 The max number of pages examined for an allocation
0 Threads waited on page I/O

Pool File: dn2id.bdb
4096 Page size
0 Requested pages mapped into the process' address space
997746 Requested pages found in the cache (99%)
2 Requested pages not found in the cache
3062 Pages created in the cache
2 Pages read into the cache
3064 Pages written from the cache to the backing file

Pool File: id2entry.bdb
16384 Page size
0 Requested pages mapped into the process' address space
410855 Requested pages found in the cache (99%)
2 Requested pages not found in the cache
2967 Pages created in the cache
2 Pages read into the cache
2969 Pages written from the cache to the backing file
Version 2.3.27
1GB 1000MB Total cache size.
12 Number of caches.
168MB 688KB Pool individual cache size.
0 Requested pages mapped into the process' address space.
299222 Requested pages found in the cache (98%).
7144 Requested pages not found in the cache.
0 Pages created in the cache.
7144 Pages read into the cache.
0 Pages written from the cache to the backing file.
0 Clean pages forced from the cache.
0 Dirty pages forced from the cache.
0 Dirty pages written by trickle-sync thread.
7144 Current total page count.
7144 Current clean page count.
0 Current dirty page count.
393252 Number of hash buckets used for page location.
313510 Total number of times hash chains searched for a page.
23 The longest hash chain searched for a page.
300752 Total number of hash buckets examined for page location.
627020 The number of hash bucket locks granted without waiting.
0 The number of hash bucket locks granted after waiting.
0 The maximum number of times any hash bucket lock was waited for.
14400 The number of region locks granted without waiting.
0 The number of region locks granted after waiting.
7164 The number of page allocations.
0 The number of hash buckets examined during allocations
0 The max number of hash buckets examined for an allocation
0 The number of pages examined during allocations
0 The max number of pages examined for an allocation

Pool File: dn2id.bdb
4096 Page size.
0 Requested pages mapped into the process' address space.
173225 Requested pages found in the cache (98%).
3233 Requested pages not found in the cache.
0 Pages created in the cache.
3233 Pages read into the cache.
0 Pages written from the cache to the backing file.

Pool File: id2entry.bdb
16384 Page size.
0 Requested pages mapped into the process' address space.
125990 Requested pages found in the cache (97%).
3888 Requested pages not found in the cache.
0 Pages created in the cache.
3888 Pages read into the cache.
0 Pages written from the cache to the backing file.
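To make the comparison easier to follow, here is a small sketch that works out the before/after change and the startup ratio for the "Requested pages found in the cache" counters. The numbers are copied from the db_stat -m output in this message; the function names deltas and startup_ratio are made up for illustration.

```python
# "Requested pages found in the cache" per pool file, copied from the
# db_stat -m output in this message: (after startup, after one query).
found_in_cache = {
    "2.4.11": {"dn2id.bdb": (1005002, 997746), "id2entry.bdb": (419925, 410855)},
    "2.3.27": {"dn2id.bdb": (13076, 173225), "id2entry.bdb": (9659, 125990)},
}


def deltas(samples):
    """Change in each counter between the two samples, per version and file."""
    return {
        version: {name: after - before for name, (before, after) in files.items()}
        for version, files in samples.items()
    }


def startup_ratio(samples, pool_file, new="2.4.11", old="2.3.27"):
    """How many times larger the startup counter is on `new` than on `old`."""
    return samples[new][pool_file][0] / samples[old][pool_file][0]


for version, files in deltas(found_in_cache).items():
    for name, change in files.items():
        print(version, name, "change after one query:", change)

for name in ("dn2id.bdb", "id2entry.bdb"):
    print(name, "startup ratio 2.4.11 / 2.3.27: %.1f" % startup_ratio(found_in_cache, name))
```

One thing the arithmetic brings out: the 2.4.11 counters are lower after the query than after startup, so those two samples may not come from the same environment lifetime. That is an inference from the numbers only, not something we have verified.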
##########################
Thanks!
-Michael