OpenLDAP Technical folk,

 

We have inherited an OpenLDAP farm that was deployed using OpenLDAP v2.3.27.

 

We have been testing a newly compiled v2.4.11, built with the same compile flags, as a possible replacement because of some replication errors we have seen, but we have discovered other, bigger problems with the new instance.

 

We believe the issue may be related to the in-memory cache not working as expected, or to 2.4.11 not using the hdb backend as efficiently as 2.3.27 did.  Can anyone confirm a performance regression between these versions, or an issue with the cache?  We are seeing differences of orders of magnitude in the db_stat output for the number of attempted reads against the backend cache. I assume these are unusual and that the in-memory entry cache would normally prevent this traffic from reaching the bdb cache. I assume we simply have something wrong in our configuration, but I don't see an obvious explanation. If anyone has a moment to review, we would appreciate your feedback.

 

 

 

Here is the process we followed, with supporting config info:

 

We have a SLAMD benchmark test based on a real-world use case where 400 clients make a "near" simultaneous connection to the directory and execute a search like the following:

 

ldapsearch -h server1 -x -b "ou=myou,dc=mydc,dc=com" "objectclass=*"
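
To illustrate the "near simultaneous" connection pattern described above, here is a minimal Python sketch. It is not the SLAMD job we ran: `burst_connect` is a hypothetical helper, port 389 is assumed, and it only checks TCP-level connects, with no LDAP bind or search. A TCP connect can also succeed while the connection is still waiting in the kernel's accept queue, so a success here does not prove that slapd serviced the client.

```python
import socket
import threading


def burst_connect(host, port, clients=400, timeout=5.0):
    """Open `clients` TCP connections at nearly the same moment.

    Returns (succeeded, failed). Hypothetical helper for illustration only.
    """
    barrier = threading.Barrier(clients)  # releases all workers together
    results = []
    lock = threading.Lock()

    def worker():
        barrier.wait()
        try:
            # Connect and immediately close; no LDAP traffic is sent.
            with socket.create_connection((host, port), timeout=timeout):
                ok = True
        except OSError:
            ok = False
        with lock:
            results.append(ok)

    threads = [threading.Thread(target=worker) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    succeeded = sum(results)
    return succeeded, clients - succeeded


# Example call (assumes the server listens on the default LDAP port 389):
# ok, failed = burst_connect("server1", 389, clients=400)
```

Running this against both instances would show only whether the TCP connects complete. A real comparison still needs the full search to be executed by each client.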

 

There are nearly 70,000 objects in this ou with 5 attributes each ( 3 of which are objectclass ), and nearly 210,000 objects in the entire directory.  We have an objectclass index.

 

2.4.11 tests were performed on instances compiled on SLES9.3 64bit, 4 way dual core procs, 16GB RAM, using hoard memory manager, bdb 4.6, and cyrus-sasl-2.1.22 .

 

2.3.27 tests were performed on SLES9.3 64bit, 2 way single core proc, 8GB RAM, using standard memory manager, and standard bdb ( 4.2 ).

 

On the v2.3.27 instances, we see all 400 clients get a connection, and get their results.

 

On the new v2.4.11 instance, we see around 150-175 clients get a connection, and the rest get a failure saying that they cannot reach the server.  After more benchmarking, tcpdump captures, and logging at loglevel -1, we know that the client traffic is reaching the box, but the OpenLDAP listener thread does not pick up the connections. We also see high numbers of processes waiting in the CPU run queue.

 

Reducing the number of objects in the directory to 100 results in successful connections for all 400 clients, which led us to believe the issue might be due to differences in read performance between the instances.  The same DB_CONFIG was used in both cases, and the slapd.conf was the same, with some minor tweaks due to slightly different cache configuration options between the versions.  Please see the DB_CONFIG and the hdb backend stanza from the slapd.conf file included below.

 

We then ran some basic single-query tests against both instances and looked at the logs with loglevel -1 and at the db_stat output.  What we saw was a major difference between the two instances in the db_stat results. As mentioned in the summary above, we don't have a good explanation for the difference, although it is significant and reproducible across multiple test iterations. Please see the db_stat differences shown below. It also seems very unusual that the initial db cache stats would be so high on the new version.

 

slapd.conf:

 

2.4.11 hdb stanza ( we also tested this with the same cache numbers as the 2.3.27 instance below, with no difference; we reduced them to more reasonable levels because the old version's config seemed overkill ):

 

database       hdb

directory      /local/mnt/ldap.2.4.11/cache-data

threads        32

suffix         "dc=mydc,dc=com"

rootdn         <<snip>>

rootpw         <<snip>>

cachesize      500000

dncachesize      1000000

idlcachesize   30000000

sizelimit      10000000

loglevel       stats sync

dirtyread

include        /opt/ldap/indexes/my.indexes

 

2.3.27 hdb stanza

 

database        hdb

directory /local/mnt/ldap/cache-data

threads             32

suffix          " dc=mydc,dc=com "

rootdn          <<snip>>

rootpw           <<snip>>

cachesize               20971520

dbcachesize             20971520      ( not a typo - this one is "dBcachesize"; the other is "dNcachesize" )

idlcachesize               20971520

sizelimit    10000000

loglevel stats sync

dirtyread

include /opt/ldap/indexes/my.indexes
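
Since the two stanzas differ only in a few cache settings, a small comparison can make those differences explicit. The following is a minimal Python sketch, assuming one directive per line. `parse_stanza` and `diff_stanzas` are hypothetical helpers, and the sample text is a subset of the two stanzas above with the inline remark removed.

```python
def parse_stanza(text):
    """Map each directive name to its argument string ('' for bare flags)."""
    directives = {}
    for raw in text.splitlines():
        parts = raw.split(None, 1)
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        directives[parts[0].lower()] = parts[1].strip() if len(parts) > 1 else ""
    return directives


def diff_stanzas(old, new):
    """Return {directive: (old_value, new_value)} for directives that differ."""
    a, b = parse_stanza(old), parse_stanza(new)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Subset of the stanzas quoted above.
OLD_2_3_27 = """
cachesize      20971520
dbcachesize    20971520
idlcachesize   20971520
sizelimit      10000000
dirtyread
"""

NEW_2_4_11 = """
cachesize      500000
dncachesize    1000000
idlcachesize   30000000
sizelimit      10000000
dirtyread
"""

for name, (old, new) in diff_stanzas(OLD_2_3_27, NEW_2_4_11).items():
    print(f"{name}: 2.3.27={old} 2.4.11={new}")
```

A value of `None` means the directive is absent from that stanza, which is how the dbcachesize / dncachesize difference shows up.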

 

DB_CONFIG  ( Same for both instances ):

 

set_cachesize 1 1048576000 12

set_flags DB_LOG_AUTOREMOVE

set_lg_bsize 2097512

set_lg_dir /local/mnt/ldap/cache-data  ( this value points to correct directory in both instances )

set_flags DB_TXN_NOSYNC

set_lg_regionmax 500000

set_lk_max_locks    30000

set_lk_max_lockers    30000

set_lk_max_objects    30000

set_tmp_dir    /dev/shm
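
As a sanity check on the set_cachesize line above, the arithmetic can be sketched in Python. This assumes the usual Berkeley DB argument order of gigabytes, additional bytes, and number of cache regions; `cache_layout` is a hypothetical helper name.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def cache_layout(gbytes, nbytes, ncache):
    """Return (total_bytes, bytes_per_cache) for a set_cachesize line."""
    total = gbytes * GIB + nbytes
    return total, total / ncache


# Values from: set_cachesize 1 1048576000 12
total, per_cache = cache_layout(1, 1048576000, 12)
print(total)                       # total bytes requested
print(round(total / MIB))          # total in MiB (1024 + 1000)
print(round(per_cache / MIB, 2))   # MiB per cache region
```

The total works out to 1 GiB plus 1000 MiB, and each of the 12 regions comes to roughly 168.67 MiB. That is in line with the "1GB 1000MB" and "168MB 688KB" figures db_stat reports, allowing for a few KB of difference in how Berkeley DB sizes each region.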

 

 

After startup with no client test ( the previous database instance was completely deleted and recreated using slapadd ), here is the db_stat -m output. I excluded some of the index db info for brevity:

 

Version 2.4.11

 

1GB 1000MB      Total cache size

12      Number of caches

12      Maximum number of caches

168MB 688KB     Pool individual cache size

0       Maximum memory-mapped file size

0       Maximum open file descriptors

0       Maximum sequential buffer writes

0       Sleep after writing maximum sequential buffers

0       Requested pages mapped into the process' address space

15M     Requested pages found in the cache (99%)

24      Requested pages not found in the cache

9225    Pages created in the cache

24      Pages read into the cache

9244    Pages written from the cache to the backing file

0       Clean pages forced from the cache

0       Dirty pages forced from the cache

0       Dirty pages written by trickle-sync thread

9247    Current total page count

9247    Current clean page count

0       Current dirty page count

393252  Number of hash buckets used for page location

14M     Total number of times hash chains searched for a page (14773760)

9       The longest hash chain searched for a page

14M     Total number of hash chain entries checked for page (14764487)

0       The number of hash bucket locks that required waiting (0%)

0       The maximum number of times any hash bucket lock was waited for (0%)

0       The number of region locks that required waiting (0%)

0       The number of buffers frozen

0       The number of buffers thawed

0       The number of frozen buffers freed

9309    The number of page allocations

0       The number of hash buckets examined during allocations

0       The maximum number of hash buckets examined for an allocation

0       The number of pages examined during allocations

0       The max number of pages examined for an allocation

0       Threads waited on page I/O

 

Pool File: dn2id.bdb

4096    Page size

0       Requested pages mapped into the process' address space

1005002 Requested pages found in the cache (99%)

2       Requested pages not found in the cache

3062    Pages created in the cache

2       Pages read into the cache

3064    Pages written from the cache to the backing file

 

Pool File: id2entry.bdb

16384   Page size

0       Requested pages mapped into the process' address space

419925  Requested pages found in the cache (99%)

2       Requested pages not found in the cache

2967    Pages created in the cache

2       Pages read into the cache

2969    Pages written from the cache to the backing file

 

 

Version 2.3.27

 

1GB 1000MB     Total cache size.

12   Number of caches.

168MB 688KB    Pool individual cache size.

0    Requested pages mapped into the process' address space.

22738     Requested pages found in the cache (99%).

285  Requested pages not found in the cache.

0    Pages created in the cache.

285  Pages read into the cache.

0    Pages written from the cache to the backing file.

0    Clean pages forced from the cache.

0    Dirty pages forced from the cache.

0    Dirty pages written by trickle-sync thread.

285  Current total page count.

285  Current clean page count.

0    Current dirty page count.

393252    Number of hash buckets used for page location.

23308     Total number of times hash chains searched for a page.

12   The longest hash chain searched for a page.

22738     Total number of hash buckets examined for page location.

46616     The number of hash bucket locks granted without waiting.

0    The number of hash bucket locks granted after waiting.

0    The maximum number of times any hash bucket lock was waited for.

641  The number of region locks granted without waiting.

0    The number of region locks granted after waiting.

297  The number of page allocations.

0    The number of hash buckets examined during allocations

0    The max number of hash buckets examined for an allocation

0    The number of pages examined during allocations

0    The max number of pages examined for an allocation

 

Pool File: dn2id.bdb

4096 Page size.

0    Requested pages mapped into the process' address space.

13076     Requested pages found in the cache (99%).

132  Requested pages not found in the cache.

0    Pages created in the cache.

132  Pages read into the cache.

0    Pages written from the cache to the backing file.

 

Pool File: id2entry.bdb

16384     Page size.

0    Requested pages mapped into the process' address space.

9659 Requested pages found in the cache (99%).

138  Requested pages not found in the cache.

0    Pages created in the cache.

138  Pages read into the cache.

0    Pages written from the cache to the backing file.

 

After 1 client query:

 

Version 2.4.11

 

1GB 1000MB      Total cache size

12      Number of caches

12      Maximum number of caches

168MB 688KB     Pool individual cache size

0       Maximum memory-mapped file size

0       Maximum open file descriptors

0       Maximum sequential buffer writes

0       Sleep after writing maximum sequential buffers

0       Requested pages mapped into the process' address space

15M     Requested pages found in the cache (99%)

24      Requested pages not found in the cache

9244    Pages created in the cache

24      Pages read into the cache

9263    Pages written from the cache to the backing file

0       Clean pages forced from the cache

0       Dirty pages forced from the cache

0       Dirty pages written by trickle-sync thread

9266    Current total page count

9266    Current clean page count

0       Current dirty page count

393252  Number of hash buckets used for page location

14M     Total number of times hash chains searched for a page (14753673)

9       The longest hash chain searched for a page

14M     Total number of hash chain entries checked for page (14744381)

0       The number of hash bucket locks that required waiting (0%)

0       The maximum number of times any hash bucket lock was waited for (0%)

0       The number of region locks that required waiting (0%)

0       The number of buffers frozen

0       The number of buffers thawed

0       The number of frozen buffers freed

9328    The number of page allocations

0       The number of hash buckets examined during allocations

0       The maximum number of hash buckets examined for an allocation

0       The number of pages examined during allocations

0       The max number of pages examined for an allocation

0       Threads waited on page I/O

 

Pool File: dn2id.bdb

4096    Page size

0       Requested pages mapped into the process' address space

997746  Requested pages found in the cache (99%)

2       Requested pages not found in the cache

3062    Pages created in the cache

2       Pages read into the cache

3064    Pages written from the cache to the backing file

 

Pool File: id2entry.bdb

16384   Page size

0       Requested pages mapped into the process' address space

410855  Requested pages found in the cache (99%)

2       Requested pages not found in the cache

2967    Pages created in the cache

2       Pages read into the cache

2969    Pages written from the cache to the backing file

 

Version 2.3.27

 

1GB 1000MB     Total cache size.

12   Number of caches.

168MB 688KB    Pool individual cache size.

0    Requested pages mapped into the process' address space.

299222    Requested pages found in the cache (98%).

7144 Requested pages not found in the cache.

0    Pages created in the cache.

7144 Pages read into the cache.

0    Pages written from the cache to the backing file.

0    Clean pages forced from the cache.

0    Dirty pages forced from the cache.

0    Dirty pages written by trickle-sync thread.

7144 Current total page count.

7144 Current clean page count.

0    Current dirty page count.

393252    Number of hash buckets used for page location.

313510    Total number of times hash chains searched for a page.

23   The longest hash chain searched for a page.

300752    Total number of hash buckets examined for page location.

627020    The number of hash bucket locks granted without waiting.

0    The number of hash bucket locks granted after waiting.

0    The maximum number of times any hash bucket lock was waited for.

14400     The number of region locks granted without waiting.

0    The number of region locks granted after waiting.

7164 The number of page allocations.

0    The number of hash buckets examined during allocations

0    The max number of hash buckets examined for an allocation

0    The number of pages examined during allocations

0    The max number of pages examined for an allocation

 

Pool File: dn2id.bdb

4096 Page size.

0    Requested pages mapped into the process' address space.

173225    Requested pages found in the cache (98%).

3233 Requested pages not found in the cache.

0    Pages created in the cache.

3233 Pages read into the cache.

0    Pages written from the cache to the backing file.

 

Pool File: id2entry.bdb

16384     Page size.

0    Requested pages mapped into the process' address space.

125990    Requested pages found in the cache (97%).

3888 Requested pages not found in the cache.

0    Pages created in the cache.

3888 Pages read into the cache.

0    Pages written from the cache to the backing file.
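
To compare the before and after samples more directly, the per-file counters can be parsed and subtracted. The following is a minimal Python sketch, not a tool we used. `parse_db_stat` and `deltas` are hypothetical helpers, lines with abbreviated values such as "15M" are skipped, and the sample text is the dn2id.bdb excerpt from the output above.

```python
import re


def parse_db_stat(text):
    """Parse db_stat -m style lines into {section: {label: int}}."""
    stats = {"global": {}}
    section = "global"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Pool File:"):
            section = line.split(":", 1)[1].strip()
            stats[section] = {}
            continue
        # Leading integer, label, optional "(NN%)", optional trailing period.
        m = re.match(r"^(\d+)\s+(.*?)(?:\s*\(\d+%\))?\.?$", line)
        if m:
            stats[section][m.group(2)] = int(m.group(1))
    return stats


def deltas(before, after):
    """Counter differences for labels present in both samples."""
    return {k: after[k] - before[k] for k in before if k in after}


SAMPLES = {
    "2.3.27": (
        "Pool File: dn2id.bdb\n"
        "13076     Requested pages found in the cache (99%).\n"
        "132  Requested pages not found in the cache.\n",
        "Pool File: dn2id.bdb\n"
        "173225    Requested pages found in the cache (98%).\n"
        "3233 Requested pages not found in the cache.\n",
    ),
    "2.4.11": (
        "Pool File: dn2id.bdb\n"
        "1005002 Requested pages found in the cache (99%)\n"
        "2       Requested pages not found in the cache\n",
        "Pool File: dn2id.bdb\n"
        "997746  Requested pages found in the cache (99%)\n"
        "2       Requested pages not found in the cache\n",
    ),
}

for version, (before, after) in SAMPLES.items():
    d = deltas(parse_db_stat(before)["dn2id.bdb"],
               parse_db_stat(after)["dn2id.bdb"])
    print(version, d)
```

With the numbers posted above, the 2.4.11 "found in the cache" counter for dn2id.bdb is lower after the query than before it. That may mean the counters were reset between the two samples (for example by a restart), so the 2.4.11 delta should be treated with caution.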

 

##########################

 

 

 

Thanks!

 

-Michael