Openldap Technical
Folks,

We have inherited an OpenLDAP farm that was deployed using OpenLDAP
v2.3.27. We have been testing a newly compiled v2.4.11, built with the
same compile flags, as a possible replacement because of some replication
errors we have seen, but we have discovered other, bigger problems with
the new instance.

We believe the issue may be related to the in-memory cache not working as
expected, or that 2.4.11 does not use the hdb backend as efficiently as
before. Can anyone confirm a negative performance difference between these
versions, or an issue with the cache? We are seeing significant differences
in the db_stat output, with orders of magnitude difference in the number of
attempted reads against the backend cache. I assume these are unusual and
that the in-memory entry cache would normally prevent this traffic from
reaching the bdb cache. I assume we simply have something wrong in our
configuration, but I don’t see an obvious explanation. If anyone has a
moment to review, we would appreciate your feedback.

Here is the process we followed, with supporting config info:

We have a SLAMD benchmark test based on a real-world use case where 400
clients make a “near” simultaneous connection to the directory and execute
a search like the following:

ldapsearch -h server1 -x -b "ou=myou,dc=mydc,dc=com" "objectclass=*"

There are nearly 70,000 objects in this ou with 5 attributes each (3 of
which are objectclass), and nearly 210,000 objects in the entire directory.
We have an objectclass index.
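
For anyone who wants to approximate the load without SLAMD, here is a rough Python sketch that fires the same search from many concurrent clients and counts how many succeed. It is only a stand-in for the real benchmark, not what SLAMD does internally. The function names and the `runner` parameter are made up for illustration, and it assumes the `ldapsearch` binary is on the PATH.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_search_cmd(host, base, flt="objectclass=*"):
    # Same shape as the ldapsearch command quoted in this post.
    return ["ldapsearch", "-h", host, "-x", "-b", base, flt]


def run_one(cmd, runner=subprocess.run):
    # True if the client connected and the search exited with status 0.
    try:
        result = runner(cmd, stdout=subprocess.DEVNULL,
                        stderr=subprocess.DEVNULL)
        return result.returncode == 0
    except OSError:
        # ldapsearch missing or could not be started.
        return False


def run_clients(cmd, clients=400, runner=subprocess.run):
    # One thread per simulated client, all started as close together
    # as the thread pool allows ("near" simultaneous).
    with ThreadPoolExecutor(max_workers=clients) as pool:
        outcomes = list(pool.map(lambda _: run_one(cmd, runner),
                                 range(clients)))
    return outcomes.count(True), outcomes.count(False)
```

Calling `run_clients(build_search_cmd("server1", "ou=myou,dc=mydc,dc=com"))` returns a `(succeeded, failed)` pair, which is enough to compare the two instances at a coarse level.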

2.4.11 tests were performed on instances compiled on SLES9.3 64bit, 4-way
dual-core procs, 16GB RAM, using the hoard memory manager, bdb 4.6, and
cyrus-sasl-2.1.22.

2.3.27 tests were performed on SLES9.3 64bit, 2-way single-core proc, 8GB
RAM, using the standard memory manager and standard bdb (4.2).

On the v2.3.27 instances, we see all 400 clients get a connection and get
their results.

On the new v2.4.11 instance, we see around 150-175 clients get a
connection, and the rest get a failure saying that they cannot reach the
server. After more benchmarking, tcpdump, and loglevel -1, we know that the
client traffic is getting to the box, but the openldap listener thread does
not pick up the connection. We also see high numbers of processes waiting
in the CPU run queue.

Reducing the number of objects in the directory to 100 results in
successful connections for all 400 clients, which led us to believe the
issue might be due to differences in read performance between the
instances. The same DB_CONFIG was used in both cases, and the slapd.conf
was the same, with some minor tweaks due to slightly different cache
configuration options between the versions. Please see the DB_CONFIG and
the hdb backend stanza from the slapd.conf file included below.

We then did some basic single-query tests of both instances and looked at
the logs with loglevel -1 and at the db_stat output. What we saw was a
major difference between the two instances in the db_stat results. As
mentioned in the summary above, we don’t have a good explanation for the
difference, although it is significant and reproducible across multiple
iterations of the test. Please see the db_stat differences shown below. It
also seems very unusual that the initial db cache stats would be so high on
the new version.

slapd.conf:

2.4.11 hdb stanza (we also tested this with the same cache numbers as the
2.3.27 instance below, with no difference; we reduced them to more
reasonable levels because the old version's config seemed like overkill):

database        hdb
directory       /local/mnt/ldap.2.4.11/cache-data
threads         32
suffix          "dc=mydc,dc=com"
rootdn          <<snip>>
rootpw          <<snip>>
cachesize       500000
dncachesize     1000000
idlcachesize    30000000
sizelimit       10000000
loglevel        stats sync
dirtyread
include         /opt/ldap/indexes/my.indexes

2.3.27 hdb stanza:

database        hdb
directory       /local/mnt/ldap/cache-data
threads         32
suffix          "dc=mydc,dc=com"
rootdn          <<snip>>
rootpw          <<snip>>
cachesize       20971520
dbcachesize     20971520
(not a typo: this one is "dBcachesize"; the other is "dNcachesize")
idlcachesize    20971520
sizelimit       10000000
loglevel        stats sync
dirtyread
include         /opt/ldap/indexes/my.indexes

DB_CONFIG (same for both instances):

set_cachesize       1 1048576000 12
set_flags           DB_LOG_AUTOREMOVE
set_lg_bsize        2097512
set_lg_dir          /local/mnt/ldap/cache-data
(this value points to the correct directory in both instances)
set_flags           DB_TXN_NOSYNC
set_lg_regionmax    500000
set_lk_max_locks    30000
set_lk_max_lockers  30000
set_lk_max_objects  30000
set_tmp_dir         /dev/shm
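
As a sanity check on the set_cachesize line, here is a small Python sketch of the arithmetic. It relies on the documented Berkeley DB meaning of the three arguments (gigabytes, additional bytes, number of cache regions); the helper name `cache_layout` is made up for illustration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
KIB = 1024


def cache_layout(gbytes, extra_bytes, ncache):
    # Total cache is gbytes GB plus extra_bytes, split into ncache regions.
    total = gbytes * GIB + extra_bytes
    per_cache = total // ncache
    return total, per_cache


# Values from the DB_CONFIG above: set_cachesize 1 1048576000 12
total, per_cache = cache_layout(1, 1048576000, 12)

print(total - GIB, "bytes beyond 1GB =", (total - GIB) // MIB, "MB")
print("per cache:", per_cache // MIB, "MB", (per_cache % MIB) // KIB, "KB")
```

This gives 1GB plus 1000MB in total, matching the "1GB 1000MB Total cache size" line in db_stat, and roughly 168MB 682KB per cache. That is close to, but not exactly, the 168MB 688KB that db_stat reports; the small gap is presumably rounding inside Berkeley DB, which we have not verified.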

After startup with no client test (the previous database instance was
completely deleted and recreated using slapadd), here is the db_stat -m
output. I excluded some of the index db info for brevity:

Version 2.4.11

1GB 1000MB	Total cache size
12	Number of caches
12	Maximum number of caches
168MB 688KB	Pool individual cache size
0	Maximum memory-mapped file size
0	Maximum open file descriptors
0	Maximum sequential buffer writes
0	Sleep after writing maximum sequential buffers
0	Requested pages mapped into the process' address space
15M	Requested pages found in the cache (99%)
24	Requested pages not found in the cache
9225	Pages created in the cache
24	Pages read into the cache
9244	Pages written from the cache to the backing file
0	Clean pages forced from the cache
0	Dirty pages forced from the cache
0	Dirty pages written by trickle-sync thread
9247	Current total page count
9247	Current clean page count
0	Current dirty page count
393252	Number of hash buckets used for page location
14M	Total number of times hash chains searched for a page (14773760)
9	The longest hash chain searched for a page
14M	Total number of hash chain entries checked for page (14764487)
0	The number of hash bucket locks that required waiting (0%)
0	The maximum number of times any hash bucket lock was waited for (0%)
0	The number of region locks that required waiting (0%)
0	The number of buffers frozen
0	The number of buffers thawed
0	The number of frozen buffers freed
9309	The number of page allocations
0	The number of hash buckets examined during allocations
0	The maximum number of hash buckets examined for an allocation
0	The number of pages examined during allocations
0	The max number of pages examined for an allocation
0	Threads waited on page I/O

Pool File: dn2id.bdb
4096	Page size
0	Requested pages mapped into the process' address space
1005002	Requested pages found in the cache (99%)
2	Requested pages not found in the cache
3062	Pages created in the cache
2	Pages read into the cache
3064	Pages written from the cache to the backing file

Pool File: id2entry.bdb
16384	Page size
0	Requested pages mapped into the process' address space
419925	Requested pages found in the cache (99%)
2	Requested pages not found in the cache
2967	Pages created in the cache
2	Pages read into the cache
2969	Pages written from the cache to the backing file

Version 2.3.27

1GB 1000MB	Total cache size.
12	Number of caches.
168MB 688KB	Pool individual cache size.
0	Requested pages mapped into the process' address space.
22738	Requested pages found in the cache (99%).
285	Requested pages not found in the cache.
0	Pages created in the cache.
285	Pages read into the cache.
0	Pages written from the cache to the backing file.
0	Clean pages forced from the cache.
0	Dirty pages forced from the cache.
0	Dirty pages written by trickle-sync thread.
285	Current total page count.
285	Current clean page count.
0	Current dirty page count.
393252	Number of hash buckets used for page location.
23308	Total number of times hash chains searched for a page.
12	The longest hash chain searched for a page.
22738	Total number of hash buckets examined for page location.
46616	The number of hash bucket locks granted without waiting.
0	The number of hash bucket locks granted after waiting.
0	The maximum number of times any hash bucket lock was waited for.
641	The number of region locks granted without waiting.
0	The number of region locks granted after waiting.
297	The number of page allocations.
0	The number of hash buckets examined during allocations
0	The max number of hash buckets examined for an allocation
0	The number of pages examined during allocations
0	The max number of pages examined for an allocation

Pool File: dn2id.bdb
4096	Page size.
0	Requested pages mapped into the process' address space.
13076	Requested pages found in the cache (99%).
132	Requested pages not found in the cache.
0	Pages created in the cache.
132	Pages read into the cache.
0	Pages written from the cache to the backing file.

Pool File: id2entry.bdb
16384	Page size.
0	Requested pages mapped into the process' address space.
9659	Requested pages found in the cache (99%).
138	Requested pages not found in the cache.
0	Pages created in the cache.
138	Pages read into the cache.
0	Pages written from the cache to the backing file.

After 1 client query:

Version 2.4.11

1GB 1000MB	Total cache size
12	Number of caches
12	Maximum number of caches
168MB 688KB	Pool individual cache size
0	Maximum memory-mapped file size
0	Maximum open file descriptors
0	Maximum sequential buffer writes
0	Sleep after writing maximum sequential buffers
0	Requested pages mapped into the process' address space
15M	Requested pages found in the cache (99%)
24	Requested pages not found in the cache
9244	Pages created in the cache
24	Pages read into the cache
9263	Pages written from the cache to the backing file
0	Clean pages forced from the cache
0	Dirty pages forced from the cache
0	Dirty pages written by trickle-sync thread
9266	Current total page count
9266	Current clean page count
0	Current dirty page count
393252	Number of hash buckets used for page location
14M	Total number of times hash chains searched for a page (14753673)
9	The longest hash chain searched for a page
14M	Total number of hash chain entries checked for page (14744381)
0	The number of hash bucket locks that required waiting (0%)
0	The maximum number of times any hash bucket lock was waited for (0%)
0	The number of region locks that required waiting (0%)
0	The number of buffers frozen
0	The number of buffers thawed
0	The number of frozen buffers freed
9328	The number of page allocations
0	The number of hash buckets examined during allocations
0	The maximum number of hash buckets examined for an allocation
0	The number of pages examined during allocations
0	The max number of pages examined for an allocation
0	Threads waited on page I/O

Pool File: dn2id.bdb
4096	Page size
0	Requested pages mapped into the process' address space
997746	Requested pages found in the cache (99%)
2	Requested pages not found in the cache
3062	Pages created in the cache
2	Pages read into the cache
3064	Pages written from the cache to the backing file

Pool File: id2entry.bdb
16384	Page size
0	Requested pages mapped into the process' address space
410855	Requested pages found in the cache (99%)
2	Requested pages not found in the cache
2967	Pages created in the cache
2	Pages read into the cache
2969	Pages written from the cache to the backing file

Version 2.3.27

1GB 1000MB	Total cache size.
12	Number of caches.
168MB 688KB	Pool individual cache size.
0	Requested pages mapped into the process' address space.
299222	Requested pages found in the cache (98%).
7144	Requested pages not found in the cache.
0	Pages created in the cache.
7144	Pages read into the cache.
0	Pages written from the cache to the backing file.
0	Clean pages forced from the cache.
0	Dirty pages forced from the cache.
0	Dirty pages written by trickle-sync thread.
7144	Current total page count.
7144	Current clean page count.
0	Current dirty page count.
393252	Number of hash buckets used for page location.
313510	Total number of times hash chains searched for a page.
23	The longest hash chain searched for a page.
300752	Total number of hash buckets examined for page location.
627020	The number of hash bucket locks granted without waiting.
0	The number of hash bucket locks granted after waiting.
0	The maximum number of times any hash bucket lock was waited for.
14400	The number of region locks granted without waiting.
0	The number of region locks granted after waiting.
7164	The number of page allocations.
0	The number of hash buckets examined during allocations
0	The max number of hash buckets examined for an allocation
0	The number of pages examined during allocations
0	The max number of pages examined for an allocation

Pool File: dn2id.bdb
4096	Page size.
0	Requested pages mapped into the process' address space.
173225	Requested pages found in the cache (98%).
3233	Requested pages not found in the cache.
0	Pages created in the cache.
3233	Pages read into the cache.
0	Pages written from the cache to the backing file.

Pool File: id2entry.bdb
16384	Page size.
0	Requested pages mapped into the process' address space.
125990	Requested pages found in the cache (97%).
3888	Requested pages not found in the cache.
0	Pages created in the cache.
3888	Pages read into the cache.
0	Pages written from the cache to the backing file.
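
To make the before/after comparison easier to read, here is a small Python sketch that computes the per-query deltas from the numbers above. The figures are transcribed by hand from the db_stat -m output in this post; the helper names are made up for illustration. For 2.4.11 only the hash chain search counter is used, because db_stat rounds the cache hit count to "15M".

```python
def delta(before, after):
    # Change in each counter between two db_stat snapshots.
    return {key: after[key] - before[key] for key in before}


def hit_ratio(stats):
    # Fraction of page requests served from the BDB cache.
    return stats["hits"] / (stats["hits"] + stats["misses"])


# 2.3.27, whole-environment totals (cache hits and misses).
v2327_startup = {"hits": 22738, "misses": 285}
v2327_after_query = {"hits": 299222, "misses": 7144}

# 2.4.11, "Total number of times hash chains searched for a page".
v2411_startup_searches = 14773760
v2411_after_query_searches = 14753673

v2327_delta = delta(v2327_startup, v2327_after_query)
v2411_search_delta = v2411_after_query_searches - v2411_startup_searches

print("2.3.27 cost of one query:", v2327_delta)
print("2.3.27 hit ratio after query: %.3f" % hit_ratio(v2327_after_query))
print("2.4.11 change in hash chain searches:", v2411_search_delta)
```

On 2.3.27 the single query accounts for 276484 cache hits and 6859 misses. On 2.4.11 the hash chain search counter is lower after the query than at startup (a change of -20087), so those two snapshots do not look like cumulative readings from the same counters, and the 2.4.11 before/after comparison should probably be treated with caution.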
##########################
Thanks!
-Michael