Hello list,
Solaris 10u8
db-4.8.30.NC
openldap-2.4.23
So in the last week, simple queries have been really slow. For example:
# time /usr/local/bin/ldapsearch -h 0 -D cn=admin,dc=company,dc=com -x -w pass
-b ou=mail,dc=company,dc=com mail=testlookup(a)domain.com
1m21s
Of course "mail" in indexed. The server is dedicated to LDAP (test server) so
there are no other connections. Not even syncrepl connection.
I have checked the usual things, memory usage, BDB cache, and disk IO. All seem
to be within reasonable limits. Then I went deeper.
I started trussing it, and it seems to send the query reply nearly right away,
then sit around 200 seconds before we send a few more bytes and exit.
I tried out db_stat on various db, and found it is also really slow on stat on
"id2entry.bdb". For example:
# time /usr/local/BerkeleyDB.4.8/bin/db_stat -d id2entry.bdb
2m32s
So, trussing that call (including all calls in libdb-4.2):
19700/1@1: 0.0001 -> libdb-4.8:__db_stat_print_pp(0x419920, 0x0, 0x412c30,
0x4146a0, 0xfffffd7fffdff880, 0xfffffd7fffdff824)
19700/1: 0.0009 time() = 128972
5450
19700/1: 0.0001 brk(0x0041EC20) = 0
19700/1: 0.0001 brk(0x00422C20) = 0
19700/1: 0.0001 open("/usr/share/lib/zoneinfo/Japan", O_RDONLY) = 4
19700/1: 0.0000 fstat(4, 0xFFFFFD7FFFDFF880) = 0
19700/1: 0.0001 read(4, 0xFFFFFD7FFEEA0880, 125) = 125
19700/1: T Z i f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\003\0\0\003\0\0\0
\0
19700/1: \0\0\0\t\0\0\003\0\0\0\fC3CE85 pD7 >1E90D7EC1680D8F91690D9CBF8
80
19700/1: DB071D10DBABDA80DCE6FF10DD8BBC80020102010201020102\0\0 ~90\0\0
\0
19700/1: \08CA00104\0\0 ~90\0\b C J T\0 J D T\0 J S T\0\0\0\0\0\0\0
19700/1: 0.0000 close(4) = 0
19700/1: 0.0002 ioctl(1, TCGETA, 0xFFFFFD7FFFDFE8B0) = 0
19700/1: 0.0000 fstat(1, 0xFFFFFD7FFFDFE840) = 0
19700/1: 0.0001 write(1, 0xFFFFFD7FFF006764, 36) = 36
19700/1: S u n N o v 1 4 1 8 : 0 4 : 1 0 2 0 1 0\t L o c a l
t
19700/1: i m e\n
19700/1: 589.8794 write(1, 0xFFFFFD7FFF006764, 25)
= 25
19700/1: 5 3 1 6 2\t B t r e e m a g i c n u m b e r\n
19700/1: 0.0046 write(1, 0xFFFFFD7FFF006764, 23) = 23
Not entirely sure why a write (everything leading up) to stdout takes 589 seconds.
I have tried rebuilding the whole database with slapadd but made no difference.
I put all DB files in /tmp on x4540 (has 32G RAM) and yet, still 2mins to do
db_stat.
General db_stat -m output:
4GB Total cache size
8 Number of caches
8 Maximum number of caches
512MB Pool individual cache size
0 Maximum memory-mapped file size
0 Maximum open file descriptors
0 Maximum sequential buffer writes
0 Sleep after writing maximum sequential buffers
0 Requested pages mapped into the process' address space
407M Requested pages found in the cache (99%)
38 Requested pages not found in the cache
425405 Pages created in the cache
38 Pages read into the cache
425427 Pages written from the cache to the backing file
0 Clean pages forced from the cache
0 Dirty pages forced from the cache
0 Dirty pages written by trickle-sync thread
425427 Current total page count
425427 Current clean page count
0 Current dirty page count
524296 Number of hash buckets used for page location
4096 Assumed page size used
407M Total number of times hash chains searched for a page (407904893)
32 The longest hash chain searched for a page
466M Total number of hash chain entries checked for page (466861223)
0 The number of hash bucket locks that required waiting (0%)
0 The maximum number of times any hash bucket lock was waited for (0%)
0 The number of region locks that required waiting (0%)
0 The number of buffers frozen
0 The number of buffers thawed
0 The number of frozen buffers freed
425538 The number of page allocations
0 The number of hash buckets examined during allocations
0 The maximum number of hash buckets examined for an allocation
0 The number of pages examined during allocations
0 The max number of pages examined for an allocation
0 Threads waited on page I/O
0 The number of times a sync is interrupted
Pool File: id2entry.bdb
16384 Page size
0 Requested pages mapped into the process' address space
13M Requested pages found in the cache (99%)
2 Requested pages not found in the cache
163449 Pages created in the cache
2 Pages read into the cache
163451 Pages written from the cache to the backing file
Pool File: mail.bdb
4096 Page size
0 Requested pages mapped into the process' address space
2797594 Requested pages found in the cache (99%)
2 Requested pages not found in the cache
3335 Pages created in the cache
2 Pages read into the cache
3337 Pages written from the cache to the backing file
-rw-r--r-- 1 root root 168 Nov 14 12:54 DB_CONFIG
-rw------- 1 root root 3.8M Nov 14 14:03 DNSData.bdb
-rw------- 1 root root 5.4M Nov 14 14:03 DNSHostName.bdb
-rw------- 1 root root 3.3M Nov 14 14:03 DNSIPAddr.bdb
-rw------- 1 root root 2.7M Nov 14 14:03 DNSType.bdb
-rw------- 1 root root 24K Nov 14 12:54 __db.001
-rw------- 1 root root 180M Nov 14 18:28 __db.002
-rw------- 1 root root 512M Nov 14 13:45 __db.003
-rw------- 1 root root 512M Nov 14 14:35 __db.004
-rw------- 1 root root 512M Nov 14 14:40 __db.005
-rw------- 1 root root 512M Nov 14 13:45 __db.006
-rw------- 1 root root 512M Nov 14 13:45 __db.007
-rw------- 1 root root 512M Nov 14 13:45 __db.008
-rw------- 1 root root 512M Nov 14 14:35 __db.009
-rw------- 1 root root 512M Nov 14 14:27 __db.010
-rw------- 1 root root 160K Nov 14 12:54 __db.011
-rw------- 1 root root 1.2M Nov 14 18:25 __db.012
-rw------- 1 root root 32K Nov 14 12:54 __db.013
-rw------- 1 root root 2.1M Nov 14 14:03 accountStatus.bdb
-rw-r--r-- 1 root root 2.0K Nov 14 16:12 alock
-rw------- 1 root root 536K Nov 14 14:03 deliveryMode.bdb
-rw------- 1 root root 621M Nov 14 14:07 dn2id.bdb
-rw------- 1 root root 172M Nov 14 14:03 entryCSN.bdb
-rw------- 1 root root 125M Nov 14 14:03 entryUUID.bdb
-rw------- 1 root root 17M Nov 14 14:03 gecos.bdb
-rw------- 1 root root 1.8M Nov 14 14:03 gidNumber.bdb
-rw------- 1 root root 2.5G Nov 14 14:40 id2entry.bdb
-rw------- 1 root root 50M Nov 14 16:12 log.0000000266
-rw------- 1 root root 13M Nov 14 14:03 mail.bdb
-rw------- 1 root root 3.2M Nov 14 14:03 mailAlternateAddress.bdb
-rw------- 1 root root 8.7M Nov 14 14:03 o.bdb
-rw------- 1 root root 4.9M Nov 14 14:03 objectClass.bdb
-rw------- 1 root root 1.3M Nov 14 14:03 radiusGroupName.bdb
-rw------- 1 root root 20M Nov 14 14:03 uid.bdb
-rw------- 1 root root 17M Nov 14 14:03 uidNumber.bdb
checkpoint 128 15
cachesize 5000
idlcachesize 15000
overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 1000
# DB_CONFIG FILE. Warning, these entries are only written if the DB_CONFIG
# file DOES NOT EXIST. So if you change any of these values, you will need
# to remove the file, then restart slapd.
dbconfig set_lk_detect DB_LOCK_DEFAULT
dbconfig set_lg_max 52428800
dbconfig set_cachesize 4 0 8
dbconfig set_flags db_log_autoremove
dbconfig set_lk_max_objects 1500
dbconfig set_lk_max_locks 1500
dbconfig set_lk_max_lockers 1500
# DB_CONFIG FILE.
Any suggestions? Must have hit some limit somewhere I guess.
Lund