Hello list,
Solaris 10u8 db-4.8.30.NC openldap-2.4.23
So in the last week, simple queries have been really slow. For example:
# time /usr/local/bin/ldapsearch -h 0 -D cn=admin,dc=company,dc=com -x -w pass -b ou=mail,dc=company,dc=com mail=testlookup@domain.com 1m21s
Of course "mail" in indexed. The server is dedicated to LDAP (test server) so there are no other connections. Not even syncrepl connection.
I have checked the usual things, memory usage, BDB cache, and disk IO. All seem to be within reasonable limits. Then I went deeper.
I started trussing it, and it seems to send the query reply nearly right away, then sit around 200 seconds before we send a few more bytes and exit.
I tried out db_stat on various db, and found it is also really slow on stat on "id2entry.bdb". For example:
# time /usr/local/BerkeleyDB.4.8/bin/db_stat -d id2entry.bdb 2m32s
So, trussing that call (including all calls in libdb-4.2):
19700/1@1: 0.0001 -> libdb-4.8:__db_stat_print_pp(0x419920, 0x0, 0x412c30, 0x4146a0, 0xfffffd7fffdff880, 0xfffffd7fffdff824) 19700/1: 0.0009 time() = 128972 5450 19700/1: 0.0001 brk(0x0041EC20) = 0 19700/1: 0.0001 brk(0x00422C20) = 0 19700/1: 0.0001 open("/usr/share/lib/zoneinfo/Japan", O_RDONLY) = 4 19700/1: 0.0000 fstat(4, 0xFFFFFD7FFFDFF880) = 0 19700/1: 0.0001 read(4, 0xFFFFFD7FFEEA0880, 125) = 125 19700/1: T Z i f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\003\0\0\003\0\0\0 \0 19700/1: \0\0\0\t\0\0\003\0\0\0\fC3CE85 pD7 >1E90D7EC1680D8F91690D9CBF8 80 19700/1: DB071D10DBABDA80DCE6FF10DD8BBC80020102010201020102\0\0 ~90\0\0 \0 19700/1: \08CA00104\0\0 ~90\0\b C J T\0 J D T\0 J S T\0\0\0\0\0\0\0 19700/1: 0.0000 close(4) = 0 19700/1: 0.0002 ioctl(1, TCGETA, 0xFFFFFD7FFFDFE8B0) = 0 19700/1: 0.0000 fstat(1, 0xFFFFFD7FFFDFE840) = 0 19700/1: 0.0001 write(1, 0xFFFFFD7FFF006764, 36) = 36 19700/1: S u n N o v 1 4 1 8 : 0 4 : 1 0 2 0 1 0\t L o c a l t 19700/1: i m e\n 19700/1: 589.8794 write(1, 0xFFFFFD7FFF006764, 25) = 25 19700/1: 5 3 1 6 2\t B t r e e m a g i c n u m b e r\n 19700/1: 0.0046 write(1, 0xFFFFFD7FFF006764, 23) = 23
Not entirely sure why a write (everything leading up) to stdout takes 589 seconds.
I have tried rebuilding the whole database with slapadd but made no difference. I put all DB files in /tmp on x4540 (has 32G RAM) and yet, still 2mins to do db_stat.
General db_stat -m output:
4GB Total cache size 8 Number of caches 8 Maximum number of caches 512MB Pool individual cache size 0 Maximum memory-mapped file size 0 Maximum open file descriptors 0 Maximum sequential buffer writes 0 Sleep after writing maximum sequential buffers 0 Requested pages mapped into the process' address space 407M Requested pages found in the cache (99%) 38 Requested pages not found in the cache 425405 Pages created in the cache 38 Pages read into the cache 425427 Pages written from the cache to the backing file 0 Clean pages forced from the cache 0 Dirty pages forced from the cache 0 Dirty pages written by trickle-sync thread 425427 Current total page count 425427 Current clean page count 0 Current dirty page count 524296 Number of hash buckets used for page location 4096 Assumed page size used 407M Total number of times hash chains searched for a page (407904893) 32 The longest hash chain searched for a page 466M Total number of hash chain entries checked for page (466861223) 0 The number of hash bucket locks that required waiting (0%) 0 The maximum number of times any hash bucket lock was waited for (0%) 0 The number of region locks that required waiting (0%) 0 The number of buffers frozen 0 The number of buffers thawed 0 The number of frozen buffers freed 425538 The number of page allocations 0 The number of hash buckets examined during allocations 0 The maximum number of hash buckets examined for an allocation 0 The number of pages examined during allocations 0 The max number of pages examined for an allocation 0 Threads waited on page I/O 0 The number of times a sync is interrupted
Pool File: id2entry.bdb 16384 Page size 0 Requested pages mapped into the process' address space 13M Requested pages found in the cache (99%) 2 Requested pages not found in the cache 163449 Pages created in the cache 2 Pages read into the cache 163451 Pages written from the cache to the backing file
Pool File: mail.bdb 4096 Page size 0 Requested pages mapped into the process' address space 2797594 Requested pages found in the cache (99%) 2 Requested pages not found in the cache 3335 Pages created in the cache 2 Pages read into the cache 3337 Pages written from the cache to the backing file
-rw-r--r-- 1 root root 168 Nov 14 12:54 DB_CONFIG -rw------- 1 root root 3.8M Nov 14 14:03 DNSData.bdb -rw------- 1 root root 5.4M Nov 14 14:03 DNSHostName.bdb -rw------- 1 root root 3.3M Nov 14 14:03 DNSIPAddr.bdb -rw------- 1 root root 2.7M Nov 14 14:03 DNSType.bdb -rw------- 1 root root 24K Nov 14 12:54 __db.001 -rw------- 1 root root 180M Nov 14 18:28 __db.002 -rw------- 1 root root 512M Nov 14 13:45 __db.003 -rw------- 1 root root 512M Nov 14 14:35 __db.004 -rw------- 1 root root 512M Nov 14 14:40 __db.005 -rw------- 1 root root 512M Nov 14 13:45 __db.006 -rw------- 1 root root 512M Nov 14 13:45 __db.007 -rw------- 1 root root 512M Nov 14 13:45 __db.008 -rw------- 1 root root 512M Nov 14 14:35 __db.009 -rw------- 1 root root 512M Nov 14 14:27 __db.010 -rw------- 1 root root 160K Nov 14 12:54 __db.011 -rw------- 1 root root 1.2M Nov 14 18:25 __db.012 -rw------- 1 root root 32K Nov 14 12:54 __db.013 -rw------- 1 root root 2.1M Nov 14 14:03 accountStatus.bdb -rw-r--r-- 1 root root 2.0K Nov 14 16:12 alock -rw------- 1 root root 536K Nov 14 14:03 deliveryMode.bdb -rw------- 1 root root 621M Nov 14 14:07 dn2id.bdb -rw------- 1 root root 172M Nov 14 14:03 entryCSN.bdb -rw------- 1 root root 125M Nov 14 14:03 entryUUID.bdb -rw------- 1 root root 17M Nov 14 14:03 gecos.bdb -rw------- 1 root root 1.8M Nov 14 14:03 gidNumber.bdb -rw------- 1 root root 2.5G Nov 14 14:40 id2entry.bdb -rw------- 1 root root 50M Nov 14 16:12 log.0000000266 -rw------- 1 root root 13M Nov 14 14:03 mail.bdb -rw------- 1 root root 3.2M Nov 14 14:03 mailAlternateAddress.bdb -rw------- 1 root root 8.7M Nov 14 14:03 o.bdb -rw------- 1 root root 4.9M Nov 14 14:03 objectClass.bdb -rw------- 1 root root 1.3M Nov 14 14:03 radiusGroupName.bdb -rw------- 1 root root 20M Nov 14 14:03 uid.bdb -rw------- 1 root root 17M Nov 14 14:03 uidNumber.bdb
checkpoint 128 15 cachesize 5000 idlcachesize 15000
overlay syncprov
syncprov-checkpoint 100 10 syncprov-sessionlog 1000
# DB_CONFIG FILE. Warning, these entries are only written if the DB_CONFIG # file DOES NOT EXIST. So if you change any of these values, you will need # to remove the file, then restart slapd. dbconfig set_lk_detect DB_LOCK_DEFAULT dbconfig set_lg_max 52428800 dbconfig set_cachesize 4 0 8 dbconfig set_flags db_log_autoremove dbconfig set_lk_max_objects 1500 dbconfig set_lk_max_locks 1500 dbconfig set_lk_max_lockers 1500 # DB_CONFIG FILE.
Any suggestions? Must have hit some limit somewhere I guess.
Lund