Hello list,
Solaris 10u8 db-4.8.30.NC openldap-2.4.23
So in the last week, simple queries have been really slow. For example:
# time /usr/local/bin/ldapsearch -h 0 -D cn=admin,dc=company,dc=com -x -w pass -b ou=mail,dc=company,dc=com mail=testlookup@domain.com 1m21s
Of course "mail" in indexed. The server is dedicated to LDAP (test server) so there are no other connections. Not even syncrepl connection.
I have checked the usual things, memory usage, BDB cache, and disk IO. All seem to be within reasonable limits. Then I went deeper.
I started trussing it, and it seems to send the query reply nearly right away, then sit around 200 seconds before we send a few more bytes and exit.
I tried out db_stat on various db, and found it is also really slow on stat on "id2entry.bdb". For example:
# time /usr/local/BerkeleyDB.4.8/bin/db_stat -d id2entry.bdb 2m32s
So, trussing that call (including all calls in libdb-4.2):
19700/1@1: 0.0001 -> libdb-4.8:__db_stat_print_pp(0x419920, 0x0, 0x412c30, 0x4146a0, 0xfffffd7fffdff880, 0xfffffd7fffdff824) 19700/1: 0.0009 time() = 128972 5450 19700/1: 0.0001 brk(0x0041EC20) = 0 19700/1: 0.0001 brk(0x00422C20) = 0 19700/1: 0.0001 open("/usr/share/lib/zoneinfo/Japan", O_RDONLY) = 4 19700/1: 0.0000 fstat(4, 0xFFFFFD7FFFDFF880) = 0 19700/1: 0.0001 read(4, 0xFFFFFD7FFEEA0880, 125) = 125 19700/1: T Z i f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\003\0\0\003\0\0\0 \0 19700/1: \0\0\0\t\0\0\003\0\0\0\fC3CE85 pD7 >1E90D7EC1680D8F91690D9CBF8 80 19700/1: DB071D10DBABDA80DCE6FF10DD8BBC80020102010201020102\0\0 ~90\0\0 \0 19700/1: \08CA00104\0\0 ~90\0\b C J T\0 J D T\0 J S T\0\0\0\0\0\0\0 19700/1: 0.0000 close(4) = 0 19700/1: 0.0002 ioctl(1, TCGETA, 0xFFFFFD7FFFDFE8B0) = 0 19700/1: 0.0000 fstat(1, 0xFFFFFD7FFFDFE840) = 0 19700/1: 0.0001 write(1, 0xFFFFFD7FFF006764, 36) = 36 19700/1: S u n N o v 1 4 1 8 : 0 4 : 1 0 2 0 1 0\t L o c a l t 19700/1: i m e\n 19700/1: 589.8794 write(1, 0xFFFFFD7FFF006764, 25) = 25 19700/1: 5 3 1 6 2\t B t r e e m a g i c n u m b e r\n 19700/1: 0.0046 write(1, 0xFFFFFD7FFF006764, 23) = 23
Not entirely sure why a write (everything leading up) to stdout takes 589 seconds.
I have tried rebuilding the whole database with slapadd but made no difference. I put all DB files in /tmp on x4540 (has 32G RAM) and yet, still 2mins to do db_stat.
General db_stat -m output:
4GB Total cache size 8 Number of caches 8 Maximum number of caches 512MB Pool individual cache size 0 Maximum memory-mapped file size 0 Maximum open file descriptors 0 Maximum sequential buffer writes 0 Sleep after writing maximum sequential buffers 0 Requested pages mapped into the process' address space 407M Requested pages found in the cache (99%) 38 Requested pages not found in the cache 425405 Pages created in the cache 38 Pages read into the cache 425427 Pages written from the cache to the backing file 0 Clean pages forced from the cache 0 Dirty pages forced from the cache 0 Dirty pages written by trickle-sync thread 425427 Current total page count 425427 Current clean page count 0 Current dirty page count 524296 Number of hash buckets used for page location 4096 Assumed page size used 407M Total number of times hash chains searched for a page (407904893) 32 The longest hash chain searched for a page 466M Total number of hash chain entries checked for page (466861223) 0 The number of hash bucket locks that required waiting (0%) 0 The maximum number of times any hash bucket lock was waited for (0%) 0 The number of region locks that required waiting (0%) 0 The number of buffers frozen 0 The number of buffers thawed 0 The number of frozen buffers freed 425538 The number of page allocations 0 The number of hash buckets examined during allocations 0 The maximum number of hash buckets examined for an allocation 0 The number of pages examined during allocations 0 The max number of pages examined for an allocation 0 Threads waited on page I/O 0 The number of times a sync is interrupted
Pool File: id2entry.bdb 16384 Page size 0 Requested pages mapped into the process' address space 13M Requested pages found in the cache (99%) 2 Requested pages not found in the cache 163449 Pages created in the cache 2 Pages read into the cache 163451 Pages written from the cache to the backing file
Pool File: mail.bdb 4096 Page size 0 Requested pages mapped into the process' address space 2797594 Requested pages found in the cache (99%) 2 Requested pages not found in the cache 3335 Pages created in the cache 2 Pages read into the cache 3337 Pages written from the cache to the backing file
-rw-r--r-- 1 root root 168 Nov 14 12:54 DB_CONFIG -rw------- 1 root root 3.8M Nov 14 14:03 DNSData.bdb -rw------- 1 root root 5.4M Nov 14 14:03 DNSHostName.bdb -rw------- 1 root root 3.3M Nov 14 14:03 DNSIPAddr.bdb -rw------- 1 root root 2.7M Nov 14 14:03 DNSType.bdb -rw------- 1 root root 24K Nov 14 12:54 __db.001 -rw------- 1 root root 180M Nov 14 18:28 __db.002 -rw------- 1 root root 512M Nov 14 13:45 __db.003 -rw------- 1 root root 512M Nov 14 14:35 __db.004 -rw------- 1 root root 512M Nov 14 14:40 __db.005 -rw------- 1 root root 512M Nov 14 13:45 __db.006 -rw------- 1 root root 512M Nov 14 13:45 __db.007 -rw------- 1 root root 512M Nov 14 13:45 __db.008 -rw------- 1 root root 512M Nov 14 14:35 __db.009 -rw------- 1 root root 512M Nov 14 14:27 __db.010 -rw------- 1 root root 160K Nov 14 12:54 __db.011 -rw------- 1 root root 1.2M Nov 14 18:25 __db.012 -rw------- 1 root root 32K Nov 14 12:54 __db.013 -rw------- 1 root root 2.1M Nov 14 14:03 accountStatus.bdb -rw-r--r-- 1 root root 2.0K Nov 14 16:12 alock -rw------- 1 root root 536K Nov 14 14:03 deliveryMode.bdb -rw------- 1 root root 621M Nov 14 14:07 dn2id.bdb -rw------- 1 root root 172M Nov 14 14:03 entryCSN.bdb -rw------- 1 root root 125M Nov 14 14:03 entryUUID.bdb -rw------- 1 root root 17M Nov 14 14:03 gecos.bdb -rw------- 1 root root 1.8M Nov 14 14:03 gidNumber.bdb -rw------- 1 root root 2.5G Nov 14 14:40 id2entry.bdb -rw------- 1 root root 50M Nov 14 16:12 log.0000000266 -rw------- 1 root root 13M Nov 14 14:03 mail.bdb -rw------- 1 root root 3.2M Nov 14 14:03 mailAlternateAddress.bdb -rw------- 1 root root 8.7M Nov 14 14:03 o.bdb -rw------- 1 root root 4.9M Nov 14 14:03 objectClass.bdb -rw------- 1 root root 1.3M Nov 14 14:03 radiusGroupName.bdb -rw------- 1 root root 20M Nov 14 14:03 uid.bdb -rw------- 1 root root 17M Nov 14 14:03 uidNumber.bdb
checkpoint 128 15 cachesize 5000 idlcachesize 15000
overlay syncprov
syncprov-checkpoint 100 10 syncprov-sessionlog 1000
# DB_CONFIG FILE. Warning, these entries are only written if the DB_CONFIG # file DOES NOT EXIST. So if you change any of these values, you will need # to remove the file, then restart slapd. dbconfig set_lk_detect DB_LOCK_DEFAULT dbconfig set_lg_max 52428800 dbconfig set_cachesize 4 0 8 dbconfig set_flags db_log_autoremove dbconfig set_lk_max_objects 1500 dbconfig set_lk_max_locks 1500 dbconfig set_lk_max_lockers 1500 # DB_CONFIG FILE.
Any suggestions? Must have hit some limit somewhere I guess.
Lund
--On Sunday, November 14, 2010 7:13 PM +0900 Jorgen Lundman lundman@lundman.net wrote:
dbconfig set_cachesize 4 0 8
Why are you breaking your cache into segments? This has always had a negative performance impact in all tests I've done, and stopped being necessary to do with BDB 4.3 and later.
--Quanah
--
Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc -------------------- Zimbra :: the leader in open source messaging and collaboration
No real reason, tried various different settings but to no real advantage.
Now I have:
Filesystem size used avail capacity Mounted on swap 19G 7.7G 11G 42% /tmp
# grep cache DB_CONFIG set_cachesize 8 0 1
# time /usr/local/BerkeleyDB.4.8/bin/db_stat -d id2entry.bdb real 6m6.099s
# time cp id2entry.bdb /dev/null real 0m0.040s (It's not on disk)
I thought to delete id2entry.bdb, and use slapindex to re-generate it but that appears not to be a supported feature. slapindex can not run without a valid id2entry.bdb. This is why I tried slapcat, rm *, slapadd. But no difference in speed up.
If I truss with -u *:* (All inter-library calls) I get no "single" large system call, just a lot of work somewhere (that does not call read/write etc). Alas, the number of lines in truss file is: 6209933 /var/tmp/db_stat_truss
Lund
Quanah Gibson-Mount wrote:
--On Sunday, November 14, 2010 7:13 PM +0900 Jorgen Lundman lundman@lundman.net wrote:
dbconfig set_cachesize 4 0 8
Why are you breaking your cache into segments? This has always had a negative performance impact in all tests I've done, and stopped being necessary to do with BDB 4.3 and later.
--Quanah
--
Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc
Zimbra :: the leader in open source messaging and collaboration
Jorgen Lundman wrote:
No real reason, tried various different settings but to no real advantage.
Now I have:
Filesystem size used avail capacity Mounted on swap 19G 7.7G 11G 42% /tmp
# grep cache DB_CONFIG set_cachesize 8 0 1
# time /usr/local/BerkeleyDB.4.8/bin/db_stat -d id2entry.bdb real 6m6.099s
# time cp id2entry.bdb /dev/null real 0m0.040s (It's not on disk)
I thought to delete id2entry.bdb, and use slapindex to re-generate it but that appears not to be a supported feature. slapindex can not run without a valid id2entry.bdb. This is why I tried slapcat, rm *, slapadd. But no difference in speed up.
If I truss with -u *:* (All inter-library calls) I get no "single" large system call, just a lot of work somewhere (that does not call read/write etc). Alas, the number of lines in truss file is: 6209933 /var/tmp/db_stat_truss
truss is pretty much useless in this context. Most of BDB's activity is thru memory-mapping, which involves no system calls for truss to trace. You need an actual profile (e.g. using oprofile) to identify where the time is going.
Jorgen Lundman wrote:
No real reason, tried various different settings but to no real advantage.
Now I have:
Filesystem size used avail capacity Mounted on swap 19G 7.7G 11G 42% /tmp
# grep cache DB_CONFIG set_cachesize 8 0 1
# time /usr/local/BerkeleyDB.4.8/bin/db_stat -d id2entry.bdb real 6m6.099s
# time cp id2entry.bdb /dev/null real 0m0.040s (It's not on disk)
Now repeat the db_stat call and see how long it takes the 2nd time.
truss is pretty much useless in this context. Most of BDB's activity is thru memory-mapping, which involves no system calls for truss to trace. You need an actual profile (e.g. using oprofile) to identify where the time is going.
This is very true. But reach for the tools you have, even if it is a hammer. I guess Purify would be the Solaris equivalent, unless we find the problem also occurs on a Linux box.
truss has a simple profiler, but only for System Calls which do not help in this case:
Library: Function calls libaio: close 16 libc: membar_exit 1633814 libc: thr_self 653455 libc: _lock_clear 326545 libc: _lock_try 326545 libc: memcpy 163311 libc: memcmp 242 libc: strcasecmp 195 libc: free 77
sys totals: .921 303 22 usr time: 21.669 elapsed: 829.890
Now repeat the db_stat call and see how long it takes the 2nd time.
It does indeed speed up, if I do not wait too long between tests.
real 1m27.712s real 0m29.696s real 0m4.332s real 0m4.452s
4 seconds is much nicer. So what you are saying is that BDB uses mmap, and operations inside this memory will trigger reads inside the kernel which do not show as libc syscalls. Rats. So it may be IO? I need to throw even more memory at it, and live with the increasing startup times?
How does the "set_cachesize" relate to the mmap usage?
Jorgen Lundman wrote:
Now repeat the db_stat call and see how long it takes the 2nd time.
It does indeed speed up, if I do not wait too long between tests.
real 1m27.712s real 0m29.696s real 0m4.332s real 0m4.452s
If it slows down after you wait a while, that means some other process on the machine is using the system RAM and forcing the BDB data out of the system cache. Find out what other program is hogging the memory, it's obvious that BDB is not doing anything wrong on its own.
4 seconds is much nicer. So what you are saying is that BDB uses mmap, and operations inside this memory will trigger reads inside the kernel which do not show as libc syscalls. Rats. So it may be IO? I need to throw even more memory at it, and live with the increasing startup times?
How does the "set_cachesize" relate to the mmap usage?
Howard Chu wrote:
If it slows down after you wait a while, that means some other process on the machine is using the system RAM and forcing the BDB data out of the system cache. Find out what other program is hogging the memory, it's obvious that BDB is not doing anything wrong on its own.
If I db_stat another large file, like dn2id.bdb, the subsequent id2entry.bdb will be slower. So maybe it is fighting itself.
However, since I am executing separate "db_stat" processes each time, the setcachesize would have no chance to help improve things. I will have to try different values for slapd running.
Could be I should investigate various Solaris specific process limits as well. It is all 64bit now, but per process limits may still be interfering.
Jorgen, I suggest that you use dtrace to get a better understanding of what is going on.
You can start with some pre-existing documented scripts from the dtrace toolkit here:
http://hub.opensolaris.org/bin/view/Community+Group+dtrace/dtracetoolkit
The dtrace guide is here:
http://wikis.sun.com/display/DTrace/Documentation
There are many examples in the dtrace toolkit that should help sort out what other processes or system resources are affecting the ldap servers performance in your specific situation.
Doug.
On 11/14/10 07:42 PM, Jorgen Lundman wrote:
Howard Chu wrote:
If it slows down after you wait a while, that means some other process on the machine is using the system RAM and forcing the BDB data out of the system cache. Find out what other program is hogging the memory, it's obvious that BDB is not doing anything wrong on its own.
If I db_stat another large file, like dn2id.bdb, the subsequent id2entry.bdb will be slower. So maybe it is fighting itself.
However, since I am executing separate "db_stat" processes each time, the setcachesize would have no chance to help improve things. I will have to try different values for slapd running.
Could be I should investigate various Solaris specific process limits as well. It is all 64bit now, but per process limits may still be interfering.
dtrace is a fantastic tool but it is also hard to use :) But if I use the existing scripts from the dtracetoolkit, like that of iotop:
UID PID PPID CMD DEVICE MAJ MIN D BYTES 0 15099 1 slapd sd0 32 0 R 16384 0 3 0 fsflush sd0 32 0 R 98304 0 3 0 fsflush sd2 32 128 R 98304 0 0 0 sched sd0 32 0 W 36930560 0 0 0 sched sd2 32 128 W 39273472
would suggest that it is indeed paging memory around to make room.
I do also wonder how it plays with ZFS and ARC for memory, if I should make some changing with that in mind as well. I have tried normal recordsize and 16k, and compression on and off. Not that it made any difference in this case.
Thanks for the suggestions, I will keep monitoring and see if I can else I can discover. I have requested to increase the cachesize to 8G on one of the slaves, which I can probably do tomorrow at 2am.
Lund
Doug Leavitt wrote:
Jorgen, I suggest that you use dtrace to get a better understanding of what is going on.
You can start with some pre-existing documented scripts from the dtrace toolkit here:
http://hub.opensolaris.org/bin/view/Community+Group+dtrace/dtracetoolkit
The dtrace guide is here:
http://wikis.sun.com/display/DTrace/Documentation
There are many examples in the dtrace toolkit that should help sort out what other processes or system resources are affecting the ldap servers performance in your specific situation.
Doug.
On 11/14/10 07:42 PM, Jorgen Lundman wrote:
Howard Chu wrote:
If it slows down after you wait a while, that means some other process on the machine is using the system RAM and forcing the BDB data out of the system cache. Find out what other program is hogging the memory, it's obvious that BDB is not doing anything wrong on its own.
If I db_stat another large file, like dn2id.bdb, the subsequent id2entry.bdb will be slower. So maybe it is fighting itself.
However, since I am executing separate "db_stat" processes each time, the setcachesize would have no chance to help improve things. I will have to try different values for slapd running.
Could be I should investigate various Solaris specific process limits as well. It is all 64bit now, but per process limits may still be interfering.
openldap-technical@openldap.org