I'm running a pretty large slapd 2.3.23 (3G-5G or so) and shutdown is taking anywhere from twenty minutes to an hour. (I'm still waiting for a shutdown and hoping for an hour)
I was just wondering what exactly was happening. I suspect that it's something in bdb_db_close(), but I'm not sure what's really going on since truss is just sitting around and prstat reports the pid in sleep.
My guess is that it's walking the entire cache and free-ing it, but I would expect the memory to change or the truss to return free's in that case. Any ideas? My last log messages are: daemon: shutdown requested and initiated. slapd shutdown: waiting for 0 threads to terminate
Thanks, _Matt
matthew sporleder wrote:
I'm running a pretty large slapd 2.3.23 (3G-5G or so) and shutdown is taking anywhere from twenty minutes to an hour. (I'm still waiting for a shutdown and hoping for an hour)
I was just wondering what exactly was happening. I suspect that it's something in bdb_db_close(), but I'm not sure what's really going on since truss is just sitting around and prstat reports the pid in sleep.
My guess is that it's walking the entire cache and free-ing it, but I would expect the memory to change or the truss to return free's in that case. Any ideas? My last log messages are: daemon: shutdown requested and initiated. slapd shutdown: waiting for 0 threads to terminate
Attach to it with gdb and get a stack trace.
On 1/25/07, Howard Chu hyc@symas.com wrote:
matthew sporleder wrote:
I'm running a pretty large slapd 2.3.23 (3G-5G or so) and shutdown is taking anywhere from twenty minutes to an hour. (I'm still waiting for a shutdown and hoping for an hour)
I was just wondering what exactly was happening. I suspect that it's something in bdb_db_close(), but I'm not sure what's really going on since truss is just sitting around and prstat reports the pid in sleep.
My guess is that it's walking the entire cache and free-ing it, but I would expect the memory to change or the truss to return free's in that case. Any ideas? My last log messages are: daemon: shutdown requested and initiated. slapd shutdown: waiting for 0 threads to terminate
Attach to it with gdb and get a stack trace.
I didn't get this reply in time. It finally shutdown cleanly. (~ 2 hours) Maybe it was scanning through the dn2id in o=people: 6.9G id2entry.bdb 1.4G dn2id.bdb
Here's the truss output for when it seemed to wake up and do stuff: stat("/ldap//ldap/bkp-replica/slapd-bdb4.2.52/o=people/id2entry.bdb", 0xFFFFFFFF7FFFEE40) = 0 open("/ldap/bkp-replica/slapd-bdb4.2.52/o=people/id2entry.bdb", O_RDWR) = 6 fcntl(6, F_SETFD, 0x00000001) = 0 fdsync(6, FSYNC) = 0 close(6) = 0 time() = 1169715679 lseek(25, 5596525, SEEK_SET) = 5596525 write(25, "\0 R a14\0\0\0 6R9E0BV N".., 52) = 52 fdsync(25, FSYNC) = 0 time() = 1169715679 munmap(0xFFFFFFFF7B900000, 24576) = 0 munmap(0xFFFFFFFF78C00000, 3145728) = 0 close(25) = 0 munmap(0xFFFFFFFF7BB00000, 606208) = 0 munmap(0xFFFFFFFF79000000, 37027840) = 0 munmap(0xFFFFFFFF7BC00000, 8192) = 0 lseek(7, 0, SEEK_SET) = 0 fcntl(7, F_SETLKW, 0xFFFFFFFF7FFFF0B0) = 0 lseek(7, 1024, SEEK_SET) = 1024 read(7, " x V 412\0\0\0\002\0\0\0".., 1024) = 1024 lseek(7, 1024, SEEK_SET) = 1024 write(7, " x V 412\0\0\0\0\0\0\0\0".., 1024) = 1024 lseek(7, 1024, SEEK_SET) = 1024 fcntl(7, F_SETLK, 0xFFFFFFFF7FFFF0B0) = 0 lseek(7, 0, SEEK_SET) = 0 fcntl(7, F_SETLK, 0xFFFFFFFF7FFFF0B0) = 0 close(7) = 0 fdsync(31, FSYNC) = 0 fdsync(31, FSYNC) = 0 close(31) = 0 stat("/ldap/bkp-replica/slapd-bdb4.2.52/ou=files/objectClass.bdb", 0xFFFFFFFF7FFFED40) = 0 open("/ldap/bkp-replica/slapd-bdb4.2.52/ou=files/objectClass.bdb", O_RDWR) = 6 fcntl(6, F_SETFD, 0x00000001) = 0 fdsync(6, FSYNC) = 0 close(6) = 0 fdsync(30, FSYNC) = 0 fdsync(30, FSYNC) = 0 close(30) = 0 stat("/ldap/bkp-replica/slapd-bdb4.2.52/ou=files/uid.bdb", 0xFFFFFFFF7FFFED40) = 0 open("/ldap/bkp-replica/slapd-bdb4.2.52/ou=files/uid.bdb", O_RDWR) = 6 fcntl(6, F_SETFD, 0x00000001) = 0 fdsync(6, FSYNC) = 0 close(6) = 0 fdsync(12, FSYNC) = 0 fdsync(12, FSYNC) = 0 close(12) = 0 stat("/ldap/bkp-replica/slapd-bdb4.2.52/ou=files/dn2id.bdb", 0xFFFFFFFF7FFFED40) = 0 open("/ldap/bkp-replica/slapd-bdb4.2.52/ou=files/dn2id.bdb", O_RDWR) = 6 fcntl(6, F_SETFD, 0x00000001) = 0 fdsync(6, FSYNC) = 0 close(6) = 0 fdsync(11, FSYNC) = 0 fdsync(11, FSYNC) = 0 close(11) = 0 stat("/ldap/bkp-replica/slapd-bdb4.2.52/ou=files/id2entry.bdb", 0xFFFFFFFF7FFFEE40) = 0 open("/ldap/bkp-replica/slapd-bdb4.2.52/ou=files/id2entry.bdb", O_RDWR) = 6 fcntl(6, F_SETFD, 0x00000001) = 0 fdsync(6, FSYNC) = 0 close(6) = 0 time() = 1169715697 lseek(29, 5827086, SEEK_SET) = 5827086 write(29, "\0 YT9A6\0\0\0 \ 7BB 183".., 403) = 403 fdsync(29, FSYNC) = 0 time() = 1169715697 munmap(0xFFFFFFFF7B400000, 24576) = 0 munmap(0xFFFFFFFF78400000, 3145728) = 0 close(29) = 0 munmap(0xFFFFFFFF7B500000, 606208) = 0 munmap(0xFFFFFFFF78800000, 3645440) = 0 munmap(0xFFFFFFFF7B600000, 8192) = 0 lseek(10, 0, SEEK_SET) = 0 fcntl(10, F_SETLKW, 0xFFFFFFFF7FFFF0B0) = 0 lseek(10, 1024, SEEK_SET) = 1024 read(10, " x V 412\0\0\0\002\0\0\0".., 1024) = 1024 lseek(10, 1024, SEEK_SET) = 1024 write(10, " x V 412\0\0\0\0\0\0\0\0".., 1024) = 1024 lseek(10, 1024, SEEK_SET) = 1024 fcntl(10, F_SETLK, 0xFFFFFFFF7FFFF0B0) = 0 lseek(10, 0, SEEK_SET) = 0 fcntl(10, F_SETLK, 0xFFFFFFFF7FFFF0B0) = 0 close(10) = 0 fdsync(28, FSYNC) = 0 fdsync(28, FSYNC) = 0 close(28) = 0 stat("/ldap/bkp-replica/slapd-bdb4.2.52/ou=cred/uid.bdb", 0xFFFFFFFF7FFFED40) = 0 open("/ldap/bkp-replica/slapd-bdb4.2.52/ou=cred/uid.bdb", O_RDWR) = 6 fcntl(6, F_SETFD, 0x00000001) = 0 fdsync(6, FSYNC) = 0 close(6) = 0 fdsync(27, FSYNC) = 0 fdsync(27, FSYNC) = 0 close(27) = 0 stat("/ldap/bkp-replica/slapd-bdb4.2.52/ou=cred/objectClass.bdb", 0xFFFFFFFF7FFFED40) = 0 open("/ldap/bkp-replica/slapd-bdb4.2.52/ou=cred/objectClass.bdb", O_RDWR) = 6 fcntl(6, F_SETFD, 0x00000001) = 0 fdsync(6, FSYNC) = 0 close(6) = 0 fdsync(26, FSYNC) = 0 fdsync(26, FSYNC) = 0 close(26) = 0 stat("/ldap/bkp-replica/slapd-bdb4.2.52/ou=cred/allowedSeconds.bdb", 0xFFFFFFFF7FFFED40) = 0 open("/ldap/bkp-replica/slapd-bdb4.2.52/ou=cred/allowedSeconds.bdb", O_RDWR) = 6 fcntl(6, F_SETFD, 0x00000001) = 0 fdsync(6, FSYNC) = 0 close(6) = 0 fdsync(24, FSYNC) = 0 fdsync(24, FSYNC) = 0 close(24) = 0 stat("/ldap/bkp-replica/slapd-bdb4.2.52/ou=cred/usedSeconds.bdb", 0xFFFFFFFF7FFFED40) = 0 open("/ldap/bkp-replica/slapd-bdb4.2.52/ou=cred/usedSeconds.bdb", O_RDWR) = 6 fcntl(6, F_SETFD, 0x00000001) = 0 fdsync(6, FSYNC) = 0 close(6) = 0 fdsync(15, FSYNC) = 0 fdsync(15, FSYNC) = 0 close(15) = 0 stat("/ldap/bkp-replica/slapd-bdb4.2.52/ou=cred/dn2id.bdb", 0xFFFFFFFF7FFFED40) = 0 open("/ldap/bkp-replica/slapd-bdb4.2.52/ou=cred/dn2id.bdb", O_RDWR) = 6 fcntl(6, F_SETFD, 0x00000001) = 0 fdsync(6, FSYNC) = 0 close(6) = 0 fdsync(14, FSYNC) = 0 fdsync(14, FSYNC) = 0 close(14) = 0 stat("/ldap/bkp-replica/slapd-bdb4.2.52/ou=cred/id2entry.bdb", 0xFFFFFFFF7FFFEE40) = 0 open("/ldap/bkp-replica/slapd-bdb4.2.52/ou=cred/id2entry.bdb", O_RDWR) = 6 fcntl(6, F_SETFD, 0x00000001) = 0 fdsync(6, FSYNC) = 0 close(6) = 0 time() = 1169715759 lseek(21, 5051001, SEEK_SET) = 5051001 write(21, "\0 Q12 L\0\0\0 G q2E66 v".., 610) = 610 fdsync(21, FSYNC) = 0 time() = 1169715759 munmap(0xFFFFFFFF77C00000, 24576) = 0 munmap(0xFFFFFFFF77E00000, 3145728) = 0 close(21) = 0 munmap(0xFFFFFFFF77D00000, 606208) = 0 munmap(0xFFFFFFFF78200000, 524288) = 0 munmap(0xFFFFFFFF78300000, 8192) = 0 lseek(13, 0, SEEK_SET) = 0 fcntl(13, F_SETLKW, 0xFFFFFFFF7FFFF0B0) = 0 lseek(13, 1024, SEEK_SET) = 1024 read(13, " x V 412\0\0\0\002\0\0\0".., 1024) = 1024 lseek(13, 1024, SEEK_SET) = 1024 write(13, " x V 412\0\0\0\0\0\0\0\0".., 1024) = 1024 lseek(13, 1024, SEEK_SET) = 1024 fcntl(13, F_SETLK, 0xFFFFFFFF7FFFF0B0) = 0 lseek(13, 0, SEEK_SET) = 0 fcntl(13, F_SETLK, 0xFFFFFFFF7FFFF0B0) = 0 close(13) = 0 pwrite(18, "\0\0\0\0\0\0\001\0\0\0\0".., 4096, 0) = 4096 fdsync(18, FSYNC) = 0 fdsync(18, FSYNC) = 0 close(18) = 0 stat("/ldap/logs/bkp-replica/slapd-bdb4.2.52/dc=com/log.0000000001", 0xFFFFFFFF7FFFE820) = 0 open("/ldap/logs/bkp-replica/slapd-bdb4.2.52/dc=com/log.0000000001", O_RDWR|O_CREAT, 0600) = 6 fcntl(6, F_SETFD, 0x00000001) = 0 read(6, "\0\0\0\0\0\0\01C8D & K ~".., 28) = 28 lseek(6, 2735, SEEK_SET) = 2735 write(6, "\0\0\n66\0\0\0 Z ( oD2 C".., 86) = 86 fdsync(6, FSYNC) = 0 pwrite(17, "\0\0\0\0\0\0\001\0\0\0\0".., 16384, 0) = 16384 fdsync(17, FSYNC) = 0 fdsync(17, FSYNC) = 0 close(17) = 0 stat("/ldap/bkp-replica/slapd-bdb4.2.52/dc=com/objectClass.bdb", 0xFFFFFFFF7FFFED90) = 0 open("/ldap/bkp-replica/slapd-bdb4.2.52/dc=com/objectClass.bdb", O_RDWR) = 7 fcntl(7, F_SETFD, 0x00000001) = 0 fdsync(7, FSYNC) = 0 close(7) = 0 stat("/ldap/bkp-replica/slapd-bdb4.2.52/dc=com/dn2id.bdb", 0xFFFFFFFF7FFFED90) = 0 open("/ldap/bkp-replica/slapd-bdb4.2.52/dc=com/dn2id.bdb", O_RDWR) = 7 fcntl(7, F_SETFD, 0x00000001) = 0 fdsync(7, FSYNC) = 0 close(7) = 0 stat("/ldap/bkp-replica/slapd-bdb4.2.52/dc=com/id2entry.bdb", 0xFFFFFFFF7FFFED90) = 0 open("/ldap/bkp-replica/slapd-bdb4.2.52/dc=com/id2entry.bdb", O_RDWR) = 7 fcntl(7, F_SETFD, 0x00000001) = 0 fdsync(7, FSYNC) = 0 close(7) = 0 time() = 1169715759 lseek(6, 2821, SEEK_SET) = 2821 write(6, "\0\0\nBB\0\0\0 Y {D8 6 <".., 141) = 141 fdsync(6, FSYNC) = 0 time() = 1169715759 munmap(0xFFFFFFFF77700000, 24576) = 0 munmap(0xFFFFFFFF77900000, 98304) = 0 close(6) = 0 munmap(0xFFFFFFFF77800000, 606208) = 0 munmap(0xFFFFFFFF77A00000, 270336) = 0 munmap(0xFFFFFFFF77B00000, 8192) = 0 lseek(16, 0, SEEK_SET) = 0 fcntl(16, F_SETLKW, 0xFFFFFFFF7FFFF000) = 0 lseek(16, 1024, SEEK_SET) = 1024 read(16, " x V 412\0\0\0\002\0\0\0".., 1024) = 1024 lseek(16, 1024, SEEK_SET) = 1024 write(16, " x V 412\0\0\0\0\0\0\0\0".., 1024) = 1024 lseek(16, 1024, SEEK_SET) = 1024 fcntl(16, F_SETLK, 0xFFFFFFFF7FFFF000) = 0 lseek(16, 0, SEEK_SET) = 0 fcntl(16, F_SETLK, 0xFFFFFFFF7FFFF000) = 0 close(16) = 0 munmap(0xFFFFFFFF7B700000, 12302) = 0 munmap(0xFFFFFFFF7B804000, 1664) = 0 munmap(0xFFFFFFFF7C200000, 48446) = 0 munmap(0xFFFFFFFF7C30C000, 4176) = 0 munmap(0xFFFFFFFF7BF00000, 92881) = 0 munmap(0xFFFFFFFF7C018000, 36664) = 0 munmap(0xFFFFFFFF7BD00000, 27435) = 0 munmap(0xFFFFFFFF7BE08000, 8931) = 0 munmap(0xFFFFFFFF7C400000, 14822) = 0 munmap(0xFFFFFFFF7C504000, 2192) = 0 munmap(0xFFFFFFFF7D400000, 22014) = 0 munmap(0xFFFFFFFF7D506000, 2536) = 0 munmap(0xFFFFFFFF7D200000, 51247) = 0 munmap(0xFFFFFFFF7D30E000, 8852) = 0 fstat(3, 0xFFFFFFFF7FFFF190) = 0 time() = 1169715759 getpid() = 11142 [1] putmsg(3, 0xFFFFFFFF7FFFE840, 0xFFFFFFFF7FFFE830, 0) = 0 open("/var/run/syslog_door", O_RDONLY) = 6 door_info(6, 0xFFFFFFFF7FFFE728) = 0 getpid() = 11142 [1] door_call(6, 0xFFFFFFFF7FFFE6F8) = 0 close(6) = 0 fstat(3, 0xFFFFFFFF7FFFF400) = 0 close(3) = 0 shutdown(5, SHUT_RDWR, SOV_DEFAULT) Err#95 ENOTSOCK close(5) = 0 shutdown(4, SHUT_RDWR, SOV_DEFAULT) Err#95 ENOTSOCK close(4) = 0 unlink("/ldap/logs/bkp-replica/slapd.pid") = 0 unlink("/ldap/logs/bkp-replica/slapd.args") = 0 _exit(0)
On Thu, 25 Jan 2007, Howard Chu wrote:
matthew sporleder wrote:
I'm running a pretty large slapd 2.3.23 (3G-5G or so) and shutdown is taking anywhere from twenty minutes to an hour. (I'm still waiting for a shutdown and hoping for an hour)
I was just wondering what exactly was happening. I suspect that it's something in bdb_db_close(), but I'm not sure what's really going on since truss is just sitting around and prstat reports the pid in sleep.
My guess is that it's walking the entire cache and free-ing it, but I would expect the memory to change or the truss to return free's in that case. Any ideas? My last log messages are: daemon: shutdown requested and initiated. slapd shutdown: waiting for 0 threads to terminate
Attach to it with gdb and get a stack trace.
Always good advice, but given that heavy disk activity, do you have an appropriate checkpoint directive set?
On Jan 25, 2007, at 6:49 AM, Aaron Richton wrote:
On Thu, 25 Jan 2007, Howard Chu wrote:
matthew sporleder wrote:
I'm running a pretty large slapd 2.3.23 (3G-5G or so) and shutdown is taking anywhere from twenty minutes to an hour. (I'm still waiting for a shutdown and hoping for an hour) I was just wondering what exactly was happening. I suspect that it's something in bdb_db_close(), but I'm not sure what's really going on since truss is just sitting around and prstat reports the pid in sleep. My guess is that it's walking the entire cache and free-ing it, but I would expect the memory to change or the truss to return free's in that case. Any ideas? My last log messages are: daemon: shutdown requested and initiated. slapd shutdown: waiting for 0 threads to terminate
Attach to it with gdb and get a stack trace.
Always good advice, but given that heavy disk activity, do you have an appropriate checkpoint directive set?
Symptoms would be 1) problem occurs only after some period of real activity, not when stopped immediately after start, and 2) large fraction of time in I/O wait state (cf. top, vmstat.) I don't know for sure what an appropriate checkpoint would be, but I know what happens without one.
Donn Cave, donn@u.washington.edu
On 1/25/07, Donn Cave donn@u.washington.edu wrote:
On Jan 25, 2007, at 6:49 AM, Aaron Richton wrote:
On Thu, 25 Jan 2007, Howard Chu wrote:
matthew sporleder wrote:
I'm running a pretty large slapd 2.3.23 (3G-5G or so) and shutdown is taking anywhere from twenty minutes to an hour. (I'm still waiting for a shutdown and hoping for an hour) I was just wondering what exactly was happening. I suspect that it's something in bdb_db_close(), but I'm not sure what's really going on since truss is just sitting around and prstat reports the pid in sleep. My guess is that it's walking the entire cache and free-ing it, but I would expect the memory to change or the truss to return free's in that case. Any ideas? My last log messages are: daemon: shutdown requested and initiated. slapd shutdown: waiting for 0 threads to terminate
Attach to it with gdb and get a stack trace.
Always good advice, but given that heavy disk activity, do you have an appropriate checkpoint directive set?
Symptoms would be 1) problem occurs only after some period of real activity, not when stopped immediately after start, and 2) large fraction of time in I/O wait state (cf. top, vmstat.) I don't know for sure what an appropriate checkpoint would be, but I know what happens without one.
My checkpoint on all three subordinate databases is: checkpoint 5120 15
openldap-software@openldap.org