Hello,
Since updating from OpenLDAP 2.4.23 to OpenLDAP 2.4.32, a slapd process crashes with a core dump about one to three times a week.
It seems to be caused by LDAP requests, as only some of our servers are affected, all of them in the same network zone.
The facts I found out so far:
Syslog:
Mar 8 20:13:01 vg0092 slapd[220]: [ID 870088 local4.debug] get_filter: unknown filter type=48
Mar 8 20:13:01 vg0092 last message repeated 14 times
Mar 8 20:13:01 vg0092 slapd[220]: [ID 870088 local4.debug] get_filter: unknown filter type=48
Mar 8 20:13:01 vg0092 last message repeated 17 times
Mar 8 20:13:01 vg0092 slapd[220]: [ID 870088 local4.debug] get_filter: unknown filter type=48
Mar 8 20:13:01 vg0092 last message repeated 15 times
Mar 8 20:13:01 vg0092 slapd[220]: [ID 870088 local4.debug] get_filter: unknown filter type=48
Mar 8 20:13:02 vg0092 last message repeated 18 times
Mar 8 20:13:02 vg0092 slapd[220]: [ID 870088 local4.debug] get_filter: unknown filter type=48
Mar 8 20:13:11 vg0092 last message repeated 1091 times
Mar 8 20:13:11 vg0092 slapd[220]: [ID 870088 local4.debug] get_filter: unknown filter type=48
Mar 8 20:13:20 vg0092 last message repeated 1057 times
Mar 8 20:14:14 vg0092 genunix: [ID 603404 kern.notice] NOTICE: core_log: slapd[220] core dumped: /dpool/vg0092-data/ldap/core/core.slapd.220
Mar 8 20:14:14 vg0092 slapd[7288]: [ID 702911 local4.debug] @(#) $OpenLDAP: slapd 2.4.32 (Aug 5 2012 00:09:28) $
Mar 8 20:14:14 vg0092 steve@sunblade2500:/bigdisk/SOURCES/S10/openldap-2.4.32/servers/slapd
Mar 8 20:14:14 vg0092 slapd[7299]: [ID 643551 local4.debug] hdb_db_open: database "dc=scom": unclean shutdown detected; attempting recovery.
Mar 8 20:14:31 vg0092 last message repeated 2 times
Mar 8 20:14:42 vg0092 last message repeated 5 times
Mar 8 20:15:03 vg0092 slapd[8246]: [ID 702911 local4.debug] @(#) $OpenLDAP: slapd 2.4.32 (Aug 5 2012 00:09:28) $
Mar 8 20:15:03 vg0092 steve@sunblade2500:/bigdisk/SOURCES/S10/openldap-2.4.32/servers/slapd
Mar 8 20:15:03 vg0092 ldap: [ID 702911 user.warning] vg0092 slapd maintenance, rebuilding, WARNING
The 'unknown filter' messages are caused by HP-UX clients. As a result of the crash, the Berkeley DB became corrupt and had to be rebuilt.
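For anyone reading the log, the "type=48" value can be decoded by hand. The following is a minimal sketch, assuming slapd logs the BER identifier octet in decimal; the filter names follow the Filter CHOICE in RFC 4511, and the helper names are made up for illustration and are not part of OpenLDAP.

```python
# Sketch: decode a single-octet BER identifier such as the "type=48" in the log.
# ASSUMPTION: the logged value is the identifier octet printed in decimal.
# The helper names here are illustrative only, not OpenLDAP functions.

BER_CLASSES = ("universal", "application", "context-specific", "private")

# Context-specific tag numbers of the LDAP Filter CHOICE (RFC 4511, section 4.5.1).
LDAP_FILTER_CHOICES = {
    0: "and", 1: "or", 2: "not", 3: "equalityMatch", 4: "substrings",
    5: "greaterOrEqual", 6: "lessOrEqual", 7: "present", 8: "approxMatch",
    9: "extensibleMatch",
}


def decode_identifier(octet):
    """Split a low-tag-number BER identifier octet into (class, constructed, number)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("identifier octet must fit in one byte")
    tag_class = BER_CLASSES[octet >> 6]      # top two bits select the class
    constructed = bool(octet & 0x20)         # bit 6 marks a constructed encoding
    number = octet & 0x1F                    # low five bits hold the tag number
    return tag_class, constructed, number


def filter_choice(octet):
    """Return the LDAP filter name for the octet, or None if it is not a filter tag."""
    tag_class, _, number = decode_identifier(octet)
    if tag_class != "context-specific":
        return None
    return LDAP_FILTER_CHOICES.get(number)


if __name__ == "__main__":
    print(decode_identifier(48), filter_choice(48))
    print(decode_identifier(0xA3), filter_choice(0xA3))
```

Under that assumption, 48 (0x30) decodes as a universal, constructed tag number 16, which is a SEQUENCE rather than any of the context-specific filter choices. That would explain why get_filter rejects it.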
Coredump:
# adb /usr/local/libexec/slapd core.slapd.220
core file = core.slapd.220 -- program ``/usr/local/libexec/slapd'' on platform SUNW,SPARC-Enterprise-T5120
SIGABRT: Abort
$c
libc.so.1`_lwp_kill+8(6, 0, fed87080, fecede54, ffffffff, 6)
libc.so.1`abort+0x110(b07ff4e8, 1, fed833f0, ffba0, fed85518, 0)
libc.so.1`_assert+0x64(12d0d0, 12c9d0, 3a8, 0, ff8bc, 19418c)
connection_next+0x138(0, b07ff7c4, b07ff7c0, 199d1c, fd17ba00, 1a2000)
0x112574(8000, b07ffcb8, 5e9bb4, 199d1c, b07ff8a8, 1c77a8)
monitor_entry_create+0x94(714ba50, b07ffcb8, 0, 545d64, b07ff8a8, 546084)
0xe1eec(714ba50, b07ffcb8, 545d3c, 0, 1, 1a2400)
monitor_back_search+0x248(714ba50, b07ffcb8, 0, 142a7da8, e1fb8, 1971d8)
fe_op_search+0x420(714ba50, b07ffcb8, 12d838, 0, 1a2928, 1a2a20)
do_search+0x618(714ba50, b07ffcb8, fed87940, 0, 3f0f4, b07ffa38)
0x3da44(b07ffe08, 714ba50, fed87940, 0, fd17ba00, 0)
0x3e3d0(0, 2f, fed87940, 0, fd17ba00, 2330ec)
libldap_r-2.4.so.2`ldap_int_thread_pool_wrapper+0x190(2330a8, b0800000, 0, 0, ff30ed80, 1)
libc.so.1`_lwp_start(0, 0, 0, 0, 0, 0)
pflags shows that lwp 25 might be the culprit:
# pflags core.slapd.220
core 'core.slapd.220' of 220: /usr/local/libexec/slapd -4 -u ldap -g ldap -f /dpool/vg0092-data/ldap
data model = _ILP32 flags = MSACCT|MSFORK
/1: flags = STOPPED lwp_wait(0x4,0xffbffb34) why = PR_SUSPENDED
/2: flags = STOPPED pollsys(0x4,0x9f,0x0,0x0) why = PR_SUSPENDED
/3: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/4: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/5: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/6: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/7: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/8: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/9: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/10: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/11: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/12: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/13: flags = DETACH|STOPPED why = PR_SUSPENDED
/14: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/15: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/16: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/17: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/18: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/19: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/20: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/21: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/22: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/23: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/24: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/25: flags = DETACH sigmask = 0xffffbefc,0x0000ffff cursig = SIGABRT
/26: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/27: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/28: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/29: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/30: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/31: flags = DETACH|STOPPED why = PR_SUSPENDED
/32: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/33: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
/34: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
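With this many LWPs it is easy to miss the one carrying the signal. Here is a minimal sketch of how the pflags text above could be scanned for it, assuming each LWP section starts with "/<id>:" and that a pending signal shows up as "cursig = <NAME>", as in this output. The function name and the SAMPLE string are illustrative only.

```python
import re

# Three LWP entries copied from the pflags output above, for demonstration.
SAMPLE = (
    "/24: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED "
    "/25: flags = DETACH sigmask = 0xffffbefc,0x0000ffff cursig = SIGABRT "
    "/26: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED"
)


def lwps_with_signal(pflags_text):
    """Return {lwp_id: signal_name} for every LWP section that reports a cursig."""
    result = {}
    # Split at the "/<id>:" markers. With one capture group, re.split returns
    # [preamble, id, body, id, body, ...]. Paths such as /usr/... do not match
    # because the slash must be followed by digits and a colon.
    parts = re.split(r"(?:^|\s)/(\d+):", pflags_text)
    for lwp_id, body in zip(parts[1::2], parts[2::2]):
        match = re.search(r"cursig\s*=\s*(\w+)", body)
        if match:
            result[int(lwp_id)] = match.group(1)
    return result


if __name__ == "__main__":
    print(lwps_with_signal(SAMPLE))
```

The split does not depend on line breaks, so it should work on the raw multi-line pflags output as well as on a flattened copy.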
pstack:
----------------- lwp# 25 / thread# 25 --------------------
fed0e8cc _lwp_kill (6, 0, fed87080, fecede54, ffffffff, 6) + 8
fec82950 abort (b07ff4e8, 1, fed833f0, ffba0, fed85518, 0) + 110
fec82b8c _assert (12d0d0, 12c9d0, 3a8, 0, ff8bc, 19418c) + 64
0003cc64 connection_next (0, b07ff7c4, b07ff7c0, 199d1c, fd17ba00, 1a2000) + 138
00112574 ???????? (8000, b07ffcb8, 5e9bb4, 199d1c, b07ff8a8, 1c77a8)
00114670 monitor_entry_create (714ba50, b07ffcb8, 0, 545d64, b07ff8a8, 546084) + 94
000e1eec ???????? (714ba50, b07ffcb8, 545d3c, 0, 1, 1a2400)
000e2200 monitor_back_search (714ba50, b07ffcb8, 0, 142a7da8, e1fb8, 1971d8) + 248
0004005c fe_op_search (714ba50, b07ffcb8, 12d838, 0, 1a2928, 1a2a20) + 420
0003f70c do_search (714ba50, b07ffcb8, fed87940, 0, 3f0f4, b07ffa38) + 618
0003da44 ???????? (b07ffe08, 714ba50, fed87940, 0, fd17ba00, 0)
0003e3d0 ???????? (0, 2f, fed87940, 0, fd17ba00, 2330ec)
ff30ef10 ldap_int_thread_pool_wrapper (2330a8, b0800000, 0, 0, ff30ed80, 1) + 190
fed0abd8 _lwp_start (0, 0, 0, 0, 0, 0)
Questions:
- Is this a known problem?
- If yes: is it already fixed in OpenLDAP 2.4.34 or can it be circumvented?
- If no: Is there any additional info I can provide which might be helpful?
Sending the core dump is not an option yet, as it contains all password hashes etc.
Regards
Jürgen Sprenger
E-Mail: mailto:juergen.sprenger@swisscom.com Internet: http://www.swisscom.com/it-services
On Wed, 13 Mar 2013, Juergen.Sprenger@swisscom.com wrote:
Well...
# adb /usr/local/libexec/slapd core.slapd.220
core file = core.slapd.220 -- program ``/usr/local/libexec/slapd'' on platform SUNW,SPARC-Enterprise-T5120
SIGABRT: Abort
Obviously you're generating core, but it's not a SEGV; rather...
libc.so.1`abort+0x110(b07ff4e8, 1, fed833f0, ffba0, fed85518, 0)
libc.so.1`_assert+0x64(12d0d0, 12c9d0, 3a8, 0, ff8bc, 19418c)
connection_next+0x138(0, b07ff7c4, b07ff7c0, 199d1c, fd17ba00, 1a2000)
slapd is "voluntarily" assert()'ing something. In OpenLDAP software, these assertions include __FILE__, __LINE__ which are a really helpful clue (and don't contain passwords/other sensitive information). Perhaps you could track that down as a starting point?
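The line number may already be recoverable from the trace itself. This is a minimal sketch, assuming that Solaris libc's _assert receives the expression string, the file name and the line number in that order (worth confirming against the platform's assert.h) and that adb prints frame arguments in hexadecimal. The function names are hypothetical.

```python
import re

# The _assert frame exactly as it appears in the adb "$c" output quoted above.
FRAME = "libc.so.1`_assert+0x64(12d0d0, 12c9d0, 3a8, 0, ff8bc, 19418c)"


def assert_frame_args(frame):
    """Parse the hexadecimal arguments of an adb-style _assert frame into integers."""
    match = re.search(r"_assert\+0x[0-9a-f]+\(([^)]*)\)", frame)
    if match is None:
        raise ValueError("no _assert frame found")
    return [int(arg.strip(), 16) for arg in match.group(1).split(",")]


def probable_line_number(frame):
    """Return the third argument of the frame as a decimal number.

    ASSUMPTION: _assert(expression, file, line), so the third argument
    is __LINE__. Verify this against the platform headers before relying on it.
    """
    return assert_frame_args(frame)[2]


if __name__ == "__main__":
    print(probable_line_number(FRAME))
```

If the assumption holds, 0x3a8 corresponds to line 936 of whichever source file defines connection_next. The first two arguments are addresses of strings inside the slapd binary, so the expression text and file name would still have to be read with a debugger.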
I might suggest you consider using a more symbolic debugger like dbx/gdb; I think people will be more familiar with those than adb/mdb when helping you on non-Solaris-specific lists. If you're good enough with adb to coerce that output, feel free to continue, but the readily documented debugging suggestions (e.g. those in the OpenLDAP FAQ) are not geared for adb/mdb.
(Dangerously/prematurely) dusting off the crystal ball...
data model = _ILP32 flags = MSACCT|MSFORK
A T5120 is a pretty large machine to be running a 32-bit build on. You may well be running into some address-space related limit, and you may see this when you find your assert() string. I'd highly suggest recompiling/reloading your database with 64-bit binaries. There's no reason to buy a modern server and use only a fraction of its capacity.
openldap-technical@openldap.org