Hello,
Since updating from OpenLDAP 2.4.23 to OpenLDAP 2.4.32, a slapd process crashes with a core dump about one to three times a week.
The crashes seem to be triggered by LDAP requests, as only some of our servers are affected, and those are all in the same network zone.
The facts I found out so far:
Syslog:
Mar 8 20:13:01 vg0092 slapd[220]: [ID 870088 local4.debug] get_filter: unknown filter type=48
Mar 8 20:13:01 vg0092 last message repeated 14 times
Mar 8 20:13:01 vg0092 slapd[220]: [ID 870088 local4.debug] get_filter: unknown filter type=48
Mar 8 20:13:01 vg0092 last message repeated 17 times
Mar 8 20:13:01 vg0092 slapd[220]: [ID 870088 local4.debug] get_filter: unknown filter type=48
Mar 8 20:13:01 vg0092 last message repeated 15 times
Mar 8 20:13:01 vg0092 slapd[220]: [ID 870088 local4.debug] get_filter: unknown filter type=48
Mar 8 20:13:02 vg0092 last message repeated 18 times
Mar 8 20:13:02 vg0092 slapd[220]: [ID 870088 local4.debug] get_filter: unknown filter type=48
Mar 8 20:13:11 vg0092 last message repeated 1091 times
Mar 8 20:13:11 vg0092 slapd[220]: [ID 870088 local4.debug] get_filter: unknown filter type=48
Mar 8 20:13:20 vg0092 last message repeated 1057 times
Mar 8 20:14:14 vg0092 genunix: [ID 603404 kern.notice] NOTICE: core_log: slapd[220] core dumped: /dpool/vg0092-data/ldap/core/core.slapd.220
Mar 8 20:14:14 vg0092 slapd[7288]: [ID 702911 local4.debug] @(#) $OpenLDAP: slapd 2.4.32 (Aug 5 2012 00:09:28) $
Mar 8 20:14:14 vg0092 steve@sunblade2500:/bigdisk/SOURCES/S10/openldap-2.4.32/servers/slapd
Mar 8 20:14:14 vg0092 slapd[7299]: [ID 643551 local4.debug] hdb_db_open: database "dc=scom": unclean shutdown detected; attempting recovery.
Mar 8 20:14:31 vg0092 last message repeated 2 times
Mar 8 20:14:42 vg0092 last message repeated 5 times
Mar 8 20:15:03 vg0092 slapd[8246]: [ID 702911 local4.debug] @(#) $OpenLDAP: slapd 2.4.32 (Aug 5 2012 00:09:28) $
Mar 8 20:15:03 vg0092 steve@sunblade2500:/bigdisk/SOURCES/S10/openldap-2.4.32/servers/slapd
Mar 8 20:15:03 vg0092 ldap: [ID 702911 user.warning] vg0092 slapd maintenance, rebuilding, WARNING
The 'unknown filter' messages are caused by HP-UX clients. The crash leaves the Berkeley DB corrupt, so it has to be rebuilt.
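To put a number on how many of these messages arrive before a crash, the syslog excerpt can be totalled with a small script. This is only a minimal sketch: the function name count_messages is made up for illustration, and it assumes that a "last message repeated N times" line always refers to the log line directly before it.

```python
import re

# Matches the syslog compression line, e.g. "last message repeated 14 times"
REPEAT = re.compile(r"last message repeated (\d+) times")

def count_messages(lines, needle):
    """Count log lines containing `needle`, including syslog repeats.

    Assumption: a 'last message repeated N times' line refers to the
    most recent non-repeat line seen before it.
    """
    total = 0
    last_matched = False
    for line in lines:
        m = REPEAT.search(line)
        if m:
            # Only add the repeats if the previous real message matched
            if last_matched:
                total += int(m.group(1))
            continue
        last_matched = needle in line
        if last_matched:
            total += 1
    return total
```

Feeding it the lines of the excerpt above with needle "unknown filter type=48" gives the total for that window; the repeats after the hdb_db_open line are not counted because the preceding message does not match.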
Coredump:
# adb /usr/local/libexec/slapd core.slapd.220
core file = core.slapd.220 -- program ``/usr/local/libexec/slapd'' on platform SUNW,SPARC-Enterprise-T5120
SIGABRT: Abort
$c
libc.so.1`_lwp_kill+8(6, 0, fed87080, fecede54, ffffffff, 6)
libc.so.1`abort+0x110(b07ff4e8, 1, fed833f0, ffba0, fed85518, 0)
libc.so.1`_assert+0x64(12d0d0, 12c9d0, 3a8, 0, ff8bc, 19418c)
connection_next+0x138(0, b07ff7c4, b07ff7c0, 199d1c, fd17ba00, 1a2000)
0x112574(8000, b07ffcb8, 5e9bb4, 199d1c, b07ff8a8, 1c77a8)
monitor_entry_create+0x94(714ba50, b07ffcb8, 0, 545d64, b07ff8a8, 546084)
0xe1eec(714ba50, b07ffcb8, 545d3c, 0, 1, 1a2400)
monitor_back_search+0x248(714ba50, b07ffcb8, 0, 142a7da8, e1fb8, 1971d8)
fe_op_search+0x420(714ba50, b07ffcb8, 12d838, 0, 1a2928, 1a2a20)
do_search+0x618(714ba50, b07ffcb8, fed87940, 0, 3f0f4, b07ffa38)
0x3da44(b07ffe08, 714ba50, fed87940, 0, fd17ba00, 0)
0x3e3d0(0, 2f, fed87940, 0, fd17ba00, 2330ec)
libldap_r-2.4.so.2`ldap_int_thread_pool_wrapper+0x190(2330a8, b0800000, 0, 0, ff30ed80, 1)
libc.so.1`_lwp_start(0, 0, 0, 0, 0, 0)
pflags shows that LWP 25 might be the culprit:
# pflags core.slapd.220
core 'core.slapd.220' of 220: /usr/local/libexec/slapd -4 -u ldap -g ldap -f /dpool/vg0092-data/ldap
        data model = _ILP32 flags = MSACCT|MSFORK
 /1: flags = STOPPED lwp_wait(0x4,0xffbffb34) why = PR_SUSPENDED
 /2: flags = STOPPED pollsys(0x4,0x9f,0x0,0x0) why = PR_SUSPENDED
 /3: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /4: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /5: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /6: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /7: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /8: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /9: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /10: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /11: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /12: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /13: flags = DETACH|STOPPED why = PR_SUSPENDED
 /14: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /15: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /16: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /17: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /18: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /19: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /20: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /21: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /22: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /23: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /24: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /25: flags = DETACH sigmask = 0xffffbefc,0x0000ffff cursig = SIGABRT
 /26: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /27: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /28: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /29: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /30: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /31: flags = DETACH|STOPPED why = PR_SUSPENDED
 /32: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /33: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
 /34: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0) why = PR_SUSPENDED
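For anyone wanting to pick the signalled LWP out of a long pflags listing automatically, here is a minimal sketch. The helper name lwps_with_signal is hypothetical, and it assumes the textual layout shown above: each LWP entry starts with "/N:" at the beginning of a line and a pending signal appears as "cursig = NAME", either on that line or on a following one.

```python
import re

def lwps_with_signal(pflags_text):
    """Return (lwp_id, signal_name) pairs for LWPs reporting a 'cursig'.

    Assumption: an LWP entry begins with '/<number>:' at the start of a
    line (after optional whitespace); 'cursig = NAME' belongs to the most
    recently started entry.
    """
    found = []
    current = None
    for line in pflags_text.splitlines():
        header = re.match(r"\s*/(\d+):", line)
        if header:
            current = int(header.group(1))
        sig = re.search(r"cursig = (\w+)", line)
        if sig and current is not None:
            found.append((current, sig.group(1)))
    return found
```

The LWP number returned can then be matched against the "lwp# N" header in the pstack output.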
pstack:
----------------- lwp# 25 / thread# 25 --------------------
 fed0e8cc _lwp_kill (6, 0, fed87080, fecede54, ffffffff, 6) + 8
 fec82950 abort (b07ff4e8, 1, fed833f0, ffba0, fed85518, 0) + 110
 fec82b8c _assert (12d0d0, 12c9d0, 3a8, 0, ff8bc, 19418c) + 64
 0003cc64 connection_next (0, b07ff7c4, b07ff7c0, 199d1c, fd17ba00, 1a2000) + 138
 00112574 ???????? (8000, b07ffcb8, 5e9bb4, 199d1c, b07ff8a8, 1c77a8)
 00114670 monitor_entry_create (714ba50, b07ffcb8, 0, 545d64, b07ff8a8, 546084) + 94
 000e1eec ???????? (714ba50, b07ffcb8, 545d3c, 0, 1, 1a2400)
 000e2200 monitor_back_search (714ba50, b07ffcb8, 0, 142a7da8, e1fb8, 1971d8) + 248
 0004005c fe_op_search (714ba50, b07ffcb8, 12d838, 0, 1a2928, 1a2a20) + 420
 0003f70c do_search (714ba50, b07ffcb8, fed87940, 0, 3f0f4, b07ffa38) + 618
 0003da44 ???????? (b07ffe08, 714ba50, fed87940, 0, fd17ba00, 0)
 0003e3d0 ???????? (0, 2f, fed87940, 0, fd17ba00, 2330ec)
 ff30ef10 ldap_int_thread_pool_wrapper (2330a8, b0800000, 0, 0, ff30ed80, 1) + 190
 fed0abd8 _lwp_start (0, 0, 0, 0, 0, 0)
Questions:
- Is this a known problem?
- If yes: is it already fixed in OpenLDAP 2.4.34, or can it be circumvented?
- If no: is there any additional information I can provide that might be helpful?
Sending the core dump is not an option at the moment, as it contains all password hashes etc.
Regards
Jürgen Sprenger
E-Mail: mailto:juergen.sprenger@swisscom.com Internet: http://www.swisscom.com/it-services