Hi,
I have the following problem:
On a syncrepl provider I have many (100+) consumers in refresh and persist mode. After upgrading the provider from 2.3.x to 2.4.25, I can crash the server with a single modification of the root object of one database.
Aug 15 14:18:37 trzs721boot kernel: [544888.798212] slapd[2861]: segfault at 0 ip 00007fbf89494522 sp 00007fbe8cfa7ca0 error 4 in slapd[7fbf8942c000+1b6000]
I reproduced this on a test system, even with 2.4.26. The consumers are ldapsearch clients invoked like this "-E!sync=rp/rid=xxx,csn= * +" on a single 2.4.25 machine. All are SLES 11 SP1 64-bit.
Here is the gdb output.
I tried to create a core dump, but I could not get it to work. I used this howto. The "top" example works: I get a core file for user ldap. With slapd I do not.
Why does slapd crash here?
Marc
Marc Patermann wrote on 15.08.2011 at 15:00:
I tried to create a core dump, but I could not get it to work. I used this howto. The "top" example works: I get a core file for user ldap. With slapd I do not.
sorry, I forgot the link: http://www.unix.com/security/55651-how-set-coredump-suse-10-a.html
Marc
--On Monday, August 15, 2011 4:34 PM +0200 Marc Patermann hans.moser@ofd-z.niedersachsen.de wrote:
sorry, I forgot the link: http://www.unix.com/security/55651-how-set-coredump-suse-10-a.html
http://wiki.zimbra.com/wiki/Enabling_Core_Files
--Quanah
--
Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration
Quanah,
Quanah Gibson-Mount wrote on 15.08.2011 at 19:05:
--On Monday, August 15, 2011 4:34 PM +0200 Marc Patermann hans.moser@ofd-z.niedersachsen.de wrote:
sorry, I forgot the link: http://www.unix.com/security/55651-how-set-coredump-suse-10-a.html
Thanks, "echo 2 > /proc/sys/fs/suid_dumpable" did the trick.
I have a core dump now. What do you want me to do with it in gdb? "thread apply all bt" or something else?
Marc
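For anyone hitting the same problem, here is a small sketch around the setting that worked above. The kernel normally suppresses core files for processes that have changed credentials, which is presumably why the "top" test produced a core file and slapd (running as user ldap) did not. The helper name describe_suid_dumpable is made up for illustration; the three values are the ones documented for the fs.suid_dumpable sysctl.

```shell
#!/bin/sh
# Hypothetical helper: translate an fs.suid_dumpable value into its documented meaning.
describe_suid_dumpable() {
    case "$1" in
        0) echo "default: no core dumps for processes that changed credentials" ;;
        1) echo "debug: all processes dump core when possible" ;;
        2) echo "suidsafe: core dumps allowed, readable by root only" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Show the current setting, if the file is readable on this system.
if [ -r /proc/sys/fs/suid_dumpable ]; then
    describe_suid_dumpable "$(cat /proc/sys/fs/suid_dumpable)"
fi

# To change it (as root), as was done in this thread:
#   echo 2 > /proc/sys/fs/suid_dumpable
# The core size limit must also be non-zero in the environment that starts slapd:
#   ulimit -c unlimited
```

The change made with echo does not survive a reboot, so it has to be repeated (or made persistent by whatever mechanism the distribution provides) before the next attempt to reproduce the crash.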
Marc Patermann wrote:
Why does slapd crash here?
This looks like the same trace as ITS#6892, but that was already patched/fixed in 2.4.26. Need a bit more info from the crash. E.g.
print *ss
print *ss->s_op
Howard,
Howard Chu wrote on 15.08.2011 at 23:20:
Marc Patermann wrote:
Why does slapd crash here?
This looks like the same trace as ITS#6892, but that was already patched/fixed in 2.4.26.
# rpm -qa openldap2
openldap2-2.4.26-143.1
(the Ralf Haferkamp SLES rpms)
Need a bit more info from the crash. E.g.
print *ss
print *ss->s_op
Sorry, I need a bit more information, I do not understand what to do.
Marc
Howard,
Howard Chu wrote on 15.08.2011 at 23:20:
Marc Patermann wrote:
Why does slapd crash here?
This looks like the same trace as ITS#6892, but that was already patched/fixed in 2.4.26. Need a bit more info from the crash. E.g.
print *ss
print *ss->s_op
Is this what you would like to see?
(gdb) where
#0 test_filter (op=0x7fb792fa40a0, e=0x7fb87cd6c088, f=0x0) at filterentry.c:69
#1 0x00007fb87c606b09 in syncprov_matchops (op=0x7fb7cc2174f0, opc=0x7fb7d8a468e8, saveit=1) at syncprov.c:1313
#2 0x00007fb87c606fc7 in syncprov_op_mod (op=0x7fb7cc2174f0, rs=<value optimized out>) at syncprov.c:2140
#3 0x00007fb87c58cc2a in overlay_op_walk (op=0x7fb7cc2174f0, rs=0x7fb792fa5940, which=op_modify, oi=0x7fb87ca21630, on=0x7fb87ca220a0) at backover.c:661
#4 0x00007fb87c58d88d in over_op_func (op=0x7fb7cc2174f0, rs=0x7fb87c58d5a0, which=2089371256) at backover.c:723
#5 0x00007fb87c537a4f in fe_op_modify (op=0x7fb7cc2174f0, rs=0x7fb792fa5940) at modify.c:303
#6 0x00007fb87c5383c5 in do_modify (op=0x7fb7cc2174f0, rs=0x7fb792fa5940) at modify.c:177
#7 0x00007fb87c51d6fd in connection_operation (ctx=0x7fb792fa5b90, arg_v=<value optimized out>) at connection.c:1138
#8 0x00007fb87c51e5ef in connection_read_thread (ctx=0x7fb792fa5b90, argv=0xb) at connection.c:1274
#9 0x00007fb87c0759d8 in ldap_int_thread_pool_wrapper (xpool=<value optimized out>) at tpool.c:685
#10 0x00007fb87b4905f0 in start_thread (arg=<value optimized out>) at pthread_create.c:297
#11 0x00007fb879b2d84d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#12 0x0000000000000000 in ?? ()
(gdb)
Marc
Howard Chu wrote on 15.08.2011 at 23:20:
Marc Patermann wrote:
Why does slapd crash here?
This looks like the same trace as ITS#6892, but that was already patched/fixed in 2.4.26. Need a bit more info from the crash. E.g.
print *ss
print *ss->s_op
(gdb) print *ss
No symbol "ss" in current context.
(gdb) frame 1
#1 0x00007fb87c606b09 in syncprov_matchops (op=0x7fb7cc2174f0, opc=0x7fb7d8a468e8, saveit=1) at syncprov.c:1313
1313 syncprov.c: No such file or directory.
in syncprov.c
(gdb) print *ss
$1 = {s_next = 0x7fb6d0b37b60, s_base = {bv_len = 56, bv_val = 0x7fb6d9b62450 "ou=linux,ou=steuer,o=landesverwaltung niedersachsen,c=de"}, s_eid = 1, s_op = 0x7fb7d8838da0, s_rid = 159, s_sid = -1,
  s_filterstr = {bv_len = 15, bv_val = 0x7fb7e4064760 "(objectClass=*)"}, s_flags = 1, s_inuse = 1, s_res = 0x0, s_restail = 0x0,
  s_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}
(gdb) print *ss->s_op
$2 = {o_hdr = 0x7fb7d8838f10, o_tag = 0, o_time = 0, o_tincr = 1, o_bd = 0x0, o_req_dn = {bv_len = 0, bv_val = 0x0}, o_req_ndn = {bv_len = 0, bv_val = 0x0},
  o_request = {oq_add = {rs_modlist = 0x0, rs_e = 0x0},
  oq_bind = {rb_method = 0, rb_cred = {bv_len = 0, bv_val = 0x0}, rb_edn = {bv_len = 0, bv_val = 0x0}, rb_ssf = 0, rb_mech = {bv_len = 0, bv_val = 0x0}},
  oq_compare = {rs_ava = 0x0},
  oq_modify = {rs_mods = {rs_modlist = 0x0, rs_no_opattrs = 0 '\000'}, rs_increment = 0},
  oq_modrdn = {rs_mods = {rs_modlist = 0x0, rs_no_opattrs = 0 '\000'}, rs_deleteoldrdn = 0, rs_newrdn = {bv_len = 0, bv_val = 0x0}, rs_nnewrdn = {bv_len = 0, bv_val = 0x0}, rs_newSup = 0x0, rs_nnewSup = 0x0},
  oq_search = {rs_scope = 0, rs_deref = 0, rs_slimit = 0, rs_tlimit = 0, rs_limit = 0x0, rs_attrsonly = 0, rs_attrs = 0x0, rs_filter = 0x0, rs_filterstr = {bv_len = 0, bv_val = 0x0}},
  oq_abandon = {rs_msgid = 0}, oq_cancel = {rs_msgid = 0},
  oq_extended = {rs_reqoid = {bv_len = 0, bv_val = 0x0}, rs_flags = 0, rs_reqdata = 0x0},
  oq_pwdexop = {rs_extended = {rs_reqoid = {bv_len = 0, bv_val = 0x0}, rs_flags = 0, rs_reqdata = 0x0}, rs_old = {bv_len = 0, bv_val = 0x0}, rs_new = {bv_len = 0, bv_val = 0x0}, rs_mods = 0x0, rs_modtail = 0x0}},
  o_abandon = 0, o_cancel = 0, o_groups = 0x0, o_do_not_cache = 0 '\000', o_is_auth_check = 0 '\000', o_dont_replicate = 0 '\000', o_acl_priv = ACL_NONE, o_nocaching = 0 '\000', o_delete_glue_parent = 0 '\000',
  o_no_schema_check = 0 '\000', o_no_subordinate_glue = 0 '\000', o_ctrlflag = '\000' <repeats 31 times>, o_controls = 0x7fb7d8839058,
  o_authz = {sai_method = 0, sai_mech = {bv_len = 0, bv_val = 0x0}, sai_dn = {bv_len = 0, bv_val = 0x0}, sai_ndn = {bv_len = 0, bv_val = 0x0}, sai_ssf = 0, sai_transport_ssf = 0, sai_tls_ssf = 0, sai_sasl_ssf = 0},
  o_ber = 0x0, o_res_ber = 0x0, o_callback = 0x0, o_ctrls = 0x0, o_csn = {bv_len = 0, bv_val = 0x0}, o_private = 0x0, o_extra = {slh_first = 0x0}, o_next = {stqe_next = 0x0}}
(gdb)
Marc
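The gdb session above can also be run non-interactively, which makes it easier to capture the output for a mail or an ITS report. This is only a sketch: the helper name dump_syncprov_state is made up, and the paths to the slapd binary and the core file must be supplied by the caller. "ss" is a local variable of syncprov_matchops, which is why "print *ss" only works after "frame 1" has been selected; that frame number is taken from the backtrace in this thread and may differ in another crash.

```shell
#!/bin/sh
# Hypothetical helper: print the backtrace and the syncprov structures from a core file.
# Usage: dump_syncprov_state /path/to/slapd /path/to/core
dump_syncprov_state() {
    slapd_bin=$1
    core_file=$2
    # Refuse to start gdb if either file is missing or unreadable.
    if [ ! -r "$slapd_bin" ] || [ ! -r "$core_file" ]; then
        echo "usage: dump_syncprov_state SLAPD_BINARY CORE_FILE" >&2
        return 1
    fi
    # -batch exits after the last command; each -ex runs one gdb command in order.
    gdb -batch \
        -ex "bt" \
        -ex "frame 1" \
        -ex "print *ss" \
        -ex "print *ss->s_op" \
        "$slapd_bin" "$core_file"
}
```

Redirecting the output of the function to a file gives the same text as the interactive session, without the "(gdb)" prompts.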
Marc Patermann wrote:
Howard Chu wrote on 15.08.2011 at 23:20:
Marc Patermann wrote:
Why does slapd crash here?
This looks like the same trace as ITS#6892, but that was already patched/fixed in 2.4.26. Need a bit more info from the crash. E.g.
print *ss
print *ss->s_op
This output shows that a consumer making a persistent search abandoned the search while syncprov was still setting up the psearch, and a modification whose target belonged to the psearch occurred at that time.
syncprov of course sets up locks to prevent this kind of crash from happening, but apparently there's still a bug there.
openldap-technical@openldap.org