https://bugs.openldap.org/show_bug.cgi?id=9421
Issue ID: 9421
Summary: SIGSEGV in the MMR synchro
Product: OpenLDAP
Version: 2.4.56
Hardware: x86_64
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: ---
Component: overlays
Assignee: bugs@openldap.org
Reporter: benjamin.demarteau@liege.be
Target Milestone: ---

We are in the process of migrating from a single outdated node to an up-to-date MMR cluster. As part of this process we write LSC synchronizations from the old server to the new server so that we can keep the old server around. Our preliminary tests show that when LSC hammers the LDAP server using multiple threads while another node is included in the replication, we get segmentation faults with the following backtrace:

#0 0x00007f7f578748ef in __strncasecmp_l_avx () from /lib64/libc.so.6
#1 0x000056094a7ca298 in avl_find (root=0x56094bb28820, data=data@entry=0x7f7e74000cd0, fcmp=fcmp@entry=0x56094a7166a0 <oc_index_name_cmp>) at avl.c:545
#2 0x000056094a716bde in oc_bvfind (ocname=0x7f7e74000cd0) at oc.c:186
#3 oc_bvfind (ocname=ocname@entry=0x7f7e74000cd0) at oc.c:178
#4 0x000056094a70ec5a in objectSubClassMatch (matchp=0x7f7e5fff8c8c, flags=256, syntax=<optimized out>, mr=<optimized out>, value=<optimized out>, assertedValue=0x7f7e74000cd0) at schema_prep.c:214
#5 0x000056094a6e9fb9 in ordered_value_match (match=match@entry=0x7f7e5fff8c8c, ad=0x56094bb184e0, mr=mr@entry=0x56094bb09810, flags=flags@entry=256, v1=v1@entry=0x7f7e5810f470, v2=v2@entry=0x7f7e74000cd0, text=0x7f7e5fff8c90) at value.c:693
#6 0x000056094a6ec44d in test_ava_filter (op=op@entry=0x7f7e5fff90c0, e=e@entry=0x56094bb54a88, ava=0x7f7e74000cc8, type=type@entry=163) at filterentry.c:777
#7 0x000056094a6ecfec in test_filter (op=op@entry=0x7f7e5fff90c0, e=e@entry=0x56094bb54a88, f=f@entry=0x7f7e74000d08) at filterentry.c:88
#8 0x000056094a6ecc81 in test_filter_and (flist=<optimized out>, e=0x56094bb54a88, op=0x7f7e5fff90c0) at filterentry.c:879
#9 test_filter (op=op@entry=0x7f7e5fff90c0, e=0x56094bb54a88, f=<optimized out>) at filterentry.c:118
#10 0x00007f7f5382c58f in syncprov_matchops (op=op@entry=0x7f7e5fff9c80, opc=opc@entry=0x7f7e58001808, saveit=saveit@entry=0) at syncprov.c:1393
#11 0x00007f7f5382e37f in syncprov_op_response (op=0x7f7e5fff9c80, rs=<optimized out>) at syncprov.c:2115
#12 0x000056094a6dcb98 in slap_response_play (op=op@entry=0x7f7e5fff9c80, rs=rs@entry=0x7f7e5fff9c10) at result.c:508
#13 0x000056094a6dd11c in send_ldap_response (op=op@entry=0x7f7e5fff9c80, rs=rs@entry=0x7f7e5fff9c10) at result.c:583
#14 0x000056094a6ddd43 in slap_send_ldap_result (op=0x7f7e5fff9c80, rs=0x7f7e5fff9c10) at result.c:861
#15 0x000056094a7a86fd in mdb_add (op=0x7f7e5fff9c80, rs=0x7f7e5fff9c10) at add.c:435
#16 0x000056094a73cd78 in overlay_op_walk (op=op@entry=0x7f7e5fff9c80, rs=0x7f7e5fff9c10, which=op_add, oi=0x56094bb8a720, on=<optimized out>) at backover.c:677
#17 0x000056094a73ceab in over_op_func (op=0x7f7e5fff9c80, rs=<optimized out>, which=<optimized out>) at backover.c:730
#18 0x00007f7f5361ff6a in accesslog_response (op=<optimized out>, rs=<optimized out>) at accesslog.c:1877
#19 0x000056094a6dcb98 in slap_response_play (op=op@entry=0x7f7e7410fff0, rs=rs@entry=0x7f7e5fffa870) at result.c:508
#20 0x000056094a6dd11c in send_ldap_response (op=op@entry=0x7f7e7410fff0, rs=rs@entry=0x7f7e5fffa870) at result.c:583
#21 0x000056094a6ddd43 in slap_send_ldap_result (op=0x7f7e7410fff0, rs=0x7f7e5fffa870) at result.c:861
#22 0x000056094a7a86fd in mdb_add (op=0x7f7e7410fff0, rs=0x7f7e5fffa870) at add.c:435
#23 0x000056094a73cd78 in overlay_op_walk (op=op@entry=0x7f7e7410fff0, rs=0x7f7e5fffa870, which=op_add, oi=0x56094bb8a900, on=<optimized out>) at backover.c:677
#24 0x000056094a73ceab in over_op_func (op=0x7f7e7410fff0, rs=<optimized out>, which=<optimized out>) at backover.c:730
#25 0x000056094a6d32bd in fe_op_add (op=0x7f7e7410fff0, rs=0x7f7e5fffa870) at add.c:334
#26 0x000056094a6d4139 in do_add (op=0x7f7e7410fff0, rs=0x7f7e5fffa870) at add.c:194
#27 0x000056094a6cbfc0 in connection_operation (ctx=ctx@entry=0x7f7e5fffaab0, arg_v=arg_v@entry=0x7f7e7410fff0) at connection.c:1175
#28 0x000056094a6ccdbe in connection_read_thread (ctx=0x7f7e5fffaab0, argv=0x1a) at connection.c:1311
#29 0x00007f7f5903bead in ldap_int_thread_pool_wrapper (xpool=0x56094bb2a1d0) at tpool.c:696
#30 0x00007f7f57ae414a in start_thread () from /lib64/libpthread.so.0
#31 0x00007f7f57815f23 in clone () from /lib64/libc.so.6

If we take down the second node, we cannot reproduce the segfaults anymore.
Let me know if we can provide more information (we can't provide the core dump since it's full of passwords).