openldap-bugs October 2013

openldap-bugs@openldap.org

25 participants
95 discussions

Re: (ITS#7716) slapd crashes after search immediately followed by (abandon+) unbind
by michael＠stroeder.com 02 Oct '13

02 Oct '13

michael.vishchers(a)7p-group.com wrote:
> Full_Name: Michael Vishchers
> Version: 2.4.23
> OS: Red Hat Enterprise Linux Server release 6.2

Did you check whether this problem still remains with the recent release 2.4.36?
(I vaguely remember issues like this being fixed after 2.4.23 but I'm not sure.)

Ciao, Michael.


(ITS#7716) slapd crashes after search immediately followed by (abandon+) unbind
by michael.vishchers＠7p-group.com 02 Oct '13

02 Oct '13

Full_Name: Michael Vishchers
Version: 2.4.23
OS: Red Hat Enterprise Linux Server release 6.2
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (178.15.66.50)

slapd, running as a proxy to rewrite incoming connections based on user dn for later routing to different back ends, dies sporadically after receiving a (network delayed) search request that is "immediately" followed by an (optional) abandon request and an unbind request.

We suspect that the abandon or unbind code tries to clean up data structures that belong to a not yet completely initialized search operation.

The problem can unfortunately not easily be reproduced. Last time we had to wait at least two weeks before it appeared. It may be a timing problem between two or more threads.

This is the stacktrace, core and other files could be provided if necessary.

Program terminated with signal 11, Segmentation fault.
#0  0x00007f299c3e71bc in ?? ()
#0  0x00007f299c3e71bc in ?? ()
No symbol table info available.
#1  0x00007f2999bbd983 in rwm_op_rollback (op=0x7f2984002190, rs=<value optimized out>, ros=0x7f298c003570) at ../../../../servers/slapd/overlays/rwm.c:107
        __PRETTY_FUNCTION__ = "rwm_op_rollback"
#2  0x00007f2999bbe988 in rwm_op_search (op=0x7f2984002190, rs=0x7f2995bafaa0) at ../../../../servers/slapd/overlays/rwm.c:984
        on = 0x7f299fb95210
        rwmap = 0x7f299fb94f70
        rc = <value optimized out>
        dc = {rwmap = 0x7f2984002190, conn = 0x7f298c003468, ctx = 0x12 <Address 0x12 out of bounds>, rs = 0x7f299e7963b0}
        fstr = {bv_len = 0, bv_val = 0x0}
        f = 0x0
        an = 0x0
        text = <value optimized out>
        roc = 0x7f298c003550
#3  0x00007f299e7fe02a in overlay_op_walk (op=0x7f2984002190, rs=0x7f2995bafaa0, which=op_search, oi=0x7f299fb95030, on=0x7f299fb95210) at ../../../servers/slapd/backover.c:659
        func = 0x7f299fb95268
        rc = 32768
#4  0x00007f299e8d29a1 in slapi_op_func (op=0x7f2984002190, rs=0x7f2995bafaa0) at ../../../../servers/slapd/slapi/slapi_overlay.c:647
        pb = 0x7f298c1051b0
        which = op_search
        opinfo = <value optimized out>
        rc = <value optimized out>
        oi = <value optimized out>
        on = <value optimized out>
        cb = {sc_next = 0x7f2995bae7e0, sc_response = 0x7f299e8d1fc0 <slapi_over_response>, sc_cleanup = 0x7f299e8d1ed0 <slapi_over_cleanup>, sc_private = 0x7f298c1051b0}
        internal_op = 0
        preop_type = <value optimized out>
        postop_type = 503
        be = 0x7f2995bae800
#5  0x00007f299e7fe02a in overlay_op_walk (op=0x7f2984002190, rs=0x7f2995bafaa0, which=op_search, oi=0x7f299fb95030, on=0x7f299fb9e8c0) at ../../../servers/slapd/backover.c:659
        func = 0x7f299fb9e918
        rc = 32768
#6  0x00007f299e7feb6b in over_op_func (op=0x7f2984002190, rs=<value optimized out>, which=<value optimized out>) at ../../../servers/slapd/backover.c:721
        oi = <value optimized out>
        on = <value optimized out>
        be = 0x7f299fb940b0
        db = {bd_info = 0x7f299fb95210, bd_self = 0x7f299fb940b0, be_ctrls = "\000", '\001' <repeats 17 times>, '\000' <repeats 14 times>, "\001", be_flags = 257, be_restrictops = 0, be_requires = 5, be_ssf_set = {sss_ssf = 0, sss_transport = 0, sss_tls = 0, sss_sasl = 0, sss_update_ssf = 0, sss_update_transport = 0, sss_update_tls = 0, sss_update_sasl = 0, sss_simple_bind = 0}, be_suffix = 0x7f299fb94ed0, be_nsuffix = 0x7f299fb94f00, be_schemadn = {bv_len = 0, bv_val = 0x0}, be_schemandn = {bv_len = 0, bv_val = 0x0}, be_rootdn = {bv_len = 0, bv_val = 0x0}, be_rootndn = {bv_len = 0, bv_val = 0x0}, be_rootpw = {bv_len = 0, bv_val = 0x0}, be_max_deref_depth = 15, be_def_limit = {lms_t_soft = 3600, lms_t_hard = 0, lms_s_soft = 500, lms_s_hard = 0, lms_s_unchecked = -1, lms_s_pr = 0, lms_s_pr_hide = 0, lms_s_pr_total = 0}, be_limits = 0x0, be_acl = 0x0, be_dfltaccess = ACL_READ, be_update_ndn = {bv_len = 0, bv_val = 0x0}, be_update_refs = 0x0, be_pending_csn_list = 0x7f299fc738f0, be_pcl_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, be_syncinfo = 0x0, be_pb = 0x7f299fb9eaa0, be_cf_ocs = 0x7f299eb6da00, be_private = 0x7f299fb94240, be_next = {stqe_next = 0x0}}
        cb = {sc_next = 0x0, sc_response = 0x7f299e7fdd40 <over_back_response>, sc_cleanup = 0, sc_private = 0x7f299fb95030}
        sc = <value optimized out>
        rc = 32768
        __PRETTY_FUNCTION__ = "over_op_func"
#7  0x00007f299e794999 in fe_op_search (op=0x7f2984002190, rs=0x7f2995bafaa0) at ../../../servers/slapd/search.c:366
        bd = 0x7f299eb72760
#8  0x00007f299e795177 in do_search (op=0x7f2984002190, rs=<value optimized out>) at ../../../servers/slapd/search.c:217
        base = {bv_len = 55, bv_val = 0x7f298c11fef9 "vfsid=491722472236,ou=subscriber,ou=mmo,c=de,o=vodafone"}
        siz = 0
        off = 0
        i = <value optimized out>
#9  0x00007f299e7920f9 in connection_operation (ctx=0x7f2995bafb70, arg_v=0x7f2984002190) at ../../../servers/slapd/connection.c:1109
        rc = 80
        cancel = <value optimized out>
        op = 0x7f2984002190
        rs = {sr_type = REP_RESULT, sr_tag = 0, sr_msgid = 0, sr_err = 80, sr_matched = 0x0, sr_text = 0x7f2999bc4122 "Rewrite error", sr_ref = 0x0, sr_ctrls = 0x0, sr_un = {sru_search = {r_entry = 0x0, r_attr_flags = 0, r_operational_attrs = 0x0, r_attrs = 0x0, r_nentries = 0, r_v2ref = 0x0}, sru_sasl = {r_sasldata = 0x0}, sru_extended = {r_rspoid = 0x0, r_rspdata = 0x0}}, sr_flags = 0}
        tag = 99
        opidx = SLAP_OP_SEARCH
        conn = 0x7f2996db74d0
        memctx = 0x7f298c002820
        memctx_null = 0x0
        memsiz = 1048576
        __PRETTY_FUNCTION__ = "connection_operation"
#10 0x00007f299e892678 in ldap_int_thread_pool_wrapper (xpool=0x7f299fae9ae0) at ../../../libraries/libldap_r/tpool.c:685
        pool = 0x7f299fae9ae0
        task = 0x7f2988000a20
        work_list = <value optimized out>
        ctx = {ltu_id = 139816582448896, ltu_key = {{ltk_key = 0x7f299e790d50, ltk_data = 0x7f298c002d40, ltk_free = 0x7f299e790e30 <conn_counter_destroy>}, {ltk_key = 0x7f299e7eaf70, ltk_data = 0x7f298c002820, ltk_free = 0x7f299e7eae50 <slap_sl_mem_destroy>}, {ltk_key = 0x7f299e7a6b70, ltk_data = 0x0, ltk_free = 0x7f299e7a6940 <slap_op_q_destroy>}, {ltk_key = 0x0, ltk_data = 0x0, ltk_free = 0} <repeats 29 times>}}
        kctx = <value optimized out>
        keyslot = <value optimized out>
        hash = <value optimized out>
        __PRETTY_FUNCTION__ = "ldap_int_thread_pool_wrapper"
#11 0x00007f299c91e7f1 in ?? ()
No symbol table info available.
#12 0x00007f2995bb0700 in ?? ()
No symbol table info available.
#13 0x0000000000000000 in ?? ()
No symbol table info available.


Re: (ITS#7715) SIGBUS when mdb is configured with writemap
by hyc＠symas.com 02 Oct '13

02 Oct '13

Željko Nejašmić wrote:
> Here you go http://hastebin.com/fukecejuje.tex

Interestingly enough, I got the same result as you on an initial compile/run of slapd. Unfortunately, with optimization, the backtrace wasn't all that useful. Recompiling back-mdb with just -g, no optimization, gets a different result though - slapd is fine, and ldclt dies with a heap corruption or double-free.

ldclt -h localhost -p 9011 -D cn=manager,dc=example,dc=com -w secret -b ou=people,dc=example,dc=com -e object=xx.txt,rdn='uid:[A=INCRNNOLOOP(200000;999999;6)]' -e add,commoncounter -I 68
ldclt version 4.23
ldclt[10503]: Starting at Wed Oct 2 04:13:03 2013
*** glibc detected *** ldclt: double free or corruption (fasttop): 0x00007fa448003270 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7eb96)[0x7fa463f8cb96]
/usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2(ldap_pvt_tls_set_option+0x1eb)[0x7fa46472e06b]
/usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2(ldap_int_tls_config+0x54)[0x7fa46472e234]
/usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2(+0x2b8b7)[0x7fa4647238b7]
/usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2(ldap_int_initialize+0x104)[0x7fa464723e84]
/usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2(ldap_create+0x29)[0x7fa4647074f9]
/usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2(ldap_initialize+0x2f)[0x7fa464707a7f]
ldclt(+0x664a)[0x7fa4656fb64a]
ldclt(+0x74a4)[0x7fa4656fc4a4]
ldclt(+0x9d48)[0x7fa4656fed48]
ldclt(threadMain+0x329)[0x7fa4657085d9]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7fa4642d4e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fa464001cbd]

All subsequent runs give the same result. Still looking into this; getting away from the Ubuntu bundled libldap will probably help.

> Zeljko
>
> On Wed, Oct 2, 2013 at 11:34 AM, <hyc(a)symas.com> wrote:
>
> > nejasmicz(a)gmail.com wrote:
> > > Full_Name: Željko Nejašmić
> > > Version: latest git pull of OPENLDAP_REL_ENG_2_4
> > > OS: RedHat 6.3
> > > URL:
> > > Submission from: (NULL) (213.147.123.33)
> > >
> > > Hardware underneath all of that is:
> > > 1) HP ProLiant BL460c Gen8, dual Xeon E5-2658 with attached storage blade D2200sb with SSD raid, 256GB RAM -- RedHat 6.3 tests
> > > 2) Intel server blade S2400BB, dual Xeon E5-2403, 48GB RAM -- Ubuntu 12.04 tests
> > >
> > > If anything more is required to assist you in troubleshooting, please let me know.
> >
> > Can you also post the gdb output for "bt 5 full" ?
> >
> > > Zeljko
> >
> > --
> > -- Howard Chu
> > CTO, Symas Corp. http://www.symas.com
> > Director, Highland Sun http://highlandsun.com/hyc/
> > Chief Architect, OpenLDAP http://www.openldap.org/project/

--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/


Re: (ITS#7715) SIGBUS when mdb is configured with writemap
by nejasmicz＠gmail.com 02 Oct '13

02 Oct '13

Here you go http://hastebin.com/fukecejuje.tex

Zeljko

On Wed, Oct 2, 2013 at 11:34 AM, <hyc(a)symas.com> wrote:
> nejasmicz(a)gmail.com wrote:
> > Full_Name: Željko Nejašmić
> > Version: latest git pull of OPENLDAP_REL_ENG_2_4
> > OS: RedHat 6.3
> > URL:
> > Submission from: (NULL) (213.147.123.33)
> >
> > Hardware underneath all of that is:
> > 1) HP ProLiant BL460c Gen8, dual Xeon E5-2658 with attached storage blade D2200sb with SSD raid, 256GB RAM -- RedHat 6.3 tests
> > 2) Intel server blade S2400BB, dual Xeon E5-2403, 48GB RAM -- Ubuntu 12.04 tests
> >
> > If anything more is required to assist you in troubleshooting, please let me know.
>
> Can you also post the gdb output for "bt 5 full" ?
>
> > Zeljko
>
> --
> -- Howard Chu
> CTO, Symas Corp. http://www.symas.com
> Director, Highland Sun http://highlandsun.com/hyc/
> Chief Architect, OpenLDAP http://www.openldap.org/project/


Re: (ITS#7715) SIGBUS when mdb is configured with writemap
by hyc＠symas.com 02 Oct '13

02 Oct '13

nejasmicz(a)gmail.com wrote:
> Full_Name: Željko Nejašmić
> Version: latest git pull of OPENLDAP_REL_ENG_2_4
> OS: RedHat 6.3
> URL:
> Submission from: (NULL) (213.147.123.33)
>
> Hardware underneath all of that is:
> 1) HP ProLiant BL460c Gen8, dual Xeon E5-2658 with attached storage blade D2200sb with SSD raid, 256GB RAM -- RedHat 6.3 tests
> 2) Intel server blade S2400BB, dual Xeon E5-2403, 48GB RAM -- Ubuntu 12.04 tests
>
> If anything more is required to assist you in troubleshooting, please let me know.

Can you also post the gdb output for "bt 5 full" ?

> Zeljko

--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/


(ITS#7715) SIGBUS when mdb is configured with writemap
by nejasmicz＠gmail.com 02 Oct '13

02 Oct '13

Full_Name: Željko Nejašmić
Version: latest git pull of OPENLDAP_REL_ENG_2_4
OS: RedHat 6.3
URL:
Submission from: (NULL) (213.147.123.33)

Using ldclt tool to stress test our OpenLDAP mirror sync setup I encountered a SIGBUS. Do note that the same issue occurs on only one node too, without sync.

I've tested using the aforementioned tool and the same arguments on both Red Hat 6.3 (2.6.32-279.el6.x86_64) and Ubuntu Server 12.04 (Linux 3.2.0-54-generic x86_64) with the exact outcome.

In both cases the OpenLDAP was compiled from sources (origin/OPENLDAP_REL_ENG_2_4), configured with --disable-{hdb,bdb}, --prefix=/opt/openldap, --enable-local=yes and using mdb as a backend, tweaked additionally with:
* nometasync
* writemap

Without the writemap tweak, SIGBUS isn't happening.

The command used was:

ldclt -h 172.17.101.150 -p 389 -D "cn=xxx,dc=xxx" -w "xxx" -b "ds=USERS,o=STANDARD,dc=xxx" \
    -e object=xxx.txt,rdn='uid:[A=INCRNNOLOOP(200000;999999;6)]' -e add,commoncounter -I 68

...where the xxx.txt has the following content:

objectclass: xxxUser

The ldclt command uses 10 threads to do the add operation with the incrementing uid parameter on the base dn: ds=USERS,o=STANDARD,dc=xxx.

Ulimits are:

ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 2066206
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 4096
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) unlimited
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

At first sight, gdb seems to point to mdb_page_alloc:

Starting program: /opt/openldap/libexec/slapd -h ldap:///\ ldapi:/// -F /opt/openldap/etc/openldap/slapd.d -g openldap -u openldap -d 0
[Thread debugging using libthread_db enabled]
[New Thread 0x2aaaac764700 (LWP 20415)]
[Thread 0x2aaaac764700 (LWP 20415) exited]
[New Thread 0x2aaaac764700 (LWP 20416)]
[New Thread 0x2ab3ad168700 (LWP 20417)]
[New Thread 0x2ab3ad969700 (LWP 20418)]
[New Thread 0x2ab3ae16a700 (LWP 20419)]
[New Thread 0x2ab3ae96b700 (LWP 20420)]
[New Thread 0x2ab3b8800700 (LWP 20421)]
[New Thread 0x2ab3d4800700 (LWP 20422)]

Program received signal SIGBUS, Bus error.
[Switching to Thread 0x2ab3b8800700 (LWP 20421)]
mdb_page_alloc (mc=<value optimized out>, num=1, mp=0x2ab3b87fd8b8) at ./../../../libraries/liblmdb/mdb.c:1759
warning: Source file is more recent than executable.
1759 np->mp_pgno = pgno;

And the backtrace is:

#0  mdb_page_alloc (mc=<value optimized out>, num=1, mp=0x2ab3b87fd8b8) at ./../../../libraries/liblmdb/mdb.c:1759
#1  0x00000000004afb19 in mdb_page_touch (mc=0x2ab3bc1103f0) at ./../../../libraries/liblmdb/mdb.c:1889
#2  0x00000000004b1c8c in mdb_cursor_touch (mc=0x2ab3bc1103f0) at ./../../../libraries/liblmdb/mdb.c:5597
#3  0x00000000004b3a85 in mdb_cursor_put (mc=0x2ab3bc1103f0, key=0x2ab3b87ff000, data=0x2ab3b87feff0, flags=32) at ./../../../libraries/liblmdb/mdb.c:5727
#4  0x00000000004f8586 in mdb_idl_insert_keys (be=<value optimized out>, cursor=0x2ab3bc1103f0, keys=<value optimized out>, id=13) at idl.c:534
#5  0x00000000004f9116 in indexer (op=0x2ab3bc10dbd0, txn=<value optimized out>, ai=<value optimized out>, ad=0x88e0e0, atname=0x88dfb8, vals=0x2ab3bc110120, id=13, opid=1, mask=4) at index.c:219
#6  0x00000000004f95d1 in index_at_values (op=0x2ab3bc10dbd0, txn=0x2ab3bc10e2f0, ad=<value optimized out>, type=0x88df50, tags=0x88e100, vals=0x2ab3bc110120, id=13, opid=1) at index.c:337
#7  0x00000000004f9627 in mdb_index_values (op=<value optimized out>, txn=<value optimized out>, desc=<value optimized out>, vals=<value optimized out>, id=<value optimized out>, opid=<value optimized out>) at index.c:386
#8  0x00000000004f96f9 in mdb_index_entry (op=0x2ab3bc10dbd0, txn=0x2ab3bc10e2f0, opid=1, e=0x8c18d8) at index.c:558
#9  0x00000000004ed77e in mdb_add (op=0x2ab3bc10dbd0, rs=0x2ab3b87ff950) at add.c:359
#10 0x0000000000487ac7 in overlay_op_walk (op=0x2ab3bc10dbd0, rs=0x2ab3b87ff950, which=op_add, oi=0x932280, on=0x0) at backover.c:671
#11 0x00000000004884a7 in over_op_func (op=0x2ab3bc10dbd0, rs=<value optimized out>, which=<value optimized out>) at backover.c:723
#12 0x00000000004281c0 in fe_op_add (op=0x2ab3bc10dbd0, rs=0x2ab3b87ff950) at add.c:334
#13 0x0000000000428a16 in do_add (op=0x2ab3bc10dbd0, rs=0x2ab3b87ff950) at add.c:194
#14 0x0000000000421259 in connection_operation (ctx=0x2ab3b87ffab0, arg_v=0x2ab3bc10dbd0) at connection.c:1155
#15 0x0000000000421a35 in connection_read_thread (ctx=0x2ab3b87ffab0, argv=<value optimized out>) at connection.c:1291
#16 0x0000000000516380 in ldap_int_thread_pool_wrapper (xpool=0x898160) at tpool.c:688
#17 0x000000384cc07851 in start_thread () from /lib64/libpthread.so.0
#18 0x000000384c8e767d in clone () from /lib64/libc.so.6

For context, the assembly land around the offending pointer dereferencing looks like:

0x4af949 <mdb_page_alloc+665> movslq %r13d,%rax
0x4af94c <mdb_page_alloc+668> lea (%rcx,%rax,1),%rax
0x4af950 <mdb_page_alloc+672> mov %rax,0x10(%r14)
0x4af954 <mdb_page_alloc+676> mov %rcx,0x0(%rbp)

Hardware underneath all of that is:
1) HP ProLiant BL460c Gen8, dual Xeon E5-2658 with attached storage blade D2200sb with SSD raid, 256GB RAM -- RedHat 6.3 tests
2) Intel server blade S2400BB, dual Xeon E5-2403, 48GB RAM -- Ubuntu 12.04 tests

If anything more is required to assist you in troubleshooting, please let me know.

Zeljko


(ITS#7714) Making slapd easier to jail (enhancement?)
by blance3459＠hotmail.com 01 Oct '13

01 Oct '13

Full_Name: Barry Lance
Version: 2.4.35
OS: Linux (Debian 7)
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (70.226.37.254)

Operating System: Debian 7.1.0 (Wheezy) 64-bit
Openldap version: 2.4.35
Configure Options: --prefix=/usr/local --enable-shared --enable-static --enable-debug --enable-dynamic --enable-syslog --enable-local --enable-slapd --enable-spasswd --enable-modules --enable-backends=mod --disable-ndb --disable-passwd --disable-perl --disable-shell --disable-sock --disable-sql --enable-overlays=mod --with-threads --with-cyrus-sasl --with-tls=openssl
config: slapd.conf from make install
Jail directory: /var/chroot/openldap

Files copied into jail:
/etc/openldap -> <jaildir>/etc/openldap
/etc/nsswitch.conf -> <jaildir>/etc
/etc/pam.d -> <jaildir>/etc/pam.d
/etc/passwd (or fragment of) -> <jaildir>/etc/passwd
/etc/groups (or fragment of) -> <jaildir>/etc/groups
/etc/shadow -> <jaildir>/etc/shadow
all other libs referenced by ldd slapd into respective <jaildir>/dir
/usr/local/libexec/openldap -> <jaildir>/usr/local/libexec/openldap
/lib/x86_64-linux-gnu/libnss_* -> <jaildir>/lib/x86_64-linux-gnu

commandline: /usr/local/libexec/slapd -d -1 -f /etc/openldap/slapd.conf -h "ldap:/// ldapi:///" -n slapd -r /var/chroot/openldap -u ldap -g ldap

The behavior I have experienced is as follows:

1. Launching slapd without the user (-u), group (-g), and jail dir (-r) options is successful. Slapd runs under the current user id (root).

2. Launching slapd with user and group parameters, but without a jail directory, is successful and the root privilege is dropped to the username given.
Takeaway - slapd is able to read passwd and groups outside the jail. This is definitely expected.

3. Launching slapd with user, group and jail dir options fails with a message that no such user exists in passwd.

4. Launching slapd with a jail directory, but no user or group options, succeeds with the daemon jailed in the jail dir, but running as root (undesirable).
Takeaway - the chroot code works, but passwd/groups cannot be accessed after it (as seen in (3)). (4) is expected, given that the code in servers/slapd/main.c attempts to get the real uid/gid and drop root permission only if the -u and -g options are given on the command line.

Comparing servers/slapd/main.c to the code for a few other daemons (ntpd, named, isc-dhcp), the jailing process follows the chdir/chroot process as expected. The difference in these other daemons is that, in all cases, the real uid/gid are retrieved before the chroot code. By doing this, nsswitch.conf, passwd, groups, etc. are all still available outside the jail. Once the chroot is completed, they then drop root permission to the non-privileged user given on the command line. The code in servers/slapd/main.c gets the real uid/gid AFTER the chroot. As such, the authentication infrastructure (nsswitch.conf, passwd, groups, etc.) must be duplicated in the jail. In my opinion, this makes jailing slapd more difficult (inconvenient) than the other daemons mentioned.

The jailing code in slapd may work, but I was unable to make it go as a non-root user. I didn't find much useful information via Google with respect to troubleshooting my chroot issue. To test a few theories, I added some scaffolding code in main.c and user.c to see where the process was going bad for me. As expected after looking at the source, the failure was happening in the slap_init_user function of user.c. More specifically, the call to getpwnam was returning NULL (failure) and causing the corresponding Debug statement to print the error message I am seeing when attempting to jail as a non-privileged user. No surprise there.

Not being that familiar with the code in these two files, I am reluctant to modify too much for fear of introducing unintentional side effects. But in testing I found a few ways of working around the issue. Initially, I thought it might be easiest to move the call to slap_init_user before the chroot code. But then I realized that cannot work, because this function drops root permission before returning, which will then cause the chroot code to fail.

The first, and simplest, workaround I found was that making an initial call to getpwnam and discarding the result before hitting the chroot code seemed to make the subsequent call in slap_init_user succeed. Weird. I can only speculate about two possibilities for that. The most reasonable is that the getpwnam call before the chroot code loads into memory some shared library(ies) that I'm missing in my jail, allowing the later call after the chroot to succeed. The second, and least reasonable, is that the man page for getpwnam states the returned pointer is to a static passwd struct which, when initialized before the chroot code, is returned despite the later failure in slap_init_user. Like I said, least likely. I don't have the knowledge to test either theory and it just works despite the wasteful additional call to getpwnam. This workaround seems to have the least potential for side effects elsewhere in the code.

The second workaround involved moving the code blocks for the getpwnam and getgrnam calls out of slap_init_user and into main just prior to the chroot code. I would also then have to drop the root permission just after the chroot code block based on the got_uid/gid variable values. I didn't dig far enough into the code to know if there is anywhere else that the code drops root permission. If so, this may cause one of those unintentional side effects. This also worked, but seemed like bad coding practice.

My third, and final, workaround I think fits in best with the spirit of the existing code. In this workaround I split up the slap_init_user code in user.c into four separate functions: slap_init_user, slap_init_group, slap_set_user, slap_set_group. The slap_init_xxx functions use two variables added to main, of type uid_t and gid_t, that are passed in by reference. Each function performs the respective passwd/group calls from slap_init_user, stores the result, if successful, into the byref parameter(s), and returns a got_uid/gid value as an integer return value indicating whether the uid/gid parameters contain useful ids. The slap_set_xxx functions perform the actual drop of root permission. Back in main, the slap_init_user/group functions get called prior to the chroot code, and the corresponding slap_set_xxx calls are added in the relevant code block that follows the chroot call if the "got" variables indicate the uid/gid values are useful. This allows the uid/gid to be looked up prior to chroot, eliminating the need for security info in the jail; root permission is still held for the chroot, and the dropping of root authority still occurs immediately after the chroot call.

At the end of the day, I think the third workaround allows the chroot code for slapd to perform in the same way as the other daemons I mentioned, creating a more predictable user experience. I certainly can't speak for everyone, but to me, a predictable user experience is definitely more convenient. In the same light, I wonder if moving the creation of the pid and args files before the chroot adds value for the user when dealing with the daemon via an init script. The init script might be a bit simpler without having to worry about prepending a jail path to the pid file detected from the config when the daemon is being run as such. Not a big issue.

If you think the code I used in my workarounds adds value to the project, I will gladly send it along. But, honestly, it is so rudimentary that I doubt my copy offers you much value since I'm sure you could duplicate it in less than 30 minutes.

Thanks,
Barry Lance
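
The ordering described in the third workaround can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch offered in the report and not slapd's actual code: the helper names lookup_user, lookup_group and enter_jail_and_drop are hypothetical stand-ins for the proposed slap_init_xxx / slap_set_xxx split, and using setgroups() with a single gid is an assumption made here so that no group database lookup is needed once inside the jail.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for chroot() and setgroups() on glibc */
#endif
#include <assert.h>
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: resolve a user name while the real passwd
 * database and NSS modules are still reachable (before chroot).
 * Returns 1 and stores the uid on success, 0 if the name is unknown. */
int lookup_user(const char *name, uid_t *uid_out)
{
    assert(name != NULL && uid_out != NULL);
    struct passwd *pw = getpwnam(name);
    if (pw == NULL)
        return 0;
    *uid_out = pw->pw_uid;
    return 1;
}

/* Hypothetical helper: same idea for a group name. */
int lookup_group(const char *name, gid_t *gid_out)
{
    assert(name != NULL && gid_out != NULL);
    struct group *gr = getgrnam(name);
    if (gr == NULL)
        return 0;
    *gid_out = gr->gr_gid;
    return 1;
}

/* Hypothetical helper: enter the jail while still root, then drop
 * privileges using the ids resolved earlier. Group ids are dropped
 * before the uid, because setgid() fails once root has been given up.
 * Returns 0 on success, -1 on the first failing call. */
int enter_jail_and_drop(const char *jail, int got_uid, uid_t uid,
                        int got_gid, gid_t gid)
{
    if (jail != NULL) {
        if (chdir(jail) != 0 || chroot(jail) != 0 || chdir("/") != 0) {
            perror("chroot");
            return -1;
        }
    }
    if (got_gid) {
        if (setgroups(1, &gid) != 0 || setgid(gid) != 0) {
            perror("setgid");
            return -1;
        }
    }
    if (got_uid && setuid(uid) != 0) {
        perror("setuid");
        return -1;
    }
    return 0;
}
```

A caller would run the two lookups first, keep the returned "got" flags, and only then call enter_jail_and_drop() with the jail path, so that nothing under <jaildir>/etc is needed for the name resolution.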


Re: (ITS#7710) contextCSN values not updated by internal non-replicated ops
by marco.pizzoli＠gmail.com 01 Oct '13

01 Oct '13

Hi Michael,
yes I was, but I'm quite sure I didn't have any memberOf-operation running in the meantime.

Marco

On Tue, Oct 1, 2013 at 7:36 PM, Michael Ströder <michael(a)stroeder.com> wrote:
> Marco Pizzoli wrote:
> > Hi, this *could* be also the root cause of a problem I found some times ago
> > joking with a particular OL cluster scenario in which I failed to obtain
> > all entries correctly populated. Please tell me if this could be related.
> > [..]
> > All the members of the cluster were using the contrib/slapo-lastbind
> > overlay.
>
> We're also using slapo-lastbind but deactivating does not make a difference
> when modifying group entries (tested today).
>
> Are you using slapo-memberof at all?
>
> Ciao, Michael.


Re: (ITS#7710) contextCSN values not updated by internal non-replicated ops
by michael＠stroeder.com 01 Oct '13

01 Oct '13

Marco Pizzoli wrote:
> Hi, this *could* be also the root cause of a problem I found some times ago
> joking with a particular OL cluster scenario in which I failed to obtain
> all entries correctly populated. Please tell me if this could be related.
> [..]
> All the members of the cluster were using the contrib/slapo-lastbind
> overlay.

We're also using slapo-lastbind but deactivating does not make a difference when modifying group entries (tested today).

Are you using slapo-memberof at all?

Ciao, Michael.


Re: (ITS#7710) contextCSN values not updated by internal non-replicated ops
by marco.pizzoli＠gmail.com 01 Oct '13

01 Oct '13

Hi, this *could* be also the root cause of a problem I found some times ago joking with a particular OL cluster scenario in which I failed to obtain all entries correctly populated. Please tell me if this could be related.

Long story short:
4-way multimaster cluster --> cluster A
3-way multimaster cluster --> cluster B

one of the members of cluster A has also configured, as provider, one of the member of cluster B.

By modifying data on any of the members of cluster A I should be able to see the modification also on any member of cluster B. Correct?
Well, this was failing sometimes on 1 or 2 members.

I had load balancer health-checks continuously polling all of members from both A and B:
- bind
- search
- unbind

All the members of the cluster were using the contrib/slapo-lastbind overlay. So the internal authTimestamp attribute populated by an internal operation.
Could it be that the contextCSN of one node of the cluster were newer than the one of the providers?

I'm not too expert, just trying to be of help by sharing experiences.
Thanks for reading
Marco
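
To make the closing question concrete, here is a minimal sketch of what "a consumer's contextCSN is newer than the provider's" could mean when two CSN values are compared as strings. It assumes the OpenLDAP 2.4 CSN text layout `YYYYmmddHHMMSS.uuuuuuZ#cccccc#sss#mmmmmm` (timestamp, change count, server ID, modification number, the last three in fixed-width hex). The function names csn_sid and consumer_is_ahead are hypothetical, and this is not how slapd itself performs the comparison.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Extract the server ID (third '#'-separated field, 3 hex digits)
 * from a CSN string. Returns -1 if the string is malformed. */
int csn_sid(const char *csn)
{
    assert(csn != NULL);
    const char *p = strchr(csn, '#');      /* end of timestamp */
    if (p == NULL)
        return -1;
    p = strchr(p + 1, '#');                /* end of change count */
    if (p == NULL)
        return -1;
    char *end;
    long sid = strtol(p + 1, &end, 16);
    if (end != p + 4 || *end != '#')       /* exactly 3 digits, then '#' */
        return -1;
    return (int)sid;
}

/* Returns 1 if both CSNs carry the same server ID and the consumer's
 * value sorts strictly after the provider's, otherwise 0. Because all
 * fields are fixed width, plain string order matches chronological
 * order for two well-formed CSNs. */
int consumer_is_ahead(const char *provider_csn, const char *consumer_csn)
{
    int ps = csn_sid(provider_csn);
    int cs = csn_sid(consumer_csn);
    if (ps < 0 || cs < 0 || ps != cs)
        return 0;
    return strcmp(consumer_csn, provider_csn) > 0;
}
```

In the scenario described above, the values to compare would be the contextCSN attributes read from the suffix entry on each node, one pair per server ID.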

