Re: (ITS#7716) slapd crashes after search immediately followed by (abandon+) unbind
by michael@stroeder.com
michael.vishchers(a)7p-group.com wrote:
> Full_Name: Michael Vishchers
> Version: 2.4.23
> OS: Red Hat Enterprise Linux Server release 6.2
Did you check whether this problem still remains with the recent release 2.4.36?
(I vaguely remember issues like this being fixed after 2.4.23 but I'm not sure.)
Ciao, Michael.
9 years, 11 months
(ITS#7716) slapd crashes after search immediately followed by (abandon+) unbind
by michael.vishchers@7p-group.com
Full_Name: Michael Vishchers
Version: 2.4.23
OS: Red Hat Enterprise Linux Server release 6.2
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (178.15.66.50)
slapd, running as a proxy to rewrite incoming connections based on user dn for
later routing to different back ends, dies sporadically after receiving a
(network delayed) search request that is "immediately" followed by an (optional)
abandon request and an unbind request.
We suspect that the abandon or unbind code tries to clean up data structures
that belong to a not yet completely initialized search operation.
The problem can unfortunately not easily be reproduced. Last time we had to wait
at least two weeks before it appeared. It may be a timing problem between two or
more threads.
This is the stack trace; the core and other files can be provided if necessary.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f299c3e71bc in ?? ()
#0 0x00007f299c3e71bc in ?? ()
No symbol table info available.
#1 0x00007f2999bbd983 in rwm_op_rollback (op=0x7f2984002190, rs=<value
optimized out>, ros=0x7f298c003570) at
../../../../servers/slapd/overlays/rwm.c:107
__PRETTY_FUNCTION__ = "rwm_op_rollback"
#2 0x00007f2999bbe988 in rwm_op_search (op=0x7f2984002190, rs=0x7f2995bafaa0)
at ../../../../servers/slapd/overlays/rwm.c:984
on = 0x7f299fb95210
rwmap = 0x7f299fb94f70
rc = <value optimized out>
dc = {rwmap = 0x7f2984002190, conn = 0x7f298c003468, ctx = 0x12 <Address
0x12 out of bounds>, rs = 0x7f299e7963b0}
fstr = {bv_len = 0, bv_val = 0x0}
f = 0x0
an = 0x0
text = <value optimized out>
roc = 0x7f298c003550
#3 0x00007f299e7fe02a in overlay_op_walk (op=0x7f2984002190, rs=0x7f2995bafaa0,
which=op_search, oi=0x7f299fb95030, on=0x7f299fb95210) at
../../../servers/slapd/backover.c:659
func = 0x7f299fb95268
rc = 32768
#4 0x00007f299e8d29a1 in slapi_op_func (op=0x7f2984002190, rs=0x7f2995bafaa0)
at ../../../../servers/slapd/slapi/slapi_overlay.c:647
pb = 0x7f298c1051b0
which = op_search
opinfo = <value optimized out>
rc = <value optimized out>
oi = <value optimized out>
on = <value optimized out>
cb = {sc_next = 0x7f2995bae7e0, sc_response = 0x7f299e8d1fc0
<slapi_over_response>, sc_cleanup = 0x7f299e8d1ed0 <slapi_over_cleanup>,
sc_private = 0x7f298c1051b0}
internal_op = 0
preop_type = <value optimized out>
postop_type = 503
be = 0x7f2995bae800
#5 0x00007f299e7fe02a in overlay_op_walk (op=0x7f2984002190, rs=0x7f2995bafaa0,
which=op_search, oi=0x7f299fb95030, on=0x7f299fb9e8c0) at
../../../servers/slapd/backover.c:659
func = 0x7f299fb9e918
rc = 32768
#6 0x00007f299e7feb6b in over_op_func (op=0x7f2984002190, rs=<value optimized
out>, which=<value optimized out>) at ../../../servers/slapd/backover.c:721
oi = <value optimized out>
on = <value optimized out>
be = 0x7f299fb940b0
db = {bd_info = 0x7f299fb95210, bd_self = 0x7f299fb940b0, be_ctrls =
"\000", '\001' <repeats 17 times>, '\000' <repeats 14 times>, "\001", be_flags =
257, be_restrictops = 0, be_requires = 5, be_ssf_set = {sss_ssf = 0,
sss_transport = 0, sss_tls = 0, sss_sasl = 0, sss_update_ssf = 0,
sss_update_transport = 0, sss_update_tls = 0, sss_update_sasl = 0,
sss_simple_bind = 0}, be_suffix = 0x7f299fb94ed0, be_nsuffix = 0x7f299fb94f00,
be_schemadn = {bv_len = 0, bv_val = 0x0}, be_schemandn = {bv_len = 0, bv_val =
0x0}, be_rootdn = {bv_len = 0, bv_val = 0x0}, be_rootndn = {bv_len = 0, bv_val =
0x0}, be_rootpw = {bv_len = 0, bv_val = 0x0}, be_max_deref_depth = 15,
be_def_limit = {lms_t_soft = 3600, lms_t_hard = 0, lms_s_soft = 500, lms_s_hard
= 0, lms_s_unchecked = -1, lms_s_pr = 0, lms_s_pr_hide = 0, lms_s_pr_total = 0},
be_limits = 0x0, be_acl = 0x0, be_dfltaccess = ACL_READ, be_update_ndn = {bv_len
= 0, bv_val = 0x0}, be_update_refs = 0x0, be_pending_csn_list = 0x7f299fc738f0,
be_pcl_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0,
__kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000'
<repeats 39 times>, __align = 0}, be_syncinfo = 0x0, be_pb = 0x7f299fb9eaa0,
be_cf_ocs = 0x7f299eb6da00, be_private = 0x7f299fb94240, be_next = {stqe_next =
0x0}}
cb = {sc_next = 0x0, sc_response = 0x7f299e7fdd40 <over_back_response>,
sc_cleanup = 0, sc_private = 0x7f299fb95030}
sc = <value optimized out>
rc = 32768
__PRETTY_FUNCTION__ = "over_op_func"
#7 0x00007f299e794999 in fe_op_search (op=0x7f2984002190, rs=0x7f2995bafaa0) at
../../../servers/slapd/search.c:366
bd = 0x7f299eb72760
#8 0x00007f299e795177 in do_search (op=0x7f2984002190, rs=<value optimized
out>) at ../../../servers/slapd/search.c:217
base = {bv_len = 55, bv_val = 0x7f298c11fef9
"vfsid=491722472236,ou=subscriber,ou=mmo,c=de,o=vodafone"}
siz = 0
off = 0
i = <value optimized out>
#9 0x00007f299e7920f9 in connection_operation (ctx=0x7f2995bafb70,
arg_v=0x7f2984002190) at ../../../servers/slapd/connection.c:1109
rc = 80
cancel = <value optimized out>
op = 0x7f2984002190
rs = {sr_type = REP_RESULT, sr_tag = 0, sr_msgid = 0, sr_err = 80,
sr_matched = 0x0, sr_text = 0x7f2999bc4122 "Rewrite error", sr_ref = 0x0,
sr_ctrls = 0x0, sr_un = {sru_search = {r_entry = 0x0, r_attr_flags = 0,
r_operational_attrs = 0x0, r_attrs = 0x0, r_nentries = 0, r_v2ref = 0x0},
sru_sasl = {r_sasldata = 0x0}, sru_extended = {r_rspoid = 0x0, r_rspdata =
0x0}}, sr_flags = 0}
tag = 99
opidx = SLAP_OP_SEARCH
conn = 0x7f2996db74d0
memctx = 0x7f298c002820
memctx_null = 0x0
memsiz = 1048576
__PRETTY_FUNCTION__ = "connection_operation"
#10 0x00007f299e892678 in ldap_int_thread_pool_wrapper (xpool=0x7f299fae9ae0) at
../../../libraries/libldap_r/tpool.c:685
pool = 0x7f299fae9ae0
task = 0x7f2988000a20
work_list = <value optimized out>
ctx = {ltu_id = 139816582448896, ltu_key = {{ltk_key = 0x7f299e790d50,
ltk_data = 0x7f298c002d40, ltk_free = 0x7f299e790e30 <conn_counter_destroy>},
{ltk_key = 0x7f299e7eaf70, ltk_data = 0x7f298c002820, ltk_free = 0x7f299e7eae50
<slap_sl_mem_destroy>}, {ltk_key = 0x7f299e7a6b70, ltk_data = 0x0, ltk_free =
0x7f299e7a6940 <slap_op_q_destroy>}, {ltk_key = 0x0, ltk_data = 0x0, ltk_free =
0} <repeats 29 times>}}
kctx = <value optimized out>
keyslot = <value optimized out>
hash = <value optimized out>
__PRETTY_FUNCTION__ = "ldap_int_thread_pool_wrapper"
#11 0x00007f299c91e7f1 in ?? ()
No symbol table info available.
#12 0x00007f2995bb0700 in ?? ()
No symbol table info available.
#13 0x0000000000000000 in ?? ()
No symbol table info available.
9 years, 11 months
Re: (ITS#7715) SIGBUS when mdb is configured with writemap
by hyc@symas.com
Željko Nejašmić wrote:
> Here you go http://hastebin.com/fukecejuje.tex
Interestingly enough, I got the same result as you on an initial compile/run
of slapd. Unfortunately, with optimization, the backtrace wasn't all that
useful. Recompiling back-mdb with just -g, no optimization, gets a different
result though - slapd is fine, and ldclt dies with a heap corruption or
double-free.
ldclt -h localhost -p 9011 -D cn=manager,dc=example,dc=com -w secret \
  -b ou=people,dc=example,dc=com \
  -e object=xx.txt,rdn='uid:[A=INCRNNOLOOP(200000;999999;6)]' \
  -e add,commoncounter -I 68
ldclt version 4.23
ldclt[10503]: Starting at Wed Oct 2 04:13:03 2013
*** glibc detected *** ldclt: double free or corruption (fasttop):
0x00007fa448003270 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7eb96)[0x7fa463f8cb96]
/usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2(ldap_pvt_tls_set_option+0x1eb)[0x7fa46472e06b]
/usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2(ldap_int_tls_config+0x54)[0x7fa46472e234]
/usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2(+0x2b8b7)[0x7fa4647238b7]
/usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2(ldap_int_initialize+0x104)[0x7fa464723e84]
/usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2(ldap_create+0x29)[0x7fa4647074f9]
/usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2(ldap_initialize+0x2f)[0x7fa464707a7f]
ldclt(+0x664a)[0x7fa4656fb64a]
ldclt(+0x74a4)[0x7fa4656fc4a4]
ldclt(+0x9d48)[0x7fa4656fed48]
ldclt(threadMain+0x329)[0x7fa4657085d9]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7fa4642d4e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fa464001cbd]
All subsequent runs give the same result. Still looking into this; getting
away from the Ubuntu bundled libldap will probably help.
>
>
> Zeljko
>
>
> On Wed, Oct 2, 2013 at 11:34 AM, <hyc(a)symas.com> wrote:
>
> nejasmicz(a)gmail.com wrote:
> > Full_Name: Željko Nejašmić
> > Version: latest git pull of OPENLDAP_REL_ENG_2_4
> > OS: RedHat 6.3
> > URL:
> > Submission from: (NULL) (213.147.123.33)
> >
>
> > Hardware underneath all of that is:
> > 1) HP ProLiant BL460c Gen8, dual Xeon E5-2658 with attached storage blade
> > D2200sb with SSD raid, 256GB RAM -- RedHat 6.3 tests
> > 2) Intel server blade S2400BB, dual Xeon E5-2403, 48GB RAM -- Ubuntu 12.04
> > tests
> >
> > If anything more is required to assist you in troubleshooting, please let me
> > know.
>
> Can you also post the gdb output for "bt 5 full" ?
> >
> >
> > Zeljko
> >
> >
>
>
> --
> -- Howard Chu
> CTO, Symas Corp. http://www.symas.com
> Director, Highland Sun http://highlandsun.com/hyc/
> Chief Architect, OpenLDAP http://www.openldap.org/project/
>
>
>
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
9 years, 11 months
Re: (ITS#7715) SIGBUS when mdb is configured with writemap
by nejasmicz@gmail.com
Here you go http://hastebin.com/fukecejuje.tex
Zeljko
On Wed, Oct 2, 2013 at 11:34 AM, <hyc(a)symas.com> wrote:
> nejasmicz(a)gmail.com wrote:
> > Full_Name: Željko Nejašmić
> > Version: latest git pull of OPENLDAP_REL_ENG_2_4
> > OS: RedHat 6.3
> > URL:
> > Submission from: (NULL) (213.147.123.33)
> >
>
> > Hardware underneath all of that is:
> > 1) HP ProLiant BL460c Gen8, dual Xeon E5-2658 with attached storage blade
> > D2200sb with SSD raid, 256GB RAM -- RedHat 6.3 tests
> > 2) Intel server blade S2400BB, dual Xeon E5-2403, 48GB RAM -- Ubuntu 12.04
> > tests
> >
> > If anything more is required to assist you in troubleshooting, please let me
> > know.
>
> Can you also post the gdb output for "bt 5 full" ?
> >
> >
> > Zeljko
> >
> >
>
>
> --
> -- Howard Chu
> CTO, Symas Corp. http://www.symas.com
> Director, Highland Sun http://highlandsun.com/hyc/
> Chief Architect, OpenLDAP http://www.openldap.org/project/
>
>
>
9 years, 11 months
Re: (ITS#7715) SIGBUS when mdb is configured with writemap
by hyc@symas.com
nejasmicz(a)gmail.com wrote:
> Full_Name: Željko Nejašmić
> Version: latest git pull of OPENLDAP_REL_ENG_2_4
> OS: RedHat 6.3
> URL:
> Submission from: (NULL) (213.147.123.33)
>
> Hardware underneath all of that is:
> 1) HP ProLiant BL460c Gen8, dual Xeon E5-2658 with attached storage blade
> D2200sb with SSD raid, 256GB RAM -- RedHat 6.3 tests
> 2) Intel server blade S2400BB, dual Xeon E5-2403, 48GB RAM -- Ubuntu 12.04
> tests
>
> If anything more is required to assist you in troubleshooting, please let me
> know.
Can you also post the gdb output for "bt 5 full" ?
>
>
> Zeljko
>
>
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
9 years, 11 months
(ITS#7715) SIGBUS when mdb is configured with writemap
by nejasmicz@gmail.com
Full_Name: Željko Nejašmić
Version: latest git pull of OPENLDAP_REL_ENG_2_4
OS: RedHat 6.3
URL:
Submission from: (NULL) (213.147.123.33)
Using the ldclt tool to stress test our OpenLDAP mirror sync setup, I encountered a
SIGBUS. Do note that the same issue also occurs on a single node, without sync.
I've tested using the aforementioned tool and the same arguments on both Red Hat
6.3 (2.6.32-279.el6.x86_64) and Ubuntu Server 12.04 (Linux 3.2.0-54-generic
x86_64) with exactly the same outcome.
In both cases OpenLDAP was compiled from source
(origin/OPENLDAP_REL_ENG_2_4), configured with --disable-{hdb,bdb},
--prefix=/opt/openldap, --enable-local=yes and using mdb as a backend, tweaked
additionally with:
* nometasync
* writemap
Without the writemap tweak, SIGBUS isn't happening.
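For readers unfamiliar with the signal: with writemap, mdb stores directly into a writable memory map of the database file. One general way such a store can raise SIGBUS is when the page being written has no file storage behind it. The following is a minimal, self-contained C sketch of that general mechanism only. It is not liblmdb code and not a diagnosis of this report; the function name and the temporary file location are made up for illustration.

```c
#define _DEFAULT_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf sigbus_jump;

/* Signal handler: jump back to the sigsetjmp() point instead of dying. */
static void on_sigbus(int signo)
{
    (void)signo;
    siglongjmp(sigbus_jump, 1);
}

/*
 * Hypothetical demo function (an assumption for illustration, not liblmdb).
 * Maps one page of a temporary file as shared and writable, truncates the
 * file to zero length, then stores into the page that is no longer backed.
 * Returns 1 if the store raised SIGBUS, 0 if it did not, -1 on setup error.
 */
int store_into_unbacked_page(void)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";  /* illustrative location */
    long page = sysconf(_SC_PAGESIZE);
    struct sigaction sa, old_sa;
    volatile char *map;
    int fd, raised;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                             /* file disappears on close */

    if (page <= 0 || ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }
    map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    /* Take the backing storage away while the mapping is still in place. */
    if (ftruncate(fd, 0) != 0) {
        munmap((void *)map, (size_t)page);
        close(fd);
        return -1;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old_sa);

    if (sigsetjmp(sigbus_jump, 1) == 0) {
        map[0] = 1;                           /* store past end of file */
        raised = 0;
    } else {
        raised = 1;                           /* we arrived via the handler */
    }

    sigaction(SIGBUS, &old_sa, NULL);
    munmap((void *)map, (size_t)page);
    close(fd);
    return raised;
}
```

On Linux the function is expected to report that the store raised SIGBUS. Whether anything comparable happens inside mdb_page_alloc here is an open question that the backtrace alone does not answer.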
The command used was:
ldclt -h 172.17.101.150 -p 389 -D "cn=xxx,dc=xxx" -w "xxx" \
  -b "ds=USERS,o=STANDARD,dc=xxx" \
  -e object=xxx.txt,rdn='uid:[A=INCRNNOLOOP(200000;999999;6)]' \
  -e add,commoncounter -I 68
...where the xxx.txt has the following content:
objectclass: xxxUser
The ldclt command uses 10 threads to do the add operation with the incrementing
uid parameter on the base dn: ds=USERS,o=STANDARD,dc=xxx.
Ulimits are:
ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 2066206
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 4096
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) unlimited
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
At first sight, gdb seems to point to mdb_page_alloc:
Starting program: /opt/openldap/libexec/slapd -h ldap:///\ ldapi:/// -F
/opt/openldap/etc/openldap/slapd.d -g openldap -u openldap -d 0
[Thread debugging using libthread_db enabled]
[New Thread 0x2aaaac764700 (LWP 20415)]
[Thread 0x2aaaac764700 (LWP 20415) exited]
[New Thread 0x2aaaac764700 (LWP 20416)]
[New Thread 0x2ab3ad168700 (LWP 20417)]
[New Thread 0x2ab3ad969700 (LWP 20418)]
[New Thread 0x2ab3ae16a700 (LWP 20419)]
[New Thread 0x2ab3ae96b700 (LWP 20420)]
[New Thread 0x2ab3b8800700 (LWP 20421)]
[New Thread 0x2ab3d4800700 (LWP 20422)]
Program received signal SIGBUS, Bus error.
[Switching to Thread 0x2ab3b8800700 (LWP 20421)]
mdb_page_alloc (mc=<value optimized out>, num=1, mp=0x2ab3b87fd8b8)
at ./../../../libraries/liblmdb/mdb.c:1759
warning: Source file is more recent than executable.
1759 np->mp_pgno = pgno;
And the backtrace is:
#0 mdb_page_alloc (mc=<value optimized out>, num=1, mp=0x2ab3b87fd8b8)
at ./../../../libraries/liblmdb/mdb.c:1759
#1 0x00000000004afb19 in mdb_page_touch (mc=0x2ab3bc1103f0)
at ./../../../libraries/liblmdb/mdb.c:1889
#2 0x00000000004b1c8c in mdb_cursor_touch (mc=0x2ab3bc1103f0)
at ./../../../libraries/liblmdb/mdb.c:5597
#3 0x00000000004b3a85 in mdb_cursor_put (mc=0x2ab3bc1103f0,
key=0x2ab3b87ff000,
data=0x2ab3b87feff0, flags=32) at
./../../../libraries/liblmdb/mdb.c:5727
#4 0x00000000004f8586 in mdb_idl_insert_keys (be=<value optimized out>,
cursor=0x2ab3bc1103f0,
keys=<value optimized out>, id=13) at idl.c:534
#5 0x00000000004f9116 in indexer (op=0x2ab3bc10dbd0, txn=<value optimized
out>,
ai=<value optimized out>, ad=0x88e0e0, atname=0x88dfb8,
vals=0x2ab3bc110120, id=13, opid=1,
mask=4) at index.c:219
#6 0x00000000004f95d1 in index_at_values (op=0x2ab3bc10dbd0,
txn=0x2ab3bc10e2f0,
ad=<value optimized out>, type=0x88df50, tags=0x88e100,
vals=0x2ab3bc110120, id=13, opid=1)
at index.c:337
#7 0x00000000004f9627 in mdb_index_values (op=<value optimized out>,
txn=<value optimized out>,
desc=<value optimized out>, vals=<value optimized out>, id=<value
optimized out>,
opid=<value optimized out>) at index.c:386
#8 0x00000000004f96f9 in mdb_index_entry (op=0x2ab3bc10dbd0,
txn=0x2ab3bc10e2f0, opid=1,
e=0x8c18d8) at index.c:558
#9 0x00000000004ed77e in mdb_add (op=0x2ab3bc10dbd0, rs=0x2ab3b87ff950) at
add.c:359
#10 0x0000000000487ac7 in overlay_op_walk (op=0x2ab3bc10dbd0,
rs=0x2ab3b87ff950, which=op_add,
oi=0x932280, on=0x0) at backover.c:671
#11 0x00000000004884a7 in over_op_func (op=0x2ab3bc10dbd0, rs=<value
optimized out>,
which=<value optimized out>) at backover.c:723
#12 0x00000000004281c0 in fe_op_add (op=0x2ab3bc10dbd0, rs=0x2ab3b87ff950)
at add.c:334
#13 0x0000000000428a16 in do_add (op=0x2ab3bc10dbd0, rs=0x2ab3b87ff950) at
add.c:194
#14 0x0000000000421259 in connection_operation (ctx=0x2ab3b87ffab0,
arg_v=0x2ab3bc10dbd0)
at connection.c:1155
#15 0x0000000000421a35 in connection_read_thread (ctx=0x2ab3b87ffab0,
argv=<value optimized out>)
at connection.c:1291
#16 0x0000000000516380 in ldap_int_thread_pool_wrapper (xpool=0x898160) at
tpool.c:688
#17 0x000000384cc07851 in start_thread () from /lib64/libpthread.so.0
#18 0x000000384c8e767d in clone () from /lib64/libc.so.6
For context, the assembly around the offending pointer dereference looks
like:
0x4af949 <mdb_page_alloc+665> movslq %r13d,%rax
0x4af94c <mdb_page_alloc+668> lea (%rcx,%rax,1),%rax
0x4af950 <mdb_page_alloc+672> mov %rax,0x10(%r14)
0x4af954 <mdb_page_alloc+676> mov %rcx,0x0(%rbp)
Hardware underneath all of that is:
1) HP ProLiant BL460c Gen8, dual Xeon E5-2658 with attached storage blade
D2200sb with SSD raid, 256GB RAM -- RedHat 6.3 tests
2) Intel server blade S2400BB, dual Xeon E5-2403, 48GB RAM -- Ubuntu 12.04
tests
If anything more is required to assist you in troubleshooting, please let me
know.
Zeljko
9 years, 11 months
(ITS#7714) Making slapd easier to jail (enhancement?)
by blance3459@hotmail.com
Full_Name: Barry Lance
Version: 2.4.35
OS: Linux (Debian 7)
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (70.226.37.254)
Operating System: Debian 7.1.0 (Wheezy) 64-bit
Openldap version: 2.4.35
Configure Options: --prefix=/usr/local
--enable-shared
--enable-static
--enable-debug
--enable-dynamic
--enable-syslog
--enable-local
--enable-slapd
--enable-spasswd
--enable-modules
--enable-backends=mod
--disable-ndb
--disable-passwd
--disable-perl
--disable-shell
--disable-sock
--disable-sql
--enable-overlays=mod
--with-threads
--with-cyrus-sasl
--with-tls=openssl
config: slapd.conf from make install
Jail directory: /var/chroot/openldap
Files copied into jail:
/etc/openldap -> <jaildir>/etc/openldap
/etc/nsswitch.conf -> <jaildir>/etc
/etc/pam.d -> <jaildir>/etc/pam.d
/etc/passwd (or fragment of) -> <jaildir>/etc/passwd
/etc/groups (or fragment of) -> <jaildir>/etc/groups
/etc/shadow -> <jaildir>/etc/shadow
all other libs referenced by ldd slapd into respective <jaildir>/dir
/usr/local/libexec/openldap -> <jaildir>/usr/local/libexec/openldap
/lib/x86_64-linux-gnu/libnss_* -> <jaildir>/lib/x86_64-linux-gnu
commandline: /usr/local/libexec/slapd -d -1 -f /etc/openldap/slapd.conf -h
"ldap:/// ldapi:///" -n slapd -r /var/chroot/openldap -u ldap -g ldap
The behavior I have experienced is as follows:
1. Launching slapd without the user (-u), group (-g), and jail dir (-r) options
succeeds. Slapd runs under the current user id (root).
2. Launching slapd with the user and group parameters, but without a jail
directory, succeeds and the root privilege is dropped to the username given.
Takeaway - slapd is able to read passwd and groups outside the jail. This is
definitely expected.
3. Launching slapd with the user, group and jail dir options fails with a
message that no such user exists in passwd.
4. Launching slapd given a jail directory, but no user or group options,
succeeds with the daemon jailed in the jail dir, but running as root (undesirable).
Takeaway - the chroot code works, but passwd/groups cannot be accessed after it (as
seen in (3)). (4) is expected, given that the code in servers/slapd/main.c attempts
to get the real uid/gid and drop root permission only if the -u and -g options
are given on the command line.
Compared with the code for a few other daemons (ntpd, named, isc-dhcp), the
jailing in servers/slapd/main.c follows the chdir/chroot process as expected.
The difference in these other daemons is that, in all cases, the real uid/gid are
retrieved before the chroot code. By doing this, nsswitch.conf, passwd, groups,
etc. are all still available outside the jail. Once the chroot is completed,
they then drop root permission to the non-privileged user given on the command
line.
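The ordering described above (resolve the ids first, then chdir/chroot, then drop root) can be sketched in C as follows. This is a minimal illustration under stated assumptions, not code from slapd, ntpd, named or isc-dhcp. The function name enter_jail_and_drop is hypothetical, and the caller is assumed to have already turned the -u/-g names into numeric ids while the real /etc was still reachable. As a simplification, the sketch keeps only the target gid as the supplementary group list.

```c
#define _DEFAULT_SOURCE
#include <grp.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from any of the daemons named above.
 * The caller must already have resolved uid and gid (for example with
 * getpwnam/getgrnam) BEFORE calling this, while the real /etc is visible.
 * Returns 0 on success, -1 at the first step that fails.
 */
int enter_jail_and_drop(const char *jail_dir, uid_t uid, gid_t gid)
{
    /* 1. Enter the jail; this step needs root. */
    if (chdir(jail_dir) != 0) { perror("chdir"); return -1; }
    if (chroot(".") != 0)     { perror("chroot"); return -1; }

    /* 2. Drop groups first: after the uid is dropped, setgid() would fail. */
    if (setgroups(1, &gid) != 0) { perror("setgroups"); return -1; }
    if (setgid(gid) != 0)        { perror("setgid"); return -1; }

    /* 3. Drop the user id last. No name lookup happens inside the jail. */
    if (setuid(uid) != 0) { perror("setuid"); return -1; }
    return 0;
}
```

Because only numeric ids cross the chroot boundary, nothing in this sequence has to open passwd, group or nsswitch.conf from inside the jail.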
The code in servers/slapd/main.c gets the real uid/gid AFTER the chroot. As
such, the authentication infrastructure (nsswitch.conf, passwd, groups, etc.)
must be duplicated in the jail. In my opinion, this makes jailing slapd more
difficult (inconvenient) than the other daemons mentioned. The jailing code in
slapd may work, but I was unable to make it work as a non-root user. I didn't
find much useful information via Google with respect to
troubleshooting my chroot issue.
To test a few theories, I added some scaffolding code in main.c and user.c to
see where the process was going bad for me. As expected after looking at the
source, the failure was happening in the slap_init_user function of user.c.
More specifically, the call to getpwnam was returning NULL (failure) and causing
the corresponding Debug statement to print the error message I am seeing when
attempting to jail as a non-privileged user. No surprise there.
Not being that familiar with the code in these two files, I am reluctant to
modify too much for fear of introducing unintentional side effects. But in
testing I found a few ways of working around the issue.
Initially, I thought it might be easiest to move the call to slap_init_user
before the chroot code. But then I realized that cannot work, because this
function drops root permission before returning which will then cause the chroot
code to fail.
The first, and simplest, workaround I found was making an initial call
to getpwnam and discarding the result before hitting the chroot code, which seemed
to make the subsequent call in slap_init_user succeed. Weird. I can only
speculate about two possibilities for that. The most reasonable is that the getpwnam
call before the chroot code loads some shared library(ies) into memory that I'm
missing in my jail, allowing the later call after the chroot call to succeed.
The second, and least reasonable, is that the man page for getpwnam states the
returned pointer is to a static passwd struct which, when initialized before the
chroot code, is returned despite the later failure in slap_init_user. Like I
said, least likely. I don't have the knowledge to test either theory, and it
just works despite the wasteful additional call to getpwnam. This workaround
seems to have the least potential for side effects elsewhere in the code.
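The first workaround amounts to one throwaway lookup made while the real root filesystem is still visible. A minimal sketch, assuming glibc: there, getpwnam goes through the Name Service Switch, which loads the libnss_* modules named in nsswitch.conf on first use, so the first theory is at least consistent with how the library behaves. It has not been verified against this particular jail. getpwnam also returns NULL on failure, so a stale static struct would not be handed back, which fits the doubt about the second theory. The helper name below is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not slapd code): perform one passwd lookup before
 * chroot() and discard the entry. The side effect that matters is that the
 * C library has initialised its lookup machinery while the real /etc and
 * /lib are still reachable.
 * Returns 1 if the user was found, 0 otherwise.
 */
int prime_passwd_lookup(const char *user)
{
    struct passwd *pw = getpwnam(user);  /* result deliberately unused */
    return pw != NULL;
}
```

It would be called once in main before the chroot block, for example prime_passwd_lookup(username) with the name given to -u. The later lookup inside the jail still reads the jail's own copy of passwd.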
The second workaround involved moving the code blocks for the getpwnam and
getgrnam calls out of slap_init_user and into main just prior to the chroot
code. I would also then have to drop the root permission just after the chroot
code block based on the got_uid/gid variable values. I didn't dig far enough into
the code to know if there is anywhere else that the code drops root permission.
If so, this may cause one of those unintentional side effects. This also
worked, but seemed like bad coding practice.
My third, and final, workaround I think fits best with the spirit of the
existing code. In this workaround I split up the slap_init_user code in user.c
into four separate functions: slap_init_user, slap_init_group, slap_set_user,
slap_set_group. The slap_init_xxx functions use two variables added to main, of
type uid_t and gid_t, that are passed in by reference. Each function performs
the respective passwd/group call from slap_init_user, stores the result, if
successful, into the byref parameter and returns a got_uid/gid value as
an integer return value indicating whether the uid/gid parameter contains a useful
id. The slap_set_xxx functions perform the actual drop of root permission.
Back in main, the slap_init_user/group functions get called prior to the chroot
code, and the corresponding slap_set_xxx calls are added in the relevant code
block that follows the chroot call if the "got" variables indicate the uid/gid
values are useful. This allows the uid/gid to be looked up prior to chroot,
eliminating the need for security info in the jail; root permission is still
held to chroot, and the dropping of root authority still occurs immediately after
the chroot call.
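The four-way split described above could look roughly like this. Only the four function names come from the description; the signatures, the return convention and the bodies are assumptions made for illustration and are not the reporter's actual code or the existing user.c.

```c
#define _DEFAULT_SOURCE
#include <grp.h>
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Look up a user name. Returns 1 and fills *uid on success, else 0. */
int slap_init_user(const char *user, uid_t *uid)
{
    struct passwd *pw = getpwnam(user);
    if (pw == NULL)
        return 0;
    *uid = pw->pw_uid;
    return 1;
}

/* Look up a group name. Returns 1 and fills *gid on success, else 0. */
int slap_init_group(const char *group, gid_t *gid)
{
    struct group *gr = getgrnam(group);
    if (gr == NULL)
        return 0;
    *gid = gr->gr_gid;
    return 1;
}

/* Drop group privileges. Simplification: keeps gid as the only group. */
int slap_set_group(gid_t gid)
{
    if (setgroups(1, &gid) != 0)
        return -1;
    return setgid(gid);
}

/* Drop user privileges. Must run after slap_set_group(). */
int slap_set_user(uid_t uid)
{
    return setuid(uid);
}

/*
 * Intended call order in main(), following the description above:
 *   got_uid = slap_init_user(username, &uid);      before chroot
 *   got_gid = slap_init_group(groupname, &gid);    before chroot
 *   ... chdir/chroot block ...
 *   if (got_gid) slap_set_group(gid);              after chroot
 *   if (got_uid) slap_set_user(uid);               after chroot
 */
```

The group is dropped before the user because a process that has already given up root can no longer change its group id.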
At the end of the day, I think the third workaround allows the chroot code for
slapd to perform in the same way as the other daemons I mentioned, creating a
more predictable user experience. I certainly can't speak for everyone, but to
me, a predictable user experience is definitely more convenient. In the
same light, I wonder if moving the creation of the pid and args files before the
chroot adds value for the user when dealing with the daemon via an init script.
The init script might be a bit simpler without having to worry about
prepending a jail path to the pid file detected from the config when the daemon
is being run as such. Not a big issue.
If you think the code I used in my workarounds adds value to the project, I will
gladly send it along. But, honestly, it is so rudimentary that I doubt my
copy offers you much value since I'm sure you could duplicate it in less than 30
minutes.
Thanks,
Barry Lance
9 years, 11 months
Re: (ITS#7710) contextCSN values not updated by internal non-replicated ops
by marco.pizzoli@gmail.com
Hi Michael,
yes I was, but I'm quite sure I didn't have any memberOf-operation running
in the meantime.
Marco
On Tue, Oct 1, 2013 at 7:36 PM, Michael Ströder <michael(a)stroeder.com> wrote:
> Marco Pizzoli wrote:
> > Hi, this *could* be also the root cause of a problem I found some times ago
> > joking with a particular OL cluster scenario in which I failed to obtain
> > all entries correctly populated. Please tell me if this could be related.
> > [..]
> > All the members of the cluster were using the contrib/slapo-lastbind
> > overlay.
>
> We're also using slapo-lastbind but deactivating does not make a difference
> when modifying group entries (tested today).
>
> Are you using slapo-memberof at all?
>
> Ciao, Michael.
>
9 years, 11 months
Re: (ITS#7710) contextCSN values not updated by internal non-replicated ops
by michael@stroeder.com
Marco Pizzoli wrote:
> Hi, this *could* be also the root cause of a problem I found some times ago
> joking with a particular OL cluster scenario in which I failed to obtain
> all entries correctly populated. Please tell me if this could be related.
> [..]
> All the members of the cluster were using the contrib/slapo-lastbind
> overlay.
We're also using slapo-lastbind but deactivating does not make a difference
when modifying group entries (tested today).
Are you using slapo-memberof at all?
Ciao, Michael.
9 years, 11 months
Re: (ITS#7710) contextCSN values not updated by internal non-replicated ops
by marco.pizzoli@gmail.com
Hi, this *could* be also the root cause of a problem I found some times ago
joking with a particular OL cluster scenario in which I failed to obtain
all entries correctly populated. Please tell me if this could be related.
Long story short:
4-way multimaster cluster --> cluster A
3-way multimaster cluster --> cluster B
one of the members of cluster A has also configured, as provider, one of
the members of cluster B.
By modifying data on any of the members of cluster A I should be able to
see the modification also on any member of cluster B. Correct?
Well, this was failing sometimes on 1 or 2 members.
I had load balancer health checks continuously polling all members of
both A and B:
- bind
- search
- unbind
All the members of the cluster were using the contrib/slapo-lastbind
overlay, so the internal authTimestamp attribute was populated by an internal
operation.
Could it be that the contextCSN of one node of the cluster was newer than
that of the providers?
I'm not too expert, just trying to be of help by sharing experiences.
Thanks for reading
Marco
9 years, 11 months