Pierangelo,
Many many thanks
I’ll provide what
requested asap
In the meantime we
thought a while over it and just liked to provide one more clue: this once-in-a-while
problem happens under two conditions: ldap overwhelmed by parallel ssh requests
plus a single ssh long-term transaction
hanging. What we mean is: when a user is requested for a new token to give, or
a new pin, he has to wait until the hard-token-key produces a new software-token
number, and this might happen even a minute later (even more for a
distracted lazy one). Meanwhile on the server side a opendlap sasl_server-step loop is hanging for the
input to arrive and its last round trip might last (hang) one minute or more.
We wonder whether a sort
of ldap garbage-collector thread-service or system garbage-collector system-process
exist and is triggered in order to release mallocated blocks of storage not
referenced by a certain while, and, let’s say, it is triggered only when memory
thresholds got reached. This would explain why connection.c goes to free the DN
block and ends up with a segfault and why this happens once in a while.
Ok please forget my
philosophical meditation.
I’ll provide what is
requested. Besides, what openldap version would you suggest us?
Many Thanks
Francesco Grossi
Da: Pierangelo
Masarati [mailto:ando@sys-net.it]
Inviato: 04 February 2009 17:16
A: Francesco Grossi ITQL
Cc:
openldap-technical@openldap.org
Oggetto: Re: connection.c segfault
with new sasl mechanism
Could you send:
- a stack backtrace of a crash (please keep the binaries and the core, so that
we can ask you to print some of the values, if needed)
- the output obtained by running slapd under valgrind --tool=memcheck
--trace-children=yes --leak-check=full
- in any case, apart from some indications about where to look, you won't get
much help if you work with an outdated version of an obsolete version of
OpenLDAP.
p.
----- "Francesco Grossi ITQL" <f.grossi@itql.it> wrote:
>
>
Dear team
We hope you can help us
in discovering where or how we are wrong.
We are providing a new
sasl mechanism for third party strong authentication and we are in the final
delivery step.
We are current with
openldap 2.3.38.
We used strong
authentication server to carry out the authorization part of the mech and our
backside
(sasl client) is
Openldap 2.3.38.
On the client side we
have pam-ldap+ssh.
sasl libs are cyrus 2.1.22
os is linux red-hat
cent-os 4.7 final
During our functional
and stress tests everything works fine but once in a while when openldap
crashes
and this happens only,
but not always, under a certain condition:
Before explaining such a
condition we would like to establish the meaning of some terms:
1) straight
authentication session: a simple (not interactive) authentication, being either
successfull or not
(right or wrong
passcode)
2) interactive
authentication session: a complex (interactive) authentication with iterated
tcp/ip requests such as
next-token or new-pin
Crash scenario:
one ore more straight
authentication scripts are running in parallel, emulating parallel ssh
authentication sessions
with either right or
wrong passcode;
one interactive
authentication session:
in the error-originating
session the whole user back-and-forth got his way to the authentication server,
that is to say next-token or new-pin procedure comes to a complete end:
user gives its next
token or new pin (twice) and the strong authentication is fully accomplished
and reported by the authentication manager.
What is wrong is that
(by average 2 attempts out of 3) make openldap crash.
Sasl speaking our
server-step mechanism closes fine and we expect its dispose step to be called
by openldap (by the sasl_server_end)
That is not going to
happen being Openldap crashed meanwhile.
Again this happens only
but not always when the stress scripts are running.
Does it smell of thread
concurrency and safety?
Our different use cases
tests run almost always successfully when done individually or in parallel but
manually (real people operating).
Conversely when stress
test are going on the problem arise very often but not always.
We had the Opeldap crash
both on linux and solaris installations though on linux the crash happens some
steps earlier than with solaris.
On linux we have had the
opportunity to rebuild openldap with some debug instruction inside.
what
came out is
connection.c
crashing
at:
op->o_tmpfree(
op->orb_edn.bv_val, op->o_tmpmemctx );
in both successful and
error cases either the values passed and the function address look alike.
op->orb_edn.bv_val
shows the distinguished name of the user just logged in (op->orb_edn.bv_len
is correctly set with its length).
Based on the function
name we guess ldap is about to free the DN allocated storage.
On Solaris9 we had the
segfault symptom but we have some more line on the log, one of which shows a
almost empty DN (!!!) made just of uid=””
GDB has the following
from the linux core dump:
gdb -c core.5785
GNU gdb Red Hat Linux
(6.3.0.0-1.159.el4rh) Copyright 2004 Free Software Foundation, Inc.
GDB is free software,
covered by the GNU General Public License, and you are welcome to change it
and/or distribute copies of it under certain conditions.
Type "show
copying" to see the conditions.
There is absolutely no
warranty for GDB. Type "show warranty" for details.
This GDB was configured
as "i386-redhat-linux-gnu".
Core was generated by
`/usr/sbin/slapd -h ldap:///'.
Program terminated with
signal 11, Segmentation fault.
#0 0x00268934 in
?? ()
THE FOLLOWING CODE IS AN
EXCERPT OF THE FINAL SERVER_MECH_STEP FUNCTION OF THE MECHANISM WHICH RUN
BEHIND THE SASL_SERVER_STEP ONLY WHEN CRASH HAPPENS:
static int
mymech_server_mech_step(void *conn_context __attribute__((unused)),
sasl_server_params_t *params,
const char *clientin,
unsigned clientinlen,
const char **serverout,
unsigned *serveroutlen,
sasl_out_params_t *oparams)
{
server_context_t *text =
(server_context_t *) conn_context;
*serverout = NULL;
*serveroutlen = 0;
oparams->doneflag = 1;
oparams->mech_ssf = 0;
oparams->maxoutbuf = 0;
oparams->encode_context = NULL;
oparams->encode = NULL;
oparams->decode_context = NULL;
oparams->decode = NULL;
oparams->param_version = 0;
oparams->authid = text->authid;
return SASL_OK;
}
SLAPD.LOG
OF THE CRASH CASE (WITH OUR ADDED SYSLOGS):
>
>
Feb 4 12:09:15
yavcentos slapd: SASL [conn=48] Debug MMYMECH====================== SERVER
mymech_server_mech_step END ==
Feb 4 12:09:15
yavcentos mymech: sasl.c label slap_sasl_authorize
Feb 4 12:09:15
yavcentos slapd: SASL proxy authorize [conn=48]: authcid="alfredo@yavcentos"
authzid="alfredo@yavcentos"
Feb 4 12:09:15
yavcentos slapd: conn=48 op=4 BIND authcid="alfredo@yavcentos"
authzid="alfredo@yavcentos"
Feb 4 12:09:15
yavcentos slapd: SASL Authorize [conn=48]: proxy authorization allowed
authzDN=""
Feb 4 12:09:15
yavcentos mymech: sasl.c label1
Feb 4 12:09:15
yavcentos mymech: result.c label2
Feb 4 12:09:15
yavcentos slapd: send_ldap_sasl: err=0 len=-1
Feb 4 12:09:15
yavcentos mymech: result.c label6a
Feb 4 12:09:15
yavcentos mymech: result.c label61
Feb 4 12:09:15
yavcentos mymech: result.c label61a (tra qui)
Feb 4 12:09:15
yavcentos mymech: result.c label61b
Feb 4 12:09:15
yavcentos mymech: result.c label61b2
op->o_callback->sc_response=0x805e1fc op=0x8926cb0 rs=0x17551a0
Feb 4 12:09:15
yavcentos mymech: connection.c label1
Feb 4 12:09:15
yavcentos mymech: connection.c label1a
Feb 4 12:09:15
yavcentos mymech: connection.c label11a
Feb 4 12:09:15
yavcentos mymech: connection.c label111a
Feb 4 12:09:15
yavcentos mymech: connection.c label12a
Feb 4 12:09:15
yavcentos mymech: connection.c label12a op->o_tmpfree=0x809c3e8
op->orb_edn.bv_val=uid=alfredo@yavcentos,ou=
people,dc=my-domain,dc=com
op->orb_edn.bv_len=51 op->o_tmpmemctx=0x8926c10 (slab malloc context)
END
SLAPD.LOG
OF A SUCCESSFUL LOGIN CASE (WITH OUR ADDED
SYSLOGS):
Feb 4 12:13:13
yavcentos mymech: sasl.c label slap_sasl_authorize
Feb 4 12:13:13
yavcentos slapd: SASL proxy authorize [conn=3]:
authcid="alfredo@yavcentos" authzid="alfredo@yavcentos"
Feb 4 12:13:14
yavcentos slapd: conn=3 op=4 BIND authcid="alfredo@yavcentos"
authzid="alfredo@yavcentos"
Feb 4 12:13:14
yavcentos slapd: SASL Authorize [conn=3]: proxy authorization allowed
authzDN=""
Feb 4 12:13:14
yavcentos mymech: sasl.c label1
Feb 4 12:13:14
yavcentos mymech: result.c label2
Feb 4 12:13:14
yavcentos slapd: send_ldap_sasl: err=0 len=-1
Feb 4 12:13:14
yavcentos mymech: result.c label6a
Feb 4 12:13:14
yavcentos mymech: result.c label61
Feb 4 12:13:14
yavcentos mymech: result.c label61a (tra qui)
Feb 4 12:13:14
yavcentos mymech: result.c label61b
Feb 4 12:13:14
yavcentos mymech: result.c label61b2
op->o_callback->sc_response=0x805e1fc op=0x86667b8 rs=0x51271a0
Feb 4 12:13:14
yavcentos mymech: connection.c label1
Feb 4 12:13:14
yavcentos mymech: connection.c label1a
Feb 4 12:13:14
yavcentos mymech: connection.c label11a
Feb 4 12:13:14
yavcentos mymech: connection.c label111a
Feb 4 12:13:14
yavcentos mymech: connection.c label12a
Feb 4 12:13:14
yavcentos mymech: connection.c label12a op->o_tmpfree=0x809c3e8
op->orb_edn.bv_val=uid=alfredo@yavcentos,ou=
people,dc=my-domain,dc=com
op->orb_edn.bv_len=51 op->o_tmpmemctx=0x86660c0 (slab malloc context)
Feb 4 12:13:14
yavcentos mymech: connection.c label13a
Feb 4 12:13:14
yavcentos mymech: connection.c label14a
Feb 4 12:13:14
yavcentos mymech: connection.c label15a
Feb 4 12:13:14
yavcentos mymech: connection.c label151a
Feb 4 12:13:14
yavcentos mymech: connection.c label151b
Feb 4 12:13:14
yavcentos mymech: connection.c label2
Feb 4 12:13:14
yavcentos slapd: conn=3 op=4 BIND
dn="uid=alfredo@yavcentos,ou=people,dc=my-domain,dc=com" mech=MYMECH
sasl_
ssf=0 ssf=0
Feb 4 12:13:14
yavcentos slapd: do_bind: SASL/MYMECH bind: dn="uid=alfredo@yavcentos,ou=people,dc=my-domain,dc=com"
sasl_ss
f=0
Feb 4 12:13:14
yavcentos mymech: connection.c label3
Feb 4 12:13:14
yavcentos mymech: result.c label61c
Feb 4 12:13:14
yavcentos mymech: result.c label61d
Feb 4 12:13:14
yavcentos mymech: result.c label61d2
Feb 4 12:13:14
yavcentos mymech: result.c label61e
Feb 4 12:13:14
yavcentos slapd: send_ldap_response: msgid=5 tag=97 err=0
Feb 4 12:13:14
yavcentos mymech: result.c label1
Feb 4 12:13:14
yavcentos mymech: result.c label11
Feb 4 12:13:14
yavcentos mymech: result.c label12
Feb 4 12:13:14
yavcentos mymech: result.c label13
Feb 4 12:13:14
yavcentos mymech: result.c label14
Feb 4 12:13:14
yavcentos mymech: result.c label14e
Feb 4 12:13:14
yavcentos mymech: result.c label14f
Feb 4 12:13:14
yavcentos mymech: result.c label14g
Feb 4 12:13:14
yavcentos mymech: result.c label14h
Feb 4 12:13:14
yavcentos mymech: result.c label15
Feb 4 12:13:14
yavcentos mymech: result.c label16
Feb 4 12:13:14
yavcentos mymech: result.c label18
Feb 4 12:13:14
yavcentos mymech: result.c label19
Feb 4 12:13:14
yavcentos mymech: result.c label20
Feb 4 12:13:14
yavcentos mymech: result.c label21
Feb 4 12:13:14
yavcentos mymech: result.c label6b e qui2
Feb 4 12:13:14
yavcentos mymech: daemon: activity on 1 descriptor
Feb 4 12:13:15
yavcentos slapd: daemon: activity on:
Feb 4 12:13:15
yavcentos slapd: 13r
Feb 4 12:13:15
yavcentos slapd:
Feb 4 12:13:15
yavcentos slapd: daemon: read active on 13
Feb 4 12:13:15
yavcentos slapd: connection_get(13)
Feb 4 12:13:15
yavcentos slapd: connection_get(13): got connid=3
Feb 4 12:13:15
yavcentos slapd: connection_read(13): checking for input on id=3
Feb 4 12:13:15
yavcentos slapd: daemon: epoll: listen=7 active_threads=0 tvp=NULL
Feb 4 12:13:15
yavcentos slapd: daemon: epoll: listen=8 active_threads=0 tvp=NULL
Feb 4 12:13:15
yavcentos slapd: conn=3 op=4 RESULT tag=97 err=0 text=
Feb 4 12:13:15
yavcentos mymech: sasl.c label4
Feb 4 12:13:15
yavcentos slapd: <== slap_sasl_bind: rc=0
Feb 4 12:13:15
yavcentos mymech: bind.c do_bind label1
Feb 4 12:13:15
yavcentos slapd: do_bind
Feb 4 12:13:15
yavcentos slapd: conn=3 op=5 BIND anonymous mech=implicit ssf=0
Feb 4 12:13:15
yavcentos slapd: >>> dnPrettyNormal: <>
Feb 4 12:13:15
yavcentos slapd: <<< dnPrettyNormal: <>, <>
Feb 4 12:13:15
yavcentos slapd: do_bind: version=3 dn="" method=128
Feb 4 12:13:15
yavcentos slapd: conn=3 op=5 BIND dn="" method=128
Feb 4 12:13:15
yavcentos mymech: sasl.c label slap_sasl_reset
Feb 4 12:13:15
yavcentos slapd: send_ldap_result: conn=3 op=5 p=3
Feb 4 12:13:15
yavcentos slapd: send_ldap_result: err=0 matched="" text=""
Feb 4 12:13:15
yavcentos mymech: result.c label22a
Feb 4 12:13:15
yavcentos mymech: result.c label61
Feb 4 12:13:15
yavcentos mymech: result.c label61a (tra qui)
Feb 4 12:13:15
yavcentos mymech: result.c label61b
Feb 4 12:13:15
yavcentos mymech: result.c label61b2
op->o_callback->sc_response=0x805e1fc op=0x8668158 rs=0x2afc1a0
Feb 4 12:13:15
yavcentos mymech: connection.c label1
Feb 4 12:13:15
yavcentos mymech: connection.c label3
Feb 4 12:13:15
yavcentos mymech: result.c label61c
Feb 4 12:13:15
yavcentos mymech: result.c label61d
Feb 4 12:13:15
yavcentos mymech: result.c label61d2
Feb 4 12:13:15
yavcentos mymech: result.c label61e
Feb 4 12:13:15
yavcentos slapd: send_ldap_response: msgid=6 tag=97 err=0
Feb 4 12:13:15
yavcentos mymech: result.c label1
Feb 4 12:13:15
yavcentos mymech: result.c label11
Feb 4 12:13:15
yavcentos mymech: result.c label12
Feb 4 12:13:15
yavcentos mymech: result.c label13
Feb 4 12:13:15
yavcentos mymech: result.c label14
Feb 4 12:13:15
yavcentos mymech: result.c label14e
Feb 4 12:13:15
yavcentos mymech: result.c label14f
Feb 4 12:13:15
yavcentos mymech: result.c label14g
Feb 4 12:13:15
yavcentos mymech: result.c label14h
Feb 4 12:13:15
yavcentos mymech: result.c label15
Feb 4 12:13:15
yavcentos mymech: result.c label16
Feb 4 12:13:15
yavcentos mymech: result.c label18
Feb 4 12:13:15
yavcentos mymech: result.c label19
Feb 4 12:13:16
yavcentos mymech: result.c label20
Feb 4 12:13:16
yavcentos mymech: result.c label21
Feb 4 12:13:16
yavcentos mymech: result.c label22b
Feb 4 12:13:16
yavcentos mymech: result.c label5 e qui1
Feb 4 12:13:16
yavcentos slapd: conn=3 op=5 RESULT tag=97 err=0 text=
Feb 4 12:13:16
yavcentos slapd: do_bind: v3 anonymous bind
Feb 4 12:13:15
yavcentos mymech: daemon: activity on 1 descriptor
Feb 4 12:13:16
yavcentos slapd: daemon: activity on:
Feb 4 12:13:16
yavcentos slapd: 13r
Feb 4 12:13:16
yavcentos slapd:
Feb 4 12:13:16
yavcentos slapd: daemon: read active on 13
Feb 4 12:13:16
yavcentos slapd: connection_get(13)
Feb 4 12:13:16
yavcentos slapd: connection_get(13): got connid=3
Feb 4 12:13:16
yavcentos slapd: connection_read(13): checking for input on id=3
Feb 4 12:13:16
yavcentos slapd: ber_get_next on fd 13 failed errno=0 (Success)
Feb 4 12:13:16
yavcentos slapd: connection_read(13): input error=-2 id=3, closing.
Feb 4 12:13:16
yavcentos slapd: connection_closing: readying conn=3 sd=13 for close
Feb 4 12:13:16
yavcentos slapd: connection_close: conn=3 sd=-1
Feb 4 12:13:16
yavcentos mymech: sasl.c label slap_sasl_close
Feb 4 12:13:16
yavcentos mymech: sasl.c label slap_sasl_log
Feb 4 12:13:16
yavcentos slapd: SASL [conn=-1]
Debug: MMYMECHSERVER mymech_server_mech_dispose
Feb 4 12:13:16
yavcentos mymech: sasl.c label slap_sasl_log
Feb 4 12:13:16
yavcentos slapd: SASL [conn=-1] Error: MMYMECHSERVER
mymech_server_mech_dispose: start - receives *conn_co
ntext=0x8666a10
*utils=0x8666188
Feb 4 12:13:16
yavcentos mymech: sasl.c label slap_sasl_log
Feb 4 12:13:17
yavcentos slapd: SASL [conn=-1] Error: MMYMECHSERVER
mymech_server_mech_dispose
Feb 4 12:13:17
yavcentos slapd: daemon: removing 13
Feb 4 12:13:17
yavcentos slapd: conn=3 fd=13 closed (connection lost)
Feb 4 12:13:17
yavcentos slapd: daemon: epoll: listen=7 active_threads=0 tvp=NULL
Feb 4 12:13:17
yavcentos slapd: daemon: epoll: listen=8 active_threads=0 tvp=NULL
Feb 4 12:13:17
yavcentos slapd: daemon: activity on 1 descriptor
Feb 4 12:13:17
yavcentos slapd: daemon: activity on:
Feb 4 12:13:17
yavcentos slapd:
Feb 4 12:13:17
yavcentos slapd: daemon: epoll: listen=7 active_threads=0 tvp=NULL
Feb 4 12:13:17
yavcentos slapd: daemon: epoll: listen=8 active_threads=0 tvp=NULL
END
Compile and link options
of mymech:
gcc –I ..... -v -Wall -W
-O2 -MT mymech.lo -MD -MP -MF .deps/mymech.Tpo -c mymech.c -fPIC -DPIC -o
mymech.lo
gcc -shared
mymech.lo mymech _init.lo plugin_common.lo -lpthread -lcrypt -lresolv -lc
-Wl,-soname -Wl,lib mymech.so.2 -o .libs/lib mymech.so.2.0.22
Would you have a clue or
a though for us about some further check to do over either our code or compile
options,
(maybe particularly on
the thread-safe topic)?
Would you suggest some
thread safe programming strategy?
Many many thanks
Francesco Grossi
Ing.
Pierangelo Masarati
OpenLDAP Core Team
SysNet
s.r.l.
Via Dossi, 8 - 27100 Pavia - ITALIA
|
|
Office: |
+39 02 23998309 |
Mobile: |
+39 333 4963172 |
Fax: |
+39 0382 476497 |
Email: |
|
|