Pierangelo,

 

Many many thanks

I’ll provide what requested asap

 

In the meantime we thought a while over it and just liked to provide one more clue: this once-in-a-while problem happens under two conditions: ldap overwhelmed by parallel ssh requests plus a single ssh long-term transaction hanging. What we mean is: when a user is requested for a new token to give, or a new pin, he has to wait until the hard-token-key produces a new software-token number, and this might happen even a minute later (even more for a distracted lazy one). Meanwhile on the server side a opendlap sasl_server-step loop is hanging for the input to arrive and its last round trip might last (hang) one minute or more.

 

We wonder whether a sort of ldap garbage-collector thread-service or system garbage-collector system-process exist and is triggered in order to release mallocated blocks of storage not referenced by a certain while, and, let’s say, it is triggered only when memory thresholds got reached. This would explain why connection.c goes to free the DN block and ends up with a segfault and why this happens once in a while.

 

Ok please forget my philosophical meditation.

 

I’ll provide what is requested. Besides, what openldap version would you suggest us?

 

Many Thanks

Francesco Grossi

 

 

 


Da: Pierangelo Masarati [mailto:ando@sys-net.it]
Inviato: 04 February 2009 17:16
A: Francesco Grossi ITQL
Cc: openldap-technical@openldap.org
Oggetto: Re: connection.c segfault with new sasl mechanism

 

Could you send:

- a stack backtrace of a crash (please keep the binaries and the core, so that we can ask you to print some of the values, if needed)

- the output obtained by running slapd under valgrind --tool=memcheck --trace-children=yes --leak-check=full

- in any case, apart from some indications about where to look, you won't get much help if you work with an outdated version of an obsolete version of OpenLDAP.

p.

----- "Francesco Grossi ITQL" <f.grossi@itql.it> wrote:
>

>

Dear team

 

We hope you can help us in discovering  where or how we are wrong.

 

We are providing a new sasl mechanism for third party strong authentication and we are in the final delivery step.

We are current with openldap 2.3.38.

We used strong authentication server to carry out the authorization part of the mech and our backside

(sasl client) is Openldap 2.3.38.

On the client side we have pam-ldap+ssh.

sasl libs are cyrus 2.1.22

os is linux red-hat cent-os 4.7 final

 

During our functional and stress tests everything works fine but once in a while when openldap crashes

and this happens only, but not always, under a certain condition:

 

Before explaining such a condition we would like to establish the meaning of some terms:

1) straight authentication session: a simple (not interactive) authentication, being either successfull or not

(right or wrong passcode)

2) interactive authentication session: a complex (interactive) authentication with iterated tcp/ip requests such as

next-token or new-pin

 

Crash scenario:

one ore more straight authentication scripts are running in parallel, emulating parallel ssh authentication sessions

with either right or wrong passcode;

one interactive authentication session:

in the error-originating session the whole user back-and-forth got his way to the authentication server, that is to say next-token or new-pin procedure comes to a complete end:

user gives its next token or new pin (twice) and the strong authentication is fully accomplished and reported by the authentication manager.

 

What is wrong is that (by average 2 attempts out of 3) make openldap crash.

Sasl speaking our server-step mechanism closes fine and we expect its dispose step to be called by openldap (by the sasl_server_end)

That is not going to happen being Openldap crashed meanwhile.

 

Again this happens only but not always when the stress scripts are running.

Does it smell of thread concurrency and safety?

Our different use cases tests run almost always successfully when done individually or in parallel but manually (real people operating).

Conversely when stress test are going on the problem arise very often but not always.

 

We had the Opeldap crash both on linux and solaris installations though on linux the crash happens some steps earlier than with solaris.

On linux we have had the opportunity to rebuild openldap with some debug instruction inside.

what came out is

connection.c

crashing at:

op->o_tmpfree( op->orb_edn.bv_val, op->o_tmpmemctx );

in both successful and error cases either the values passed and the function address look alike.

op->orb_edn.bv_val shows the distinguished name of the user just logged in (op->orb_edn.bv_len is correctly set with its length).

Based on the function name we guess ldap is about to free the DN allocated storage.

 

 

On Solaris9 we had the segfault symptom but we have some more line on the log, one of which shows a almost empty DN (!!!) made just of uid=””

 

GDB has the following from the linux core dump:

gdb -c core.5785

GNU gdb Red Hat Linux (6.3.0.0-1.159.el4rh) Copyright 2004 Free Software Foundation, Inc.

GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.

Type "show copying" to see the conditions.

There is absolutely no warranty for GDB.  Type "show warranty" for details.

This GDB was configured as "i386-redhat-linux-gnu".

Core was generated by `/usr/sbin/slapd -h ldap:///'.

Program terminated with signal 11, Segmentation fault.

#0  0x00268934 in ?? ()

 

 

THE FOLLOWING CODE IS AN EXCERPT OF THE FINAL SERVER_MECH_STEP FUNCTION OF THE MECHANISM WHICH RUN BEHIND THE SASL_SERVER_STEP ONLY WHEN CRASH HAPPENS:

 

static int mymech_server_mech_step(void *conn_context __attribute__((unused)),

                                                 sasl_server_params_t *params,

                                                 const char *clientin,

                                                 unsigned clientinlen,

                                                 const char **serverout,

                                                 unsigned *serveroutlen,

                                                 sasl_out_params_t *oparams)

{

            server_context_t *text = (server_context_t *) conn_context;

            *serverout = NULL;

            *serveroutlen = 0;

 

            oparams->doneflag = 1;

            oparams->mech_ssf = 0;

            oparams->maxoutbuf = 0;

            oparams->encode_context = NULL;

            oparams->encode = NULL;

            oparams->decode_context = NULL;

            oparams->decode = NULL;

            oparams->param_version = 0;

            oparams->authid = text->authid;

            return SASL_OK;

 

}

 

 

SLAPD.LOG OF THE CRASH CASE (WITH OUR ADDED SYSLOGS):
>
>

Feb  4 12:09:15 yavcentos slapd: SASL [conn=48] Debug MMYMECH====================== SERVER mymech_server_mech_step END ==

Feb  4 12:09:15 yavcentos mymech: sasl.c label slap_sasl_authorize

Feb  4 12:09:15 yavcentos slapd: SASL proxy authorize [conn=48]: authcid="alfredo@yavcentos" authzid="alfredo@yavcentos"

Feb  4 12:09:15 yavcentos slapd: conn=48 op=4 BIND authcid="alfredo@yavcentos" authzid="alfredo@yavcentos"

Feb  4 12:09:15 yavcentos slapd: SASL Authorize [conn=48]:  proxy authorization allowed authzDN=""

Feb  4 12:09:15 yavcentos mymech: sasl.c label1

Feb  4 12:09:15 yavcentos mymech: result.c label2

Feb  4 12:09:15 yavcentos slapd: send_ldap_sasl: err=0 len=-1

Feb  4 12:09:15 yavcentos mymech: result.c label6a

Feb  4 12:09:15 yavcentos mymech: result.c label61

Feb  4 12:09:15 yavcentos mymech: result.c label61a (tra qui)

Feb  4 12:09:15 yavcentos mymech: result.c label61b

Feb  4 12:09:15 yavcentos mymech: result.c label61b2 op->o_callback->sc_response=0x805e1fc op=0x8926cb0 rs=0x17551a0

Feb  4 12:09:15 yavcentos mymech: connection.c label1

Feb  4 12:09:15 yavcentos mymech: connection.c label1a

Feb  4 12:09:15 yavcentos mymech: connection.c label11a

Feb  4 12:09:15 yavcentos mymech: connection.c label111a

Feb  4 12:09:15 yavcentos mymech: connection.c label12a

Feb  4 12:09:15 yavcentos mymech: connection.c label12a op->o_tmpfree=0x809c3e8 op->orb_edn.bv_val=uid=alfredo@yavcentos,ou=

people,dc=my-domain,dc=com op->orb_edn.bv_len=51 op->o_tmpmemctx=0x8926c10 (slab malloc context)

END

 

SLAPD.LOG OF A SUCCESSFUL LOGIN CASE (WITH OUR ADDED SYSLOGS):

Feb  4 12:13:13 yavcentos mymech: sasl.c label slap_sasl_authorize

Feb  4 12:13:13 yavcentos slapd: SASL proxy authorize [conn=3]: authcid="alfredo@yavcentos" authzid="alfredo@yavcentos"

Feb  4 12:13:14 yavcentos slapd: conn=3 op=4 BIND authcid="alfredo@yavcentos" authzid="alfredo@yavcentos"

Feb  4 12:13:14 yavcentos slapd: SASL Authorize [conn=3]:  proxy authorization allowed authzDN=""

Feb  4 12:13:14 yavcentos mymech: sasl.c label1

Feb  4 12:13:14 yavcentos mymech: result.c label2

Feb  4 12:13:14 yavcentos slapd: send_ldap_sasl: err=0 len=-1

Feb  4 12:13:14 yavcentos mymech: result.c label6a

Feb  4 12:13:14 yavcentos mymech: result.c label61

Feb  4 12:13:14 yavcentos mymech: result.c label61a (tra qui)

Feb  4 12:13:14 yavcentos mymech: result.c label61b

Feb  4 12:13:14 yavcentos mymech: result.c label61b2 op->o_callback->sc_response=0x805e1fc op=0x86667b8 rs=0x51271a0

Feb  4 12:13:14 yavcentos mymech: connection.c label1

Feb  4 12:13:14 yavcentos mymech: connection.c label1a

Feb  4 12:13:14 yavcentos mymech: connection.c label11a

Feb  4 12:13:14 yavcentos mymech: connection.c label111a

Feb  4 12:13:14 yavcentos mymech: connection.c label12a

Feb  4 12:13:14 yavcentos mymech: connection.c label12a op->o_tmpfree=0x809c3e8 op->orb_edn.bv_val=uid=alfredo@yavcentos,ou=

people,dc=my-domain,dc=com op->orb_edn.bv_len=51 op->o_tmpmemctx=0x86660c0 (slab malloc context)

Feb  4 12:13:14 yavcentos mymech: connection.c label13a

Feb  4 12:13:14 yavcentos mymech: connection.c label14a

Feb  4 12:13:14 yavcentos mymech: connection.c label15a

Feb  4 12:13:14 yavcentos mymech: connection.c label151a

Feb  4 12:13:14 yavcentos mymech: connection.c label151b

Feb  4 12:13:14 yavcentos mymech: connection.c label2

Feb  4 12:13:14 yavcentos slapd: conn=3 op=4 BIND dn="uid=alfredo@yavcentos,ou=people,dc=my-domain,dc=com" mech=MYMECH sasl_

ssf=0 ssf=0

Feb  4 12:13:14 yavcentos slapd: do_bind: SASL/MYMECH bind: dn="uid=alfredo@yavcentos,ou=people,dc=my-domain,dc=com" sasl_ss

f=0

Feb  4 12:13:14 yavcentos mymech: connection.c label3

Feb  4 12:13:14 yavcentos mymech: result.c label61c

Feb  4 12:13:14 yavcentos mymech: result.c label61d

Feb  4 12:13:14 yavcentos mymech: result.c label61d2

Feb  4 12:13:14 yavcentos mymech: result.c label61e

Feb  4 12:13:14 yavcentos slapd: send_ldap_response: msgid=5 tag=97 err=0

Feb  4 12:13:14 yavcentos mymech: result.c label1

Feb  4 12:13:14 yavcentos mymech: result.c label11

Feb  4 12:13:14 yavcentos mymech: result.c label12

Feb  4 12:13:14 yavcentos mymech: result.c label13

Feb  4 12:13:14 yavcentos mymech: result.c label14

Feb  4 12:13:14 yavcentos mymech: result.c label14e

Feb  4 12:13:14 yavcentos mymech: result.c label14f

Feb  4 12:13:14 yavcentos mymech: result.c label14g

Feb  4 12:13:14 yavcentos mymech: result.c label14h

Feb  4 12:13:14 yavcentos mymech: result.c label15

Feb  4 12:13:14 yavcentos mymech: result.c label16

Feb  4 12:13:14 yavcentos mymech: result.c label18

Feb  4 12:13:14 yavcentos mymech: result.c label19

Feb  4 12:13:14 yavcentos mymech: result.c label20

Feb  4 12:13:14 yavcentos mymech: result.c label21

Feb  4 12:13:14 yavcentos mymech: result.c label6b e qui2

Feb  4 12:13:14 yavcentos mymech: daemon: activity on 1 descriptor

Feb  4 12:13:15 yavcentos slapd: daemon: activity on:

Feb  4 12:13:15 yavcentos slapd:  13r

Feb  4 12:13:15 yavcentos slapd:

Feb  4 12:13:15 yavcentos slapd: daemon: read active on 13

Feb  4 12:13:15 yavcentos slapd: connection_get(13)

Feb  4 12:13:15 yavcentos slapd: connection_get(13): got connid=3

Feb  4 12:13:15 yavcentos slapd: connection_read(13): checking for input on id=3

Feb  4 12:13:15 yavcentos slapd: daemon: epoll: listen=7 active_threads=0 tvp=NULL

Feb  4 12:13:15 yavcentos slapd: daemon: epoll: listen=8 active_threads=0 tvp=NULL

Feb  4 12:13:15 yavcentos slapd: conn=3 op=4 RESULT tag=97 err=0 text=

Feb  4 12:13:15 yavcentos mymech: sasl.c label4

Feb  4 12:13:15 yavcentos slapd: <== slap_sasl_bind: rc=0

Feb  4 12:13:15 yavcentos mymech: bind.c do_bind label1

Feb  4 12:13:15 yavcentos slapd: do_bind

Feb  4 12:13:15 yavcentos slapd: conn=3 op=5 BIND anonymous mech=implicit ssf=0

Feb  4 12:13:15 yavcentos slapd: >>> dnPrettyNormal: <>

Feb  4 12:13:15 yavcentos slapd: <<< dnPrettyNormal: <>, <>

Feb  4 12:13:15 yavcentos slapd: do_bind: version=3 dn="" method=128

Feb  4 12:13:15 yavcentos slapd: conn=3 op=5 BIND dn="" method=128

Feb  4 12:13:15 yavcentos mymech: sasl.c label slap_sasl_reset

Feb  4 12:13:15 yavcentos slapd: send_ldap_result: conn=3 op=5 p=3

Feb  4 12:13:15 yavcentos slapd: send_ldap_result: err=0 matched="" text=""

Feb  4 12:13:15 yavcentos mymech: result.c label22a

Feb  4 12:13:15 yavcentos mymech: result.c label61

Feb  4 12:13:15 yavcentos mymech: result.c label61a (tra qui)

Feb  4 12:13:15 yavcentos mymech: result.c label61b

Feb  4 12:13:15 yavcentos mymech: result.c label61b2 op->o_callback->sc_response=0x805e1fc op=0x8668158 rs=0x2afc1a0

Feb  4 12:13:15 yavcentos mymech: connection.c label1

Feb  4 12:13:15 yavcentos mymech: connection.c label3

Feb  4 12:13:15 yavcentos mymech: result.c label61c

Feb  4 12:13:15 yavcentos mymech: result.c label61d

Feb  4 12:13:15 yavcentos mymech: result.c label61d2

Feb  4 12:13:15 yavcentos mymech: result.c label61e

Feb  4 12:13:15 yavcentos slapd: send_ldap_response: msgid=6 tag=97 err=0

Feb  4 12:13:15 yavcentos mymech: result.c label1

Feb  4 12:13:15 yavcentos mymech: result.c label11

Feb  4 12:13:15 yavcentos mymech: result.c label12

Feb  4 12:13:15 yavcentos mymech: result.c label13

Feb  4 12:13:15 yavcentos mymech: result.c label14

Feb  4 12:13:15 yavcentos mymech: result.c label14e

Feb  4 12:13:15 yavcentos mymech: result.c label14f

Feb  4 12:13:15 yavcentos mymech: result.c label14g

Feb  4 12:13:15 yavcentos mymech: result.c label14h

Feb  4 12:13:15 yavcentos mymech: result.c label15

Feb  4 12:13:15 yavcentos mymech: result.c label16

Feb  4 12:13:15 yavcentos mymech: result.c label18

Feb  4 12:13:15 yavcentos mymech: result.c label19

Feb  4 12:13:16 yavcentos mymech: result.c label20

Feb  4 12:13:16 yavcentos mymech: result.c label21

Feb  4 12:13:16 yavcentos mymech: result.c label22b

Feb  4 12:13:16 yavcentos mymech: result.c label5 e qui1

Feb  4 12:13:16 yavcentos slapd: conn=3 op=5 RESULT tag=97 err=0 text=

Feb  4 12:13:16 yavcentos slapd: do_bind: v3 anonymous bind

Feb  4 12:13:15 yavcentos mymech: daemon: activity on 1 descriptor

Feb  4 12:13:16 yavcentos slapd: daemon: activity on:

Feb  4 12:13:16 yavcentos slapd:  13r

Feb  4 12:13:16 yavcentos slapd:

Feb  4 12:13:16 yavcentos slapd: daemon: read active on 13

Feb  4 12:13:16 yavcentos slapd: connection_get(13)

Feb  4 12:13:16 yavcentos slapd: connection_get(13): got connid=3

Feb  4 12:13:16 yavcentos slapd: connection_read(13): checking for input on id=3

Feb  4 12:13:16 yavcentos slapd: ber_get_next on fd 13 failed errno=0 (Success)

Feb  4 12:13:16 yavcentos slapd: connection_read(13): input error=-2 id=3, closing.

Feb  4 12:13:16 yavcentos slapd: connection_closing: readying conn=3 sd=13 for close

Feb  4 12:13:16 yavcentos slapd: connection_close: conn=3 sd=-1

Feb  4 12:13:16 yavcentos mymech: sasl.c label slap_sasl_close

Feb  4 12:13:16 yavcentos mymech: sasl.c label slap_sasl_log

Feb  4 12:13:16 yavcentos slapd: SASL [conn=-1] Debug:        MMYMECHSERVER mymech_server_mech_dispose

Feb  4 12:13:16 yavcentos mymech: sasl.c label slap_sasl_log

Feb  4 12:13:16 yavcentos slapd: SASL [conn=-1] Error:  MMYMECHSERVER mymech_server_mech_dispose: start - receives *conn_co

ntext=0x8666a10 *utils=0x8666188

Feb  4 12:13:16 yavcentos mymech: sasl.c label slap_sasl_log

Feb  4 12:13:17 yavcentos slapd: SASL [conn=-1] Error:  MMYMECHSERVER mymech_server_mech_dispose

Feb  4 12:13:17 yavcentos slapd: daemon: removing 13

Feb  4 12:13:17 yavcentos slapd: conn=3 fd=13 closed (connection lost)

Feb  4 12:13:17 yavcentos slapd: daemon: epoll: listen=7 active_threads=0 tvp=NULL

Feb  4 12:13:17 yavcentos slapd: daemon: epoll: listen=8 active_threads=0 tvp=NULL

Feb  4 12:13:17 yavcentos slapd: daemon: activity on 1 descriptor

Feb  4 12:13:17 yavcentos slapd: daemon: activity on:

Feb  4 12:13:17 yavcentos slapd:

Feb  4 12:13:17 yavcentos slapd: daemon: epoll: listen=7 active_threads=0 tvp=NULL

Feb  4 12:13:17 yavcentos slapd: daemon: epoll: listen=8 active_threads=0 tvp=NULL

END

 

Compile and link options of mymech:

gcc –I ..... -v -Wall -W -O2 -MT mymech.lo -MD -MP -MF .deps/mymech.Tpo -c mymech.c  -fPIC -DPIC -o mymech.lo

gcc -shared  mymech.lo mymech _init.lo plugin_common.lo -lpthread -lcrypt -lresolv -lc  -Wl,-soname -Wl,lib mymech.so.2 -o .libs/lib mymech.so.2.0.22

 

Would you have a clue or a though for us about some further check to do over either our code or compile options,

(maybe particularly on the thread-safe topic)?

Would you suggest some thread safe programming strategy?

 

Many many thanks

 

Francesco Grossi

 

Ing. Pierangelo Masarati

OpenLDAP Core Team

SysNet s.r.l.

Via Dossi, 8 - 27100 Pavia - ITALIA

www.sys-net.it


Office:

+39 02 23998309

Mobile:

+39 333 4963172

Fax:

+39 0382 476497

Email:

ando@sys-net.it