Crash while aborting connection on timeout... - openldap-technical

10 Jul 2015


Hi everyone!
I'm working on a project that uses the OpenLDAP C API (version 2.4.36) in an
asynchronous way. It has worked quite well through two years of development
cycles and product evolution, and for the past year a few clients have been
running our LDAP module successfully on their servers.
Recently I received a core dump file from one of our clients, with these
stack frames:
... libc frames ...
#6 0x00007f68887d3068 in ldap_int_bisect_find (v=<value optimized out>,
n=<value optimized out>, id=<value optimized out>, idxp=<value optimized
out>) at abandon.c:334
#7 0x00007f68887d32d2 in do_abandon (ld=0x7f67dcb0bbe0, origid=-1,
msgid=-1, sctrls=<value optimized out>, sendabandon=1) at abandon.c:300
... my application frames ...
...
My code calls openldap_ldap_abandon_ext(ld, msgid, NULL, NULL) because a
timeout was reached: openldap_ldap_sasl_bind(...) had returned
LDAP_X_CONNECTING and I was still waiting for the bind to complete with
LDAP_SUCCESS. Note that frame #7 shows msgid=-1.
From reading *abandon.c*, the assert( id >= 0 ) seems to be reached only on
certain code paths; my guess is those where the communication handshake with
the server is already at an advanced stage. So I think I'm calling
openldap_ldap_abandon_ext(...) at the wrong time.
My question is: is there something in the API (ldap.h) I can use to avoid
calling openldap_ldap_abandon_ext in this specific situation? I could add
code to my application to avoid the crash, but I also need to make sure the
connection is aborted correctly, as I have to respect my timeout policy.
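One possible application-side guard is sketched below. It rests on assumptions
that are not confirmed anywhere in this post: that a negative msgid (as in
frame #7 above) means no request was ever issued on the connection, so there
is nothing to abandon, and that the openldap_-prefixed names wrap the stock
ldap_abandon_ext and ldap_unbind_ext calls. TimeoutAction and
action_on_timeout are hypothetical names, not part of ldap.h.

```cpp
// Hypothetical helper, not part of ldap.h: decide how to tear down a
// pending bind when the application timeout fires.
enum class TimeoutAction {
    AbandonThenUnbind,  // a request is outstanding: abandon it, then unbind
    UnbindOnly          // no request was issued: skip abandon, just unbind
};

// Assumption: msgid is the value received through the bind call's msgid
// out-parameter, and the caller initialised it to -1 before the call.
constexpr TimeoutAction action_on_timeout(int msgid) {
    return msgid >= 0 ? TimeoutAction::AbandonThenUnbind
                      : TimeoutAction::UnbindOnly;
}

// Usage sketch (needs <ldap.h> and a live LDAP *ld, so left as a comment):
//
//   if (action_on_timeout(msgid) == TimeoutAction::AbandonThenUnbind)
//       ldap_abandon_ext(ld, msgid, NULL, NULL);
//   ldap_unbind_ext(ld, NULL, NULL);  // closes the connection, frees ld
```

With this check, the timeout path never hands a negative msgid to the abandon
call, while the unbind still closes the socket so the timeout policy is
respected. Whether unbinding is the right way to abort a half-open connection
in a given application is a separate design decision.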
BTW, the calls to the OpenLDAP API in my code are all protected with the
same boost::unique_lock<boost::mutex> to ensure thread safety. The logs show
that my module was under heavy load when the application crashed. This core
dump is the only information I have, and I haven't been able to reproduce the
situation in my integration tests, even when simulating a network slowdown
and shrinking the timeouts.
Thanks in advance for your help!