Hi everyone!
I'm working on a project that uses the OpenLDAP C API (version 2.4.36) asynchronously. Everything has worked quite well over two years of development cycles and product evolution, and for about a year a few clients have been successfully running our LDAP module on their servers.
Recently I received a core dump from one of our clients, with these stack frames:
... libc frames ...
#6 0x00007f68887d3068 in ldap_int_bisect_find (v=<value optimized out>, n=<value optimized out>, id=<value optimized out>, idxp=<value optimized out>) at abandon.c:334
#7 0x00007f68887d32d2 in do_abandon (ld=0x7f67dcb0bbe0, origid=-1, msgid=-1, sctrls=<value optimized out>, sendabandon=1) at abandon.c:300
... my application frames ...
From my code I'm calling openldap_ldap_abandon_ext(ld, msgid, NULL, NULL) because a timeout was reached: the preceding openldap_ldap_sasl_bind(...) left the connection in the LDAP_X_CONNECTING state, and I was still waiting for the LDAP_SUCCESS result when my timeout policy fired.
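For context, here is a simplified sketch of that path as I understand it (I'm using the stock ldap_* names here; in our build these are the openldap_-prefixed symbols, and the URI and DN below are placeholders). My assumption is that LDAP_OPT_CONNECT_ASYNC is what produces the LDAP_X_CONNECTING state, and that msgid is never assigned while the connect is still in progress:

```cpp
#include <ldap.h>

/* Simplified sketch of my bind path; error handling trimmed,
 * URI and DN are placeholders. */
void bind_then_timeout()
{
    LDAP *ld = NULL;
    int msgid = -1;                     /* never assigned while connecting */
    struct berval cred = { 0, NULL };

    ldap_initialize(&ld, "ldap://example.com");
    ldap_set_option(ld, LDAP_OPT_CONNECT_ASYNC, LDAP_OPT_ON);

    int rc = ldap_sasl_bind(ld, "cn=user,dc=example,dc=com", LDAP_SASL_SIMPLE,
                            &cred, NULL, NULL, &msgid);
    if (rc == LDAP_X_CONNECTING) {
        /* TCP connect still in progress: as far as I can tell, no bind
         * request has been sent yet, so msgid stays at -1. */
    }

    /* ... my timeout policy fires while we are still connecting ... */
    ldap_abandon_ext(ld, msgid, NULL, NULL);  /* msgid == -1: hits the assert */
}
```

If that reading is right, the abandon ends up being called with msgid == -1, which would match the origid=-1, msgid=-1 arguments in the core dump.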
From what I've read in *abandon.c*, the assert( id >= 0 ) is only reached on certain flows, I guess those involved in the later stages of the handshake with the server. So I suspect I'm calling openldap_ldap_abandon_ext(...) at the wrong time.
My question is: is there something in the API (ldap.h) I can use to avoid calling openldap_ldap_abandon_ext in this specific situation? I could add code to my application to prevent the crash, but I also need to abort the connection correctly, since I have to honor my timeout policy.
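In the meantime, my working idea is to guard the call myself: abandon only when I actually have a valid msgid, and otherwise tear the connection down so the timeout policy is still honored. A rough sketch of what I mean (cancel_pending_bind is my own name, and the msgid >= 0 test just mirrors the assert in abandon.c):

```cpp
#include <ldap.h>

/* Hypothetical guard I'm considering: abandon only an operation that was
 * actually sent, otherwise drop the whole connection. */
void cancel_pending_bind(LDAP **ldp, int msgid)
{
    if (msgid >= 0) {
        /* A request is in flight: abandon just that operation. */
        ldap_abandon_ext(*ldp, msgid, NULL, NULL);
    } else {
        /* Still in LDAP_X_CONNECTING: nothing to abandon, so close the
         * connection instead; ldap_unbind_ext() also frees the handle. */
        ldap_unbind_ext(*ldp, NULL, NULL);
        *ldp = NULL;
    }
}
```

But this is guesswork from reading *abandon.c*, so if ldap.h offers an official way to handle this case, I'd rather use that.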
By the way, all calls to the OpenLDAP API in my code are protected by the same boost::unique_lock<boost::mutex> to ensure thread safety. Logs show that my module was under heavy load when the application crashed. This core dump is the only information I have, and I haven't been able to reproduce the situation in my integration tests, even when simulating slow network communications and shrinking the timeouts.
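In case it matters, the locking looks roughly like this (simplified; locked_abandon is just an illustrative name):

```cpp
#include <boost/thread/mutex.hpp>
#include <boost/thread/locks.hpp>
#include <ldap.h>

// Simplified version of how my module serializes access to the LDAP handle.
static boost::mutex ldap_mutex;

int locked_abandon(LDAP *ld, int msgid)
{
    boost::unique_lock<boost::mutex> guard(ldap_mutex);  // same lock for every API call
    return ldap_abandon_ext(ld, msgid, NULL, NULL);
}
```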
Thanks in advance for your help!