https://bugs.openldap.org/show_bug.cgi?id=10176
Issue ID: 10176
Summary: new atexit() call to atexit(ldap_exit_tls_destroy) in 2.5.17 crashes AIX application
Product: OpenLDAP
Version: 2.5.17
Hardware: Other
OS: Other
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: libraries
Assignee: bugs@openldap.org
Reporter: philip.miloslavsky@gmail.com
Target Milestone: ---
We have a long-standing OpenLDAP application that is being ported from 2.4.58 to 2.5.17. On ppc AIX (but not on Linux, for which we also build), the main application crashes in exit() when it exits: exit() tries to run the atexit handler that LDAP registered, but the LDAP library has already been unloaded, and the unloading caused that atexit function pointer to become zero.
I tracked it to this line of code in ldap 2.5.17, which was not there in 2.4.58:
libraries/libldap/tls2.c: atexit( ldap_exit_tls_destroy );
If I remove that line of code, my issue goes away.
So, now on to dlclose and atexit.
We have a main kernel (irisdb), a C++ library (ldap.so) that we wrote and that calls the LDAP client libraries, and the two actual OpenLDAP libraries that ldap.so is linked against.
During irisdb exit (the h command), irisdb calls dlclose on ldap.so, which as a side effect unloads the two official OpenLDAP libraries, but no one calls unatexit() (for the handler at 0x09001000a04947a8 below).
After the three libraries are unloaded, the atexit registration is still there, but the memory it points to has been replaced with zeroes. At what point in this process should we call unatexit or some LDAP function, and why does this sequence of events work correctly on Linux but not on AIX?
[5] stop in ldap_unbind_s
(dbx) c
[1] stopped in unload_sharedlib at line 7793 in file "/nethome/pmilosla/perforce/projects/OpenLDAP4/kernel/common/src/cdzf.c" ($t1)
7793 if (!libptr)
(dbx) where
unload_sharedlib(libptr = 0x0000000000000004), line 7793 in "cdzf.c"
UnloadZFETable(zfetabdescp = 0x0a00010000032790), line 7346 in "cdzf.c"
ResetZFETable(), line 7940 in "cdzf.c"
zfrundown(), line 10135 in "cdzf.c"
chsub2(), line 3480 in "dmisc2.c"
chalt(flag = 1), line 3222 in "dmisc2.c"
Chaltcmd(), line 3146 in "dmisc2.c"
(dbx) p zfetabdescp->fnameptr
"/home/gavlak/gavlakcre7424/bin/ldap.so"
(dbx) 0x09001000a04947a8/2x
0x09001000a04947a8: 0900 0000
(dbx) 0x09001000a04947a8/4x
0x09001000a04947a8: 0900 0000 0491 8ec0
(dbx) c
[3] stopped in dlclose at 0x90000000029da40 ($t1)
0x90000000029da40 (dlclose) 7c0802a6 mflr r0
(dbx) where
dlclose(0x4) at 0x90000000029da40
unload_sharedlib(libptr = 0x0000000000000004), line 7804 in "cdzf.c"
UnloadZFETable(zfetabdescp = 0x0a00010000032790), line 7346 in "cdzf.c"
ResetZFETable(), line 7940 in "cdzf.c"
zfrundown(), line 10135 in "cdzf.c"
chsub2(), line 3480 in "dmisc2.c"
chalt(flag = 1), line 3222 in "dmisc2.c"
Chaltcmd(), line 3146 in "dmisc2.c"
(dbx) p zfetabdescp->fnameptr
"/home/gavlak/gavlakcre7424/bin/ldap.so"
(dbx) c
[2] stopped in exit at 0x9000000002524a0 ($t1)
0x9000000002524a0 (exit) 7c0802a6 mflr r0
(dbx) 0x09001000a04947a8/4x
0x09001000a04947a8: 0000 0000 0000 0000
(dbx) c
Illegal instruction in . at 0x0 ($t1)
0x0000000000000000 00000000 Invalid opcode.
(dbx) where
.() at 0x0
exit(??) at 0x900000000252610
syshalt(a = 0), line 6925 in "emisc.c"
chalt(flag = 1), line 3227 in "dmisc2.c"
Chaltcmd(), line 3146 in "dmisc2.c"
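The failure sequence above (register a handler, unload the code behind it, then run the handler table at exit) can be illustrated with a toy model. This is only a sketch: `toy_atexit`, `toy_unatexit`, `toy_run_exit_handlers` and `toy_lib_cleanup` are hypothetical names invented for illustration, and the table is not how any real libc stores its handlers. `toy_unatexit` is loosely modelled on the AIX `unatexit()` call mentioned in the report.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the C runtime's exit-handler table (hypothetical,
 * not the real libc data structure). */
#define TOY_MAX_HANDLERS 8

typedef void (*toy_handler)(void);

static toy_handler toy_table[TOY_MAX_HANDLERS];
static size_t toy_count;

/* Hypothetical stand-in for a library cleanup routine such as
 * ldap_exit_tls_destroy; it only counts how often it ran. */
static int toy_cleanup_calls;
static void toy_lib_cleanup(void)
{
    toy_cleanup_calls++;
}

/* Like atexit(): remember fn so it can be called at process exit. */
int toy_atexit(toy_handler fn)
{
    if (fn == NULL || toy_count == TOY_MAX_HANDLERS)
        return -1;
    toy_table[toy_count++] = fn;
    return 0;
}

/* Loosely like unatexit(): drop every registration of fn.
 * Returns 0 if something was removed, -1 otherwise. */
int toy_unatexit(toy_handler fn)
{
    size_t kept = 0;
    int removed = 0;

    for (size_t i = 0; i < toy_count; i++) {
        if (toy_table[i] == fn)
            removed++;
        else
            toy_table[kept++] = toy_table[i];
    }
    toy_count = kept;
    return removed ? 0 : -1;
}

/* Like the handler phase of exit(): call the remaining handlers in
 * reverse registration order and return how many were called. */
int toy_run_exit_handlers(void)
{
    int called = 0;

    while (toy_count > 0) {
        toy_handler fn = toy_table[--toy_count];
        /* A stale or zeroed entry here corresponds to the jump to 0x0
         * seen in the dbx trace. */
        assert(fn != NULL);
        fn();
        called++;
    }
    return called;
}
```

In this model, the safe ordering is to call `toy_unatexit(toy_lib_cleanup)` before the code containing `toy_lib_cleanup` is unloaded, so that the exit-time pass never sees a registration whose target is gone.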
--- Comment #1 from philip.miloslavsky@gmail.com ---
(gdb) info symbol 0x09001000a04947a8
_$STATIC + 1248 in section .data of /home/gavlak/gavlakcre7424/bin/libldap.so
(gdb) info symbol 0x900000004918EC0
ldap_exit_tls_destroy in section .text of /home/gavlak/gavlakcre7424/bin/libldap.so
Howard Chu hyc@openldap.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugs.openldap.org/show_bug.cgi?id=9952
--- Comment #2 from Howard Chu hyc@openldap.org ---
Since you've already found ITS#9952, I suggest you revert the patch listed there in your build.
Probably we should also revert that change and pursue a fix in the OpenSSL library instead, but there's little chance they will do the right thing.
--- Comment #3 from philip.miloslavsky@gmail.com ---
I added a line commenting out just the atexit() for AIX. Do you think that is OK?
We'll have to redo that every time we pull a new LTS version of openldap so it would be good for you to make some change (I can test). Getting AIX to change atexit() is probably even lower probability!
BTW we had to upgrade the configure infrastructure inside openldap to autoconf=2.69 to get it to behave reasonably for AIX.
Quanah Gibson-Mount quanah@openldap.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |2.5.18
           Assignee|bugs@openldap.org           |hyc@openldap.org
           Keywords|needs_review                |
--- Comment #4 from Quanah Gibson-Mount quanah@openldap.org ---
For 2.5:
Revert ITS#9952
hyc to file bug with OpenSSL project as the issue appears to be with their library.
--- Comment #5 from Howard Chu hyc@openldap.org ---
If you don't do a proper revert and restore the previous version of the code, then you'll have memory leaks when you dlclose() the library. If your program is exiting soon anyway, it won't matter, but if you dlopen/dlclose multiple times within a single run, it could become a problem.
Alternatively, since your program is exiting, you can just omit your own call to dlclose. Again, there's no point to it if the program is going away anyway.
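The trade-off described here, cleaning up when the library is unloaded rather than at process exit, can be sketched with a destructor function. This is a minimal illustration under stated assumptions, not the code from either OpenLDAP commit: all `demo_` names are hypothetical, and it assumes a GCC-compatible compiler that supports `__attribute__((destructor))`. On such toolchains a destructor in a shared object typically runs when the object is unloaded by dlclose() or, if it is still loaded, at normal process exit, so no function pointer into the library is left behind in the atexit table.

```c
#include <stdlib.h>

/* Hypothetical library-global resource standing in for TLS state. */
static void *demo_tls_state;

/* Counts real cleanups, so repeated calls can be observed. */
static int demo_cleanup_calls;

/* Allocate the global state once; returns 0 on success, -1 on failure. */
int demo_lib_init(void)
{
    if (demo_tls_state == NULL)
        demo_tls_state = malloc(64);
    return demo_tls_state != NULL ? 0 : -1;
}

/* Idempotent cleanup: safe to call explicitly and again from the
 * destructor below. */
void demo_lib_cleanup(void)
{
    if (demo_tls_state == NULL)
        return;
    free(demo_tls_state);
    demo_tls_state = NULL;
    demo_cleanup_calls++;
}

/* Assumption: GCC-style destructor support. Without it this function
 * is simply never called and the caller must clean up explicitly. */
#if defined(__GNUC__)
__attribute__((destructor))
#endif
static void demo_lib_fini(void)
{
    demo_lib_cleanup();
}
```

Because `demo_lib_cleanup` is idempotent, an application that calls it explicitly before dlclose() does not cause a double free when the destructor fires afterwards.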
Howard Chu hyc@openldap.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://github.com/openssl/openssl/issues/23575
--- Comment #6 from Howard Chu hyc@openldap.org ---
I submitted https://github.com/openssl/openssl/issues/23575 to their bug tracker. I note they've already had portability issues with their use of atexit (logged as their #23135), and they subsequently added a configure option to disable use of atexit in their builds, but on Linux atexit is still used by default. None of this atexit usage conforms to the POSIX standard, so they brought this portability mess upon themselves.
--- Comment #7 from Howard Chu hyc@openldap.org ---
Apparently there is an option to tell OpenSSL not to use atexit. Can you please test this patch and see if it fixes your issue?
diff --git a/libraries/libldap/tls_o.c b/libraries/libldap/tls_o.c
index 6847ef33b4..9bd830c196 100644
--- a/libraries/libldap/tls_o.c
+++ b/libraries/libldap/tls_o.c
@@ -225,7 +225,7 @@ tlso_init( void )
 	SSL_library_init();
 	OpenSSL_add_all_digests();
 #else
-	OPENSSL_init_ssl(0, NULL);
+	OPENSSL_init_ssl(OPENSSL_INIT_NO_ATEXIT, NULL);
 #endif
 
 	/* FIXME: mod_ssl does this */
--- Comment #8 from Howard Chu hyc@openldap.org ---
Alternatively, we can just remove our call to OPENSSL_cleanup, which is what the OpenSSL docs now recommend.
https://www.openssl.org/docs/man3.2/man3/OPENSSL_init_crypto.html
--- Comment #9 from philip.miloslavsky@gmail.com ---
But it's not the OpenSSL atexit handler that exit() crashed on; it was the LDAP one.
--- Comment #10 from Howard Chu hyc@openldap.org ---
(In reply to philip.miloslavsky from comment #9)
> but its not the openssl atexit that exit() crashed on, it was the ldap one.
Yes, but that was from ITS#9952 which we are reverting now. And the crash was due to cleaning up OpenSSL.
--- Comment #11 from philip.miloslavsky@gmail.com ---
I think you meant to post the "please test" request in the other bug. Our app never crashed on Linux.
Howard Chu hyc@openldap.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |TEST
--- Comment #12 from Howard Chu hyc@openldap.org ---
Fixed in git master, commit 5e13ef87a94491f9339dbca709db29e76741f1a9.
That commit reverts the use of atexit, which was originally introduced for ITS#9952. The crash in ITS#9952 is then addressed in commit a5953812f0c03e802e61109ae18e8fed5f3f2df8.
Quanah Gibson-Mount quanah@openldap.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|TEST                        |FIXED
--- Comment #13 from Quanah Gibson-Mount quanah@openldap.org ---
The fix will be in the 2.5.18/2.6.8 releases. If you need it sooner, you can grab the two related commits; see https://bugs.openldap.org/show_bug.cgi?id=9952#c11 for them.
Quanah Gibson-Mount quanah@openldap.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |VERIFIED