richton@nbcs.rutgers.edu wrote:
Full_Name: Aaron Richton Version: RE24 OS: Fedora 20 URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (128.6.31.135)
While performing RE24 testing observed an issue in slapd-mtread:
==28795== Process terminating with default action of signal 11 (SIGSEGV): dumping core ==28795== Access not within mapped region at address 0x2FE9F9D0 ==28795== at 0x3A45E0A1%3: pthread_join (pthread_join.c:47) ==28795== by 0x4A09005: ??? (in /usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so) ==28795== by 0x4A0BBA5: pthread_join (in /usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so) ==28795== by 0 0x409A2A: ldap_pvt_thread_join (thr_posix.c:197) ==28795== by 0x405540: main (slapd-mtread.c:361)
Hmm, not particularly helpful. Gave this a run through helgrind, report below. So far as I can tell we had one thread in ldap_dup protected by ldcmutex:
542 LDAP_MUTEX_LOCK( &old->ld_ldcmutex ); 543 ld->ldc = old->ldc; 544 old->ld_ldcrefcnt++; 545 LDAP_MUTEX_UNLOCK( &old->ld_ldcmutex );
while ldap_get_option(LDAP_OPT_SESSION_REFCNT) is protected by ldo_mutex only. As a quick attempt:
Thanks. I see your patch is this way to avoid lock ordering issues. The original code in libldap/unbind.c seems to hold ldcmutex for the entire duration of the function, without any good reason. I suspect we can just move the mutex lock/unlock to just around the actual option fetch.
--- a/libraries/libldap/options.c +++ b/libraries/libldap/options.c @@ -123,6 +123,10 @@ ldap_get_option( return LDAP_OPT_ERROR; }
if( option == LDAP_OPT_SESSION_REFCNT ) {
LDAP_MUTEX_LOCK( &ld->ld_ldcmutex );
}
LDAP_MUTEX_LOCK( &lo->ldo_mutex ); switch(option) {
@@ -414,6 +418,11 @@ ldap_get_option( }
LDAP_MUTEX_UNLOCK( &lo->ldo_mutex );
if( option == LDAP_OPT_SESSION_REFCNT ) {
LDAP_MUTEX_UNLOCK( &ld->ld_ldcmutex );
}
}return ( rc );
helgrind looks happy with this. The if() is obviously optional but is hopefully a small performance gain for ldap_get_option(others).
helgrind report:
==28795== Lock at 0x4C393D0 was first observed ==28795== at 0x4A0BCCF: pthread_mutex_init (in /usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so) ==28795== by 0x409B08: ldap_pvt_thread_mutex_init (thr_posix.c:383) ==28795== by 0x40A25A: ldap_create (open.c:180) ==28795== by 0x40A3E4: ldap_initialize (open.c:240) ==28795== by 0x4062B7: do_conn (slapd-mtread.c:584) ==28795== by 0x40530A: main (slapd-mtread.c:331) ==28795== ==28795== Lock at 0x4C393A8 was first observed ==28795== at 0x4A0BCCF: pthread_mutex_init (in /usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so) ==28795== by 0x409B08: ldap_pvt_thread_mutex_init (thr_posix.c:283) ==28795== by 0x409FBF: ldap_create (open.c:129) ==28795%%3= by 0xA3A3E4: ldap_initialize (open.c:240) ==28795== by 0x4062B7: do_conn (slapd-mtread.c:584) ==28795== by 0x40530A: main (slapd-mtread.c:331) ==28795== ==28795== Possible data race during write of size 4 at 0x4C39238 by thread #%2 ==28795== Locks held: 1, at address 0x4C393D0 ==28795== at 0x412AC1: ldap_ld_free (unbind.c:87) ==28795== by 0x4130EE: ldap_destroy (unbind.c:245) ==28795== by 0x405BEF: do_onethread (slapd-mtread.c:455) ==28795== by 0x4A0AECD: ??? (in /usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so) ==28795== by 0x3A45E07EE4: start_thread (pthread_create.c:309) ==28795== by 0x3A45AF4B8C: clone (clone.S:111) ==28795== ==28795== This conflicts with a previous read of size 4 by thread #29 ==28795%%3= Locks held: 1, at address 0x4C393A8 ==28795== at 0x426957: ldap_get_option (options.c:374) ==28795== by 0x405914: do_onethread (slapd-mtread.c:423) ==28795== by 0x4A0AECD: ??? (in /usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so) ==28795== by 0x3A45E07EE4: start_thread (pthread_create.c:309) ==28795== by 0x3A45AF4B8C: clone (clone.S:111) ==28795== ==28795== Address 0x4C39238 is 88 bytes inside a block of size 736 alloc'd ==28795== at 0x4A083F4: calloc (in /usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so) ==28795== by 0x43E4B1: ber_memcalloc_x (memory.c:283) ==28795== by 0x409F4B: ldap_create (open.c:119) ==28795== by 0x40A3E4: ldap_initialize (open.c:240) ==28795== by 0x4062B7: do_conn (slapd-mtread.c:584) ==28795== by 0x40530A: main (slapd-mtread.c:331)