Hello list.
Since a recent upgrade from 2.4.12 to 2.4.13, slapd keeps hanging.
On the client side, ldapsearch requests fail with this error:
    error.c:272: ldap_parse_result: Assertion `r != ((void *)0)' failed
In this case I would expect an automatic switch to the slave server, but it doesn't happen. Here is my LDAP client library configuration:
    BASE dc=msr-inria,dc=inria,dc=fr
    URI ldap://ldap1.msr-inria.inria.fr ldap://ldap2.msr-inria.inria.fr
    TLS_CACERTDIR /etc/pki/tls/certs
    TLS_REQCERT demand
    NETWORK_TIMEOUT 2
    TIMEOUT 2
    TIMELIMIT 2
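If I understand libldap correctly (an assumption I haven't verified in the sources), it only moves on to the next URI when the connection attempt itself fails, with NETWORK_TIMEOUT bounding each connect. A hung slapd whose socket still completes the TCP handshake would then never trigger the switch to ldap2. The Python sketch below models only that connect-level logic; `first_reachable` is a hypothetical helper, not a libldap API, and 389/636 are simply the default LDAP ports.

```python
import socket
from urllib.parse import urlparse

# Hypothetical model of URI-list failover at the TCP-connect level only.
# It does NOT reproduce libldap; it shows why a server that accepts
# connections but never answers is not skipped.


def first_reachable(uris, network_timeout=2.0, connect=socket.create_connection):
    """Return the first URI whose TCP connect succeeds, or None."""
    for uri in uris:
        parsed = urlparse(uri)
        # Default LDAP ports when the URI carries no explicit port.
        port = parsed.port or (636 if parsed.scheme == "ldaps" else 389)
        try:
            sock = connect((parsed.hostname, port), timeout=network_timeout)
        except OSError:
            # Refused, unreachable or timed out: try the next URI.
            continue
        sock.close()
        return uri
    return None


if __name__ == "__main__":
    # A listening socket that never calls accept() still completes the
    # TCP handshake (the kernel queues the connection), like a hung server.
    hung = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    hung.bind(("127.0.0.1", 0))
    hung.listen(1)
    port = hung.getsockname()[1]
    uris = [f"ldap://127.0.0.1:{port}", "ldap://ldap2.msr-inria.inria.fr"]
    print(first_reachable(uris))  # the "hung" first URI is selected
    hung.close()
```

Under that model, a failover would only happen if ldap1 refused or dropped connections outright, not when it accepts them and then stays silent.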
On the server side, slapd is usually using 100%, 200% or 300% CPU, which makes me think that some specific, repeated query triggers the issue, and that the problem gets worse as several of them accumulate.
strace on the running slapd process shows it waiting on a futex:
    [root@etoile main]# strace -p 2769
    Process 2769 attached - interrupt to quit
    futex(0xb6bb4bd8, FUTEX_WAIT, 2774, NULL <unfinished ...>
And gdb shows it waiting in __kernel_vsyscall:
    (gdb) bt
    #0  0xffffe410 in __kernel_vsyscall ()
    #1  0xb7d385c6 in pthread_join () from /lib/i686/libpthread.so.0
    #2  0xb7f23d3f in ldap_pvt_thread_join () from /usr/lib/libldap_r-2.4.so.2
    #3  0x0806e1b4 in slapd_daemon ()
    #4  0x0805a507 in main ()
In both cases, I think the lack of useful information comes from slapd being multithreaded: both traces only show the main thread blocked in pthread_join(), and I don't know how to get to the exact thread where the problem occurs.
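One way to narrow this down from outside slapd might be to rank its threads by CPU time. The sketch below assumes Linux procfs (/proc/&lt;pid&gt;/task/&lt;tid&gt;/stat, where fields 14 and 15 are utime and stime in clock ticks); `thread_cpu_ticks` is a hypothetical helper. The thread IDs it prints should match the LWP numbers that gdb's `info threads` lists, and `thread apply all bt` then prints a backtrace for every thread rather than only the main one.

```python
import os
import sys

# Sketch: rank the threads of a running process by accumulated CPU time,
# reading the Linux-specific /proc/<pid>/task/<tid>/stat files.


def thread_cpu_ticks(pid):
    """Return (tid, name, utime + stime in clock ticks), busiest first."""
    rows = []
    task_dir = f"/proc/{pid}/task"
    for entry in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{entry}/stat") as f:
                data = f.read()
        except OSError:
            continue  # thread exited while we were scanning
        # Field 2 (comm) is in parentheses and may itself contain spaces.
        name = data[data.index("(") + 1:data.rindex(")")]
        fields = data[data.rindex(")") + 1:].split()
        # fields[0] is field 3 (state), so utime/stime (14, 15) are at 11, 12.
        utime, stime = int(fields[11]), int(fields[12])
        rows.append((int(entry), name, utime + stime))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


if __name__ == "__main__":
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for tid, name, ticks in thread_cpu_ticks(pid)[:5]:
        print(f"LWP {tid:>6}  {name:<16} {ticks} ticks")
```

Run as root against the slapd PID (2769 in the strace output), the busiest LWP at the top would be the one to look at in the gdb backtraces.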
I have already tried regenerating the indexes, without result. I dropped the database and rebuilt it from the latest backup, which made the problem disappear temporarily. I didn't find anything in the logs, even with the debug level set to 'trace'.
I'm using the bdb backend, with this configuration in slapd.conf:
    database bdb
    suffix "dc=msr-inria,dc=inria,dc=fr"
    rootdn "cn=root,dc=msr-inria,dc=inria,dc=fr"
    #rootpw root
    directory /var/lib/ldap/main
    cachesize 1000
    idlcachesize 1000
    checkpoint 256 5
And this one in DB_CONFIG:
    set_cachesize 0 1048576 0
    set_lg_bsize 2097152
    set_lg_max 10485760
    set_flags DB_LOG_AUTOREMOVE
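For reference, here is what those DB_CONFIG values amount to: a 1 MiB BDB cache (0 GB + 1048576 bytes), a 2 MiB in-memory log buffer and 10 MiB log files. This is a minimal Python sketch; `parse_db_config` and `summarize` are hypothetical helpers, and the argument meanings plus the rule that a log file must be at least four times the log buffer are taken from my reading of the Berkeley DB documentation, not from this setup.

```python
# Sketch: decode the DB_CONFIG directives quoted above into byte sizes.
# Assumed semantics (Berkeley DB docs): set_cachesize <gbytes> <bytes> <ncache>;
# set_lg_bsize and set_lg_max in bytes; log file size >= 4 x log buffer size.

DB_CONFIG = """\
set_cachesize 0 1048576 0
set_lg_bsize 2097152
set_lg_max 10485760
set_flags DB_LOG_AUTOREMOVE
"""


def parse_db_config(text):
    """Map each directive to the list of argument lists it appeared with."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, *args = line.split()
        # Directives such as set_flags may legitimately repeat.
        settings.setdefault(key, []).append(args)
    return settings


def summarize(settings):
    """Turn the size-related directives into byte counts and one sanity check."""
    gbytes, nbytes, _ncache = (int(x) for x in settings["set_cachesize"][-1])
    lg_bsize = int(settings["set_lg_bsize"][-1][0])
    lg_max = int(settings["set_lg_max"][-1][0])
    return {
        "cache_bytes": gbytes * 2**30 + nbytes,
        "log_buffer_bytes": lg_bsize,
        "log_file_bytes": lg_max,
        "log_file_vs_buffer_ok": lg_max >= 4 * lg_bsize,
    }


if __name__ == "__main__":
    for name, value in summarize(parse_db_config(DB_CONFIG)).items():
        print(f"{name}: {value}")
```

With these values the log file (10 MiB) is five times the log buffer (2 MiB), so that particular constraint appears to be satisfied.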
The full slapd.conf is available at:
    http://pastebin.mandriva.com/5801
The db_stat -m output is available at:
    http://pastebin.mandriva.com/5799
The main database itself is quite small: the LDIF backup is only 1.4. I also have a log database for syncrepl purposes.
I'm using OpenLDAP 2.4.13 with Berkeley DB 4.6.21 on Mandriva Linux 2008.1 (32-bit). I'd be happy to provide additional information if needed.