Nick Geron wrote:
Howard Chu wrote:
OK, this sounds like the background thread to propagate updates isn't getting scheduled when it should. That could be a bug in the syncprov overlay.
Should I file a report, and if so, what information is required from this end?
I've filed this as ITS#5405 and fixed it in CVS HEAD. It would be good if you could test the patch and report your results as a followup to the ITS.
Thanks for the info. That would certainly point to a problem with my build environment, or a new bug.
I'm seeing a number of aborts when testing under high load. The latest came from running scripted ldapsearches and ldapmodifies, which resulted in a mutex error (or so I am told by one of our developers).
Specifically:
- adding about 100 attributes to an entry
- diffing the output of ldapsearch between the two nodes in a loop
- once synced, grabbing the attributes, shoving them into a temp file with delete instructions, and feeding that file to ldapmodify
I compiled with debugging on, which results in an abort with "connection.c:676: connection_state_closing: Assertion `c->c_struct_state == 0x02' failed" being logged.
Interesting. It would be useful to get a gdb stack trace from that situation.
Yesterday I was able to reproduce this behavior at least three times. This morning, I reproduced it with the above steps yet again. However, no backtrace was available from a gdb session. I then recompiled with debugging enabled and was unable to reproduce the bug until I added '-d 7' to the run arguments. Note that before recompiling, I could reproduce the behavior both with and without the command-line debug argument.
Here's the stack trace from a gdb session run with the arguments -h 'ldap:/// ldaps:///' -d 7:
=> acl_string_expand: expanded: uid=[^,]+,ou=employees,ou=people,dc=example,dc=com
=> regex_matches: string: uid=syncrepl,ou=ldap,dc=example,dc=com
=> regex_matches: rc: 1 no matches
slapd: connection.c:676: connection_state_closing: Assertion `c->c_struct_state == 0x02' failed.
Program received signal SIGABRT, Aborted.
[Switching to Thread 1124096320 (LWP 7301)]
0x0000003918230055 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003918230055 in raise () from /lib64/libc.so.6
#1  0x0000003918231af0 in abort () from /lib64/libc.so.6
#2  0x0000003918229756 in __assert_fail () from /lib64/libc.so.6
#3  0x000000000042d345 in connection_state_closing ()
#4  0x000000000043d43b in slap_freeself_cb ()
#5  0x000000000043ef81 in slap_send_search_entry ()
#6  0x00000000004c88c4 in syncprov_initialize ()
#7  0x00002aaaaaabc1c7 in ldap_int_thread_pool_wrapper (xpool=0x1a0b06d0) at tpool.c:625
#8  0x00000039196062f7 in start_thread () from /lib64/libpthread.so.0
#9  0x00000039182ce85d in clone () from /lib64/libc.so.6
(gdb)
Environment: CentOS 5, updated to whatever RH considers current as of last week; Oracle Berkeley DB 4.5.20 and OpenLDAP 2.4.7, both compiled by hand.
Please let me know if any further information would be helpful.
Your stack trace is a bit odd, because I can't find anywhere in the source tree that uses the function "slap_freeself_cb()". Are you using any custom overlays? It appears that your stack trace is still missing a lot of details. You should compile with -g and without any optimization, and make sure you're testing with the unstripped binary.