Howard Chu wrote:
Nick Geron wrote:
Howard Chu wrote:
OK, this sounds like the background thread to propagate updates isn't getting scheduled when it should. That could be a bug in the syncprov overlay.
Should I file a report, and if so, what information is required from this end?
I've filed this as ITS#5405 and fixed it in CVS HEAD. It would be good if you could test the patch and report your results as a followup to the ITS.
I'll try to look into this soon.
Thanks for the info. That would certainly point to a problem with my build environment, or a new bug.
I'm seeing a number of aborts when testing under high load. The latest came from running scripted ldapsearches and ldapmodifies which resulted in a mutex error (or so I am told by one of our developers).
Specifically:
- adding about 100 attributes to an entry
- diffing the output of ldapsearch between the two nodes in loop
- once synced, grabbing the attributes, shoving them in a temp file
with delete instructions and using that with ldapmodify.
I complied with debugging on which results in an abort with "connection.c: 676: connection_state_closing: Assertion 'c_struct_state == 0x02' failed" logged.
Interesting. It would be useful to get a gdb stack trace from that situation.
Yesterday I was able to successfully reproduce this beahvior at least three times. This morning, I was able to reproduce it with the above steps yet again. From a gdb session, no backtrace was available, however. I then recompiled with debugging enabled and was unable to reproduce the bug until I added '-d 7' to the run arguments. It should be noted that before recompiling, I was able to reproduce the behavior with and without the command line debug argument.
Here's the stack trace from a gdb session with arguments, -h 'ldap:/// ldaps:///' -d 7
=> acl_string_expand: expanded: uid=[^,]+,ou=employees,ou=people,dc=example,dc=com => regex_matches: string: uid=syncrepl,ou=ldap,dc=example,dc=com => regex_matches: rc: 1 no matches slapd: connection.c:676: connection_state_closing: Assertion `c->c_struct_state == 0x02' failed.
Program received signal SIGABRT, Aborted. [Switching to Thread 1124096320 (LWP 7301)] 0x0000003918230055 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x0000003918230055 in raise () from /lib64/libc.so.6 #1 0x0000003918231af0 in abort () from /lib64/libc.so.6 #2 0x0000003918229756 in __assert_fail () from /lib64/libc.so.6 #3 0x000000000042d345 in connection_state_closing () #4 0x000000000043d43b in slap_freeself_cb () #5 0x000000000043ef81 in slap_send_search_entry () #6 0x00000000004c88c4 in syncprov_initialize () #7 0x00002aaaaaabc1c7 in ldap_int_thread_pool_wrapper (xpool=0x1a0b06d0) at tpool.c:625 #8 0x00000039196062f7 in start_thread () from /lib64/libpthread.so.0 #9 0x00000039182ce85d in clone () from /lib64/libc.so.6 (gdb)
Envrionment = CentOS 5 updated to whatever RH thinks is current as of last week. Oracle db 4.5.20 and openldap 2.4.7 compiled by hand.
Please let me know if any further information would be helpful.
Your stack trace is a bit odd, because I can't find anywhere in the source tree that uses the function "slap_freeself_cb()". Are you using any custom overlays? It appears that your stack trace is still missing a lot of details. You should compile with -g and without any optimization, and make sure you're testing with the unstripped binary.
The only overlays I'm using are those found in the openldap tarball. I found that function referenced in result.c
[root@server01 openldap-2.4.7]# grep -R slap_freeself_cb . Binary file ./servers/slapd/.libs/slapd matches ./servers/slapd/proto-slap.h:LDAP_SLAPD_F (int) slap_freeself_cb LDAP_P(( Operation *op, SlapReply *rs )); Binary file ./servers/slapd/result.o matches ./servers/slapd/result.c:int slap_freeself_cb( Operation *op, SlapReply *rs ) [root@server01 openldap-2.4.7]# strings /usr/local/libexec/slapd | grep freeself slap_freeself_cb slap_freeself_cb [root@server01 openldap-2.4.7]#
It does look like I had stripped binaries though. I've removed optimization and verified that the -g option is being given. It took a bit to track down where stripping was being set, but 'file' now tells me the binaries are not stripped.
Here's the latest backtrace:
regex_matches: string: uid=syncrepl,ou=ldap,dc=corenap,dc=com
=> regex_matches: rc: 1 no matches ldap_read: want=8, got=7 0000: 30 05 02 01 03 42 00 0....B. ber_get_next: tag 0x30 len 5 contents: ber_get_next ldap_read: want=8, got=0
ber_get_next on fd 21 failed errno=0 (Success) connection_closing: readying conn=1 sd=21 for close connection_close: deferring conn=1 sd=21 conn=1 op=2 do_unbind connection_resched: attempting closing conn=1 sd=21 connection_close: conn=1 sd=21 slapd: connection.c:676: connection_state_closing: Assertion `c->c_struct_state == 0x02' failed.
Program received signal SIGABRT, Aborted. [Switching to Thread 1107310912 (LWP 17114)] 0x00000030ae430055 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x00000030ae430055 in raise () from /lib64/libc.so.6 #1 0x00000030ae431af0 in abort () from /lib64/libc.so.6 #2 0x00000030ae429756 in __assert_fail () from /lib64/libc.so.6 #3 0x000000000043854a in connection_state_closing (c=0xa35ef60) at connection.c:676 #4 0x000000000044dc38 in send_ldap_ber (conn=0xa35ef60, ber=0x42002410) at result.c:153 #5 0x000000000045173d in slap_send_search_entry (op=0x420029a0, rs=0x420026c0) at result.c:1187 #6 0x000000000051072f in syncprov_sendresp (op=0x420029a0, opc=0x42002780, so=0xa882b80, e=0x420027d8, mode=2) at syncprov.c:809 #7 0x0000000000510a8d in syncprov_qplay (op=0x420029a0, on=0xa094690, so=0xa882b80) at syncprov.c:878 #8 0x0000000000510c54 in syncprov_qtask (ctx=0x42002dc0, arg=0x2aaac4111500) at syncprov.c:923 #9 0x00002aaaaaabc605 in ldap_int_thread_pool_wrapper (xpool=0xa0466d0) at tpool.c:625 #10 0x00000030aec062f7 in start_thread () from /lib64/libpthread.so.0 #11 0x00000030ae4ce85d in clone () from /lib64/libc.so.6 (gdb)
-Nick