---- Original message ----
Date: Sat, 24 Jan 2009 16:15:06 -0800
From: Howard Chu hyc@symas.com
Subject: Re: RE24 connection code reworking
To: Pierangelo Masarati ando@sys-net.it
Cc: Quanah Gibson-Mount quanah@zimbra.com, openldap-devel@openldap.org
Pierangelo Masarati wrote:
Pierangelo Masarati wrote:
I ran 30 times test045 with HEAD and got no failures. Then re24 failed after 44 runs with the backtrace below (identical to the previous ones).
I got thru 80 runs of test045 on HEAD (prior to my syncrepl patch) and then slapd hung on shutdown, with a deadlock between connections_shutdown(), connection_closing(), and send_ldap_ber(). So, still tinkering with this.
Ok, make test runs flawlessly on AIX 5.3 with GCC 4.2.3, BDB 4.6.21.3, Cyrus SASL 2.1.22.
How can I do successive runs of a specific test such as you've described here?
Wait a second, I pulled down OPENLDAP_REL_ENG_2_4; should I be grabbing HEAD for these tests?
Please try to overlook my ignorance, just want to help :-)
Cheers, Bill
William Jojo wrote:
How can I do successive runs of a specific test such as you've described here?
Wait a second, I pulled down OPENLDAP_REL_ENG_2_4; should I be grabbing HEAD for these tests?
Please try to overlook my ignorance, just want to help :-)
Sure. The original call for testing was for RE24, but now that we've found problems with it, you might as well stop testing that code.
I'm still finding odd problems in HEAD, so you should wait for a new call for testing.
On Sat, 24 Jan 2009, Howard Chu wrote:
I'm still finding odd problems in HEAD, so you should wait for a new call for testing.
With HEAD, I got to test050 before suffering a deadlock. Not sure if this is worth an ITS since I think it's the area actively being worked on...
Just for reference, this is consumer1:
t@1 (l@1) stopped in __lwp_wait at 0x7fb1ff64
0x7fb1ff64: __lwp_wait+0x0004: ta %icc,0x00000008
current thread: t@1
  [1] __lwp_wait(0x2, 0xffbff2bc, 0x7f98f9e0, 0x7f9424fc, 0x1, 0xffbff284), at 0x7fb1ff64
  [2] lwp_wait(0x2, 0xffbff2bc, 0x2cf98, 0x7f984e70, 0x5, 0xffbff2b4), at 0x7f94d1cc
  [3] _thrp_join(0x2, 0x0, 0x0, 0x1, 0x81010100, 0xff00), at 0x7f9490c4
=>[4] ldap_pvt_thread_join(thread = 2U, thread_return = (nil)), line 197 in "thr_posix.c"
  [5] slapd_daemon(), line 2658 in "daemon.c"
  [6] main(argc = 8, argv = 0xffbff4dc), line 948 in "main.c"

t@2 (l@2) stopped in _poll at 0x7fb1e238
0x7fb1e238: _poll+0x0004: ta %icc,0x00000008
current thread: t@2
  [1] _poll(0x7effbb88, 0x3, 0xbb8, 0x0, 0x3, 0x7effbd91), at 0x7fb1e238
  [2] select_large_fdset(0x13, 0x20, 0x7effe218, 0x0, 0x7effbd90, 0x7effbd90), at 0x7fad2b6c
=>[3] slapd_daemon_task(ptr = (nil)), line 2291 in "daemon.c"

t@3 (l@3) stopped in __lwp_park at 0x7f9554b0
0x7f9554b0: __lwp_park+0x0010: ta %icc,0x00000008
current thread: t@3
  [1] __lwp_park(0x0, 0x0, 0x0, 0x1, 0x7f968000, 0x0), at 0x7f9554b0
  [2] cond_wait_queue(0x3f3af8, 0x7f968c08, 0x0, 0x0, 0x7f870400, 0x7f968000), at 0x7f9526b8
  [3] _cond_wait_cancel(0x3f3af8, 0x3f3ae0, 0x2cf0cc, 0x7e7ff920, 0x0, 0x0), at 0x7f952e74
  [4] _pthread_cond_wait(0x3f3af8, 0x3f3ae0, 0x2cf0cc, 0x452e5c, 0x3, 0x0), at 0x7f952eb0
=>[5] ldap_pvt_thread_cond_wait(cond = 0x3f3af8, mutex = 0x3f3ae0), line 277 in "thr_posix.c"
  [6] ldap_int_thread_pool_wrapper(xpool = 0x3f3ad8), line 654 in "tpool.c"

t@4 (l@4) stopped in __lwp_park at 0x7f9554b0
0x7f9554b0: __lwp_park+0x0010: ta %icc,0x00000008
current thread: t@4
  [1] __lwp_park(0x0, 0x0, 0x0, 0x1, 0x7f968000, 0x0), at 0x7f9554b0
  [2] cond_wait_queue(0x3f3af8, 0x7f968c08, 0x0, 0x0, 0x7f870600, 0x7f968000), at 0x7f9526b8
  [3] _cond_wait_cancel(0x3f3af8, 0x3f3ae0, 0x2b91e0, 0x2b91e4, 0x4, 0x0), at 0x7f952e74
  [4] _pthread_cond_wait(0x3f3af8, 0x3f3ae0, 0x1, 0x50acc4, 0x0, 0x0), at 0x7f952eb0
=>[5] ldap_pvt_thread_cond_wait(cond = 0x3f3af8, mutex = 0x3f3ae0), line 277 in "thr_posix.c"
  [6] ldap_int_thread_pool_wrapper(xpool = 0x3f3ad8), line 654 in "tpool.c"

t@5 (l@5) stopped in __lwp_park at 0x7f9554b0
0x7f9554b0: __lwp_park+0x0010: ta %icc,0x00000008
current thread: t@5
  [1] __lwp_park(0x0, 0x0, 0x0, 0x1, 0x7f968000, 0x0), at 0x7f9554b0
  [2] cond_wait_queue(0x3f3af8, 0x7f968c08, 0x0, 0x0, 0x7f870800, 0x7f968000), at 0x7f9526b8
  [3] _cond_wait_cancel(0x3f3af8, 0x3f3ae0, 0x2cf0cc, 0x7d7ff920, 0xb, 0x7d7ff3da), at 0x7f952e74
  [4] _pthread_cond_wait(0x3f3af8, 0x3f3ae0, 0x2cf0cc, 0x4fe89c, 0x4, 0x0), at 0x7f952eb0
=>[5] ldap_pvt_thread_cond_wait(cond = 0x3f3af8, mutex = 0x3f3ae0), line 277 in "thr_posix.c"
  [6] ldap_int_thread_pool_wrapper(xpool = 0x3f3ad8), line 654 in "tpool.c"

t@6 (l@6) stopped in __lwp_park at 0x7f9554b0
0x7f9554b0: __lwp_park+0x0010: ta %icc,0x00000008
current thread: t@6
  [1] __lwp_park(0x0, 0x0, 0x0, 0x1, 0x7f968000, 0xf0000000), at 0x7f9554b0
  [2] cond_wait_queue(0x438200, 0x7f968c08, 0x0, 0x0, 0x7f870a00, 0x7f968000), at 0x7f9526b8
  [3] _cond_wait_cancel(0x438200, 0x4381e8, 0x2d004, 0x7f983bf0, 0x5, 0x4), at 0x7f952e74
  [4] _pthread_cond_wait(0x438200, 0x4381e8, 0x1, 0x7fb1881c, 0x9, 0x7fa75088), at 0x7f952eb0
=>[5] ldap_pvt_thread_cond_wait(cond = 0x438200, mutex = 0x4381e8), line 277 in "thr_posix.c"
  [6] send_ldap_ber(conn = 0x438108, ber = 0x7cfff6b8), line 217 in "result.c"
  [7] slap_send_search_entry(op = 0x7cfffaf4, rs = 0x7cfff89c), line 1246 in "result.c"
  [8] syncprov_sendresp(op = 0x7cfffaf4, opc = 0x7cfff948, so = 0x426d88, e = 0x7cfff974, mode = 1), line 817 in "syncprov.c"
  [9] syncprov_qplay(op = 0x7cfffaf4, rtask = 0x50a7b0), line 888 in "syncprov.c"
  [10] syncprov_qtask(ctx = 0x7cfffe0c, arg = 0x50a7b0), line 951 in "syncprov.c"
  [11] ldap_int_thread_pool_wrapper(xpool = 0x3f3ad8), line 663 in "tpool.c"

t@7 (l@7) stopped in __lwp_park at 0x7f9554b0
0x7f9554b0: __lwp_park+0x0010: ta %icc,0x00000008
current thread: t@7
  [1] __lwp_park(0x0, 0x0, 0x0, 0x1, 0x7f968000, 0x0), at 0x7f9554b0
  [2] cond_wait_queue(0x3f3b08, 0x7f968c08, 0x0, 0x0, 0x7f870c00, 0x7f968000), at 0x7f9526b8
  [3] _cond_wait_cancel(0x3f3b08, 0x3f3ae0, 0x0, 0x0, 0x0, 0x0), at 0x7f952e74
  [4] _pthread_cond_wait(0x3f3b08, 0x3f3ae0, 0x0, 0x0, 0x0, 0x0), at 0x7f952eb0
=>[5] ldap_pvt_thread_cond_wait(cond = 0x3f3b08, mutex = 0x3f3ae0), line 277 in "thr_posix.c"
  [6] handle_pause(tpool = 0x3a6d3c, do_pause = 1), line 738 in "tpool.c"
  [7] ldap_pvt_thread_pool_pause(tpool = 0x3a6d3c), line 761 in "tpool.c"
  [8] config_back_add(op = 0x7c7ff8d8, rs = 0x7c7ff548), line 4711 in "bconfig.c"
  [9] overlay_op_walk(op = 0x7c7ff8d8, rs = 0x7c7ff548, which = op_add, oi = 0x452518, on = (nil)), line 670 in "backover.c"
  [10] over_op_func(op = 0x7c7ff8d8, rs = 0x7c7ff548, which = op_add), line 722 in "backover.c"
  [11] over_op_add(op = 0x7c7ff8d8, rs = 0x7c7ff548), line 768 in "backover.c"
  [12] syncrepl_entry(si = 0x452b58, op = 0x7c7ff8d8, entry = 0x523a9c, modlist = 0x7c7ff6c4, syncstate = 1, syncUUID = 0x7c7ff720, syncCSN = 0x453438), line 2166 in "syncrepl.c"
  [13] do_syncrep2(op = 0x7c7ff8d8, si = 0x452b58), line 892 in "syncrepl.c"
  [14] do_syncrepl(ctx = 0x7c7ffe0c, arg = 0x457ad0), line 1333 in "syncrepl.c"
  [15] connection_read_thread(ctx = 0x7c7ffe0c, argv = 0x8), line 1228 in "connection.c"
  [16] ldap_int_thread_pool_wrapper(xpool = 0x3f3ad8), line 663 in "tpool.c"
Aaron Richton wrote:
On Sat, 24 Jan 2009, Howard Chu wrote:
I'm still finding odd problems in HEAD, so you should wait for a new call for testing.
With HEAD, I got to test050 before suffering a deadlock. Not sure if this is worth an ITS since I think it's the area actively being worked on...
Right. Thanks for the trace. It shows a writer is blocked waiting for the socket to become writable. What happened to the other servers at this point?
On Sun, 25 Jan 2009, Howard Chu wrote:
It shows a writer is blocked waiting for the socket to become writable. What happened to the other servers at this point?
https://www.nbcs.rutgers.edu/~richton/test050logs.tgz
I think that con2 is trying to replicate back-config poorly, note in particular:
send_ldap_result: conn=-1 op=0 p=3
conn=-1? Assuming that hasn't been retooled and should still be a monotonically incrementing counter, that's an interesting state...
Backtrace [con2]:
t@1 (l@1) stopped in __lwp_wait at 0x7fb1ff64
0x7fb1ff64: __lwp_wait+0x0004: ta %icc,0x00000008
current thread: t@1
  [1] __lwp_wait(0x2, 0xffbff2bc, 0x7f98f9e0, 0x7f9424fc, 0x1, 0xffbff284), at 0x7fb1ff64
  [2] lwp_wait(0x2, 0xffbff2bc, 0x2cf98, 0x7f984e70, 0x5, 0xffbff2b4), at 0x7f94d1cc
  [3] _thrp_join(0x2, 0x0, 0x0, 0x1, 0x81010100, 0xff00), at 0x7f9490c4
=>[4] ldap_pvt_thread_join(thread = 2U, thread_return = (nil)), line 197 in "thr_posix.c"
  [5] slapd_daemon(), line 2658 in "daemon.c"
  [6] main(argc = 8, argv = 0xffbff4dc), line 948 in "main.c"

Current function is slapd_daemon_task
 2291 SLAP_EVENT_WAIT( tvp, &ns );
t@2 (l@2) stopped in _poll at 0x7fb1e238
0x7fb1e238: _poll+0x0004: ta %icc,0x00000008
current thread: t@2
  [1] _poll(0x7effbb88, 0x1, 0xea60, 0x0, 0x3c, 0x7effbd91), at 0x7fb1e238
  [2] select_large_fdset(0xb, 0x20, 0x7effbd90, 0x0, 0x7effbd90, 0x7effbd90), at 0x7fad2b6c
=>[3] slapd_daemon_task(ptr = (nil)), line 2291 in "daemon.c"

Current function is ldap_pvt_thread_cond_wait
 277 return ERRVAL( pthread_cond_wait( cond, mutex ) );
t@3 (l@3) stopped in __lwp_park at 0x7f9554b0
0x7f9554b0: __lwp_park+0x0010: ta %icc,0x00000008
current thread: t@3
  [1] __lwp_park(0x0, 0x0, 0x0, 0x1, 0x7f968000, 0x6c), at 0x7f9554b0
  [2] cond_wait_queue(0x3f3af8, 0x7f968c08, 0x0, 0x0, 0x7f870400, 0x7f968000), at 0x7f9526b8
  [3] _cond_wait_cancel(0x3f3af8, 0x3f3ae0, 0x9, 0x0, 0x0, 0x0), at 0x7f952e74
  [4] _pthread_cond_wait(0x3f3af8, 0x3f3ae0, 0xe4768, 0x0, 0x0, 0x0), at 0x7f952eb0
=>[5] ldap_pvt_thread_cond_wait(cond = 0x3f3af8, mutex = 0x3f3ae0), line 277 in "thr_posix.c"
  [6] handle_pause(tpool = 0x3a6d3c, do_pause = 1), line 721 in "tpool.c"
  [7] ldap_pvt_thread_pool_pause(tpool = 0x3a6d3c), line 761 in "tpool.c"
  [8] config_back_add(op = 0x7e7ff960, rs = 0x7e7ff5d0), line 4711 in "bconfig.c"
  [9] syncrepl_entry(si = 0x452cd0, op = 0x7e7ff960, entry = 0x523ac4, modlist = 0x7e7ff74c, syncstate = 1, syncUUID = 0x7e7ff7a8, syncCSN = (nil)), line 2166 in "syncrepl.c"
  [10] do_syncrep2(op = 0x7e7ff960, si = 0x452cd0), line 892 in "syncrepl.c"
  [11] do_syncrepl(ctx = 0x7e7ffe0c, arg = 0x452e48), line 1333 in "syncrepl.c"
  [12] ldap_int_thread_pool_wrapper(xpool = 0x3f3ad8), line 663 in "tpool.c"

Current function is ldap_pvt_thread_mutex_lock
 296 return ERRVAL( pthread_mutex_lock( mutex ) );
t@4 (l@4) stopped in __lwp_park at 0x7f9554b0
0x7f9554b0: __lwp_park+0x0010: ta %icc,0x00000008
current thread: t@4
  [1] __lwp_park(0x0, 0x0, 0x0, 0x0, 0x1, 0x0), at 0x7f9554b0
  [2] mutex_lock_queue(0x7f968c04, 0x0, 0x4382b0, 0x7f968000, 0x81010100, 0xff00), at 0x7f951188
  [3] slow_lock(0x4382b0, 0x7f870600, 0x2b81a0, 0x7dfffc38, 0x7f968000, 0xff0000), at 0x7f951b88
=>[4] ldap_pvt_thread_mutex_lock(mutex = 0x4382b0), line 296 in "thr_posix.c"
  [5] connection_get(s = 10), line 266 in "connection.c"
  [6] connection_read(s = 10, cri = 0x7dfffd64), line 1286 in "connection.c"
  [7] connection_read_thread(ctx = 0x7dfffe0c, argv = 0xa), line 1219 in "connection.c"
  [8] ldap_int_thread_pool_wrapper(xpool = 0x3f3ad8), line 663 in "tpool.c"

Current function is ldap_pvt_thread_mutex_lock
 296 return ERRVAL( pthread_mutex_lock( mutex ) );
t@5 (l@5) stopped in __lwp_park at 0x7f9554b0
0x7f9554b0: __lwp_park+0x0010: ta %icc,0x00000008
current thread: t@5
  [1] __lwp_park(0x0, 0x0, 0x0, 0x0, 0x81010100, 0xff00), at 0x7f9554b0
  [2] mutex_lock_queue(0x7f968c04, 0x0, 0x438350, 0x7f968000, 0x4, 0x83037092), at 0x7f951188
  [3] slow_lock(0x438350, 0x7f870800, 0x20d78, 0x7fa611f8, 0x0, 0x0), at 0x7f951b88
=>[4] ldap_pvt_thread_mutex_lock(mutex = 0x438350), line 296 in "thr_posix.c"
  [5] connection_closing(c = 0x438298, why = 0x2bbfb0 "connection lost on write"), line 764 in "connection.c"
  [6] send_ldap_ber(conn = 0x438298, ber = 0x7d7ff6d0), line 202 in "result.c"
  [7] slap_send_search_entry(op = 0x451ce0, rs = 0x7d7ffcb0), line 1246 in "result.c"
  [8] config_send(op = 0x451ce0, rs = 0x7d7ffcb0, ce = 0x427b98, depth = 1), line 3646 in "bconfig.c"
  [9] config_send(op = 0x451ce0, rs = 0x7d7ffcb0, ce = 0x422c98, depth = 0), line 3653 in "bconfig.c"
  [10] config_back_search(op = 0x451ce0, rs = 0x7d7ffcb0), line 5519 in "bconfig.c"
  [11] fe_op_search(op = 0x451ce0, rs = 0x7d7ffcb0), line 366 in "search.c"
  [12] do_search(op = 0x451ce0, rs = 0x7d7ffcb0), line 217 in "search.c"
  [13] connection_operation(ctx = 0x7d7ffe0c, arg_v = 0x451ce0), line 1100 in "connection.c"
  [14] connection_read_thread(ctx = 0x7d7ffe0c, argv = 0xa), line 1226 in "connection.c"
  [15] ldap_int_thread_pool_wrapper(xpool = 0x3f3ad8), line 663 in "tpool.c"

Current function is ldap_pvt_thread_cond_wait
 277 return ERRVAL( pthread_cond_wait( cond, mutex ) );
t@6 (l@6) stopped in __lwp_park at 0x7f9554b0
0x7f9554b0: __lwp_park+0x0010: ta %icc,0x00000008
current thread: t@6
  [1] __lwp_park(0x0, 0x0, 0x0, 0x1, 0x7f968000, 0x6c), at 0x7f9554b0
  [2] cond_wait_queue(0x3f3b08, 0x7f968c08, 0x0, 0x0, 0x7f870a00, 0x7f968000), at 0x7f9526b8
  [3] _cond_wait_cancel(0x3f3b08, 0x3f3ae0, 0x9, 0x0, 0x0, 0x0), at 0x7f952e74
  [4] _pthread_cond_wait(0x3f3b08, 0x3f3ae0, 0xe4768, 0x0, 0x0, 0x0), at 0x7f952eb0
=>[5] ldap_pvt_thread_cond_wait(cond = 0x3f3b08, mutex = 0x3f3ae0), line 277 in "thr_posix.c"
  [6] handle_pause(tpool = 0x3a6d3c, do_pause = 1), line 738 in "tpool.c"
  [7] ldap_pvt_thread_pool_pause(tpool = 0x3a6d3c), line 761 in "tpool.c"
  [8] config_back_add(op = 0x7cfff8d8, rs = 0x7cfff548), line 4711 in "bconfig.c"
  [9] syncrepl_entry(si = 0x452b58, op = 0x7cfff8d8, entry = 0x523a9c, modlist = 0x7cfff6c4, syncstate = 1, syncUUID = 0x7cfff720, syncCSN = (nil)), line 2166 in "syncrepl.c"
  [10] do_syncrep2(op = 0x7cfff8d8, si = 0x452b58), line 892 in "syncrepl.c"
  [11] do_syncrepl(ctx = 0x7cfffe0c, arg = 0x457ad0), line 1333 in "syncrepl.c"
  [12] connection_read_thread(ctx = 0x7cfffe0c, argv = 0x9), line 1228 in "connection.c"
  [13] ldap_int_thread_pool_wrapper(xpool = 0x3f3ad8), line 663 in "tpool.c"
Backtrace [provider]:
t@1 (l@1) stopped in __lwp_wait at 0x7fb1ff64
0x7fb1ff64: __lwp_wait+0x0004: ta %icc,0x00000008
current thread: t@1
  [1] __lwp_wait(0x2, 0xffbff2c4, 0x7f98f9e0, 0x7f9424fc, 0x1, 0xffbff28c), at 0x7fb1ff64
  [2] lwp_wait(0x2, 0xffbff2c4, 0x2cf98, 0x7f984e70, 0x5, 0xffbff2bc), at 0x7f94d1cc
  [3] _thrp_join(0x2, 0x0, 0x0, 0x1, 0x81010100, 0xff00), at 0x7f9490c4
=>[4] ldap_pvt_thread_join(thread = 2U, thread_return = (nil)), line 197 in "thr_posix.c"
  [5] slapd_daemon(), line 2658 in "daemon.c"
  [6] main(argc = 8, argv = 0xffbff4e4), line 948 in "main.c"

Current function is slapd_daemon_task
 2291 SLAP_EVENT_WAIT( tvp, &ns );
t@2 (l@2) stopped in _poll at 0x7fb1e238
0x7fb1e238: _poll+0x0004: ta %icc,0x00000008
current thread: t@2
  [1] _poll(0x7effbb88, 0x6, 0xffffffffffffffff, 0xfffffffffffffff8, 0x0, 0x7effbd91), at 0x7fb1e238
  [2] select_large_fdset(0x11, 0x20, 0x7effbd90, 0x0, 0x7effbd90, 0x7effbd90), at 0x7fad2b6c
=>[3] slapd_daemon_task(ptr = (nil)), line 2291 in "daemon.c"

Current function is ldap_pvt_thread_cond_wait
 277 return ERRVAL( pthread_cond_wait( cond, mutex ) );
t@3 (l@3) stopped in __lwp_park at 0x7f9554b0
0x7f9554b0: __lwp_park+0x0010: ta %icc,0x00000008
current thread: t@3
  [1] __lwp_park(0x0, 0x0, 0x0, 0x1, 0x7f968000, 0x0), at 0x7f9554b0
  [2] cond_wait_queue(0x3f3af8, 0x7f968c08, 0x0, 0x0, 0x7f870400, 0x7f968000), at 0x7f9526b8
  [3] _cond_wait_cancel(0x3f3af8, 0x3f3ae0, 0x0, 0x7f968000, 0x8, 0x0), at 0x7f952e74
  [4] _pthread_cond_wait(0x3f3af8, 0x3f3ae0, 0x7e7ffe0c, 0x1, 0x0, 0x7e7ffd7d), at 0x7f952eb0
=>[5] ldap_pvt_thread_cond_wait(cond = 0x3f3af8, mutex = 0x3f3ae0), line 277 in "thr_posix.c"
  [6] ldap_int_thread_pool_wrapper(xpool = 0x3f3ad8), line 654 in "tpool.c"

Current function is ldap_pvt_thread_cond_wait
 277 return ERRVAL( pthread_cond_wait( cond, mutex ) );
t@4 (l@4) stopped in __lwp_park at 0x7f9554b0
0x7f9554b0: __lwp_park+0x0010: ta %icc,0x00000008
current thread: t@4
  [1] __lwp_park(0x0, 0x0, 0x0, 0x1, 0x7f968000, 0x0), at 0x7f9554b0
  [2] cond_wait_queue(0x3f3af8, 0x7f968c08, 0x0, 0x0, 0x7f870600, 0x7f968000), at 0x7f9526b8
  [3] _cond_wait_cancel(0x3f3af8, 0x3f3ae0, 0x0, 0x7f968000, 0x8, 0x0), at 0x7f952e74
  [4] _pthread_cond_wait(0x3f3af8, 0x3f3ae0, 0x2cf094, 0x46186c, 0x0, 0x0), at 0x7f952eb0
=>[5] ldap_pvt_thread_cond_wait(cond = 0x3f3af8, mutex = 0x3f3ae0), line 277 in "thr_posix.c"
  [6] ldap_int_thread_pool_wrapper(xpool = 0x3f3ad8), line 654 in "tpool.c"

Current function is ldap_pvt_thread_cond_wait
 277 return ERRVAL( pthread_cond_wait( cond, mutex ) );
t@5 (l@5) stopped in __lwp_park at 0x7f9554b0
0x7f9554b0: __lwp_park+0x0010: ta %icc,0x00000008
current thread: t@5
  [1] __lwp_park(0x0, 0x0, 0x0, 0x1, 0x7f968000, 0x0), at 0x7f9554b0
  [2] cond_wait_queue(0x3f3af8, 0x7f968c08, 0x0, 0x0, 0x7f870800, 0x7f968000), at 0x7f9526b8
  [3] _cond_wait_cancel(0x3f3af8, 0x3f3ae0, 0x0, 0x7f968000, 0x1, 0x7d7ff974), at 0x7f952e74
  [4] _pthread_cond_wait(0x3f3af8, 0x3f3ae0, 0x2cf0cc, 0x46186c, 0x4, 0x0), at 0x7f952eb0
=>[5] ldap_pvt_thread_cond_wait(cond = 0x3f3af8, mutex = 0x3f3ae0), line 277 in "thr_posix.c"
  [6] ldap_int_thread_pool_wrapper(xpool = 0x3f3ad8), line 654 in "tpool.c"

Current function is ldap_pvt_thread_cond_wait
 277 return ERRVAL( pthread_cond_wait( cond, mutex ) );
t@6 (l@6) stopped in __lwp_park at 0x7f9554b0
0x7f9554b0: __lwp_park+0x0010: ta %icc,0x00000008
current thread: t@6
  [1] __lwp_park(0x0, 0x0, 0x0, 0x1, 0x7f968000, 0x0), at 0x7f9554b0
  [2] cond_wait_queue(0x3f3af8, 0x7f968c08, 0x0, 0x0, 0x7f870a00, 0x7f968000), at 0x7f9526b8
  [3] _cond_wait_cancel(0x3f3af8, 0x3f3ae0, 0x0, 0x7f968000, 0x8, 0x0), at 0x7f952e74
  [4] _pthread_cond_wait(0x3f3af8, 0x3f3ae0, 0x2cf094, 0x468954, 0x0, 0x0), at 0x7f952eb0
=>[5] ldap_pvt_thread_cond_wait(cond = 0x3f3af8, mutex = 0x3f3ae0), line 277 in "thr_posix.c"
  [6] ldap_int_thread_pool_wrapper(xpool = 0x3f3ad8), line 654 in "tpool.c"

Current function is ldap_pvt_thread_cond_wait
 277 return ERRVAL( pthread_cond_wait( cond, mutex ) );
t@7 (l@7) stopped in __lwp_park at 0x7f9554b0
0x7f9554b0: __lwp_park+0x0010: ta %icc,0x00000008
current thread: t@7
  [1] __lwp_park(0x0, 0x0, 0x0, 0x1, 0x7f968000, 0x0), at 0x7f9554b0
  [2] cond_wait_queue(0x3f3af8, 0x7f968c08, 0x0, 0x0, 0x7f870c00, 0x7f968000), at 0x7f9526b8
  [3] _cond_wait_cancel(0x3f3af8, 0x3f3ae0, 0x2cf094, 0x7c7ff920, 0x7, 0x0), at 0x7f952e74
  [4] _pthread_cond_wait(0x3f3af8, 0x3f3ae0, 0x2cf094, 0x45eef4, 0x0, 0x0), at 0x7f952eb0
=>[5] ldap_pvt_thread_cond_wait(cond = 0x3f3af8, mutex = 0x3f3ae0), line 277 in "thr_posix.c"
  [6] ldap_int_thread_pool_wrapper(xpool = 0x3f3ad8), line 654 in "tpool.c"
Aaron Richton wrote:
On Sun, 25 Jan 2009, Howard Chu wrote:
It shows a writer is blocked waiting for the socket to become writable. What happened to the other servers at this point?
https://www.nbcs.rutgers.edu/~richton/test050logs.tgz
I think that con2 is trying to replicate back-config poorly, note in particular:
send_ldap_result: conn=-1 op=0 p=3
conn=-1? Assuming that hasn't been retooled and should still be a monotonically incrementing counter, that's an interesting state...
conn=-1 is normal for a syncrepl consumer thread.
Based on these traces, I think this has been fixed with my last commit to result.c. Please update and try again...
So far, so good on this end with HEAD (result.c 1.326).
Aaron Richton wrote:
So far, so good on this end with HEAD (result.c 1.326).
Thanks for the confirmation.
--On Saturday, January 24, 2009 8:15 PM -0500 William Jojo w.jojo@hvcc.edu wrote:
How can I do successive runs of a specific test such as you've described here?
Wait a second, I pulled down OPENLDAP_REL_ENG_2_4; should I be grabbing HEAD for these tests?
The specific call in this was for OPENLDAP_REL_ENG_2_4, so you were using the right branch for testing.
In answer to your first question, in the "tests" directory after you've run make, there'll be a script called "run". You can use it to run specific tests, like:
./run test001
will run test001.
Today, I added a new option to the script (-l) which allows you to run a given test in a loop a set number of times, like:
./run -l 50 test001
would run test001 50 times. If there is an error during a run, it'll exit, so you can examine why the error occurred. This change is in both HEAD and RE24.
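For a checkout that predates the -l option, roughly the same stop-on-first-failure behaviour can be approximated with a small shell function. This is only a sketch: run_loop is a hypothetical helper, not part of the OpenLDAP test suite, and it assumes the command it wraps (for instance ./run test001) exits non-zero when a test fails.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenLDAP): run a command up to N times
# and stop at the first failing run so the leftover state can be examined.
# Assumes the wrapped command returns a non-zero exit status on failure.
run_loop() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        echo "run $i of $count: $*"
        if ! "$@"; then
            echo "failed on run $i" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $count runs passed"
}

# Example, from the "tests" directory after running make:
#   run_loop 50 ./run test001
```

Because the loop returns as soon as one run fails, whatever that run left behind is not overwritten by later iterations.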
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration