We recently upgraded from 2.3.30 to 2.3.41. Ever since, slapd has died about once a week due to an assertion failure. A backtrace is below, but I forgot to disable optimization in the build, so there's been some inlining.
The only assertions in connection_close() are:
    assert( connections != NULL );
    assert( c != NULL );
    [...]
    assert( c->c_struct_state == SLAP_C_USED );
    assert( c->c_conn_state == SLAP_C_CLOSING );
I'm not sure if this gives enough to go on, but I've rebuilt with optimization disabled and have a debugger attached for the next time it fails.
Program received signal SIGABRT, Aborted.
[Switching to Thread -1884128336 (LWP 26138)]
0xb7b98947 in raise () from /lib/tls/libc.so.6
#0 0xb7b98947 in raise () from /lib/tls/libc.so.6
#1 0xb7b9a0c9 in abort () from /lib/tls/libc.so.6
#2 0xb7b9205f in __assert_fail () from /lib/tls/libc.so.6
#3 0x0806d052 in connection_close (c=<value optimized out>) at /var/jwm/openldap/servers/slapd/connection.c:680
#4 0x0806d67d in connection_operation (ctx=0x8fb272c8, arg_v=0x993a830) at /var/jwm/openldap/servers/slapd/connection.c:1722
#5 0xb7f14e7f in ldap_int_thread_pool_wrapper (xpool=0x814d358) at /var/jwm/openldap/libraries/libldap_r/tpool.c:478
#6 0xb7ca70bd in start_thread () from /lib/tls/libpthread.so.0
#7 0xb7c3c01e in clone () from /lib/tls/libc.so.6
john
--On Wednesday, March 26, 2008 10:57 AM -0400 John Morrissey jwm@horde.net wrote:
> I'm not sure if this gives enough to go on, but I've rebuilt with optimization disabled and have a debugger attached for the next time it fails.
You may never hit it. I recall having this problem with certain buggy versions of gcc (which is pretty much any version of gcc) in how it optimizes the code. I never build OL with optimizations anymore, as gcc just cannot do the job right.
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
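For readers who want to try the unoptimized build discussed above, here is a minimal sketch. It assumes a gcc toolchain and an unpacked OpenLDAP source tree; the function name build_unoptimized and the choice of "-g -O0" are illustrative and not taken from the thread, which does not show the build commands actually used.

```shell
#!/bin/sh
# Sketch only: configure and build a source tree with optimization disabled.
# Assumptions (not from the thread): $1 is an unpacked source tree that
# contains ./configure, and the compiler accepts the gcc flags -g (debug
# info) and -O0 (no optimization).
build_unoptimized() {
    srcdir=$1
    (
        cd "$srcdir" || exit 1
        # Pass CFLAGS through the environment so configure records them.
        CFLAGS="-g -O0" ./configure || exit 1
        make depend || exit 1
        make
    )
}

# Example call (hypothetical path):
#   build_unoptimized /path/to/openldap-source
```

Keeping -g in the flags means a later backtrace still carries file and line information, and -O0 avoids the inlining mentioned in the first message.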
On Wed, Mar 26, 2008 at 12:06:45PM -0700, Quanah Gibson-Mount wrote:
> You may never hit it. I recall having this problem with certain buggy versions of gcc (which is pretty much any version of gcc) in how it optimizes the code. I never build OL with optimizations anymore, as gcc just cannot do the job right.
The unoptimized build crashed earlier today; I only have a partial backtrace (below), but I can try to get a complete one next time if it would be more useful.
The assertion 'c->c_writewaiter == 0' in connection_destroy() is what seems to be failing.
#0 0xb7be3947 in raise () from /lib/tls/libc.so.6
#1 0xb7be50c9 in abort () from /lib/tls/libc.so.6
#2 0xb7bdd05f in __assert_fail () from /lib/tls/libc.so.6
#3 0x08071d03 in connection_destroy (c=0x9036b6a8) at /var/jwm/o2/openldap/servers/slapd/connection.c:680
#4 0x080726ab in connection_close (c=0x9036b6a8) at /var/jwm/o2/openldap/servers/slapd/connection.c:900
#5 0x080740bd in connection_resched (conn=0x9036b6a8) at /var/jwm/o2/openldap/servers/slapd/connection.c:1722
#6 0x08073009 in connection_operation (ctx=0x903672b4, arg_v=0x8245118) at /var/jwm/o2/openldap/servers/slapd/connection.c:1179
john
On Sun, Apr 06, 2008 at 09:41:31PM -0400, John Morrissey wrote:
> The unoptimized build crashed earlier today; I only have a partial backtrace (below), but I can try to get a complete one next time if it would be more useful.
>
> The assertion 'c->c_writewaiter == 0' in connection_destroy() is what seems to be failing.
[snip]
FWIW, here's the complete backtrace. slapd's crashing about every three or four days under moderate load (three or four writes/second, maybe a dozen reads/sec). If there's anything else I can provide to help debug this, please let me know.
Program received signal SIGABRT, Aborted.
[Switching to Thread -1884222544 (LWP 15692)]
0xb7b8c947 in raise () from /lib/tls/libc.so.6
#0 0xb7b8c947 in raise () from /lib/tls/libc.so.6
#1 0xb7b8e0c9 in abort () from /lib/tls/libc.so.6
#2 0xb7b8605f in __assert_fail () from /lib/tls/libc.so.6
#3 0x08071d03 in connection_destroy (c=0x903143c8) at /var/jwm/o2/openldap/servers/slapd/connection.c:680
#4 0x080726ab in connection_close (c=0x903143c8) at /var/jwm/o2/openldap/servers/slapd/connection.c:900
#5 0x080740bd in connection_resched (conn=0x903143c8) at /var/jwm/o2/openldap/servers/slapd/connection.c:1722
#6 0x08073009 in connection_operation (ctx=0x8fb102b4, arg_v=0x823eeb0) at /var/jwm/o2/openldap/servers/slapd/connection.c:1179
#7 0xb7f0afd9 in ldap_int_thread_pool_wrapper (xpool=0x816f358) at /var/jwm/o2/openldap/libraries/libldap_r/tpool.c:478
#8 0xb7c9b0bd in start_thread () from /lib/tls/libpthread.so.0
#9 0xb7c3001e in clone () from /lib/tls/libc.so.6
john
John Morrissey wrote:
> FWIW, here's the complete backtrace. slapd's crashing about every three or four days under moderate load (three or four writes/second, maybe a dozen reads/sec). If there's anything else I can provide to help debug this, please let me know.
Get the full trace for all threads in the process, not just the one that aborted. In gdb:

    thread apply all bt full
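If the crash leaves a core file rather than a live debugger session, the same gdb command can be run non-interactively. The sketch below is one way to do that, assuming gdb is installed and a core file exists; the function name dump_all_threads, the GDB override variable, and the example paths are all hypothetical and not part of the thread.

```shell
#!/bin/sh
# Sketch only: print full backtraces for every thread recorded in a core file.
# Assumptions (not from the thread): gdb is on PATH, $1 is the slapd binary
# that produced the core, and $2 is the core file. GDB is a hypothetical
# override for the debugger executable.
dump_all_threads() {
    binary=$1
    core=$2
    # -batch exits after the commands given with -ex have run.
    "${GDB:-gdb}" -batch -ex "thread apply all bt full" "$binary" "$core"
}

# Example call (hypothetical paths), saving the output for a bug report:
#   dump_all_threads /usr/local/libexec/slapd core.15692 > slapd-threads.txt 2>&1
```

Redirecting to a file keeps the per-thread output together, which tends to be long when "bt full" prints local variables for every frame.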
--On Thursday, April 10, 2008 10:36 AM -0400 John Morrissey jwm@horde.net wrote:
> FWIW, here's the complete backtrace. slapd's crashing about every three or four days under moderate load (three or four writes/second, maybe a dozen reads/sec). If there's anything else I can provide to help debug this, please let me know.
>
> Program received signal SIGABRT, Aborted.
> [Switching to Thread -1884222544 (LWP 15692)]
> 0xb7b8c947 in raise () from /lib/tls/libc.so.6
> #0 0xb7b8c947 in raise () from /lib/tls/libc.so.6
> #1 0xb7b8e0c9 in abort () from /lib/tls/libc.so.6
> #2 0xb7b8605f in __assert_fail () from /lib/tls/libc.so.6
> #3 0x08071d03 in connection_destroy (c=0x903143c8) at /var/jwm/o2/openldap/servers/slapd/connection.c:680
> #4 0x080726ab in connection_close (c=0x903143c8) at /var/jwm/o2/openldap/servers/slapd/connection.c:900
> #5 0x080740bd in connection_resched (conn=0x903143c8) at /var/jwm/o2/openldap/servers/slapd/connection.c:1722
> #6 0x08073009 in connection_operation (ctx=0x8fb102b4, arg_v=0x823eeb0) at /var/jwm/o2/openldap/servers/slapd/connection.c:1179
> #7 0xb7f0afd9 in ldap_int_thread_pool_wrapper (xpool=0x816f358) at /var/jwm/o2/openldap/libraries/libldap_r/tpool.c:478
> #8 0xb7c9b0bd in start_thread () from /lib/tls/libpthread.so.0
> #9 0xb7c3001e in clone () from /lib/tls/libc.so.6
Open an ITS with the relevant backtrace information, and add your configs to it, minus any passwords.
--Quanah
openldap-software@openldap.org