The changes made to 2.4.17 seem to have fixed the crashes in the caching module. Thanks for that.
We are still able to crash 2.4.17, however. It only happens after a heavy load has been placed on the producer continuously for more than 24 hours. Unfortunately, we've not been able to get good tracebacks. They all look like this:
(gdb) where
#0  0x00869410 in __kernel_vsyscall ()
#1  0x00390d80 in raise () from /lib/libc.so.6
#2  0x00392691 in abort () from /lib/libc.so.6
#3  0x0038a1fb in __assert_fail () from /lib/libc.so.6
#4  0x0808d532 in malloc ()
#5  0x0822c93f in ?? ()
#6  0x0822c933 in ?? ()
#7  0x00000039 in ?? ()
#8  0x0822c908 in ?? ()
#9  0x00000000 in ?? ()
The producer slowly grows its memory footprint. I can't tell if it's from just normal operations or memory leaks. I suspect it's a little of both. The end result, as you can see from the core above, is that there's likely some corrupted (or unfreed) memory somewhere. Sorry I can't nail it down further.
The load profile that we placed on the server is documented in my prior report. See above.
---
Tracy Stenvik
University Computing Services 354843.
University of Washington
email: imf@u.washington.edu
voice: (206) 685-3344