Hello,
I am having trouble with segfaults on Solaris SPARC. It looks like the sporadic error that some people, but not others, have seen during testing.
Oddly, this only happens for me on an older, unpatched box (myunhappyserver), and NOT on a more recently patched box (myhappyserver), with the same slapd/bdb binaries and libraries.
The "new" box (happily runs all OpenLDAP versions I have tried, up to and including the stable version based on 2.4.19) is:
SunOS myhappyserver 5.10 Generic_141414-07 sun4v sparc SUNW,Sun-Fire-T200
The "old" box (sometimes segfaults even on a simple start, though very occasionally it runs for a while before segfaulting) is:
SunOS myunhappyserver 5.10 Generic_127111-11 sun4v sparc SUNW,Sun-Fire-T200
From adb on "myunhappyserver" I get (in response to "adb slapd core_qgpro01_slapd_404_404_1260924086_4710"):
::status
debugging core file of slapd (64-bit) from myunhappyserver
file: slapd
initial argv: /usr/local/openldap/libexec/slapd -f /usr/local/openldap/etc/openldap/slapd_who
threading model: multi-threaded using native lwps
status: process terminated by SIGSEGV (Segmentation Fault)

::regs
%g0 = 0x0000000000000000                 %l0 = 0x000000010050ef48
%g1 = 0x000000010551e0d8                 %l1 = 0x0000000105522440
%g2 = 0x0000000105551220                 %l2 = 0x000000010050f020
%g3 = 0x0000000000000000                 %l3 = 0x00000001001225e0
%g4 = 0x0000000000000000                 %l4 = 0x0000000100122610
%g5 = 0x0000000000000008                 %l5 = 0x0000000000000000
%g6 = 0x0000000000000000                 %l6 = 0x00000001001bbdf8 avl_dup_error
%g7 = 0xffffffff7e802200                 %l7 = 0x0000000105522440
%o0 = 0x0000000000000000                 %i0 = 0x000000010551d110
%o1 = 0xffffffffffffffff                 %i1 = 0x0000000000000000
%o2 = 0x0000000105551220                 %i2 = 0x000000010050ef40
%o3 = 0xffffffff00000001                 %i3 = 0x0000000105522440
%o4 = 0x0000000000000000                 %i4 = 0x000000010050ef60
%o5 = 0x00000000ffffffff                 %i5 = 0x000000010050eef0
%o6 = 0xffffffff6c27e4f1                 %i6 = 0xffffffff6c27e6d1
%o7 = 0x0000000100122bd8 hdb_cache_find_parent+0x124
%i7 = 0x00000001001233a8 hdb_cache_find_id+0x13c

%ccr = 0x44 xcc=nZvc icc=nZvc
%y = 0x0000000000000000
%pc = 0x0000000100122c8c hdb_cache_find_parent+0x1d8
%npc = 0x0000000100122c90 hdb_cache_find_parent+0x1dc
%sp = 0xffffffff6c27e4f1
%fp = 0xffffffff6c27e6d1

%asi = 0x82
%fprs = 0x07
If I can do anything more useful with adb or the like, please provide an example and I'll run it.
The ldd for slapd on the "new" server returns :
-bash-3.00$ ldd /usr/local/openldap/libexec/slapd
libdb-4.8.so => /usr/local/openldap/lib/libdb-4.8.so
librt.so.1 => /lib/64/librt.so.1
libperl.so => /usr/local/lib/perl5/5.10.1/sun4-solaris-thread-multi-64/CORE/libperl.so
libm.so.2 => /lib/64/libm.so.2
libpthread.so.1 => /lib/64/libpthread.so.1
libicuuc.so.3 => /usr/lib/64/libicuuc.so.3
libicudata.so.3 => /usr/lib/64/libicudata.so.3
libsasl2.so.2 => /usr/local/openldap/lib/libsasl2.so.2
libdl.so.1 => /lib/64/libdl.so.1
libssl.so.0.9.7 => /usr/sfw/lib/sparcv9/libssl.so.0.9.7
libcrypto.so.0.9.7 => /usr/sfw/lib/sparcv9/libcrypto.so.0.9.7
libresolv.so.2 => /lib/64/libresolv.so.2
libgen.so.1 => /lib/64/libgen.so.1
libnsl.so.1 => /lib/64/libnsl.so.1
libsocket.so.1 => /lib/64/libsocket.so.1
libc.so.1 => /lib/64/libc.so.1
libaio.so.1 => /lib/64/libaio.so.1
libmd.so.1 => /lib/64/libmd.so.1
libCrun.so.1 => /usr/lib/64/libCrun.so.1
libmp.so.2 => /lib/64/libmp.so.2
libscf.so.1 => /lib/64/libscf.so.1
libdoor.so.1 => /lib/64/libdoor.so.1
libuutil.so.1 => /lib/64/libuutil.so.1
libssl_extra.so.0.9.7 => /usr/sfw/lib/sparcv9/libssl_extra.so.0.9.7
libcrypto_extra.so.0.9.7 => /usr/sfw/lib/sparcv9/libcrypto_extra.so.0.9.7
/platform/SUNW,Sun-Fire-T200/lib/sparcv9/libc_psr.so.1
/platform/SUNW,Sun-Fire-T200/lib/sparcv9/libmd_psr.so.1
The ldd for slapd on the "old" server additionally lists the following, which appear to be harmless:
libgss.so.1 => /usr/lib/64/libgss.so.1
libcmd.so.1 => /lib/64/libcmd.so.1
All compiled libraries are 64-bit, linked against the 64-bit Solaris libraries.
I am using the same Berkeley DB 4.8 libraries on all servers.
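
The ldd comparison described above can also be done mechanically rather than by eye. The following is a minimal sketch, assuming the ldd output from each server has been saved to a file; the function name libs_only_in and the file names ldd.old and ldd.new are hypothetical and do not come from the thread.

```shell
#!/bin/sh
# Hypothetical helper: print the library names that appear in the first saved
# ldd listing but not in the second. Each argument is a file holding the
# output of `ldd <binary>`, one "name => path" entry per line.
libs_only_in() {
    a=$(mktemp) && b=$(mktemp) || return 1
    # The first field of each ldd line is the library name (or a bare path).
    awk '{print $1}' "$1" | sort -u > "$a"
    awk '{print $1}' "$2" | sort -u > "$b"
    # comm -23 keeps the lines that are unique to the first sorted file.
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Example usage (file names are assumptions):
#   ldd /usr/local/openldap/libexec/slapd > ldd.old    # on the "old" box
#   ldd /usr/local/openldap/libexec/slapd > ldd.new    # on the "new" box
#   libs_only_in ldd.old ldd.new
```

Running it in both directions shows which libraries are unique to each server, which is the same information as the two extra lines listed above.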
Cheers,
Brett
On Thu, 17 Dec 2009, Brett @Google wrote:
> Oddly, this only happens for me on an older not-patched box myunhappyserver, and NOT a more recently patched box myhappyserver (same slapd/bdb binaries and libraries)
What makes you believe that you haven't figured this out in full (i.e. root cause is a corruption in a buggy system call; patch the system call and the corruption disappears)? With these data points, it sounds like you're going to go through a ton of work just to end up with the number of a Solaris bug report that's already marked as Fix Delivered.
--On Thursday, December 17, 2009 1:12 PM +1000 "Brett @Google" brett.maxfield@gmail.com wrote:
> Hello,
> I am having trouble with Segfaults on Solaris Sparc, it seems like the sporadic error that some people have coming up during testing, but not others.
> Oddly, this only happens for me on an older not-patched box myunhappyserver, and NOT a more recently patched box myhappyserver (same slapd/bdb binaries and libraries)
> The "new" box (happily runs all openldap versions i have tried, up to and including the stable version based on 2.4.19) is :
> SunOS myhappyserver 5.10 Generic_141414-07 sun4v sparc SUNW,Sun-Fire-T200
> The "old" box (segfaults even on simple start sometimes - but very occasionally it will run for awhile before it segfaults) is :
> SunOS myunhappyserver 5.10 Generic_127111-11 sun4v sparc SUNW,Sun-Fire-T200
> From adb on "myunhappyserver" i get (in response to "adb slapd core_qgpro01_slapd_404_404_1260924086_4710") :
Sounds like a question for Sun to me. If their patch levels fix the bug, then it must be a problem they know about?
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
On Thu, Dec 17, 2009 at 1:12 PM, Brett @Google brett.maxfield@gmail.com wrote:
> Hello,
> I am having trouble with Segfaults on Solaris Sparc, it seems like the sporadic error that some people have coming up during testing, but not others.
> Oddly, this only happens for me on an older not-patched box myunhappyserver, and NOT a more recently patched box myhappyserver (same slapd/bdb binaries and libraries)
This has turned out to be a "Sun" issue or user error, depending on your opinion, and it only affects the Sun Studio 12 compiler:
I was using: CFLAGS="-fast -xtarget=ultraT1 -xarch=sparcvis2 -xcode=pic32 -g -xs -O"
For some reason, using -O (which translates to -xO3) may have conflicted with or overridden the Sun "macro" option -fast, which implies -xO5 and quite a few other options.
The following is now working OK (no segfaults): CFLAGS="-fast -xtarget=ultraT1 -xarch=sparcvis2 -xcode=pic32 -m64"
I also prefixed "-fast -m64" to LDFLAGS as per the Sun docs, so this might also have helped.
(The omission of -g -xs is coincidental; they are only options to turn on debugging.)
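
For anyone wanting to catch the same combination before building, here is a minimal sketch of a check that flags a CFLAGS string mixing -fast with an explicit optimisation level (-O or -xO1 through -xO5). The function name check_opt_conflict is hypothetical, and the check only looks at the flag strings; it does not invoke the compiler or confirm how Sun Studio resolves the two options.

```shell
#!/bin/sh
# Hypothetical helper: report "conflict" when a flag string contains both the
# -fast macro and an explicit optimisation level, otherwise report "ok".
check_opt_conflict() {
    has_fast=no
    has_level=no
    # Intentionally unquoted so the flag string splits on whitespace.
    for flag in $1; do
        case "$flag" in
            -fast)        has_fast=yes ;;
            -O|-xO[1-5])  has_level=yes ;;
        esac
    done
    if [ "$has_fast" = yes ] && [ "$has_level" = yes ]; then
        echo "conflict"
    else
        echo "ok"
    fi
}

# The two flag sets quoted in this message.
OLD_CFLAGS="-fast -xtarget=ultraT1 -xarch=sparcvis2 -xcode=pic32 -g -xs -O"
NEW_CFLAGS="-fast -xtarget=ultraT1 -xarch=sparcvis2 -xcode=pic32 -m64"

check_opt_conflict "$OLD_CFLAGS"
check_opt_conflict "$NEW_CFLAGS"
```

The first call covers the flags that produced the segfaulting binary and the second covers the flags that now work, so the helper separates the two cases described here.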
Cheers,
Brett
openldap-software@openldap.org