Fixes to the dreaded Solaris hang with back-monitor included. Please test. :)
--Quanah
--
Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration
On 1/13/11 21:08, Quanah Gibson-Mount wrote:
Fixes to the dreaded Solaris hang with back-monitor included. Please test. :)
All clear on OS X 10.6.6 with -march x86_64
jens
On Thu, 13 Jan 2011 12:08:23 -0800, Quanah Gibson-Mount quanah@zimbra.com wrote:
Fixes to the dreaded Solaris hang with back-monitor included. Please test. :)
./scripts/test020-proxycache failed for hdb (exit 255)
make: *** [hdb-yes] Fehler 255
openSuSE-11.3 x86_64
-Dieter
--On Friday, January 14, 2011 6:25 PM +0100 Dieter Kluenter dieter@dkluenter.de wrote:
[...]
./scripts/test020-proxycache failed for hdb (exit 255)
make: *** [hdb-yes] Fehler 255
While I appreciate you taking the time to test, this report provides no useful information. Please provide some data that can actually be examined for issues.
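For reference, a single failing test can be re-run on its own so that the logs under tests/testrun/ survive for inspection; a minimal sketch, assuming a built re24 tree and that the run script's -b flag selects the backend:

  cd tests
  ./run -b hdb test020-proxycache   # on failure, configs and logs are left in testrun/
  ls testrun/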
--Quanah
On Fri, 14 Jan 2011 12:38:31 -0800, Quanah Gibson-Mount quanah@zimbra.com wrote:
[...]
While I appreciate you taking the time to test, this report provides no useful information. Please provide some data that can actually be examined for issues.
This submission was unintentional and sent too early, sorry.
Starting test020-proxycache for hdb...
Starting master slapd on TCP/IP port 9011...
Using ldapsearch to check that master slapd is running...
Using ldapadd to populate the master directory...
Starting proxy cache on TCP/IP port 9012...
Using ldapsearch to check that proxy slapd is running...
Waiting 5 seconds for slapd to start...
Waiting 5 seconds for slapd to start...
Waiting 5 seconds for slapd to start...
Waiting 5 seconds for slapd to start...
Waiting 5 seconds for slapd to start...
Waiting 5 seconds for slapd to start...
ldapsearch failed (255)!
./scripts/test020-proxycache failed for hdb (exit 255)
make: *** [hdb-yes] Fehler 255
slapd.2.log doesn't show much; the last lines were:
put_filter: simple
put_simple_filter: "namingContexts:distinguishedNameMatch:=dc=example,dc=com"
ber_scanf fmt ({mm}) ber:
ber_scanf fmt ({mm}) ber:
ber_scanf fmt ({t) ber:
ber_scanf fmt (m) ber:
ber_scanf fmt (t) ber:
ber_scanf fmt (m) ber:
ber_scanf fmt (t) ber:
ber_scanf fmt (m) ber:
ber_scanf fmt (}) ber:
dnPretty: <dc=example,dc=com>
<<< dnPretty: <dc=example,dc=com>
dnNormalize: <dc=example,dc=com>
<<< dnNormalize: <dc=example,dc=com>
=> monitor_back_search
dnNormalize: <dc=example,dc=com>
<<< dnNormalize: <dc=example,dc=com>
send_ldap_result: conn=-1 op=0 p=0
The core doesn't provide much information either:
Core was generated by `/home/dieter/build/openldap/servers/slapd/.libs/lt-slapd -s0 -f /home/dieter/bu'.
Program terminated with signal 11, Segmentation fault.
#0  0x00002b8afa89117d in ?? ()
(gdb) bt
#0  0x00002b8afa89117d in ?? ()
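Loading the core against the libtool binary with full symbol output might still recover something; a sketch, using the paths from the transcript above:

  gdb /home/dieter/build/openldap/servers/slapd/.libs/lt-slapd core
  (gdb) bt full              # backtrace with locals, if debug symbols are present
  (gdb) info sharedlibrary   # which loaded module owns the faulting address?
  (gdb) x/i $pc              # instruction at the fault, to compare against modules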
-Dieter
Dieter Kluenter wrote:
[...]
Maybe you can start slapd manually (-h ldap://:9012 -f testrun/slapd.2.conf -d args,trace,stats) and see if it becomes responsive and so on. What you show is the result of an internal operation (conn=-1), and there seems to be no logging related to the ldapsearch that checks whether it started correctly.
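Spelled out, that would be something like this (a sketch; port and paths as test020 uses them, started from the tests/ directory, with the probe mirroring the filter seen in the log above rather than the harness's exact invocation):

  ../servers/slapd/.libs/lt-slapd -h ldap://:9012/ -f testrun/slapd.2.conf -d args,trace,stats
  # from a second shell, check whether the server answers at all:
  ldapsearch -x -H ldap://localhost:9012 -s base -b '' \
      '(namingContexts:distinguishedNameMatch:=dc=example,dc=com)' namingContexts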
p.
On Sat, 15 Jan 2011 18:01:31 +0100 (CET), masarati@aero.polimi.it wrote:
[...]
Maybe you can start slapd manually (-h ldap://:9012 -f testrun/slapd.2.conf -d args,trace,stats) and see if it becomes responsive and so on. What you show is the result of an internal operation (conn=-1), and there seems to be no logging related to the ldapsearch that checks whether it started correctly.
I did run both slapd manually with -d256; this is the outcome:

<server 1>
../servers/slapd/.libs/lt-slapd -h ldap://:9010/ -f testrun/slapd.1.conf -d256
@(#) $OpenLDAP: slapd 2.4.X (Jan 15 2011 16:45:59) $
	dieter@rubin:/home/dieter/build/openldap/servers/slapd
hdb_db_open: warning - no DB_CONFIG file found in directory /home/dieter/build/openldap/tests/testrun/db.1.a: (2). Expect poor performance for suffix "dc=example,dc=com".
slapd starting
<server 2>
../servers/slapd/.libs/lt-slapd -h ldap://:9011/ -f testrun/slapd.2.conf -d256
@(#) $OpenLDAP: slapd 2.4.X (Jan 15 2011 16:45:59) $
	dieter@rubin:/home/dieter/build/openldap/servers/slapd
hdb_db_open: database "dc=example,dc=com": unclean shutdown detected; attempting recovery.
hdb_db_open: warning - no DB_CONFIG file found in directory /home/dieter/build/openldap/tests/testrun/db.2.a: (2). Expect poor performance for suffix "dc=example,dc=com".
and then the process hangs indefinitely; an attached strace constantly shows
sched_yield() = 0
sched_yield() = 0
sched_yield() = 0
sched_yield() = 0
sched_yield() = 0
Only a kill -9 terminated the process.
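When a process spins in sched_yield() like this, per-thread stacks from gdb usually reveal which lock it is contending on; a sketch, where <pid> stands for the spinning slapd (a placeholder, not from the report):

  gdb -p <pid>
  (gdb) thread apply all bt   # dump every thread's stack; look for mutex/yield frames
  (gdb) detach
  (gdb) quit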
Running server 2 with -d-1 shows these last lines:
access_allowed: search access to "cn=Database 2,cn=Databases,cn=Monitor" "monitoredInfo" requested
=> slap_access_allowed: backend default search access granted to "(anonymous)"
=> access_allowed: search access granted by read(=rscxd)
<= test_filter 5
<= test_filter_and 5
<= test_filter 5
send_ldap_result: conn=-1 op=0 p=0
send_ldap_result: err=0 matched="" text=""
and here it hangs
-Dieter
On Sat, 15 Jan 2011 18:01:31 +0100 (CET), masarati@aero.polimi.it wrote:
[...]
This error might be due to -DBDB_MONITOR_IDX; if I compile without this flag, all tests run well. But there might be something else strange in my script; please check:
export BDBDIR="/usr/local/BerkeleyDB.4.8"
export CFLAGS="-DBDB_MONITOR_IDX -g3 -march=athlon64"
export LDFLAGS="-L${BDBDIR}/lib -R${BDBDIR}/lib"
export CPPFLAGS="-I${BDBDIR}/include"
PREFIX="/home/dieter/openldap/"
DATABASE="hdb"
make distclean ; ./configure \
	--prefix=${PREFIX} \
	--enable-dynamic \
	--enable-aci \
	--enable-modules \
	--enable-rewrite \
	--enable-bdb=yes \
	--enable-hdb=yes \
	--enable-ldap=yes \
	--enable-monitor=yes \
	--enable-meta=mod \
	--enable-perl=mod \
	--enable-relay=mod \
	--enable-monitor=yes \
	--enable-sql=mod \
	--enable-overlays=mod
make depend && make && cd tests
export DB_CONFIG=/tmp/slapd1/DB_CONFIG
export USE_SASL=yes
export SLAPD_DEBUG=1
sleep 5 ; make $DATABASE ; exit 0
-Dieter
Dieter Kluenter wrote:
[...]
This error might be due to -DBDB_MONITOR_IDX; if I compile without this flag, all tests run well. But there might be something else strange in my script; please check:
[...]
Cannot reproduce; I built re24 with all backends/overlays built as dynamic modules and with -DBDB_MONITOR_IDX, and this test seems to work fine, including correctly populating the olmBDBNotIndexed attribute for non-indexed searches. Same when using exactly your configure options. Any other hint?
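For reference, the attribute can also be checked directly against the monitor backend of the running proxy; a sketch, assuming test020's proxy port and that cn=Monitor is readable anonymously (adjust bind options otherwise):

  ldapsearch -x -H ldap://localhost:9012 \
      -b 'cn=Databases,cn=Monitor' '(objectClass=*)' olmBDBNotIndexed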
p.
On Mon, 17 Jan 2011 13:12:24 +0100, Pierangelo Masarati masarati@aero.polimi.it wrote:
[...]
Cannot reproduce; I built re24 with all backends/overlays built as dynamic modules and with -DBDB_MONITOR_IDX, and this test seems to work fine, including correctly populating the olmBDBNotIndexed attribute for non-indexed searches. Same when using exactly your configure options. Any other hint?
Unfortunately I have no hint. Over the last few days I have compiled re24 multiple times, always with the same result. OS and software are installed on bare metal, not in a virtual machine; all I can think of is the btrfs file system. OTOH, I just compiled and successfully tested today's HEAD, the only difference being an ext4 file system.
-Dieter
Dieter Kluenter wrote:
[...]
Unfortunately I have no hint. Over the last few days I have compiled re24 multiple times, always with the same result. OS and software are installed on bare metal, not in a virtual machine; all I can think of is the btrfs file system. OTOH, I just compiled and successfully tested today's HEAD, the only difference being an ext4 file system.
Another possible difference is that I used no optimization (-O0 -g), because I need to be able to step through with gdb.
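A sketch of that build variant, assuming the same configure invocation as in the script quoted earlier in the thread, with only CFLAGS changed:

  export CFLAGS="-DBDB_MONITOR_IDX -O0 -g"
  make distclean
  ./configure ...   # same options as in the script above
  make depend && make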
p.
Dieter Kluenter wrote:
./scripts/test020-proxycache failed for hdb (exit 255)
make: *** [hdb-yes] Fehler 255
openSuSE-11.3 x86_64
I'm also running the build and tests on openSUSE 11.3 x86_64, with all the required -devel packages installed from the openSUSE 11.3 distro. No errors for me with 15+ iterations.
So I guess what's needed to track this down are the logs in the directory tests/testrun/ after a failure.
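Something along these lines would capture the relevant state after a failed run (a sketch; test020 starts two servers, hence two configs and two logs):

  cd tests
  tar czf test020-failure.tgz \
      testrun/slapd.1.conf testrun/slapd.2.conf \
      testrun/slapd.1.log testrun/slapd.2.log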
Ciao, Michael.