Hello
I'm setting up an OpenLDAP directory server (2.4.13), including a second one as a backup/failover partner. After compiling, installing, and configuring everything (database, sync, schema, etc.) and importing the basic LDAP layout (ou=Users, ou=Groups, etc.), I wanted to use this directory as the user directory for user authentication via LDAP.
I switched user/group lookups over using the 'ldapclient' command and modified /etc/nsswitch.conf to use "files ldap" for passwd and group.
Everything seems to work: 'getent passwd' and 'getent group' list my LDAP users and groups. But when I try to restart the slapd server, it sometimes crashes with a core dump.
----------------------------------------------------------------------
# /usr/local/libexec/slapd -d 65535 -u openldap -g openldap
@(#) $OpenLDAP: slapd 2.4.13 (Jan 30 2009 12:02:48) $
        root@ldapserver:/usr/local/src/openldap-2.4.13/servers/slapd
ldap_pvt_gethostbyname_a: host=ldapserver, r=0
daemon_init: <null>
daemon_init: listen on ldap:///
daemon_init: 1 listeners to open...
ldap_url_parse_ext(ldap:///)
daemon: listener initialized ldap:///
daemon_init: 2 listeners opened
ldap_create
Bus Error (core dumped)
----------------------------------------------------------------------
Not every time: sometimes several times in a row, sometimes only on the second start. I have no clue what it could be.
In the LDAP logfile, I found the following two lines:
----------------------------------------------------------------------
Jan 30 16:26:56 ldapserver slapd[9494]: [ID 555073 local4.error] tid= 1: multiple threads per connection not supported
Jan 30 16:26:56 ldapserver slapd[9494]: [ID 555073 local4.error] tid= 1: multiple threads per connection not supported
----------------------------------------------------------------------
I then ran the slapd server under "truss" to see at which point it dumps core.
----------------------------------------------------------------------
# truss /usr/local/libexec/slapd -d 65535 -u openldap -g openldap
[...]
open("/etc/nsswitch.conf", O_RDONLY|O_LARGEFILE) = 9
fcntl(9, F_DUPFD, 0x00000100) Err#22 EINVAL
read(9, " #\n # C o p y r i g h".., 1024) = 1024
read(9, " g u r e i t o u t ".., 1024) = 245
read(9, 0xFF092400, 1024) = 0
close(9) = 0
fstat(3, 0xFFBFCAE8) = 0
time() = 1233329411
getpid() = 9512 [9511]
putmsg(3, 0xFFBFC1A0, 0xFFBFC194, 0) = 0
open("/var/run/syslog_door", O_RDONLY) = 9
door_info(9, 0xFFBFC0D8) = 0
getpid() = 9512 [9511]
door_call(9, 0xFFBFC0C0) = 0
close(9) = 0
fstat(3, 0xFFBFCB88) = 0
time() = 1233329411
getpid() = 9512 [9511]
putmsg(3, 0xFFBFC240, 0xFFBFC234, 0) = 0
open("/var/run/syslog_door", O_RDONLY) = 9
door_info(9, 0xFFBFC178) = 0
getpid() = 9512 [9511]
door_call(9, 0xFFBFC160) = 0
close(9) = 0
    Incurred fault #5, FLTACCESS %pc = 0x0008E1FC
      siginfo: SIGBUS BUS_ADRALN addr=0x00000191
    Received signal #10, SIGBUS [default]
      siginfo: SIGBUS BUS_ADRALN addr=0x00000191
----------------------------------------------------------------------
It looks like the server crashes shortly after reading nsswitch.conf. I changed the following line in nsswitch.conf, and now the server starts fine without any further problems (even 20 times in a row).
----------------------------------------------------------------------
Before: group files ldap
After: group files
----------------------------------------------------------------------
Another thing: once the server has started up successfully, it never crashes again. The crash only happens, and only sometimes, during the initial startup.
I would be grateful if anyone could help me or point out what I could adjust. I can provide more information if needed.
Daniel Hoffend wrote:
> Everything seems to work: 'getent passwd' and 'getent group' list my LDAP users and groups. But when I try to restart the slapd server, it sometimes crashes with a core dump.
Whenever you get a coredump, the most useful thing to do is to actually examine the coredump. Use a debugger to get a stack trace from the core file.
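As a rough illustration of that advice, here is a minimal shell sketch for pulling a stack trace out of a core file. The function name `core_backtrace` is made up for this example. It assumes a POSIX shell such as bash, that `pstack` accepts a core file (as it does on Solaris), and that `gdb` is the fallback if it happens to be installed.

```shell
# Hypothetical helper: print a stack trace from a core file with
# whichever of pstack or gdb is available on the machine.
core_backtrace() {
    binary=$1   # the program that crashed, e.g. /usr/local/libexec/slapd
    core=$2     # the core file it left behind
    if [ ! -f "$core" ]; then
        echo "no core file at $core" >&2
        return 1
    fi
    if command -v pstack >/dev/null 2>&1; then
        # Solaris pstack reads the stack directly from a core file
        pstack "$core"
    elif command -v gdb >/dev/null 2>&1; then
        # gdb needs the binary as well as the core to resolve symbols
        gdb -batch -ex bt "$binary" "$core"
    else
        echo "neither pstack nor gdb found" >&2
        return 1
    fi
}
```

It would be called as `core_backtrace /usr/local/libexec/slapd core` from the directory where slapd dumped core. A binary built with debugging symbols should give a more readable trace.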
> In the LDAP logfile, I found the following two lines:
> Jan 30 16:26:56 ldapserver slapd[9494]: [ID 555073 local4.error] tid= 1: multiple threads per connection not supported
> Jan 30 16:26:56 ldapserver slapd[9494]: [ID 555073 local4.error] tid= 1: multiple threads per connection not supported
There is no such message anywhere in the OpenLDAP code. I suspect you're using the Sun NSS libraries and hitting a conflict between OpenLDAP's and Sun's libldap. It's worth noting that in future OpenSolaris releases, they will be shipping OpenLDAP's libraries and abandoning their old libldap.
Howard Chu wrote:
> Whenever you get a coredump, the most useful thing to do is to actually examine the coredump. Use a debugger to get a stack trace from the core file.
Okay, then I'll take a look at the core files with a debugger.
> There is no such message anywhere in the OpenLDAP code. I suspect you're using the Sun NSS libraries and hitting a conflict between OpenLDAP's and Sun's libldap. It's worth noting that in future OpenSolaris releases, they will be shipping OpenLDAP's libraries and abandoning their old libldap.
That could be true; I already found something on Google about the conflict between the Sun libldap and the one from OpenLDAP. I just have to find out whether I have any chance of replacing it on a Solaris 10 system, or find another way to run OpenLDAP without running into such a conflict.
-- regards Daniel Hoffend
At Sat, 31 Jan 2009 02:56:06 +0100, dh@dotlan.net wrote:
> > There is no such message anywhere in the OpenLDAP code. I suspect you're using the Sun NSS libraries and hitting a conflict between OpenLDAP's and Sun's libldap. It's worth noting that in future OpenSolaris releases, they will be shipping OpenLDAP's libraries and abandoning their old libldap.
> That could be true; I already found something on Google about the conflict between the Sun libldap and the one from OpenLDAP. I just have to find out whether I have any chance of replacing it on a Solaris 10 system, or find another way to run OpenLDAP without running into such a conflict.
I think you can avoid this problem with:
$ pwd
/path/to/openldap-2.4.13
$ ./configure --disable-shared --enable-static --enable-slapd ...
or run the attached ad-hoc script:
$ pwd
/path/to/openldap-2.4.13
$ /path/to/penldap-rename-symbols.pl YOUR_SITE_NAME \
    include/*.h include/*.hin
$ perl -i.dist -pe 's/^(CFLAGS\s*=.*)$/$1 -DLDAP_DEPRECATED/' \
    clients/tools/Makefile
$ ./configure --enable-shared --enable-slapd ...
Hi Satoh
Thank you for the tip. I recompiled the OpenLDAP server as a static binary, but it still (sometimes) crashes on startup at the same point.
greetings Daniel Hoffend
Hello
My slapd server is still crashing. I already recompiled slapd as a static binary, but that didn't change anything. I still have no clue what is causing this random crash during the startup phase.
Howard Chu wrote:
> Whenever you get a coredump, the most useful thing to do is to actually examine the coredump. Use a debugger to get a stack trace from the core file.
Okay, I don't really know how to process core files. When I run pstack on the core file, I get this output:
----------------------------------------------------------------------
-bash-3.00# pstack core
core 'core' of 5861: /usr/local/libexec/slapd -d 511 -u 500 -g 500
 0008e1fc slap_sl_malloc (18, 185, ff092a00, ff3f4910, fe2c3040, ff3f6a08) + 14
 0008e594 slap_sl_calloc (1, 18, 185, 0, 23a4a0, 0) + 14
 00153e84 ber_memcalloc_x (1, 18, 185, 0, ff3f42f0, 0) + 54
 001519b0 ber_start_seqorset (2430c0, 30, 7b000000, 7b, 15226c, 15237c) + 50
 00152270 ber_printf (2430c0, fe066fd4, 1, ffbfcecc, 6e2c6463, 2431d8) + 2d8
 fe052c08 simple_bind_nolock (243520, 2431d8, 23cd30, ffffffff, 29704, 0) + 318
 fe0a8e94 openConnection (ffbfdfe0, 23be30, 23c4c0, 2, 242a0c, 1) + 67c
 fe0a7e98 makeConnection (1, 0, 23c4c0, 23be90, ffbfdfd0, 242a0c) + 364
 fe0aa31c __s_api_getConnection (0, 0, 0, ffbfe8dc, 23b048, 242a0c) + 578
 fe09c6dc get_current_session (2429c8, 0, 3, 260, c, ffffffff) + 38
 fe09d41c search_state_machine (2429c8, 242a00, 0, 1, 0, fe0ca000) + 208
 fe09e0e4 __ns_ldap_list (fe0eeeac, ffbff380, 24299c, fe0eea28, 0, 0) + 220
 fe0dab44 _nss_ldap_nocb_lookup (242980, 0, fe0eeeac, ffbff380, 0, fe0db300) + 44
 fe0d4d00 getbymember (242980, ffbffa54, 193dc, ff092a00, fe25d0d8, ffbff280) + e8
 fe25dc14 nss_search (fe0d4c18, fe2f1910, ff090340, 2, fe2f77bc, 1) + 20c
 fe248124 _getgroupsbymember (23bbc0, 23b278, 10, 1, ffbff654, fe3c1948) + b4
 fe2519cc initgroups (23bbc0, 1f4, fe2f3700, ff092a00, 23b278, 10) + 70
 00068860 slap_init_user (23bbc0, 23ad58, 1d2c00, 10, 23a800, 1) + e4
 0002641c main (7, ffbffc4c, 0, 0, 23ad58, 23ad48) + d24
 000251bc _start (0, 0, 0, 0, 0, 0) + 5c
----------------------------------------------------------------------
I would be grateful for any help. Otherwise it seems I'll have to write a custom start script that starts the LDAP server several times in a row until it actually comes up, instead of fixing the error for real.
Thanks Daniel Hoffend
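For reference, the retry-style start script mentioned above could look roughly like the sketch below. It is only a stopgap that hides the crash rather than fixing it. The function name `retry_start` is made up for this example, and it assumes a POSIX shell such as bash and that a failed start shows up as a non-zero exit status of the start command.

```shell
# Hypothetical helper: run a start command up to max_tries times and
# stop at the first attempt that exits successfully.
retry_start() {
    max_tries=$1
    shift
    try=1
    while [ "$try" -le "$max_tries" ]; do
        # capture the exit status without tripping "set -e"
        "$@" && status=0 || status=$?
        if [ "$status" -eq 0 ]; then
            echo "started on attempt $try"
            return 0
        fi
        echo "attempt $try failed with exit status $status" >&2
        try=$((try + 1))
        sleep 1   # short pause before the next attempt
    done
    return 1
}
```

A call such as `retry_start 5 /usr/local/libexec/slapd -u openldap -g openldap` would try up to five times. Each failed attempt may still leave a core file behind.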
--On Friday, February 13, 2009 3:15 PM +0100 Daniel Hoffend dh@dotlan.net wrote:
> -bash-3.00# pstack core
> core 'core' of 5861: /usr/local/libexec/slapd -d 511 -u 500 -g 500
> 0008e1fc slap_sl_malloc (18, 185, ff092a00, ff3f4910, fe2c3040, ff3f6a08) + 14
Looks like you ran out of memory to me.
--Quanah
--
Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc -------------------- Zimbra :: the leader in open source messaging and collaboration
Quanah Gibson-Mount wrote:
> Looks like you ran out of memory to me.
No. It's memory corruption because his Solaris libldap is calling into OpenLDAP's liblber. The only reliable solution is to remove Solaris libldap from the machine and rebuild nss_ldap using OpenLDAP's libraries.
On Sat, Feb 14, 2009 at 7:40 AM, Howard Chu hyc@symas.com wrote:
> > Looks like you ran out of memory to me.
> No. It's memory corruption because his Solaris libldap is calling into OpenLDAP's liblber. The only reliable solution is to remove Solaris libldap from the machine and rebuild nss_ldap using OpenLDAP's libraries.
Wouldn't setting LD_RUN_PATH=/my/openldap/lib or passing -R/my/openldap/lib (for shared libraries), or -L/my/openldap/lib (for static libraries), at build time normally be sufficient? That would either fix the shared library search path used at runtime or link the OpenLDAP libraries in at compile time, and so avoid pulling in the unwanted Solaris library.
If somebody has a split administrative model, it can be very hard to tinker with "stock" Solaris; changing or removing the Solaris LDAP libraries may not be possible.
Cheers Brett
Brett @Google wrote:
> Wouldn't setting LD_RUN_PATH=/my/openldap/lib or passing -R/my/openldap/lib (for shared libraries), or -L/my/openldap/lib (for static libraries), at build time normally be sufficient? That would either fix the shared library search path used at runtime or link the OpenLDAP libraries in at compile time, and so avoid pulling in the unwanted Solaris library.
> If somebody has a split administrative model, it can be very hard to tinker with "stock" Solaris; changing or removing the Solaris LDAP libraries may not be possible.
The nss_ldap module still needs to be rebuilt. If you arrange for only the OpenLDAP libraries to be present in the process, the nss_ldap module will still crash if it was originally built against the Sun LDAP. They're source-compatible, but they're not binary-compatible.
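One way to see which LDAP libraries are involved is to look at what a binary or module is linked against. The sketch below is only an illustration: `ldap_libs` is a made-up helper that filters `ldd`-style output for libldap and liblber, assuming a POSIX shell such as bash.

```shell
# Hypothetical helper: pass through only the lines of ldd-style output
# (read from stdin) that mention libldap or liblber.
ldap_libs() {
    while IFS= read -r line; do
        case $line in
            *libldap*|*liblber*) printf '%s\n' "$line" ;;
        esac
    done
}
```

For example, `ldd /usr/local/libexec/slapd | ldap_libs` and `ldd /usr/lib/nss_ldap.so | ldap_libs` (paths as mentioned in this thread) show which copies each one references. If the two resolve to different implementations, both can end up loaded in the same slapd process. On Solaris, `pldd` with the PID of a running process lists the libraries actually loaded.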
Howard Chu wrote:
> The nss_ldap module still needs to be rebuilt. If you arrange for only the OpenLDAP libraries to be present in the process, the nss_ldap module will still crash if it was originally built against the Sun LDAP. They're source-compatible, but they're not binary-compatible.
Okay, I just found this howto: http://mctalby.mc.man.ac.uk/~mc/_small_stuff/solaris_10_ldap_auth.html
It covers recompiling nss_ldap and pam_ldap as well. It seems I have to replace /usr/lib/nss_ldap.so. Hmm, then I'll have to reconfigure my Solaris 10 zone, because /usr is shared on the server (except /usr/local).
As I'm the admin of the server I can change everything, but I would like to keep the differences between this zone and the other zones small and not maintain a full copy of /usr for every Solaris zone.
I'll try this howto next week and see whether it works.
openldap-technical@openldap.org