slapd breaks NSS, NSS breaks slapd


manu＠netbsd.org

11 Aug 2008

10:51 a.m.

Hello

I have a machine where slapd will not start if ldap://localhost is listed in /etc/nss_ldap.conf. It just hangs:

# slapd -u slapd -h ldap://localhost -d5
@(#) $OpenLDAP: slapd 2.4.8 (Jun 24 2008 04:21:32) $
root@:/pkg_comp/obj/pkgsrc/databases/openldap-server/default/openldap-2.4.8/servers/slapd
daemon_init: ldap://localhost
daemon_init: listen on ldap://localhost
daemon_init: 1 listeners to open...
ldap_url_parse_ext(ldap://localhost)
daemon: listener initialized ldap://localhost
daemon_init: 2 listeners opened
ldap_create
ldap_url_parse_ext(ldap://127.0.0.1)
ldap_create
ldap_url_parse_ext(ldap://127.0.0.1)
ldap_create
ldap_url_parse_ext(ldap://127.0.0.1)
ldap_simple_bind
ldap_sasl_bind
ldap_send_initial_request
ldap_new_connection 1 1 0
ldap_int_open_connection
ldap_connect_to_host: TCP 127.0.0.1:389
ldap_new_socket: 8
ldap_prepare_socket: 8
ldap_connect_to_host: Trying 127.0.0.1:389
ldap_pvt_connect: fd: 8 tm: 30 async: 0
ldap_ndelay_on: 8
ldap_int_poll: fd: 8 tm: 30

If I remove ldap://localhost from nss_ldap.conf, it works fine. Any idea how to get that working?

Here is nss_ldap.conf:
BASE dc=example,dc=net
URI ldap://localhost ldap://ldap.example.net
TLS_CACERT /etc/openssl/certs/ca.crt
TLS_REQCERT demand

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz manu@netbsd.org


Dmitriy Kirhlarov

11 Aug

10:59 a.m.

Emmanuel Dreyfus wrote:

...

Hello

I have a machine where slapd will not start if ldap://localhost is listed in /etc/nss_ldap.conf. It just hangs:

...

If I remove ldap://localhost from nss_ldap.conf, it works fine. Any idea how to get that working?

Add:
---
nss_reconnect_sleeptime 0
nss_reconnect_maxsleeptime 1
nss_reconnect_maxconntries 1
---

For details read: http://www.liquidx.net/blog/2006/04/03/nss_ldap-undocumented-nss_reconnect_t...
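To see why lowering these values helps, here is a rough Python model of a client that keeps retrying a dead server, sleeping between attempts with a delay that doubles up to a ceiling. It is a sketch under assumptions: the function names are hypothetical, and reading nss_reconnect_maxconntries as "total number of attempts" is a simplification, not a description of nss_ldap's actual retry loop.

```python
# Simplified model of a reconnect loop with capped, doubling sleeps.
# Parameter names mirror the nss_reconnect_* options, but the retry rule
# is an ASSUMPTION for illustration, not nss_ldap's source code.


def reconnect_schedule(sleeptime, maxsleeptime, maxconntries):
    """Return the sleeps (seconds) taken between attempts when every
    connection attempt fails. One initial attempt, then retries."""
    sleeps = []
    backoff = sleeptime
    # Assumed: maxconntries counts all attempts, so there are
    # (maxconntries - 1) retries, each preceded by a sleep.
    for _retry in range(1, maxconntries):
        sleeps.append(backoff)
        # Double the delay, but never beyond the configured ceiling.
        backoff = min(backoff * 2, maxsleeptime)
    return sleeps


def worst_case_wait(sleeptime, maxsleeptime, maxconntries):
    """Total time spent sleeping if the server never answers."""
    return sum(reconnect_schedule(sleeptime, maxsleeptime, maxconntries))
```

With the values suggested above (0, 1, 1) the model gives up after one attempt without sleeping at all; with larger, hypothetical values such as (4, 64, 8) the caller would sit in sleeps for several minutes before the lookup fails.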

...

Here is nss_ldap.conf:
BASE dc=example,dc=net
URI ldap://localhost ldap://ldap.example.net
TLS_CACERT /etc/openssl/certs/ca.crt
TLS_REQCERT demand

That is the library config format (/etc/openldap/ldap.conf), which is not the same as the application config (/etc/ldap.conf). nss_ldap.conf can be a symlink to the application config.

WBR. Dmitriy

Howard Chu

11:07 a.m.

Emmanuel Dreyfus wrote:

...

Hello

I have a machine where slapd will not start if ldap://localhost is listed in /etc/nss_ldap.conf. It just hangs:

Get a gdb backtrace of the hang.

Show us your nsswitch.conf...

-- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/

manu＠netbsd.org

12:52 p.m.

Howard Chu hyc@symas.com wrote:

...

Get a gdb backtrace of the hang.

Show us your nsswitch.conf...

Here is nsswitch.conf:
group: files ldap
group_compat: nis
hosts: files dns
netgroup: files [notfound=return] nis
networks: files
passwd: files ldap
passwd_compat: nis
shells: files

User slapd and group ldap are resolved locally. In /etc/passwd:
slapd:*:402:497:openldap-server slapd user:/nonexistent:/sbin/nologin

In /etc/group:
ldap:*:497:

$ id slapd
uid=402(slapd) gid=497(ldap) groups=497(ldap)

Here is the backtrace (the bottom frames are not very helpful, but fortunately there is only one occurrence of initgroups in the slapd sources):

#0 0xbb7d6167 in poll () from /lib/libc.so.12
#1 0xbb89ce49 in poll () from /usr/lib/libpthread.so.0
#2 0xbbbd002d in ldap_int_poll () from /usr/pkg/lib/libldap_r-2.4.so.2
#3 0xbbbd0632 in ldap_connect_to_host () from /usr/pkg/lib/libldap_r-2.4.so.2
#4 0xbbbbbffe in ldap_int_open_connection () from /usr/pkg/lib/libldap_r-2.4.so.2
#5 0xbbbce04d in ldap_new_connection () from /usr/pkg/lib/libldap_r-2.4.so.2
#6 0xbbbbbf27 in ldap_open_defconn () from /usr/pkg/lib/libldap_r-2.4.so.2
#7 0xbbbcea73 in ldap_send_initial_request () from /usr/pkg/lib/libldap_r-2.4.so.2
#8 0xbbbc4fe5 in ldap_sasl_bind () from /usr/pkg/lib/libldap_r-2.4.so.2
#9 0xbbbc5433 in ldap_simple_bind () from /usr/pkg/lib/libldap_r-2.4.so.2
#10 0xbb753bc7 in _nss_ldap_init () from /usr/lib/nss_ldap.so.0
#11 0xbb755857 in _nss_ldap_ent_context_init_locked () from /usr/lib/nss_ldap.so.0
#12 0xbb755cc3 in _nss_ldap_search () from /usr/lib/nss_ldap.so.0
#13 0xbb755f68 in _nss_ldap_getent_ex () from /usr/lib/nss_ldap.so.0
#14 0xbb757c69 in _nss_ldap_initgroups_dyn () from /usr/lib/nss_ldap.so.0
#15 0xbb75e614 in _nss_ldap_mergeconfigfromdns () from /usr/lib/nss_ldap.so.0
#16 0xbb84d8b6 in nsdispatch () from /lib/libc.so.12
#17 0xbb81a0c8 in getgroupmembership () from /lib/libc.so.12
#18 0xbb7fd572 in getgrouplist () from /lib/libc.so.12
#19 0xbb7fb4f2 in initgroups () from /lib/libc.so.12
#20 0x0808becc in ?? ()
#21 0x0822c040 in ?? ()
#22 0x000001f1 in ?? ()
#23 0xbb80c8ec in uuid_is_nil () from /lib/libc.so.12
#24 0x08050004 in ?? ()
#25 0x0822c040 in ?? ()
#26 0x0822c050 in ?? ()
#27 0x000000a0 in ?? ()
#28 0x081c92e8 in _ctype_ ()
#29 0xbb8a81f8 in ?? () from /usr/lib/libpthread.so.0
#30 0xbb885c30 in tzname () from /lib/libc.so.12
#31 0xbfbfe90c in ?? ()
#32 0x00000009 in ?? ()
#33 0x00000001 in ?? ()
#34 0x0822b040 in ?? ()
#35 0x0822c040 in ?? ()
#36 0x0822c050 in ?? ()
#37 0x00000000 in ?? ()

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz manu@netbsd.org

Ralf Haferkamp

12 Aug

12:38 a.m.

On Monday, 11 August 2008, Emmanuel Dreyfus wrote:

...

Howard Chu hyc@symas.com wrote:

...
Get a gdb backtrace of the hang.

Show us your nsswitch.conf...

Here is nsswitch.conf:
group: files ldap
group_compat: nis
hosts: files dns
netgroup: files [notfound=return] nis
networks: files
passwd: files ldap
passwd_compat: nis
shells: files

User slapd and group ldap are resolved locally. In /etc/passwd:
slapd:*:402:497:openldap-server slapd user:/nonexistent:/sbin/nologin

In /etc/group:
ldap:*:497:

$ id slapd
uid=402(slapd) gid=497(ldap) groups=497(ldap)

Here is the backtrace (the bottom frames are not very helpful, but fortunately there is only one occurrence of initgroups in the slapd sources):

As it seems to hang in the initgroups call, does it help to add: nss_initgroups_ignoreusers root,slapd to your nss_ldap configuration?
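Ralf's suggestion amounts to a short-circuit in the LDAP backend's group-membership hook. A minimal Python model follows; every name in it is hypothetical (this is not nss_ldap's API). Users on the ignore list are answered with "not found" without any directory query, so an LDAP server that is down, or not started yet, cannot block them.

```python
# Hypothetical model of an NSS backend's initgroups hook honouring an
# ignore list such as "nss_initgroups_ignoreusers root,slapd".
# Names and status values are illustrative, not nss_ldap's real API.

NSS_SUCCESS = "success"
NSS_NOTFOUND = "notfound"


def parse_ignore_list(value):
    """Split a comma-separated option value into a set of user names."""
    return {name.strip() for name in value.split(",") if name.strip()}


def ldap_initgroups(user, ignore_users, query_directory):
    """Return (status, gids) for the user's LDAP supplementary groups.

    query_directory stands in for the network search. It is never called
    for ignored users, so an unreachable server cannot stall them.
    """
    if user in ignore_users:
        # Skip the directory entirely and let other sources answer.
        return NSS_NOTFOUND, []
    gids = query_directory(user)
    return (NSS_SUCCESS, gids) if gids else (NSS_NOTFOUND, [])
```

Every user not on the list still triggers a directory query, which is the scalability concern Buchan raises in his reply.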


-- Ralf

Emmanuel Dreyfus

1:02 a.m.

On Tue, Aug 12, 2008 at 09:38:21AM +0200, Ralf Haferkamp wrote:

...

As it seems to hang in the initgroups call, does it help to add: nss_initgroups_ignoreusers root,slapd to your nss_ldap configuration?

Yes, that fixes the problem. Thanks a lot.

Now, the thing I don't get is why the very same config works on other machines and breaks on that one.

-- Emmanuel Dreyfus manu@netbsd.org

Buchan Milne

2:17 a.m.

On Tuesday 12 August 2008 09:38:21 Ralf Haferkamp wrote:

...

On Monday, 11 August 2008, Emmanuel Dreyfus wrote:

...
Howard Chu hyc@symas.com wrote:

...
Get a gdb backtrace of the hang.

Show us your nsswitch.conf...

Here is nsswitch.conf:
group: files ldap
group_compat: nis
hosts: files dns
netgroup: files [notfound=return] nis
networks: files
passwd: files ldap
passwd_compat: nis
shells: files

User slapd and group ldap are resolved locally. In /etc/passwd:
slapd:*:402:497:openldap-server slapd user:/nonexistent:/sbin/nologin

In /etc/group:
ldap:*:497:

$ id slapd
uid=402(slapd) gid=497(ldap) groups=497(ldap)

Maybe, but unlike a user account, a user's group memberships are not a single entry: a user may be a member of groups that are defined in different NSS plugins. It is impossible to determine this without doing the lookup ...
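This is what the backtrace shows: with "passwd: files ldap" the lookup for slapd can stop at /etc/passwd, but with "group: files ldap" initgroups() asks every group source for additional memberships, so the LDAP backend is consulted even for a purely local user. A simplified Python model of the two semantics; the source callables are hypothetical stand-ins, and real NSS implementations also apply nsswitch.conf actions such as [notfound=return], which this sketch ignores.

```python
# Simplified model contrasting first-match (passwd) and merge (group
# membership) lookup semantics across nsswitch sources. Each "source" is a
# hypothetical callable standing in for files, ldap, etc.


def passwd_lookup(user, sources):
    """First match wins, as for 'passwd: files ldap'."""
    for source in sources:
        entry = source(user)
        if entry is not None:
            return entry  # the user is known; later sources are not asked
    return None


def group_membership(user, primary_gid, sources):
    """Results are merged, as for initgroups() with 'group: files ldap'."""
    gids = [primary_gid]
    for source in sources:
        # Every source is asked, even after an earlier one found groups,
        # because memberships may be spread across several databases.
        for gid in source(user):
            if gid not in gids:
                gids.append(gid)
    return gids
```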

...

...
Here is the backtrace (the bottom frames are not very helpful, but fortunately there is only one occurrence of initgroups in the slapd sources):

I guess the man page for initgroups really needs to be updated to be clearer ...

...

As it seems to hang in the initgroups call, does it help to add: nss_initgroups_ignoreusers root,slapd to your nss_ldap configuration?

If you go down this path, you will end up adding a very long list of users to this list. IMHO it is the wrong approach (other problems aren't addressed), and not scalable.

Consider instead the example of an nss_ldap client that can't connect to any of its configured LDAP servers (due to a firewall that drops all LDAP traffic). No local accounts, besides those listed in nss_initgroups_ignoreusers, would be able to log in, so LDAP groups would be useless.

However, either setting "bind_policy soft" or setting the nss_reconnect_{sleeptime,maxsleeptime,maxconntries} options would, in my opinion, be the correct fix (and not only for the "haldaemon doesn't start at boot" and "ldap doesn't start when it's not running" issues).

Anyway, I will point out that this issue is more or less an FAQ on the nss_ldap list.

Regards, Buchan

Emmanuel Dreyfus

3:01 a.m.

On Tue, Aug 12, 2008 at 11:17:13AM +0200, Buchan Milne wrote:

...

Anyway, I will point out that this issue is more or less an FAQ on the nss_ldap list.

IMO, the problem is in slapd: it starts listening for requests before it is ready to answer them.

If the listener were not yet active when slapd makes its initgroups() call, NSS would not reach the local slapd; it would fall back to other sources (/etc/passwd and /etc/group), and everything would be fine.

What about a new slapd.conf option?
delayed_service {none|warm|syncrepl}
and slapd would...
... behave as it does now for "none"
... return LDAP_UNAVAILABLE until initialization is completed for "warm"
... return LDAP_UNAVAILABLE until syncrepl catches up with the master for "syncrepl"
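delayed_service is a proposal made in this message, not an existing slapd.conf directive. A Python sketch of the decision it describes, assuming two hypothetical readiness flags and treating "syncrepl" as also requiring initialization to be complete. The value 52 is the "unavailable" result code from RFC 4511, which is what LDAP_UNAVAILABLE denotes in OpenLDAP.

```python
# Sketch of the decision logic behind the PROPOSED option
# "delayed_service {none|warm|syncrepl}". The flags and function name are
# hypothetical; 52 is the RFC 4511 resultCode "unavailable".

LDAP_SUCCESS = 0
LDAP_UNAVAILABLE = 52


def admission_result(mode, init_done, syncrepl_caught_up):
    """Return LDAP_SUCCESS if a request may be served now, otherwise the
    result code the server would send instead of serving it."""
    if mode == "none":
        return LDAP_SUCCESS  # current behaviour: serve immediately
    if mode == "warm":
        return LDAP_SUCCESS if init_done else LDAP_UNAVAILABLE
    if mode == "syncrepl":
        # Assumed stricter than "warm": startup finished AND consumer current.
        ready = init_done and syncrepl_caught_up
        return LDAP_SUCCESS if ready else LDAP_UNAVAILABLE
    raise ValueError(f"unknown delayed_service mode: {mode!r}")
```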

The latter option would fix the stupid situation where your replica starts and serves outdated data until syncrepl catches up.

-- Emmanuel Dreyfus manu@netbsd.org

Buchan Milne

6:38 a.m.

On Tuesday 12 August 2008 12:01:16 Emmanuel Dreyfus wrote:

...

On Tue, Aug 12, 2008 at 11:17:13AM +0200, Buchan Milne wrote:

...
Anyway, I will point out that this issue is more or less an FAQ on the nss_ldap list.

IMO, the problem is in slapd: it starts listening for requests before it is ready to answer them.

If the listener were not yet active when slapd makes its initgroups() call, NSS would not reach the local slapd; it would fall back to other sources (/etc/passwd and /etc/group), and everything would be fine.

Only in your case, where nss_ldap is preventing slapd from starting; not in the case where haldaemon (or something similar; haldaemon is the most common suspect on Red Hat-based systems) fails to start.

...

What about a new slapd.conf option?
delayed_service {none|warm|syncrepl}

Add another option, "database":

...

and slapd would...
... behave as it does now for "none"
... return LDAP_UNAVAILABLE until initialization is completed for "warm"
... return LDAP_UNAVAILABLE until syncrepl catches up with the master for "syncrepl"

return LDAP_UNAVAILABLE until all databases are recovered and started.

...

The latter option would fix the stupid situation where your replica starts and serves outdated data until syncrepl catches up.

Yes, this would be useful to me. But I don't see a need for it to solve the chicken-and-egg slapd vs. nss_ldap issue (because that issue is a subset of the whole problem).

Regards, Buchan

Howard Chu

11:04 a.m.

Emmanuel Dreyfus wrote:

...

On Tue, Aug 12, 2008 at 11:17:13AM +0200, Buchan Milne wrote:

...
Anyway, I will point out that this issue is more or less an FAQ on the nss_ldap list.

IMO, the problem is in slapd: it starts listening for requests before it is ready to answer them.

If the listener were not yet active when slapd makes its initgroups() call, NSS would not reach the local slapd; it would fall back to other sources (/etc/passwd and /etc/group), and everything would be fine.

Hm, I don't think that's true. slap_init_user(), which makes the initgroups() call, runs before slapd starts listening on its sockets. While its sockets are bound to their respective ports but not yet listening, clients get "connection refused". slapd only calls listen() long after the startup initializations are done, and only then can it receive any incoming requests.
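The bind-then-listen ordering described here can be checked in a few lines of Python: a TCP socket that has been bound but not yet put into the listening state refuses connections, and the same port accepts them once listen() has been called. This is an illustration, not slapd code; it uses an ephemeral loopback port instead of 389 so it runs unprivileged.

```python
import socket

# Illustration (not slapd code): a bound-but-not-listening TCP socket
# refuses connections; after listen() the same port accepts them.
# An ephemeral loopback port stands in for slapd's port 389.


def try_connect(port, timeout=2.0):
    """Attempt a TCP connection to 127.0.0.1:port and classify the outcome."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        client.settimeout(timeout)
        try:
            client.connect(("127.0.0.1", port))
            return "connected"
        except ConnectionRefusedError:
            return "refused"
        except socket.timeout:
            return "timeout"


def bind_then_listen_demo():
    """Return the connection outcome before and after listen()."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind(("127.0.0.1", 0))   # step 1: bind, as slapd does early
        port = server.getsockname()[1]
        before = try_connect(port)      # bound only: expect "refused"
        server.listen(1)                # step 2: listen, as slapd does later
        after = try_connect(port)       # kernel now completes the handshake
        return before, after
    finally:
        server.close()
```

On typical Linux or BSD systems, bind_then_listen_demo() returns ("refused", "connected").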

...

What about a new slapd.conf option?
delayed_service {none|warm|syncrepl}
and slapd would...
... behave as it does now for "none"
... return LDAP_UNAVAILABLE until initialization is completed for "warm"
... return LDAP_UNAVAILABLE until syncrepl catches up with the master for "syncrepl"

The latter option would fix the stupid situation where your replica starts and serves outdated data until syncrepl catches up.

We've discussed that possibility (delaying queries until syncrepl completes) a few times on -devel in the past. I don't remember now why we didn't do it, check the archives...

-- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/

Buchan Milne

13 Aug

12:44 a.m.

On Tuesday 12 August 2008 12:01:16 Emmanuel Dreyfus wrote:

...

On Tue, Aug 12, 2008 at 11:17:13AM +0200, Buchan Milne wrote:

...
Anyway, I will point out that this issue is more or less an FAQ on the nss_ldap list.

IMO, the problem is in slapd: it starts listening for requests before it is ready to answer them.

Actually, if that were the case, I think a suitable timelimit in nss_ldap's ldap.conf should prevent any problems, but it doesn't due to nss_ldap's (IMHO) braindead defaults.

Regards, Buchan

Emmanuel Dreyfus

12:54 a.m.

On Wed, Aug 13, 2008 at 09:44:23AM +0200, Buchan Milne wrote:

...

Actually, if that were the case, I think a suitable timelimit in nss_ldap's ldap.conf should prevent any problems, but it doesn't due to nss_ldap's (IMHO) braindead defaults.

Such an approach leads to even worse problems with other applications: sendmail performs NSS lookups for local delivery (when looking for .forward), and it does it with getpwnam().

getpwnam() does not set errno, so the caller has no way of distinguishing a nonexistent entry from an unreachable NSS source. getpwnam_r() does report failures (through its return value), so with it you can tell the difference. But sendmail uses getpwnam().

So if NSS returns no answer because of a bind or search timeout, sendmail will conclude that the recipient does not exist and will bounce the message.
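The core problem is that a two-state result (entry or nothing) discards the difference between "no such user" and "source unreachable". A hypothetical Python model, not sendmail's code, contrasts a getpwnam()-style caller, which can only bounce, with a caller that receives a three-state result (as getpwnam_r() can provide through its return value) and can defer the message instead.

```python
from enum import Enum

# Hypothetical model: a lookup API that only returns "entry or nothing"
# cannot tell "no such user" from "directory unreachable", so a mailer
# bounces mail it should merely defer.


class Lookup(Enum):
    FOUND = "found"
    NOTFOUND = "notfound"
    UNAVAIL = "unavail"  # e.g. the LDAP bind or search timed out


def two_state(result):
    """Collapse to getpwnam()-style: an entry, or None for any failure."""
    status, entry = result
    return entry if status is Lookup.FOUND else None


def delivery_action_two_state(result):
    return "deliver" if two_state(result) is not None else "bounce"


def delivery_action_three_state(result):
    status, _entry = result
    if status is Lookup.FOUND:
        return "deliver"
    if status is Lookup.NOTFOUND:
        return "bounce"
    return "defer"  # temporary failure: retry later instead of bouncing
```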

This is off-topic, so if the reader is looking for a workaround in the sendmail config, (s)he should look for my post on comp.mail.sendmail this morning. But that is not fully satisfying, and I am still looking for a really reliable setup.

-- Emmanuel Dreyfus manu@netbsd.org

Michael Ströder

1:27 a.m.

Emmanuel Dreyfus wrote:

...

On Wed, Aug 13, 2008 at 09:44:23AM +0200, Buchan Milne wrote:

...
Actually, if that were the case, I think a suitable timelimit in nss_ldap's ldap.conf should prevent any problems, but it doesn't due to nss_ldap's (IMHO) braindead defaults.

Such an approach leads to even worse problems with other applications: sendmail performs NSS lookups for local delivery (when looking for .forward), and it does it with getpwnam().

One more reason to disallow such an ancient forwarding mechanism and switch over to forwarding addresses directly stored in the directory.

This is a great example of how backward compatibility is not always a good thing in itself, especially when the system architecture changes dramatically, e.g. by introducing a directory service.

Ciao, Michael.

Dan White

6:33 a.m.

Emmanuel Dreyfus wrote:

...

On Wed, Aug 13, 2008 at 09:44:23AM +0200, Buchan Milne wrote:

...
Actually, if that were the case, I think a suitable timelimit in nss_ldap's ldap.conf should prevent any problems, but it doesn't due to nss_ldap's (IMHO) braindead defaults.

Such an approach leads to even worse problems with other applications: sendmail performs NSS lookups for local delivery (when looking for .forward), and it does it with getpwnam().

getpwnam() does not set errno, so the caller has no way of distinguishing a nonexistent entry from an unreachable NSS source. getpwnam_r() does report failures (through its return value), so with it you can tell the difference. But sendmail uses getpwnam().

So if NSS returns no answer because of a bind or search timeout, sendmail will conclude that the recipient does not exist and will bounce the message.

This is off-topic, so if the reader is looking for a workaround in the sendmail config, (s)he should look for my post on comp.mail.sendmail this morning. But that is not fully satisfying, and I am still looking for a really reliable setup.

If you haven't already, you may want to give nss-ldapd a look:

http://ch.tudelft.nl/~arthur/nss-ldapd/design.html

- Dan

Kurt Zeilenga

7:28 a.m.

This thread has gone off-topic and is now closed. I suggest those wanting to discuss NSS/LDAP and NSS/LDAPD take their discussions to lists intended to support these software components, or take them to the openldap-technical list which allows a broader range of topics than this list. Thanks, your moderator.

-- Kurt


Philip Guenther

11 Aug

11:27 a.m.

On Mon, 11 Aug 2008, Emmanuel Dreyfus wrote:

...

I have a machine where slapd will not start if ldap://localhost is listed in /etc/nss_ldap.conf. It just hangs:

...

If I remove ldap://localhost from nss_ldap.conf, it works fine. Any idea how to get that working?

This sounds a bit like this thread: http://www.openldap.org/lists/openldap-software/200804/msg00004.html

There were a couple suggestions there.

Philip Guenther

BTW: the subject on your message was excellent...as opposed to the message in the archives referenced above, which had a completely generic subject that made it more difficult for me to find when my memory said "wasn't something like this encountered earlier this year?". So, thank you, Emmanuel!

manu＠netbsd.org

12:52 p.m.

Philip Guenther guenther+ldapsoft@sendmail.com wrote:

...

This sounds a bit like this thread: http://www.openldap.org/lists/openldap-software/200804/msg00004.html

There were a couple suggestions there.

So here are the 3 solutions given in that thread:

- Group resolution: no problem there, the group is defined locally.
- I tried adding -g ldap: no improvement. I also tried -g <gid of ldap>.
- bind_policy soft in /etc/nss_ldap.conf does not help either.

But the person who started that thread tracked the problem down to group resolution. I tried this in nss_ldap.conf:
URI ldap://localhost ldap://ldap.example.net

And this in /etc/nsswitch.conf:
group: files
(instead of "files ldap")

and then slapd starts.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz manu@netbsd.org


openldap-software@openldap.org

16 comments

10 participants


participants (10)

Buchan Milne
Dan White
Dmitriy Kirhlarov
Emmanuel Dreyfus
Howard Chu
Kurt Zeilenga
manu＠netbsd.org
Michael Ströder
Philip Guenther
Ralf Haferkamp