[I’ve posted this on the OpenStack list as well, but maybe someone here knows more]
I’m setting up (Open)LDAP (v2.4.40) on my old Newton installation, with the LDAP servers behind a HAProxy LB.
I’m enabling one at a time to see if I can get each of them working individually before I try them as a group.
I tried all day yesterday, and I could do the initial connection, but not get any results:
ldap_sasl_interactive_bind_s: Can't contact LDAP server (-1)
I see the connection in syslog on the LDAP server, but don’t get any results back.
Now, first thing I did this morning was to just run the exact same command (kinit && ldapwhoami) that I did last night.
AND IT WORKED!!
No idea why! It shouldn’t have. Glad it did, but since I can’t explain WHY it worked, it’s annoying!! :)
So I then disabled that (working) LDAP server in the LB member list and enabled the second. And now that is experiencing the same problem as the first yesterday…
I didn’t change anything else - last thing I did before I went to bed last night was try the ldapwhoami command -> “can’t contact ldap server”. And the very first thing I did this morning was kdestroy my ticket, get a new one and then run ldapwhoami.
I’ve run with multiple types of debugging (‘-d -1’ as well as with KRB5_TRACE set), but there’s nothing obvious that I can see.
So … “something” internally in OS (OpenStack) changed. Any suggestions as to what, or how to debug this?
What is ldap_sasl_interactive_bind_s() actually doing? Why does the ldap_bind() earlier seem to work, but not the SASL bind?
See http://bayour.com/misc/ldapwhoami_output.txt for the full output from
KRB5_TRACE=/dev/stdout ldapwhoami -YGSSAPI -H ldaps://ldap.bayour.net -d -1
and while this is happening, this is the output from slapd in the logs (running with “loglevel sync stats”):
Nov 19 12:42:40 admin-auth-ldap-31 slapd[26613]: conn=1015 fd=29 ACCEPT from IP=10.0.17.34:53451 (IP=10.0.17.31:636)
Nov 19 12:42:40 admin-auth-ldap-31 slapd[26613]: conn=1015 fd=29 TLS established tls_ssf=256 ssf=256
Nov 19 12:42:40 admin-auth-ldap-31 slapd[26613]: conn=1015 op=0 BIND dn="" method=163
Nov 19 12:43:09 admin-auth-ldap-31 slapd[26613]: conn=1013 fd=22 closed (connection lost)
With ‘loglevel -1’ (and filtering out ‘daemon: epoll: listen|daemon: activity on’, because otherwise it fills the screen), I get:
Nov 19 12:49:28 admin-auth-ldap-31 slapd[27043]:
Nov 19 12:49:28 admin-auth-ldap-31 slapd[27043]: slap_listener_activate(12):
Nov 19 12:49:28 admin-auth-ldap-31 slapd[27043]: >>> slap_listener(ldaps://admin-auth-ldap-31.bayour.net:636/)
Nov 19 12:49:28 admin-auth-ldap-31 slapd[27043]: daemon: listen=12, new connection on 25
Nov 19 12:49:29 admin-auth-ldap-31 slapd[27043]:
Nov 19 12:49:33 admin-auth-ldap-31 slapd[27043]: daemon: added 25r (active) listener=(nil)
Nov 19 12:49:33 admin-auth-ldap-31 slapd[27043]: conn=1001 fd=25 ACCEPT from IP=10.0.17.34:54740 (IP=10.0.17.31:636)
Nov 19 12:49:34 admin-auth-ldap-31 slapd[27043]: 25r
Nov 19 12:49:34 admin-auth-ldap-31 slapd[27043]:
Nov 19 12:49:34 admin-auth-ldap-31 slapd[27043]: daemon: read active on 25
Nov 19 12:49:34 admin-auth-ldap-31 slapd[27043]: connection_get(25)
Nov 19 12:49:34 admin-auth-ldap-31 slapd[27043]: connection_get(25): got connid=1001
Nov 19 12:49:34 admin-auth-ldap-31 slapd[27043]: connection_read(25): checking for input on id=1001
Nov 19 12:49:35 admin-auth-ldap-31 slapd[27043]: 25r
Nov 19 12:49:35 admin-auth-ldap-31 slapd[27043]:
Nov 19 12:49:35 admin-auth-ldap-31 slapd[27043]: daemon: read active on 25
Nov 19 12:49:35 admin-auth-ldap-31 slapd[27043]: connection_get(25)
Nov 19 12:49:35 admin-auth-ldap-31 slapd[27043]: connection_get(25): got connid=1001
Nov 19 12:49:35 admin-auth-ldap-31 slapd[27043]: connection_read(25): checking for input on id=1001
Nov 19 12:49:35 admin-auth-ldap-31 slapd[27043]: connection_read(25): unable to get TLS client DN, error=49 id=1001
Nov 19 12:49:35 admin-auth-ldap-31 slapd[27043]: conn=1001 fd=25 TLS established tls_ssf=256 ssf=256
Nov 19 12:49:36 admin-auth-ldap-31 slapd[27043]: 25r
Nov 19 12:49:36 admin-auth-ldap-31 slapd[27043]:
Nov 19 12:49:36 admin-auth-ldap-31 slapd[27043]: daemon: read active on 25
Nov 19 12:49:36 admin-auth-ldap-31 slapd[27043]: connection_get(25)
Nov 19 12:49:36 admin-auth-ldap-31 slapd[27043]: connection_get(25): got connid=1001
Nov 19 12:49:36 admin-auth-ldap-31 slapd[27043]: connection_read(25): checking for input on id=1001
Nov 19 12:49:36 admin-auth-ldap-31 slapd[27043]: op tag 0x60, time 1511095776
Nov 19 12:49:36 admin-auth-ldap-31 slapd[27043]: conn=1001 op=0 do_bind
Nov 19 12:49:36 admin-auth-ldap-31 slapd[27043]: >>> dnPrettyNormal: <>
Nov 19 12:49:36 admin-auth-ldap-31 slapd[27043]: <<< dnPrettyNormal: <>, <>
Nov 19 12:49:36 admin-auth-ldap-31 slapd[27043]: conn=1001 op=0 BIND dn="" method=163
Nov 19 12:49:36 admin-auth-ldap-31 slapd[27043]: do_bind: dn () SASL mech GSSAPI
Nov 19 12:49:36 admin-auth-ldap-31 slapd[27043]: ==> sasl_bind: dn="" mech=GSSAPI datalen=617
Nov 19 12:49:37 admin-auth-ldap-31 slapd[27043]:
Nov 19 12:49:54 admin-auth-ldap-31 slapd[27043]:
Nov 19 12:49:55 admin-auth-ldap-31 slapd[27043]:
Nov 19 12:50:26 admin-auth-ldap-31 slapd[27043]: 25r
Nov 19 12:50:26 admin-auth-ldap-31 slapd[27043]:
Nov 19 12:50:26 admin-auth-ldap-31 slapd[27043]: daemon: read active on 25
Nov 19 12:50:26 admin-auth-ldap-31 slapd[27043]: connection_get(25)
Nov 19 12:50:26 admin-auth-ldap-31 slapd[27043]: connection_get(25): got connid=1001
Nov 19 12:50:26 admin-auth-ldap-31 slapd[27043]: connection_read(25): checking for input on id=1001
Nov 19 12:50:26 admin-auth-ldap-31 slapd[27043]: ber_get_next on fd 25 failed errno=0 (Success)
Nov 19 12:50:26 admin-auth-ldap-31 slapd[27043]: connection_read(25): input error=-2 id=1001, closing.
Nov 19 12:50:26 admin-auth-ldap-31 slapd[27043]: connection_closing: readying conn=1001 sd=25 for close
Nov 19 12:50:26 admin-auth-ldap-31 slapd[27043]: connection_close: deferring conn=1001 sd=25
Nov 19 12:50:27 admin-auth-ldap-31 slapd[27043]:
Nov 19 12:50:28 admin-auth-ldap-31 slapd[27043]:
So nothing obvious that I can see. Which is reasonable, because “eventually” it worked on the previous LDAP server, so it can’t be a slapd problem. But I was hoping someone who has tried this on OS or behind a HAProxy setup might be able to shed some light on this.
PS. I’ve done the exact same thing at work, in AWS, and there it works just fine. So I’m fairly certain it’s something with OS/HAProxy, but I don’t know how to debug that bit.
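One thing the timestamps in the loglevel -1 output do show: slapd logs the GSSAPI SASL bind for conn=1001 at 12:49:36 and then nothing for that connection until the peer closes it at 12:50:26, exactly 50 seconds later. That happens to equal the "timeout client 50000" / "timeout server 50000" (milliseconds) in the OpenStack-generated HAProxy config posted later in this thread. Assuming the November setup used the same timeouts (not confirmed), the close looks more like HAProxy giving up on an idle session than like slapd answering; it may also be a coincidence. A quick check of the arithmetic, with illustrative variable names:

```python
from datetime import datetime

SYSLOG_TIME = "%H:%M:%S"


def seconds_between(start: str, end: str) -> int:
    """Whole seconds between two same-day syslog timestamps (HH:MM:SS)."""
    delta = datetime.strptime(end, SYSLOG_TIME) - datetime.strptime(start, SYSLOG_TIME)
    return int(delta.total_seconds())


# Timestamps copied from the slapd loglevel -1 output for conn=1001.
sasl_bind_received = "12:49:36"  # ==> sasl_bind: dn="" mech=GSSAPI datalen=617
peer_closed = "12:50:26"         # ber_get_next on fd 25 failed ... closing

# timeout client/server from the OpenStack-generated haproxy config, in ms.
# ASSUMPTION: the same values applied to the Nov 19 load balancer.
haproxy_idle_timeout_ms = 50000

gap = seconds_between(sasl_bind_received, peer_closed)
print(gap, gap * 1000 == haproxy_idle_timeout_ms)  # prints: 50 True
```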
Turbo Fredriksson wrote:
I tried all day yesterday, and I could do the initial connection, but not get any results:
ldap_sasl_interactive_bind_s: Can't contact LDAP server (-1)
I see the connection in syslog on the LDAP server, but don’t get any results back.
Note that ldap_initialize() does not really open the connection. The first LDAP operation function called will actually open the connection.
I suspect the issue is in your load-balancer setup. Especially if you see slapd logging the request to syslog but your client does not receive a result.
Ciao, Michael.
On 19 Nov 2017, at 16:59, Michael Ströder michael@stroeder.com wrote:
Note that ldap_initialize() does not really open the connection.
Yes, that I knew. But the ldap_connect_to_host() at the beginning does work; it’s just the ldap_sasl_interactive_bind_s() a few microseconds later that fails for some reason.
I suspect the issue is in your load-balancer setup.
Yes, I’m absolutely convinced of that. That’s why I mentioned it several times.
The fact that it works “eventually” (within two hours is the last number I have) is proof of that. The question is what/why [it takes so long to start working].
The listener (port 636 only) is there and working almost immediately (the initial connection works), so the ldap_sasl_interactive_bind_s() should work through that one too, right?
Has anyone tried running OpenLDAP behind HAProxy? Anything special one needs to do?
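Why the initial connection can succeed while the bind still fails with -1: in TCP mode HAProxy completes the client's TCP handshake itself and only then connects to a backend server, so a working connect() only proves that the frontend is listening. When no backend is usable, HAProxy closes the session (the Dec 5 HAProxy log later in this thread shows one as <NOSRV> with flags SC), and the client sees end-of-file where it expected a TLS or LDAP reply. The sketch below reproduces that on localhost with a hypothetical stand-in for the proxy; it is an illustration, not HAProxy code. In the Nov 19 slapd log the bind clearly did reach slapd, so this is only one way to end up with the same client error.

```python
import socket
import threading


def proxy_without_backend(listener: socket.socket) -> None:
    # Hypothetical stand-in for a TCP-mode proxy with no usable backend:
    # accept the client's (already established) TCP connection, then close
    # it without sending a single byte of TLS or LDAP.
    conn, _ = listener.accept()
    conn.close()


def connect_then_read(host: str, port: int) -> tuple[bool, bytes]:
    """Connect, then wait for the first bytes from the peer."""
    with socket.create_connection((host, port), timeout=5) as client:
        connected = True          # the part that "works"
        data = client.recv(1024)  # b"" means the peer closed the connection (EOF)
    return connected, data


def main() -> tuple[bool, bytes]:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)           # don't hang forever if nobody connects
    listener.bind(("127.0.0.1", 0))  # any free port on localhost
    listener.listen(1)
    port = listener.getsockname()[1]
    proxy = threading.Thread(target=proxy_without_backend, args=(listener,))
    proxy.start()
    try:
        return connect_then_read("127.0.0.1", port)
    finally:
        proxy.join()
        listener.close()


if __name__ == "__main__":
    connected, data = main()
    # A client library hitting EOF here (before or during the TLS handshake)
    # typically reports the server as unreachable; for libldap that is
    # LDAP_SERVER_DOWN (-1), "Can't contact LDAP server".
    print(f"connected={connected} first_bytes={data!r}")
```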
2017-11-19 18:09 GMT+01:00 Turbo Fredriksson turbo@bayour.com:
Has anyone tried running OpenLDAP behind HAProxy? Anything special one needs to do?
I do this often, without any particular issue. If you use LDAPS, you can add option ssl-hello-chk.
Here is a sample configuration file:
global
    log 127.0.0.1 local5 notice
    chroot /var/lib/haproxy
    user haproxy
    group haproxy
    daemon
    quiet

defaults
    log global
    option dontlognull
    option ldap-check
    retries 3
    mode tcp
    balance roundrobin
    option redispatch

listen openldap :389
    server ldap1 IP_LDAP1:390 check
    server ldap2 IP_LDAP2:390 check
    server ldap3 IP_LDAP3:390 check

defaults
    log global
    option dontlognull
    retries 3
    mode tcp
    balance roundrobin
    option redispatch
    option ssl-hello-chk

listen openldap-ssl :636
    server ldap1 IP_LDAP1:637 check
    server ldap2 IP_LDAP2:637 check
    server ldap3 IP_LDAP3:637 check
Clément.
On 20 Nov 2017, at 08:07, Clément OUDOT clem.oudot@gmail.com wrote:
2017-11-19 18:09 GMT+01:00 Turbo Fredriksson turbo@bayour.com:
Has anyone tried running OpenLDAP behind HAProxy?
I do this often, without any particular issue.
Ok, thanx. I thought so :(. I might be running an old version (v1.6.10) perhaps?
You’ve never had the issue I’m having? Or heard about it?
2017-11-20 11:59 GMT+01:00 Turbo Fredriksson turbo@bayour.com:
You’ve never had the issue I’m having? Or heard about it?
No, but I don't use Kerberos authentication.
On 20 Nov 2017, at 11:06, Clément OUDOT clem.oudot@gmail.com wrote:
2017-11-20 11:59 GMT+01:00 Turbo Fredriksson turbo@bayour.com:
You’ve never had the issue I’m having? Or heard about it?
No, but I don't use Kerberos authentication.
Ok, thanx for the info!!
On 20 Nov 2017, at 08:07, Clément OUDOT clem.oudot@gmail.com wrote:
option ldap-check
option ssl-hello-chk
I’ve now had a chance to test both of these, together and separately. Still no dice :( :(.
Dec 5 22:32:17 ldap50 slapd[786]: conn=1018 fd=23 ACCEPT from IP=<LOADBALANCER>:55807 (IP=<LDAP_SERVER>:636)
Dec 5 22:32:17 ldap50 slapd[786]: conn=1018 fd=23 closed (TLS negotiation failure)
So the LB will never put the LDAP server online… Not sure why it gets TLS negotiation failure - I can search (on LDAPS) directly to the host from my workstation..
The SSL cert has both a CN with the FQDN of the host and a DNS (subjectAltName) value with the load balancer FQDN.
The haproxy config that OpenStack creates:
----- s n i p -----
# Configuration for lbaas-admin-auth-ldap
global
    daemon
    user nobody
    group nogroup
    log /dev/log local0
    log /dev/log local1 notice
    stats socket /var/lib/neutron/lbaas/v2/03090662-bac2-495f-8809-0d1e25b0bf21/haproxy_stats.sock mode 0666 level user

defaults
    log global
    retries 3
    option redispatch
    timeout connect 5000
    timeout client 50000
    timeout server 50000

frontend 64d7db20-b245-4646-b94e-1e2e523c01d0
    option tcplog
    bind <LOADBALANCER>:636
    mode tcp
    default_backend 360d0c51-eb59-40b8-8c9a-fc3a3ff02822

backend 360d0c51-eb59-40b8-8c9a-fc3a3ff02822
    mode tcp
    balance leastconn
    option ssl-hello-chk
    option ldap-check
    timeout check 5
    server 96c97080-54be-406e-81e5-3bc50e1becdb LDAP_SERVER:636 weight 1 check inter 30s fall 5
----- s n i p -----
With these two options (ssl-hello-chk and ldap-check), HAProxy says that there are no servers available:
----- s n i p -----
Dec 5 22:39:17 OS_SRV haproxy[8864]: Server d6ed5563-3d54-4853-aafe-3c4fe7e6f409/92bcf829-36b4-417f-b50c-e93c83e29427 is DOWN, reason: Layer7 invalid response, info: "Not LDAPv3 protocol", check duration: 2ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Dec 5 22:39:17 OS_SRV haproxy[8864]: Server d6ed5563-3d54-4853-aafe-3c4fe7e6f409/92bcf829-36b4-417f-b50c-e93c83e29427 is DOWN, reason: Layer7 invalid response, info: "Not LDAPv3 protocol", check duration: 2ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Dec 5 22:39:23 OS_SRV haproxy[8864]: Server d6ed5563-3d54-4853-aafe-3c4fe7e6f409/60a60f15-0486-4305-a746-c9040bdafde2 is DOWN, reason: Layer7 invalid response, info: "Not LDAPv3 protocol", check duration: 2ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Dec 5 22:39:23 OS_SRV haproxy[8864]: Server d6ed5563-3d54-4853-aafe-3c4fe7e6f409/60a60f15-0486-4305-a746-c9040bdafde2 is DOWN, reason: Layer7 invalid response, info: "Not LDAPv3 protocol", check duration: 2ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Dec 5 22:39:29 OS_SRV haproxy[8864]: Server d6ed5563-3d54-4853-aafe-3c4fe7e6f409/c63c9f47-6f87-4f67-8c41-ab8d78e51761 is DOWN, reason: Layer7 invalid response, info: "Not LDAPv3 protocol", check duration: 2ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Dec 5 22:39:29 OS_SRV haproxy[8864]: Server d6ed5563-3d54-4853-aafe-3c4fe7e6f409/c63c9f47-6f87-4f67-8c41-ab8d78e51761 is DOWN, reason: Layer7 invalid response, info: "Not LDAPv3 protocol", check duration: 2ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Dec 5 22:39:29 OS_SRV haproxy[8864]: backend d6ed5563-3d54-4853-aafe-3c4fe7e6f409 has no server available!
Dec 5 22:40:09 OS_SRV haproxy[8610]: 10.0.5.254:4049 [05/Dec/2017:22:40:09.436] 64d7db20-b245-4646-b94e-1e2e523c01d0 360d0c51-eb59-40b8-8c9a-fc3a3ff02822/<NOSRV> -1/-1/0 0 SC 0/0/0/0/0 0/0
----- s n i p -----
Reading up on the options, these seem to be only for health-checking whether servers are online, not for “normal” communication. Since I can talk LDAPS to the servers both directly and via the load balancer (in both cases, as long as I don’t do Kerberos V auth), this doesn’t seem to be the correct solution for me… ?
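What "option ldap-check" puts on the wire helps explain both log messages. According to the HAProxy documentation it sends an LDAPv3 anonymous simple bind and expects a successful BindResponse, and it does not wrap that in TLS. Below is the standard BER encoding of such a request (assumed, not verified here, to be byte-for-byte what HAProxy sends) plus a minimal check of the reply; the function names are illustrative. Sent to port 636, the first byte 0x30 is not a TLS handshake record (0x16), which would be consistent with slapd logging "TLS negotiation failure" and HAProxy reporting "Not LDAPv3 protocol". Clément's sample keeps ldap-check on the plain :389 listener and ssl-hello-chk on :636; as far as I know HAProxy uses only one health-check method per backend, the one declared last, which in the OpenStack-generated backend would be ldap-check against port 636.

```python
from typing import Optional

TLS_HANDSHAKE_RECORD = 0x16  # first byte of a TLS record carrying a ClientHello


def tlv(tag: int, value: bytes) -> bytes:
    """BER tag-length-value, short-form length only (enough for this sketch)."""
    if len(value) >= 0x80:
        raise ValueError("long-form BER lengths are not handled here")
    return bytes([tag, len(value)]) + value


def anonymous_simple_bind(message_id: int = 1) -> bytes:
    """LDAPv3 BindRequest with empty DN and empty simple password (RFC 4511)."""
    if not 0 < message_id < 0x80:
        raise ValueError("this sketch encodes message IDs 1..127 only")
    bind_request = tlv(0x60,                # [APPLICATION 0] BindRequest
                       tlv(0x02, b"\x03")   # version INTEGER 3
                       + tlv(0x04, b"")     # name LDAPDN ""
                       + tlv(0x80, b""))    # authentication: simple [0] ""
    return tlv(0x30, tlv(0x02, bytes([message_id])) + bind_request)  # LDAPMessage


def parse_bind_result(reply: bytes, message_id: int = 1) -> Optional[int]:
    """resultCode of a short BindResponse, or None if the reply is not an
    LDAPv3 BindResponse for message_id (e.g. a TLS alert record, 0x15...)."""
    if len(reply) < 10 or reply[0] != 0x30:
        return None                              # not an LDAPMessage SEQUENCE
    if reply[2:5] != bytes([0x02, 0x01, message_id]):
        return None                              # unexpected messageID
    if reply[5] != 0x61 or reply[7:9] != b"\x0a\x01":
        return None                              # not BindResponse / no resultCode
    return reply[9]


if __name__ == "__main__":
    probe = anonymous_simple_bind()
    print(probe.hex())                       # bytes sent in clear text
    print(probe[0] == TLS_HANDSHAKE_RECORD)  # would a TLS listener see a ClientHello?
    print(parse_bind_result(bytes.fromhex("300c02010161070a010004000400")))
```

Run directly, this prints 300c020101600702010304008000, then False, then 0.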
If I try to talk LDAPS + KRB5 auth directly to the server:
----- s n i p -----
ldap_sasl_interactive_bind_s: Invalid credentials (49)
        additional info: SASL(-13): authentication failure: GSSAPI Failure: gss_accept_sec_context
----- s n i p -----
which is correct because of how Kerberos checks hosts, principals and host keys etc, but if I talk to the load balancer:
----- s n i p -----
ldap_sasl_interactive_bind_s: Can't contact LDAP server (-1)
----- s n i p -----
Now, this usually works “after a few hours” if I just leave it alone. This particular server is proving to be very obstinate..
On Sunday, November 19, 2017 9:09:50 AM PST, Turbo Fredriksson wrote:
Has anyone tried running OpenLDAP behind HAProxy? Anything special one needs to do?
For Kerberos the problem is in Cyrus SASL and is true for all load balancers. Indeed, it is true for any system that has more than one name. SASL checks the name that the connection was made to and fails if it doesn't match the server's name.
There are two solutions that I know of. The first is to configure the LDAP servers and keytab as though all members of the load balanced pool had the load balanced name. If you do it this way you cannot make a GSSAPI LDAP connection to an individual server, only to the load balancer.
The second is to apply a one line patch to Cyrus SASL. I just apply the following patch to the servers that I manage.
Description: Accept valid creds not just those matching server name.
--- a/plugins/gssapi.c
+++ b/plugins/gssapi.c
@@ -719,7 +719,7 @@ gssapi_server_mech_authneg(context_t *text,
     if ( server_creds == GSS_C_NO_CREDENTIAL) {
         GSS_LOCK_MUTEX(params->utils);
         maj_stat = gss_acquire_cred(&min_stat,
-                                    text->server_name,
+                                    GSS_C_NO_NAME,
                                     GSS_C_INDEFINITE,
                                     GSS_C_NO_OID_SET,
                                     GSS_C_ACCEPT,
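A toy model of the name check described here, assuming the client asks for a ticket to ldap/<name it connected to> with no DNS canonicalisation. Unpatched, the acceptor only uses the key for the server's own principal; with the patch (GSS_C_NO_NAME) any principal in the keytab is accepted. Keytab, requested_principal and acceptor_accepts are hypothetical names for illustration, and the code is not Cyrus SASL or Kerberos library code.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Keytab:
    """Hypothetical model of a keytab: just the set of principals it holds."""
    principals: FrozenSet[str]


def requested_principal(service: str, host_connected_to: str, realm: str) -> str:
    # The client requests a ticket for service/<the name it connected to>.
    # ASSUMPTION: no reverse-DNS canonicalisation of that name happens.
    return f"{service}/{host_connected_to}@{realm}"


def acceptor_accepts(ticket_for: str, keytab: Keytab, acceptor_name: Optional[str]) -> bool:
    """Would a GSSAPI acceptor with this keytab accept a ticket for ticket_for?

    acceptor_name is the principal the server acquired credentials for;
    None models the patched call, gss_acquire_cred(..., GSS_C_NO_NAME, ...).
    """
    if ticket_for not in keytab.principals:
        return False                    # no key to decrypt the ticket with
    if acceptor_name is None:
        return True                     # patched: any key in the keytab will do
    return ticket_for == acceptor_name  # unpatched: only the server's own name


if __name__ == "__main__":
    realm = "SOMEDOMAIN.TLD"
    host, lb = "somehost.somedomain.tld", "somelb.somedomain.tld"
    via_lb = requested_principal("ldap", lb, realm)
    direct = requested_principal("ldap", host, realm)
    keytab = Keytab(frozenset({direct, via_lb}))
    print(acceptor_accepts(via_lb, keytab, direct))  # unpatched, via the LB: False
    print(acceptor_accepts(via_lb, keytab, None))    # patched, via the LB: True
    print(acceptor_accepts(direct, keytab, None))    # patched, direct: True
```

The first solution (every pool member configured with the load-balanced name) corresponds to acceptor_name being the load balancer principal: acceptor_accepts(direct, keytab, via_lb) is False, which matches the gss_accept_sec_context failure reported in this thread when talking to a server directly.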
This is not a new problem. I am pretty sure I filed a bug report about this years ago when I worked at Stanford, but I could not find it. I did find Simon Wilkinson's excellent description of the problem that I embedded in an old message to the list at:
https://www.openldap.org/lists/openldap-technical/201009/msg00017.html
Of course, once you apply the patch you will need to use a keytab with both principal names in it, the hostname and the load balancer name. For example:
# klist -ke /etc/ldap/ldap.keytab
Keytab name: FILE:/etc/ldap/ldap.keytab
KVNO Principal
---- -------------------------------------------------------------------
   1 ldap/somehost.somedomain.tld@SOMEDOMAIN.TLD
   1 ldap/somelb.somedomain.tld@SOMEDOMAIN.TLD
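With the patch in place, a quick sanity check that the keytab really holds both principals can be scripted against klist output. A sketch assuming the layout in the listing above (KVNO followed by the principal, i.e. klist run without -t, which would insert timestamps); keytab_principals and missing_principals are hypothetical helper names:

```python
def keytab_principals(klist_output: str) -> set[str]:
    """Principals listed in klist -ke style output (KVNO, then principal)."""
    principals = set()
    for line in klist_output.splitlines():
        fields = line.split()
        # Entry lines start with a numeric KVNO; header and ---- lines do not.
        if len(fields) >= 2 and fields[0].isdigit():
            principals.add(fields[1])
    return principals


def missing_principals(klist_output: str, hostnames: list[str], realm: str,
                       service: str = "ldap") -> set[str]:
    """Expected service/<host>@REALM principals that the listing lacks."""
    wanted = {f"{service}/{name}@{realm}" for name in hostnames}
    return wanted - keytab_principals(klist_output)


SAMPLE = """\
Keytab name: FILE:/etc/ldap/ldap.keytab
KVNO Principal
---- -------------------------------------------------------------------
   1 ldap/somehost.somedomain.tld@SOMEDOMAIN.TLD
   1 ldap/somelb.somedomain.tld@SOMEDOMAIN.TLD
"""

if __name__ == "__main__":
    print(missing_principals(SAMPLE, ["somehost.somedomain.tld", "somelb.somedomain.tld"],
                             "SOMEDOMAIN.TLD"))  # set() means both are present
```

In practice the text would come from running klist on the real keytab, e.g. subprocess.run(["klist", "-ke", "/etc/ldap/ldap.keytab"], capture_output=True, text=True).stdout.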
Bill
On 3 Dec 2017, at 20:44, Bill MacAllister bill@ca-zephyr.org wrote:
For Kerberos the problem is in Cyrus SASL and is true for all load balancers. Indeed, it is true for any system that has more than one name. SASL checks the name that the connection was made to and fails if it doesn't match the server's name.
Yes, I had that problem at work where we run LDAP/MIT Kerberos V behind AWS ELBs.
I managed to fix it (with great pain!) so that I can now access LDAP via the one-name ELB, but not the servers individually. Which, as it turned out, I’d prefer anyway. So I wrote my security group (firewall) rules accordingly.
So here at home, behind a HAProxy running on OpenStack, I did exactly the same. But this time I have a much … “weirder” problem. Usually, it doesn’t work right away. But if left completely alone for “a few hours”, it automagically works!
So in my case here at home, there’s something more sinister at work..
I’m 99% certain it’s something in either OpenStack or HAProxy, but I can’t figure out what! There’s still that one percent that I can’t explain - I see the initial attempt in the slapd logs, but not the subsequent one. Meaning, I think, that I can talk to slapd just fine, but … “something” that ldapsearch/ldapwhoami does fails..
openldap-technical@openldap.org