I am upgrading the openldap servers and ran into a bit of a problem. SASL/GSSAPI binds to the new server are too slow. An ldapsearch to the old server using GSSAPI to bind is much faster on the old server then the same search on the new server.
I am not even sure where to start to debug this and am hoping that some one will have some ideas.
First off, here are a few details:
The old LDAP server is running Openldap 2.3.43 on CentOS 5.2 with the CentOS built MIT Kerberos(1.6.1) and saslauthd (2.1.22). This server is configured with the slapd.conf file. The host name is infinite.mines.edu in the example runs below.
The new LDAP server is running Openldap 2.4.31 on CentOS 6.3 with the CentOS built MIT Kerberos (1.9) and saslauthd (2.1.23). This server is configured with slapd-config (new dynamic configuration is very cool!) The host name is infinte-temp.mines.edu in the example runs below.
Both the old and new servers are configured to use SASL for GSSAPI and for simple binds.
First test, simple bind to new server and then the old server:
[testua@merlin ~]$ time ldapsearch -LLL -ZZ -Duid=testua,ou=People,dc=mines,dc=edu -y passwd -x -Hldap://infinite-temp.mines.edu/ -bou=People,dc=mines,dc=edu uid=testua dn: uid=testua,ou=People,dc=mines,dc=edu uid: testua cn: estua, t objectClass: account objectClass: posixAccount objectClass: top objectClass: shadowAccount shadowLastChange: 12780 shadowMax: 99999 shadowWarning: 7 loginShell: /bin/bash gidNumber: 11192 host: imagine.mines.edu gecos: estua, t homeDirectory: /u/ca/fl/testua userPassword:: e1NBU0x9dGVzdHVhQE1JTkVTLkVEVQ== uidNumber: 11192
real 0m0.324s user 0m0.017s sys 0m0.004s [testua@merlin ~]$ time ldapsearch -LLL -ZZ -Duid=testua,ou=People,dc=mines,dc=edu -y passwd -x -Hldap://infinite.mines.edu/ -bou=People,dc=mines,dc=edu uid=testua dn: uid=testua,ou=People,dc=mines,dc=edu uid: testua cn: estua, t objectClass: account objectClass: posixAccount objectClass: top objectClass: shadowAccount shadowLastChange: 12780 shadowMax: 99999 shadowWarning: 7 loginShell: /bin/bash gidNumber: 11192 host: imagine.mines.edu gecos: estua, t homeDirectory: /u/ca/fl/testua userPassword:: e1NBU0x9dGVzdHVhQE1JTkVTLkVEVQ== uidNumber: 11192
real 0m0.163s user 0m0.016s sys 0m0.004s [testua@merlin ~]$
As you can see, the new server takes nearly twice as long to perform the search as the old server. Both servers are using saslauthd to send the password to Kerberos for authentication.
Next test, GSSAPI bind to the new server and then the old server: [testua@merlin ~]$ kinit Password for testua@MINES.EDU: [testua@merlin ~]$ klist Ticket cache: FILE:/tmp/krb5cc_11192_d1yOuC Default principal: testua@MINES.EDU
Valid starting Expires Service principal 07/25/12 13:32:11 07/26/12 04:32:11 krbtgt/MINES.EDU@MINES.EDU renew until 07/26/12 13:32:07
Kerberos 4 ticket cache: /tmp/tkt11192 klist: You have no tickets cached [testua@merlin ~]$ time ldapsearch -LLL -ZZ -Hldap://infinite-temp.mines.edu/ -bou=People,dc=mines,dc=edu uid=testua SASL/GSSAPI authentication started SASL username: testua@MINES.EDU SASL SSF: 56 SASL data security layer installed. dn: uid=testua,ou=People,dc=mines,dc=edu uid: testua cn: estua, t objectClass: account objectClass: posixAccount objectClass: top objectClass: shadowAccount shadowLastChange: 12780 shadowMax: 99999 shadowWarning: 7 loginShell: /bin/bash gidNumber: 11192 host: imagine.mines.edu gecos: estua, t homeDirectory: /u/ca/fl/testua userPassword:: e1NBU0x9dGVzdHVhQE1JTkVTLkVEVQ== uidNumber: 11192
real 0m1.145s user 0m0.021s sys 0m0.004s [testua@merlin ~]$ time ldapsearch -LLL -ZZ -Hldap://infinite.mines.edu/ -bou=People,dc=mines,dc=edu uid=testua SASL/GSSAPI authentication started SASL username: testua@MINES.EDU SASL SSF: 56 SASL data security layer installed. dn: uid=testua,ou=People,dc=mines,dc=edu uid: testua cn: estua, t objectClass: account objectClass: posixAccount objectClass: top objectClass: shadowAccount shadowLastChange: 12780 shadowMax: 99999 shadowWarning: 7 loginShell: /bin/bash gidNumber: 11192 host: imagine.mines.edu gecos: estua, t homeDirectory: /u/ca/fl/testua userPassword:: e1NBU0x9dGVzdHVhQE1JTkVTLkVEVQ== uidNumber: 11192
real 0m0.123s user 0m0.021s sys 0m0.003s [testua@merlin ~]$ klist Ticket cache: FILE:/tmp/krb5cc_11192_d1yOuC Default principal: testua@MINES.EDU
Valid starting Expires Service principal 07/25/12 13:32:11 07/26/12 04:32:11 krbtgt/MINES.EDU@MINES.EDU renew until 07/26/12 13:32:07 07/25/12 13:32:33 07/26/12 04:32:11 ldap/infinite-temp.mines.edu@MINES.EDU renew until 07/26/12 13:32:07 07/25/12 13:32:41 07/26/12 04:32:11 ldap/infinite.mines.edu@MINES.EDU renew until 07/26/12 13:32:07
Kerberos 4 ticket cache: /tmp/tkt11192 klist: You have no tickets cached [testua@merlin ~]$
The old server is 9 times faster then the new server.
This last test is to show that an anonymous bind is very fast and indicates to me that the network, BDB, caching, etc are not the issue. This test is to both servers, using a simple bind, first the old server and then the new server:
[testua@merlin ~]$ time ldapsearch -LLL -ZZ -x -Hldap://infinite.mines.edu/ -bou=People,dc=mines,dc=edu uid=testua dn: uid=testua,ou=People,dc=mines,dc=edu uid: testua cn: estua, t objectClass: account objectClass: posixAccount objectClass: top objectClass: shadowAccount shadowLastChange: 12780 shadowMax: 99999 shadowWarning: 7 loginShell: /bin/bash gidNumber: 11192 host: imagine.mines.edu gecos: estua, t homeDirectory: /u/ca/fl/testua userPassword:: e1NBU0x9dGVzdHVhQE1JTkVTLkVEVQ== uidNumber: 11192
real 0m0.049s user 0m0.017s sys 0m0.005s [testua@merlin ~]$ time ldapsearch -LLL -ZZ -x -Hldap://infinite-temp.mines.edu/ -bou=People,dc=mines,dc=edu uid=testua dn: uid=testua,ou=People,dc=mines,dc=edu uid: testua cn: estua, t objectClass: account objectClass: posixAccount objectClass: top objectClass: shadowAccount shadowLastChange: 12780 shadowMax: 99999 shadowWarning: 7 loginShell: /bin/bash gidNumber: 11192 host: imagine.mines.edu gecos: estua, t homeDirectory: /u/ca/fl/testua userPassword:: e1NBU0x9dGVzdHVhQE1JTkVTLkVEVQ== uidNumber: 11192
real 0m0.029s user 0m0.014s sys 0m0.006s [testua@merlin ~]$
When using an anonymous bind, the old server takes longer then the new server -- which is what I would expect given that the new server has twice the number of faster processors and double the memory of the old server.
Any ideas?
Thanks!
Matt mbrookov@mines.edu
--On Wednesday, July 25, 2012 2:39 PM -0600 "Matthew B. Brookover" mbrookov@mines.edu wrote:
When using an anonymous bind, the old server takes longer then the new server -- which is what I would expect given that the new server has twice the number of faster processors and double the memory of the old server.
Any ideas?
DNS lookup issues? MIT's Kerberos replay cache? If it is pausing in the GSSAPI negotiation, see if you can figure out which function call is hanging the longest using a one entry result and -d -1 to ldapsearch.
--Quanah
--
Quanah Gibson-Mount Sr. Member of Technical Staff Zimbra, Inc A Division of VMware, Inc. -------------------- Zimbra :: the leader in open source messaging and collaboration
On Tue, 2012-09-04 at 08:50 -0700, Quanah Gibson-Mount wrote:
--On Wednesday, July 25, 2012 2:39 PM -0600 "Matthew B. Brookover" mbrookov@mines.edu wrote:
When using an anonymous bind, the old server takes longer then the new server -- which is what I would expect given that the new server has twice the number of faster processors and double the memory of the old server.
Any ideas?
DNS lookup issues? MIT's Kerberos replay cache? If it is pausing in the GSSAPI negotiation, see if you can figure out which function call is hanging the longest using a one entry result and -d -1 to ldapsearch.
--Quanah
--
Quanah Gibson-Mount Sr. Member of Technical Staff Zimbra, Inc A Division of VMware, Inc.
Zimbra :: the leader in open source messaging and collaboration
It turns out, that this is only a problem when selinux is enabled. Turn off selinux and SASL is much faster. I do not know if saslauthd or Kerberos is the problem. I have been able to recreate the issue with the sasl-sample client and server when selinux is enabled. I have also been able to recreate the issue with the CentOS 6.3 build of Cyrus SASL 2.1.23 and my own build of Cyrus SASL 2.1.25 on CentOS 6.3. On my todo list is to re-post the issue to the SASL mail list, but unfortunately I have not had time.
I figured my original message got gobbled up by a moderator or spam filter. I posted this in July and was rather surprised to see it show up yesterday.
I like your suggestion of looking for a specific function call where it is hanging. When I get a chance, I will try gprof on saslauthd.
thanks
Matt
openldap-technical@openldap.org