--On Monday, July 20, 2009 1:37 PM +0200 Christian Manal moenoel@informatik.uni-bremen.de wrote:
Hi everyone,
I have noticed a problem regarding the new writetimeout feature in OpenLDAP 2.4.17. I have a setup with one master and four replicas running on Solaris 10, using the hdb-backend with BDB 4.4 from the opencsw repository. Clients are Sun, Linux and Mac boxes in every possible variation.
I updated my servers last Friday from version 2.4.15 and now some cronjobs accessing the directory were sporadically failing over the weekend.
I monitored the server logs (log level 'stats') while running the involved scripts repeatedly and got something like the following results after fail-runs on all servers: (IPs and DNs altered)
Jul 20 11:42:43 ldapserver slapd[9053]: [ID 848112 local4.debug] conn=12479 fd=79 ACCEPT from IP=192.168.1.1:50210 (IP=0.0.0.0:389) Jul 20 11:42:43 ldapserver slapd[9053]: [ID 270379 local4.debug] conn=12479 op=0 EXT oid=1.3.6.1.4.1.1466.20037 Jul 20 11:42:43 ldapserver slapd[9053]: [ID 560212 local4.debug] conn=12479 op=0 STARTTLS Jul 20 11:42:43 ldapserver slapd[9053]: [ID 875301 local4.debug] conn=12479 op=0 RESULT oid= err=0 text= Jul 20 11:42:43 ldapserver slapd[9053]: [ID 105384 local4.debug] conn=12479 fd=79 TLS established tls_ssf=256 ssf=256 Jul 20 11:42:43 ldapserver slapd[9053]: [ID 215403 local4.debug] conn=12479 op=1 BIND dn="uid=dummyuser,ou=System,dc=example,dc=com" method=128 Jul 20 11:42:43 ldapserver slapd[9053]: [ID 600343 local4.debug] conn=12479 op=1 BIND dn="uid=dummyuser,ou=System,dc=example,dc=com" mech=SIMPLE ssf=0 Jul 20 11:42:43 ldapserver slapd[9053]: [ID 588225 local4.debug] conn=12479 op=1 RESULT tag=97 err=0 text= Jul 20 11:42:43 ldapserver slapd[9053]: [ID 469902 local4.debug] conn=12479 op=2 SRCH base="ou=people,dc=example,dc=com" scope=2 deref=2 filter="(objectClass=posixAccount)" Jul 20 11:42:43 ldapserver slapd[9053]: [ID 744844 local4.debug] conn=12479 op=2 SRCH attr=uid userpassword uidnumber gidnumber gecos homedirectory loginshell Jul 20 11:43:00 ldapserver slapd[9053]: [ID 485650 local4.debug] conn=12479 fd=79 closed (writetimeout)
I don't have the writetimeout keyword configured on any of the boxes and the affected script doesn't do any writes anyway. Also, the problem only seems to arise if the client takes a while to process a search result. The failing scripts are Net::LDAP based Perl scripts running on some old SPARC boxes, so it took them up to half a minute and more to complete.
Setting writetimeout to a high enough value seems solve the problem, but referring to the docs, this shouldn't happen with the keyword unset or set to 0.
Is this a bug or did I miss something? Did anyone else encounter this so far?
File an ITS:
http://www.openldap.org/its/ and include your server config.
--Quanah
--
Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc -------------------- Zimbra :: the leader in open source messaging and collaboration