We run four OpenLDAP 2.4.16 servers as two provider/consumer pairs, one pair for our
staff systems and one pair for our teaching facilities.
They are all on Solaris 10u7 Xen virtual hosts.
The staff pair runs fine.
The consumer on the teaching pair runs fine.
The provider on the teaching pair runs fine until it gets hit by a heavy
load, e.g. the start of a lab when ~100 PCs try to authenticate their users.  At
this point it refuses to serve new LDAP requests.  Traffic is still coming in
to the box and existing connections seem OK.
The break point is about 35 PCs; below that there isn't a problem.
Restarting slapd cures the problem and off we go until the start of the
next big lab.
I've run at various log levels but not been able to see any obvious
messages.  All I see, even when everything is fine, are messages of the
form
send_search_entry: conn 11639  ber write failed.
connection_read(38): no connection!
The slapd.conf (minus the syncprov bit) is:
include         /usr/local/etc/openldap/schema/core.schema
include         /usr/local/etc/openldap/schema/cosine.schema
include         /usr/local/etc/openldap/schema/inetorgperson.schema
include         /usr/local/etc/openldap/schema/nis.schema
include         /usr/local/etc/openldap/schema/duaconf.schema
include         /usr/local/etc/openldap/schema/local.schema
pidfile         /var/openldap/run/slapd.pid
argsfile        /var/openldap/run/slapd.args
conn_max_pending        200
idletimeout     60
sizelimit       2000
loglevel        256
database        bdb
suffix          "dc=my,dc=domain"
rootdn          "cn=me,dc=my,dc=domain"
rootpw          {SSHA}guess
directory       /var/openldap/openldap-data
index   cn,entryCSN,entryUUID,gidNumber,ipHostNumber,memberUid eq
index   objectclass,uid,uidNumber,uniqueMember  eq
cachefree       16
cachesize       1500
checkpoint      0 60
dncachesize     1500
idlcachesize    3000
access to attrs=userPassword
        by self write
        by anonymous auth
        by dn.base="cn=fred,ou=Profile,dc=my,dc=domain" read
        by * none
access to *
        by self write
        by users read
        by * read
The only entry in DB_CONFIG is:
set_cachesize   0       26214400        0
Cache hits are at 99%.
I'm stumped for a cause or solution.  Can anyone either give me a pointer as
to what to look for in the logs or suggest a possible cause?  Could it be
hitting the 256 open file limit?
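To make the descriptor question concrete, here is a minimal Python sketch of
the arithmetic.  The helper name fd_headroom, the connections-per-PC figure and
the baseline descriptor count are all assumptions for illustration, not
measured values from these servers.

```python
import resource


def fd_headroom(clients, conns_per_client, baseline_fds, soft_limit=None):
    """Estimate spare file descriptors under a given load.

    clients          -- number of PCs connecting at once
    conns_per_client -- LDAP connections each PC holds open (assumed)
    baseline_fds     -- descriptors the server uses before any client
                        connects, e.g. database and log files (assumed)
    soft_limit       -- per-process open file limit; defaults to the soft
                        RLIMIT_NOFILE of the process running this script

    A negative result means the estimated load exceeds the limit.
    """
    if soft_limit is None:
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = baseline_fds + clients * conns_per_client
    return soft_limit - needed


if __name__ == "__main__":
    # Hypothetical figures: 35 PCs, 7 connections each, 20 baseline descriptors.
    print(fd_headroom(35, 7, 20, soft_limit=256))
```

If the real per-PC connection count is anywhere near the assumed figure, a
256 limit would run out at roughly the load where the failure appears, which
is why I suspect it.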
Thanks
-- 
John Landamore
Department of Computer Science
University of Leicester
University Road, LEICESTER, LE1 7RH
J.Landamore(a)mcs.le.ac.uk
Phone: +44 (0)116 2523410       Fax: +44 (0)116 2523604