Hello list,
We are in the middle of a major upgrade of our OpenLDAP software, so it is a
bit unfortunate that I have to track down issues at the same time.
os: Solaris 10u8 x86
old: openldap-2.3.41 db-4.2.52.NC-PLUS_5_PATCHES
new: openldap-2.4.23 db-4.8.30.NC
We noticed that syncrepl stopped on pop01, pop03 and pop06 yesterday and fell
behind. The only hints in slaplog were:
Sep 28 11:23:09 pop06.unix slapd[29027]: [ID 968320 local4.debug] do_syncrep2: LDAP_RES_INTERMEDIATE - NEW_COOKIE
Sep 28 11:24:44 pop06.unix slapd[29027]: [ID 763815 local4.debug] connection_input: conn=123099 deferring operation: too many executing
Sep 28 11:24:44 pop06.unix slapd[29027]: [ID 763815 local4.debug] connection_input: conn=123099 deferring operation: pending operations
Sep 28 11:24:48 pop06.unix last message repeated 72 times
After that there were no more syncrepl messages until we restarted slapd, 2
hours later. I wonder if the syncrepl connection received "too many executing".
Is that possible? Can we make it so that sync connections get higher priority?
In this case, it is new-ldap syncrepl to old-ldap for loopback lookups
(dovecot).
Now, I would guess that getting "too many executing" is undesirable. Googling
around, it seems that what happens is this: one connection already accounts for
more than half of the connection-pool operations, so its next operation gets
deferred.
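
To check that I am reading that rule correctly, here is a toy model of it in
Python. It is only my understanding, not a copy of slapd's code: the threshold
(executing operations at or above half of the thread pool), the pool size of 16
(which I believe is the default for "threads") and the function name are all
assumptions on my part.

```python
# Toy model of the per-connection deferral rule as I understand it.
# ASSUMPTIONS: pool size 16 and the "half the pool" threshold are my
# reading of the behaviour, not confirmed against our build.
ASSUMED_POOL_MAX = 16


def deferral_reason(n_executing, n_pending, pool_max=ASSUMED_POOL_MAX):
    """Return the reason a new operation on one connection would be
    deferred, or None if it would be executed right away."""
    if n_pending > 0:
        # Something is already queued on this connection, so the new
        # operation has to wait behind it.
        return "pending operations"
    if n_executing >= pool_max // 2:
        # This one connection already occupies half of the pool.
        return "too many executing"
    return None


# First the connection hits the threshold, then every later operation
# queues up behind the one that was deferred.
print(deferral_reason(n_executing=8, n_pending=0))
print(deferral_reason(n_executing=8, n_pending=1))
```

If that model is right, it would at least explain why the log shows one "too
many executing" followed by a run of "pending operations" for the same conn.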
What does "one connection" mean? From one IP (all connections are over loopback,
except for syncrepl), or is it operations from one TCP stream? Or is it some
other kind of identifier, like the rid?
Can I get slapd to tell me which connection it actually means? Having looked at
the sources, it does not seem to have that ability, but I could always add our
own prints, at least to get the IP of the requester. (I tried "conns" in
LogLevel, but it prints all select() calls, which is unfortunately unrealistic
to run on live servers. Currently I have 'stats' running.)
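
Since the deferring messages do carry a conn= number, and I believe 'stats'
logs an ACCEPT line with the peer address for every new connection, one way to
map the two without patching slapd might be a small script. Below is a sketch
in Python. The ACCEPT layout (conn=N fd=M ACCEPT from IP=addr:port) is what I
expect to see rather than something I have verified on these hosts, the
function name is made up, and the ACCEPT line in the sample is invented for
illustration. It expects unwrapped log lines and does not expand syslog's
"last message repeated" lines.

```python
import re
from collections import Counter

# Patterns for the two kinds of slapd log lines we care about.
# ASSUMPTION: the ACCEPT layout below is what 'stats' logging emits.
ACCEPT_RE = re.compile(r"conn=(\d+) fd=\d+ ACCEPT from (\S+)")
DEFER_RE = re.compile(r"conn=(\d+) deferring operation: (.+)$")


def deferred_peers(lines):
    """Count deferrals per (conn id, peer address, reason)."""
    peers = {}          # conn id -> peer string taken from the ACCEPT line
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        accept = ACCEPT_RE.search(line)
        if accept:
            peers[accept.group(1)] = accept.group(2)
            continue
        defer = DEFER_RE.search(line)
        if defer:
            conn, reason = defer.groups()
            counts[(conn, peers.get(conn, "unknown"), reason)] += 1
    return counts


# Illustrative input: the ACCEPT line is invented; the two deferral
# lines are the ones from the log excerpt in this mail.
sample = [
    "Sep 28 11:20:01 pop06.unix slapd[29027]: [ID 848112 local4.debug] "
    "conn=123099 fd=35 ACCEPT from IP=127.0.0.1:51234 (IP=0.0.0.0:389)",
    "Sep 28 11:24:44 pop06.unix slapd[29027]: [ID 763815 local4.debug] "
    "connection_input: conn=123099 deferring operation: too many executing",
    "Sep 28 11:24:44 pop06.unix slapd[29027]: [ID 763815 local4.debug] "
    "connection_input: conn=123099 deferring operation: pending operations",
]

for (conn, peer, reason), n in deferred_peers(sample).most_common():
    print(f"conn={conn} peer={peer} reason={reason} count={n}")
```

On a real log, deferred_peers() could be fed an open file object instead of
the sample list. A conn whose ACCEPT line was rotated out of the log shows up
with peer "unknown".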
Or, rather than hacking at the sources, should I invest in getting the
"monitor" backend to run? Would it show why we receive "too many executing"?
I have also noticed a considerable performance drop when moving from the old
version to the new version, and I am not entirely sure whether that is
something we can do anything about.
Following this email are the juicy parts of the slapd configuration on most of
our slaves/loopback slapd.
--
Jorgen Lundman | <lundman(a)lundman.net>
Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell)
Japan | +81 (0)3 -3375-1767 (home)
loglevel sync stats
access to *
    by dn.base="cn=replicator,dc=company,dc=jp" read
    by * break
access to attrs=userPassword
    by self write
    by anonymous auth
    by * none
access to *
    by self write
    by dn="cn=admin,dc=company,dc=jp" write
    by peername.ip=172.20.12.6 none
    by peername.ip=172.20.12.16 none
    by peername.ip=172.20.12.26 none
    by peername.ip=172.20.12.36 none
    by peername.ip=172.20.12.46 none
    by peername.ip=172.20.12.56 none
    by peername.ip=172.20.12.66 none
    by peername.ip=172.20.12.76 none
    by * read
password-hash {CRYPT}
database hdb
suffix "dc=company,dc=jp"
rootdn "cn=admin,dc=company,dc=jp"
directory /usr/local/var/openldap-data
# Indices to maintain
index objectClass eq
##index uid pres,eq
index uid eq
index uidNumber eq
index mail eq
index mailAlternateAddress pres,eq
index deliveryMode eq
index accountStatus eq
index gecos eq
index radiusGroupName eq
index o pres,eq
index entryCSN,entryUUID eq
index gidNumber eq
index DNSType eq
index DNSIPAddr eq
index DNSData eq
index DNSHostName eq
checkpoint 128 15
cachesize 5000
idlcachesize 15000
overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 100
dbconfig set_lk_detect DB_LOCK_DEFAULT
dbconfig set_lg_max 52428800
dbconfig set_cachesize 4 0 1
dbconfig set_flags db_log_autoremove
dbconfig set_lk_max_objects 1500
dbconfig set_lk_max_locks 1500
dbconfig set_lk_max_lockers 1500
# rid is last octet of IP, plus 256.
syncrepl rid=279
    provider=ldap://172.20.12.163
    type=refreshAndPersist
    interval=00:00:00:30
    searchbase="dc=company,dc=jp"
    filter="(objectClass=*)"
    attrs="*,+"
    scope=sub
    schemachecking=off
    bindmethod=simple
    binddn="cn=admin,dc=company,dc=jp"
    credentials="<secret>"
    retry="60 10 300 +"
updateref ldap://172.20.12.163
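
For reference, the rid rule in the comment above (last octet of the host's IP,
plus 256) works out as in the small Python sketch below. The function name and
the example address 172.20.12.23 are only illustrative; that address is simply
the one that would give rid=279 under this rule.

```python
def rid_for_ip(ip: str) -> int:
    """Derive a syncrepl rid from an IPv4 address: last octet plus 256."""
    last_octet = int(ip.split(".")[-1])
    if not 0 <= last_octet <= 255:
        raise ValueError(f"not a valid IPv4 last octet: {last_octet}")
    # The result is always in the range 256..511, so it stays at three digits.
    return last_octet + 256


# Hypothetical host address chosen to match rid=279 in the config above.
print(rid_for_ip("172.20.12.23"))
```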