Too many executing vs syncrepl - openldap-technical

28 Sep 2010


      Hello list,
So we are in the middle of a major upgrade of our OpenLDAP software, so it is a 
bit unfortunate that I have to track down issues at the same time.
os:  Solaris 10u8 x86
old: openldap-2.3.41 db-4.2.52.NC-PLUS_5_PATCHES
new: openldap-2.4.23 db-4.8.30.NC
We noticed that syncrepl stopped on pop01, pop03 and pop06 yesterday and fell 
behind. The only hints in slaplog was:
Sep 28 11:23:09 pop06.unix slapd[29027]: [ID 968320 local4.debug] do_syncrep2: L
DAP_RES_INTERMEDIATE - NEW_COOKIE
Sep 28 11:24:44 pop06.unix slapd[29027]: [ID 763815 local4.debug] connection_inp
ut: conn=123099 deferring operation: too many executing
Sep 28 11:24:44 pop06.unix slapd[29027]: [ID 763815 local4.debug] connection_inp
ut: conn=123099 deferring operation: pending operations
Sep 28 11:24:48 pop06.unix last message repeated 72 times
and there were no more syncrepl messages until we restarted slapd, 2 hours 
later. I wonder if the syncrepl connection received "too many executing". Is 
that possible? Can we make it so sync connections get higher priority as it 
were. In this case, it is new-ldap syncrepl to old-ldap for loopback lookups 
(dovecot).
Now, I would guess that getting "too many executing" is undesirable. Googling 
around it seems that what happens is that; one connection has more than half of 
the connection-pool operations already, and gets deferred.
What does "one connection" mean? From one IP (all connections are over loopback, 
except for syncrepl), or is it operations from one-tcp-stream? Or it some other 
kind of cookie, like rid?
Can I get slapd to tell me which connection it actually means? Having looked at 
the sources, it does not seem to have that ability, but I could always add our 
own prints. At least to get the IP of the requester. (I tried "conns" in 
LogLevel, but it prints all select() calls, and is unfortunately unrealistic to 
run on live servers. Currently I have 'stats' running.)
Or rather than hacking at the sources, should I invest in getting the overlay 
"monitor" to run? Would it show why we receive "too many executing".
I have also noticed a considerable performance drop when moving from old version 
to new version, and not entirely sure if that is something we can do something 
about.
Following this email is the juicy parts of slapd on most of our slaves/loopback 
slapd.
-- 
Jorgen Lundman       | lundman@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)


loglevel        sync stats
access to *
         by dn.base="cn=replicator,dc=company,dc=jp" read
         by * break

access to attrs=userPassword
         by self write
         by anonymous auth
         by * none
access to *
         by self write
         by dn="cn=admin,dc=company,dc=jp" write
         by peername.ip=172.20.12.6    none
         by peername.ip=172.20.12.16   none
         by peername.ip=172.20.12.26   none
         by peername.ip=172.20.12.36   none
         by peername.ip=172.20.12.46   none
         by peername.ip=172.20.12.56   none
         by peername.ip=172.20.12.66   none
         by peername.ip=172.20.12.76   none
         by * read

password-hash {CRYPT}

database        hdb
suffix          "dc=company,dc=jp"
rootdn          "cn=admin,dc=company,dc=jp"
directory       /usr/local/var/openldap-data

# Indices to maintain
index   objectClass     eq
##index   uid                     pres,eq
index   uid                     eq
index   uidNumber               eq
index   mail                    eq
index   mailAlternateAddress    pres,eq
index   deliveryMode            eq
index   accountStatus           eq
index   gecos                   eq
index   radiusGroupName         eq
index   o                       pres,eq
index   entryCSN,entryUUID      eq
index   gidNumber               eq
index   DNSType                 eq
index   DNSIPAddr               eq
index   DNSData                 eq
index   DNSHostName             eq


checkpoint 128 15
cachesize 5000
idlcachesize 15000

overlay syncprov

syncprov-checkpoint 100 10
syncprov-sessionlog 100

dbconfig set_lk_detect DB_LOCK_DEFAULT
dbconfig set_lg_max 52428800
dbconfig set_cachesize 4 0 1
dbconfig set_flags db_log_autoremove
dbconfig set_lk_max_objects 1500
dbconfig set_lk_max_locks 1500
dbconfig set_lk_max_lockers 1500

# rid is last octet of IP, plus 256.
syncrepl   rid=279
                 provider=ldap://172.20.12.163
                 type=refreshAndPersist
                 interval=00:00:00:30
                 searchbase="dc=company,dc=jp"
                 filter="(objectClass=*)"
                 attrs="*,+"
                 scope=sub
                 schemachecking=off
                 bindmethod=simple
                 binddn="cn=admin,dc=company,dc=jp"
                 credentials="<secret>"
                 retry="60 10 300 +"

updateref       ldap://172.20.12.163