Hello list,
We are in the middle of a major upgrade of our OpenLDAP software, so it is a bit unfortunate that I have to track down issues at the same time.
OS:  Solaris 10u8 x86
Old: openldap-2.3.41, db-4.2.52.NC-PLUS_5_PATCHES
New: openldap-2.4.23, db-4.8.30.NC
Yesterday we noticed that syncrepl had stopped on pop01, pop03 and pop06, and those servers fell behind. The only hints in the slapd log were:
Sep 28 11:23:09 pop06.unix slapd[29027]: [ID 968320 local4.debug] do_syncrep2: LDAP_RES_INTERMEDIATE - NEW_COOKIE
Sep 28 11:24:44 pop06.unix slapd[29027]: [ID 763815 local4.debug] connection_input: conn=123099 deferring operation: too many executing
Sep 28 11:24:44 pop06.unix slapd[29027]: [ID 763815 local4.debug] connection_input: conn=123099 deferring operation: pending operations
Sep 28 11:24:48 pop06.unix last message repeated 72 times
and there were no more syncrepl messages until we restarted slapd two hours later. I wonder whether the syncrepl connection was the one that hit "too many executing". Is that possible? Can we give syncrepl connections higher priority, as it were? In this case, it is new-ldap replicating via syncrepl to old-ldap, which serves the loopback lookups (dovecot).
Now, I would guess that getting "too many executing" is undesirable. From what I can find by googling, it happens when one connection already has more than half of the connection pool's operations executing, so any further operation from that connection is deferred.
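A simplified model of that check, as described above, might look like the sketch below. The function name, its parameters and the exact comparison (">=" against half of the pool) are assumptions for illustration, not slapd's actual code; with, say, 16 worker threads, a single connection would be limited to 8 concurrently executing operations.

```python
from typing import Optional


def defer_reason(executing: int, pending: int, pool_max: int) -> Optional[str]:
    """Hypothetical model of the per-connection deferral check.

    executing: operations from this connection currently running in the pool
    pending:   operations from this connection already queued and waiting
    pool_max:  size of the worker thread pool (the 'threads' setting)
    Returns the deferral reason, or None if the new operation may run now.
    """
    # Assumed threshold: one connection may occupy at most half of the pool.
    if executing >= pool_max // 2:
        return "too many executing"
    # Assumed: once anything from this connection is queued, later
    # operations from it wait behind the queued ones.
    if pending > 0:
        return "pending operations"
    return None
```

In this model, a connection whose first deferral is "too many executing" keeps logging "pending operations" for every following request until its queue drains, which would match the two message types seen in the log excerpt.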
What does "one connection" mean here? Connections from one IP (all connections are over loopback, except for syncrepl), or operations from one TCP stream? Or is it some other kind of identifier, like the rid?
Can I get slapd to tell me which connection it actually means? From a look at the sources, it does not seem able to, but I could always add our own debug prints, at least to log the IP of the requester. (I tried "conns" in LogLevel, but it logs every select() call, which is unfortunately unrealistic to run on live servers. Currently I have 'stats' running.)
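Since the 'stats' log level records an ACCEPT line when each connection is opened, one way to find the peer behind a conn= number without patching slapd is to join the deferral messages to those ACCEPT lines. The sketch below assumes the ACCEPT line looks like `conn=123099 fd=23 ACCEPT from IP=127.0.0.1:51234 (IP=0.0.0.0:389)` (check this against your own log); the regular expressions and the function name `deferred_by_peer` are hypothetical.

```python
import re
from collections import Counter

# Assumed format of the 'stats' ACCEPT line, e.g.
# "conn=123099 fd=23 ACCEPT from IP=127.0.0.1:51234 (IP=0.0.0.0:389)"
ACCEPT_RE = re.compile(r"conn=(\d+) fd=\d+ ACCEPT from (\S+)")
# Deferral message as it appears in the log excerpt above
DEFER_RE = re.compile(r"connection_input: conn=(\d+) deferring operation: (.+)$")


def deferred_by_peer(lines):
    """Count deferrals per (conn id, peer address, reason)."""
    peers = {}          # conn id -> peer string taken from its ACCEPT line
    counts = Counter()  # (conn id, peer, reason) -> number of log lines
    for line in lines:
        m = ACCEPT_RE.search(line)
        if m:
            peers[m.group(1)] = m.group(2)
            continue
        m = DEFER_RE.search(line)
        if m:
            conn, reason = m.group(1), m.group(2).strip()
            counts[(conn, peers.get(conn, "unknown"), reason)] += 1
    return counts
```

Passing an open log file, e.g. `deferred_by_peer(open(logfile))`, and printing `.most_common()` lists the worst offenders first. syslog's "last message repeated N times" lines are not expanded, so the counts are a lower bound, and since conn= numbers start over when slapd restarts, the input should cover a single slapd run.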
Or, rather than hacking at the sources, should I invest in getting the monitor backend (back-monitor) to run? Would it show why we receive "too many executing"?
I have also noticed a considerable performance drop when moving from the old version to the new one, and I am not sure whether there is anything we can do about it.
Following this email are the juicy parts of the slapd configuration on most of our slaves/loopback slapds.