quanah@OpenLDAP.org wrote:
Full_Name: Quanah Gibson-Mount
Version: 2.3/2.4/HEAD
OS: Linux 2.6
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (75.111.29.239)
Back when testing OpenLDAP 2.3, I noticed that if a master receives a high rate of changes and has three or more replicas, usually two of the replicas end up getting all of the changes, while the remaining replicas have to wait until those two finish before receiving any. If the high rate of changes continues for long enough, the other replicas can fall so far out of sync that it is more efficient to reload them than to wait for them to re-sync.

I discussed this with Howard, and in reviewing the code, he sees an underlying design issue with updates that is causing this. His comments:
Once a thread for a psearch wakes up, it sends all the changes that were queued, so it may hog an entire thread for a long time before the next psearch comes off the queue.
Fixing this issue would require a complete redesign of the psearch queue handling. Instead of queuing up a separate response per psearch, there should be a single queue of responses, and the qplayer should iterate through it to match each response to each of the active psearches. That would guarantee that all replicas receive a given change before any of them receives the next change. This would also help with the ordering issues discussed recently on -technical and -devel.
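
To make the proposed ordering concrete, here is a rough sketch of the single-queue idea in Python. It is not slapd code and does not reflect the actual syncprov data structures; Change, PSearch, and play_shared_queue are hypothetical names invented for illustration, and the "send" is just an append to a list. The only point is the loop order: one change comes off the shared queue and is offered to every active psearch before the next change is touched.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Change:
    """One committed update (hypothetical stand-in for a queued response)."""
    seq: int
    dn: str


@dataclass
class PSearch:
    """One active persistent search, i.e. one connected replica (hypothetical)."""
    name: str
    matches: Callable[[Change], bool]
    sent: list = field(default_factory=list)

    def send(self, change: Change) -> None:
        # A real server would write to the replica's connection here.
        self.sent.append(change.seq)


def play_shared_queue(queue: deque, psearches: list, log: list) -> None:
    """Drain one shared queue, delivering each change to all matching psearches.

    The outer loop is over changes and the inner loop is over psearches, so
    no replica receives change N+1 until every replica has been offered
    change N.
    """
    while queue:
        change = queue.popleft()
        for ps in psearches:
            if ps.matches(change):
                ps.send(change)
                log.append((change.seq, ps.name))


# Three replicas that are all interested in every change.
replicas = [PSearch(f"replica{i}", lambda c: True) for i in (1, 2, 3)]
pending = deque(Change(seq, f"uid=user{seq},dc=example,dc=com") for seq in (1, 2, 3))
delivery_log = []
play_shared_queue(pending, replicas, delivery_log)
```

With a separate queue per psearch the loops are effectively inverted (one psearch drains its whole backlog before the next is serviced), which is the starvation pattern described above. The sketch ignores threading, slow consumers, and flow control, all of which a real redesign would have to address.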
I suspect this is too big a change to target the next (.16) release, since we're focusing on re-stabilizing the code right now.