Howard Chu wrote:
Yes, I've seen the same. My suspicion now is that it's due to an update arriving in the consumer near when it transitions from refresh to persist mode, but I haven't been able to isolate it. I also note that adding a SLEEP1 near the beginning of test050, after the consumers have been started but before the ldapadd to populate the provider, completely eliminated the problem. So there's definitely an issue there that needs to be tracked down.
OK, finally understand the situation.
Server 3's consumer is talking to Server 2 and has entered persist mode. But Server 2 is still performing a refresh against Server 1. During a refresh, individual entries have no CSN in their sync cookie, because they arrive in indeterminate order. Because there's no cookie CSN, the writes have no CSN queued either. Since there is no queued CSN, they don't get onto the psearch queue. When the refresh phase completes and Server 2 enters its own persist phase, it receives a CSN for its cookie. Writing this cookie causes the NEW_COOKIE messages to be sent out, which in turn causes Server 3 to update its context even though it's still missing some number of entries.
Without the NEW_COOKIE message, the test succeeds because some other provider will eventually supply the missing updates. (I.e., mostly by luck because there are many servers operating at once.)
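To make the sequence above concrete, here is a toy model of it in Python. This is only a sketch of the reasoning, not slapd's actual code: the class names (Relay, Consumer), the simulate() helper, and the use of plain integers for CSNs are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Consumer:
    """Hypothetical stand-in for Server 3: holds entries and a context CSN."""
    entries: set = field(default_factory=set)
    context_csn: int = 0  # assumption: integers stand in for real CSNs

    def receive(self, msg):
        kind, payload = msg
        if kind == "ENTRY":
            name, csn = payload
            self.entries.add(name)
            self.context_csn = max(self.context_csn, csn)
        elif kind == "NEW_COOKIE":
            # Cookie-only message: the context advances with no entries attached.
            self.context_csn = max(self.context_csn, payload)


@dataclass
class Relay:
    """Hypothetical stand-in for Server 2: consumer of Server 1, provider to Server 3."""
    entries: set = field(default_factory=set)
    psearch_queue: list = field(default_factory=list)

    def write_entry(self, name, csn=None):
        self.entries.add(name)
        # A write with no CSN never reaches the psearch queue.
        if csn is not None:
            self.psearch_queue.append(("ENTRY", (name, csn)))

    def write_cookie(self, csn, send_new_cookie=True):
        # End of refresh: the cookie CSN arrives and is written.
        if send_new_cookie:
            self.psearch_queue.append(("NEW_COOKIE", csn))


def simulate(send_new_cookie):
    server2, server3 = Relay(), Consumer()
    for name in ("a", "b", "c"):
        server2.write_entry(name)  # refresh-phase writes carry no CSN
    server2.write_cookie(csn=3, send_new_cookie=send_new_cookie)
    for msg in server2.psearch_queue:
        server3.receive(msg)  # Server 3 is already in persist mode
    missing = sorted(server2.entries - server3.entries)
    return server3.context_csn, missing


print(simulate(send_new_cookie=True))
print(simulate(send_new_cookie=False))
```

In both runs Server 3 is missing all three entries. The difference is that with NEW_COOKIE its context CSN has advanced to match Server 2's, so it looks up to date; without it the CSN stays behind, which leaves room for another provider to supply the missing updates.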