On Mon, Oct 15, 2018 at 02:54:56PM +0000, hyc@symas.com wrote:
Ondřej Kuzník wrote:
On Mon, Oct 15, 2018 at 01:53:30PM +0000, hyc@symas.com wrote:
ondra@mistotebe.net wrote:
Also, whenever we fall back from deltasync into plain syncrepl, we should make sure that the accesslog entries we generate from this are never used for further replication, though that might be considered a separate issue.
That should already be the case, since none of these ops will have a valid CSN.
I faintly remember Quanah seeing these accesslog entries used by consumers at some point, but I might be mistaken.
The more general point is making sure that a potential syncrepl consumer of this replica does not even try to use the accesslog entries we added before these: the refresh has created a strange gap in the middle (or worse, duplicated ops if a contextCSN element jumped backwards). But if we enforced that, the question is how to get modifications originating from this replica replicated elsewhere, unless we decide they can't be salvaged?
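To make the "contextCSN element jumped backwards" case concrete, here is a minimal sketch, not slapd code. It assumes CSNs use the usual fixed-width `timestamp#count#SID#mod` layout, so that plain string comparison orders them; the function names are hypothetical.

```python
def sid_of(csn: str) -> str:
    # Assumed CSN layout: timestamp#count#SID#mod
    return csn.split("#")[2]

def backward_jumps(before, after):
    """Return the SIDs whose contextCSN value is older after the refresh."""
    old = {sid_of(c): c for c in before}
    jumps = []
    for csn in after:
        sid = sid_of(csn)
        prev = old.get(sid)
        # Fixed-width fields mean lexicographic order matches time order.
        if prev is not None and csn < prev:
            jumps.append(sid)
    return sorted(jumps)

before = [
    "20181015135330.000000Z#000000#001#000000",
    "20181015145456.000000Z#000000#002#000000",
]
after = [
    "20181015140000.000000Z#000000#001#000000",
    "20181015120000.000000Z#000000#002#000000",
]
print(backward_jumps(before, after))
```

Any SID reported here is one for which accesslog entries written before the refresh could be replayed as duplicates.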
We could set the replica to reject user mods while in the refresh phase. Not sure how friendly that is, or whether apps would be smart enough to retry somewhere else.
The concern here is about changes that happened before we found out we can't replicate from another server. It is likely that some of these changes are the reason we couldn't reconcile with our provider, and that they would cause the same problem if we decided to push them.
And should the contextCSN reset terminate not just all inbound syncrepl sessions, but the outbound ones as well?
Need to be careful about race conditions here, or you could end up with all nodes just terminating each other and everything halting.
Yes, that would actually happen... The existing state seems quite destructive though: if you have that same situation now (two masters each in the present phase of a refresh from the other at the same time), you lose data.
The question is: what is the priority here? Currently it seems we want replication to continue at the expense of losing modifications on conflict. We might at least log that this happened and allow someone to revert the decision later.