I've been trying to figure out why syncrepl used on a backend that is subordinate to a glue database with the syncprov overlay should save the contextCSN in the suffix of the glue database rather than the suffix of the backend where syncrepl is used. But all I come up with are reasons why this should not be the case. So, unless anyone can enlighten me as to what I'm missing, I suggest that this be changed.
The problem with the current design is that it makes it impossible to reliably replicate more than one subordinate db from the same remote server, as there are now race conditions where one of the subordinate backends could save an updated contextCSN value that is picked up by the other before it has finished its synchronization. An example of a configuration where more than one subordinate db replicated from the same server might be necessary is the central master described in my previous posting at http://www.openldap.org/lists/openldap-devel/200806/msg00041.html
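For concreteness, here is a minimal sketch of the kind of configuration I have in mind (the suffixes, rids, paths and provider URI are of course made up for the example):

    # the two subordinates, each replicating from the same central master
    database        bdb
    suffix          "ou=a,dc=example,dc=com"
    directory       /var/lib/ldap/a
    subordinate
    syncrepl        rid=001 provider=ldap://master.example.com
                    searchbase="ou=a,dc=example,dc=com" type=refreshAndPersist
                    bindmethod=simple binddn="cn=repl,dc=example,dc=com"
                    credentials=secret

    database        bdb
    suffix          "ou=b,dc=example,dc=com"
    directory       /var/lib/ldap/b
    subordinate
    syncrepl        rid=002 provider=ldap://master.example.com
                    searchbase="ou=b,dc=example,dc=com" type=refreshAndPersist
                    bindmethod=simple binddn="cn=repl,dc=example,dc=com"
                    credentials=secret

    # the superior (glue) database carrying the single syncprov overlay
    database        bdb
    suffix          "dc=example,dc=com"
    directory       /var/lib/ldap/root
    overlay         syncprov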
My idea as to how this race condition could be verified was to add enough entries to one of the backends (while the consumer was stopped) to make it possible to restart the consumer after the first backend had saved the updated contextCSN but before the second had finished its synchronization. But I was able to produce it by simply adding or deleting an entry in one of the backends before starting the consumer. Far too often the backend without any changes was able to pick up and save the updated contextCSN from the producer before syncrepl on the second backend fetched its initial value. I.e. it started with an updated contextCSN and didn't receive the changes that had taken place on the producer. If each syncrepl instance stored the values in the suffix of its own database they wouldn't interfere with each other like this.
There is a similar problem in syncprov, as it must use the lowest contextCSN value (with a given sid) saved by the syncrepl backends configured within the subtree where syncprov is used. But to do that it also needs to distinguish the contextCSN values of each syncrepl backend, which it can't do when they all save them in the glue suffix. This also implies that syncprov must ignore contextCSN updates from syncrepl until all syncrepl backends have saved a value, and that syncprov on the provider must send newCookie sync info messages when it updates its contextCSN value and the changed entry isn't being replicated to a consumer. I.e. as outlined in the message referred to above.
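Selecting the lowest value is trivial once the values can be distinguished, since contextCSN values with the same sid compare lexicographically. A sketch of what I mean (made-up names, not the actual syncprov structures):

    #include <stddef.h>
    #include <string.h>

    /*
     * Sketch: pick the contextCSN value syncprov should advertise for
     * one sid.  csn[i] is the value the i'th syncrepl backend has
     * saved for that sid, or NULL if it hasn't saved one yet.  CSNs
     * with the same sid order lexicographically, so strcmp() suffices.
     */
    static const char *
    min_csn_for_sid( const char *csn[], int nbackends )
    {
        const char *min = NULL;
        int i;

        for ( i = 0; i < nbackends; i++ ) {
            if ( csn[i] == NULL )
                return NULL;    /* not all backends have saved a value
                                 * yet; advertise nothing for this sid */
            if ( min == NULL || strcmp( csn[i], min ) < 0 )
                min = csn[i];
        }
        return min;
    }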
Neither of these changes should interfere with ordinary multi-master configurations where syncrepl and syncprov are both used on the same (glue) database.
I'll volunteer to implement and test the necessary changes if this is the right solution. But to know whether my analysis is correct or not I need feedback. So, comments please?
--
Rein Tollevik, Basefarm AS
Rein Tollevik wrote:
I've been trying to figure out why syncrepl used on a backend that is subordinate to a glue database with the syncprov overlay should save the contextCSN in the suffix of the glue database rather than the suffix of the backend where syncrepl is used. But all I come up with are reasons why this should not be the case. So, unless anyone can enlighten me as to what I'm missing, I suggest that this be changed.
The problem with the current design is that it makes it impossible to reliably replicate more than one subordinate db from the same remote server, as there are now race conditions where one of the subordinate backends could save an updated contextCSN value that is picked up by the other before it has finished its synchronization. An example of a configuration where more than one subordinate db replicated from the same server might be necessary is the central master described in my previous posting at http://www.openldap.org/lists/openldap-devel/200806/msg00041.html
There are only two supported modes of operation intended here. In one case, the glued databases each have their own syncprov overlay, and replication does not cross glue boundaries. In the other case, there is a single syncprov overlay for the entire glued tree, and the boundaries between glued DBs are ignored. In this config, all of the contextCSNs must be saved in the glue DB so that the single syncprov overlay can stay informed about any underlying changes.
Howard Chu wrote:
There are only two supported modes of operation intended here. In one case, the glued databases each have their own syncprov overlay, and replication does not cross glue boundaries. In the other case, there is a single syncprov overlay for the entire glued tree, and the boundaries between glued DBs are ignored. In this config, all of the contextCSNs must be saved in the glue DB so that the single syncprov overlay can stay informed about any underlying changes.
I understand that syncprov needs to be informed about changes in the subordinate DBs, and it is my intention that it stay that way. syncrepl on the subordinate db must continue to write through the glue database so that syncprov sees all changes, including the updates to the contextCSN. It is already being specially informed about the contextCSN updates, to exclude them from replication. Syncprov must itself update the contextCSN values it manages in its own suffix when it receives these updates from syncrepl. I.e. the contextCSN values would end up being stored in the suffixes of both the glue and the subordinate DBs. And as syncprov in some situations must advertise an older csn value (for a given sid) than syncrepl on the subordinate DBs does, this seems correct.
My suggested change would add support for the kind of configuration I have outlined, without harming the currently supported configurations. It should be a fairly simple change, so I still suggest that it be made.
Rein
Rein Tollevik wrote:
I've been trying to figure out why syncrepl used on a backend that is subordinate to a glue database with the syncprov overlay should save the contextCSN in the suffix of the glue database rather than the suffix of the backend where syncrepl is used. But all I come up with are reasons why this should not be the case. So, unless anyone can enlighten me as to what I'm missing, I suggest that this be changed.
The problem with the current design is that it makes it impossible to reliably replicate more than one subordinate db from the same remote server, as there are now race conditions where one of the subordinate backends could save an updated contextCSN value that is picked up by the other before it has finished its synchronization. An example of a configuration where more than one subordinate db replicated from the same server might be necessary is the central master described in my previous posting at http://www.openldap.org/lists/openldap-devel/200806/msg00041.html
My idea as to how this race condition could be verified was to add enough entries to one of the backends (while the consumer was stopped) to make it possible to restart the consumer after the first backend had saved the updated contextCSN but before the second had finished its synchronization. But I was able to produce it by simply adding or deleting an entry in one of the backends before starting the consumer. Far too often the backend without any changes was able to pick up and save the updated contextCSN from the producer before syncrepl on the second backend fetched its initial value. I.e. it started with an updated contextCSN and didn't receive the changes that had taken place on the producer. If each syncrepl instance stored the values in the suffix of its own database they wouldn't interfere with each other like this.
OK.
There is a similar problem in syncprov, as it must use the lowest contextCSN value (with a given sid) saved by the syncrepl backends configured within the subtree where syncprov is used. But to do that it also needs to distinguish the contextCSN values of each syncrepl backend, which it can't do when they all save them in the glue suffix. This also implies that syncprov must ignore contextCSN updates from syncrepl until all syncrepl backends have saved a value, and that syncprov on the provider must send newCookie sync info messages when it updates its contextCSN value and the changed entry isn't being replicated to a consumer. I.e. as outlined in the message referred to above.
Then (at least) at server startup time syncprov must retrieve the contextCSNs from all of its subordinate DBs. Perhaps a subtree search with filter "(contextCSN=*)" would suffice; this would of course require setting a presence index on this attribute to run reasonably. (Or we can add a glue function to return a list of the subordinate suffixes or DBs...)
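I.e., something like this in each subordinate database section (base dn made up for the example):

    index   contextCSN  pres

and at startup syncprov would then do the internal equivalent of:

    ldapsearch -b "dc=example,dc=com" -s sub "(contextCSN=*)" contextCSN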
By the way, please use "subordinate database" and "superior database" when discussing these things; "glue database" is too ambiguous.
Rein Tollevik wrote:
I've been trying to figure out why syncrepl used on a backend that is subordinate to a glue database with the syncprov overlay should save the contextCSN in the suffix of the glue database rather than the suffix of the backend where syncrepl is used. But all I come up with are reasons why this should not be the case. So, unless anyone can enlighten me as to what I'm missing, I suggest that this be changed.
The problem with the current design is that it makes it impossible to reliably replicate more than one subordinate db from the same remote server, as there are now race conditions where one of the subordinate backends could save an updated contextCSN value that is picked up by the other before it has finished its synchronization. An example of a configuration where more than one subordinate db replicated from the same server might be necessary is the central master described in my previous posting at http://www.openldap.org/lists/openldap-devel/200806/msg00041.html
My idea as to how this race condition could be verified was to add enough entries to one of the backends (while the consumer was stopped) to make it possible to restart the consumer after the first backend had saved the updated contextCSN but before the second had finished its synchronization. But I was able to produce it by simply adding or deleting an entry in one of the backends before starting the consumer. Far too often the backend without any changes was able to pick up and save the updated contextCSN from the producer before syncrepl on the second backend fetched its initial value. I.e. it started with an updated contextCSN and didn't receive the changes that had taken place on the producer. If each syncrepl instance stored the values in the suffix of its own database they wouldn't interfere with each other like this.
There is a similar problem in syncprov, as it must use the lowest contextCSN value (with a given sid) saved by the syncrepl backends configured within the subtree where syncprov is used. But to do that it also needs to distinguish the contextCSN values of each syncrepl backend, which it can't do when they all save them in the glue suffix. This also implies that syncprov must ignore contextCSN updates from syncrepl until all syncrepl backends have saved a value, and that syncprov on the provider must send newCookie sync info messages when it updates its contextCSN value and the changed entry isn't being replicated to a consumer. I.e. as outlined in the message referred to above.
It appears that the current code is sending newCookie messages pretty much all the time. It's definitely too chatty now, and it appears that it's breaking test050 sometimes, though I still haven't identified exactly why. I thought it was because the consumer was accepting the new cookie values unconditionally, but even after filtering out old values test050 still failed. #if'ing out the relevant code in syncprov.c makes test050 run fine though. (syncprov.c:1675 thru 1723.)
Neither of these changes should interfere with ordinary multi-master configurations where syncrepl and syncprov are both used on the same (glue) database.
Having spent the last 12 hours prodding at test050 I find that whenever I have it working well, test058 "breaks" with contextCSN mismatches. At this point I really have to question the rationale behind test058. First and foremost, syncprov should not be sending gratuitous New Cookie messages to consumers whose search terms are outside the scope of the update. I.e., if the actual data update didn't go to the consumer, then the following cookie update should not either. In such an asymmetric configuration, it should be expected that the contextCSNs will not match across all the servers, and forcing them all to match is beginning to look like an error, to me.
I'll volunteer to implement and test the necessary changes if this is the right solution. But to know whether my analysis is correct or not I need feedback. So, comments please?
Howard Chu wrote:
It appears that the current code is sending newCookie messages pretty much all the time. It's definitely too chatty now, and it appears that it's breaking test050 sometimes, though I still haven't identified exactly why. I thought it was because the consumer was accepting the new cookie values unconditionally, but even after filtering out old values test050 still failed. #if'ing out the relevant code in syncprov.c makes test050 run fine though. (syncprov.c:1675 thru 1723.)
The newCookie messages should only be sent if the local csn set is updated without being accompanied by a real modification. And in an MMR setup like test050 that should never happen except maybe during the initial replication of the databases.
Any occurrence of a newCookie message should be a symptom of another bug, and I do believe one such race condition exists in syncrepl. One possible scenario, with 4 or more hosts:
server1 makes two or more changes to the db, with csn n and n+1. server2 receives both and starts replicating them to server3. server3 receives and starts processing the first change from server1. It updates cs_pvals in the syncrepl structure with the csn n of the first modification. Then, the same modification is received from server2, but is rejected as being too old. The second modification is received from server2, this time being accepted. This second modification is tagged with csn n+1, which gets stored in the db by syncrepl_updateCookie and picked up by syncprov. syncprov on server3 replicates the second change with csn n+1 to server4. server4 accepts the second modification from server3, without having received the first change. And when that arrives from server1 or 2, it will be rejected as being too old.
If the second modify operation is received and processed by server3 after it has added csn n to the csn queue, but before it is committed, the second modification will be tagged with csn n. The csn being written to the db is still csn n+1 though, which will be picked up by syncprov and trigger a newCookie message. Even without this, the csns stored in the db on server3 are invalid and will result in an incomplete db should it fail before the first modification completes.
The csns for any given sid are sent by the originating server in order, so I think the fix should be to always process them in the same order in syncrepl. For each sid in the csn set there should be one mutex, and modifications with any given sid should only take place in the thread holding the mutex. To avoid stalling too long, it must be possible for the other syncrepl stanzas to note that a csn is too old without waiting on the mutex for the csn's sid.
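Roughly what I have in mind (a sketch with made-up names, not the actual syncrepl structures):

    #include <lber.h>
    #include <ldap_pvt_thread.h>

    /*
     * Sketch: one mutex per sid in the csn set.  A modification tagged
     * with a csn for some sid is only applied by the thread holding
     * that sid's mutex, but a stale csn can be detected and dropped
     * without waiting for the mutex.
     */
    typedef struct csn_sid_lock {
        int                     sl_sid;
        struct berval           sl_committed;   /* last csn committed for sid */
        ldap_pvt_thread_mutex_t sl_mutex;
    } csn_sid_lock;

    static int
    csn_apply( csn_sid_lock *sl, struct berval *csn )
    {
        /* cheap early-out: too-old csns are rejected without taking
         * the mutex, so other syncrepl stanzas don't stall here;
         * the test is repeated under the mutex before applying */
        if ( ber_bvcmp( csn, &sl->sl_committed ) <= 0 )
            return 0;       /* already applied */

        ldap_pvt_thread_mutex_lock( &sl->sl_mutex );
        if ( ber_bvcmp( csn, &sl->sl_committed ) > 0 ) {
            /* ... apply the modification, then record its csn ... */
            ber_bvreplace( &sl->sl_committed, csn );
        }
        ldap_pvt_thread_mutex_unlock( &sl->sl_mutex );
        return 1;
    }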
I don't think it is correct for syncrepl to fetch csn values from syncprov either. The only csn syncprov can update is the one with the local sid, and syncrepl should simply ignore modifications tagged with csn values carrying its own sid. That is, provided syncrepl starts the replication phase with a csn value with its own sid. The latter is to cover the case where a server is being reinitialized from one of its peers; it should then accept any changes that originated on the local server before it was reinitialized. Upon completing the initial replication phase it will receive a csn set that may include its own sid, and it should start ignoring modifications with that sid.
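The sid test itself is cheap, as the sid is the third '#'-separated field of the csn. A standalone sketch (not the actual parsing code in slapd):

    #include <stdlib.h>
    #include <string.h>

    /*
     * Sketch: extract the sid from a csn of the form
     * YYYYmmddHHMMSS.uuuuuuZ#cc#sid#mm.  The sid field is hex.
     */
    static int
    csn_sid( const char *csn )
    {
        const char *p = strchr( csn, '#' );         /* skip timestamp */
        if ( p == NULL || ( p = strchr( p + 1, '#' )) == NULL )
            return -1;                              /* malformed csn */
        return (int) strtol( p + 1, NULL, 16 );
    }

    /* drop changes tagged with our own sid, but only once the
     * replication phase started out with a csn of our own */
    static int
    ignore_mod( const char *csn, int my_sid, int have_own_csn )
    {
        return have_own_csn && csn_sid( csn ) == my_sid;
    }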
Neither of these changes should interfere with ordinary multi-master configurations where syncrepl and syncprov are both used on the same (glue) database.
Having spent the last 12 hours prodding at test050 I find that whenever I have it working well, test058 "breaks" with contextCSN mismatches. At this point I really have to question the rationale behind test058. First and foremost, syncprov should not be sending gratuitous New Cookie messages to consumers whose search terms are outside the scope of the update. I.e., if the actual data update didn't go to the consumer, then the following cookie update should not either. In such an asymmetric configuration, it should be expected that the contextCSNs will not match across all the servers, and forcing them all to match is beginning to look like an error, to me.
Whenever the provider makes a local change that should not be replicated to the consumer, the consumer's database state continues to be in sync. Yet its csn set indicates that it isn't, and it will always start out replicating all changes made after the oldest csn it holds. Which can be quite a lot. The only way to fix this is to send the newCookie messages.
Rein
Rein Tollevik wrote:
Howard Chu wrote:
It appears that the current code is sending newCookie messages pretty much all the time. It's definitely too chatty now, and it appears that it's breaking test050 sometimes, though I still haven't identified exactly why. I thought it was because the consumer was accepting the new cookie values unconditionally, but even after filtering out old values test050 still failed. #if'ing out the relevant code in syncprov.c makes test050 run fine though. (syncprov.c:1675 thru 1723.)
The newCookie messages should only be sent if the local csn set is updated without being accompanied by a real modification. And in an MMR setup like test050 that should never happen except maybe during the initial replication of the databases.
Any occurrence of a newCookie message should be a symptom of another bug,
Yes, you're right. I was finally able to trace one of the occurrences and fix it in HEAD. In this particular case, an entry was Added and that event was pushed into the syncprov response queue. Then the entry was Modified and again the response was queued. But when the queue was processed, it was retrieving the entry from the DB at that point in time - so the update was sent out with the Add's CSN in the sync control, but the entry's entryCSN attribute already had the Mod's stamp in it. That's why during the updateCookie step there was a missing CSN...
Anyway, this race condition has been fixed in HEAD by enqueuing the dup'd entry, so that the outbound updates all have consistent state.
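I.e., conceptually (a simplified sketch of the idea, not the actual diff):

    #include "slap.h"

    /*
     * Sketch: snapshot the entry with entry_dup() when the response is
     * queued, instead of re-reading it from the DB when the queue is
     * drained, so the entryCSN sent to consumers always matches the
     * CSN carried in the sync control.
     */
    typedef struct resp_node {
        struct resp_node *rn_next;
        Entry            *rn_entry;    /* private copy, freed after sending */
        struct berval     rn_csn;      /* csn carried in the sync control */
    } resp_node;

    static void
    queue_response( resp_node **head, Entry *e, struct berval *csn )
    {
        resp_node *rn = ch_malloc( sizeof( resp_node ));

        rn->rn_entry = entry_dup( e );  /* dup, not a live pointer */
        ber_dupbv( &rn->rn_csn, csn );
        rn->rn_next = *head;
        *head = rn;
    }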
and I do believe one such race condition exists in syncrepl. One possible scenario, with 4 or more hosts:
server1 makes two or more changes to the db, with csn n and n+1. server2 receives both and starts replicating them to server3. server3 receives and starts processing the first change from server1. It updates cs_pvals in the syncrepl structure with the csn n of the first modification. Then, the same modification is received from server2, but is rejected as being too old. The second modification is received from server2, this time being accepted. This second modification is tagged with csn n+1, which gets stored in the db by syncrepl_updateCookie and picked up by syncprov. syncprov on server3 replicates the second change with csn n+1 to server4. server4 accepts the second modification from server3, without having received the first change. And when that arrives from server1 or 2, it will be rejected as being too old.
Cannot happen. Every server sends its changes out in order; server 4 cannot receive csn n+1 from server 3 unless it has already received csn n from server 3.
If the second modify operation is received and processed by server3 after it has added csn n to the csn queue, but before it is committed, the second modification will be tagged with csn n. The csn being written to the db is still csn n+1 though, which will be picked up by syncprov and trigger a newCookie message. Even without this, the csns stored in the db on server3 are invalid and will result in an incomplete db should it fail before the first modification completes.
Shouldn't happen now; the cs_pmutex will prevent a new sync op from starting. Likewise syncprov_op_response will prevent a new mod from completing.
The csns for any given sid are sent by the originating server in order, so I think the fix should be to always process them in the same order in syncrepl. For each sid in the csn set there should be one mutex, and modifications with any given sid should only take place in the thread holding the mutex. To avoid stalling too long, it must be possible for the other syncrepl stanzas to note that a csn is too old without waiting on the mutex for the csn's sid.
That sounds ok.
I don't think it is correct for syncrepl to fetch csn values from syncprov either. The only csn syncprov can update is the one with the local sid, and syncrepl should simply ignore modifications tagged with csn values carrying its own sid. That is, provided syncrepl starts the replication phase with a csn value with its own sid. The latter is to cover the case where a server is being reinitialized from one of its peers; it should then accept any changes that originated on the local server before it was reinitialized. Upon completing the initial replication phase it will receive a csn set that may include its own sid, and it should start ignoring modifications with that sid.
Makes sense.
Neither of these changes should interfere with ordinary multi-master configurations where syncrepl and syncprov are both used on the same (glue) database.
Having spent the last 12 hours prodding at test050 I find that whenever I have it working well, test058 "breaks" with contextCSN mismatches. At this point I really have to question the rationale behind test058. First and foremost, syncprov should not be sending gratuitous New Cookie messages to consumers whose search terms are outside the scope of the update. I.e., if the actual data update didn't go to the consumer, then the following cookie update should not either. In such an asymmetric configuration, it should be expected that the contextCSNs will not match across all the servers, and forcing them all to match is beginning to look like an error, to me.
Whenever the provider makes a local change that should not be replicated to the consumer, the consumer's database state continues to be in sync. Yet its csn set indicates that it isn't, and it will always start out replicating all changes made after the oldest csn it holds.
Maybe. I think much of it will be a no-op because the intervening changes will prove to be irrelevant for that consumer.
Which can be quite a lot. The only way to fix this is to send the newCookie messages.