The remaining errors and race condition that test058 demonstrates cannot be solved unless syncrepl is changed to always store the contextCSN in the suffix of the database where it is configured, not the suffix of its glue database as it does today.
Assuming serverID 0 is reserved for the single-master case, syncrepl and syncprov can in that case only be configured within the same database context if syncprov is a pure forwarding server, i.e. it will not update any CSN value and syncrepl has no need to fetch any values from it.
In the multi-master case syncprov maintains only the contextCSN whose SID matches the current serverID; the others are all received by syncrepl. So the only time syncrepl should need an updated CSN from syncprov is when it is about to present it to its peer, i.e. when it initiates a refresh phase. Actually, a race condition that would render the state of the database undetermined could occur if syncrepl fetches an updated CSN from syncprov during the initial refresh phase. So it should be sufficient to read the contextCSN values from the database before a new refresh phase is initiated, independent of whether syncprov is in use or not.
Syncrepl will receive updates to the contextCSN value with its own SID from its peers, at least with ITS#5972 and ITS#5973 in place. That is, the normal ignoring of updates tagged with a too-old contextCSN value will continue to work. It should also be safe to ignore all updates tagged with a contextCSN or entryCSN value whose SID is the current server's non-zero serverID, provided a complete refresh cycle is known to have taken place, i.e. when a contextCSN value with the current non-zero serverID was read from the database before the refresh phase started, or after the persistent phase has been entered.
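To make the proposed ignore rule concrete, here is a rough sketch in Python (not slapd code). It assumes CSNs use the usual fixed-width `time#count#sid#mod` text form, so that plain string comparison orders them; the names `csn_sid`, `should_ignore`, and `refresh_complete` are made up for illustration.

```python
def csn_sid(csn: str) -> int:
    """Extract the serverID from a CSN of the form time#count#sid#mod (SID in hex)."""
    return int(csn.split("#")[2], 16)


def should_ignore(update_csn, context_csns, local_sid, refresh_complete):
    """Decide whether syncrepl may skip an incoming update.

    context_csns: dict mapping SID -> newest CSN already stored for that SID.
    refresh_complete: True once a full refresh cycle is known to have finished
    (hypothetical flag standing in for the condition described above).
    """
    sid = csn_sid(update_csn)
    known = context_csns.get(sid)
    # Existing rule: skip anything not newer than what is already stored.
    if known is not None and update_csn <= known:
        return True
    # Proposed rule: skip echoes of our own changes, but only for a
    # non-zero serverID and only after a complete refresh.
    if local_sid != 0 and sid == local_sid and refresh_complete:
        return True
    return False
```

The second rule is deliberately gated on `refresh_complete`: before that point the local database may still be missing its own earlier changes.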
The state of the database will be undetermined unless an initial refresh (i.e. starting from an empty database or CSN set) has been run to completion. I cannot see how this can be avoided, and as far as I know it is so now too. It might be worth mentioning in the documentation though (unless it already is).
Syncprov must continue to monitor the contextCSN updates from syncrepl. When it receives updates destined for the suffix of the database in which it is itself configured, it must replace any CSN value whose SID matches its own non-zero serverID with the value it manages itself (which should be greater than or equal to the value syncrepl tried to store unless something is seriously wrong). Updates to "foreign" contextCSN values (i.e. those with a SID not matching the current non-zero serverID) should be imported into the set of contextCSN values syncprov itself maintains. Syncprov could also short-circuit the contextCSN update and delay it to its own checkpoint. I'm not sure what effect the checkpoint feature has today when syncrepl constantly updates the contextCSN.
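A possible shape for that merge step, again as an illustrative Python sketch rather than actual syncprov code. `merge_context_csns` and its arguments are hypothetical names, CSNs are assumed to compare correctly as fixed-width strings, and keeping the newer of two foreign values is my assumption about what "imported" should mean.

```python
def csn_sid(csn: str) -> int:
    """Extract the serverID from a CSN of the form time#count#sid#mod (SID in hex)."""
    return int(csn.split("#")[2], 16)


def merge_context_csns(managed, incoming, local_sid):
    """Merge the contextCSN values syncrepl wants to store into syncprov's set.

    managed:  dict SID -> CSN that syncprov maintains (modified in place).
    incoming: list of CSN strings from syncrepl's contextCSN update.
    Returns the list of values that should actually be written.
    """
    for csn in incoming:
        sid = csn_sid(csn)
        if local_sid != 0 and sid == local_sid:
            # Our own SID: syncprov's value wins, syncrepl's is dropped.
            continue
        current = managed.get(sid)
        # Foreign SID: import it (assumption: never move a value backwards).
        if current is None or csn > current:
            managed[sid] = csn
    return sorted(managed.values())
```

Dropping the incoming value for the local SID and writing the managed one has the same effect as "replacing" it in the update.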
Syncprov must, when syncrepl updates the contextCSN in the suffix of a subordinate DB, update its own knowledge of the "foreign" CSNs to be the *lowest* CSN with any given SID stored in all the subordinate DBs (where syncrepl is configured). And no update must take place unless a contextCSN value has been stored in *all* the syncrepl-enabled subordinate DBs. Any values matching the current non-zero serverID should be updated in this case too, but a new value should probably not be inserted.
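As a sketch of the lowest-per-SID rule (Python, illustrative only): `glue_foreign_csns` is a hypothetical name, each subordinate DB is modelled as a dict of SID -> stored contextCSN, and CSNs are assumed to compare as fixed-width strings. The handling of values with the local serverID is left out, and how to treat a SID that is missing from only some subordinates is not settled above; this version simply takes the minimum over the DBs that have it.

```python
def glue_foreign_csns(subordinates, local_sid):
    """Compute the foreign contextCSN set the glue-level syncprov may advertise.

    subordinates: list of dicts, one per syncrepl-enabled subordinate DB,
                  each mapping SID -> contextCSN stored in that DB's suffix.
    Returns a dict SID -> lowest CSN, or None when no update should take
    place because some subordinate has not stored any contextCSN yet.
    """
    if not subordinates or any(not db for db in subordinates):
        return None
    lowest = {}
    for db in subordinates:
        for sid, csn in db.items():
            if local_sid != 0 and sid == local_sid:
                continue  # local SID is handled separately (not sketched)
            if sid not in lowest or csn < lowest[sid]:
                lowest[sid] = csn
    return lowest
```

Taking the minimum means a consumer of the glue database is never told it is more up to date than the least-synchronized subordinate.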
These changes should (unless I'm completely lost, that is) create a cleaner interface between syncrepl and syncprov without harming the current multi-master configurations, and make asymmetric multi-master configurations like the one in test058 work. Comments please?
Rein
Rein Tollevik wrote:
> The remaining errors and race condition that test058 demonstrates cannot be solved unless syncrepl is changed to always store the contextCSN in the suffix of the database where it is configured, not the suffix of its glue database as it does today.
> Assuming serverID 0 is reserved for the single-master case, syncrepl and syncprov can in that case only be configured within the same database context if syncprov is a pure forwarding server, i.e. it will not update any CSN value and syncrepl has no need to fetch any values from it.
> In the multi-master case syncprov maintains only the contextCSN whose SID matches the current serverID; the others are all received by syncrepl. So the only time syncrepl should need an updated CSN from syncprov is when it is about to present it to its peer, i.e. when it initiates a refresh phase. Actually, a race condition that would render the state of the database undetermined could occur if syncrepl fetches an updated CSN from syncprov during the initial refresh phase. So it should be sufficient to read the contextCSN values from the database before a new refresh phase is initiated, independent of whether syncprov is in use or not.
> Syncrepl will receive updates to the contextCSN value with its own SID from its peers, at least with ITS#5972 and ITS#5973 in place. That is, the normal ignoring of updates tagged with a too-old contextCSN value will continue to work. It should also be safe to ignore all updates tagged with a contextCSN or entryCSN value whose SID is the current server's non-zero serverID, provided a complete refresh cycle is known to have taken place, i.e. when a contextCSN value with the current non-zero serverID was read from the database before the refresh phase started, or after the persistent phase has been entered.
> The state of the database will be undetermined unless an initial refresh (i.e. starting from an empty database or CSN set) has been run to completion. I cannot see how this can be avoided, and as far as I know it is so now too. It might be worth mentioning in the documentation though (unless it already is).
> Syncprov must continue to monitor the contextCSN updates from syncrepl. When it receives updates destined for the suffix of the database in which it is itself configured, it must replace any CSN value whose SID matches its own non-zero serverID with the value it manages itself (which should be greater than or equal to the value syncrepl tried to store unless something is seriously wrong).
Syncrepl should never be propagating contextCSN updates whose SID matches the current serverID. By definition, only the current server should ever be generating changes with the current serverID.
> Updates to "foreign" contextCSN values (i.e. those with a SID not matching the current non-zero serverID) should be imported into the set of contextCSN values syncprov itself maintains. Syncprov could also short-circuit the contextCSN update and delay it to its own checkpoint. I'm not sure what effect the checkpoint feature has today when syncrepl constantly updates the contextCSN.
The checkpoint probably only made sense for single-master.
> Syncprov must, when syncrepl updates the contextCSN in the suffix of a subordinate DB, update its own knowledge of the "foreign" CSNs to be the *lowest* CSN with any given SID stored in all the subordinate DBs (where syncrepl is configured). And no update must take place unless a contextCSN value has been stored in *all* the syncrepl-enabled subordinate DBs. Any values matching the current non-zero serverID should be updated in this case too, but a new value should probably not be inserted.
Every source of updates to a DB must use its own unique SID. There should not be a lowest/highest foreign CSN to choose; there should only be one per SID. And as already noted, no syncrepl should ever be sending in a contextCSN update for the current serverID; those can only come from clients directly writing the local DB.
> These changes should (unless I'm completely lost, that is) create a cleaner interface between syncrepl and syncprov without harming the current multi-master configurations, and make asymmetric multi-master configurations like the one in test058 work. Comments please?
> Rein
Howard Chu wrote:
> Rein Tollevik wrote:
> Syncrepl should never be propagating contextCSN updates whose SID matches the current serverID. By definition, only the current server should ever be generating changes with the current serverID.
Syncrepl is updating all CSN values, including those with its own SID. Syncprov must correct those values in the very likely case that syncrepl's provider isn't up to date with the local server's changes. Otherwise the CSN with the current server's SID could be lowered, which is a really bad thing!
>> Updates to "foreign" contextCSN values (i.e. those with a SID not matching the current non-zero serverID) should be imported into the set of contextCSN values syncprov itself maintains. Syncprov could also short-circuit the contextCSN update and delay it to its own checkpoint. I'm not sure what effect the checkpoint feature has today when syncrepl constantly updates the contextCSN.
> The checkpoint probably only made sense for single-master.
OK, any point in making it work again? It shouldn't be that hard.
>> Syncprov must, when syncrepl updates the contextCSN in the suffix of a subordinate DB, update its own knowledge of the "foreign" CSNs to be the *lowest* CSN with any given SID stored in all the subordinate DBs (where syncrepl is configured). And no update must take place unless a contextCSN value has been stored in *all* the syncrepl-enabled subordinate DBs. Any values matching the current non-zero serverID should be updated in this case too, but a new value should probably not be inserted.
> Every source of updates to a DB must use its own unique SID. There should not be a lowest/highest foreign CSN to choose; there should only be one per SID. And as already noted, no syncrepl should ever be sending in a contextCSN update for the current serverID; those can only come from clients directly writing the local DB.
All updates take place on a remote server, with its unique SID. The problem with this configuration is that there may be more than one syncrepl instance, each in its own subordinate DB, replicating from that same remote provider. Some of these databases may be in sync, others not, implying that their CSN values must not be mixed. Syncprov, sitting on the glue database and maintaining the joint set of databases, must not advertise that it is in sync when one of its subordinates isn't. That is, it must choose the lowest foreign CSN (for any given SID) stored in all its subordinate databases.
Note that, for ordinary MMR, syncrepl and syncprov must be configured in the same database, meaning that this case is not valid there.
Rein
Rein Tollevik wrote:
> Howard Chu wrote:
>> Rein Tollevik wrote:
>> Syncrepl should never be propagating contextCSN updates whose SID matches the current serverID. By definition, only the current server should ever be generating changes with the current serverID.
> Syncrepl is updating all CSN values, including those with its own SID. Syncprov must correct those values in the very likely case that syncrepl's provider isn't up to date with the local server's changes. Otherwise the CSN with the current server's SID could be lowered, which is a really bad thing!
>>> Updates to "foreign" contextCSN values (i.e. those with a SID not matching the current non-zero serverID) should be imported into the set of contextCSN values syncprov itself maintains. Syncprov could also short-circuit the contextCSN update and delay it to its own checkpoint. I'm not sure what effect the checkpoint feature has today when syncrepl constantly updates the contextCSN.
>> The checkpoint probably only made sense for single-master.
> OK, any point in making it work again? It shouldn't be that hard.
Perhaps. Most likely we should move si_cookieState into the BackendDB structure, and let both syncrepl and syncprov update it directly instead of each having their private copies.
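Roughly what sharing one cookie state between the two could look like, modelled in Python rather than the C structures involved. Only `si_cookieState` and `BackendDB` come from the message above; the class layout, the `update`/`snapshot` methods, and the "never lower a CSN" guard are assumptions made for the sake of the sketch.

```python
import threading


class CookieState:
    """One shared, lock-protected set of contextCSNs (SID -> CSN)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._csns = {}

    def update(self, sid, csn):
        """Record csn for sid unless an equal or newer value is already held.

        Returns True if the stored value changed. CSNs are assumed to be
        fixed-width strings that order correctly under string comparison.
        """
        with self._lock:
            current = self._csns.get(sid)
            if current is None or csn > current:
                self._csns[sid] = csn
                return True
            return False

    def snapshot(self):
        """Return a copy of the current SID -> CSN mapping."""
        with self._lock:
            return dict(self._csns)


class BackendDB:
    """Stand-in for the database structure that would own the shared state."""

    def __init__(self, suffix):
        self.suffix = suffix
        self.cookie_state = CookieState()  # used by both syncrepl and syncprov
```

With a single copy there is no private state to reconcile, and a stale value from a provider cannot lower the CSN for the local SID.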
> All updates take place on a remote server, with its unique SID. The problem with this configuration is that there may be more than one syncrepl instance, each in its own subordinate DB, replicating from that same remote provider. Some of these databases may be in sync, others not, implying that their CSN values must not be mixed. Syncprov, sitting on the glue database and maintaining the joint set of databases, must not advertise that it is in sync when one of its subordinates isn't. That is, it must choose the lowest foreign CSN (for any given SID) stored in all its subordinate databases.
OK.
> Note that, for ordinary MMR, syncrepl and syncprov must be configured in the same database, meaning that this case is not valid there.
Right.
Rein Tollevik wrote:
> Howard Chu wrote:
>> Rein Tollevik wrote:
>> Syncrepl should never be propagating contextCSN updates whose SID matches the current serverID. By definition, only the current server should ever be generating changes with the current serverID.
> Syncrepl is updating all CSN values, including those with its own SID. Syncprov must correct those values in the very likely case that syncrepl's provider isn't up to date with the local server's changes. Otherwise the CSN with the current server's SID could be lowered, which is a really bad thing!
The obvious thing to do then is make syncprov ignore CSNs from syncrepl which match the local SID. (Assuming the SID is valid in the first place.)
>>> Syncprov must, when syncrepl updates the contextCSN in the suffix of a subordinate DB, update its own knowledge of the "foreign" CSNs to be the *lowest* CSN with any given SID stored in all the subordinate DBs (where syncrepl is configured). And no update must take place unless a contextCSN value has been stored in *all* the syncrepl-enabled subordinate DBs. Any values matching the current non-zero serverID should be updated in this case too, but a new value should probably not be inserted.
>> Every source of updates to a DB must use its own unique SID. There should not be a lowest/highest foreign CSN to choose; there should only be one per SID. And as already noted, no syncrepl should ever be sending in a contextCSN update for the current serverID; those can only come from clients directly writing the local DB.
> All updates take place on a remote server, with its unique SID. The problem with this configuration is that there may be more than one syncrepl instance, each in its own subordinate DB, replicating from that same remote provider. Some of these databases may be in sync, others not, implying that their CSN values must not be mixed. Syncprov, sitting on the glue database and maintaining the joint set of databases, must not advertise that it is in sync when one of its subordinates isn't. That is, it must choose the lowest foreign CSN (for any given SID) stored in all its subordinate databases.
I suppose so, but that means you will get redundant updates across subtrees that were already up to date.
> Note that, for ordinary MMR, syncrepl and syncprov must be configured in the same database, meaning that this case is not valid there.