I have set up delta-syncrepl between provider A and consumer B, and it seems to work OK. Online updates to A are synced to B. However, about once a month there is a large update containing hundreds of millions of records, and applying it online is going to take days. I tried taking A offline, running slapadd, and bringing A back online, but the new entries were not synced to B. Is there a way I can make this work?
Thanks, Khoa
On Thursday, 25 March 2010 02:31:07 Khoa Nguyen wrote:
> I have set up delta-syncrepl between provider A and consumer B, and it seems to work OK. Online updates to A are synced to B. However, about once a month there is a large update containing hundreds of millions of records, and applying it online is going to take days. I tried taking A offline, running slapadd, and bringing A back online, but the new entries were not synced to B. Is there a way I can make this work?
Depending on what you've done to the data before slapadd'ing it, you could have messed up replication.
If you are going to do this, you should:
1) slapadd on A
2) slapcat on A
3) slapadd the result of (2) on B
But why you need to bulk-load hundreds of millions of entries is another question ...
Regards, Buchan
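Buchan's three steps could look roughly like the shell sketch below. The suffix dc=example,dc=com, the config path, and the LDIF file names are assumptions, not values from this thread. The sketch also assumes slapd is stopped on whichever machine is being loaded and that B's database directory is empty before step 3. By default the functions only print the commands; nothing runs unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the offline bulk-load sequence. Paths and suffix are assumed.
SUFFIX="${SUFFIX:-dc=example,dc=com}"       # assumed suffix
CONF="${CONF:-/etc/openldap/slapd.conf}"   # assumed config file location
DRY_RUN="${DRY_RUN:-1}"                     # 1 = print commands only

run() {
    # Print the command; execute it only when DRY_RUN is not 1.
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

# Steps 1 and 2: run on provider A while its slapd is stopped.
#   $1 = LDIF with the monthly update, $2 = file to receive the full dump
load_and_dump_provider() {
    run slapadd -f "$CONF" -b "$SUFFIX" -l "$1" &&   # 1) load the new entries
    run slapcat -f "$CONF" -b "$SUFFIX" -l "$2"      # 2) dump the whole database
}

# Step 3: run on consumer B while its slapd is stopped.
#   $1 = the dump produced on A (copied to B by whatever means you use)
load_consumer() {
    run slapadd -f "$CONF" -b "$SUFFIX" -l "$1"      # 3) load A's dump
}
```

On A this would be called as `load_and_dump_provider monthly.ldif full.ldif`, and on B as `load_consumer full.ldif`. Loading A's own dump on B, rather than the original LDIF, means B receives the same entryUUID and entryCSN values that A generated.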
Khoa Nguyen wrote:
> I have set up delta-syncrepl between provider A and consumer B, and it seems to work OK. Online updates to A are synced to B. However, about once a month there is a large update containing hundreds of millions of records, and applying it online is going to take days. I tried taking A offline, running slapadd, and bringing A back online, but the new entries were not synced to B. Is there a way I can make this work?
Delta-syncrepl works by writing a log of all your main database changes into a log database. When you add entries using slapadd, nothing is added to the log database, therefore delta-sync cannot replicate those changes.
You can force a resync by emptying the log database. When a delta-sync consumer tries to connect and the log no longer contains a record of the consumer's last change, it will automatically fall back to regular syncrepl to resync.
Note that since you're talking about new entries, which need to be replicated in whole anyway, delta-syncrepl offers no benefit over regular syncrepl here.
Also, as Buchan pointed out, replicating hundreds of millions of records will take a long time. You're better off just slapadding on both the provider and the consumer.
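One way to empty the log database, sketched here under several assumptions: the log database uses back-bdb or back-hdb, it lives in its own directory (the path in the comment is hypothetical), and slapd on the provider has already been stopped. The function moves the old directory aside rather than deleting it and keeps DB_CONFIG if one exists. This is an illustration, not a procedure taken from this thread.

```shell
#!/bin/sh
# Sketch only: reset the delta-syncrepl log database directory.
# Assumes slapd is stopped and the log database has its own directory.

empty_log_db() {
    logdir="$1"                 # e.g. /var/lib/ldap/accesslog (assumed path)
    backup="$logdir.old.$$"     # old files are kept here, not deleted

    [ -d "$logdir" ] || { echo "no such directory: $logdir" >&2; return 1; }

    mv "$logdir" "$backup" || return 1
    mkdir "$logdir" || return 1

    # Carry over the BDB tuning file, if there is one.
    if [ -f "$backup/DB_CONFIG" ]; then
        cp -p "$backup/DB_CONFIG" "$logdir/DB_CONFIG" || return 1
    fi

    echo "$backup"              # report where the old files went
}
```

The new directory is created by the calling user, so its ownership and permissions would still need to match the account slapd runs as before the server is restarted.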
Thank you all for your insights, which make me think slapadd may not be a good companion for delta-syncrepl. So I plan to do all the updates online, since I managed to get close to 3000 updates per second on my single-disk server.
Now, my colleague doesn't agree with me on the delta-syncrepl approach and prefers to update A and B independently. His argument is that with delta-syncrepl, B depends on A, so if A's databases (main + log) are corrupted and we have to restore A to a previous checkpoint, B would automatically roll back and we would lose the latest data. I still prefer the delta-syncrepl approach, since if they are updated independently, A and B can drift out of sync over time and we wouldn't know it.
I also looked at other replication modes (mirror, n-way master, etc.), but since we only have two servers to work with and our OpenLDAP version is still 2.3, our choices are limited.
Your advice and suggestions on the best approach would be appreciated.
Khoa
On Fri, Mar 26, 2010 at 3:48 PM, Howard Chu hyc@symas.com wrote:
--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
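On the worry that A and B could drift apart without anyone noticing: a coarse check is to compare the contextCSN attribute of the suffix entry on both servers. The sketch below assumes both servers store contextCSN on the suffix entry and allow it to be read anonymously; the suffix and the server URIs in the usage note are placeholders. Equal values only show that B has caught up with A's latest change, not that every entry is identical.

```shell
#!/bin/sh
# Sketch: compare the contextCSN of provider and consumer.
# Suffix is an assumed value; anonymous read access is also assumed.
SUFFIX="${SUFFIX:-dc=example,dc=com}"

context_csn() {
    # Print the contextCSN value(s) of the suffix entry on server $1, sorted.
    # contextCSN is operational, so it has to be requested by name.
    ldapsearch -x -LLL -H "$1" -s base -b "$SUFFIX" contextCSN |
        sed -n 's/^contextCSN: //p' | sort
}

in_sync() {
    # Succeeds only when both servers report the same, non-empty contextCSN.
    a=$(context_csn "$1")
    b=$(context_csn "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

A possible use is `in_sync ldap://a.example.com ldap://b.example.com || echo "A and B differ"`, run from cron. While updates are flowing, B may lag A briefly, so a single mismatch is not necessarily a problem.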
--On Friday, March 26, 2010 8:43 PM -0400 Khoa Nguyen khoa.coffee@gmail.com wrote:
> Now, my colleague doesn't agree with me on the delta-syncrepl approach and prefers to update A and B independently. His argument is that with delta-syncrepl, B depends on A, so if A's databases (main + log) are corrupted and we have to restore A to a previous checkpoint, B would automatically roll back and we would lose the latest data. I still prefer the delta-syncrepl approach, since if they are updated independently, A and B can drift out of sync over time and we wouldn't know it.
If A becomes corrupted, then you slapcat B and reload A with that data.
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
openldap-software@openldap.org