Howard Chu wrote:
Emmanuel Lécharny wrote:
On 03/02/15 09:41, Howard Chu wrote:
Emmanuel Lécharny wrote:
On 03/02/15 05:11, Howard Chu wrote:
Another option here is simply to perform batching. Now that we have the TXN API exposed in the backend interface, we could just batch up e.g. 500 entries per txn, much like slapadd -q already does. Ultimately we ought to be able to get syncrepl refresh to occur at nearly the same speed as slapadd -q.
Batching is OK, except that you never know how many entries you're going to get, so you will have to actually write the data after a period of time even if you don't yet have the 500 entries.
This isn't a problem - we know exactly when refresh completes, so we can finish the batch regardless of how many entries are left over.
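To make the batching idea concrete, here is a minimal sketch of per-txn batching at the raw LMDB level, including the final partial batch committed once refresh completes. This is illustrative only, not the actual back-mdb code; next_entry() is a hypothetical stand-in for decoding one incoming refresh entry.

```c
#include <lmdb.h>

#define BATCH 500

/* Hypothetical source of decoded refresh entries; returns 0 when done. */
int next_entry(MDB_val *key, MDB_val *data);

int load_refresh(MDB_env *env, MDB_dbi dbi)
{
    MDB_txn *txn = NULL;
    MDB_val key, data;
    int rc = 0, n = 0;

    while (next_entry(&key, &data)) {
        /* Lazily open a txn for the current batch. */
        if (txn == NULL && (rc = mdb_txn_begin(env, NULL, 0, &txn)))
            return rc;
        if ((rc = mdb_put(txn, dbi, &key, &data, 0))) {
            mdb_txn_abort(txn);
            return rc;
        }
        if (++n % BATCH == 0) {
            /* Commit a full batch of 500 entries. */
            if ((rc = mdb_txn_commit(txn)))
                return rc;
            txn = NULL;
        }
    }
    /* Refresh complete: commit the final, possibly partial batch. */
    if (txn != NULL)
        rc = mdb_txn_commit(txn);
    return rc;
}
```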
True for Refresh. I was thinking more specifically of updates when we are connected.
None of this is for Persist phase, I have only been talking about refresh.
Testing this out with the experimental ITS#8040 patch - with lazy commit, the 2.8M entries (2.5GB of data) take ~10 minutes for the refresh to pull them across. With batching at 500 entries/txn plus lazy commit it takes ~7 minutes, a decent improvement. It's still 2x slower than slapadd -q though, which loads the data in 3-1/2 minutes.
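As an aside, I assume "lazy commit" here maps onto the stock LMDB no-sync flags; a minimal sketch of opening an environment that way (the path and map size are placeholders):

```c
#include <lmdb.h>

/* Open an environment with lazy-commit semantics. MDB_NOSYNC skips
 * the flush on mdb_txn_commit(), so a crash may lose the most recent
 * transactions in exchange for much higher write throughput;
 * MDB_NOMETASYNC is the milder variant that defers only the
 * meta-page flush. */
int open_lazy_env(MDB_env **env)
{
    int rc;
    if ((rc = mdb_env_create(env)))
        return rc;
    mdb_env_set_mapsize(*env, 4UL * 1024 * 1024 * 1024);  /* 4 GB map */
    return mdb_env_open(*env, "/var/lib/ldap", MDB_NOSYNC, 0600);
}
```

In back-mdb configuration this should correspond to the envflags nosync (or nometasync) database directive.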
Not bad at all. What makes it 2x slower, btw?
Still looking into it. slapadd -q uses 2 threads, one to parse the LDIF and one to write to the DB. syncrepl consumer only uses 1 thread. Probably if we split reading from the network apart from writing to the DB, that would make the difference.
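A minimal sketch of that split, assuming a classic bounded producer/consumer queue between a network-reader thread and a DB-writer thread; stdin stands in for the network here, and the writer would apply the per-txn batching sketched earlier. Because the queue is bounded, the reader blocks when the writer falls behind, so memory use stays capped.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define QCAP 1024   /* bounded: the reader blocks if the writer falls behind */

typedef struct {
    void *items[QCAP];
    int head, tail, count;
    int done;                       /* set once refresh is complete */
    pthread_mutex_t mtx;
    pthread_cond_t not_full, not_empty;
} queue_t;

static queue_t q = {
    .mtx = PTHREAD_MUTEX_INITIALIZER,
    .not_full = PTHREAD_COND_INITIALIZER,
    .not_empty = PTHREAD_COND_INITIALIZER,
};

static void q_push(queue_t *q, void *item)
{
    pthread_mutex_lock(&q->mtx);
    while (q->count == QCAP)
        pthread_cond_wait(&q->not_full, &q->mtx);
    q->items[q->tail] = item;
    q->tail = (q->tail + 1) % QCAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->mtx);
}

static void *q_pop(queue_t *q)      /* returns NULL when drained and done */
{
    void *item = NULL;
    pthread_mutex_lock(&q->mtx);
    while (q->count == 0 && !q->done)
        pthread_cond_wait(&q->not_empty, &q->mtx);
    if (q->count > 0) {
        item = q->items[q->head];
        q->head = (q->head + 1) % QCAP;
        q->count--;
        pthread_cond_signal(&q->not_full);
    }
    pthread_mutex_unlock(&q->mtx);
    return item;
}

static void *reader(void *arg)      /* stands in for the network side */
{
    queue_t *q = arg;
    char buf[512];
    while (fgets(buf, sizeof buf, stdin))   /* one "entry" per line */
        q_push(q, strdup(buf));
    pthread_mutex_lock(&q->mtx);
    q->done = 1;                    /* refresh complete */
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->mtx);
    return NULL;
}

static void *writer(void *arg)      /* stands in for the DB side */
{
    queue_t *q = arg;
    void *entry;
    while ((entry = q_pop(q)) != NULL) {
        /* Write the entry here, batching per txn as sketched earlier. */
        free(entry);
    }
    return NULL;
}

int main(void)
{
    pthread_t r, w;
    pthread_create(&r, NULL, reader, &q);
    pthread_create(&w, NULL, writer, &q);
    pthread_join(r, NULL);
    pthread_join(w, NULL);
    return 0;
}
```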
On 03/02/15 10:54, Howard Chu wrote:
None of this is for Persist phase, I have only been talking about refresh.
Thanks for the clarification.
Still looking into it. slapadd -q uses 2 threads, one to parse the LDIF and one to write to the DB. syncrepl consumer only uses 1 thread. Probably if we split reading from the network apart from writing to the DB, that would make the difference.
That would be worth a try. Although I would expect disk access to be the bottleneck here, and using two threads might swamp the memory, up to a point. Interesting problem, interesting benchmark to conduct ;-)
Emmanuel.