On 03/02/15 10:54, Howard Chu wrote:
Howard Chu wrote:
> Emmanuel Lécharny wrote:
>> On 03/02/15 09:41, Howard Chu wrote:
>>> Emmanuel Lécharny wrote:
>>>> On 03/02/15 05:11, Howard Chu wrote:
>>>>> Another option here is simply to perform batching. Now that we have
>>>>> the TXN api exposed in the backend interface, we could just batch up
>>>>> e.g. 500 entries per txn, much like slapadd -q already does.
>>>>> Ultimately we ought to be able to get syncrepl refresh to occur at
>>>>> nearly the same speed as slapadd -q.
>>>>
>>>> Batching is ok, except that you never know how many entries you're
>>>> going to have, thus you will have to actually write the data after a
>>>> period of time, even if you don't have the 500 entries.
>>>
>>> This isn't a problem - we know exactly when refresh completes, so we
>>> can finish the batch regardless of how many entries are left over.
>>
>> True for Refresh. I was thinking more specifically of updates when we
>> are connected.
>
> None of this is for Persist phase, I have only been talking about
> refresh.
Thanks for the clarification.
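For the record, the refresh-time batching with a final flush could be
sketched roughly as below. This is a Python illustration only, not the
syncrepl code: `apply_in_batches` and `commit` are hypothetical names, and
`commit` stands in for whatever wraps one batch in a single backend txn.

```python
from typing import Callable, Iterable, List

BATCH_SIZE = 500  # entries per txn, the figure discussed in this thread


def apply_in_batches(entries: Iterable, commit: Callable[[List], None],
                     batch_size: int = BATCH_SIZE) -> int:
    """Group entries into batches and call commit() once per batch.

    Returns the number of commits performed.
    """
    batch: List = []
    commits = 0
    for entry in entries:
        batch.append(entry)
        if len(batch) == batch_size:
            commit(batch)          # one transaction for the full batch
            commits += 1
            batch = []             # start a fresh batch
    if batch:
        # End of refresh: flush whatever is left, however few entries.
        commit(batch)
        commits += 1
    return commits
```

The trailing flush is the point made above: since the end of refresh is
known, a partial batch never has to wait on a timer.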
>
>>> Testing this out with the experimental ITS#8040 patch - with lazy
>>> commit the 2.8M entries (2.5GB data) takes ~10 minutes for the refresh
>>> to pull them across. With batching 500 entries/txn+lazy commit it
>>> takes ~7 minutes, a decent improvement. It's still 2x slower than
>>> slapadd -q though, which loads the data in 3-1/2 minutes.
>>
>> Not bad at all. What makes it 2x slower, btw?
>
> Still looking into it. slapadd -q uses 2 threads, one to parse the LDIF
> and one to write to the DB. syncrepl consumer only uses 1 thread.
> Probably if we split reading from the network apart from writing to the
> DB, that would make the difference.
That would be worth a try, although I'd expect disk access to be the
bottleneck here, and using two threads might swamp the memory, up to a
point. Interesting problem, interesting benchmark to conduct ;-)
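
To make the two-thread idea concrete, here is a minimal sketch of splitting
the reader from the writer. It is Python and purely illustrative, not
OpenLDAP code: `run_pipeline`, `source`, `write` and `max_pending` are
made-up names. The assumption is that a bounded queue is used between the
two threads, so the reader blocks once `max_pending` entries are waiting
and memory use stays capped.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the entry stream


def run_pipeline(source, write, max_pending=1000):
    """Read entries in one thread and write them in the calling thread.

    Returns the number of entries written.
    """
    pending = queue.Queue(maxsize=max_pending)

    def reader():
        for entry in source:
            pending.put(entry)     # blocks while the queue is full
        pending.put(_DONE)

    # daemon=True so a failing writer does not leave the process hanging
    t = threading.Thread(target=reader, daemon=True)
    t.start()

    written = 0
    while True:
        item = pending.get()
        if item is _DONE:
            break
        write(item)                # stands in for the DB write
        written += 1
    t.join()
    return written
```

Whether this helps depends on which side is slower; if the disk is the
bottleneck, the reader simply spends its time blocked on the full queue.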
Emmanuel.