Hi,
Thanks a lot for ITS#8324! For the embedded (as opposed to server) use case, that change adds a lot of convenience.
I have tested master from .NET via P/Invoke and do not see any major slowdown with the default options. To insert 10M <int32,int32> pairs inside a single transaction, v.0.9.14 takes a minimum of 3400 msec, while the latest master takes a minimum of 3750 msec. This is not scientific, just the best result out of 10 runs; sometimes both timings increase to 5000+ msec. On average the slowdown is visible but tolerable: from 2.9 Mops down to 2.6 Mops (the absolute numbers are still awesome!). With Append and NoSync I could get 3.45 Mops on the same test with a master build.
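For reference, the test is roughly equivalent to the following plain C against the LMDB API (a minimal sketch; the path, mapsize, and surrounding timing harness are placeholders, not my actual .NET code):

    #include <stdio.h>
    #include "lmdb.h"

    int main(void)
    {
        MDB_env *env;
        MDB_txn *txn;
        MDB_dbi dbi;
        MDB_val key, val;
        int i, rc;

        mdb_env_create(&env);
        mdb_env_set_mapsize(env, 1ULL << 30);   /* 1 GB, placeholder */
        /* add MDB_WRITEMAP and/or MDB_NOSYNC here to test those modes */
        mdb_env_open(env, "./testdb", 0, 0664); /* directory must exist */
        mdb_txn_begin(env, NULL, 0, &txn);
        mdb_dbi_open(txn, NULL, 0, &dbi);
        key.mv_size = val.mv_size = sizeof(i);  /* int32 key and value */
        for (i = 0; i < 10000000; i++) {        /* 10M <int32,int32> pairs */
            key.mv_data = val.mv_data = &i;
            if ((rc = mdb_put(txn, dbi, &key, &val, 0)) != MDB_SUCCESS) {
                fprintf(stderr, "mdb_put: %s\n", mdb_strerror(rc));
                return 1;
            }
        }
        rc = mdb_txn_commit(txn);               /* one big transaction */
        mdb_env_close(env);
        return rc != MDB_SUCCESS;
    }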
However, with WriteMap the performance of master drops 3x, to 10000 msec or just 1 Mops, while the performance of v.0.9.14 with WriteMap improves to 2350 msec, or 4.25 Mops.
Is this the cost of convenience, or could it be fixed so that WriteMap still "is faster and uses fewer mallocs," as the docs say?
Best regards, Victor
Victor Baybekov wrote:
However, with WriteMap the performance of master drops 3x, to 10000 msec or just 1 Mops, while the performance of v.0.9.14 with WriteMap improves to 2350 msec, or 4.25 Mops. Is this the cost of convenience, or could it be fixed so that WriteMap still "is faster and uses fewer mallocs," as the docs say?
That's pretty much the cost of this patch; it has the biggest impact on WriteMap usage. In the default mode, regular writes are used to grow the file, so the code path is basically unchanged from before. In WriteMap mode the file has to be grown explicitly, right before accessing a new page, and apparently the VirtualAlloc call that does this is expensive. Since it is the equivalent of a malloc and a write together, it's actually more expensive than the default mode.
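To illustrate the two growth paths just described (a sketch for illustration only, not the actual mdb.c code):

    #include <windows.h>

    /* Default mode: the file grows as a side effect of writing dirty
     * pages out through the file handle, so the pre-patch code path is
     * effectively unchanged. */
    static BOOL grow_by_write(HANDLE fh, const void *page, DWORD pagesize)
    {
        DWORD n;
        return WriteFile(fh, page, pagesize, &n, NULL);
    }

    /* WriteMap mode: the new page must be committed in the mapping
     * before it is touched -- one call that behaves like a malloc plus
     * a zero-filling write, hence costlier than the default path. */
    static LPVOID grow_by_commit(LPVOID pageaddr, SIZE_T pagesize)
    {
        return VirtualAlloc(pageaddr, pagesize, MEM_COMMIT, PAGE_READWRITE);
    }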
Please follow up to the ITS so this conversation stays with the ticket.
Thanks for the explanation! The ultimate speed is needed for server usage, where, for example, I currently have several TB of free space and could allocate the entire space at start. For clients, where I use LMDB as a replicated cache of data subsets, even the 3x perf drop is tolerable given the absolute numbers. Could we have a flag to use the patch optionally? Then on a server, where I could allocate 1+ TB and use the WriteMap hacks we discussed previously, I could enjoy the speedup of WriteMap and the fact that pointer addresses from MDB_RESERVE remain fixed over the lifetime of a process.
Another question: with this patch, do pointer addresses remain the same after the map file grows, or must we follow the usual LMDB rules as if the WriteMap flag were not used? Does Windows reserve the entire virtual memory space, with the patch affecting only the file size on disk, or is file growth with the patch equivalent to remapping, so that pointer addresses become invalid every time? I am referring to this discussion we had before: http://www.openldap.org/lists/openldap-technical/201510/msg00022.html
P.S. I know you hate Windows, and I am starting to share the sentiment given the rich tools available on *nix, but for small teams of non-hardcore programmers doing exploratory number crunching, the benefits of RDP, GUIs, etc. outweigh what *nix offers for high-load production systems, so we use Windows as an internal server. LMDB is just a perfect off-heap data structure for many use cases.
Victor Baybekov wrote:
Could we have a flag to use the patch optionally? Then on a server, where I could allocate 1+ TB and use the WriteMap hacks we discussed previously, I could enjoy the speedup of WriteMap and the fact that pointer addresses from MDB_RESERVE remain fixed over the lifetime of a process.
I am loath to add more option flags. All of this violates the principle of simplicity that is core to LMDB.
Another question: with this patch, do pointer addresses remain the same after the map file grows, or must we follow the usual LMDB rules as if the WriteMap flag were not used? Does Windows reserve the entire virtual memory space, with the patch affecting only the file size on disk, or is file growth equivalent to remapping, so that pointer addresses become invalid every time?
The entire virtual memory space is reserved, so there is no change in behavior here. The patch would have been pointless if it required such a change.
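In other words, the patch keeps the usual reserve-then-commit pattern. A minimal sketch, with hypothetical sizes and helper names:

    #include <windows.h>

    static char *map_base;

    static char *reserve_map(SIZE_T mapsize)
    {
        /* Reserve the whole address range once; the base address (and
         * thus every page address) stays fixed for the process lifetime. */
        map_base = (char *)VirtualAlloc(NULL, mapsize, MEM_RESERVE,
                                        PAGE_NOACCESS);
        return map_base;
    }

    static BOOL grow_map(SIZE_T used, SIZE_T grow)
    {
        /* Growing later only commits pages inside the reservation;
         * nothing is remapped and no pointer moves. */
        return VirtualAlloc(map_base + used, grow, MEM_COMMIT,
                            PAGE_READWRITE) != NULL;
    }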
Thanks, Howard! That last point is great news, since it means I do not have to change my logic for different use cases. I appreciate the simplicity of LMDB a lot, and given its performance I will switch to Linux only if we become so IO-bound that it turns into a bottleneck. For now, other parts of our system are slower still.
Happy holidays & New Year!
On Wed, Dec 30, 2015 at 9:57 PM, Victor Baybekov vbaybekov@gmail.com wrote:
However, with WriteMap the performance of master drops 3x, to 10000 msec or just 1 Mops
It was not entirely clear to me what options you used, so I ran the following four tests against the master branch on my test machine:
1. (baseline): 8 seconds for 10M int:int puts in one txn, 6 seconds for txn commit
2. NOSYNC: 8, 0
3. WRITEMAP: 10, 5
4. WRITEMAP | NOSYNC: 10, 0
I observed that VirtualAlloc was always being called for 1-page allocations.
I theorized that calling VirtualAlloc for single-page allocations was the bottleneck, so I modified the code to allocate chunks of 2048 pages instead. With that, I got the following results:
5. (mod) WRITEMAP: 6, 2
6. (mod) WRITEMAP | NOSYNC: 6, 0
Chunking fewer than 2048 pages performed worse; chunking more showed no observable improvement.
The mod: replace p = VirtualAlloc... with
    static char *limit = 0;
    char *end = (char *)p + env->me_psize * num;
    if (end > limit) {
        /* Find where the committed region actually ends. */
        MEMORY_BASIC_INFORMATION mi = { 0 };
        VirtualQuery(p, &mi, sizeof(mi));
        if (mi.State == MEM_RESERVE)
            limit = (char *)p;
        else
            limit = (char *)mi.BaseAddress + mi.RegionSize;
    }
    if (end > limit) {
        /* Commit at least 2048 pages at a time instead of one. */
        int adj = (end - limit) / env->me_psize;
        if (adj < 2048)
            adj = 2048;
        p = VirtualAlloc(limit, env->me_psize * adj, MEM_COMMIT,
            (env->me_flags & MDB_WRITEMAP) ? PAGE_READWRITE : PAGE_READONLY);
    }
(end)
The mod is ugly in the way it determines and keeps track of the current allocation limit; this is mostly due to my ignorance of LMDB's internals. I am sure it can be made more elegant and not dependent on VirtualQuery (which is VERY slow). Also, my code does not correctly handle the corner case where the requested chunk would extend past the map size.
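For example, a watermark could be kept in the environment itself, which would avoid VirtualQuery and make the map-size clamp explicit. A sketch under that assumption (env_sketch and its fields are hypothetical, not actual mdb.c structures):

    #include <windows.h>

    typedef struct env_sketch {
        char    *me_map;         /* base of the reserved mapping */
        char    *me_commit_end;  /* first byte not yet committed */
        size_t   me_mapsize;     /* total reserved size */
        unsigned me_psize;       /* page size */
    } env_sketch;

    static int grow_commit(env_sketch *env, char *end)
    {
        if (end > env->me_commit_end) {
            size_t grow  = (size_t)(end - env->me_commit_end);
            size_t chunk = (size_t)env->me_psize * 2048;
            char *map_end = env->me_map + env->me_mapsize;
            if (grow < chunk)
                grow = chunk;
            /* Clamp so we never commit past the reserved mapsize. */
            if (env->me_commit_end + grow > map_end)
                grow = (size_t)(map_end - env->me_commit_end);
            if (!VirtualAlloc(env->me_commit_end, grow, MEM_COMMIT,
                              PAGE_READWRITE))
                return -1;
            env->me_commit_end += grow; /* advance; no VirtualQuery needed */
        }
        return 0;
    }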
Cheers, V.