On Wed, Dec 30, 2015 at 9:57 PM, Victor Baybekov <vbaybekov@gmail.com> wrote:

> However, with WriteMap performance of master drops 3x to 10000 msec or just 1 Mops

It was not entirely clear to me which options you used, so I ran the following four tests against the master branch on my test machine. Each test does 10M int:int puts in one txn and then commits it; times are given as put seconds, commit seconds:

1. (baseline): 8, 6
2. NOSYNC: 8, 0
3. WRITEMAP: 10, 5
4. WRITEMAP | NOSYNC: 10, 0

I observed that VirtualAlloc was always being called for 1-page allocations.

I suspected that calling VirtualAlloc once per single-page allocation was a bottleneck, so I modified the code to commit chunks of 2048 pages instead. With that change, I got the following results:

5. (mod) WRITEMAP: 6, 2
6. (mod) WRITEMAP | NOSYNC: 6, 0

Chunks smaller than 2048 pages performed worse; larger chunks gave no observable improvement.

The mod: replace the p = VirtualAlloc... statement with:

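        /* NOTE: one static limit is shared by every environment in the
         * process, so this is only correct with a single open MDB_env. */
        static char *limit = 0;
        char *end = (char *)p + env->me_psize * num;

        /* Refresh the cached limit only if the request may extend past it. */
        if (end > limit)
        {
            MEMORY_BASIC_INFORMATION mi = { 0 };
            VirtualQuery(p, &mi, sizeof mi);
            if (mi.State == MEM_RESERVE)
                limit = (char *)p;      /* p itself is not committed yet */
            else
                limit = (char *)mi.BaseAddress + mi.RegionSize;
        }

        if (end > limit)
        {
            /* Pages still missing (rounded up), but commit at least 2048. */
            size_t adj = (end - limit + env->me_psize - 1) / env->me_psize;
            if (adj < 2048)
                adj = 2048;

            /* VirtualAlloc returns limit, which differs from p when part of
             * the range was already committed, so keep p as is and only set
             * it to NULL on failure, as the original assignment would.
             * Not handled: limit + adj pages may run past the end of the map. */
            if (!VirtualAlloc(limit, env->me_psize * adj, MEM_COMMIT,
                    (env->me_flags & MDB_WRITEMAP) ? PAGE_READWRITE :
                    PAGE_READONLY))
                p = NULL;
        }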

(end)

The mod is ugly in the way it determines and keeps track of the current allocation limit, mostly because I don't know LMDB's internals well. I am sure it can be made more elegant and independent of VirtualQuery (which is VERY slow). Also, my code does not correctly handle the corner case where the requested chunk would extend past the end of the map.

Cheers,
V.