Re: (ITS#9017) Improving performance of commit sync in Windows

30 Apr 2019


      kriszyp@gmail.com wrote:
...
Full_Name: Kristopher William Zyp
Version: LMDB 0.9.23
OS: Windows
URL: https://github.com/kriszyp/node-lmdb/commit/7ff525ae57684a163d32af74a0ab9332...
Submission from: (NULL) (71.199.6.148)
We have seen very poor performance on the sync of commits on large databases in
Windows. On databases with 2GB of data, in writemap mode, the sync of even small
commits is consistently well over 100ms (without writemap it is faster, but
still slow). It is expected that a sync should take some time while waiting for
disk confirmation of the writes, but more concerning is that these sync
operations (in writemap mode) are instead dominated by nearly 100% system CPU
utilization, so commits whose b-tree updates take less than a millisecond
are then dominated by very large amounts of system CPU cycles during
the sync phase.
I think that the fundamental problem is that FlushViewOfFile seems to be an O(n)
operation where n is the size of the file (or map). I presume that Windows is
scanning the entire map/file for dirty pages to flush, I'm guessing because it
doesn't have an internal index of all the dirty pages for every file/map-view in
the OS disk cache. Therefore, this turns into an extremely expensive, CPU-bound
operation to find the dirty pages for a large file and initiate their writes,
which, of course, is contrary to the whole goal of a scalable database system.
FlushFileBuffers is also relatively slow. We have attempted to batch
as many operations into a single transaction as possible, but this is still a very
large overhead.
The Windows docs for FlushFileBuffers themselves warn about the inefficiencies of
this function (https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-flus...).
They also point to the solution: it is much faster to write out the dirty
pages with WriteFile through a sync file handle (FILE_FLAG_WRITE_THROUGH).
The associated patch
(https://github.com/kriszyp/node-lmdb/commit/7ff525ae57684a163d32af74a0ab9332...)
is my attempt at implementing this solution, for Windows. Fortunately, with the
design of LMDB, this is relatively straightforward. LMDB already supports
writing out dirty pages with WriteFile calls. I added a write-through handle for
sending these writes directly to disk. I then made that file handle
overlapped (asynchronous), so all the writes for a commit can be started in
overlapped mode and (at least theoretically) transfer in parallel to the drive;
GetOverlappedResult is then used to wait for their completion. So basically
mdb_page_flush becomes the sync. I extended the writing of dirty pages through
WriteFile to writemap mode as well (for writing meta too), so that WriteFile
with write-through can be used to flush the data without ever needing to call
FlushViewOfFile or FlushFileBuffers. I also implemented support for write
gathering in writemap mode, where contiguous file positions imply contiguous
memory (by tracking the starting position with wdp and writing contiguous pages
in single operations). Sorting of the dirty list is maintained even in writemap
mode for this purpose.
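As a rough illustration of the write gathering described above, here is a minimal sketch in C. It is not the patch's code: the dirty_entry and write_run types and the gather_runs function are hypothetical names invented for this example, and it assumes the dirty list is already sorted by page number and that a page's file offset is its page number times the page size. It only computes which (offset, length) runs would be written; in writemap mode each run could then be handed to a single write call, since contiguous file positions are contiguous in the map.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dirty-list entry (not LMDB's actual structure). */
typedef struct {
    size_t pgno;    /* page number; file offset is pgno * page_size */
    size_t npages;  /* pages covered by this entry (more than 1 for overflow pages) */
} dirty_entry;

/* One gathered write: a contiguous byte range of the file. */
typedef struct {
    size_t offset;  /* starting file offset in bytes */
    size_t length;  /* length in bytes */
} write_run;

/*
 * Coalesce a dirty list, sorted by pgno, into contiguous runs.
 * Returns the number of runs stored in out[]; out must have room
 * for max_out runs (n entries is always enough).
 */
size_t gather_runs(const dirty_entry *dl, size_t n, size_t page_size,
                   write_run *out, size_t max_out)
{
    size_t nruns = 0;

    for (size_t i = 0; i < n; i++) {
        size_t off = dl[i].pgno * page_size;
        size_t len = dl[i].npages * page_size;

        if (nruns > 0 &&
            out[nruns - 1].offset + out[nruns - 1].length == off) {
            /* This entry starts exactly where the previous run ends: extend it. */
            out[nruns - 1].length += len;
        } else {
            /* Gap in the file: start a new run. */
            assert(nruns < max_out);
            out[nruns].offset = off;
            out[nruns].length = len;
            nruns++;
        }
    }
    return nruns;
}
```

With 4096-byte pages, dirty pages 2, 3, 4 (two pages long), 9 and 10 collapse into two runs, so five page writes become two larger ones.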

What is the point of using writemap mode if you still need to use WriteFile
on every individual page?

...
The performance benefits of this patch, in my testing, are considerable. Writing
out/syncing transactions is typically over 5x faster in writemap mode, and 2x
faster in standard mode. And perhaps more importantly (especially in environments
with many threads/processes), the efficiency benefits are even larger,
particularly in writemap mode, where there can be a 50-100x reduction in the
system CPU usage by using this patch. This brings Windows performance with
sync'ed transactions in LMDB back into the range of "lightning" performance :).

What is the performance difference between your patch using writemap, and just
not using writemap in the first place?

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
