hyc(a)symas.com wrote on 2019-06-17 05:25:
> grobins(a)pulsesecure.net wrote:
>> I am seeing LMDB crash, please find the stack trace.
>>
>> #0 0x009e362e in mdb_cursor_put (mc=0xffd59bb8, key=0xffd59d14,
>> data=0xffd59d0c, flags=0) at mdb.c:6688
>> #1 0x009e48ec in mdb_put (txn=0xd74d3008, dbi=2, key=0xffd59d14,
>> data=0xffd59d0c, flags=0) at mdb.c:8771
>> …
>>
>> Anybody has seen similar issue before?
> Doesn't sound familiar. But 0.9.18 is quite old. If you can reproduce this
> issue in 0.9.23 then we'll take a look. Test code to reproduce the problem
> would also be needed.
I'm seeing this in Firefox, which uses LMDB 0.9.23 (with minor changes).
Line 6688 in 0.9.18 occurs at line 6938 in 0.9.23 (line 6937 in Firefox
since we landed ITS#9030), and that's where we see crashes.
I haven't reported it here yet because I haven't been able to confirm
that it's a bug in LMDB as opposed to my own code. In fact I haven't
been able to reproduce it at all, I've only seen it in crash reports
submitted by Firefox installations (almost exclusively on Windows). So I
don't have test code to reproduce the problem.
Nevertheless, FWIW, here's the Firefox bug that tracks the issue:
https://bugzilla.mozilla.org/show_bug.cgi?id=1538541. And here are its
crash reports:
https://crash-stats.mozilla.org/signature/?signature=mdb_cursor_put
(only the last seven days of reports shown by default, but this has been
happening since we started using LMDB in Firefox nightly builds a couple
of months ago).
I've examined some of the dumps, and mc->mc_top is 0 when the crash
occurs, while mc->mc_pg[0] is a NULL pointer. So presumably the crash
occurs because IS_LEAF2 tries to dereference mc->mc_pg[mc->mc_top].
Further investigation shows that insert_data and insert_key are both
MDB_NOTFOUND, and flags is 0, so it isn't MDB_CURRENT, nor does it
contain MDB_APPEND. If I understand the code in mdb_cursor_put
correctly, this means that mdb_cursor_set was called on line 6614.
And mdb_cursor_set is in the stack of another crash I've been
investigating in mdb_page_search_root
(https://bugzilla.mozilla.org/show_bug.cgi?id=1550174,
https://crash-stats.mozilla.org/signature/?signature=mdb_page_search_root),
which happens on all of Firefox's primary platforms (Windows, macOS, Linux).
But I haven't been able to reproduce that one either, on any of those
platforms, so I have no idea if they're related (nor if mdb_cursor_set
is even implicated in this crash). And I still can't say that either is
an LMDB bug.
-myk
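The failure mode described above can be modeled in isolation. What follows is a minimal C sketch, not LMDB code: model_cursor, model_page, MODEL_P_LEAF2 and model_top_is_leaf2 are hypothetical names that only mirror the mc_top / mc_pg[] layout mentioned in the analysis. It shows the kind of defensive check that would turn a NULL top-of-stack page into an error return instead of a fault. It is a diagnostic illustration under those assumptions, not a proposed fix.

```c
#include <stddef.h>

#define MODEL_CURSOR_STACK 32   /* hypothetical stack depth, for illustration only */
#define MODEL_P_LEAF2 0x20u     /* hypothetical flag bit standing in for a LEAF2 page */

typedef struct model_page {
    unsigned flags;
} model_page;

typedef struct model_cursor {
    unsigned short snum;                  /* number of pages pushed on the stack */
    unsigned short top;                   /* index of the top page (cf. mc_top) */
    model_page *pg[MODEL_CURSOR_STACK];   /* page stack (cf. mc_pg[]) */
} model_cursor;

/*
 * Returns 1 if the top page carries the LEAF2 flag, 0 if it does not,
 * and -1 if the cursor has no usable top page. Without the NULL check,
 * reading pg[top]->flags is exactly the dereference that would crash
 * when top == 0 and pg[0] == NULL.
 */
int model_top_is_leaf2(const model_cursor *mc)
{
    if (mc == NULL || mc->top >= MODEL_CURSOR_STACK || mc->pg[mc->top] == NULL)
        return -1;
    return (mc->pg[mc->top]->flags & MODEL_P_LEAF2) != 0;
}
```

A zero-initialized model_cursor reproduces the state seen in the dumps (top == 0, pg[0] == NULL), and the function reports -1 for it rather than dereferencing the pointer.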
--On Monday, June 17, 2019 7:16 PM +0200 Armin Tüting
<Armin.Tueting(a)tueting-online.com> wrote:
>> I.e., it started and then got as far as reading your ldap.conf file.
>> What is the contents of ldap.conf?
> Attached 'ldap.conf'. Nothing unusual...
>
>> Have you run the test suite (make test)? Does it pass? fail?
> Attached 'make_test.txt'. As far as I can see - it has been passed.
Ok, so make test passes without issue, so it would appear there's something
specific with your configuration that is triggering the problem. Would you
be able to provide your slapd configuration (minus any passwords and the
like)?
Additionally, if you could get a full gdb backtrace of the hung slapd
process that would be useful as well. I.e.:
start up slapd
gdb /path/to/slapd <pid #>
at the gdb prompt:
thr apply all bt full
Thanks!
--Quanah
--
Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
<http://www.symas.com>
Is this still being considered/reviewed? Let me know if there are any other
changes you would like me to make. This patch has continued to yield
significant and reliable performance improvements for us, and seems like it
would be nice for this to be available for other Windows users.
On Fri, May 3, 2019 at 3:52 PM Kris Zyp <kriszyp(a)gmail.com> wrote:
> For the sake of putting this in the email thread (other code discussion in
> GitHub), here is the latest squashed commit of the proposed patch (with the
> on-demand, retained overlapped array to reduce re-malloc and opening event
> handles):
> https://github.com/kriszyp/node-lmdb/commit/726a9156662c703bf3d453aab75ee222072b990f
>
>
>
> Thanks,
> Kris
>
>
>
> *From: *Kris Zyp <kriszyp(a)gmail.com>
> *Sent: *April 30, 2019 12:43 PM
> *To: *Howard Chu <hyc(a)symas.com>; openldap-its(a)OpenLDAP.org
> *Subject: *RE: (ITS#9017) Improving performance of commit sync in Windows
>
>
>
> > What is the point of using writemap mode if you still need to use WriteFile
> > on every individual page?
>
>
>
> As I understood from the documentation, and have observed, using writemap
> mode is faster (and uses less temporary memory) because it doesn’t require
> mallocs to allocate pages (docs: “This is faster and uses fewer mallocs”).
> To be clear though, LMDB is so incredibly fast and efficient, that in
> sync-mode, it takes enormous transactions before the time spent allocating
> and creating the dirty pages with the updated b-tree is anywhere even
> remotely close to the time it takes to wait for disk flushing, even with an
> SSD. But the more pertinent question is efficiency, and measuring CPU
> cycles rather than time spent (efficiency is more important than just time
> spent). When I ran my tests this morning of 100 (sync) transactions with
> 100 puts per transaction, times varied quite a bit, but it seemed like
> running with writemap enabled typically averages about 500ms of CPU and
> with writemap disabled it typically averages around 600ms. Not a huge
> difference, but still definitely worthwhile, I think.
>
>
>
> Caveat emptor: Measuring LMDB performance with sync interactions on
> Windows is one of the most frustratingly erratic things to measure. It is
> sunny outside right now, times could be different when it starts raining
> later, but, this is what I saw this morning...
>
>
>
> > What is the performance difference between your patch using writemap, and
> > just not using writemap in the first place?
>
>
>
> Running 1000 sync transactions on a 3GB db with a single put per
> transaction, without writemap mode and without the patch, took about 60
> seconds. And it took about 1 second with the patch with writemap mode
> enabled! (There is no significant difference in sync times with writemap
> enabled or disabled with the patch.) So the difference was huge in my test.
> And not only that, without the patch, the CPU usage was actually _*higher*_
> during that 60 seconds (close to 100% of a core) than during the execution
> with the patch for one second (close to 50%). Anyway, there are certainly
> tests I have run where the differences are not as large (doing small
> commits on large dbs accentuates the differences), but the patch always
> seems to win. It could also be that my particular configuration causes
> bigger differences (on an SSD drive, and maybe a more fragmented file?).
>
>
>
> Anyway, I added error handling for the malloc, and fixed/changed the other
> things you suggested. Be happy to make any other changes you want. The
> updated patch is here:
>
> https://github.com/kriszyp/node-lmdb/commit/25366dea9453749cf6637f43ec17b9b62094acde
>
>
>
> > OVERLAPPED* ov = malloc((pagecount - keep) * sizeof(OVERLAPPED));
> >
> > Probably this ought to just be pre-allocated based on the maximum
> > number of dirty pages a txn allows.
>
>
>
> I wasn’t sure I understood this comment. Are you suggesting we
> malloc(MDB_IDL_UM_MAX * sizeof(OVERLAPPED)) for each environment, and
> retain it for the life of the environment? I think that is 4MB, if my math
> is right, which seems like a lot of memory to keep allocated (we usually
> have a lot of open environments). If the goal is to reduce the number of
> mallocs, how about we retain the OVERLAPPED array, and only free and
> re-malloc if the previous allocation wasn’t large enough? Then there isn’t
> unnecessary allocation, and we only malloc when there is a bigger
> transaction than any previous. I put this together in a separate commit, as
> I wasn’t sure if this is what you wanted (can squash if you prefer):
> https://github.com/kriszyp/node-lmdb/commit/2fe68fb5269c843e2e789746a17a4b2adefaac40
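
The "retain and only re-malloc when too small" idea in the paragraph above can be sketched in portable C. This is an illustration only and is not taken from the linked commits: io_slot is a hypothetical stand-in for the Windows OVERLAPPED structure, and slot_cache, slot_cache_reserve and slot_cache_release are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for OVERLAPPED so the sketch builds anywhere. */
typedef struct io_slot {
    void *event;
    unsigned long long offset;
} io_slot;

/* Per-environment cache: the array is kept between commits. */
typedef struct slot_cache {
    io_slot *slots;
    size_t capacity;   /* number of io_slot entries currently allocated */
} slot_cache;

/*
 * Return an array with room for at least `needed` slots.
 * The previous allocation is reused when it is already large enough;
 * otherwise it is replaced (old contents are not preserved, matching
 * "free and re-malloc"). Returns NULL on allocation failure, in which
 * case the old array is left intact. With needed == 0 the current
 * array (possibly NULL) is returned unchanged.
 */
io_slot *slot_cache_reserve(slot_cache *c, size_t needed)
{
    assert(c != NULL);
    if (needed <= c->capacity)
        return c->slots;
    if (needed > (size_t)-1 / sizeof(io_slot))
        return NULL;                      /* size computation would overflow */
    io_slot *fresh = malloc(needed * sizeof(io_slot));
    if (fresh == NULL)
        return NULL;
    free(c->slots);
    c->slots = fresh;
    c->capacity = needed;
    return fresh;
}

/* Release the cached array, e.g. when the environment is closed. */
void slot_cache_release(slot_cache *c)
{
    free(c->slots);
    c->slots = NULL;
    c->capacity = 0;
}
```

With this shape, memory use tracks the largest transaction seen so far instead of the worst-case maximum, at the cost of an occasional reallocation when a larger commit arrives.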
>
>
>
> Thank you for the review!
>
>
>
> Thanks,
> Kris
>
>
>
> *From: *Howard Chu <hyc(a)symas.com>
> *Sent: *April 30, 2019 7:12 AM
> *To: *kriszyp(a)gmail.com; openldap-its(a)OpenLDAP.org
> *Subject: *Re: (ITS#9017) Improving performance of commit sync in Windows
>
>
>
> kriszyp(a)gmail.com wrote:
> > Full_Name: Kristopher William Zyp
> > Version: LMDB 0.9.23
> > OS: Windows
> > URL: https://github.com/kriszyp/node-lmdb/commit/7ff525ae57684a163d32af74a0ab9332b7fc4ce9
> > Submission from: (NULL) (71.199.6.148)
> >
>
> > We have seen very poor performance on the sync of commits on large
> > databases in Windows. On databases with 2GB of data, in writemap mode,
> > the sync of even small commits is consistently well over 100ms (without
> > writemap it is faster, but still slow). It is expected that a sync should
> > take some time while waiting for disk confirmation of the writes, but more
> > concerning is that these sync operations (in writemap mode) are instead
> > dominated by nearly 100% system CPU utilization, so operations that
> > require sub-millisecond b-tree update operations are then dominated by
> > very large amounts of system CPU cycles during the sync phase.
>
> >
>
> > I think that the fundamental problem is that FlushViewOfFile seems to be
> > an O(n) operation where n is the size of the file (or map). I presume that
> > Windows is scanning the entire map/file for dirty pages to flush, I'm
> > guessing because it doesn't have an internal index of all the dirty pages
> > for every file/map-view in the OS disk cache. Therefore, this turns into an
> > extremely expensive, CPU-bound operation to find the dirty pages for a
> > (large) file and initiate their writes, which, of course, is contrary to
> > the whole goal of a scalable database system. And FlushFileBuffers is also
> > relatively slow. We have attempted to batch as many operations into a
> > single transaction as possible, but this is still a very large overhead.
>
> >
>
> > The Windows docs for FlushFileBuffers themselves warn about the
> > inefficiencies of this function
> > (https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-flushfilebuffers).
> > Which also points to the solution: it is much faster to write out the
> > dirty pages with WriteFile through a sync file handle
> > (FILE_FLAG_WRITE_THROUGH).
>
> >
>
> > The associated patch
> > (https://github.com/kriszyp/node-lmdb/commit/7ff525ae57684a163d32af74a0ab9332b7fc4ce9)
> > is my attempt at implementing this solution, for Windows. Fortunately,
> > with the design of LMDB, this is relatively straightforward. LMDB already
> > supports writing out dirty pages with WriteFile calls. I added a
> > write-through handle for sending these writes directly to disk. I then
> > made that file handle overlapped/asynchronous, so all the writes for a
> > commit could be started in overlap mode, and (at least theoretically)
> > transfer in parallel to the drive, and then used GetOverlappedResult to
> > wait for the completion. So basically mdb_page_flush becomes the sync. I
> > extended the writing of dirty pages through WriteFile to writemap mode as
> > well (for writing meta too), so that WriteFile with write-through can be
> > used to flush the data without ever needing to call FlushViewOfFile or
> > FlushFileBuffers. I also implemented support for write gathering in
> > writemap mode where contiguous file positions imply contiguous memory (by
> > tracking the starting position with wdp and writing contiguous pages in
> > single operations). Sorting of the dirty list is maintained even in
> > writemap mode for this purpose.
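
The write-gathering step described above depends on the dirty list being sorted, so that runs of consecutive page numbers can be issued as single writes. A minimal sketch of that coalescing, assuming every entry is a single page (multi-page overflow entries are ignored here) and using made-up names (page_run, gather_runs) that do not come from the patch:

```c
#include <assert.h>
#include <stddef.h>

/* One contiguous run of pages that could be written in a single call. */
typedef struct page_run {
    size_t first_pgno;   /* page number where the run starts */
    size_t count;        /* number of consecutive pages in the run */
} page_run;

/*
 * Coalesce an ascending-sorted list of page numbers into runs of
 * consecutive pages. At most max_runs runs are stored in out; the
 * return value is the total number of runs found, so a caller can
 * detect truncation by comparing it with max_runs.
 */
size_t gather_runs(const size_t *pgnos, size_t n, page_run *out, size_t max_runs)
{
    assert(out != NULL || max_runs == 0);
    size_t runs = 0;
    for (size_t i = 0; i < n; ) {
        size_t j = i + 1;
        /* Extend the run while each page directly follows the previous one. */
        while (j < n && pgnos[j] == pgnos[j - 1] + 1)
            j++;
        if (runs < max_runs) {
            out[runs].first_pgno = pgnos[i];
            out[runs].count = j - i;
        }
        runs++;
        i = j;
    }
    return runs;
}
```

In writemap mode each run would correspond to one contiguous region of the map, so a run of k pages becomes one write of k * page-size bytes rather than k separate writes.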
>
>
>
> What is the point of using writemap mode if you still need to use WriteFile
> on every individual page?
>
>
>
> > The performance benefits of this patch, in my testing, are considerable.
> > Writing out/syncing transactions is typically over 5x faster in writemap
> > mode, and 2x faster in standard mode. And perhaps more importantly
> > (especially in environments with many threads/processes), the efficiency
> > benefits are even larger, particularly in writemap mode, where there can
> > be a 50-100x reduction in the system CPU usage by using this patch. This
> > brings Windows performance with sync'ed transactions in LMDB back into the
> > range of "lightning" performance :).
>
>
>
> What is the performance difference between your patch using writemap, and
> just not using writemap in the first place?
>
>
>
> --
> -- Howard Chu
> CTO, Symas Corp. http://www.symas.com
> Director, Highland Sun http://highlandsun.com/hyc/
> Chief Architect, OpenLDAP http://www.openldap.org/project/
--On Monday, June 17, 2019 9:59 AM +0200 Armin Tüting
<Armin.Tueting(a)tueting-online.com> wrote:
>> > Hi,
>> >
>> > Thanks for the report. When reporting issues, please keep them on
>> > the OpenLDAP bug tracking list, rather than directly emailing
>> > individuals.
>> You will need to provide substantially more information about what
>> you're seeing. I would start with starting slapd in full debug mode
>> (add -d -1 to the startup flags to slapd) and see what issue(s) it
>> reports.
> Slapd won't even start at all! No log!
> I'm attaching the redirected output from 'make test'. In addition the
> 'config.log' and 'slapd_start.txt'.
Hi Armin,
Again, please CC replies to the OpenLDAP ITS list
(openldap-its(a)openldap.org) so they properly get entered into the issue
tracker.
Your slapd_start.txt attachment clearly shows slapd starting:
time /opt/openldap/libexec/slapd -F /opt/openldap/etc/openldap/slapd.d -u ldap -h "ldapi:/// ldap:/// ldaps:///" -d -1
ldap_url_parse_ext(ldap://localhost/)
ldap_init: trying /opt/openldap/etc/openldap/ldap.conf
ldap_init: using /opt/openldap/etc/openldap/ldap.conf
^C
I.e., it started and then got as far as reading your ldap.conf file. What
is the contents of ldap.conf?
Have you run the test suite (make test)? Does it pass? fail?
Thanks,
Quanah
--
Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
<http://www.symas.com>
grobins(a)pulsesecure.net wrote:
> Full_Name: Robins George
> Version: 0.9.18
> OS: Centos
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (121.244.154.134)
>
>
> I am seeing LMDB crash, please find the stack trace.
>
> #0 0x009e362e in mdb_cursor_put (mc=0xffd59bb8, key=0xffd59d14,
> data=0xffd59d0c, flags=0) at mdb.c:6688
> #1 0x009e48ec in mdb_put (txn=0xd74d3008, dbi=2, key=0xffd59d14,
> data=0xffd59d0c, flags=0) at mdb.c:8771
> #2 0x00f5cef8 in LMDB::LMDBContext::createSession (this=0x80cc610,
> sid=0x80cd0fc "sid1422419e5fd6cd224f278d74293c50f5ef96593700000000+") at
> lmdbint.cc:460
> #3 0x0805398e in updateLMDB (request=..., forward=@0xffd59fff) at
> sessionserver.cc:1154
> #4 processRequestWithoutResponse (request=..., forward=@0xffd59fff) at
> sessionserver.cc:1240
> #5 0x08056936 in ZSubHandler::ioReady (this=0xffd5aa1c, fd=20) at
> sessionserver.cc:1501
> #6 0x00f870d2 in runCoreDispatcher (default_t=<value optimized out>, flags=-1)
> at fds.cc:889
> #7 0x00f87b37 in DSEvntFds::runDispatcher () at fds.cc:945
> #8 0x08056248 in main () at sessionserver.cc:1820
>
> Anybody has seen similar issue before?
Doesn't sound familiar. But 0.9.18 is quite old. If you can reproduce this
issue in 0.9.23 then we'll take a look. Test code to reproduce the problem
would also be needed.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Hi Leo,
in my tests, everything is handled correctly. Have you been able to
reproduce this with an up-to-date version of OpenLDAP?
If so, can you provide slapd output (stderr) with -d -1? This is
different from slapd syslog output in that libldap/liblber logging is
also included.
Regards,
--
Ondřej Kuzník
Senior Software Engineer
Symas Corporation http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP
quanah(a)openldap.org wrote:
> Full_Name: Quanah Gibson-Mount
> Version: 2.4.45
> OS: Windows 10 64-bit
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (47.208.148.239)
>
>
> When building OpenLDAP, we find:
>
> C:/msys64/home/build/sold-master/openldap/tests/progs/slapd-read.c: In function
> 'do_read':
> C:/msys64/home/build/sold-master/openldap/tests/progs/slapd-read.c:384:11:
> warning: '_sleep' is deprecated [-Wdeprecated-declarations]
> sleep( delay );
> ^~~~~
> In file included from
> C:/msys64/home/build/sold-master/openldap/include/ac/stdlib.h:26:0,
> from
> C:/msys64/home/build/sold-master/openldap/tests/progs/slapd-read.c:24:
> C:/msys64/mingw64/x86_64-w64-mingw32/include/stdlib.h:613:24: note: declared
> here
> _CRTIMP void __cdecl _sleep(unsigned long _Duration)
> __MINGW_ATTRIB_DEPRECATED;
> ^~~~~~
>
> This comes from:
>
> include/ac/unistd.h
>
> The correct function call is Sleep
>
> #ifdef _WIN32
> #define sleep Sleep
> #endif
>
> should resolve it.
>
>
Need to be careful here. M$ says the function is supported as of Windows XP
https://docs.microsoft.com/en-us/windows/desktop/api/synchapi/nf-synchapi-s…
but is not defined in any header files until Windows Vista. Any "fix" should only
use Sleep if the current M$ SDK is new enough; the rest of the code still builds
on Windows XP.
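
One way the caution above might be encoded is a small compatibility wrapper. This is a sketch under stated assumptions, not the change that was committed: compat_sleep_seconds and seconds_to_ms are hypothetical names, and testing _WIN32_WINNT >= 0x0600 is only an assumed proxy for "the SDK is new enough". Note also that Win32 Sleep() takes milliseconds while POSIX sleep() takes seconds, so a bare "#define sleep Sleep" would change the delay by a factor of 1000.

```c
#if defined(_WIN32)
#include <windows.h>
#include <stdlib.h>
#else
#include <unistd.h>
#endif

/*
 * Convert whole seconds to milliseconds. There is no overflow guard;
 * this is meant for the short delays used by test programs.
 */
unsigned long seconds_to_ms(unsigned int seconds)
{
    return (unsigned long)seconds * 1000ul;
}

/* Sleep for a number of whole seconds on either platform. */
void compat_sleep_seconds(unsigned int seconds)
{
#if defined(_WIN32) && defined(_WIN32_WINNT) && (_WIN32_WINNT >= 0x0600)
    /* Assumed "new enough" SDK: use the Win32 call, which takes milliseconds. */
    Sleep((DWORD)seconds_to_ms(seconds));
#elif defined(_WIN32)
    /* Older SDK: keep the deprecated _sleep named in the warning above.
     * Assumption: it also takes milliseconds. */
    _sleep(seconds_to_ms(seconds));
#else
    sleep(seconds);
#endif
}
```

Callers such as the test programs would then call compat_sleep_seconds( delay ) instead of sleep( delay ), which keeps the unit conversion in one place.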
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/