Thanks! I was aware of the little-endian transformation, but I did not know
that a change in insertion order affects the write performance of the
database that much.
romange@gmail.com wrote:
Hi, I extracted a small dataset that shows the problem.
You can download it from here:
https://docs.google.com/file/d/0B6o29pwkWoERdnFSaUtMNDljemc/edit?usp=sharing
I modified mdb_copy.c to demonstrate the difference. Copy it to the source
directory from here:
https://docs.google.com/file/d/0B6o29pwkWoERd3VuUm1DN0FpcUU/edit?usp=sharing
Build it and run "time ./mdb_copy foo foo2".
After this, change the flag at line 64 and run it again.
On my computer the difference is 17s vs 1.7s for 3 million items.
This test doesn't prove the existence of a bug. You're running on a
Little-Endian machine, therefore data that is in sorted order as a string
is in hashed order when used as an integer. Your data insert turns into a
worst-case insert order in this case, causing the worst possible random
access strides through memory. Assuming the two orders to be equivalent is
a pretty common mistake for DB programmers. Microsoft has done the same
thing in ActiveDirectory, I mentioned it here a few years ago:
http://www.openldap.org/lists/openldap-devel/200711/msg00002.html
If you had run this test on a Big-Endian machine, like SPARC, the insert
order would be identical either way, and the INTEGERKEY result would have
been faster.
Closing this ITS, no bug.
On Tue, Aug 20, 2013 at 9:04 PM, Quanah Gibson-Mount <quanah@zimbra.com>
wrote:
--On Sunday, August 18, 2013 11:46 AM +0000 romange@gmail.com wrote:
Full_Name: Roman Gershman
Version:
OS: linux 3.8.0-25-generic
URL:
Submission from: (NULL) (212.150.97.210)
Please provide further information, specifically:
The size of values
Insert order
Sample code if possible
Thanks,
Quanah
--
Quanah Gibson-Mount
Lead Engineer
Zimbra, Inc
Zimbra :: the leader in open source messaging and collaboration
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
--
Best regards,
Roman