From romange@gmail.com Wed Aug 21 18:56:33 2013
From: romange@gmail.com
To: openldap-bugs@openldap.org
Subject: Re: (ITS#7667) performance degradation when using MDB_INTEGERKEY
Date: Wed, 21 Aug 2013 18:56:32 +0000
Message-ID: <201308211856.r7LIuW3w072835@boole.openldap.org>

Thanks! I was aware of little endian transformation. I did not know that
the change of insertion order affects write performance of the database
that much.

On Wed, Aug 21, 2013 at 12:54 AM, Howard Chu <hyc(a)symas.com> wrote:

> romange(a)gmail.com wrote:
>
>> Hi, I extracted a small dataset that shows the problem.
>> you can download it from here:
>> https://docs.google.com/file/d/0B6o29pwkWoERdnFSaUtMNDljemc/edit?usp=sharing
>>
>> I modified mdb_copy.c to demonstrate the difference. copy it to source dir
>> from here
>> https://docs.google.com/file/d/0B6o29pwkWoERd3VuUm1DN0FpcUU/edit?usp=sharing
>>
>> build and
>> run "time ./mdb_copy foo foo2"
>> after this change the flag at line 64 and run it again.
>> at my computer the difference is 17s vs 1.7s for 3 million items.
>>
>
> This test doesn't prove the existence of a bug. You're running on a
> Little-Endian machine, therefore data that is in sorted order as a string
> is in hashed order when used as an integer. Your data insert turns into a
> worst-case insert order in this case, causing the worst possible random
> access strides through memory. Assuming the two orders to be equivalent is
> a pretty common mistake for DB programmers. Microsoft has done the same
> thing in ActiveDirectory, I mentioned it here a few years ago
> http://www.openldap.org/lists/openldap-devel/200711/msg00002.html
>
> If you had run this test on a Big-Endian machine, like SPARC, the insert
> order would be identical either way, and INTEGERKEY result would have been
> faster.
>
> Closing this ITS, no bug.
>
>> On Tue, Aug 20, 2013 at 9:04 PM, Quanah Gibson-Mount <quanah(a)zimbra.com> wrote:
>>
>>> --On Sunday, August 18, 2013 11:46 AM +0000 romange(a)gmail.com wrote:
>>>
>>>> Full_Name: Roman Gershman
>>>> Version:
>>>> OS: linux 3.8.0-25-generic
>>>> URL:
>>>> Submission from: (NULL) (212.150.97.210)
>>>
>>> Please provide further information, specifically:
>>>
>>> The size of values
>>> Insert order
>>> Sample code if possible
>>>
>>> Thanks,
>>> Quanah
>>>
>>> --
>>> Quanah Gibson-Mount
>>> Lead Engineer
>>> Zimbra, Inc
>>> --------------------
>>> Zimbra :: the leader in open source messaging and collaboration
>
> --
>   -- Howard Chu
>   CTO, Symas Corp.           http://www.symas.com
>   Director, Highland Sun     http://highlandsun.com/hyc/
>   Chief Architect, OpenLDAP  http://www.openldap.org/project/

--
Best regards,
Roman
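
Howard's point about byte order can be illustrated with a small sketch. It is not taken from the modified mdb_copy.c mentioned in the thread, and it does not call LMDB at all. It assumes a hypothetical dataset of 4-byte keys that are already sorted as byte strings, and it only shows how the same keys order when they are read as integers in little-endian versus big-endian byte order. The helper names `as_integer_key` and `is_sorted` are made up for this illustration.

```python
import struct


def as_integer_key(key: bytes, byteorder: str) -> int:
    """Read a 4-byte key as an unsigned integer in the given byte order.

    This stands in for a native-integer key comparison on a machine with
    that byte order (an assumption made for illustration only).
    """
    return int.from_bytes(key, byteorder)


def is_sorted(values) -> bool:
    """Return True if the sequence is in non-decreasing order."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Hypothetical dataset: 1000 four-byte keys, sorted as byte strings.
keys = sorted(struct.pack(">I", n) for n in range(1000))

# The same keys, in the same insert order, seen as integers.
big = [as_integer_key(k, "big") for k in keys]
little = [as_integer_key(k, "little") for k in keys]

print("big-endian view sorted:   ", is_sorted(big))
print("little-endian view sorted:", is_sorted(little))
print("first little-endian values:", little[:4])
```

On the big-endian reading, the string order and the integer order agree, so the inserts stay sequential. On the little-endian reading, consecutive keys differ in the most significant byte, so neighbouring inserts land far apart in the integer key space and the sequence is no longer sorted.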