Re: (ITS#7672) LMDB: mdb_dbi_flags fails with newly created DataBase
by hyc@symas.com
sog(a)msg.com.mx wrote:
> Full_Name: Salvador Ortiz
> Version: 24
> OS: Linux
> URL:
> Submission from: (NULL) (187.162.45.111)
>
>
> Using mdb_dbi_flags with newly created database fails or lies.
mdb_dbi_flags() has been changed to take a txn instead of an env pointer.
Fixed in mdb.master.
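For readers following along, here is a toy model of why a per-transaction lookup helps. It is only a sketch: every toy_* name and structure below is made up for illustration and is not LMDB's code. The one thing taken from the report is that the environment-wide handle count is not updated until commit, while the transaction already knows about the new handle.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAXDBS 8

/* Toy stand-ins for illustration only; these are NOT LMDB's structures. */
struct toy_env {
    unsigned numdbs;                /* handles published by committed txns */
    unsigned dbflags[TOY_MAXDBS];
};

struct toy_txn {
    struct toy_env *env;
    unsigned numdbs;                /* handles visible inside this txn */
    unsigned dbflags[TOY_MAXDBS];
};

/* Begin: the txn starts from a private copy of the env's handle table. */
static void toy_txn_begin(struct toy_env *env, struct toy_txn *txn)
{
    assert(env != NULL && txn != NULL);
    txn->env = env;
    txn->numdbs = env->numdbs;
    for (unsigned i = 0; i < env->numdbs; i++)
        txn->dbflags[i] = env->dbflags[i];
}

/* Open a new handle: recorded in the txn only. Returns the handle or -1. */
static int toy_dbi_open(struct toy_txn *txn, unsigned flags)
{
    if (txn->numdbs >= TOY_MAXDBS)
        return -1;
    txn->dbflags[txn->numdbs] = flags;
    return (int)txn->numdbs++;
}

/* Commit: only now does the env learn about handles opened in the txn. */
static void toy_txn_commit(struct toy_txn *txn)
{
    for (unsigned i = 0; i < txn->numdbs; i++)
        txn->env->dbflags[i] = txn->dbflags[i];
    txn->env->numdbs = txn->numdbs;
}

/* Lookup through the env: fails for a handle that is not committed yet. */
static int toy_flags_via_env(const struct toy_env *env, unsigned dbi,
                             unsigned *flags)
{
    if (env == NULL || flags == NULL || dbi >= env->numdbs)
        return EINVAL;
    *flags = env->dbflags[dbi];
    return 0;
}

/* Lookup through the txn: sees the handle as soon as it is opened. */
static int toy_flags_via_txn(const struct toy_txn *txn, unsigned dbi,
                             unsigned *flags)
{
    if (txn == NULL || flags == NULL || dbi >= txn->numdbs)
        return EINVAL;
    *flags = txn->dbflags[dbi];
    return 0;
}
```

In this model, a handle opened inside a transaction is rejected by toy_flags_via_env() until toy_txn_commit() runs, but toy_flags_via_txn() answers immediately, which is the behaviour the report asks for.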
> In the case of MAIN_DBI, if I set some flags in mdb_dbi_open, the flags aren't
> propagated to the environment. I think the fix is simple:
>
> --- a/libraries/liblmdb/mdb.c
> +++ b/libraries/liblmdb/mdb.c
> @@ -7881,6 +7881,7 @@ int mdb_dbi_open(MDB_txn *txn, const char *name, unsigned int flags, MDB_dbi *db
>          /* make sure flag changes get committed */
>          if ((txn->mt_dbs[MAIN_DBI].md_flags | f2) != txn->mt_dbs[MAIN_DBI].md_flags) {
>              txn->mt_dbs[MAIN_DBI].md_flags |= f2;
> +            txn->mt_env->me_dbflags[MAIN_DBI] = txn->mt_dbs[MAIN_DBI].md_flags;
>              txn->mt_flags |= MDB_TXN_DIRTY;
>          }
>      }
>
> But in the case of a newly created named database, env->me_numdbs isn't adjusted
> until the transaction is committed, so mdb_dbi_flags fails.
>
> The more I think about it, the more it seems that the proper way to get the
> flags of an opened db is with a new API, something like:
>
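> /* Proposed: return the flags of an open DB handle as seen by this txn. */
> int mdb_get_flags(MDB_txn *txn, MDB_dbi dbi, unsigned int *flags)
> {
>     /* Reject missing arguments and handles this txn does not know about. */
>     if (txn == NULL || flags == NULL || dbi >= txn->mt_numdbs)
>         return EINVAL;
>
>     *flags = txn->mt_dbflags[dbi];
>     return MDB_SUCCESS;
> }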
>
>
> Comments?
>
>
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
10 years
(ITS#7672) LMDB: mdb_dbi_flags fails with newly created DataBase
by sog@msg.com.mx
Full_Name: Salvador Ortiz
Version: 24
OS: Linux
URL:
Submission from: (NULL) (187.162.45.111)
Using mdb_dbi_flags with newly created database fails or lies.
In the case of MAIN_DBI, if I set some flags in mdb_dbi_open, the flags aren't
propagated to the environment. I think the fix is simple:
--- a/libraries/liblmdb/mdb.c
+++ b/libraries/liblmdb/mdb.c
@@ -7881,6 +7881,7 @@ int mdb_dbi_open(MDB_txn *txn, const char *name, unsigned int flags, MDB_dbi *db
         /* make sure flag changes get committed */
         if ((txn->mt_dbs[MAIN_DBI].md_flags | f2) != txn->mt_dbs[MAIN_DBI].md_flags) {
             txn->mt_dbs[MAIN_DBI].md_flags |= f2;
+            txn->mt_env->me_dbflags[MAIN_DBI] = txn->mt_dbs[MAIN_DBI].md_flags;
             txn->mt_flags |= MDB_TXN_DIRTY;
         }
     }
But in the case of a newly created named database, env->me_numdbs isn't adjusted
until the transaction is committed, so mdb_dbi_flags fails.
The more I think about it, the more it seems that the proper way to get the
flags of an opened db is with a new API, something like:
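/* Proposed: return the flags of an open DB handle as seen by this txn. */
int mdb_get_flags(MDB_txn *txn, MDB_dbi dbi, unsigned int *flags)
{
    /* Reject missing arguments and handles this txn does not know about. */
    if (txn == NULL || flags == NULL || dbi >= txn->mt_numdbs)
        return EINVAL;
    *flags = txn->mt_dbflags[dbi];
    return MDB_SUCCESS;
}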
Comments?
10 years
Re: (ITS#7670) mdb_cursor_del() behaviour improvement
by hyc@symas.com
spam(a)markandruth.co.uk wrote:
> Full_Name: Mark Zealey
> Version: git
> OS: linux
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (88.253.191.53)
>
>
> As per email thread with Howard, mdb_cursor_del() has a somewhat strange
> behaviour when you reach the end of a page. To my mind it should skip to the
> next entry. This should also be documented as at the moment there is no sign in
> the docs of what happens to the cursor after a delete.
Fixed now in mdb.master, cursor_del will always try to advance to the next item.
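To make the before/after behaviour concrete, here is a small toy model. It is a sketch under assumptions, not LMDB's implementation: the toy_* types, the fixed-size "pages" and the `advance` switch are all invented for illustration. It only mirrors what the thread describes: deleting the last item on a page used to leave the cursor on no valid node (so a "get current" returned EINVAL), and the fix makes the delete try to move on to the next item.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_CAP 4

/* Toy leaf page and cursor; hypothetical stand-ins, not LMDB types. */
struct toy_page {
    int n;                      /* number of keys in use on this page */
    int keys[TOY_PAGE_CAP];
};

struct toy_cursor {
    struct toy_page *pages;
    int npages;
    int pg;                     /* current page index */
    int slot;                   /* current slot within that page */
};

/* Return 0 and the current key, or EINVAL if the cursor is on no node. */
static int toy_get_current(const struct toy_cursor *c, int *key)
{
    assert(c != NULL && key != NULL);
    if (c->pg >= c->npages || c->slot >= c->pages[c->pg].n)
        return EINVAL;
    *key = c->pages[c->pg].keys[c->slot];
    return 0;
}

/*
 * Delete the item under the cursor.
 * advance == 0 models the old behaviour: the slot index is left alone, so
 * after removing the last item on a page it points past the end.
 * advance != 0 models the fix: step forward to the next page that still
 * has an item (or to end-of-data if there is none).
 */
static int toy_cursor_del(struct toy_cursor *c, int advance)
{
    int unused;
    if (toy_get_current(c, &unused) != 0)
        return EINVAL;

    struct toy_page *p = &c->pages[c->pg];
    memmove(&p->keys[c->slot], &p->keys[c->slot + 1],
            (size_t)(p->n - c->slot - 1) * sizeof p->keys[0]);
    p->n--;

    if (advance) {
        while (c->pg < c->npages && c->slot >= c->pages[c->pg].n) {
            c->pg++;
            c->slot = 0;
        }
    }
    return 0;
}
```

With two pages holding {1, 2} and {3, 4} and the cursor on key 2, the old mode leaves toy_get_current() returning EINVAL after the delete, while the new mode lands on key 3.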
> Mark Zealey wrote:
>> On 23/08/13 00:07, Howard Chu wrote:
>>> ... as I said already, it does exactly what you said. When you've
>>> deleted the last item on the page the cursor no longer points at a
>>> valid node, so GET_CURRENT returns EINVAL.
>>
>> OK I think that's a pretty big gotcha - it would be great to either make
>> the cursor behaviour consistent (ie automatically skip to next record at
>> end of a page) or to very clearly document this! It's pretty different
>> from kyoto/bdb interfaces in that (to my mind at least) it requires a
>> bit too much knowledge from the developer about the underlying structure
>> of the database which should be abstracted by the interface.
>
> Yeah, this should probably be changed. Could you submit an ITS for this? Thanks.
>
>
10 years
Re: (ITS#7666) segfault when searching regex minus than 3 characters over translucent
by hyc@symas.com
I tried to use your configuration to reproduce your error but saw no crash.
Probably there are other elements of the configuration or test data missing,
or the exact sequence of steps you followed is missing.
theju ju wrote:
> # schema.perso/c.schema
>
> attributetype ( 1.3.6.1.4.1.10000.13.2.20
> NAME 'Application'
> DESC 'Acces sur les application'
> EQUALITY caseIgnoreMatch
> SUBSTR caseIgnoreSubstringsMatch
> SYNTAX 1.3.6.1.4.1.1466.115.121.1.15 )
>
> objectclass ( 1.3.6.1.4.1.10013.2.2.1.0.0
> NAME 'cPerson'
> SUP 'inetOrgPerson'
> STRUCTURAL
> MUST ( uid )
> MAY ( Application) )
>
>
>
> #slapd.conf
>
> include /etc/openldap/schema/core.schema
> include /etc/openldap/schema/cosine.schema
> include /etc/openldap/schema/nis.schema
> include /etc/openldap/schema/inetorgperson.schema
> include /etc/openldap/schema.perso/c.schema
>
> pidfile /var/run/slapd/slapd.pid
> argsfile /var/run/slapd/slapd.args
> loglevel 2
>
> allow bind_v2
>
> # The maximum number of entries that is returned for a search operation
> sizelimit 500000
>
> # The tool-threads parameter sets the actual amount of cpu's that is used
> # for indexing.
> tool-threads 1
>
>
> database bdb
>
> # The base of your directory in database #1
> suffix "ou=People,dc=c,dc=fr"
>
> # rootdn directive for specifying a superuser on the database. This is needed
> # for syncrepl.
> rootdn "cn=admin,ou=People,dc=c,dc=fr"
> rootpw "password"
>
>
> # Where the database file are physically stored for database #1
> directory "/var/lib/ldap-people"
>
> dbconfig set_cachesize 0 536870912 0
> dbconfig set_flags DB_LOG_AUTOREMOVE
> dbconfig set_lk_max_objects 1500
> dbconfig set_lk_max_locks 1500
> dbconfig set_lk_max_lockers 1500
>
>
> index objectClass eq,pres
> index ou,cn,mail,surname,givenname eq,pres,sub
> index uid eq,pres
> index Application eq,pres,sub
>
>
> overlay translucent
>
> # request that the results of the 2 directories be merged
> translucent_no_glue off
> translucent_strict off
>
> # list of attributes to look up on the overlay
> translucent_local Application
> # list of attributes to look up on the master
> translucent_remote sn,GivenName,mail,street,Postalcode,l,uid,facsimileTelephoneNumber
>
> # enable local bind
> translucent_bind_local on
>
> # enable the ability to change the password
> translucent_pwmod_local on
>
> uri ldap://ldapr.c.fr
> lastmod off
> acl-bind binddn="cn=admin,ou=People,dc=c,dc=fr" credentials="password"
>
> access to attrs=userPassword,shadowLastChange
> by dn="cn=admin,ou=People,dc=c,dc=fr" write
> by anonymous auth
> by self write
> by * none
>
> access to dn.base=""
> by * read
>
>
> Example user:
>
> dn: uid=w.k.1,ou=c,ou=People,dc=c,dc=fr
> displayName: K W
> givenName: W
> postalCode: 44095
> objectClass: cPerson
> uid: w.k.1
> mail: w.k@mail.fr
> cn: K W
> telephoneNumber: 06 06 06 06 06
> o: C
> l: MON
> sn: KNAP
> Application: contrat:ABC221:082534
10 years
(ITS#7671) LMDB cursor_next key
by hyc@OpenLDAP.org
Full_Name: Howard Chu
Version: 2.4.36
OS:
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (78.155.233.73)
Submitted by: hyc
When the next value to return is a duplicate value, cursor_get(MDB_NEXT_DUP)
skips returning the key, because it is assumed to already be known.
cursor_get(MDB_NEXT) also skips returning the key because it falls thru the same
code, but in this case we cannot assume the key is already known.
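A toy sketch of the intended behaviour, for illustration only: the toy_* names and the array-backed cursor are hypothetical and have nothing to do with LMDB's internals. The point is just the rule described above: a "next duplicate" step may leave the key alone because the caller already has it, but a plain "next" step has to report the key even when it happens to land on another duplicate.

```c
#include <assert.h>
#include <stddef.h>

enum toy_op { TOY_NEXT, TOY_NEXT_DUP };

#define TOY_NOTFOUND (-1)

/* One key with its sorted duplicate values (hypothetical layout). */
struct toy_item {
    int key;
    int nvals;
    const int *vals;
};

/* Cursor already positioned on items[item].vals[dup]. */
struct toy_cursor {
    const struct toy_item *items;
    int nitems;
    int item;
    int dup;
};

/*
 * Step the cursor forward.
 * TOY_NEXT_DUP stays within the current key and does not write *key.
 * TOY_NEXT may land on a duplicate or on a new key; it always writes *key,
 * because the caller cannot know which of the two happened.
 */
static int toy_cursor_get(struct toy_cursor *c, int *key, int *val,
                          enum toy_op op)
{
    assert(c != NULL && key != NULL && val != NULL);
    if (c->item >= c->nitems)
        return TOY_NOTFOUND;

    if (c->dup + 1 < c->items[c->item].nvals) {
        c->dup++;                       /* next duplicate of the same key */
    } else if (op == TOY_NEXT_DUP) {
        return TOY_NOTFOUND;            /* no more duplicates for this key */
    } else {
        if (c->item + 1 >= c->nitems)
            return TOY_NOTFOUND;        /* end of data */
        c->item++;
        c->dup = 0;
    }

    if (op == TOY_NEXT)
        *key = c->items[c->item].key;   /* the step this report is about */
    *val = c->items[c->item].vals[c->dup];
    return 0;
}
```

Dropping the `if (op == TOY_NEXT)` assignment reproduces the reported symptom in this model: a TOY_NEXT that lands on a duplicate would return a value with no key.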
10 years
(ITS#7670) mdb_cursor_del() behaviour improvement
by spam@markandruth.co.uk
Full_Name: Mark Zealey
Version: git
OS: linux
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (88.253.191.53)
As per email thread with Howard, mdb_cursor_del() has a somewhat strange
behaviour when you reach the end of a page. To my mind it should skip to the
next entry. This should also be documented as at the moment there is no sign in
the docs of what happens to the cursor after a delete.
Mark Zealey wrote:
> On 23/08/13 00:07, Howard Chu wrote:
>> ... as I said already, it does exactly what you said. When you've
>> deleted the last item on the page the cursor no longer points at a
>> valid node, so GET_CURRENT returns EINVAL.
>
> OK I think that's a pretty big gotcha - it would be great to either make
> the cursor behaviour consistent (ie automatically skip to next record at
> end of a page) or to very clearly document this! It's pretty different
> from kyoto/bdb interfaces in that (to my mind at least) it requires a
> bit too much knowledge from the developer about the underlying structure
> of the database which should be abstracted by the interface.
Yeah, this should probably be changed. Could you submit an ITS for this? Thanks.
10 years
Re: (ITS#7667) performance degradation when using MDB_INTEGERKEY
by romange@gmail.com
Just to give some numbers: if the keys are inserted in order, the insertion
of 166,500,000 items finishes in about 3 min.
When the order is random (keys are reversed), it inserted 34,000,000 items
in 548 min (an average of 967 usec per item, or roughly 1,000 ops/sec).
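As a rough cross-check of those figures (plain arithmetic on the numbers quoted above; the in-order time is only given as "about 3 min", so the second line is approximate):

```latex
\begin{align*}
t_{\text{random}}  &= \frac{548 \times 60\ \text{s}}{34{,}000{,}000}
                    \approx 967\ \mu\text{s per item}
                    &&\Rightarrow\ \approx 1{,}034\ \text{inserts/s} \\
t_{\text{ordered}} &\approx \frac{3 \times 60\ \text{s}}{166{,}500{,}000}
                    \approx 1.08\ \mu\text{s per item}
                    &&\Rightarrow\ \approx 925{,}000\ \text{inserts/s}
\end{align*}
```

So the reversed-key run is on the order of 900 times slower per insert.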
I wonder how other DBs (leveldb?) behave when the number of items grows beyond
several million.
On Wed, Aug 21, 2013 at 9:55 PM, Roman Gershman <romange(a)gmail.com> wrote:
> Thanks! I was aware of the little-endian transformation. I did not know that
> a change of insertion order affects the write performance of the database
> that much.
>
>
>
> On Wed, Aug 21, 2013 at 12:54 AM, Howard Chu <hyc(a)symas.com> wrote:
>
>> romange(a)gmail.com wrote:
>>
>>>
>>>
>>> Hi, I extracted a small dataset that shows the problem.
>>> you can download it from here:
>>> https://docs.google.com/file/d/0B6o29pwkWoERdnFSaUtMNDljemc/edit?usp=sharing
>>>
>>> I modified mdb_copy.c to demonstrate the difference. copy it to source
>>> dir
>>> from here
>>> https://docs.google.com/file/d/0B6o29pwkWoERd3VuUm1DN0FpcUU/edit?usp=sharing
>>>
>>> build and
>>> run "time ./mdb_copy foo foo2"
>>> after this change the flag at line 64 and run it again.
>>> at my computer the difference is 17s vs 1.7s for 3 million items.
>>>
>>
>> This test doesn't prove the existence of a bug. You're running on a
>> Little-Endian machine, therefore data that is in sorted order as a string
>> is in hashed order when used as an integer. Your data insert turns into a
>> worst-case insert order in this case, causing the worst possible random
>> access strides through memory. Assuming the two orders to be equivalent is
>> a pretty common mistake for DB programmers. Microsoft has done the same
>> thing in ActiveDirectory, I mentioned it here a few years ago
>> http://www.openldap.org/lists/openldap-devel/200711/msg00002.html
>>
>> If you had run this test on a Big-Endian machine, like SPARC, the insert
>> order would be identical either way, and INTEGERKEY result would have been
>> faster.
>>
>> Closing this ITS, no bug.
>>
>>
>>>
>>>
>>> On Tue, Aug 20, 2013 at 9:04 PM, Quanah Gibson-Mount <quanah(a)zimbra.com> wrote:
>>>
>>> --On Sunday, August 18, 2013 11:46 AM +0000 romange(a)gmail.com wrote:
>>>>
>>>> Full_Name: Roman Gershman
>>>>
>>>>> Version:
>>>>> OS: linux 3.8.0-25-generic
>>>>> URL:
>>>>> Submission from: (NULL) (212.150.97.210)
>>>>>
>>>>>
>>>> Please provide further information, specifically:
>>>>
>>>> The size of values
>>>> Insert order
>>>> Sample code if possible
>>>>
>>>> Thanks,
>>>> Quanah
>>>>
>>>>
>>>> --
>>>>
>>>> Quanah Gibson-Mount
>>>> Lead Engineer
>>>> Zimbra, Inc
>>>> --------------------
>>>> Zimbra :: the leader in open source messaging and collaboration
>>>>
>>>>
>>>
>>>
>>>
>>
>> --
>> -- Howard Chu
>> CTO, Symas Corp. http://www.symas.com
>> Director, Highland Sun http://highlandsun.com/hyc/
>> Chief Architect, OpenLDAP http://www.openldap.org/project/
>>
>
>
>
> --
> Best regards,
> Roman
>
--
Best regards,
Roman
10 years, 1 month
Re: (ITS#7667) performance degradation when using MDB_INTEGERKEY
by romange@gmail.com
Thanks! I was aware of the little-endian transformation. I did not know that
a change of insertion order affects the write performance of the database
that much.
On Wed, Aug 21, 2013 at 12:54 AM, Howard Chu <hyc(a)symas.com> wrote:
> romange(a)gmail.com wrote:
>
>>
>>
>> Hi, I extracted a small dataset that shows the problem.
>> you can download it from here:
>> https://docs.google.com/file/d/0B6o29pwkWoERdnFSaUtMNDljemc/edit?usp=sharing
>>
>> I modified mdb_copy.c to demonstrate the difference. copy it to source dir
>> from here
>> https://docs.google.com/file/d/0B6o29pwkWoERd3VuUm1DN0FpcUU/edit?usp=sharing
>>
>> build and
>> run "time ./mdb_copy foo foo2"
>> after this change the flag at line 64 and run it again.
>> at my computer the difference is 17s vs 1.7s for 3 million items.
>>
>
> This test doesn't prove the existence of a bug. You're running on a
> Little-Endian machine, therefore data that is in sorted order as a string
> is in hashed order when used as an integer. Your data insert turns into a
> worst-case insert order in this case, causing the worst possible random
> access strides through memory. Assuming the two orders to be equivalent is
> a pretty common mistake for DB programmers. Microsoft has done the same
> thing in ActiveDirectory, I mentioned it here a few years ago
> http://www.openldap.org/lists/openldap-devel/200711/msg00002.html
>
> If you had run this test on a Big-Endian machine, like SPARC, the insert
> order would be identical either way, and INTEGERKEY result would have been
> faster.
>
> Closing this ITS, no bug.
>
>
>>
>>
>> On Tue, Aug 20, 2013 at 9:04 PM, Quanah Gibson-Mount <quanah(a)zimbra.com> wrote:
>>
>> --On Sunday, August 18, 2013 11:46 AM +0000 romange(a)gmail.com wrote:
>>>
>>> Full_Name: Roman Gershman
>>>
>>>> Version:
>>>> OS: linux 3.8.0-25-generic
>>>> URL:
>>>> Submission from: (NULL) (212.150.97.210)
>>>>
>>>>
>>> Please provide further information, specifically:
>>>
>>> The size of values
>>> Insert order
>>> Sample code if possible
>>>
>>> Thanks,
>>> Quanah
>>>
>>>
>>> --
>>>
>>> Quanah Gibson-Mount
>>> Lead Engineer
>>> Zimbra, Inc
>>> --------------------
>>> Zimbra :: the leader in open source messaging and collaboration
>>>
>>>
>>
>>
>>
>
> --
> -- Howard Chu
> CTO, Symas Corp. http://www.symas.com
> Director, Highland Sun http://highlandsun.com/hyc/
> Chief Architect, OpenLDAP http://www.openldap.org/project/
>
--
Best regards,
Roman
10 years, 1 month
Re: (ITS#7667) performance degradation when using MDB_INTEGERKEY
by hyc@symas.com
romange(a)gmail.com wrote:
>
> Hi, I extracted a small dataset that shows the problem.
> you can download it from here:
> https://docs.google.com/file/d/0B6o29pwkWoERdnFSaUtMNDljemc/edit?usp=sharing
>
> I modified mdb_copy.c to demonstrate the difference. copy it to source dir
> from here
> https://docs.google.com/file/d/0B6o29pwkWoERd3VuUm1DN0FpcUU/edit?usp=sharing
>
> build and
> run "time ./mdb_copy foo foo2"
> after this change the flag at line 64 and run it again.
> at my computer the difference is 17s vs 1.7s for 3 million items.
This test doesn't prove the existence of a bug. You're running on a
Little-Endian machine, so data that is in sorted order as a byte string is
in effectively scrambled order when compared as a native integer. Your data
insert turns into a worst-case insert order in this case, causing the worst
possible random access strides through memory. Assuming the two orders are
equivalent is a pretty common mistake for DB programmers. Microsoft has done
the same thing in ActiveDirectory; I mentioned it here a few years ago:
http://www.openldap.org/lists/openldap-devel/200711/msg00002.html
If you had run this test on a Big-Endian machine, like SPARC, the insert order
would be identical either way, and the INTEGERKEY result would have been faster.
Closing this ITS, no bug.
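The byte-order effect can be seen with a few lines of standalone C. This is a sketch, not code from the test program in this ITS: the helper names are made up, and it assumes 4-byte keys whose bytes were written most-significant first (so that byte-string order matches numeric order). It computes explicitly what a little-endian and a big-endian CPU would each see when loading those same bytes as a native integer, which is how MDB_INTEGERKEY compares keys.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store v most-significant byte first, so memcmp() order == numeric order. */
static void put_be32(uint32_t v, unsigned char out[4])
{
    assert(out != NULL);
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* The value a little-endian CPU gets when it loads these 4 bytes natively. */
static uint32_t load_le32(const unsigned char in[4])
{
    return (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}

/* The value a big-endian CPU gets when it loads the same 4 bytes natively. */
static uint32_t load_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* 1 if a sorts before b under a plain byte-wise (string) comparison. */
static int bytes_less(const unsigned char a[4], const unsigned char b[4])
{
    return memcmp(a, b, 4) < 0;
}
```

For the keys 1, 2 and 256 stored this way, the byte-string order is 1 < 2 < 256 and load_be32() agrees. load_le32() instead yields 0x01000000, 0x02000000 and 0x00010000, so 256 now sorts before 1, and neighbouring keys such as 1 and 2 end up 2^24 apart in the integer ordering. Feeding keys to an integer-keyed tree in string order on a little-endian machine therefore jumps around the key space instead of appending at the end.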
>
>
>
> On Tue, Aug 20, 2013 at 9:04 PM, Quanah Gibson-Mount <quanah(a)zimbra.com>wrote:
>
>> --On Sunday, August 18, 2013 11:46 AM +0000 romange(a)gmail.com wrote:
>>
>> Full_Name: Roman Gershman
>>> Version:
>>> OS: linux 3.8.0-25-generic
>>> URL:
>>> Submission from: (NULL) (212.150.97.210)
>>>
>>
>> Please provide further information, specifically:
>>
>> The size of values
>> Insert order
>> Sample code if possible
>>
>> Thanks,
>> Quanah
>>
>>
>> --
>>
>> Quanah Gibson-Mount
>> Lead Engineer
>> Zimbra, Inc
>> --------------------
>> Zimbra :: the leader in open source messaging and collaboration
>>
>
>
>
10 years, 1 month