On 10/03/2014 10:19 PM, Howard Chu wrote:
> jcd(a)tribudubois.net wrote:
>> Full_Name: Jean-Christophe Dubois
>> Version: 2.4.40
>> OS: Linux
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (78.235.240.156)
>>
>>
>> In mdb_node_move(), csrc is passed to mdb_cassert() at line 7396, when
>> it seems it should be cdst (as the operation is on cdst).
>>
>> https://gitorious.org/mdb/mdb/source/56c2c160be19c555e4c42e459c8608ffaae7b150:libraries/liblmdb/mdb.c#L7396
>>
>
> Irrelevant. The cursor is only passed to provide an env pointer, and
> both cursors point to the same env. Closing this ITS.
Right. It would just be nicer and more logical, as it is not clear
beforehand what the pointer is passed for.
And who knows what additional things could be done in mdb_cassert() in the
future.
JC
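
A minimal sketch of why either cursor works for the assertion. The struct, function, and macro names below are hypothetical stand-ins, not LMDB's real definitions; the only point illustrated is that the check reaches the environment through whichever cursor it is given.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not LMDB's real structures. */
struct env    { int failed_checks; };
struct txn    { struct env *env; };
struct cursor { struct txn *txn; };

/* The cursor is only a route to the environment. */
static struct env *cursor_env(const struct cursor *mc)
{
    assert(mc != NULL && mc->txn != NULL);
    return mc->txn->env;
}

/* Count a failed check against the environment reached through mc. */
#define cursor_check(mc, expr) \
    ((expr) ? (void)0 : (void)(cursor_env(mc)->failed_checks++))
```

Two cursors opened on the same transaction resolve to the same environment, so passing csrc or cdst changes readability only, not behaviour.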
>>
>> Patch available at URL below:
>>
>> https://github.com/jcdubois/lmdb/commit/41ed03c4584ac46dc233dcf60f93addb09962093
>>
>>
>> JC
>>
>>
>
>
2014-10-03 3:13 GMT+04:00 Howard Chu <hyc(a)symas.com>:
>> commit fc409d89e0d9dde20f612e34c2a463c8a81ea000
>> Author: Leo Yuriev <leo(a)yuriev.ru>
>> Date: 2014-09-20 06:51:04 +0400
>>
>> EXTENSION - lmdb: more useful info from mdb_stat tool.
>
>
> A bit ambiguous. me_tail_txnid is actually the ID of the oldest reader, not
> the "last" reader. I'm not convinced of the value of this patch, since you
> can already view the readers list.
I agree that "tail" is NOT the best choice.
But the main value of this patch is not to show the txn of the oldest
reader, but to show information about page usage:
in particular, the number of pages that are "blocked" by the oldest (laggard)
reader, and how many pages are actually available.
2014-10-04 0:04 GMT+04:00 Леонид Юрьев <leo(a)yuriev.ru>:
Fwd: (ITS#7841) high disk utilization
2014-10-03 3:13 GMT+04:00 Howard Chu <hyc(a)symas.com>:
>> commit 841059330fd44769e93eb4b937c3ce42654fad6f
>> Author: Leo Yuriev <leo(a)yuriev.ru>
>> Date: 2014-09-20 07:16:15 +0400
>>
>> BUGFIX - lmdb: lock meta-pages in writemap-mode to avoid unexpected
>> write,
>> before the data pages would be synchronized.
>>
>> Without locking the meta-pages may be written by OS before other
>> data,
>> in this case database would be inconsistent.
>
>
> Seems unnecessary. Won't happen by default; could happen with MDB_NOSYNC but
> that risk is already documented.
We are using the combination:
envflags writemap nosync lifo
checkpoint 0 1
With the checkpoint interval given in seconds, this assures us of a
consistent database state on disk.
However, without this patch the meta-pages can be written by the kernel
before the data.
In fact, for a full guarantee in case the slapd process dies,
the meta-page should be written explicitly.
But that requires a lot of changes, so I have not done it.
>> commit 0c168d0e63ed78d13df3fc8a42f3667335678639
>> Author: Leo Yuriev <leo(a)yuriev.ru>
>> Date: 2014-09-20 10:13:28 +0400
>>
>> FEATURE - lmdb: MDB_LIFORECLAIM & MDB_COALESCE modes.
>>
>> Reclaim FreeDB in LIFO order - this is a main feature.
>> Also aim to coalesce small FreeDB records.
>
> Will spend more time looking at this closer.
I would suggest, but do not insist on, reviewing this patch on GitHub.
>> commit 8ddd63161aeb2689822d1a8d27385d62e4e341ae
>> Author: Leo Yuriev <leo(a)yuriev.ru>
>> Date: 2014-09-19 22:47:19 +0400
>>
>> BUGFIX - lmdb: properly sync meta-pages in mdb_sync_env().
>>
>> Meta-pages may be updated during data-syncing in mdb_sync_env(),
>> in this case database would be inconsistent.
>>
>> Check-and-retry if lead txn-id changed during flushing data in
>> mdb_sync_env().
>
> Probably could simplify this, just obtain the write mutex unconditionally,
> then there's no need to loop or retry. But also, this depends on MDB_NOLOCK
> - if that's set, then do no locking at all.
I did it this way for performance and to reduce the lock hold time.
Retries happen only when there is an intensive flow of changes.
In that case there will be a lot of updated pages, and writing them out will
take some time.
However, in subsequent iterations (if transactions were committed
while the write-out was in progress),
there will be far fewer modified pages, and the sync will be quick.
Thus (and this was seen in tests), even with a substantial number of
transactions,
there are usually only two iterations of the loop,
no lock is taken, and the flow of changes is not suspended.
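
The check-and-retry idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the patch: the callback struct, the function names, and the fake environment are all hypothetical, and the sketch ignores the window between the final check and the meta flush.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callbacks standing in for the real environment. */
struct sync_ops {
    unsigned long (*current_txnid)(void *ctx);          /* id of the latest commit */
    int (*flush_data)(void *ctx);                       /* write out dirty data pages */
    int (*flush_meta)(void *ctx, unsigned long txnid);  /* then write out the meta page */
};

/* Flush data pages without holding the write lock, and repeat the flush
 * if a commit landed while it was running. Returns 0 on success or the
 * first nonzero callback result. */
static int sync_with_retry(const struct sync_ops *ops, void *ctx, int *iterations)
{
    unsigned long before, after;
    int rc;

    assert(ops != NULL && iterations != NULL);
    *iterations = 0;
    do {
        before = ops->current_txnid(ctx);
        rc = ops->flush_data(ctx);
        if (rc != 0)
            return rc;
        after = ops->current_txnid(ctx);
        ++*iterations;
    } while (after != before);
    return ops->flush_meta(ctx, after);
}

/* Fake environment for illustration: one commit lands during the first flush. */
struct fake_env { unsigned long txnid; int flushes; unsigned long meta_synced; };

static unsigned long fake_txnid(void *ctx)
{
    return ((struct fake_env *)ctx)->txnid;
}

static int fake_flush_data(void *ctx)
{
    struct fake_env *e = ctx;
    if (e->flushes++ == 0)
        e->txnid++;             /* simulate a concurrent commit */
    return 0;
}

static int fake_flush_meta(void *ctx, unsigned long txnid)
{
    ((struct fake_env *)ctx)->meta_synced = txnid;
    return 0;
}
```

With the fake environment, the loop runs twice: the first pass sees the txn id change and retries, and the second pass finds it stable and proceeds to the meta flush. Taking the write mutex unconditionally, as suggested in the review, would remove the loop at the cost of blocking writers for the duration of the flush.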
>> commit 147f41a8110f28456bc32123bde86d47183f9c0a
>> Author: Leo Yuriev <leo(a)yuriev.ru>
>> Date: 2014-09-04 16:01:15 +0400
>>
>> FEATURE - lmdb: implementation of "checkpoint kbytes".
>>
>> Force flush when volume of the changes reached a configurable
>> threshold.
>
>
> Probably OK. Needs some typographical cleanup. Not sure "syncbytes" is a
> good name.
Agreed.
I just took the first choice and tried to retain the existing style.
Ideas?
>> commit fb82a0b688f4c31313d0790415feda8aaa18651c
>> Author: Leo Yuriev <leo(a)yuriev.ru>
>> Date: 2014-09-04 15:18:16 +0400
>>
>> CHANGE - lmdb-backend: checkpoint-interval in seconds instead of
>> minutes.
>
>
> Gratuitous change. We used minutes since the BDB backend uses minutes, and
> the intention was to maintain parallel functionality. What's the
> justification for this change?
As I wrote above, we are using the combination:
envflags writemap nosync lifo
checkpoint 0 1
If the interval is specified in minutes, it cannot be set to less
than one minute.
But that is too long a window in which updates could be lost.
With a synchronization interval of one second, however,
we reduce the amount of data lost in the event of a crash to an
acceptable level,
while the load on the storage system remains acceptable even for a large
flow of updates.
As a result, I have not found a better solution than simply replacing
the minutes with seconds.
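
A minimal sketch of how the two "checkpoint" arguments could be evaluated, assuming that either limit on its own triggers a flush and that a value of 0 disables that limit. The struct and function names are hypothetical and are not taken from back-mdb or from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical configuration mirroring "checkpoint <kbytes> <seconds>". */
struct checkpoint_cfg {
    size_t   kbytes;   /* flush after this much changed data; 0 = ignored (assumption) */
    unsigned seconds;  /* flush after this much time; 0 = ignored (assumption) */
};

/* Return 1 if a flush is due, 0 otherwise. */
static int checkpoint_due(const struct checkpoint_cfg *cfg,
                          size_t dirty_bytes, unsigned elapsed_seconds)
{
    assert(cfg != NULL);
    if (cfg->kbytes != 0 && dirty_bytes / 1024 >= cfg->kbytes)
        return 1;
    if (cfg->seconds != 0 && elapsed_seconds >= cfg->seconds)
        return 1;
    return 0;
}
```

Under these assumptions, "checkpoint 0 1" means the volume limit is ignored and a flush becomes due once one second has elapsed, which is the setting discussed above.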
>> commit fc409d89e0d9dde20f612e34c2a463c8a81ea000
>> Author: Leo Yuriev <leo(a)yuriev.ru>
>> Date: 2014-09-20 06:51:04 +0400
>>
>> EXTENSION - lmdb: more useful info from mdb_stat tool.
>
>
> A bit ambiguous. me_tail_txnid is actually the ID of the oldest reader, not
> the "last" reader. I'm not convinced of the value of this patch, since you
> can already view the readers list.
I agree that "tail" is not the best choice.
But the main value of this patch is not to show the txn of the oldest
reader, but to show information about page usage:
in particular, the number of pages that are "blocked" by the oldest (laggard)
reader, and how many pages are actually available.
> --
> -- Howard Chu
> CTO, Symas Corp. http://www.symas.com
> Director, Highland Sun http://highlandsun.com/hyc/
> Chief Architect, OpenLDAP http://www.openldap.org/project/
Thank you in advance.
BR.
Leonid Yuriev.
Full_Name: Jean-Christophe Dubois
Version: 2.4.40
OS: Linux
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (78.235.240.156)
In mdb_env_copyfd0(), fstat() is called but its return value is not
checked.
There is no reason not to check whether the system call succeeded, as the result
is used immediately afterwards.
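
A minimal sketch of the kind of check being asked for. The helper name and its error convention are hypothetical and this is not the code from the patch; only fstat() itself is the standard POSIX call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: store the size of the file behind fd in *size.
 * Returns 0 on success or the errno value reported by fstat(). */
static int file_size(int fd, off_t *size)
{
    struct stat st;

    assert(size != NULL);
    if (fstat(fd, &st) != 0)
        return errno;       /* st must not be used after a failure */
    *size = st.st_size;
    return 0;
}
```

Calling file_size() with an invalid descriptor returns EBADF instead of passing an uninitialized struct stat on to the caller.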
Patch available at URL below:
https://github.com/jcdubois/lmdb/commit/fda581a6dd2e56fac4cab2aa872753f6f…
JC
Full_Name: Leonid Yuriev
Version: 2.4.40
OS: RHEL7
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (31.130.36.33)
Solution for: ITS#7841 and "OpenLDAP + LMDB Back-End - request 300719-14-EXO"
When using LMDB as a backend under heavy load with add/modify/delete
transactions, a huge number of disk writes is generated.
In general, this patchset improves write performance by a factor of 10-100, at the
cost of the on-disk state lagging behind by up to one second.
1. Adds a configurable LIFO policy for reclaiming FreeDB records.
Thus, only a small subset of pages is updated and rewritten on disk
repeatedly. This allows the storage subsystem to combine such disk writes effectively.
As a result, write performance grows by up to 100 times with a write-back cache
or in "writemap" mode.
2. Checkpoints with consistency and one-second granularity.
The following settings, for example, are possible and very useful:
envflags writemap nosync lifo
checkpoint 0 1
3. Related bugfixes and minor extensions.
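
The LIFO policy in item 1 can be sketched as follows. This is a simplified illustration, not the patch's code: it assumes free-list records are kept in ascending txn-id order and that a record may be reused only when its id is lower than that of the oldest active reader. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Pick the index of the free-list record to reuse next, or -1 if none
 * is reclaimable. rec_txnid[] must be sorted in ascending order. */
static int pick_record(const unsigned long *rec_txnid, int n,
                       unsigned long oldest_reader, int lifo)
{
    int i, newest = -1;

    assert(rec_txnid != NULL || n == 0);
    /* Records with an id below the oldest reader are reclaimable. */
    for (i = 0; i < n && rec_txnid[i] < oldest_reader; i++)
        newest = i;
    if (newest < 0)
        return -1;
    /* FIFO reuses the oldest reclaimable record, LIFO the newest one. */
    return lifo ? newest : 0;
}
```

Reusing the newest reclaimable record keeps recent writes concentrated on the same few pages, which is what lets a write-back cache or the kernel merge them.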
--
The attached files are derived from OpenLDAP Software. All of the modifications
to OpenLDAP Software represented in the following patch(es) were developed by
Peter-Service LLC, Moscow, Russia. Peter-Service LLC has not assigned rights
and/or interest in this work to any party. I, Leonid Yuriev am authorized by
Peter-Service LLC, my employer, to release this work under the following terms.
Peter-Service LLC hereby places the following modifications to OpenLDAP Software
(and only these modifications) into the public domain. Hence, these
modifications may be freely used and/or redistributed for any purpose with or
without attribution and/or other notice.
commit 841059330fd44769e93eb4b937c3ce42654fad6f
Author: Leo Yuriev <leo(a)yuriev.ru>
Date: 2014-09-20 07:16:15 +0400
BUGFIX - lmdb: lock meta-pages in writemap-mode to avoid unexpected write,
before the data pages would be synchronized.
Without locking the meta-pages may be written by OS before other data,
in this case database would be inconsistent.
commit 6240c3350e8bd86337c7e41722cf6a38881f15e7
Author: Leo Yuriev <leo(a)yuriev.ru>
Date: 2014-09-12 01:32:13 +0400
BUGFIX - lmdb: reordering of instructions which update the txn in a
meta-page.
Without "volatile" or memory-barrier compiler may reorder instructions
for update the "mm_txnid" field in meta-page in "writemap" mode.
From the reader's point of view this cause a short
time interval when the transaction is corrupted.
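
A minimal sketch of the ordering problem this commit message describes. It uses C11 atomics with release/acquire ordering rather than the "volatile" approach named in the commit, and the struct is a hypothetical simplification, not LMDB's meta-page layout. It only illustrates publishing the txn id last within a single update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified meta record. */
struct meta {
    size_t root_page;
    size_t last_page;
    _Atomic size_t txnid;   /* published last */
};

/* Writer: fill in the payload first, then publish the txn id. */
static void meta_publish(struct meta *m, size_t root, size_t last, size_t txnid)
{
    assert(m != NULL);
    m->root_page = root;
    m->last_page = last;
    /* Release store: the two writes above cannot be reordered after it. */
    atomic_store_explicit(&m->txnid, txnid, memory_order_release);
}

/* Reader: load the txn id first, then the payload it guards. */
static size_t meta_read(struct meta *m, size_t *root, size_t *last)
{
    size_t id;

    assert(m != NULL && root != NULL && last != NULL);
    /* Acquire load: pairs with the release store in meta_publish(). */
    id = atomic_load_explicit(&m->txnid, memory_order_acquire);
    *root = m->root_page;
    *last = m->last_page;
    return id;
}
```

A reader that observes the new txn id through meta_read() also observes the payload written before it, which closes the short window of apparent corruption mentioned above.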
commit accef62de7fe5660f870f4c5da319a2a8098b2fb
Author: Leo Yuriev <leo(a)yuriev.ru>
Date: 2014-09-21 02:29:50 +0400
BUGFIX - lmdb: 'volatile' to important fields, which
may be updated by readers asynchronously.
Without 'volatile' compiler may eliminate a mdb_find_oldest() calls.
commit bb83e03cf1b8bceee64550229c3becbdd5400680
Author: Leo Yuriev <leo(a)yuriev.ru>
Date: 2014-09-19 20:18:17 +0400
FEATURE - lmdb-backend: support config for 'lifo' and 'coalesce' envflags.
commit 0c168d0e63ed78d13df3fc8a42f3667335678639
Author: Leo Yuriev <leo(a)yuriev.ru>
Date: 2014-09-20 10:13:28 +0400
FEATURE - lmdb: MDB_LIFORECLAIM & MDB_COALESCE modes.
Reclaim FreeDB in LIFO order - this is a main feature.
Also aim to coalesce small FreeDB records.
commit 8ddd63161aeb2689822d1a8d27385d62e4e341ae
Author: Leo Yuriev <leo(a)yuriev.ru>
Date: 2014-09-19 22:47:19 +0400
BUGFIX - lmdb: properly sync meta-pages in mdb_sync_env().
Meta-pages may be updated during data-syncing in mdb_sync_env(),
in this case database would be inconsistent.
Check-and-retry if lead txn-id changed during flushing data in
mdb_sync_env().
commit 908677f989588d06b9f00620576dea3c5c8675d7
Author: Leo Yuriev <leo(a)yuriev.ru>
Date: 2014-09-04 16:10:05 +0400
FEATURE - lmdb-backend: support for "checkpoint kbytes" config-option.
commit 147f41a8110f28456bc32123bde86d47183f9c0a
Author: Leo Yuriev <leo(a)yuriev.ru>
Date: 2014-09-04 16:01:15 +0400
FEATURE - lmdb: implementation of "checkpoint kbytes".
Force flush when volume of the changes reached a configurable
threshold.
commit fb82a0b688f4c31313d0790415feda8aaa18651c
Author: Leo Yuriev <leo(a)yuriev.ru>
Date: 2014-09-04 15:18:16 +0400
CHANGE - lmdb-backend: checkpoint-interval in seconds instead of minutes.
commit fc409d89e0d9dde20f612e34c2a463c8a81ea000
Author: Leo Yuriev <leo(a)yuriev.ru>
Date: 2014-09-20 06:51:04 +0400
EXTENSION - lmdb: more useful info from mdb_stat tool.
commit ccc7da690ffbff440643295b945fdf7886f48c97
Author: Leo Yuriev <leo(a)yuriev.ru>
Date: 2014-09-05 00:19:16 +0400
TRIVIA - lmdb: clean testdb-dir while "make test".
--On Friday, October 03, 2014 2:45 AM +0000 engin.lee(a)gmail.com wrote:
> Full_Name: Engin Lee
> Version: LMDB 0.9.14 2014/9/30
> OS: Linux
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (59.124.230.221)
Likely <http://www.openldap.org/its/index.cgi/?findid=7956> ? This was
just fixed in mdb.master
--Quanah
--
Quanah Gibson-Mount
Server Architect
Zimbra, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration