Fwd: (ITS#7841) high disk utilization
2014-10-03 3:13 GMT+04:00 Howard Chu hyc@symas.com:
commit 841059330fd44769e93eb4b937c3ce42654fad6f
Author: Leo Yuriev leo@yuriev.ru
Date: 2014-09-20 07:16:15 +0400
BUGFIX - lmdb: lock meta-pages in writemap mode to avoid an unexpected write before the data pages are synchronized.
Without locking, the meta-pages may be written by the OS before other data; in that case the database would be inconsistent.
Seems unnecessary. Won't happen by default; could happen with MDB_NOSYNC but that risk is already documented.
We are using the combination:

    envflags writemap nosync lifo
    checkpoint 0 1

With the checkpoint interval set in seconds, this gives us assurance that the database on disk is in a consistent state. However, without this patch, the kernel can write the meta-pages before the data.
In fact, for a full guarantee in case the slapd process dies, the meta-page should be written explicitly. But that requires a lot of changes, and I have not done it.
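The ordering problem behind this commit can be illustrated outside LMDB. This is a minimal sketch that assumes plain pwrite() and fdatasync() on a file descriptor instead of LMDB's writemap mapping. The data pages are written and flushed first, and only then is the meta page written and flushed. That way a durable meta page should never refer to data that is not yet on disk. The names commit_ordered(), write_all_at() and commit_ordered_selftest() and the page layout are hypothetical. This is not LMDB's commit code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf at offset off, retrying on short
 * writes and EINTR. Returns 0 on success, -1 on error. */
static int write_all_at(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}

/* Hypothetical ordered commit: data pages first, then a flush barrier,
 * then the meta page and a second flush. Returns 0 on success. */
int commit_ordered(int fd,
                   const void *data, size_t data_len, off_t data_off,
                   const void *meta, size_t meta_len, off_t meta_off)
{
    /* The meta page must not overlap the data it describes. */
    assert(data_off >= meta_off + (off_t)meta_len ||
           meta_off >= data_off + (off_t)data_len);
    if (write_all_at(fd, data, data_len, data_off) != 0)
        return -1;
    /* Barrier: data must be durable before any meta page refers to it. */
    if (fdatasync(fd) != 0)
        return -1;
    if (write_all_at(fd, meta, meta_len, meta_off) != 0)
        return -1;
    return fdatasync(fd) == 0 ? 0 : -1;
}

/* Usage example: commit a 64-byte "data page" and a 16-byte "meta page"
 * to an unlinked temporary file, then read the meta page back. */
int commit_ordered_selftest(void)
{
    char path[] = "/tmp/commit_ordered_XXXXXX";
    char data[64], meta[16], back[16];
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file stays usable until fd is closed */
    memset(data, 'D', sizeof data);
    memset(meta, 'M', sizeof meta);
    int rc = commit_ordered(fd, data, sizeof data, 16, meta, sizeof meta, 0);
    if (rc == 0 && pread(fd, back, sizeof back, 0) != (ssize_t)sizeof back)
        rc = -1;
    if (rc == 0 && memcmp(back, meta, sizeof meta) != 0)
        rc = -1;
    close(fd);
    return rc;
}
```

The price of this design is a second flush per commit. With a writable mapping, the kernel may write back dirty pages in any order, so an explicit barrier between data and meta is what turns "usually consistent" into a hard guarantee.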
commit 0c168d0e63ed78d13df3fc8a42f3667335678639
Author: Leo Yuriev leo@yuriev.ru
Date: 2014-09-20 10:13:28 +0400
FEATURE - lmdb: MDB_LIFORECLAIM & MDB_COALESCE modes. Reclaim FreeDB in LIFO order; this is the main feature. Also aim to coalesce small FreeDB records.
Will spend more time looking at this closely.
I would suggest, but do not insist on, reviewing this patch on GitHub.
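To make the LIFO/FIFO distinction concrete, here is a simplified model, not LMDB's FreeDB code. Each freelist record holds the pages freed by one transaction. As an assumption of this sketch, a record may be reused only if its txnid is older than the oldest active reader. Given the same records, FIFO order takes the oldest reusable record, while LIFO takes the newest one that is still below the reader horizon. The names struct free_rec, pick_fifo() and pick_lifo() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t txnid_t;

/* Hypothetical, simplified FreeDB record: pages freed by one transaction. */
struct free_rec {
    txnid_t txnid;
    size_t npages;
};

/* Modelling assumption: a record is reusable only if it is older than
 * the snapshot of the oldest active reader. */
static int reusable(const struct free_rec *r, txnid_t oldest_reader)
{
    return r->txnid < oldest_reader;
}

/* Both pickers expect recs sorted by strictly ascending txnid. */
static void check_sorted(const struct free_rec *recs, size_t n)
{
    for (size_t i = 1; i < n; i++)
        assert(recs[i - 1].txnid < recs[i].txnid);
}

/* FIFO: reclaim the oldest record, if the reader horizon allows it.
 * Returns its index, or -1 if nothing is reusable. */
long pick_fifo(const struct free_rec *recs, size_t n, txnid_t oldest_reader)
{
    check_sorted(recs, n);
    return (n > 0 && reusable(&recs[0], oldest_reader)) ? 0 : -1;
}

/* LIFO: reclaim the newest record that is still below the reader horizon.
 * Returns its index, or -1 if nothing is reusable. */
long pick_lifo(const struct free_rec *recs, size_t n, txnid_t oldest_reader)
{
    check_sorted(recs, n);
    for (size_t i = n; i-- > 0;)
        if (reusable(&recs[i], oldest_reader))
            return (long)i;
    return -1;
}
```

Consider records for txns 10, 12, 15 and 20 with the oldest reader at txn 18. FIFO picks txn 10 and LIFO picks txn 15, and neither may touch txn 20.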
commit 8ddd63161aeb2689822d1a8d27385d62e4e341ae
Author: Leo Yuriev leo@yuriev.ru
Date: 2014-09-19 22:47:19 +0400
BUGFIX - lmdb: properly sync meta-pages in mdb_sync_env(). Meta-pages may be updated while data is being synced in mdb_sync_env(); in that case the database would be inconsistent. Check and retry if the lead txn-id changed while flushing data in mdb_sync_env().
Probably could simplify this: just obtain the write mutex unconditionally, and then there's no need to loop or retry. But this also depends on MDB_NOLOCK: if that's set, then do no locking at all.
I did it this way for performance and to reduce lock hold time.
Retries happen only when there is an intensive flow of changes. In that case there will be a lot of updated pages, and writing them out will take some time.
However, on subsequent iterations (if transactions were committed while the write was in progress), there will be far fewer modified pages, and the sync will be quick.
Thus (and this was seen in tests), even with a substantial volume of transactions, usually only two iterations of the loop are needed, no lock is taken, and the flow of changes is not suspended.
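The check-and-retry loop described above can be modelled as follows. This is a sketch under simplifying assumptions, not the patch itself. The names struct env, sync_with_retry(), demo_sync() and the flush_data/flush_meta hooks are hypothetical, and the model does not cover a commit that lands between the final txnid check and the meta write. Each pass snapshots the head txnid and flushes data without holding the write lock. It writes the meta page only if no transaction committed during the flush, and otherwise goes around again.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef uint64_t txnid_t;

/* Hypothetical environment: only the fields the retry loop needs. */
struct env {
    _Atomic txnid_t head_txnid; /* latest committed txn, bumped by writers */
    txnid_t synced_txnid;       /* txn whose meta page was last made durable */
    int (*flush_data)(struct env *);                /* e.g. msync()/fdatasync() */
    int (*flush_meta)(struct env *, txnid_t txnid); /* write + flush meta page */
};

/* Flush data without taking the write lock; write the meta page only if no
 * txn committed during the flush, otherwise retry. Returns the number of
 * passes used, or -1 on error or after max_passes unsuccessful passes. */
int sync_with_retry(struct env *e, int max_passes)
{
    assert(max_passes > 0);
    for (int pass = 1; pass <= max_passes; pass++) {
        txnid_t before = atomic_load(&e->head_txnid);
        if (e->flush_data(e) != 0)
            return -1;
        if (atomic_load(&e->head_txnid) == before) {
            /* No commit raced with this flush: all data of `before` is on disk. */
            if (e->flush_meta(e, before) != 0)
                return -1;
            e->synced_txnid = before;
            return pass;
        }
        /* A txn committed meanwhile and its pages may be half-flushed. The
         * next pass should be quicker because fewer pages are dirty. */
    }
    return -1; /* a caller might fall back to taking the write mutex here */
}

/* Usage example: simulate some commits landing during the first flush only. */
static int racing_commits;

static int demo_flush_data(struct env *e)
{
    if (racing_commits > 0) {
        atomic_fetch_add(&e->head_txnid, (txnid_t)racing_commits);
        racing_commits = 0;
    }
    return 0;
}

static int demo_flush_meta(struct env *e, txnid_t txnid)
{
    (void)e;
    (void)txnid;
    return 0; /* a real version would write and flush the meta page */
}

int demo_sync(txnid_t head, int commits_during_first_flush, txnid_t *synced)
{
    struct env e = {
        .head_txnid = head,
        .synced_txnid = 0,
        .flush_data = demo_flush_data,
        .flush_meta = demo_flush_meta,
    };
    racing_commits = commits_during_first_flush;
    int passes = sync_with_retry(&e, 4);
    *synced = e.synced_txnid;
    return passes;
}
```

In demo_sync(100, 3, ...), three commits race with the first flush, so the second pass succeeds and the meta page is written for txn 103. Howard's alternative is to hold the write mutex for the whole sync, skipping locking under MDB_NOLOCK. That removes the loop at the cost of suspending writers during the flush.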
commit 147f41a8110f28456bc32123bde86d47183f9c0a
Author: Leo Yuriev leo@yuriev.ru
Date: 2014-09-04 16:01:15 +0400
FEATURE - lmdb: implementation of "checkpoint kbytes". Force a flush when the volume of changes reaches a configurable threshold.
Probably OK. Needs some typographical cleanup. Not sure "syncbytes" is a good name.
Agreed. I just took the first name that came to mind and tried to keep to the existing style. Ideas?
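Here is one way a "checkpoint kbytes" trigger could work. It counts the bytes dirtied by each commit and requests a flush once the running total reaches the threshold. This is a hedged sketch, not the patch. The names struct sync_budget, budget_from_kbytes() and budget_note_commit() are hypothetical, and treating a zero threshold as "disabled" is an assumption of this sketch. Under that assumption, the "checkpoint 0 1" setting quoted in this thread would leave only the time trigger active.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size-based checkpoint trigger ("checkpoint kbytes"). */
struct sync_budget {
    uint64_t threshold_bytes; /* assumption: 0 disables the size trigger */
    uint64_t pending_bytes;   /* bytes dirtied since the last flush */
};

/* Build a budget from a threshold given in kilobytes. */
struct sync_budget budget_from_kbytes(uint64_t kbytes)
{
    struct sync_budget b = { kbytes * 1024u, 0 };
    return b;
}

/* Account for a commit that dirtied `bytes`. Returns 1 if the caller should
 * flush now (the counter is reset on the assumption that it will), else 0. */
int budget_note_commit(struct sync_budget *b, uint64_t bytes)
{
    assert(b != NULL);
    b->pending_bytes += bytes;
    if (b->threshold_bytes != 0 && b->pending_bytes >= b->threshold_bytes) {
        b->pending_bytes = 0;
        return 1;
    }
    return 0;
}
```

With a 4 KiB threshold, commits of 3000 and then 1096 bytes reach exactly 4096 bytes. The second call therefore requests a flush and resets the counter.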
commit fb82a0b688f4c31313d0790415feda8aaa18651c
Author: Leo Yuriev leo@yuriev.ru
Date: 2014-09-04 15:18:16 +0400
CHANGE - lmdb-backend: checkpoint-interval in seconds instead of minutes.
Gratuitous change. We used minutes since the BDB backend uses minutes, and the intention was to maintain parallel functionality. What's the justification for this change?
As I wrote above, we are using the combination:

    envflags writemap nosync lifo
    checkpoint 0 1

If the interval is specified in minutes, it cannot be set to less than one minute. But that is too long a window in which updates could be lost.
Setting the synchronization interval to one second, however, reduces the losses in the event of a failure to an acceptable level. Meanwhile, the load on the storage system remains acceptable even with a heavy flow of updates.
In the end, I found no better solution than simply replacing minutes with seconds.
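The argument for seconds is about the worst-case window of unflushed commits. Assuming the checkpoint fires on schedule, that window is roughly one interval plus the flush time. With minute granularity that is at least 60 s, since one minute is the smallest non-zero value. With "checkpoint 0 1" read as seconds, it is about 1 s. Below is a minimal sketch of the time trigger with the interval held in seconds. The names struct sync_timer and timer_due() are hypothetical.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical time-based checkpoint trigger, interval held in seconds. */
struct sync_timer {
    time_t interval_sec; /* 0 disables the time trigger (assumption) */
    time_t last_sync;    /* when the last flush completed */
};

/* Returns 1 if a checkpoint is due at `now`, else 0. */
int timer_due(const struct sync_timer *t, time_t now)
{
    /* Wall-clock time can step backwards; a monotonic clock avoids this. */
    assert(now >= t->last_sync);
    return t->interval_sec > 0 && now - t->last_sync >= t->interval_sec;
}
```

A real implementation would more likely read clock_gettime(CLOCK_MONOTONIC) than wall-clock time. That way, clock adjustments can neither delay nor force a checkpoint.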
commit fc409d89e0d9dde20f612e34c2a463c8a81ea000
Author: Leo Yuriev leo@yuriev.ru
Date: 2014-09-20 06:51:04 +0400
EXTENSION - lmdb: more useful info from the mdb_stat tool.
A bit ambiguous. me_tail_txnid is actually the ID of the oldest reader, not the "last" reader. I'm not convinced of the value of this patch, since you can already view the readers list.
I agree that "tail" is the best choice. But the main value of this patch is not showing the txn of the oldest reader; it is showing page-usage info. In particular, it shows how many pages are "blocked" by the oldest (laggard) reader and how many pages are actually available.
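The page-usage numbers Leo wants from mdb_stat could be derived from the freelist and the oldest reader's txnid. This is a simplified sketch built on one modelling assumption, not taken from mdb_stat's source. Freed pages recorded under a txnid older than the oldest reader count as reusable, and the rest count as retained by that reader. The names struct free_rec, struct page_usage and summarize_freelist() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t txnid_t;

/* Hypothetical, simplified FreeDB record: pages freed by one transaction. */
struct free_rec {
    txnid_t txnid;
    size_t npages;
};

struct page_usage {
    size_t reclaimable; /* free pages a writer could reuse right now */
    size_t retained;    /* free pages held back by the oldest reader */
};

/* Split the freelist into reusable and retained pages, assuming a record is
 * reusable only if its txnid is older than the oldest reader's txnid. */
struct page_usage summarize_freelist(const struct free_rec *recs, size_t n,
                                     txnid_t oldest_reader)
{
    struct page_usage u = { 0, 0 };
    assert(recs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (recs[i].txnid < oldest_reader)
            u.reclaimable += recs[i].npages;
        else
            u.retained += recs[i].npages;
    }
    return u;
}
```

Take records {10: 4 pages, 12: 2, 15: 8, 20: 1} with the oldest reader at txn 15. Then 6 pages are reclaimable and 9 are held back by the laggard reader.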
-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
Thank you in advance. BR. Leonid Yuriev.