MDB write concurrency
by Howard Chu
If anyone is interested in doing some research...
Instead of enforcing single-writer concurrency, we can investigate
single-committer concurrency. The idea goes something like this:
1) at the beginning of a write txn, record the ID of the last committed txn
2) at commit time, if the last committed txn is unchanged, proceed as usual
3) if last committed has changed, look for write conflicts:
- since every page that was touched in a txn goes onto the freelist, we can
quickly identify which pages have been updated by other commits
- likewise, since we maintain a dirty page list for each txn, we can quickly
identify which pages were updated in the current txn
For every dirty leaf page in the current txn, search for its page ID showing
up in the freelist entries of the intervening committed txns. If the same page
ID is found, then fail the commit.
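As a rough sketch of step 3, assuming the current txn's dirty leaf pages and the intervening txns' freelist entries are both available as flat arrays of page IDs (all names here are hypothetical, not LMDB code):

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long pgno_t;

/* Return nonzero if any dirty leaf page of the current txn appears in
 * the freelist entries of txns committed since this txn began.
 * A match means an intervening commit touched the same page, so the
 * current txn must fail its commit. */
static int
txn_has_conflict(const pgno_t *dirty, size_t ndirty,
                 const pgno_t *freed, size_t nfreed)
{
    size_t i, j;
    for (i = 0; i < ndirty; i++)
        for (j = 0; j < nfreed; j++)
            if (dirty[i] == freed[j])
                return 1;   /* conflict: page updated by another commit */
    return 0;               /* no overlap: safe to proceed as usual */
}
```

In practice the freelist entries are already sorted, so a real check could use a merge or binary search rather than this quadratic scan.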
For dirty branch pages - what steps are required? Due to the copy-on-write
design, every leaf page update causes every superior branch page to be
updated. In the absence of leaf inserts or deletes, all of these branch page
updates are actually non-conflicting. But I'm not sure we can readily
determine this fact - we would need to compare NUMKEYS() between the old page
and the new page, and we don't record the association from old page to new
page. We would have to look inside the committed txn's tree structure to find it.
In addition, for each dirty page, we would need to record a trail of how to
reach that page from the root, so that we can follow these trails when
comparing nodes in the committed txn's tree. Even if all of the leaf page
updates are found to be non-conflicting, we still need to find which parent pages
need to be updated to point at them.
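Recording such a trail amounts to keeping, per dirty page, the page numbers visited on the descent from the root; a minimal sketch (hypothetical names, not LMDB code):

```c
#include <assert.h>

typedef unsigned long pgno_t;

#define MAX_DEPTH 32   /* B+tree depth stays small; 32 is far more than needed */

/* Path from the root to one dirty page, recorded during descent,
 * so the matching nodes in a committed txn's tree can be found later. */
typedef struct {
    pgno_t pages[MAX_DEPTH];  /* pages[0] is the root, last entry the dirty page */
    int    depth;
} page_trail;

static void
trail_push(page_trail *t, pgno_t pg)
{
    assert(t->depth < MAX_DEPTH);
    t->pages[t->depth++] = pg;
}
```

This is exactly the extra per-txn state the paragraph above worries about: one such trail per dirty page, on top of the dirty page list itself.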
It's quite possible that determining if the txn conflicts or not may be so
much work that it overwhelms any potential boost in throughput from the added
write concurrency, especially if it requires so much more state to be
tracked over the course of a write txn. But we won't know for sure until
someone has modeled the code.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
cn=config attributes, default values
by Howard Chu
openldap-commit2devel(a)OpenLDAP.org wrote:
> - Log -----------------------------------------------------------------
> commit 842d1b5a17d19e17bcc420d972c310a416b2000b
> Author: Howard Chu <hyc(a)openldap.org>
> Date: Sun Aug 19 12:49:02 2012 -0700
>
> Added delete support
>
> -----------------------------------------------------------------------
>
> Summary of changes:
> servers/slapd/back-meta/config.c | 233 ++++++++++++++++++++++++++++++++++++-
> servers/slapd/back-meta/init.c | 2 +
> 2 files changed, 228 insertions(+), 7 deletions(-)
This reminds me, we still don't have a clear policy on how cn=config should
present settings that have their default value. Personally I would prefer that
settings at their default value not be displayed. Unfortunately the semantics
get rather muddled.
Deleting a value should always mean returning it to its default setting. In
the case of back-meta, per-target configuration can be initially inherited
from the base configuration. The question then is, when you've allowed a
target config to take the setting from the base, do you expect future changes
to the base to also change the targets? It's similar to the referential
integrity problem. My feeling is that it's not worth the trouble to maintain
such a thing. Which probably means we should return all attributes and values
in cn=config at all times, so that every value is explicitly configured.
Other opinions?
> ---
> http://www.openldap.org/devel/gitweb.cgi?p=openldap.git
Re: slapd-meta doesn't continue with multiple uri's
by Howard Chu
masarati(a)aero.polimi.it wrote:
>> By the way, I'm beginning to look at converting back-meta to dynamic
>> config.
>> Did you ever make any start at this?
>
> No, please go ahead. I'm sorry the need to use nested entries is too
> complex for me to deal with based on my current (lack of) time.
OK. The basic framework is in place now, Add and Emit appear to work. I
haven't done Delete yet. If you have any suggestions for sanity-checking the
current code, that would be helpful. Much of it is copy/pasted from slapd-ldap
and slapo-rwm.
I see a few puzzling inconsistencies, like the existence of acl-passwd and
acl-authcDN keywords that don't actually have any functional code behind them.
I would guess they should have been replaced with acl-bind but there's no
implementation of that anywhere either.
Also wondering if the idassert-passthru from back-ldap ought to be added here.
The manpage is quite out of date; it still says to look at slapd-ldap(5) for
the mapping/rewrite docs, but that text was dropped and moved to slapo-rwm(5).
MDB microbenchmark
by Howard Chu
Was reading thru Google's leveldb stuff and found their benchmark page
http://leveldb.googlecode.com/svn/trunk/doc/benchmark.html
I adapted their sqlite test driver for MDB, attached.
On my laptop I get:
violino:/home/software/leveldb> ./db_bench_mdb
MDB: version MDB 0.9.0: ("September 1, 2011")
Date: Mon Jul 2 07:17:09 2012
CPU: 4 * Intel(R) Core(TM)2 Extreme CPU Q9300 @ 2.53GHz
CPUCache: 6144 KB
Keys: 16 bytes each
Values: 100 bytes each (50 bytes after compression)
Entries: 1000000
RawSize: 110.6 MB (estimated)
FileSize: 62.9 MB (estimated)
------------------------------------------------
fillseq : 9.740 micros/op; 11.4 MB/s
fillseqsync : 8.182 micros/op; 13.5 MB/s (10000 ops)
fillseqbatch : 0.502 micros/op; 220.5 MB/s
fillrandom : 11.558 micros/op; 9.6 MB/s
fillrandint : 9.593 micros/op; 10.3 MB/s
fillrandibatch : 6.288 micros/op; 15.8 MB/s
fillrandsync : 8.399 micros/op; 13.2 MB/s (10000 ops)
fillrandbatch : 7.206 micros/op; 15.4 MB/s
overwrite : 14.253 micros/op; 7.8 MB/s
overwritebatch : 9.075 micros/op; 12.2 MB/s
readrandom : 0.261 micros/op;
readseq : 0.079 micros/op; 1392.5 MB/s
readreverse : 0.085 micros/op; 1301.9 MB/s
fillrand100K : 106.695 micros/op; 894.0 MB/s (1000 ops)
fillseq100K : 93.626 micros/op; 1018.8 MB/s (1000 ops)
readseq100K : 0.095 micros/op; 1005185.9 MB/s
readrand100K : 0.368 micros/op;
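For reference, the MB/s figures above follow directly from the per-entry size (16-byte key + 100-byte value = 116 bytes) and the micros/op figures, with "MB" meaning MiB in this output (RawSize shows 110.6 MB for 1M 116-byte entries). A quick check, not part of the benchmark code:

```c
#include <assert.h>
#include <math.h>

/* Reconstruct a benchmark line's MB/s figure from its micros/op figure.
 * bytes_per_op / micros_per_op gives bytes per microsecond, i.e. MB/s
 * (decimal); dividing by 1.048576 converts to MiB/s, which is what the
 * benchmark driver actually prints as "MB/s". */
static double
mib_per_sec(double bytes_per_op, double micros_per_op)
{
    return bytes_per_op / micros_per_op / 1.048576;
}
```

E.g. fillseq at 9.740 micros/op gives 116 / 9.740 / 1.048576 ≈ 11.4 MB/s, matching the line above.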
Compared to the leveldb:
violino:/home/software/leveldb> ./db_bench
LevelDB: version 1.5
Date: Mon Jul 2 07:18:35 2012
CPU: 4 * Intel(R) Core(TM)2 Extreme CPU Q9300 @ 2.53GHz
CPUCache: 6144 KB
Keys: 16 bytes each
Values: 100 bytes each (50 bytes after compression)
Entries: 1000000
RawSize: 110.6 MB (estimated)
FileSize: 62.9 MB (estimated)
WARNING: Snappy compression is not enabled
------------------------------------------------
fillseq : 1.752 micros/op; 63.1 MB/s
fillsync : 13.877 micros/op; 8.0 MB/s (1000 ops)
fillrandom : 2.836 micros/op; 39.0 MB/s
overwrite : 3.723 micros/op; 29.7 MB/s
readrandom : 5.390 micros/op; (1000000 of 1000000 found)
readrandom : 4.811 micros/op; (1000000 of 1000000 found)
readseq : 0.228 micros/op; 485.1 MB/s
readreverse : 0.520 micros/op; 212.9 MB/s
compact : 439250.000 micros/op;
readrandom : 3.269 micros/op; (1000000 of 1000000 found)
readseq : 0.197 micros/op; 560.4 MB/s
readreverse : 0.438 micros/op; 252.5 MB/s
fill100K : 504.147 micros/op; 189.2 MB/s (1000 ops)
crc32c : 4.134 micros/op; 944.9 MB/s (4K per op)
snappycomp : 6863.000 micros/op; (snappy failure)
snappyuncomp : 8145.000 micros/op; (snappy failure)
acquireload : 0.439 micros/op; (each op is 1000 loads)
Interestingly enough, MDB wins on one or two write tests. It clearly wins on
all of the read tests. MDB databases don't require compaction, so that's
another win. MDB doesn't do compression, so those tests are disabled.
I haven't duplicated all of the test scenarios described on the web page yet;
you can do that yourself with the attached code. It's pretty clear that
nothing else even begins to approach MDB's read speed.
MDB sequential write speed is dominated by the memcpy's required for
copy-on-write page updates. There's not much that can be done to eliminate
that, besides batching writes. For random writes the memcmp's on the key
comparisons become more of an issue. The fillrandi* tests use an integer key
instead of a string-based key, to show the difference due to key comparison
overhead.
For the synchronous writes, MDB is also faster, because it doesn't need to
synchronously write a transaction logfile.
Re: RE24 testing call#3 (OpenLDAP 2.4.32)
by Aaron Richton
mdb didn't even handle test000...just in the db_open.
Assertion failed: p != NULL, file ./../../../libraries/libmdb/mdb.c, line 3391
current thread: t@1
[1] __lwp_kill(0x0, 0x6, 0xffffffffffffffe6, 0x0, 0x0, 0x0), at 0x7fffffff7f8a900c
[2] raise(0x6, 0x0, 0xffffffff7fffe180, 0x0, 0x0, 0x0), at 0x7fffffff7f859150
[3] abort(0x4f, 0x0, 0x4f, 0x7efefeff, 0x81010100, 0xff00), at 0x7fffffff7f83eac8
[4] __assert(0x10034bd88, 0x10034bd98, 0xd3f, 0x100909480, 0x100909480, 0x64), at 0x7fffffff7f83edcc
=>[5] mdb_page_get(txn = 0x100b10dc0, pgno = 8589934594U, ret = 0x1009094d8), line 3391 in "mdb.c"
[6] mdb_page_search(mc = 0x100909490, key = 0xffffffff7fffead8, flags = 0), line 3533 in "mdb.c"
[7] mdb_cursor_set(mc = 0x100909490, key = 0xffffffff7fffead8, data = 0xffffffff7fffeac8, op = MDB_SET, exactp = 0xffffffff7fffe9c8), line 3906 in "mdb.c"
[8] mdb_cursor_get(mc = 0x100909490, key = 0xffffffff7fffead8, data = 0xffffffff7fffeac8, op = MDB_SET), line 4100 in "mdb.c"
[9] mdb_ad_read(mdb = 0x1006b3730, txn = 0x100b10dc0), line 549 in "attr.c"
[10] mdb_db_open(be = 0x100694ff0, cr = 0xffffffff7fffee4c), line 231 in "init.c"
[11] backend_startup_one(be = 0x100694ff0, cr = 0xffffffff7fffee4c), line 224 in "backend.c"
[12] backend_startup(be = 0x100694ff0), line 325 in "backend.c"
[13] slap_startup(be = (nil)), line 219 in "init.c"
[14] main(argc = 8, argv = 0xffffffff7ffff298), line 991 in "main.c"