--On Saturday, March 05, 2011 5:05 AM -0800 Howard Chu <hyc(a)symas.com>
wrote:
I've been working on a new "in-memory" B-tree library that operates on an
mmap'd file. It is a copy-on-write design; it supports MVCC and is immune
to corruption and requires no recovery procedure. It is not an
append-only design, since that requires explicit compaction, and also is
not amenable to mmap usage. Also the append-only approach requires total
serialization of write operations, which would be quite poor for
throughput.
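
For readers unfamiliar with the term, the copy-on-write idea quoted above can be sketched roughly as follows. This is only an illustration on a plain binary search tree, not the B-tree library being described, and every name in it (node, cow_insert, contains) is hypothetical. An insert copies just the nodes on the path from the root to the change and returns a new root; the old root still describes the previous version, which is what lets readers keep a consistent snapshot (MVCC) without locking out the writer.

```c
#include <stdlib.h>

/* Hypothetical node type for illustration; not the library's layout. */
typedef struct node {
    int key;
    const struct node *left, *right;
} node;

/* Insert by path copying: nodes reachable from the old root are never
 * modified, so the old root remains a valid snapshot. Old versions are
 * never freed here; a real design needs some way to reclaim them. */
static const node *cow_insert(const node *root, int key) {
    node *copy = malloc(sizeof *copy);
    if (copy == NULL)
        abort();
    if (root == NULL) {
        copy->key = key;
        copy->left = copy->right = NULL;
        return copy;
    }
    *copy = *root;                      /* copy this node, share the rest */
    if (key < root->key)
        copy->left = cow_insert(root->left, key);
    else if (key > root->key)
        copy->right = cow_insert(root->right, key);
    return copy;                        /* new root of the new version */
}

/* Plain lookup; works on any version's root. */
static int contains(const node *root, int key) {
    while (root != NULL) {
        if (key == root->key)
            return 1;
        root = key < root->key ? root->left : root->right;
    }
    return 0;
}
```

A reader holding the old root never sees the new key, while a reader that starts from the new root does; untouched subtrees are shared between the two versions.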
My experience with back-(bdb/hdb) and syncrepl was that the only reliable way to
ensure consistent replication was to use delta-syncrepl, which... serializes
write operations. In fact, not forcing serialized writes for
back-(bdb/hdb) was slower than serializing things, because of all the
contention in the database. I understand this may not hold true for
back-mdb, but thought I would note that currently our best performance is
already achieved by serialization, write-wise.
re: configuring the size of the DB file - this is most likely not a value
that can be changed on an existing DB. I.e., if you configure a DB and
find that you need to grow it later, you will probably need to
slapcat/slapadd it again. At DB creation time the file is mmap'd with
address NULL so that the OS picks the address, and the address is
recorded in the DB. On subsequent opens the file is mmap'd at the
recorded address. If the size is changed, and the process' address space
is already full of other mappings, it may not be possible to simply grow
the mapping at its current address. Since the DB records contain actual
memory pointers based on the region address, any change in the mapping
address would render the DB unusable.
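
To make the constraint above concrete, here is a minimal sketch of the create/reopen pattern as described, assuming a POSIX system. It is not the library's code: the header layout and the names demo_header, demo_create and demo_open are invented for illustration. It uses plain mmap() with the recorded address as a hint (no MAP_FIXED) and refuses to proceed if the kernel places the mapping elsewhere, since the raw pointer stored in the file would then be wrong.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical on-disk header; the real format is not shown in the mail. */
typedef struct {
    void  *map_addr;  /* address the OS chose at creation time */
    size_t map_size;  /* size of the mapping */
    int   *root;      /* raw pointer into the region, valid only at map_addr */
} demo_header;

/* Create the file, let the OS pick the address, and record it. */
static demo_header *demo_create(const char *path, size_t size) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return NULL;
    demo_header *h = p;
    h->map_addr = p;
    h->map_size = size;
    h->root = (int *)((char *)p + sizeof *h);  /* absolute pointer */
    *h->root = 42;
    return h;
}

/* Reopen: ask for the recorded address and fail if we do not get it. */
static demo_header *demo_open(const char *path) {
    demo_header saved;
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;
    if (read(fd, &saved, sizeof saved) != (ssize_t)sizeof saved) {
        close(fd);
        return NULL;
    }
    void *p = mmap(saved.map_addr, saved.map_size,
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return NULL;
    if (p != saved.map_addr) {          /* stored pointers would dangle */
        munmap(p, saved.map_size);
        return NULL;
    }
    return p;
}
```

Growing the file means asking for a larger range starting at the same address; if something else in the process already occupies the pages just past the old mapping, that request cannot be satisfied in place, which is the situation described above.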
How exactly does the DB file size for back-mdb relate to the existing size
of the database? Do they have to match? I.e., is this more like the
DB_CONFIG cachesize, which can be more or less than the database size, or
are they supposed to be an exact match? We have plenty of customers who
have databases that are certainly not static in size, particularly if you
are using an accesslog database for delta-syncrepl or other operations.
--Quanah
--
Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration