Re: openldap.git branch mdb updated. de5b605fef750992b586a44607b1a6c612d5bf6f
by Howard Chu
> - Log -----------------------------------------------------------------
> commit de5b605fef750992b586a44607b1a6c612d5bf6f
> Author: Howard Chu<highlandsun(a)gmail.com>
> Date: Sun Aug 28 04:06:03 2011 -0700
>
> Resync
>
> commit 762c9e432f5e1694e4ad2781ca95b5cc50e7886c
> Author: Howard Chu<highlandsun(a)gmail.com>
> Date: Sun Aug 28 04:04:09 2011 -0700
>
> bump mdb maxsize up to 32M to pass test060
With indexing enabled, my test DB grew to 15MB running test060. Prior to
enabling indexing it was well under the 10MB default. We might want to raise
the default from 10MB, but it's hard to imagine that whatever value we pick
will be useful in real life. Any site is going to need to tweak this anyway.
As of now back-mdb passes all of "make test". I still have other features I
plan to change, but it's fully functional now if you feel like beating on it.
>
> Note in slapd-mdb(5) that setting a huge size is desirable.
>
> -----------------------------------------------------------------------
>
> Summary of changes:
> doc/man/man5/slapd-mdb.5 | 5 +++++
> servers/slapd/back-mdb/libmdb | 2 +-
> tests/data/slapd.conf | 1 +
> 3 files changed, 7 insertions(+), 1 deletions(-)
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
12 years, 2 months
Re: openldap.git branch mdb updated. 5d600468011aaf773cf4cd7ed13a213d3e2e72ca
by Howard Chu
At this point the libmdb library code is behaving pretty well, and back-mdb
is code-complete. It is not yet passing all of "make test", but it does
pass the majority of tests so far. As expected, it delivers higher throughput
than back-bdb/hdb under heavy load, with a smaller memory footprint, and
nothing to tune. So far so good.
Remaining bugs revealed by the test suite need to be tracked down, then some
profiling/optimization work should be done. The current code is already fast,
but it is far from optimal.
TODO:
* the mdb library doesn't coordinate cursors with write operations. (So
performing a write invalidates any open cursors in the same txn.) This needs
to be fixed.
* cursors only support read operations. The back-mdb indexer would benefit
from being able to use put/del on a cursor, as back-bdb does. (And the
back-bdb/hdb code needs to be revisited; it closes/re-opens cursors very often
when it probably should just open once and keep using till an operation is
done. Of course this has to be balanced against the desire to keep BDB
resources locked for as little time as possible. Fortunately since MDB has no
locks the same worries don't apply.)
* entry_encode/decode re-work - currently back-mdb uses the same encoder as
back-bdb/hdb. I plan to replace this with an encoding that uses a separate
database of attributeDesc to integer mappings. This will reduce the overall DB
size and will accelerate decoding. I'm backing away from the idea of a fully
persistent DB with no encoding/decoding at all; I may try this further
down the road. (The full in-memory structures are quite large, and DB size
will increase dramatically. But this might be OK overall; or we might look at
shrinking slapd's Entry structure etc. - quite a huge amount of grunt work
involved.)
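
The attributeDesc-to-integer idea in the last TODO item can be sketched as
follows. This is only an illustration in Python, not the planned back-mdb
encoder: the names AttrTable, encode_entry and decode_entry are hypothetical,
and an in-memory table stands in for the separate mapping database.

```python
class AttrTable:
    """Hypothetical stand-in for the separate attributeDesc-to-integer database."""

    def __init__(self):
        self.ids = {}    # lowercased attribute description -> integer id
        self.names = []  # integer id -> attribute description as first seen

    def intern(self, desc):
        """Return the id for desc, assigning the next free id on first use."""
        key = desc.lower()  # attribute descriptions compare case-insensitively
        if key not in self.ids:
            self.ids[key] = len(self.names)
            self.names.append(desc)
        return self.ids[key]


def encode_entry(table, entry):
    """Replace each attribute description with its small integer id."""
    return [(table.intern(desc), list(vals)) for desc, vals in entry.items()]


def decode_entry(table, encoded):
    """Map the integer ids back to attribute descriptions."""
    return {table.names[i]: list(vals) for i, vals in encoded}


table = AttrTable()
entry = {"cn": ["Howard"], "objectClass": ["person"]}
encoded = encode_entry(table, entry)
print(encoded)
print(decode_entry(table, encoded) == entry)
```

Each stored entry then carries a small integer per attribute instead of the
full description string, which is where the size reduction would come from.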
openldap-commit2devel(a)OpenLDAP.org wrote:
> A ref change was pushed to the OpenLDAP (openldap.git) repository.
> It will be available in the public mirror shortly.
>
> The branch, mdb has been updated
> via 5d600468011aaf773cf4cd7ed13a213d3e2e72ca (commit)
> via 8a6b9ea1a3274ee0525a060821db45ac5725861d (commit)
> from 06d590fffe107ac75e59df83516501ee1f5e46e6 (commit)
>
> Those revisions listed above that are new to this repository have
> not appeared on any other notification email; so we list those
> revisions in full, below.
>
> - Log -----------------------------------------------------------------
> commit 5d600468011aaf773cf4cd7ed13a213d3e2e72ca
> Author: Howard Chu<hyc(a)openldap.org>
> Date: Fri Aug 26 01:24:06 2011 -0700
>
> fix opinfo
>
> commit 8a6b9ea1a3274ee0525a060821db45ac5725861d
> Author: Howard Chu<hyc(a)openldap.org>
> Date: Fri Aug 26 01:18:49 2011 -0700
>
> Fix mdb_entry_get
>
> -----------------------------------------------------------------------
>
> Summary of changes:
> servers/slapd/back-mdb/bind.c | 10 ++--------
> servers/slapd/back-mdb/compare.c | 10 ++--------
> servers/slapd/back-mdb/id2entry.c | 4 +++-
> servers/slapd/back-mdb/operational.c | 10 ++--------
> servers/slapd/back-mdb/referral.c | 10 ++--------
> servers/slapd/back-mdb/search.c | 10 ++--------
> 6 files changed, 13 insertions(+), 41 deletions(-)
>
>
> ---
> http://www.openldap.org/devel/gitweb.cgi?p=openldap.git
>
>
12 years, 3 months
Re: openldap.git branch mdb updated. a4c38efe297e6b40f968c0ec45f1008ec338c80a
by Howard Chu
openldap-commit2devel(a)OpenLDAP.org wrote:
> A ref change was pushed to the OpenLDAP (openldap.git) repository.
> It will be available in the public mirror shortly.
>
> The branch, mdb has been updated
> via a4c38efe297e6b40f968c0ec45f1008ec338c80a (commit)
> from 874fe18ad97511c8564f4e6930c5df1f0aacfa7e (commit)
I've now got rudimentary slapadd/slapcat support built around MDB. This is
just with dn2id and id2entry, no attribute indexing supported yet. Comparing
to back-hdb on ada, loading a 250K entry database (147MB LDIF file), time to
slapadd was:
       real        user        sys
MDB    0m34.345s   0m25.366s   0m2.492s
BDB    0m42.284s   0m35.646s   0m2.424s
MDB uses significantly less CPU time and is significantly faster overall. The
resulting MDB database was around 561MB. The resulting BDB database was around
436MB, with another ~460MB or so occupied on disk by the BDB cache files.
Repeating the same test using a 1M entry database (589MB LDIF file), time to
slapadd was:
       real        user        sys
MDB    2m8.615s    1m42.074s   0m5.504s
BDB    2m49.811s   2m41.874s   0m8.969s
The MDB database size was 2.2GB. The BDB database size was about 1.7GB, plus
about as much again for its BDB cache, so roughly 3.5GB total on disk.
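
To put numbers on "significantly faster", the quoted timings can be compared
directly. The short Python sketch below only redoes arithmetic on the figures
above; the helper names to_seconds and saving are made up for illustration.

```python
import re


def to_seconds(t):
    """Convert a time(1)-style string such as '2m8.615s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", t)
    return int(m.group(1)) * 60 + float(m.group(2))


def saving(mdb, bdb):
    """Percentage by which the MDB time undercuts the BDB time."""
    return 100.0 * (1.0 - to_seconds(mdb) / to_seconds(bdb))


# (real, user) figures copied from the slapadd runs quoted above.
runs = {
    "250K": {"MDB": ("0m34.345s", "0m25.366s"), "BDB": ("0m42.284s", "0m35.646s")},
    "1M": {"MDB": ("2m8.615s", "1m42.074s"), "BDB": ("2m49.811s", "2m41.874s")},
}

for name, r in runs.items():
    real = saving(r["MDB"][0], r["BDB"][0])
    user = saving(r["MDB"][1], r["BDB"][1])
    print(f"{name}: real {real:.0f}% less, user {user:.0f}% less")
```

On these figures MDB needs roughly a fifth to a quarter less wall-clock time
and roughly 30 to 40 percent less user CPU time than BDB.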
All of these tests were run with slapadd -q, so BDB was only doing database
writes, no transaction logging. With fully synchronous writes, MDB is about
twice as slow as BDB. I'm not sure there's much we can do about this, since
generally any MDB write dirties more pages than a single BDB write.
At the moment back-mdb is functionally identical to back-hdb - all of the
dn2id layout is the same, etc. so this is a pretty apples-to-apples comparison.
The mdb library itself is now pretty much complete; at least I believe it
supports everything that back-mdb needs. I'll be porting the rest of the
back-mdb code over now.
> Those revisions listed above that are new to this repository have
> not appeared on any other notification email; so we list those
> revisions in full, below.
>
> - Log -----------------------------------------------------------------
> commit a4c38efe297e6b40f968c0ec45f1008ec338c80a
> Author: Howard Chu<highlandsun(a)gmail.com>
> Date: Fri Aug 19 18:20:06 2011 -0700
>
> Fix config typo, tweak slapadd -q
>
> -----------------------------------------------------------------------
>
> Summary of changes:
> servers/slapd/back-mdb/config.c | 4 ++--
> servers/slapd/back-mdb/libmdb | 2 +-
> servers/slapd/back-mdb/tools.c | 4 ++--
> 3 files changed, 5 insertions(+), 5 deletions(-)
>
>
> ---
> http://www.openldap.org/devel/gitweb.cgi?p=openldap.git
>
>
12 years, 3 months
No symlinks in Git please (was: openldap.git branch mdb created.) 227e6976db20f424d4f6abda2b73bfa53034a714
by Hallvard B Furuseth
I think we should not have symlinks in Git. It does strange things with
them, in particular on a system without symlinks, but also in my
checkout now:
$ git checkout mdb
$ cd servers/slapd/back-mdb/
$ ls -l mdb.c libmdb/mdb.c
ls: cannot access libmdb/mdb.c: No such file or directory
lrwxrwxrwx 1 hbf usit 12 2011-08-17 13:26 mdb.c -> libmdb/mdb.c
$ gitk --all
...but this does show the mdb source.
I suggest:
- treat the 'mdb' branch as a throw-away branch, instead of
merging it into master,
- let Makefile make the symlinks, as in libraries/liblunicode,
- make some Git hook which rejects commits with symlinks.
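
A rough sketch of the check such a hook could perform, in Python. It assumes
the hook feeds it the text output of "git ls-tree -r <rev>", where Git records
a symlink with file mode 120000. The function name find_symlinks is
hypothetical, the object hashes in the sample are placeholders, and how the
check gets wired into a server-side hook is left open.

```python
def find_symlinks(ls_tree_output):
    """Return the paths that `git ls-tree -r` lists with mode 120000 (symlink)."""
    links = []
    for line in ls_tree_output.splitlines():
        if not line.strip():
            continue
        # Each line looks like "<mode> <type> <object>\t<path>".
        meta, _, path = line.partition("\t")
        if meta.split()[0] == "120000":
            links.append(path)
    return links


# Placeholder hashes; the paths follow the example in this message.
sample = (
    "100644 blob 1111111111111111111111111111111111111111\t"
    "servers/slapd/back-mdb/config.c\n"
    "120000 blob 2222222222222222222222222222222222222222\t"
    "servers/slapd/back-mdb/mdb.c\n"
)
print(find_symlinks(sample))
```

A hook would reject the push whenever the returned list is non-empty.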
--
Hallvard
12 years, 3 months
B-tree code
by Howard Chu
Hi Martin,
Just thought you'd like to know about a project I've been working on for a
couple months. My current code started with your append-only B-tree source.
It's just about in usable shape now:
https://gitorious.org/mdb
Also I'll be presenting details at the LDAPCon in Heidelberg this October.
http://www.daasi.de/ldapcon2011/index.php?site=program
I started with your code, and removed the page cache. Instead the entire DB is
accessed thru a read-only mmap region. As such, there is no longer any cache
management at the DB level (it's all done by the OS/VM). I also removed the
prefix-compression logic, because it made rebalancing/merging unreliable. The
mmap approach avoids a ton of malloc/memcpy overhead. It also makes overflow
pages quite cheap to manage.
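
The read-only mmap access pattern can be illustrated with a small Python
sketch. It is not the mdb code: the 4096-byte page size, the file contents and
the helper names are all assumptions made for the example. The point is that a
page is reached by slicing the mapping, with no page cache and no copy until
the bytes are actually needed.

```python
import mmap
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size for this illustration


def write_pages(path, pages):
    """Write fixed-size pages to a file, padding each one to PAGE_SIZE."""
    with open(path, "wb") as f:
        for data in pages:
            f.write(data.ljust(PAGE_SIZE, b"\0"))


def open_readonly_map(path):
    """Map the whole file read-only; caching is left to the OS."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def page(view, pgno):
    """Return a zero-copy slice covering page number pgno."""
    start = pgno * PAGE_SIZE
    return view[start:start + PAGE_SIZE]


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_pages(path, [b"meta", b"branch", b"leaf"])
        m = open_readonly_map(path)
        view = memoryview(m)
        p = page(view, 2)
        head = bytes(p[:4])  # copy only the bytes we actually need
        p.release()
        view.release()
        m.close()
        return head
    finally:
        os.remove(path)


print(demo())
```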
Instead of writing a new meta-page at the tail of the file, I ping-pong
between two meta pages at the head of the file. (Double-buffering.) This
provides most of the MVCC benefits of the append-only approach, but without
the wasted space or the need to search for the most recent meta page.
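
A toy Python model of the two-meta-page scheme, under stated assumptions: each
meta page holds only a transaction id and a root page number, and the slot to
overwrite is chosen by the parity of the new transaction id. Neither detail is
taken from the mdb source; Meta and MetaPair are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Meta:
    txnid: int = 0  # id of the transaction that wrote this meta page
    root: int = -1  # root page of the tree as of that transaction (-1: empty)


class MetaPair:
    """Two meta slots at the head of the file, written alternately."""

    def __init__(self):
        self.slots = [Meta(), Meta()]

    def current(self):
        """Readers pick whichever slot carries the newer txnid."""
        return max(self.slots, key=lambda m: m.txnid)

    def commit(self, new_root):
        """Overwrite the older slot; the newer one stays valid throughout."""
        txnid = self.current().txnid + 1
        self.slots[txnid % 2] = Meta(txnid, new_root)
        return txnid


mp = MetaPair()
mp.commit(new_root=10)
mp.commit(new_root=20)
print(mp.current())
print(mp.slots)
```

Finding the latest committed state is a comparison of two fixed slots, and the
previous state remains readable until the next commit overwrites it.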
I also added tracking of outstanding read transactions, and tracking of free
pages. Reader tracking is done without locks; readers are never blocked when
accessing the DB (unless the OS itself is busy servicing page faults).
This way the library can quickly check when a copied page is no longer
referenced and re-use it, so the DB no longer grows without bounds. This completely
removes the need for the compaction logic. Since active data is never
overwritten, the DB can never be corrupted, so no write-ahead logging is
needed, nor any recovery procedures.
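
The interplay of reader tracking and page re-use can be modelled roughly as
below. This is a simplified sketch in Python, not the libmdb algorithm: it
assumes a page freed by write transaction T may be re-used once no active
reader holds a snapshot older than T. FreeTracker and its methods are
hypothetical names.

```python
class FreeTracker:
    """Toy model: pages freed by txn T stay pinned while older snapshots are read."""

    def __init__(self):
        self.freed = {}    # txnid -> pages that transaction stopped referencing
        self.readers = {}  # reader id -> txnid of the snapshot it is reading

    def begin_read(self, reader, txnid):
        self.readers[reader] = txnid

    def end_read(self, reader):
        self.readers.pop(reader, None)

    def release(self, txnid, pages):
        """Record pages that write transaction txnid copied and no longer uses."""
        self.freed.setdefault(txnid, []).extend(pages)

    def reclaim(self, latest_txnid):
        """Return pages that no active reader can still see."""
        oldest = min(self.readers.values(), default=latest_txnid)
        pages = []
        for t in sorted(t for t in self.freed if t <= oldest):
            pages.extend(self.freed.pop(t))
        return pages


ft = FreeTracker()
ft.begin_read("r1", 1)   # r1 reads the snapshot written by txn 1
ft.release(2, [5, 6])    # txn 2 copied pages 5 and 6
print(ft.reclaim(2))     # r1 may still be looking at them
ft.end_read("r1")
print(ft.reclaim(2))     # now they can be handed out again
```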
I also added several ideas from BerkeleyDB, so that I can drop it into
OpenLDAP more easily. The DB is now a "DB environment" with support for
multiple databases within an environment. This was necessary because I didn't
want to have to manage multiple separate mmap's for multiple little index
databases and other misc. usages. Also the free list is itself a sub-DB in the
environment. I also added support for sorted-duplicate data items for a given
key, which OpenLDAP's back-hdb relies on.
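
What "sorted-duplicate data items" means can be shown with a small in-memory
model in Python. It only illustrates the semantics (several data items per
key, kept in sorted order, no exact repeats); it is not how mdb stores them,
and DupSortDB is a hypothetical name.

```python
import bisect


class DupSortDB:
    """In-memory model of a key -> sorted set of data items mapping."""

    def __init__(self):
        self.keys = []    # keys in sorted order
        self.values = {}  # key -> sorted list of distinct data items

    def put(self, key, data):
        dups = self.values.get(key)
        if dups is None:
            bisect.insort(self.keys, key)
            dups = self.values[key] = []
        i = bisect.bisect_left(dups, data)
        if i == len(dups) or dups[i] != data:
            dups.insert(i, data)  # keep the duplicates sorted, skip exact repeats

    def get_all(self, key):
        return list(self.values.get(key, []))

    def items(self):
        """Yield (key, data) pairs in key order, then data order."""
        for key in self.keys:
            for data in self.values[key]:
                yield key, data


db = DupSortDB()
for key, entry_id in [(b"cn=howard", 7), (b"cn=howard", 3), (b"cn=admin", 9)]:
    db.put(key, entry_id)
print(db.get_all(b"cn=howard"))
print(list(db.items()))
```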
I'm just now getting started adapting our back-hdb code to this mdb library.
It looks like the new backend will be vastly simplified, both in real code and
in configuration, so it will be much friendlier to sysadmins, while at the
same time giving superior performance to BerkeleyDB and excellent reliability.
Of course the code is still pretty raw, and I haven't done any heavy load
testing on it yet, so it remains to be seen how much of the promise is realized.
I was originally targeting a design where the mmap resides at a fixed memory
address. That way slapd can store its entries as-is, instead of flattening
them into a storable structure. There's a hook for a relocation function,
which would be used to relocate an entry if it gets shifted around during
adds/deletes/rebalances. I haven't implemented this yet because I'm not sure
it will actually work well in real use. For slapd it might be OK if all
entries wind up in overflow pages, since those pages aren't touched by tree
balancing activity. But if average entry sizes are small, it would become a
serious hassle.
I'd be interested to hear your comments on this.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
12 years, 3 months