back-mdb notes
by Howard Chu
Thought this was an interesting read:
http://www.varnish-cache.org/trac/wiki/ArchitectNotes
Too bad he talks about his approach being "2006 era" programming. In fact the
single-level store is 1964-era, from Multics.
http://en.wikipedia.org/wiki/Single_level_store
I guess they'll have to tweak Henry Spencer's quote ("Those who do not
understand UNIX are condemned to reinvent it, poorly.") to Multics instead...
I've been working on a new "in-memory" B-tree library that operates on an
mmap'd file. It is a copy-on-write design: it supports MVCC, is immune to
corruption, and requires no recovery procedure. It is not an append-only
design, since that requires explicit compaction and is not amenable to
mmap usage. The append-only approach also requires total serialization of
write operations, which would be quite poor for throughput.
The current approach simply reserves space for two root node pointers and
flip-flops between them. Multiple writes may be outstanding at once, but
commits are of course serialized; each commit causes the currently unused root
node pointer to become the currently valid root node pointer. Transaction
aborts are pretty much free; there's nothing to roll back. Read transactions
begin by snapshotting the current root pointer and can then run without
interference from any other operations.
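A minimal sketch of the two-root flip-flop described above, kept entirely in memory. All names here (root_slot, db_header, begin_read, commit) are hypothetical and are not taken from the back-mdb code; the use of a transaction id to decide which slot is valid is also an assumption made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; not the back-mdb on-disk format. */
struct root_slot {
    uint64_t txn_id;  /* id of the commit that wrote this slot */
    size_t   root;    /* location of the tree root for that commit */
};

struct db_header {
    struct root_slot slot[2];  /* the two reserved root pointers */
};

/* Assumption: the valid slot is the one written by the newest commit. */
int current_slot(const struct db_header *h)
{
    return h->slot[1].txn_id > h->slot[0].txn_id ? 1 : 0;
}

/* A reader copies the valid slot once and works from that snapshot. */
struct root_slot begin_read(const struct db_header *h)
{
    return h->slot[current_slot(h)];
}

/* A commit overwrites the unused slot, which then becomes the valid one.
 * Commits are assumed to be serialized by the caller. */
void commit(struct db_header *h, size_t new_root)
{
    int cur = current_slot(h);
    int unused = 1 - cur;

    h->slot[unused].root = new_root;
    h->slot[unused].txn_id = h->slot[cur].txn_id + 1;
}
```

An abort in this model is simply a writer that never calls commit(): the header is untouched, so there is nothing to undo. A snapshot returned by begin_read() keeps pointing at its own root no matter how many commits follow.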
Public commits have been waiting for our official transition to git, but since
that's been going nowhere I will probably start publishing on github.com in
the next couple of weeks. (With St. Patrick's Day right around the corner it
may have to wait a bit.)
Unfortunately I realized that not all application-level caching can be
eliminated - with the hierarchical DB approach, we don't store full entry DNs
in the DB so they still need to be generated in main memory, and they probably
should be cached. But that's a detail to be addressed later; it may well be
that the cost of always constructing them on the fly (no caching) is acceptable.
This backend should perform much better in all respects (memory, CPU, and I/O
usage) than the current BerkeleyDB code. It eliminates two levels of caching;
entries pulled from the DB require zero decoding; readers require no locks;
and writes incur no write-ahead-logging overhead. There are only two
configurable parameters (the pathname of the DB file and its size), so this
will be far simpler for admins.
Potential downside - on a 32 bit machine with only 2GB of addressable memory
the maximum usable DB size is around 1.6GB. On a 64 bit machine, I doubt the
limits will pose any problem. ("64 bits should be enough for anyone...")
re: configuring the size of the DB file - this is most likely not a value that
can be changed on an existing DB. I.e., if you configure a DB and find that
you need to grow it later, you will probably need to slapcat/slapadd it again.
At DB creation time the file is mmap'd with address NULL so that the OS picks
the address, and the address is recorded in the DB. On subsequent opens the
file is mmap'd at the recorded address. If the size is changed, and the
process' address space is already full of other mappings, it may not be
possible to simply grow the mapping at its current address. Since the DB
records contain actual memory pointers based on the region address, any change
in the mapping address would render the DB unusable.
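The open sequence described above might look roughly like the following POSIX sketch. The function name map_db and its parameters are hypothetical. Whether the real code passes MAP_FIXED is not stated here, so this version treats the recorded address as a hint and refuses the mapping if the OS places it elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map `size` bytes of `fd`. Pass want == NULL when creating the DB so the
 * OS picks the address; pass the recorded address on later opens.
 * Returns MAP_FAILED if the region cannot be mapped at the wanted address. */
void *map_db(int fd, size_t size, void *want)
{
    void *p = mmap(want, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (p == MAP_FAILED)
        return MAP_FAILED;

    if (want != NULL && p != want) {
        /* Without MAP_FIXED the address is only a hint. Pointers stored
         * in the records would be wrong at any other address, so give up. */
        munmap(p, size);
        return MAP_FAILED;
    }
    return p;
}
```

On creation the caller would store the returned address in the DB file; on reopen it would read that value back and pass it as `want`. A failure on reopen corresponds to the "address space already full of other mappings" case.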
If this restriction turns out to be too impractical, we may have to resort to
just storing array offsets, but that will then imply a decoding phase and the
re-introduction of entry caching, which I really really want to avoid.
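To make the trade-off concrete, here is a small sketch contrasting a record that stores an absolute pointer with one that stores an offset from the start of the region. The struct and function names are hypothetical; real entries are of course more complex than a single string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record forms, for illustration only. */
struct rec_ptr { const char *val; };   /* absolute address: tied to one base */
struct rec_off { uint32_t val_off; };  /* offset from region start: relocatable */

/* Store a NUL-terminated value at `off` and return a record referring to it. */
struct rec_off put_value(void *base, uint32_t off, const char *s)
{
    struct rec_off r = { off };

    memcpy((char *)base + off, s, strlen(s) + 1);
    return r;
}

/* The "decoding" step: one addition against whatever the base is now. */
const char *rec_off_value(const void *base, const struct rec_off *r)
{
    return (const char *)base + r->val_off;
}
```

If the region is copied or mapped at a different base, a rec_off still resolves correctly against the new base, whereas a rec_ptr keeps pointing into the old location. The cost is that every access has to go through something like rec_off_value() instead of using the stored value directly.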
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
12 years, 6 months
slapo-chain back-config vs. slapd.conf
by Ralf Haferkamp
While trying to implement back-config delete support for slapo-chain I
stumbled across some inconsistencies in slapo-chain's configuration
routines.
When using slapd.conf it is not possible to configure some settings for
slapo-chain's underlying back-ldap database. E.g. things like
chain "-sizelimit", "-restrict", "-limits" are just rejected.
OTOH, when using cn=config, slapd accepts all these settings just fine
and writes them to the database. They don't have any effect, however.
It would be nice if cn=config and slapd.conf behaved more consistently
here, either by both rejecting general database options (everything
that's not a specific back-ldap option) for the underlying back-ldap
databases or by both applying them correctly.
I tend to think the latter approach could make sense. It would, e.g.,
allow defining different size and time limits for chained operations, or
setting up a chain overlay that only chains read operations
(by setting olcReadOnly on the underlying LDAP database).
I have already started implementing parts of this. But if people think
it does not make much sense, it is still early enough to dump the
code and forget about it :).
Ralf
12 years, 6 months
AttributeDescriptions, OpenLDAP 2.5
by Howard Chu
I plan to replace all AttributeDescription pointer references with array
index references. It makes back-mdb a lot more portable, and it also saves
us 4 bytes for every attribute on 64 bit machines.
At a microscopic level it imposes some extra overhead on each reference
(instead of a direct pointer deref, you have to do pointer+offset arithmetic)
but on modern CPUs simple address offsets like this come for free anyway.
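A rough sketch of the idea, using a stand-in struct rather than slapd's real AttributeDescription. The table ad_table, the field names, and the choice of a 32-bit index are assumptions for illustration; the actual saving in slapd would also depend on how the surrounding struct is padded.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for slapd's AttributeDescription; fields are illustrative. */
struct attr_desc { const char *name; };

/* Hypothetical global table; an index into it replaces the pointer. */
struct attr_desc ad_table[] = { {"cn"}, {"sn"}, {"mail"} };

struct attr_by_ptr { struct attr_desc *desc; };  /* 8 bytes on LP64 */
struct attr_by_idx { uint32_t desc_idx; };       /* 4 bytes */

/* One base+index address computation instead of a plain dereference. */
const struct attr_desc *ad_get(const struct attr_by_idx *a)
{
    return &ad_table[a->desc_idx];
}
```

Because an index is position-independent, a stored attr_by_idx stays valid wherever the table happens to live in a given process, which is where the portability benefit for back-mdb would come from.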
A change of this nature will necessarily touch the majority of slapd code as
well as 3rd party overlays. Making this switch will mean that patches for RE24
will almost always be incompatible with patches for HEAD/RE25.
Very likely the same change should be made for the rest of the common data
types in slapd (e.g., ObjectClasses, AttributeTypes, Syntaxes, etc.). But
since AttributeDescriptions are used in every Entry, they will have the most
impact on slapd's memory footprint; the others are inconsequential.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
12 years, 6 months
back-ldap assertion failure, LDAP proxy to Windows AD
by Ted Cheng
We encountered a back-ldap assertion failure with back-ldap acting as a proxy to a remote Active Directory on Windows 2003 R2. The assertion failure occurred while the slapd server was checking ACLs via the rwm overlay. Snippet of the stack trace:
Thread 1 (Thread 32267):
....
#2 0x0000003c354296e6 in __assert_fail () from /lib64/libc.so.6
#3 0x00002ac57daaf6c1 in ldap_back_dobind_int (lcp=0x42f70170, op=0x42f702f0,
rs=0x42f700a0, sendok=LDAP_BACK_GETCONN, retries=0, dolock=1)
at /home/build/sol-2.4.23.101221/sol24x/ldap24/servers/slapd/back-ldap/bind.c:1389
#4 0x00002ac57daafda0 in ldap_back_dobind (lcp=0x42f70170, op=0x42f702f0,
rs=0x42f700a0, sendok=LDAP_BACK_DONTSEND)
at /home/build/sol-2.4.23.101221/sol24x/ldap24/servers/slapd/back-ldap/bind.c:1572
#5 0x00002ac57daac7a7 in ldap_back_entry_get (op=0x42f702f0, ndn=0x42f701d0,
oc=0x0, at=0x135ad370, rw=0, ent=0x42f70a58)
Analysis of the assertion failure:
The ldap_back_entry_get() function in back-ldap/search.c is called for ACL entries via the rwm overlay. It sets op->o_do_not_cache to 1 before calling into ldap_back_dobind():
    /* Tell getconn this is a privileged op */
    do_not_cache = op->o_do_not_cache;
    tag = op->o_tag;
    /* do not cache */
    op->o_do_not_cache = 1;
    /* ldap_back_entry_get() is an entry lookup, so it does not need
     * to know what the entry is being looked up for */
    op->o_tag = LDAP_REQ_SEARCH;
    rc = ldap_back_dobind( &lc, op, &rs, LDAP_BACK_DONTSEND );
ldap_back_dobind() calls ldap_back_dobind_int() (back-ldap/bind.c) to perform the bind. The following code in ldap_back_dobind_int() is bound to hit the assertion whenever the op->o_do_not_cache flag is set and ldap_back_getconn() returns no valid binddn and bindcred. Configuring an invalid LDAP URI for the remote Windows AD host is one such case.
ldap_back_dobind_int(...)
{
    ...
    if (sendok & LDAP_BACK_GETCONN) {
        ...
        lc = ldap_back_getconn(op, rs, sendok, &binddn, &bindcred);
        ...
    }
    ...
    if ( LDAP_BACK_CONN_ISIDASSERT( lc ) ) {
        if ( BER_BVISEMPTY( &binddn ) && BER_BVISEMPTY( &bindcred ) ) {
            /* if we got here, it shouldn't return result */
            rc = ldap_back_is_proxy_authz( op, rs,
                LDAP_BACK_DONTSEND, &binddn, &bindcred );
            /* ldap_back_is_proxy_authz always returns 0 when
             * op->o_do_not_cache is set, see below */
            assert( rc == 1 ); <--- assertion failure
        }
        rc = ldap_back_proxy_authz_bind( lc, op, rs, sendok,
            &binddn, &bindcred );
        ...
    }
}
When the op->o_do_not_cache flag is set, the ldap_back_is_proxy_authz() function always returns 0.
ldap_back_is_proxy_authz( ... )
{
    ...
    int dobind = 0;
    if ( op->o_conn == NULL || op->o_do_not_cache ) {
        goto done;
    }
    ...
done:;
    return dobind; <--- always returns 0
}
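The control flow above can be reduced to a small stand-alone model. This is not the back-ldap code: model_op and model_is_proxy_authz are hypothetical names, and the line that sets dobind to 1 merely stands in for the elided checks, whose real conditions are not shown in this report.

```c
#include <assert.h>
#include <stddef.h>

/* Only the two fields the early-exit test looks at. */
struct model_op {
    void *o_conn;
    int   o_do_not_cache;
};

/* Mirrors the quoted early exit: it leaves dobind at its initial 0. */
int model_is_proxy_authz(const struct model_op *op)
{
    int dobind = 0;

    if (op->o_conn == NULL || op->o_do_not_cache)
        goto done;

    dobind = 1;  /* assumption: stands in for the elided checks */
done:
    return dobind;
}
```

With o_do_not_cache set, the model returns 0 regardless of anything else, so a caller that follows the call with assert( rc == 1 ) will always abort on that path.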
Ted C. Cheng
Symas Corporation
12 years, 7 months
Re: commit: ldap/libraries/liblutil detach.c
by Hallvard B Furuseth
hyc(a)OpenLDAP.org writes:
> Modified Files:
> detach.c 1.25 -> 1.26
> Log Message:
> ITS#6848 Add -w option to wait for DB startup before parent exits
If someone is using lutil_detach() for something other than slapd, this will
break their application. I suggest renaming the function and making
lutil_detach() a wrapper with the old behavior. Or move it into slapd,
to make clear that use for anything else is unsupported and subject to
change without notice.
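The wrapper suggestion could be sketched as follows. The names detach_ex and detach_compat, their parameter lists, and the wait_for_child flag are all hypothetical; this does not reproduce the actual lutil_detach() signature, and the fork logic is deliberately left out so the sketch stays side-effect free.

```c
#include <assert.h>

/* Records the last request so the sketch can be exercised without forking. */
int last_wait = -1;

/* Hypothetical renamed function carrying the new "wait" behavior. */
int detach_ex(int debug, int do_close, int wait_for_child)
{
    (void)debug;
    (void)do_close;
    last_wait = wait_for_child;
    /* ... the real fork/setsid logic would go here ... */
    return 0;
}

/* Old entry point: same arguments as before, never waits. */
int detach_compat(int debug, int do_close)
{
    return detach_ex(debug, do_close, 0);
}
```

Existing callers would keep calling the old name and see no change in behavior, while slapd would call the extended function directly to get the -w semantics.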
--
Hallvard
12 years, 7 months