LMDB stuff
by Howard Chu
Was chatting with Emmanuel Lecharny (who is currently working on Mavibot for
ApacheDS, an MVCC backend similar to LMDB) and had an interesting realization:
we can avoid the current issue of long-lived reader txns preventing page
reclamation.
The key point is this: currently, for any txn, we store in the freeDB a list
of the pageIDs that were freed in that txn (due to being copied or deleted).
All we need to know is that any of these pages has been copied *twice* since
the txn of the outstanding reader. At that point, any such page is so new that
the reader would never have seen it, and if no other readers are outstanding
then the page can be safely reclaimed.
Currently mavibot maintains this revision history in a dedicated Btree (much
like our freeDB but with more info contained). I'm thinking, since we're
already going to add a txnID to every page's page header, we can simply add a
2nd txnID, recording the txnID of the previous change to this page's ancestor.
Then, any page where this prevTxnID is >= the outstanding reader's txnID can
be reclaimed.
I'm still thinking about the actual implementation; it may make more sense
to store the prevTxnID in the freeDB than in each page header. Ideally we want
to be able to grab a chunk of pageIDs unambiguously, instead of having to
iterate thru each page and read its header to determine if it's safe.
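A minimal sketch of the freeDB variant, assuming a hypothetical free_rec
record layout (the struct, field names, and function are illustrative, not
LMDB's actual structures):

```c
#include <stddef.h>

typedef unsigned long txnid_t;
typedef unsigned long pgno_t;

/* Hypothetical freeDB record: a chunk of pages freed in txnid, tagged
 * with the txnid of the previous change to their ancestor pages. */
typedef struct free_rec {
    txnid_t  txnid;      /* txn in which these pages were freed */
    txnid_t  prev_txnid; /* txn of the previous change to the ancestor */
    pgno_t  *pages;      /* the freed pageIDs */
    size_t   npages;
} free_rec;

/* Per the reasoning above: if the previous change to the ancestor is
 * at least as new as the oldest outstanding reader's txn, the reader
 * could never have seen these pages, so the chunk is reclaimable. */
static int reclaimable(const free_rec *fr, txnid_t oldest_reader)
{
    return fr->prev_txnid >= oldest_reader;
}
```

Keeping prevTxnID in the freeDB record makes this one comparison per chunk,
instead of reading each page's header.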
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Global modules and cn=config
by Quanah Gibson-Mount
Unfortunately, the current cn=config design makes it essentially impossible
to use global modules. For example, the pw-sha2 global module for adding
additional hashing schemes cannot be used with cn=config. This is because
the olcPasswordHash value is loaded up when cn=config is bootstrapped,
prior to loading the global module. This means that the value fails sanity
checking, and slapd aborts. See also ITS#7802.
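For concreteness, a hypothetical cn=config fragment showing the two entries
involved (attribute values are illustrative); the bootstrap parses the
olcGlobal entry, and sanity-checks its olcPasswordHash, before it ever
reads the module entry:

```ldif
# Parsed first, at bootstrap. The {SSHA256} scheme is only registered
# once pw-sha2 is loaded, so this value fails the sanity check:
dn: cn=config
objectClass: olcGlobal
cn: config
olcPasswordHash: {SSHA256}

# Parsed later -- too late for the value above:
dn: cn=module{0},cn=config
objectClass: olcModuleList
cn: module{0}
olcModuleLoad: pw-sha2.la
```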
Ideas on how to address this chicken-and-egg issue are welcome. ;)
--Quanah
--
Quanah Gibson-Mount
Architect - Server
Zimbra, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration
Indexing revisited
by Howard Chu
A few thoughts occurred to me today about our indexing code:
1) we compute a hash preset for each invocation, crunching the syntax and
matching rule's OID, among other things. (It used to be worse: we recomputed
this for each individual value, even though it's a constant.)
There's no need to recompute it on every invocation; we can compute it
once at first usage and reuse that result. It should speed up index
generation, particularly on smaller attribute values. I'm preparing a patch to
test this now.
2) using this precomputed hash, we can drop the syntax, mr, and prefix
arguments from the indexer function signature. That will also speed things up.
3) I note that the 'use' argument is also never used in our indexer
functions. Will drop this as well.
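The caching in point 1 can be sketched roughly like this (the context
struct, hash function, and names are all hypothetical, not slapd's actual
indexer API):

```c
/* Hypothetical stand-in for the indexer's hash preset: a seed derived
 * from the syntax OID and matching-rule OID, which are constant for a
 * given index, so it need only be computed once per context. */
typedef struct idx_ctx {
    const char   *syntax_oid;
    const char   *mr_oid;
    unsigned long preset;
    int           preset_valid;
} idx_ctx;

static unsigned long oid_hash(unsigned long h, const char *s)
{
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h;
}

/* Compute the preset on first use, then reuse the cached result. */
static unsigned long get_preset(idx_ctx *ctx)
{
    if (!ctx->preset_valid) {
        ctx->preset = oid_hash(oid_hash(5381, ctx->syntax_oid),
                               ctx->mr_oid);
        ctx->preset_valid = 1;
    }
    return ctx->preset;
}

/* Per-value work then only folds the value into the cached preset. */
static unsigned long hash_value(idx_ctx *ctx, const char *val)
{
    return oid_hash(get_preset(ctx), val);
}
```

With the preset cached in the context, the syntax, mr, and prefix arguments
(points 2 and 3) no longer need to travel through the indexer signature.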
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Compact log format for faster searching?
by Hallvard Breien Furuseth
Has anyone come up with a searchable, compact slapd-log format,
so the logs can be searched quicker than the current verbose format?
That is, the search tool would understand the compact format without
expanding it (fully) first, and only expand the final output.
Our syslog from loglevel "stats" compresses to 1/20 of the original,
but that's no help when we must search the entire uncompressed log.
It's easy to halve the log and still keep it mostly readable: Remove
syslog cruft and write it only when it changes, replace "conn= <op/fd>="
with base-32 "conn.<o/f>op", join up multiple SRCH attr= lines, etc.
Or down to 1/3 of the original in our case by replacing a few common
operations (filters, suffixes, etc.), but that quickly makes the result
unreadable without a tool to translate back and forth. So beyond that
point either such a translation tool is needed, or an entirely new,
human-readable, compact format.
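The base-32 conn/op token could look something like this (a hypothetical
encoding using digits 0-9a-v; the names and format are illustrative, not
an existing slapd convention):

```c
#include <stdio.h>

/* Render n in base 32 ("0-9a-v"), writing backwards into buf;
 * returns a pointer to the first digit. */
static char *b32(unsigned long n, char *buf, size_t len)
{
    static const char dig[] = "0123456789abcdefghijklmnopqrstuv";
    char *p = buf + len;
    *--p = '\0';
    do {
        *--p = dig[n % 32];
        n /= 32;
    } while (n);
    return p;
}

/* Compact token for a conn/op pair, e.g. conn=1000 op=2 -> "v8.2". */
static void conn_op_token(unsigned long conn, unsigned long op,
                          char *out, size_t len)
{
    char cb[16], ob[16];
    snprintf(out, len, "%s.%s",
             b32(conn, cb, sizeof cb), b32(op, ob, sizeof ob));
}
```

A search tool would match these tokens directly, expanding back to the
verbose conn=/op= form only for the final output.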
BTW, "perl -lne '/uid=xyzzy/i && print' log" is 10-15 times faster than
GNU "grep -i 'uid=xyzzy' log" on my Linux box.
And the initial zcat of a compressed log is faster than the perl/grep.
--
Hallvard