RE24 Testing call #1 (OpenLDAP 2.4.36)
by Quanah Gibson-Mount
Current RE24 is ready for testing for the 2.4.36 release.
Thanks!
--Quanah
--
Quanah Gibson-Mount
Lead Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
Clusters and scaling, again
by Howard Chu
It's been a little over 4 years since we worked with MySQL, using their
NDBCluster engine to develop back-ndb, in the hope of producing a backend that
could leverage horizontally scaling systems. After Oracle acquired MySQL (via
its acquisition of Sun), they terminated the project.
NDB had some promise, although reports from the field indicate that folks felt
it was too difficult to configure and deploy. It appears that NDB has not
gained much traction anywhere in the past few years, and with Oracle cutting
off our access to MySQL/NDB developers, back-ndb was effectively killed.
In OpenLDAP 2.5 we'll be trying again, this time using HyperDex.
http://hyperdex.org.
HyperDex has a number of desirable properties - trivial deployment, automatic
self-balancing of data, guaranteed data durability, configurable fault
tolerance... There is an extension for it providing distributed transaction
support as well (which I haven't looked into yet). It's also quite fast, much
faster than the other commonly known NoSQL systems out there, and it's
reliable by design, unlike MongoDB (whose only consistency is that it
consistently loses data...)
I've been playing with it off and on for the past several months. I've also
written an LMDB backend for it, available on github.
https://github.com/hyc/HyperDex/tree/lmdb
So using HyperDex for an OpenLDAP backend has added appeal: it would be
OpenLDAP code at both ends of the stack.
If you're interested in working with NoSQL and large-scale databases with
OpenLDAP, you should definitely take a look at this. I suspect it will be
easier to write a slapd backend for this API than it was for NDB, particularly
since it's free of the rigid column layouts that NDB/MySQL required.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Please test back-mdb in main
by Quanah Gibson-Mount
The work on large transactions, nested transactions, and reader tables in
lmdb has reached the point where we'd like to confirm that people do not find
issues when running slapd-mdb, before rolling it into RE24.
If people could download and build the current master of OpenLDAP and run
whatever tests against slapd-mdb they can think of, in addition to the test
suite, particularly across multiple OSes, that would be much appreciated.
Thanks,
Quanah
--
Quanah Gibson-Mount
Lead Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
LMDB dead process detection
by Howard Chu
There's been a long-running discussion about the need for APIs in liblmdb
for displaying the reader table and clearing out stale slots. There are quite
a few open questions on the topic:
1) What should the API look like for examining the table?
My initial instinct is to provide an iterator function that returns info
about the next slot each time it's called. I'm not sure that's necessary or
the most convenient interface, though.
Another possibility is just a one-shot function that walks the table itself
and dumps the output as a formatted string to stdout, stderr, or a custom
output callback.
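
To make the second option concrete, here's a minimal sketch of what a
callback-based dump could look like. The names (mdb_reader_list, MDB_msg_func,
print_reader) are purely illustrative, not a committed liblmdb interface:

/* Illustrative sketch only -- not a committed liblmdb interface. */
#include <stdio.h>
#include "lmdb.h"   /* for MDB_env */

/* Callback invoked once per formatted line of reader-table output. */
typedef int (MDB_msg_func)(const char *msg, void *ctx);

/* Walk the reader table and pass one formatted line per slot
 * (pid, thread ID, reader txnID) to the caller-supplied callback.
 * Returns 0 on success, or the first nonzero value returned by func. */
int mdb_reader_list(MDB_env *env, MDB_msg_func *func, void *ctx);

/* Example callback that simply prints each line to stderr. */
static int print_reader(const char *msg, void *ctx)
{
    (void)ctx;
    return fputs(msg, stderr) < 0 ? -1 : 0;
}

The appeal of the callback form is that dumping to stdout/stderr is just a
trivial callback, while a monitoring tool can capture the same lines without
the library committing to any particular output destination.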
2) What should APIs look like for clearing out a stale slot?
Should it just be implicit inside the library, with no externally visible
API? I.e., should the library periodically check on its own, with no outside
intervention? Or should there be an API that lets a user explicitly request
that a particular slot be freed? The latter sounds pretty dangerous, since
freeing a slot that's actually still in use would allow a reader's view of the
DB to be corrupted.
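
One conservative middle ground, sketched below purely for illustration (the
name and signature are not a committed API): a single entry point that scans
the table and clears only those slots whose owning process can be shown to be
gone, so a caller can trigger cleanup but can never free a live slot.

/* Illustrative sketch only -- not a committed liblmdb interface. */
#include "lmdb.h"   /* for MDB_env */

/* Scan the reader table, clear any slots whose owning process no
 * longer exists, and report how many slots were cleared via *dead.
 * Returns 0 on success or an errno-style error code.  How "no longer
 * exists" is decided is the subject of question 3 below. */
int mdb_reader_check(MDB_env *env, int *dead);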
3) What approach should be used for automatic detection of stale slots?
Currently we record the process ID and thread ID of a reader in the table.
It's not clear to me that the thread ID has anything more than informational
value. Since we register a per-thread destructor for slots, exiting threads
should never be leaving stale slots in the first place. I'm also not sure that
there are good APIs for an outside caller to determine the liveness of a given
thread ID.
The process ID is also prone to wraparound; it's still very common for
Linux systems to use 15-bit process IDs. So just checking that a pid is still
alive doesn't guarantee that it's the same process that was using an LMDB
environment at any point in time. There are two main approaches to working
around this:
A) Set a byte-range lock for every process attached to the environment.
This is what slapd's alock.c (used with the BDB- and LDBM-based backends)
already does. The code is fairly portable, and has the desirable property
that file locks automatically go away when a process exits; a rough sketch of
such a check follows the caveats below. But:
a) On Windows, the OS can take several minutes to clean up the locks of
an exited process. So just checking for presence of a lock could erroneously
consider a process to be alive long after it had actually died.
b) File lock syscalls are fairly slow to execute. If we are checking
liveness frequently, there will be a noticeable performance hit. Their
performance also degrades exponentially with the number of processes locking
concurrently, and degrades further still if networked filesystems are involved.
c) This approach won't tell us if a process is in Zombie state.
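
For reference, here is roughly what the per-slot check under approach (A)
could look like, assuming a lock file with one byte per reader slot; the
helper names are made up for illustration. One more wrinkle shows up here:
F_GETLK never reports the calling process's own locks, so slots belonging to
the checking process have to be skipped.

/* Rough sketch of approach (A), assuming one byte per reader slot in
 * the lock file.  Illustrative only; error handling trimmed. */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* A reader claims slot `slot`: take a write lock on "our" byte.
 * The lock disappears automatically if this process exits. */
static int slot_lock(int lockfd, int slot)
{
    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = slot;
    fl.l_len = 1;
    return fcntl(lockfd, F_SETLK, &fl);
}

/* A writer checks liveness: is the slot's byte still locked?
 * (F_GETLK doesn't report our own locks, so skip our own slots.) */
static int slot_owner_alive(int lockfd, int slot)
{
    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = slot;
    fl.l_len = 1;
    if (fcntl(lockfd, F_GETLK, &fl) < 0)
        return -1;                       /* can't tell */
    return fl.l_type != F_UNLCK;         /* still locked => owner alive */
}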
B) Check the process ID and process start time.
This appears to be a fairly reliable approach, and reasonably fast, but there
is no standard POSIX API for obtaining this process information. Methods for
obtaining the info are fairly well documented across a variety of platforms
(AIX, HPUX, multiple BSDs, Linux, Solaris, etc.) but they are all different.
It appears that we can implement this compactly for each of the systems, but
it means carrying around a dozen or so different implementations.
Also, assuming we want to support shared LMDB access across NFS (as discussed
in an earlier thread), it seems we're going to have to use a lock-based
solution anyway, since process IDs won't be meaningful across host boundaries.
We can implement approach (A) fairly easily, with no major repercussions. For
(B) we would need to add a field to the reader table records to store the
process start time. (Thus a lockfile format change.)
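
As a data point for (B), the Linux variant is small: the start time is field
22 of /proc/<pid>/stat, in clock ticks since boot, and a slot is stale if the
pid no longer exists or its start time no longer matches the value stored in
the reader record. A rough sketch, Linux only (the AIX/HP-UX/BSD/Solaris
variants would each look different):

/* Rough Linux-only sketch for approach (B).  Illustrative only. */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Return 0 and store the process start time (clock ticks since boot)
 * in *start, or -1 if the pid doesn't exist or can't be parsed. */
static int proc_start_time(pid_t pid, unsigned long long *start)
{
    char path[64], buf[1024];
    FILE *f;
    char *p;
    int i;

    snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;              /* no such pid: definitely gone */
    p = fgets(buf, sizeof(buf), f);
    fclose(f);
    if (!p)
        return -1;

    /* comm (field 2) may contain spaces, so skip past its ')'. */
    p = strrchr(buf, ')');
    /* starttime is field 22; the token after ')' is field 3 (state),
     * so advance over 20 field separators to reach it. */
    for (i = 0; i < 20 && p; i++)
        p = strchr(p + 1, ' ');
    if (!p)
        return -1;
    return sscanf(p + 1, "%llu", start) == 1 ? 0 : -1;
}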
(Note: the relative performance of fcntl locks vs. checking process start
time was measured with some simple code on my laptop running Linux. These
operations are all highly OS-dependent, so the perf ratios may vary quite a
lot from system to system.)
The relative performance may not even be an issue in general, since we would
only need to trigger a scan if a writer actually finds that some reader txn is
preventing it from using free pages from the freeDB. Most of the time this
wouldn't be happening. But if there were a legitimate long-running read txn
(e.g., for mdb_env_copy) we might find ourselves checking fairly often.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
BerkeleyDB support EOL?
by Gavin Henry
Hi all,
Hope everyone is well?
What's the timescale for dropping BDB and defaulting to LMDB?
Kurt, what ever happened to that Google server?
Thanks.
--
Kind Regards,
Gavin Henry.
Managing Director.
T +44 (0) 1224 279484
M +44 (0) 7930 323266
F +44 (0) 1224 824887
E ghenry(a)suretec.co.uk
Open Source. Open Solutions(tm).
http://www.suretecsystems.com/
LMDB proposed changes
by Howard Chu
Summarizing some discussions from IRC...
The hardcoded limit on the size of the dirty page list in a transaction is a
problem; there should be no limit on the effective size of a transaction.
The plan is to change LMDB's disk page format to include the txnID in the page
header. This way, when the dirty page list gets full we can flush it to disk
without losing track of which pages were dirtied. Then if a subsequent access
in the same txn revisits one of these pages, when we read it back from the DB
we'll know that it came from the current txn and doesn't need to be copied
again before making further modifications.
The P_DIRTY bit in the page header will no longer be needed - if the txnID
matches, the page can be used directly. If not, the page is clean and a new
page must be allocated before writing.
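
In outline, and purely as an illustration (the struct and field names below
are not the actual liblmdb page layout), the change amounts to this:

/* Illustrative outline only -- not the actual liblmdb page layout. */
#include <stdint.h>

typedef uint64_t txnid_t;

typedef struct page_hdr {
    uint64_t pgno;        /* page number */
    uint16_t flags;       /* P_BRANCH, P_LEAF, ... -- no P_DIRTY */
    uint16_t lower;       /* free space bounds (details elided) */
    uint16_t upper;
    uint16_t pad;
    txnid_t  txnid;       /* NEW: ID of the txn that last wrote this page */
} page_hdr;

/* Can this page be modified in place by write txn `cur`?
 * If it carries the current txnID it was already dirtied (and possibly
 * flushed early because the dirty list filled up) by this txn; otherwise
 * it belongs to an older snapshot and must be copied to a freshly
 * allocated page before writing. */
static int page_writable_in_place(const page_hdr *p, txnid_t cur)
{
    return p->txnid == cur;
}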
For WRITEMAP mode the dirty page list can be eliminated entirely; the only
reason we keep it now is to know which pages' P_DIRTY bits we need to clear at
commit time.
Increasing the size of the page header by 8 bytes is a bit annoying, since it
will require a full slapcat/slapadd reload of existing back-mdb databases. It
would be nice if we could avoid this, but I don't see how.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
NFS-shared LMDB?
by Howard Chu
It occurs to me that there is the potential to support an interesting use
case with LMDB when the database resides on remote shared storage. In the
context of slapd, you could run multiple read-only slapds concurrently with a
single read-write slapd on a single database.
The current liblmdb would need a couple of small modifications to make this
safe: an option to take an fcntl() lock when obtaining a reader slot, and an
msync() after writing to a reader slot, to force reader lock table changes
back to the server before proceeding with a read txn.
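
A rough sketch of those two changes, assuming the reader table lives in an
mmap'ed lockfile and using one lock byte per slot; the names and layout here
are illustrative, not actual liblmdb code:

/* Illustrative sketch only -- not actual liblmdb code. */
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct reader_slot {
    pid_t    pid;
    uint64_t txnid;       /* snapshot this reader is holding open */
} reader_slot;

static int claim_reader_slot(int lockfd, reader_slot *table, int slot,
    long pagesize, uint64_t txnid)
{
    struct flock fl;
    char *page;

    /* 1) fcntl() byte-range lock on this slot, visible to every host
     * through the NFS server's lock manager. */
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = slot;
    fl.l_len = 1;
    if (fcntl(lockfd, F_SETLK, &fl) < 0)
        return -1;            /* slot owned by another process/host */

    /* 2) record our pid and read txnid, then msync() the containing
     * page so the change reaches the server before the txn proceeds. */
    table[slot].pid = getpid();
    table[slot].txnid = txnid;
    page = (char *)((uintptr_t)&table[slot] & ~(uintptr_t)(pagesize - 1));
    return msync(page, (size_t)pagesize, MS_SYNC);
}

The msync() is the important part: on a local filesystem the lock table only
ever needs to be visible through shared memory, but over NFS the writer's host
can only see a reader's slot once that page has been pushed back to the server.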
With an appropriate sharding director (like the feature recently added to
back-meta) you could arrange so that each slapd instance serves reads for a
distinct portion of the overall database. Then each host's memory would be
caching a distinct set of data, maximizing cache effectiveness. The DB size
could then grow arbitrarily large, and you simply add more machines/RAM/slapds
as needed to keep serving from cache.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/