contextCSN of subordinate syncrepl DBs
by Rein Tollevik
I've been trying to figure out why syncrepl used on a backend that is
subordinate to a glue database with the syncprov overlay should save the
contextCSN in the suffix of the glue database rather than the suffix of
the backend where syncrepl is used. But all I come up with are reasons
why this should not be the case. So, unless anyone can enlighten me as
to what I'm missing, I suggest that this be changed.
The problem with the current design is that it makes it impossible to
reliably replicate more than one subordinate db from the same remote
server, as there are race conditions where one of the subordinate
backends could save an updated contextCSN value that is picked up by the
other before it has finished its synchronization. An example of a
configuration where replicating more than one subordinate db from the
same server might be necessary is the central master described in my
previous posting:
http://www.openldap.org/lists/openldap-devel/200806/msg00041.html
My idea as to how this race condition could be verified was to add
enough entries to one of the backends (while the consumer was stopped)
to make it possible to restart the consumer after the first backend had
saved the updated contextCSN but before the second had finished its
synchronization. But I was able to produce it by simply adding or
deleting an entry in one of the backends before starting the consumer.
Far too often, the backend without any changes was able to pick up and
save the updated contextCSN from the producer before syncrepl on the
second backend had fetched its initial value. I.e., the second backend
started with an already-updated contextCSN and never received the
changes that had taken place on the producer. If each syncrepl instance
stored the values in the suffix of its own database, they wouldn't
interfere with each other like this.
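To make the bad ordering concrete, here is a toy C sketch of the
timeline; every name in it is made up for illustration, none of it is
slapd code:

#include <stdio.h>
#include <string.h>

static char glue_csn[64] = "";      /* contextCSN kept in the glue suffix */

static const char *producer_csn =   /* latest contextCSN on the producer */
    "20080620120500.000000Z#000000#000#000000";

int main(void)
{
    char cookie_db2[64];

    /* db1 has no pending changes: its refresh finishes at once and it
     * stores the producer's current contextCSN -- in the shared glue
     * suffix. */
    strcpy(glue_csn, producer_csn);

    /* db2 starts its refresh a moment later.  Its starting cookie comes
     * from the same glue suffix, so it already equals the producer's CSN
     * and the add/delete done on the producer for db2's subtree is never
     * fetched. */
    strcpy(cookie_db2, glue_csn);
    printf("db2 starts from %s and misses the change\n", cookie_db2);

    /* With the contextCSN stored per backend, db2 would have started from
     * its own (older or empty) value and pulled the missing change. */
    return 0;
}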
There is a similar problem in syncprov, as it must use the lowest
contextCSN value (with a given SID) saved by the syncrepl backends
configured within the subtree where syncprov is used. But to do that it
also needs to distinguish the contextCSN values of each syncrepl
backend, which it can't do when they all save them in the glue suffix.
This also implies that syncprov must ignore contextCSN updates from
syncrepl until all syncrepl backends have saved a value, and that
syncprov on the provider must send newCookie sync info messages when it
updates its contextCSN value and the changed entry isn't being
replicated to a consumer. I.e., as outlined in the message referred to
above.
Neither of these changes should interfere with ordinary multi-master
configurations where syncrepl and syncprov are both used on the same
(glue) database.
I'll volunteer to implement and test the necessary changes if this is
the right solution. But to know whether my analysis is correct or not I
need feedback. So, comments please?
--
Rein Tollevik
Basefarm AS
contextCSN interaction between syncrepl and syncprov
by Rein Tollevik
The remaining errors and race condition that test058 demonstrates cannot
be solved unless syncrepl is changed to always store the contextCSN in
the suffix of the database where it is configured, not the suffix of its
glue database as it does today.
Assuming serverID 0 is reserved for the single-master case, syncrepl and
syncprov can in that case only be configured within the same database
context if syncprov is a pure forwarding server, i.e., it will not update
any CSN value and syncrepl has no need to fetch any values from it.
In the multi-master case, syncprov maintains only the contextCSN whose
SID matches the current serverID; the others are all received by
syncrepl. So, the only time syncrepl should need an updated CSN from
syncprov is when it is about to present it to its peer, i.e., when it
initiates a refresh phase. Actually, a race condition that would render
the state of the database undetermined could occur if syncrepl fetched
an updated CSN from syncprov during the initial refresh phase. So, it
should be sufficient to read the contextCSN values from the database
before a new refresh phase is initiated, independent of whether syncprov
is in use or not.
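A sketch of the ordering I have in mind, using stand-in functions only
(none of this is the real syncrepl code, the helpers are hypothetical):

#include <stdio.h>
#include <string.h>

/* stand-in for reading the contextCSN stored in this backend's own
 * suffix entry; returns 0 and fills csn if a value exists */
static int read_own_suffix_csn(char *csn, size_t len)
{
    strncpy(csn, "20090601080000.000000Z#000000#002#000000", len - 1);
    csn[len - 1] = '\0';
    return 0;
}

/* stand-in for sending the refresh request with the given cookie */
static void send_refresh(const char *cookie)
{
    printf("refresh starts with cookie: %s\n",
           cookie[0] ? cookie : "(empty: initial refresh)");
}

int main(void)
{
    char cookie[64] = "";

    /* Read the stored value immediately before initiating the refresh;
     * syncprov is never consulted for a possibly newer CSN.  An absent
     * value simply means an initial (full) refresh. */
    if (read_own_suffix_csn(cookie, sizeof(cookie)) != 0)
        cookie[0] = '\0';
    send_refresh(cookie);
    return 0;
}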
Syncrepl will receive updates to the contextCSN value with its own SID
from its peers, at least with ITS#5972 and ITS#5973 in place. I.e., the
normal ignoring of updates tagged with a too-old contextCSN value will
continue to work. It should also be safe to ignore all updates tagged
with a contextCSN or entryCSN value whose SID is the current server's
non-zero serverID, provided a complete refresh cycle is known to have
taken place, i.e., when a contextCSN value with the current non-zero
serverID was read from the database before the refresh phase started, or
after the persistent phase has been entered.
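For illustration, a small self-contained sketch of that filter,
assuming the 2.4-style CSN layout
YYYYmmddHHMMSS.uuuuuuZ#count#sid#mod, with the SID as the third
'#'-separated field:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Extract the SID field of a CSN; returns -1 if the value is malformed. */
static int csn_sid(const char *csn)
{
    const char *p = strchr(csn, '#');       /* -> #count */
    if (p) p = strchr(p + 1, '#');          /* -> #sid   */
    if (!p) return -1;
    return (int)strtol(p + 1, NULL, 16);
}

/* Drop an incoming update when its CSN carries our own non-zero serverID
 * and a complete refresh cycle is known to have taken place. */
static int ignore_update(const char *csn, int my_sid, int refresh_done)
{
    return my_sid != 0 && refresh_done && csn_sid(csn) == my_sid;
}

int main(void)
{
    const char *csn = "20090807081604.478804Z#000000#001#000000";
    printf("sid=%d ignore=%d\n", csn_sid(csn), ignore_update(csn, 1, 1));
    return 0;
}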
The state of the database will be undetermined unless an initial refresh
(i.e., starting from an empty database or CSN set) has been run to
completion. I cannot see how this can be avoided, and as far as I know
that is already the case today. It might be worth mentioning in the
documentation, though (unless it already is).
Syncprov must continue to monitor the contextCSN updates from syncrepl.
When it receives updates destined for the suffix of the database where it
itself is configured, it must replace any CSN value whose SID matches its
own non-zero serverID with the value it manages itself (which should be
greater than or equal to the value syncrepl tried to store, unless
something is seriously wrong). Updates to "foreign" contextCSN values
(i.e., those with a SID not matching the current non-zero serverID)
should be imported into the set of contextCSN values syncprov itself
maintains. Syncprov could also short-circuit the contextCSN update and
delay it to its own checkpoint. I'm not sure what effect the checkpoint
feature has today when syncrepl constantly updates the contextCSN.
Syncprov must, when syncrepl updates the contextCSN in the suffix of a
subordinate DB, update its own knowledge of the "foreign" CSNs to be the
*lowest* CSN with any given SID stored across all the subordinate DBs
(where syncrepl is configured). And no update must take place unless a
contextCSN value has been stored in *all* the syncrepl-enabled
subordinate DBs. Any values matching the current non-zero serverID
should be updated in this case too, but a new value should probably not
be inserted.
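A self-contained sketch of that rule follows; the data layout is
hypothetical, not the actual syncprov structures, and it relies on
fixed-width CSN strings comparing correctly with strcmp():

#include <stdio.h>
#include <string.h>

#define NDBS 2

/* contextCSN (for one SID) currently stored in each syncrepl-enabled
 * subordinate DB; an empty string means that DB has no value yet. */
static const char *db_csn[NDBS] = {
    "20090807081604.478804Z#000000#002#000000",
    "20090807081610.123456Z#000000#002#000000",
};

/* Return the value syncprov may publish for this SID, or NULL if some
 * subordinate DB has not stored one yet (publish nothing in that case). */
static const char *lowest_csn(void)
{
    const char *low = NULL;
    int i;

    for (i = 0; i < NDBS; i++) {
        if (db_csn[i] == NULL || db_csn[i][0] == '\0')
            return NULL;                /* not all DBs have a value */
        if (low == NULL || strcmp(db_csn[i], low) < 0)
            low = db_csn[i];
    }
    return low;
}

int main(void)
{
    const char *csn = lowest_csn();
    printf("publishable contextCSN: %s\n", csn ? csn : "(none yet)");
    return 0;
}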
These changes should (unless I'm completely lost, that is..) create a
cleaner interface between syncrepl and syncprov without harming the
current multi-master configurations, and make asymmetric multi-master
configurations like the one in test058 work. Comments please?
Rein
back-mdb - futures...
by Howard Chu
Just some thoughts on what I'd like to see in a new memory-based backend...
One of the complaints about back-bdb/hdb is the complexity in the tuning;
there are a number of different components that need to be balanced against
each other and the proper balance point varies depending on data size and
workload. One of the directions we were investigating a couple years back was
mechanisms for self-tuning of the caches. (This was essentially the thrust of
Jong-Hyuk Choi's work with zoned allocs for the back-bdb entry cache; it would
allow large chunks of the entry cache to be discarded on demand when system
memory pressure increased.) Unfortunately Jong hasn't been active on the
project in a while and it doesn't appear that anyone else was tracking that
work. Self-tuning is still a goal but it seems to me to be attacking the wrong
problem.
One of the things that annoys me with the current BerkeleyDB based design is
that we have 3 levels of cache operating at all times - filesystem, BDB, and
slapd. This means at least 2 memory copy operations to get any piece of data
from disk into working memory, and you have to play games with the OS to
minimize the waste in the FS cache. (E.g. on Linux, tweak the swappiness setting.)
Back in the 80s I spent a lot of time working on the Apollo DOMAIN OS, which
was based on the M68K platform. One of their (many) claims to fame was the
notion of a single-level store: the processor architecture supported a full 32
bit address space but it was uncommon for systems to have more than 24 bits
worth of that populated, and nobody had anywhere near 1GB of disk space on
their entire network. As such, every byte of available disk space could be
directly mapped to a virtual memory address, and all disk I/O was done thru
mmaps and demand paging. As a result, memory management was completely unified
and memory usage was extremely efficient.
These days you could still take that sort of approach, though on a 32 bit
machine a DB limit of 1-2GB may not be so useful any more. However, with the
ubiquity of 64 bit machines, the idea becomes quite attractive again.
The basic idea is to construct a database that is always mmap'd to a fixed
virtual address, and which returns its mmap'd data pages directly to the
caller (instead of copying them to a newly allocated buffer). Given a fixed
address, it becomes feasible to make the on-disk record format identical to
the in-memory format. Today we have to convert from a BER-like encoding into
our in-memory format, and while that conversion is fast it still takes up a
measurable amount of time. (Which is one reason our slapd entry cache is still
so much faster than just using BDB's cache.) So instead of storing offsets
into a flattened data record, we store actual pointers (since they all simply
reside in the mmap'd space).
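To illustrate the idea, here is a toy sketch of such a single-level
store; the fixed address is an arbitrary pick and the error handling is
minimal, so treat it as a cartoon of the concept rather than a design:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define DB_ADDR ((void *)0x200000000000UL) /* fixed map address (sketch) */
#define DB_SIZE (1UL << 20)

/* A record whose string field is a real pointer into the map.  Because
 * the file is always mapped at DB_ADDR, the pointer stays valid across
 * restarts with no offset-to-pointer conversion on load. */
struct record {
    char *name;             /* points into the mapped region */
    unsigned long value;
};

int main(void)
{
    int fd = open("toy.db", O_RDWR | O_CREAT, 0600);
    if (fd < 0 || ftruncate(fd, DB_SIZE) < 0)
        return 1;

    /* MAP_FIXED silently replaces whatever was mapped there; a real
     * engine would reserve this region carefully. */
    char *base = mmap(DB_ADDR, DB_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_FIXED, fd, 0);
    if (base == MAP_FAILED)
        return 1;

    /* Lay out one record followed by its string, all inside the map. */
    struct record *r = (struct record *)base;
    char *str = base + sizeof(*r);
    strcpy(str, "cn=example");
    r->name = str;              /* an absolute pointer, stored on disk */
    r->value = 42;

    msync(base, DB_SIZE, MS_SYNC);  /* the page cache is the only cache */
    printf("%s = %lu\n", r->name, r->value);
    return 0;
}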
Using this directly mmap'd approach immediately eliminates the 3 layers of
caching and brings it down to 1. As another benefit, the DB would require
*zero* cache configuration/tuning - it would be entirely under the control of
the OS memory manager, and its resident set size would grow or shrink
dynamically without any outside intervention.
It's not clear to me that we can modify BDB to operate in this manner. It
currently supports mmap access for read-only DBs, but it doesn't map to fixed
addresses and still does alloc/copy before returning data to the caller.
Also, while BDB development continues, the new development is mainly occurring
in areas that don't matter to us (e.g. BDB replication) and the areas we care
about (B-tree performance) haven't really changed much in quite a while. I've
mentioned B-link trees a few times before on this list; they have much lower
lock contention than plain B-trees and thus can support even greater
concurrency. I've also mentioned them to the BDB team a few times and as yet
they have no plans to implement them. (Here's a good reference:
http://www.springerlink.com/content/eurxct8ewt0h3rxm/ )
As such, it seems likely that we would have to write our own DB engine to
pursue this path. (Clearly such an engine must still provide full ACID
transaction support, so this is a non-trivial undertaking.) Whether and when
we embark on this is unclear; this is somewhat of an "ideal" design and as
always, "good enough" is the enemy of "perfect" ...
This isn't a backend we can simply add to the current slapd source base, so
it's probably an OpenLDAP 3.x target: In order to have a completely canonical
record on disk, we also need pointers to AttributeDescriptions to be recorded
in each entry and those AttributeDescription pointers must also be persistent.
Which means that our current AttributeDescription cache must be modified to
also allocate its records from a fixed mmap'd region. (And we'll have to
include a schema-generation stamp, so that if schema elements are deleted we
can force new AD pointers to be looked up when necessary.) (Of course, given
the self-contained nature of the AD cache, we can probably modify its behavior
in this way without impacting any other slapd code...)
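A toy sketch of the generation-stamp idea, with made-up types (this is
not the actual AttributeDescription or AD cache code):

#include <stdio.h>
#include <string.h>

struct ad {
    const char   *name;
    unsigned long generation;  /* schema generation when this AD was made */
};

static unsigned long schema_generation = 1;

/* toy stand-in for the AD cache lookup by name */
static struct ad cn_ad = { "cn", 1 };
static struct ad *ad_find(const char *name)
{
    if (strcmp(name, "cn") == 0) {
        cn_ad.generation = schema_generation;
        return &cn_ad;
    }
    return NULL;
}

/* Revalidate a stored AD pointer against the current schema generation. */
static struct ad *ad_revalidate(struct ad *stored, const char *name)
{
    if (stored && stored->generation == schema_generation)
        return stored;          /* fast path: pointer still trusted */
    return ad_find(name);       /* schema changed: look it up again */
}

int main(void)
{
    struct ad *stored = ad_find("cn");
    schema_generation++;        /* e.g. a schema element was deleted */
    printf("revalidated: %s\n", ad_revalidate(stored, "cn")->name);
    return 0;
}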
There's also a potential risk to leaving all memory management up to the OS -
the native memory manager on some OS's (e.g. Windows) is abysmal, and the
CLOCK-based cache replacement code we now use in the entry cache is more
efficient than the LRU schemes that some older OS versions use. So we may get
into this and decide we still need to play games with mlock() etc. to control
the cache management. That would be an unfortunate complication, but it would
still allow us to do simpler tuning than we currently need. Still,
establishing a 1:1 correspondence between virtual memory addresses and disk
addresses is a big win for performance, scalability, and reduced complexity
(== greater reliability)...
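If we do end up playing those games, the basic building block is just
mlock() on the hot part of the map; a minimal sketch:

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 1UL << 20;     /* pin 1MB of the "hot" region */
    void *hot = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (hot == MAP_FAILED)
        return 1;

    /* mlock keeps these pages resident regardless of the OS's
     * replacement policy; munlock hands control back to the kernel. */
    if (mlock(hot, len) != 0)
        perror("mlock");

    munlock(hot, len);
    munmap(hot, len);
    return 0;
}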
(And yes, by the way, we have planning for LDAPCon2009 this September in the
works; I imagine the Call For Papers will go out in a week or two. So now's a
good time to pull up whatever other ideas you've had in the back of your mind
for a while...)
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
RE24 testing call (2.4.17) round 2
by Quanah Gibson-Mount
Please test RE24 and report any issues. All known regressions are now
believed fixed. Thanks!
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
Re: commit: ldap/servers/slapd/overlays ppolicy.c
by Howard Chu
hyc@OpenLDAP.org wrote:
> Update of /repo/OpenLDAP/pkg/ldap/servers/slapd/overlays
>
> Modified Files:
> ppolicy.c 1.125 -> 1.126
>
> Log Message:
> Fix check_password with {cleartext} passwords
As I note in the added comment, the check_password interface kinda sucks here.
Instead of making the external module malloc its error message string, we
should have passed in a struct berval with a buffer and length(buffer) already
set. I'd like to change this for 2.5; it will also simplify ITS#6082.
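To sketch what I mean (this is the proposed shape only; apart from
struct berval the names are hypothetical):

#include <lber.h>       /* struct berval, from the OpenLDAP headers */
#include <stdio.h>
#include <string.h>

/* The caller hands in a berval whose bv_val points at a caller-owned
 * buffer and whose bv_len is that buffer's size; the module writes its
 * message in place instead of mallocing one. */
static int my_check_password(const char *passwd, struct berval *errmsg,
                             const void *entry)
{
    (void)entry;
    if (strlen(passwd) < 8) {
        errmsg->bv_len = snprintf(errmsg->bv_val, errmsg->bv_len,
                                  "password is shorter than 8 characters");
        return 1;       /* reject */
    }
    errmsg->bv_len = 0;
    return 0;           /* accept */
}

int main(void)
{
    char buf[128];
    struct berval msg = { sizeof(buf), buf };

    int rc = my_check_password("short", &msg, NULL);
    printf("rc=%d msg=%s\n", rc, rc ? msg.bv_val : "(none)");
    return 0;
}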
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/