contextCSN of subordinate syncrepl DBs
by Rein Tollevik
I've been trying to figure out why syncrepl used on a backend that is
subordinate to a glue database with the syncprov overlay should save the
contextCSN in the suffix of the glue database rather than the suffix of
the backend where syncrepl is used. But all I come up with are reasons
why this should not be the case. So, unless anyone can enlighten me as
to what I'm missing, I suggest that this be changed.
The problem with the current design is that it makes it impossible to
reliably replicate more than one subordinate db from the same remote
server, as there are now race conditions where one of the subordinate
backends could save an updated contextCSN value that is picked up by the
other before it has finished its synchronization. An example of a
configuration where it might be necessary to replicate more than one
subordinate db from the same server is the central master described in
my previous posting:
http://www.openldap.org/lists/openldap-devel/200806/msg00041.html
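To make the setup concrete, the configuration I have in mind looks
roughly like this (a minimal sketch only; the suffixes, provider URL
and credentials are made up, and directory/index/rootdn directives are
omitted):

# the two subordinates, both replicated from the same central master
database        bdb
suffix          "dc=a,dc=example,dc=com"
subordinate
syncrepl        rid=001
                provider=ldap://master.example.com
                type=refreshAndPersist
                searchbase="dc=a,dc=example,dc=com"
                bindmethod=simple
                binddn="cn=repl,dc=example,dc=com"
                credentials=secret

database        bdb
suffix          "dc=b,dc=example,dc=com"
subordinate
syncrepl        rid=002
                provider=ldap://master.example.com
                type=refreshAndPersist
                searchbase="dc=b,dc=example,dc=com"
                bindmethod=simple
                binddn="cn=repl,dc=example,dc=com"
                credentials=secret

# the glue database with syncprov on top
database        bdb
suffix          "dc=example,dc=com"
overlay         syncprov

With the current code both syncrepl instances store their contextCSN in
"dc=example,dc=com", the suffix of the glue database, instead of in
their own suffixes.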
My idea as to how this race condition could be verified was to add
enough entries to one of the backends (while the consumer was stopped)
to make it possible to restart the consumer after the first backend had
saved the updated contextCSN but before the second had finished its
synchronization. But I was able to reproduce it by simply adding or
deleting an entry in one of the backends before starting the consumer.
Far too often, the backend without any changes was able to pick up and
save the updated contextCSN from the producer before syncrepl on the
second backend had fetched its initial value. I.e., it started with an
updated contextCSN and didn't receive the changes that had taken place
on the producer. If each syncrepl instance stored the value in the
suffix of its own database, they wouldn't interfere with each other
like this.
There is a similar problem in syncprov, as it must use the lowest
contextCSN value (with a given sid) saved by the syncrepl backends
configured within the subtree where syncprov is used. But to do that it
also needs to distinguish the contextCSN values of each syncrepl
backend, which it can't do when they all save them in the glue suffix.
This also implies that syncprov must ignore contextCSN updates from
syncrepl until all syncrepl backends have saved a value, and that
syncprov on the provider must send newCookie sync info messages when it
updates its contextCSN value and the changed entry isn't being
replicated to a consumer. I.e., as outlined in the message referred to
above.
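To illustrate the first point with made-up values: if the syncrepl
instance on "dc=a,dc=example,dc=com" has saved

contextCSN: 20080622120000.000000Z#000000#001#000000

while the one on "dc=b,dc=example,dc=com" has only reached

contextCSN: 20080622115500.000000Z#000000#001#000000

then syncprov on the glue suffix must advertise the older value for sid
001, since that is the only point up to which the whole glued tree is
known to be in sync. And it can only pick it if the two values are kept
apart.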
Neither of these changes should interfere with ordinary multi-master
configurations where syncrepl and syncprov are both used on the same
(glue) database.
I'll volunteer to implement and test the necessary changes if this is
the right solution. But to know whether my analysis is correct or not I
need feedback. So, comments please?
--
Rein Tollevik
Basefarm AS
dITStructureRules/nameForms in subschema subentry for informational purpose
by Michael Ströder
Hi!
Discussed this very briefly with Howard at LDAPcon 2007 based on an idea
of Steve:
Support for dITStructureRules and nameForms is still in OpenLDAP's TODO.
In the meantime, slapd could accept definitions for both in slapd.conf
and simply pass them on to schema-aware LDAP clients for informational
purposes, without enforcing them. Same function as rootDSE <file> in
slapd.conf.
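Just to make it concrete, a purely hypothetical example (the directive
names and OID are invented, since slapd has no such keywords today);
slapd would simply publish these values in the subschema subentry
without enforcing them:

nameform ( 1.1.1.1 NAME 'orgPersonNameForm'
        OC inetOrgPerson MUST uid )
ditstructurerule ( 1 NAME 'orgPersonStructureRule'
        FORM orgPersonNameForm )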
Opinions?
Ciao, Michael.
--
Michael Ströder
E-Mail: michael(a)stroeder.com
http://www.stroeder.com
slapadd performance degradation from 2.3.43 to 2.4.12
by Ralf Haferkamp
Hi,
While doing a few slapadd test runs comparing the RE23 and RE24
versions, I ran into a strange issue. I ran tests with different LDIFs
(100k, 500k and 1000k entries), and especially with the 500k and 1000k
LDIFs, slapadd from 2.3.43 was significantly faster than the 2.4.12
version.
2.3.43 loaded the 500k database in 13m54s, while it took 33m49s with
2.4.12. The 1000k testcase took 29m41s on 2.3 (still faster than the
500k on 2.4). I didn't finish the 2.4 run with the 1000k database; I
stopped it after about an hour.
I used exactly the same configuration on exactly the same hardware/OS
for the tests (an HP ProLiant DL580 G3 with four 3.33 GHz Xeons, 8GB of
RAM, SLES10-SP2, ext3 filesystem). The BerkeleyDB version was 4.5.20
with the following DB_CONFIG:
set_cachesize 0 4294967295 1
set_lg_regionmax 262144
set_lg_bsize 2097152
cachesize in slapd.conf was large enough to hold the entire database,
and tool-threads was set to 4.
I did some profiling (with valgrind's callgrind tool) to find out where
all the time is spent, and it revealed that 2.4 spends a significantly
larger amount of system time in the pwrite() function than 2.3. Most of
that seemed to come from bdb_tool_trickle_task(), which calls libdb's
memp_trickle() function.
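For those who haven't looked at it: as far as I can tell the task boils
down to something like the following (a rough sketch, not the actual
slapd code), which is where the pwrite() calls come from:

#include <db.h>         /* Berkeley DB 4.x */
#include <unistd.h>

/*
 * Sketch of a background trickle task: periodically ask Berkeley DB
 * to write out dirty pages until at least `percent' of the buffer
 * pool is clean.  Those writes are what shows up as pwrite() time.
 */
static void
trickle_loop( DB_ENV *dbenv, int percent, volatile int *stopflag )
{
        int nwrote;

        while ( !*stopflag ) {
                if ( dbenv->memp_trickle( dbenv, percent, &nwrote ) != 0 )
                        break;
                sleep( 1 );     /* pacing is illustrative only */
        }
}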
Just to verify this, I ran a test build with the trickle task disabled,
and slapadd's performance was back to a normal level, comparable to the
2.3.43 release.
AFAIK the trickle task was introduced into 2.4 to increase slapadd
throughput, but it has exactly the opposite effect on my test system.
Did anybody else have similar experiences? Or do you see anything
that's obviously wrong with my testcases?
--
Ralf
moduleload <statically linked module>
by Hallvard B Furuseth
Could we make "moduleload <statically linked module>" valid if the
module name contains no "." or directory separator? Or introduce a
new "module" keyword which works that way?
That would simplify slapd.conf scripting, and documentation examples
could simply include moduleload and would be correct without any
caveats about whether the modules are statically linked.
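Under such a change a documentation example could then just say, e.g.
(back_bdb picked arbitrarily):

moduleload      back_bdb

and it would be accepted whether back_bdb is compiled into slapd or
built as a dynamic module, while

moduleload      back_bdb.la

would keep its current meaning and always refer to a dynamic module
file.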
Also, moduleload seems to take an undocumented argument list which
is passed to the module init routine. See module.c:module_load().
That ought to work whether or not the module is dynamically loaded.
I don't think it does now.
--
Hallvard
OpenLDAP 2.4.x: change of shared library version every release
by Xin LI
Hi,
I just noticed that 2.4.12 and 2.4.13 both bumped the shared library
version. Did we actually change the API in a way that makes the version
bump necessary?
Cheers,
--
Xin LI <delphij(a)delphij.net> http://www.delphij.net/
FreeBSD - The Power to Serve!
Leak in syncprov_matchops()?
by Pierangelo Masarati
==30484== 1,371 bytes in 43 blocks are definitely lost in loss record 12 of 13
==30484== at 0x40053C0: malloc (vg_replace_malloc.c:149)
==30484== by 0x8258EFB: ber_memalloc_x (memory.c:226)
==30484== by 0x80A743C: ch_malloc (ch_malloc.c:54)
==30484== by 0x81747E5: bdb_entry_get (id2entry.c:421)
==30484== by 0x8104832: overlay_entry_get_ov (backover.c:365)
==30484== by 0x81FF53A: syncprov_matchops (syncprov.c:1151)
==30484== by 0x820175E: syncprov_op_mod (syncprov.c:1870)
==30484== by 0x8104F47: overlay_op_walk (backover.c:660)
==30484== by 0x810517C: over_op_func (backover.c:722)
==30484== by 0x81052AA: over_op_delete (backover.c:774)
==30484== by 0x80FA708: syncrepl_entry (syncrepl.c:2283)
==30484== by 0x80F5AC5: do_syncrep2 (syncrepl.c:877)
==30484== by 0x80F72C2: do_syncrepl (syncrepl.c:1301)
==30484== by 0x8086439: connection_read_thread (connection.c:1218)
==30484== by 0x8222E50: ldap_int_thread_pool_wrapper (tpool.c:663)
==30484== by 0xDEB46A: start_thread (in /lib/libpthread-2.5.so)
==30484== by 0x423FDBD: clone (in /lib/libc-2.5.so)
This was the result of a concurrency test while testing MMR. I'm not
sure how to reproduce it, but I've seen lots of deadlock resolutions,
so probably there's some exception path we're not handling correctly
with respect to memory allocation (HEAD as of today).
p.
Ing. Pierangelo Masarati
OpenLDAP Core Team
SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
-----------------------------------
Office: +39 02 23998309
Mobile: +39 333 4963172
Fax: +39 0382 476497
Email: ando(a)sys-net.it
-----------------------------------
valgrind complains about use of uninitialized var
by Pierangelo Masarati
==30484== Conditional jump or move depends on uninitialised value(s)
==30484== at 0x81740C2: bdb_entry_release (id2entry.c:271)
Apparently, boi_flags is not initialized. Where it should be
initialized, and how, is still obscure to me (HEAD as of today).
p.
Ing. Pierangelo Masarati
OpenLDAP Core Team
SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
-----------------------------------
Office: +39 02 23998309
Mobile: +39 333 4963172
Fax: +39 0382 476497
Email: ando(a)sys-net.it
-----------------------------------
Strange issue with contextCSN
by Pierangelo Masarati
I'm running concurrency tests of MMR, and I see some strange issues:
1) loss of sync after multi-concurrent load (multiple concurrent ops on
each server, and modifications to the same data subset on all servers,
in order to trigger conflicts). I'm still trying to see if there is any
pattern or clue about what failed (like finding some explanation in the
logs). This happens once in a while after many operations. I don't
expect this to be necessarily a bug; it might be the consequence of
conflicts. Of course, it would be nice if slapd allows to clearly
identify where the conflict occurred, to support manual resolution.
2) loss of sync after single-concurrent load (multiple concurrent ops on
a single server). This is really inexplicable (to me), as there should
be no conflict. The only possible explanation I see (but I need to
investigate further) is that an entry is added on a server, synced to
another one and, in the meantime, deleted on the first one before its
own sync gets back. This happens very seldom.
3) what puzzled me a bit is that when I load a single server, I'd expect
to end up with a single contextCSN containing the SID of that server.
This is correct for the server I load, but the others, even when they
get correctly synced, contain a contextCSN for each server in the MMR
pool, and the contextCSNs with the other SIDs don't get propagated to
the server that was loaded. It's not clear why those CSNs are generated,
and how they get into the loop and propagate between servers that do not
receive direct modifications.
4) another thing that puzzled me a bit is that in some cases, when all
servers are loaded and there is one contextCSN per SID, identical on all
of the servers, the values are ordered randomly, and differently on each
server; for example:
bash-3.2$ diff -u testrun/server2.out testrun/server3.out
--- testrun/server2.out 2008-11-22 17:17:28.000000000 +0100
+++ testrun/server3.out 2008-11-22 17:17:28.000000000 +0100
@@ -2497,8 +2497,8 @@
associatedDomain: example.com
entryCSN: 20081122161630.753152Z#000000#001#000000
contextCSN: 20081122161708.935242Z#000000#001#000000
-contextCSN: 20081122161658.195350Z#000000#002#000000
contextCSN: 20081122161658.193983Z#000000#003#000000
+contextCSN: 20081122161658.195350Z#000000#002#000000
Not a big deal (except for the need to sort values to compare them), but
I'd expect them to be exactly in the same order...
I'm going to pack up my suite of tests and put them on ftp.openldap.org
(and eventually add them to OpenLDAP's test suite, specifically meant to
test MMR), but first I need to polish them a little and single out those
that expose issues, in order to open specific ITSes.
p.
Ing. Pierangelo Masarati
OpenLDAP Core Team
SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
-----------------------------------
Office: +39 02 23998309
Mobile: +39 333 4963172
Fax: +39 0382 476497
Email: ando(a)sys-net.it
-----------------------------------