Previously I described problems when using this procedure to bootstrap a mirrormode pair with a large LDIF. I've reproduced it with a much smaller subset of the database, and there must be something wrong with the procedure or a bug.
What I do is:
1) Took an slapcat generated LDIF from a 2.3.x setup 2) Removed all entryCSN and contextCSN lines. 3) Ran "slapadd -S 1 -q -w -l ~/load_noCSN.ldif" on server-1 4) Did a "slapcat > toserver2.ldif" on server-1 5) Started server-1 and let applications create and modify objects. 6) Moved toserver2.ldif to server-2. 7) Ran slapadd -q -l toserver2.ldif on server-2 8) Started server-2
Now - I would expect the objects created on step 5 to appear after a while on server-2. They are not. However, objects created on server-1 after both servers has been started are replicated. I would suspect a problem with the index created with slapadd, (olcLinearIndex is FALSE), so syncrepl only finds entries created via LDAP. But I see no reason why. ldapsearch can read the entries.
/Peter
Peter Mogensen wrote:
What I do is:
- Took an slapcat generated LDIF from a 2.3.x setup
- Removed all entryCSN and contextCSN lines.
- Ran "slapadd -S 1 -q -w -l ~/load_noCSN.ldif" on server-1
- Did a "slapcat > toserver2.ldif" on server-1
- Started server-1 and let applications create and modify objects.
- Moved toserver2.ldif to server-2.
- Ran slapadd -q -l toserver2.ldif on server-2
- Started server-2
Now - I would expect the objects created on step 5 to appear after a while on server-2. They are not.
I have discovered, that after load on server-1, the first thing I usually to is to modify the userPassword attribute of the database root object. userPassword is not used by syncrepl, only by an client application. syncrepl uses cn=config which is rootdn in both cn=config and the database.
This of course changes the entryCSN of the root object: Before the modification: entryCSN: 20091202100930.380800Z#000000#001#000000 contextCSN: 20091202100930.696427Z#000000#001#000000
After the modification: contextCSN: 20091202100930.696427Z#000000#001#000000 userPassword:: BASE64 entryCSN: 20091202101058.188515Z#000000#001#000000 modifiersName: cn=config modifyTimestamp: 20091202101058Z
What surprises me is that contextCSN does not change. Shouldn't contextCSn change when entryCSN of the root object changes?
/Peter
Peter Mogensen wrote:
This of course changes the entryCSN of the root object: Before the modification: entryCSN: 20091202100930.380800Z#000000#001#000000 contextCSN: 20091202100930.696427Z#000000#001#000000
After the modification: contextCSN: 20091202100930.696427Z#000000#001#000000 userPassword:: BASE64 entryCSN: 20091202101058.188515Z#000000#001#000000 modifiersName: cn=config modifyTimestamp: 20091202101058Z
What surprises me is that contextCSN does not change. Shouldn't contextCSn change when entryCSN of the root object changes?
Ahh.. this is of course because the above is output from slapcat and contextCSN has not been synced to disk. ldapsearch gives the updated contextCSN.
However, when I start server-2 the updated contextCSN is replicated to server-2, but the entryCSN of the changed root object is NOT! and neither is any changes.
Like this (not the same values as above):
slapcat Server-1 after modification: contextCSN: 20091202104225.793759Z#000000#001#000000 userPassWord:: NEW_VALUE entryCSN: 20091202104340.501585Z#000000#001#000000
ldapsearch for contextCSN on Server-1 after modification: contextCSN: 20091202104340.501585Z#000000#001#000000
Start server-2 and slapcat: userPassword:: OLD_VALUE entryCSN: 20091202104225.457092Z#000000#001#000000 contextCSN: 20091202104340.501585Z#000000#001#000000
So changes made on server-1 before server-2 is started does still not propagate to server-2.
/Peter
Peter Mogensen wrote:
What I do is:
- Took an slapcat generated LDIF from a 2.3.x setup
- Removed all entryCSN and contextCSN lines.
- Ran "slapadd -S 1 -q -w -l ~/load_noCSN.ldif" on server-1
- Did a "slapcat > toserver2.ldif" on server-1
- Started server-1 and let applications create and modify objects.
- Moved toserver2.ldif to server-2.
- Ran slapadd -q -l toserver2.ldif on server-2
- Started server-2
Now - I would expect the objects created on step 5 to appear after a while on server-2. They are not. However, objects created on server-1 after both servers has been started are replicated.
Could someone confirm this procedure as being right or wrong?
If it is not meant to work I'll stop searching for the problem.
/Peter
Peter Mogensen wrote:
Peter Mogensen wrote:
What I do is:
- Took an slapcat generated LDIF from a 2.3.x setup
- Removed all entryCSN and contextCSN lines.
- Ran "slapadd -S 1 -q -w -l ~/load_noCSN.ldif" on server-1
- Did a "slapcat > toserver2.ldif" on server-1
- Started server-1 and let applications create and modify objects.
- Moved toserver2.ldif to server-2.
- Ran slapadd -q -l toserver2.ldif on server-2
- Started server-2
Now - I would expect the objects created on step 5 to appear after a while on server-2. They are not. However, objects created on server-1 after both servers has been started are replicated.
Could someone confirm this procedure as being right or wrong?
If it is not meant to work I'll stop searching for the problem.
More info: If I skip step 6/7 and let server-2 start with an empty database, it DOES actually replicate all the data from server-1, including changes made in step 5 and before step 8 (start of server-2).
So the problem must be in the way server-2 i loaded in step 7 or the way it's started.
Advice needed.
/Peter
Peter Mogensen wrote:
Peter Mogensen wrote:
Peter Mogensen wrote:
What I do is:
- Took an slapcat generated LDIF from a 2.3.x setup
- Removed all entryCSN and contextCSN lines.
- Ran "slapadd -S 1 -q -w -l ~/load_noCSN.ldif" on server-1
- Did a "slapcat > toserver2.ldif" on server-1
- Started server-1 and let applications create and modify objects.
- Moved toserver2.ldif to server-2.
- Ran slapadd -q -l toserver2.ldif on server-2
- Started server-2
Now - I would expect the objects created on step 5 to appear after a while on server-2. They are not. However, objects created on server-1 after both servers has been started are replicated.
Could someone confirm this procedure as being right or wrong?
If it is not meant to work I'll stop searching for the problem.
More info: If I skip step 6/7 and let server-2 start with an empty database, it DOES actually replicate all the data from server-1, including changes made in step 5 and before step 8 (start of server-2).
However... It's only the initial replication which works. If I start server-2 empty, it will catch up with server-1, but I then stop server-2, make a modification on server-1 and restart server-2 the change will not propagate to server-2, - but the contextCSN will.
Please tell me I'm doing something wrong. This problem is starting to make me dizzy.
/Peter
In general it seems server-2 won't find changes to server-1 made while server-2 is down... which kinda defeats the purpose of mirrormode.
The only thing I can see in the log, which makes me wonder is entries like:
bdb_index_read: failed (-30989)
Attached is a log-output (None,trace,sync) after stopping, server-2, making a change on server-1 and starting server-2.
/Peter
Dec 4 11:58:54 server1 slapd[25776]: do_bind: SASL/EXTERNAL bind: dn="cn=config" sasl_ssf=0 Dec 4 11:58:54 server1 slapd[25776]: send_ldap_response: msgid=1 tag=97 err=0 Dec 4 11:58:54 server1 slapd[25776]: <== slap_sasl_bind: rc=0 Dec 4 11:58:54 server1 slapd[25776]: connection_get(15): got connid=1011 Dec 4 11:58:54 server1 slapd[25776]: connection_read(15): checking for input on id=1011 Dec 4 11:58:54 server1 slapd[25776]: op tag 0x63, time 1259927934 Dec 4 11:58:54 server1 slapd[25776]: conn=1011 op=1 do_search Dec 4 11:58:54 server1 slapd[25776]: >>> dnPrettyNormal: <dc=app,dc=example,dc=com> Dec 4 11:58:54 server1 slapd[25776]: <<< dnPrettyNormal: <dc=app,dc=example,dc=com>, <dc=app,dc=example,dc=com> Dec 4 11:58:54 server1 slapd[25776]: => get_ctrls Dec 4 11:58:54 server1 slapd[25776]: => get_ctrls: oid="1.3.6.1.4.1.4203.1.9.1.1" (noncritical) Dec 4 11:58:54 server1 slapd[25776]: => get_ctrls: oid="2.16.840.1.113730.3.4.2" (critical) Dec 4 11:58:54 server1 slapd[25776]: <= get_ctrls: n=2 rc=0 err="" Dec 4 11:58:54 server1 slapd[25776]: => bdb_search Dec 4 11:58:54 server1 slapd[25776]: bdb_dn2entry("dc=app,dc=example,dc=com") Dec 4 11:58:54 server1 slapd[25776]: send_ldap_result: conn=1011 op=1 p=3 Dec 4 11:58:54 server1 slapd[25776]: srs csn 20091204114115.348793Z#000000#001#000000 Dec 4 11:58:54 server1 slapd[25776]: log csn 20091204114042.556305Z#000000#001#000000 Dec 4 11:58:54 server1 slapd[25776]: cmp -1, too old Dec 4 11:58:54 server1 slapd[25776]: log csn 20091204114115.348793Z#000000#001#000000 Dec 4 11:58:54 server1 slapd[25776]: cmp 0, too old Dec 4 11:58:54 server1 slapd[25776]: log csn 20091204115801.657341Z#000000#001#000000 Dec 4 11:58:54 server1 slapd[25776]: => bdb_search Dec 4 11:58:54 server1 slapd[25776]: bdb_dn2entry("dc=app,dc=example,dc=com") Dec 4 11:58:54 server1 slapd[25776]: search_candidates: base="dc=app,dc=example,dc=com" (0x00000001) scope=2 Dec 4 11:58:54 server1 slapd[25776]: => bdb_dn2idl("dc=app,dc=example,dc=com") Dec 4 11:58:54 server1 slapd[25776]: => bdb_equality_candidates (entryUUID) Dec 4 11:58:54 server1 slapd[25776]: => key_read Dec 4 11:58:54 server1 slapd[25776]: <= bdb_index_read 1 candidates Dec 4 11:58:54 server1 slapd[25776]: <= bdb_equality_candidates: id=1, first=1, last=1 Dec 4 11:58:54 server1 slapd[25776]: => bdb_presence_candidates (objectClass) Dec 4 11:58:54 server1 slapd[25776]: bdb_search_candidates: id=1 first=1 last=1 Dec 4 11:58:54 server1 slapd[25776]: send_ldap_result: conn=1011 op=1 p=3 Dec 4 11:58:54 server1 slapd[25776]: => bdb_search Dec 4 11:58:54 server1 slapd[25776]: bdb_dn2entry("dc=app,dc=example,dc=com") Dec 4 11:58:54 server1 slapd[25776]: search_candidates: base="dc=app,dc=example,dc=com" (0x00000001) scope=2 Dec 4 11:58:54 server1 slapd[25776]: => bdb_dn2idl("dc=app,dc=example,dc=com") Dec 4 11:58:54 server1 slapd[25776]: => bdb_equality_candidates (entryCSN) Dec 4 11:58:54 server1 slapd[25776]: => key_read Dec 4 11:58:54 server1 slapd[25776]: <= bdb_index_read: failed (-30989) Dec 4 11:58:54 server1 slapd[25776]: <= bdb_equality_candidates: id=0, first=0, last=0 Dec 4 11:58:54 server1 slapd[25776]: bdb_search_candidates: id=0 first=1 last=0 Dec 4 11:58:54 server1 slapd[25776]: bdb_search: no candidates Dec 4 11:58:54 server1 slapd[25776]: send_ldap_result: conn=1011 op=1 p=3 Dec 4 11:58:54 server1 slapd[25776]: => bdb_search Dec 4 11:58:54 server1 slapd[25776]: bdb_dn2entry("dc=app,dc=example,dc=com") Dec 4 11:58:54 server1 slapd[25776]: search_candidates: base="dc=app,dc=example,dc=com" (0x00000001) scope=2 Dec 4 11:58:54 server1 slapd[25776]: => bdb_dn2idl("dc=app,dc=example,dc=com") Dec 4 11:58:54 server1 slapd[25776]: => bdb_inequality_candidates (entryCSN) Dec 4 11:58:54 server1 slapd[25776]: => key_read Dec 4 11:58:54 server1 slapd[25776]: <= bdb_index_read 86 candidates Dec 4 11:58:54 server1 slapd[25776]: <= bdb_inequality_candidates: id=86, first=2, last=87 Dec 4 11:58:54 server1 slapd[25776]: bdb_search_candidates: id=-1 first=2 last=87 Dec 4 11:58:54 server1 slapd[25776]: send_ldap_result: conn=1011 op=1 p=3 Dec 4 11:58:54 server1 slapd[25776]: => bdb_search Dec 4 11:58:54 server1 slapd[25776]: bdb_dn2entry("dc=app,dc=example,dc=com") Dec 4 11:58:54 server1 slapd[25776]: search_candidates: base="dc=app,dc=example,dc=com" (0x00000001) scope=2 Dec 4 11:58:54 server1 slapd[25776]: => bdb_dn2idl("dc=app,dc=example,dc=com") Dec 4 11:58:54 server1 slapd[25776]: => bdb_inequality_candidates (entryCSN) Dec 4 11:58:54 server1 slapd[25776]: => key_read Dec 4 11:58:54 server1 slapd[25776]: <= bdb_index_read 1 candidates Dec 4 11:58:54 server1 slapd[25776]: => key_read Dec 4 11:58:54 server1 slapd[25776]: <= bdb_index_read: failed (-30989) Dec 4 11:58:54 server1 slapd[25776]: <= bdb_inequality_candidates: id=1, first=1, last=1 Dec 4 11:58:54 server1 slapd[25776]: => bdb_presence_candidates (objectClass) Dec 4 11:58:54 server1 slapd[25776]: bdb_search_candidates: id=1 first=1 last=1 Dec 4 11:58:54 server1 slapd[25776]: Entry dc=app,dc=example,dc=com changed by peer, ignored Dec 4 11:58:54 server1 slapd[25776]: send_ldap_result: conn=1011 op=1 p=3 Dec 4 11:58:54 server1 slapd[25776]: syncprov_search_response: cookie=rid=003,sid=001,csn=20091204115801.657341Z#000000#001#000000 Dec 4 11:58:54 server1 slapd[25776]: send_ldap_intermediate: err=0 oid=1.3.6.1.4.1.4203.1.9.1.4 len=64 Dec 4 11:58:54 server1 slapd[25776]: send_ldap_response: msgid=2 tag=121 err=0 Dec 4 11:58:54 server1 slapd[25776]: slap_listener_activate(9): Dec 4 11:58:54 server1 slapd[25776]: >>> slap_listener(ldaps://server1.example.com:636/)
--On Friday, December 04, 2009 1:12 PM +0100 Peter Mogensen apm@mutex.dk wrote:
In general it seems server-2 won't find changes to server-1 made while server-2 is down... which kinda defeats the purpose of mirrormode.
What openldap release are you using for this testing? I've quite lost track.
--Quanah
--
Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc -------------------- Zimbra :: the leader in open source messaging and collaboration
Quanah Gibson-Mount wrote:
--On Friday, December 04, 2009 1:12 PM +0100 Peter Mogensen apm@mutex.dk wrote:
In general it seems server-2 won't find changes to server-1 made while server-2 is down... which kinda defeats the purpose of mirrormode.
What openldap release are you using for this testing? I've quite lost track.
2.4.20 (on debian Lenny) same results with both DB4.6 and DB4.8
/Peter
openldap-technical@openldap.org