All,
I want to ask the list about this before I open an ITS, to make sure that I
am understanding everything correctly. We are running OpenLDAP 2.4.11. Because
we were losing replications, I selectively backported the fix for ITS#5709 to
our source. Applying it helped and reduced the number of lost replications.
We are running in mirror mode using refreshAndPersist, with a high volume of
adds to the master, on the order of 100 per second. We have run numerous
iterations of the same test with very aggressive NTP updates that keep the
master and consumer within 50 microseconds of one another, which I saw
recommended as a possible solution in a previous thread. This made little to
no difference in the replication loss.
From looking at the code, I suspect the lost replications may be caused by
CSNs being queued on the master in non-ascending order; in the logs, this
out-of-order queuing precedes the replications that the consumer rejects. My
theory is that the logic that traverses the queue to mark committed CSNs and
update the contextCSN gets out of sync because of this, orphaning replications
that are still pending: the consumer rejects them as too old, even though they
were never actually added to the consumer.
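The suspected failure mode can be reduced to a toy model. The Python sketch
that follows is a hypothetical simplification, not slapd code: it assumes the
master sends each change in the order it commits it, and that the consumer
drops any change whose CSN is not newer than its current contextCSN (the "CSN
too old" check). The class and function names are illustrative only; the
commit order is taken from the 14:45:02 slap_graduate_commit_csn lines in the
master log.

```python
from dataclasses import dataclass, field


@dataclass
class Consumer:
    """Hypothetical consumer: keeps one contextCSN and drops older changes."""
    context_csn: str = ""
    applied: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    def receive(self, csn):
        # Fixed-width CSN strings sort chronologically, so in this model a
        # plain string comparison stands in for the consumer's CSN check.
        if csn <= self.context_csn:
            self.rejected.append(csn)  # cf. "CSN too old, ignoring <csn>"
            return
        self.applied.append(csn)
        self.context_csn = csn


def replay(send_order):
    """Feed CSNs to a fresh consumer in the given order."""
    consumer = Consumer()
    for csn in send_order:
        consumer.receive(csn)
    return consumer


SUFFIX = "Z#000000#001#000000"
# Order in which the master graduated (committed) these CSNs at 14:45:02.
commit_order = [f"20081114194502.{us}{SUFFIX}" for us in ("917316", "917523", "917288")]

result = replay(commit_order)
print("applied: ", result.applied)
print("rejected:", result.rejected)
print("rejected if sent in ascending order:", replay(sorted(commit_order)).rejected)
```

In this model 917288 is rejected only because 917523 reached the consumer
first; the same three CSNs sent in ascending order are all applied. It
illustrates the hypothesis but does not show that slapd actually behaves this
way.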
I just pulled the latest code from RE24 and reran the test. It is better than
2.4.11 with only the ITS#5709 backport, but we are still losing a small
percentage of the replications with the “CSN too old” message. With the latest
code I still see a correlation between out-of-order queuing on the master and
the replications that are rejected on the consumer.
During this run NTP kept the two systems within 10 microseconds of each other,
using the most aggressive configurable sync interval of 16 seconds.
Below are log snippets and some of the relevant configuration information. If
you need more, please let me know and I will provide it.
### MASTER ###
Nov 14 14:43:05 ng04be03 slapd[7582]: slap_graduate_commit_csn: removing 0x2b4c9568b0 20081114194304.892065Z#000000#001#000000
Nov 14 14:43:05 ng04be03 slapd[7582]: slap_queue_csn: queing 0x42803100 20081114194305.078713Z#000000#001#000000
Nov 14 14:43:05 ng04be03 slapd[7582]: conn=14 op=17167 ADD dn="uniqueIdentifier=Evad_Added_tele_5450408582,ou=subscribers,ou=SINGP,o=ricuc.com"
Nov 14 14:43:05 ng04be03 slapd[7582]: slap_queue_csn: queing 0x4680b100 20081114194305.078878Z#000000#001#000000
Nov 14 14:43:05 ng04be03 slapd[7582]: slap_queue_csn: queing 0x43004100 20081114194305.078653Z#000000#001#000000
Nov 14 14:43:05 ng04be03 slapd[7582]: conn=12 op=13844 RESULT tag=105 err=0 text=
Nov 14 14:43:05 ng04be03 slapd[7582]: slap_graduate_commit_csn: removing 0x2b4c87e670 20081114194305.068251Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x41000100 20081114194502.917316Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: conn=10 op=19719 ADD dn="uniqueIdentifier=Evad_Added_tele_5450009858,ou=subscribers,ou=SINGP,o=ricuc.com"
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x43805100 20081114194502.917523Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x4780d100 20081114194502.917288Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: conn=12 op=17496 RESULT tag=105 err=0 text=
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_graduate_commit_csn: removing 0x2b5a7f8340 20081114194502.917316Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: conn=13 op=19983 ADD dn="uniqueIdentifier=Evad_Added_tele_5450509990,ou=subscribers,ou=SINGP,o=ricuc.com"
Nov 14 14:45:02 ng04be03 slapd[7582]: conn=10 op=19719 RESULT tag=105 err=0 text=
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_graduate_commit_csn: removing 0x2b5ae77160 20081114194502.917523Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: conn=14 op=19598 ADD dn="umbillingnumber=5450409797,uniqueIdentifier=Evad_Added_tele_5450409797,ou=subscribers,ou=SINGP,o=ricuc.com"
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x41000100 20081114194502.936884Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: conn=11 op=16763 RESULT tag=105 err=0 text=
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x43805100 20081114194502.947725Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_graduate_commit_csn: removing 0x2b5ad51170 20081114194502.917288Z#000000#001#000000
### CONSUMER ###
Nov 14 14:43:36 ng04be04 slapd[24622]: syncrepl_entry: rid=002 be_add (0)
Nov 14 14:43:36 ng04be04 slapd[24622]: do_syncrep2: cookie=rid=002,sid=002,csn=20081114194305.078653Z#000000#001#000000
Nov 14 14:43:36 ng04be04 slapd[24622]: do_syncrep2: rid=002 CSN too old, ignoring 20081114194305.078653Z#000000#001#000000
Nov 14 14:43:36 ng04be04 slapd[24622]: do_syncrep2: cookie=rid=002,sid=002
Nov 14 14:43:36 ng04be04 slapd[24622]: syncrepl_entry: rid=002 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
Nov 14 14:43:36 ng04be04 slapd[24622]: syncrepl_entry: rid=002 be_search (0)
Nov 14 14:45:39 ng04be04 slapd[24622]: slap_queue_csn: queing 0x2b4737c990 20081114194502.917523Z#000000#001#000000
Nov 14 14:45:39 ng04be04 slapd[24622]: slap_graduate_commit_csn: removing 0x2b473ca890 20081114194502.917523Z#000000#001#000000
Nov 14 14:45:39 ng04be04 slapd[24622]: do_syncrep2: cookie=rid=002,sid=002,csn=20081114194502.917288Z#000000#001#000000
Nov 14 14:45:39 ng04be04 slapd[24622]: do_syncrep2: rid=002 CSN too old, ignoring 20081114194502.917288Z#000000#001#000000
Nov 14 14:45:39 ng04be04 slapd[24622]: do_syncrep2: cookie=rid=002,sid=002,csn=20081114194502.936884Z#000000#001#000000
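To check the correlation over a longer run than these snippets, a small script
can pair the two logs. This is a sketch under assumptions: each log record is
on a single line, the regular expressions match the 2.4.x message text exactly
as quoted here (including slapd's "queing" spelling), and the helper names are
hypothetical. It flags every CSN that the master queued after a newer one,
reports how far it lags in microseconds, and lists the CSNs the consumer
dropped with "CSN too old".

```python
import re
from datetime import datetime, timedelta

# Patterns keyed to the 2.4.x messages quoted above ("queing" is slapd's spelling).
QUEUE_RE = re.compile(r"slap_queue_csn: queing \S+ (\S+)")
TOO_OLD_RE = re.compile(r"CSN too old, ignoring (\S+)")


def csn_time(csn):
    # The part of a CSN before "Z" is a UTC timestamp with microsecond resolution.
    return datetime.strptime(csn.split("Z", 1)[0], "%Y%m%d%H%M%S.%f")


def late_queued(master_lines):
    """Map each CSN queued after a newer CSN to its lag in microseconds."""
    newest = None
    late = {}
    for line in master_lines:
        m = QUEUE_RE.search(line)
        if not m:
            continue
        csn = m.group(1)
        if newest is not None and csn < newest:
            late[csn] = (csn_time(newest) - csn_time(csn)) // timedelta(microseconds=1)
        else:
            newest = csn
    return late


def rejected(consumer_lines):
    """CSNs the consumer dropped with "CSN too old"."""
    found = []
    for line in consumer_lines:
        m = TOO_OLD_RE.search(line)
        if m:
            found.append(m.group(1))
    return found


MASTER_LOG = """\
Nov 14 14:43:05 ng04be03 slapd[7582]: slap_queue_csn: queing 0x42803100 20081114194305.078713Z#000000#001#000000
Nov 14 14:43:05 ng04be03 slapd[7582]: slap_queue_csn: queing 0x4680b100 20081114194305.078878Z#000000#001#000000
Nov 14 14:43:05 ng04be03 slapd[7582]: slap_queue_csn: queing 0x43004100 20081114194305.078653Z#000000#001#000000
Nov 14 14:43:05 ng04be03 slapd[7582]: conn=12 op=13844 RESULT tag=105 err=0 text=
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x41000100 20081114194502.917316Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x43805100 20081114194502.917523Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x4780d100 20081114194502.917288Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x41000100 20081114194502.936884Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x43805100 20081114194502.947725Z#000000#001#000000
"""

CONSUMER_LOG = """\
Nov 14 14:43:36 ng04be04 slapd[24622]: do_syncrep2: rid=002 CSN too old, ignoring 20081114194305.078653Z#000000#001#000000
Nov 14 14:45:39 ng04be04 slapd[24622]: slap_queue_csn: queing 0x2b4737c990 20081114194502.917523Z#000000#001#000000
Nov 14 14:45:39 ng04be04 slapd[24622]: do_syncrep2: rid=002 CSN too old, ignoring 20081114194502.917288Z#000000#001#000000
"""

late = late_queued(MASTER_LOG.splitlines())
for csn in rejected(CONSUMER_LOG.splitlines()):
    if csn in late:
        print(f"{csn}: queued out of order, {late[csn]} us behind the newest CSN")
    else:
        print(f"{csn}: queued in order")
```

On these sample lines both rejected CSNs were queued out of order, lagging the
newest queued CSN by 225 and 235 microseconds. Two matches are a correlation,
not proof of cause; running the script over full logs would show whether every
rejection follows the same pattern.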
### Replication Config ###
dn: olcDatabase={2}hdb,cn=config
objectClass: olcDatabaseConfig
objectClass: olcHdbConfig
...
olcSyncrepl: {0}rid=2 provider=ldap://ldap.server.com
  bindmethod=simple timeout=0 network-timeout=0
  binddn="cn=Directory Manager,o=ricuc.com" credentials="secret" starttls=no
  filter="(objectclass=*)" searchbase="o=ricuc.com"
  scope=sub schemachecking=off type=refreshandpersist
  retry="60 +"
olcMirrorMode: TRUE

dn: olcOverlay={0}syncprov,olcDatabase={2}hdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: {0}syncprov
olcSpCheckpoint: 100 600
olcSpSessionlog: 100
### Hardware ###
Dual quad-core Xeon 2.83 GHz
32 GB RAM
8 x 15,000 rpm disks in RAID 10
Separate LUNs for the database and transaction logs
Kris Burton
Software Engineer
________________________________________
Acision. Innovation. Assured.
www.acision.com