All,
I want to ask the list about this before I open an ITS, to make sure I am understanding everything correctly. We are running OpenLDAP 2.4.11. I selectively backported the fix for ITS 5709 to our source because we were losing replications. Applying it seemed to help and reduced the number of lost replications. We are running in mirror mode using refreshAndPersist and doing a high volume of adds to the master, on the order of 100/s. We have run numerous iterations of the same test with very aggressive NTP updates that keep the master and consumer within 50 microseconds of one another, which I saw recommended as a possible solution in a previous message thread. This made little to no difference in the replication loss.
From looking at the code, I suspected that the lost replications might be due to entries being queued on the master in non-ascending CSN order; I was seeing this immediately before each replication that was then rejected on the consumer. My theory is that the logic that traverses the queue to mark committed CSNs and update the contextCSN gets out of sync because of this and orphans replications that are still pending: they are treated as too old, when in reality they have never been added to the consumer.
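To make the suspected failure mode concrete, here is a minimal Python sketch. It is not slapd code: it assumes the consumer simply ignores any CSN that is not newer than the newest one it has already accepted for that server ID, and the function name `replay` is made up for illustration. The three CSNs are the ones queued on the master at 14:43:05 in the log further down, in queue order.

```python
def replay(csns_in_arrival_order):
    """Split CSNs into (accepted, rejected) under the ASSUMED consumer rule:
    a CSN is ignored unless it is newer than the newest CSN accepted so far."""
    newest = ""
    accepted, rejected = [], []
    for csn in csns_in_arrival_order:
        # CSNs here share one fixed-width format, so plain string
        # comparison orders them by timestamp.
        if csn > newest:
            newest = csn
            accepted.append(csn)
        else:
            rejected.append(csn)
    return accepted, rejected


# Queue order taken from the master log at 14:43:05.
queued = [
    "20081114194305.078713Z#000000#001#000000",
    "20081114194305.078878Z#000000#001#000000",
    "20081114194305.078653Z#000000#001#000000",  # older, but queued last
]

accepted, rejected = replay(queued)
print("rejected:", rejected)
```

Under that assumption the third CSN is the one that gets dropped, which is the same CSN the consumer reports as too old.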
I just pulled the latest code from RE24 and reran the test. The latest code does better than 2.4.11 with just the backport of ITS 5709, but we are still losing a small percentage of the replications with the "CSN too old" message. With the latest code I still see a correlation between the out-of-order queuing on the master and the replications that are rejected on the consumer.
During this run NTP kept the two systems within 10 microseconds of each other, using the most aggressive sync interval that can be configured, 16 seconds.
Below are log snippets and some of the relevant configuration. If more is needed, please let me know and I will provide it.
#### MASTER #####
Nov 14 14:43:05 ng04be03 slapd[7582]: slap_graduate_commit_csn: removing 0x2b4c9568b0 20081114194304.892065Z#000000#001#000000
Nov 14 14:43:05 ng04be03 slapd[7582]: slap_queue_csn: queing 0x42803100 20081114194305.078713Z#000000#001#000000
Nov 14 14:43:05 ng04be03 slapd[7582]: conn=14 op=17167 ADD dn="uniqueIdentifier=Evad_Added_tele_5450408582,ou=subscribers,ou=SINGP,o=ricuc.com"
Nov 14 14:43:05 ng04be03 slapd[7582]: slap_queue_csn: queing 0x4680b100 20081114194305.078878Z#000000#001#000000
Nov 14 14:43:05 ng04be03 slapd[7582]: slap_queue_csn: queing 0x43004100 20081114194305.078653Z#000000#001#000000
Nov 14 14:43:05 ng04be03 slapd[7582]: conn=12 op=13844 RESULT tag=105 err=0 text=
Nov 14 14:43:05 ng04be03 slapd[7582]: slap_graduate_commit_csn: removing 0x2b4c87e670 20081114194305.068251Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x41000100 20081114194502.917316Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: conn=10 op=19719 ADD dn="uniqueIdentifier=Evad_Added_tele_5450009858,ou=subscribers,ou=SINGP,o=ricuc.com"
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x43805100 20081114194502.917523Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x4780d100 20081114194502.917288Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: conn=12 op=17496 RESULT tag=105 err=0 text=
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_graduate_commit_csn: removing 0x2b5a7f8340 20081114194502.917316Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: conn=13 op=19983 ADD dn="uniqueIdentifier=Evad_Added_tele_5450509990,ou=subscribers,ou=SINGP,o=ricuc.com"
Nov 14 14:45:02 ng04be03 slapd[7582]: conn=10 op=19719 RESULT tag=105 err=0 text=
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_graduate_commit_csn: removing 0x2b5ae77160 20081114194502.917523Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: conn=14 op=19598 ADD dn="umbillingnumber=5450409797,uniqueIdentifier=Evad_Added_tele_5450409797,ou=subscribers,ou=SINGP,o=ricuc.com"
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x41000100 20081114194502.936884Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: conn=11 op=16763 RESULT tag=105 err=0 text=
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x43805100 20081114194502.947725Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_graduate_commit_csn: removing 0x2b5ad51170 20081114194502.917288Z#000000#001#000000
### CONSUMER ###
Nov 14 14:43:36 ng04be04 slapd[24622]: syncrepl_entry: rid=002 be_add (0)
Nov 14 14:43:36 ng04be04 slapd[24622]: do_syncrep2: cookie=rid=002,sid=002,csn=20081114194305.078653Z#000000#001#000000
Nov 14 14:43:36 ng04be04 slapd[24622]: do_syncrep2: rid=002 CSN too old, ignoring 20081114194305.078653Z#000000#001#000000
Nov 14 14:43:36 ng04be04 slapd[24622]: do_syncrep2: cookie=rid=002,sid=002
Nov 14 14:43:36 ng04be04 slapd[24622]: syncrepl_entry: rid=002 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
Nov 14 14:43:36 ng04be04 slapd[24622]: syncrepl_entry: rid=002 be_search (0)
Nov 14 14:45:39 ng04be04 slapd[24622]: slap_queue_csn: queing 0x2b4737c990 20081114194502.917523Z#000000#001#000000
Nov 14 14:45:39 ng04be04 slapd[24622]: slap_graduate_commit_csn: removing 0x2b473ca890 20081114194502.917523Z#000000#001#000000
Nov 14 14:45:39 ng04be04 slapd[24622]: do_syncrep2: cookie=rid=002,sid=002,csn=20081114194502.917288Z#000000#001#000000
Nov 14 14:45:39 ng04be04 slapd[24622]: do_syncrep2: rid=002 CSN too old, ignoring 20081114194502.917288Z#000000#001#000000
Nov 14 14:45:39 ng04be04 slapd[24622]: do_syncrep2: cookie=rid=002,sid=002,csn=20081114194502.936884Z#000000#001#000000
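
The correlation between the two logs can be checked mechanically. The following is a rough Python sketch, not a tested tool: it assumes the syslog line layout shown in the snippets above, the helper names are hypothetical, and comparing CSNs as plain strings only works because they all share the same fixed-width format.

```python
import re

# Matches both the "queing" spelling seen in these logs and "queueing".
QUEUE_RE = re.compile(r"slap_queue_csn: que\w* \S+ (\S+)")
TOO_OLD_RE = re.compile(r"CSN too old, ignoring (\S+)")


def out_of_order_queued(master_lines):
    """CSNs queued on the master after a newer CSN was already queued."""
    newest = ""
    late = []
    for line in master_lines:
        match = QUEUE_RE.search(line)
        if match is None:
            continue
        csn = match.group(1)
        if csn < newest:
            late.append(csn)
        else:
            newest = csn
    return late


def rejected_on_consumer(consumer_lines):
    """CSNs the consumer reported as too old."""
    found = []
    for line in consumer_lines:
        match = TOO_OLD_RE.search(line)
        if match is not None:
            found.append(match.group(1))
    return found


# Lines copied from the 14:45 excerpts above.
master = [
    "Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x41000100 20081114194502.917316Z#000000#001#000000",
    "Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x43805100 20081114194502.917523Z#000000#001#000000",
    "Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x4780d100 20081114194502.917288Z#000000#001#000000",
    "Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x41000100 20081114194502.936884Z#000000#001#000000",
]
consumer = [
    "Nov 14 14:45:39 ng04be04 slapd[24622]: do_syncrep2: rid=002 CSN too old, ignoring 20081114194502.917288Z#000000#001#000000",
]

late = out_of_order_queued(master)
both = sorted(set(late) & set(rejected_on_consumer(consumer)))
print("queued out of order and rejected:", both)
```

In practice the two lists of lines would be read from the full master and consumer syslog files rather than pasted in.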
### Replication Config ###
dn: olcDatabase={2}hdb,cn=config
objectClass: olcDatabaseConfig
objectClass: olcHdbConfig
...
olcSyncrepl: {0}rid=2 provider=ldap://ldap.server.com bindmethod=simple timeout=0 network-timeout=0 binddn="cn=Directory Manager,o=ricuc.com" credentials="secret" starttls=no filter="(objectclass=*)" searchbase="o=ricuc.com" scope=sub schemachecking=off type=refreshandpersist retry="60 +"
olcMirrorMode: TRUE

dn: olcOverlay={0}syncprov,olcDatabase={2}hdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: {0}syncprov
olcSpCheckpoint: 100 600
olcSpSessionlog: 100
### Hardware ###
Dual Quad Core Xeon 2.83GHz
32GB RAM
8x15000rpm RAID10
Separate LUNS for db and txn logs
Kris Burton
Software Engineer
Acision. Innovation. Assured. www.acision.com
4870 Sadler Road Suite 200
Glen Allen, VA 23060 USA