Hi folks,
We've been running the latest openldap stable on one of our client sites and have hit an issue earlier in the day with syncrepl changes not being propagated to all consumers. The live system is configured with one master and 4 slaves connected across various LAN/WAN links with all servers running 2.4.21, db-4.7 on RHEL4.
Earlier in the day, we renamed a corrupted ou (that was accidentally created with hidden control characters) to ou=salvage. Since the slapd.conf backend on the master was set to "database bdb", this was done by cloning the data into a new ou=salvage and deleting the old version. This data was cleaned up by the client by copying from ou=salvage into its correct location, and then finally they attempted to delete the complete ou=salvage part of the tree. However, after the deletion phase we found that 2 of our slaves were inconsistent with the master.
As an example, on one slave we noticed that one DN with cn=001901717 had not been deleted from ou=salvage. The relevant parts of the log on the master and slave look like this:
master:
May 17 12:35:08 master slapd[3692]: conn=222514 op=3089 SRCH attr=objectclass May 17 12:35:08 master slapd[3692]: conn=222514 op=3089 SEARCH RESULT tag=101 err=0 nentries=0 text= May 17 12:35:08 master slapd[3692]: conn=222514 op=3090 DEL dn="cn=001801489,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x824fee10 20100517113508.089178Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=149,csn=20100517113508.089178Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3090 RESULT tag=107 err=0 text= May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing 0x82f2e250 20100517113508.089178Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=107,csn=20100517113508.089178Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=136,csn=20100517113508.089178Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=103,csn=20100517113508.089178Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=154,csn=20100517113508.089178Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3091 SRCH base="cn=001800462,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" scope=1 deref=2 filter="(objectClass=*)" May 17 12:35:08 master slapd[3692]: conn=222514 op=3091 SRCH attr=objectclass May 17 12:35:08 master slapd[3692]: conn=222514 op=3091 SEARCH RESULT tag=101 err=0 nentries=0 text= May 17 12:35:08 master slapd[3692]: conn=222514 op=3092 DEL dn="cn=001800462,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x8e151e10 20100517113508.144277Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=149,csn=20100517113508.144277Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3092 RESULT tag=107 err=0 text= May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing 0x7d00cc58 20100517113508.144277Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=107,csn=20100517113508.144277Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=136,csn=20100517113508.144277Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=103,csn=20100517113508.144277Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=154,csn=20100517113508.144277Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3093 SRCH base="cn=001801399,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" scope=1 deref=2 filter="(objectClass=*)" May 17 12:35:08 master slapd[3692]: conn=222514 op=3093 SRCH attr=objectclass May 17 12:35:08 master slapd[3692]: conn=222514 op=3093 SEARCH RESULT tag=101 err=0 nentries=0 text= May 17 12:35:08 master slapd[3692]: conn=222514 op=3094 DEL dn="cn=001801399,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x8c6f6e10 20100517113508.192643Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=149,csn=20100517113508.192643Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3094 RESULT tag=107 err=0 text= May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing 0x82588ca0 20100517113508.192643Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=107,csn=20100517113508.192643Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=136,csn=20100517113508.192643Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=103,csn=20100517113508.192643Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=154,csn=20100517113508.192643Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3095 SRCH base="cn=001800467,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" scope=1 deref=2 filter="(objectClass=*)" May 17 12:35:08 master slapd[3692]: conn=222514 op=3095 SRCH attr=objectclass May 17 12:35:08 master slapd[3692]: conn=222514 op=3095 SEARCH RESULT tag=101 err=0 nentries=0 text= May 17 12:35:08 master slapd[3692]: conn=222514 op=3096 DEL dn="cn=001800467,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x8cffae10 20100517113508.234887Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=107,csn=20100517113508.234887Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=136,csn=20100517113508.234887Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=103,csn=20100517113508.234887Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=154,csn=20100517113508.234887Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=149,csn=20100517113508.234887Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3096 RESULT tag=107 err=0 text= May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing 0x7d802e90 20100517113508.234887Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3097 SRCH base="cn=001901714,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" scope=1 deref=2 filter="(objectClass=*)" May 17 12:35:08 master slapd[3692]: conn=222514 op=3097 SRCH attr=objectclass May 17 12:35:08 master slapd[3692]: conn=222514 op=3097 SEARCH RESULT tag=101 err=0 nentries=0 text= May 17 12:35:08 master slapd[3692]: conn=222514 op=3098 DEL dn="cn=001901714,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x87ef3e10 20100517113508.277430Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=149,csn=20100517113508.277430Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3098 RESULT tag=107 err=0 text= May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing 0x7e95abe8 20100517113508.277430Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=107,csn=20100517113508.277430Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=136,csn=20100517113508.277430Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=103,csn=20100517113508.277430Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=154,csn=20100517113508.277430Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3099 SRCH base="cn=001901717,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" scope=1 deref=2 filter="(objectClass=*)" May 17 12:35:08 master slapd[3692]: conn=222514 op=3099 SRCH attr=objectclass May 17 12:35:08 master slapd[3692]: conn=222514 op=3099 SEARCH RESULT tag=101 err=0 nentries=0 text= May 17 12:35:08 master slapd[3692]: conn=222514 op=3100 DEL dn="cn=001901717,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x81cfce10 20100517113508.331930Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3101 ABANDON msg=3101 May 17 12:35:08 master slapd[3692]: conn=222514 op=3102 UNBIND May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing 0x7e550d68 20100517113508.331930Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 fd=411 closed
slave:
May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing 0x8b867e0 20100517113508.192643Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 be_delete cn=001801399,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com (0) May 17 12:35:32 slave slapd[2393]: slap_queue_csn: queing 0x89bfb70 20100517113508.192643Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing 0x8b69c68 20100517113508.192643Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: do_syncrep2: rid=149 cookie=rid=149,csn=20100517113508.234887Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_DELETE) May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 be_search (0) May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 cn=001800467,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com May 17 12:35:32 slave slapd[2393]: slap_queue_csn: queing 0x8af8bf0 20100517113508.234887Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing 0x86d0a60 20100517113508.234887Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 be_delete cn=001800467,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com (0) May 17 12:35:32 slave slapd[2393]: slap_queue_csn: queing 0x8af8bf0 20100517113508.234887Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing 0x86893e8 20100517113508.234887Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: do_syncrep2: rid=149 cookie=rid=149,csn=20100517113508.277430Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_DELETE) May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 be_search (0) May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 cn=001901714,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com May 17 12:35:32 slave slapd[2393]: slap_queue_csn: queing 0x86c0a38 20100517113508.277430Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing 0x8abc018 20100517113508.277430Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 be_delete cn=001901714,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com (0) May 17 12:35:32 slave slapd[2393]: slap_queue_csn: queing 0x86c0a38 20100517113508.277430Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing 0x8aedb68 20100517113508.277430Z#000000#000#000000
(logging stops for about 3s before another connection is made)
AFAICT it seems that something happened between the master and slave that caused an ABANDON to be issued and the connection dropped to that slave. The deletion was successful on the master, but the cn=001901717 entry still remains on the slave.
My questions would therefore be:
1) Are there any known issues with syncrepl and deletion in 2.4.21 that may be fixed in 2.4.22?
2) What does an ABANDON actually mean? Did the client or server terminate the connection and can we find out why? When slave reconnected back to the master, why was the incomplete transaction from the master not pushed back to the slave?
3) Would a simple restart of slapd on the slave cause it to resynchronise correctly with the master?
4) Is there any more information that we can provide from the logs on the 2 inconsistent slaves?
Note I'm not sure whether to open an ITS on this since it is a live system and hence the scope for testing/debugging is much more limited. But I can try my best to obtain extra information if asked.
Many thanks,
Mark.
On 17/05/2010 17:45, Mark Cave-Ayland wrote:
Hi folks,
We've been running the latest openldap stable on one of our client sites and have hit an issue earlier in the day with syncrepl changes not being propagated to all consumers. The live system is configured with one master and 4 slaves connected across various LAN/WAN links with all servers running 2.4.21, db-4.7 on RHEL4.
Earlier in the day, we renamed a corrupted ou (that was accidentally created with hidden control characters) to ou=salvage. Since the slapd.conf backend on the master was set to "database bdb", this was done by cloning the data into a new ou=salvage and deleting the old version. This data was cleaned up by the client by copying from ou=salvage into its correct location, and then finally they attempted to delete the complete ou=salvage part of the tree. However, after the deletion phase we found that 2 of our slaves were inconsistent with the master.
As an example, on one slave we noticed that one DN with cn=001901717 had not been deleted from ou=salvage. The relevant parts of the log on the master and slave look like this:
master:
May 17 12:35:08 master slapd[3692]: conn=222514 op=3089 SRCH attr=objectclass May 17 12:35:08 master slapd[3692]: conn=222514 op=3089 SEARCH RESULT tag=101 err=0 nentries=0 text= May 17 12:35:08 master slapd[3692]: conn=222514 op=3090 DEL dn="cn=001801489,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x824fee10 20100517113508.089178Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=149,csn=20100517113508.089178Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3090 RESULT tag=107 err=0 text= May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing 0x82f2e250 20100517113508.089178Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=107,csn=20100517113508.089178Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=136,csn=20100517113508.089178Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=103,csn=20100517113508.089178Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=154,csn=20100517113508.089178Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3091 SRCH base="cn=001800462,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" scope=1 deref=2 filter="(objectClass=*)" May 17 12:35:08 master slapd[3692]: conn=222514 op=3091 SRCH attr=objectclass May 17 12:35:08 master slapd[3692]: conn=222514 op=3091 SEARCH RESULT tag=101 err=0 nentries=0 text= May 17 12:35:08 master slapd[3692]: conn=222514 op=3092 DEL dn="cn=001800462,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x8e151e10 20100517113508.144277Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=149,csn=20100517113508.144277Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3092 RESULT tag=107 err=0 text= May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing 0x7d00cc58 20100517113508.144277Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=107,csn=20100517113508.144277Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=136,csn=20100517113508.144277Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=103,csn=20100517113508.144277Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=154,csn=20100517113508.144277Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3093 SRCH base="cn=001801399,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" scope=1 deref=2 filter="(objectClass=*)" May 17 12:35:08 master slapd[3692]: conn=222514 op=3093 SRCH attr=objectclass May 17 12:35:08 master slapd[3692]: conn=222514 op=3093 SEARCH RESULT tag=101 err=0 nentries=0 text= May 17 12:35:08 master slapd[3692]: conn=222514 op=3094 DEL dn="cn=001801399,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x8c6f6e10 20100517113508.192643Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=149,csn=20100517113508.192643Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3094 RESULT tag=107 err=0 text= May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing 0x82588ca0 20100517113508.192643Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=107,csn=20100517113508.192643Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=136,csn=20100517113508.192643Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=103,csn=20100517113508.192643Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=154,csn=20100517113508.192643Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3095 SRCH base="cn=001800467,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" scope=1 deref=2 filter="(objectClass=*)" May 17 12:35:08 master slapd[3692]: conn=222514 op=3095 SRCH attr=objectclass May 17 12:35:08 master slapd[3692]: conn=222514 op=3095 SEARCH RESULT tag=101 err=0 nentries=0 text= May 17 12:35:08 master slapd[3692]: conn=222514 op=3096 DEL dn="cn=001800467,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x8cffae10 20100517113508.234887Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=107,csn=20100517113508.234887Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=136,csn=20100517113508.234887Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=103,csn=20100517113508.234887Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=154,csn=20100517113508.234887Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=149,csn=20100517113508.234887Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3096 RESULT tag=107 err=0 text= May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing 0x7d802e90 20100517113508.234887Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3097 SRCH base="cn=001901714,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" scope=1 deref=2 filter="(objectClass=*)" May 17 12:35:08 master slapd[3692]: conn=222514 op=3097 SRCH attr=objectclass May 17 12:35:08 master slapd[3692]: conn=222514 op=3097 SEARCH RESULT tag=101 err=0 nentries=0 text= May 17 12:35:08 master slapd[3692]: conn=222514 op=3098 DEL dn="cn=001901714,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x87ef3e10 20100517113508.277430Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=149,csn=20100517113508.277430Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3098 RESULT tag=107 err=0 text= May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing 0x7e95abe8 20100517113508.277430Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=107,csn=20100517113508.277430Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=136,csn=20100517113508.277430Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=103,csn=20100517113508.277430Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=154,csn=20100517113508.277430Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3099 SRCH base="cn=001901717,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" scope=1 deref=2 filter="(objectClass=*)" May 17 12:35:08 master slapd[3692]: conn=222514 op=3099 SRCH attr=objectclass May 17 12:35:08 master slapd[3692]: conn=222514 op=3099 SEARCH RESULT tag=101 err=0 nentries=0 text= May 17 12:35:08 master slapd[3692]: conn=222514 op=3100 DEL dn="cn=001901717,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x81cfce10 20100517113508.331930Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3101 ABANDON msg=3101 May 17 12:35:08 master slapd[3692]: conn=222514 op=3102 UNBIND May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing 0x7e550d68 20100517113508.331930Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 fd=411 closed
slave:
May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing 0x8b867e0 20100517113508.192643Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 be_delete cn=001801399,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com (0) May 17 12:35:32 slave slapd[2393]: slap_queue_csn: queing 0x89bfb70 20100517113508.192643Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing 0x8b69c68 20100517113508.192643Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: do_syncrep2: rid=149 cookie=rid=149,csn=20100517113508.234887Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_DELETE) May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 be_search (0) May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 cn=001800467,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com May 17 12:35:32 slave slapd[2393]: slap_queue_csn: queing 0x8af8bf0 20100517113508.234887Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing 0x86d0a60 20100517113508.234887Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 be_delete cn=001800467,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com (0) May 17 12:35:32 slave slapd[2393]: slap_queue_csn: queing 0x8af8bf0 20100517113508.234887Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing 0x86893e8 20100517113508.234887Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: do_syncrep2: rid=149 cookie=rid=149,csn=20100517113508.277430Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_DELETE) May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 be_search (0) May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 cn=001901714,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com May 17 12:35:32 slave slapd[2393]: slap_queue_csn: queing 0x86c0a38 20100517113508.277430Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing 0x8abc018 20100517113508.277430Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 be_delete cn=001901714,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com (0) May 17 12:35:32 slave slapd[2393]: slap_queue_csn: queing 0x86c0a38 20100517113508.277430Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing 0x8aedb68 20100517113508.277430Z#000000#000#000000
(logging stops for about 3s before another connection is made)
AFAICT it seems that something happened between the master and slave that caused an ABANDON to be issued and the connection dropped to that slave. The deletion was successful on the master, but the cn=001901717 entry still remains on the slave.
Actually, it looks like the dropped connection was the one that did the deletion, not the slave's connection:
May 17 12:35:08 master slapd[3692]: conn=222514 op=3100 DEL dn="cn=001901717,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x81cfce10 20100517113508.331930Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3101 ABANDON msg=3101 May 17 12:35:08 master slapd[3692]: conn=222514 op=3102 UNBIND
This looks very much like ITS#6059.
My questions would therefore be:
- Are there any known issues with syncrepl and deletion in 2.4.21 that
may be fixed in 2.4.22?
There are issues fixed in 2.4.22 regarding syncrepl. See the release notes: http://www.openldap.org/software/release/changes.html
However, I'm pretty sure they don't address the issue discussed in ITS#6059.
- What does an ABANDON actually mean? Did the client or server
terminate the connection and can we find out why? When slave reconnected back to the master, why was the incomplete transaction from the master not pushed back to the slave?
An ABANDON happens when an operation is explicitly abandoned, or is cancelled, due to a lost connection for example. Typically, if you're running ldapmodify and hit CTRL-C, you get an ABANDON in the logs (well, if you are fast enough to hit it at the right time that is).
- Would a simple restart of slapd on the slave cause it to
resynchronise correctly with the master?
No, assuming my assumptions above are correct: if the syncrepl consumer didn't lose the connection to the master, other updates will have been propagated, and the consumer's cookie will have been updated, missing out the delete operation.
To resync, start your consumer(s) once with the -c rid=nnn command line option (with nnn being the rid of the syncrepl statement in config).
Hope this helps, Jonathan
- Is there any more information that we can provide from the logs on
the 2 inconsistent slaves?
Note I'm not sure whether to open an ITS on this since it is a live system and hence the scope for testing/debugging is much more limited. But I can try my best to obtain extra information if asked.
Many thanks,
Mark.
Jonathan Clarke wrote:
Hi Jonathan,
This looks very much like ITS#6059.
Interesting reading. So it seems that there is a known race condition with ABANDON. I did manage to find out that the person who ran the original delete thought it had hung, but I'm not sure whether that resulted in an automatic (timeout) or manual ABANDON.
(lots of questions cut)
- Would a simple restart of slapd on the slave cause it to
resynchronise correctly with the master?
No, assuming my assumptions above are correct: if the syncrepl consumer didn't lose the connection to the master, other updates will have been propagated, and the consumer's cookie will have been updated, missing out the delete operation.
To resync, start your consumer(s) once with the -c rid=nnn command line option (with nnn being the rid of the syncrepl statement in config).
Alas it doesn't seem to help :( While I can see the log file on the slave filling up with each entry DN as it checks it against the master, a quick grep has showed that the slave didn't even check the master for the existence of the bad DN "cn=001901717" to determine whether it should be kept - and consequently it wasn't removed from the slave when the synchronisation finished.
I'm wondering if this could be an openldap bug in that when invoking with the "-c rid=<foo>" syntax, the slave downloads each DN from the master and verifies it against its local DN - and hence won't remove extra entries still remaining on the slave that aren't present on the master?
Also another interesting thing I've noticed is that ou=salvage (and its immediate child DN ou=people) both have totally the wrong objectClass. If I point an LDAP browser (JXplorer) directly at ou=salvage on the slave then I see the following:
objectClass: top objectClass: glue
Although interestingly I see the correct objectClasses for ou=salvage if I do a simple ldapsearch on localhost on the slave. Weird.
ATB,
Mark.
openldap-technical@openldap.org