Hi,
I am running OpenLDAP 2.4.26 on the syncrepl master (provider) (package on CentOS 5.7 x86_64), and 2.4.22 and 2.4.26 on two consumers respectively.
Last night I edited a user account (hosted in LDAP), and when the change tried to replicate to the two consumers, both froze. This did not happen on another consumer (also 2.4.26), which replicates using the Manager account. The two that froze use a limited-privilege BindDN for replication which does not have access to user accounts (so the user account should not be replicated to those two consumers).
On the master:
Nov 23 23:12:04 ldap slapd[2295]: syncprov_sendresp: cookie=rid=333,csn=20111123211204.601542Z#000000#000#000000
Nov 23 23:12:04 ldap slapd[2295]: syncprov_sendresp: cookie=rid=222,csn=20111123211204.601542Z#000000#000#000000
On slave 222:
Nov 23 23:12:04 vdns slapd2.4[2145]: do_syncrep2: rid=222 cookie=rid=222,csn=20111123211204.601542Z#000000#000#000000
Nov 23 23:12:04 vdns slapd2.4[2145]: syncrepl_entry: rid=222 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_MODIFY)
Nov 23 23:12:04 vdns slapd2.4[2145]: syncrepl_entry: rid=222 be_search (0)
Nov 23 23:12:04 vdns slapd2.4[2145]: syncrepl_entry: rid=222 uid=userx,ou=people,dc=example,dc=com
Nov 23 23:12:04 vdns slapd2.4[2145]: slap_queue_csn: queing 0x2aaab0019970 20111123211204.601542Z#000000#000#000000
and /var/log/messages:
Nov 23 23:12:04 vdns kernel: slapd2.4[2164]: segfault at 00000001075c61a8 rip 0000000000480ecb rsp 00000000424e04c0 error 4
On slave 333:
Nov 23 23:12:04 dns2 slapd[2364]: do_syncrep2: rid=333 cookie=rid=333,csn=20111123211204.601542Z#000000#000#000000
Nov 23 23:12:04 dns2 slapd[2364]: syncrepl_entry: rid=333 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_MODIFY)
Nov 23 23:12:04 dns2 slapd[2364]: syncrepl_entry: rid=333 be_search (0)
Nov 23 23:12:04 dns2 slapd[2364]: syncrepl_entry: rid=333 uid=userx,ou=people,dc=example,dc=com
Nov 23 23:12:04 dns2 slapd[2364]: slap_queue_csn: queing 0x19e8cfa0 20111123211204.601542Z#000000#000#000000
and /var/log/messages:
Nov 23 23:12:04 dns2 kernel: slapd[2736] general protection rip:4b5342 rsp:43c54530 error:0
I have not seen this behavior in months and months of use.
Any advice?
Thanks, Nick
On 24/11/2011 4:32 PM, Nick Milas wrote:
> I am using 2.4.26 on syncrepl master (provider) (package on CentOS 5.7 x86_64) and 2.4.22, 2.4.26 on two consumers respectively.
Additional data:
Provider:
DN: olcOverlay={2}syncprov,olcDatabase={1}hdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: {2}syncprov
olcSpCheckpoint: 100 10
olcSpSessionlog: 100
Consumers:
syncrepl rid=222
  provider=ldaps://ldap.example.com
  tls_reqcert=never
  type=refreshAndPersist
  retry="60 +"
  searchbase="dc=example,dc=com"
  schemachecking=off
  bindmethod=simple
  binddn="uid=usrx,ou=System,dc=example,dc=com"
  credentials="secret"

syncrepl rid=333
  provider=ldaps://ldap.example.com
  type=refreshAndPersist
  tls_reqcert=never
  retry="60 +"
  searchbase="dc=example,dc=com"
  schemachecking=off
  bindmethod=simple
  binddn="uid=usrx,ou=System,dc=example,dc=com"
  credentials="secret"
Nick
Hello,
Since I have had no feedback on this problem, should I assume that the issue has been fixed in the latest version?
Could it be related to ITS 6892 (http://www.openldap.org/its/index.cgi/Software%20Bugs?id=6892)?
I would like to know whether I should upgrade the consumers to 2.4.27/28 as a high priority.
Please advise.
Thanks, Nick
On 24/11/2011 4:32 PM, Nick Milas wrote:
> Any advice?
--On Monday, November 28, 2011 11:07 AM +0200 Nick Milas nick@eurobjects.com wrote:
> Hello,
> Since I have had no feedback on this problem, should I assume that the issue has been fixed in the latest version?
> Could it be related to ITS 6892 (http://www.openldap.org/its/index.cgi/Software%20Bugs?id=6892)?
> I would like to know whether I should upgrade the consumers to 2.4.27/28 as a high priority.
> Please advise.
You've provided zero information on why they are freezing (i.e., a backtrace with debugging symbols enabled on the servers where slapd has "frozen"). Of course, I would suggest (attempting) to reproduce the issue with the latest version, as there were numerous fixes related to syncrepl & syncprov in 2.4.27/28. I personally would advise against mixing versions, also. Make all servers run the same OpenLDAP version.
--Quanah
--
Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration
On 28/11/2011 11:37 AM, Quanah Gibson-Mount wrote:
> You've provided zero information on why they are freezing (i.e., a backtrace with debugging symbols enabled on the servers where slapd has "frozen").
Thanks Quanah,
Unfortunately, this error has never appeared before, so I doubt I'll be able to reproduce it in order to produce a backtrace. If I do manage to reproduce it, I'll certainly post the results. It's a production system, so testing is not so straightforward.
Is a log entry of the form "Nov 23 23:12:04 dns2 kernel: slapd[2736] general protection rip:4b5342 rsp:43c54530 error:0" practically useless?
Can I somehow run a (consumer) server in syncrepl debugging mode, in order to capture *in adequate detail* problems that MIGHT arise, despite a possible high debug logging volume (which would be manageable on a low-load box)?
> Of course, I would suggest (attempting) to reproduce the issue with the latest version, as there were numerous fixes related to syncrepl & syncprov in 2.4.27/28. I personally would advise against mixing versions, also. Make all servers run the same OpenLDAP version.
Thanks, Nick
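On the syncrepl debugging question above: slapd has a "sync" log level (16384) that covers syncrepl consumer processing. A minimal sketch of running a consumer in the foreground with that level, assuming a slapd.conf-style configuration; the config path and the function name are assumptions, not taken from this thread:

```shell
#!/bin/bash
# Assumed config location; the packaged path may differ.
SLAPD_CONF=${SLAPD_CONF:-/etc/openldap/slapd.conf}

# Run slapd with syncrepl debugging enabled (16384 = sync).
# With -d, slapd stays in the foreground and writes debug output to stderr,
# so redirect it to a file if it needs to be kept.
run_slapd_sync_debug() {
    slapd -d 16384 -f "$SLAPD_CONF"
}
```

For a daemon that stays in normal operation, the slapd.conf directive "loglevel sync" sends the same class of messages to syslog instead. Either way, this shows what the consumer was processing at the time; it does not replace a backtrace for diagnosing the crash itself.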
Nick,
Nick Milas wrote (28.11.2011 11:04):
> On 28/11/2011 11:37 AM, Quanah Gibson-Mount wrote:
> Can I somehow run a (consumer) server in syncrepl debugging mode, in order to capture *in adequate detail* problems that MIGHT arise, despite a possible high debug logging volume (which would be manageable on a low-load box)?
You should enable core dumping on your server:
http://www.openldap.org/lists/openldap-technical/201108/msg00161.html
You can then load the core dump on a separate debugging system.
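A minimal sketch of enabling core dumps in the shell that starts slapd, assuming a bash-compatible shell on Linux. The helper names are hypothetical, and where the kernel actually writes the core file depends on the local core_pattern setting:

```shell
#!/bin/bash
# Hypothetical helpers; not part of OpenLDAP.

# Raise the soft core-file size limit for this shell and anything it starts.
# This fails if the hard limit is lower, so check the result afterwards.
enable_core_dumps() {
    ulimit -S -c unlimited
}

# Succeed only if a process started from this shell may write a core file.
core_limit_ok() {
    local limit
    limit=$(ulimit -S -c)
    [ "$limit" = "unlimited" ] || [ "$limit" -gt 0 ]
}

# Show where the kernel will place core files (Linux-specific).
show_core_pattern() {
    cat /proc/sys/kernel/core_pattern
}
```

Run "enable_core_dumps && core_limit_ok" in the same shell or init script that launches slapd; a limit raised in an unrelated login shell does not affect a daemon that is already running.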
Marc
Nick Milas wrote:
> On 28/11/2011 11:37 AM, Quanah Gibson-Mount wrote:
>> You've provided zero information on why they are freezing (i.e., a backtrace with debugging symbols enabled on the servers where slapd has "frozen").
> Thanks Quanah,
> Unfortunately, this error has never appeared before, so I doubt I'll be able to reproduce it in order to produce a backtrace. If I do manage to reproduce it, I'll certainly post the results. It's a production system, so testing is not so straightforward.
> Is a log entry of the form "Nov 23 23:12:04 dns2 kernel: slapd[2736] general protection rip:4b5342 rsp:43c54530 error:0" practically useless?
It shows that something crashed. It doesn't tell what or why. Without a backtrace there's nothing we can determine.
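Once a core file exists, a backtrace of all threads can be extracted non-interactively with gdb. A sketch, assuming gdb is installed and slapd has debugging symbols available; the binary path and the function name are assumptions:

```shell
#!/bin/bash
# Assumed location of the slapd binary; adjust to the local package.
SLAPD_BIN=${SLAPD_BIN:-/usr/sbin/slapd}

# Print a full backtrace of every thread from the given core file.
# Usage: slapd_backtrace /path/to/core > backtrace.txt
slapd_backtrace() {
    gdb -batch -ex "thread apply all bt full" "$SLAPD_BIN" "$1"
}
```

Without debugging symbols the output is mostly raw addresses, much like the rip value in the kernel message, which is why the request is for a backtrace with symbols.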
openldap-technical@openldap.org