I have a database (cn=config) which was working fine, but now whenever I try to modify it, the modification just hangs. No error, no timeout, it just sits there. However, while that first modification is hung, I can perform modifications of another database (other than cn=config) and they work just fine. I can even do further searches against cn=config while the modification is still hung. Also, when I run slapd in the foreground and then send it a SIGINT, it says "slapd shutdown: waiting for X operations/tasks to finish" and never ends. I end up having to SIGKILL it.
The database is part of an MMR group of servers I was building, but I shut down all the other servers to troubleshoot the issue (it makes no difference whether the other servers are up or down, and they all exhibit the same problem).
Where should I start looking to figure out what's going on (the specific operation/task that's hanging)? I can run slapd in debug mode with '-1', but there's a ton of info, and I don't know what's relevant, or which debug mode I should use other than '-1'.
Thanks
-Patrick
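On the question of which debug level to use instead of '-1': slapd's -d option takes a bit mask, and the individual bits also have names (this thread later uses `-d sync`). The following is a minimal sketch in Python, not part of the original post, that combines named levels into a mask and splits a mask back into names. The name/value table is taken from the debugging levels table in the OpenLDAP 2.4 Admin Guide. The helper names `debug_mask` and `names_in` are hypothetical, and the choice of stats plus sync is only an illustration, since the thread does not establish which levels expose a hung modification.

```python
# Debug level names and bit values as listed in the OpenLDAP 2.4 Admin Guide.
DEBUG_LEVELS = {
    "trace": 1,
    "packets": 2,
    "args": 4,
    "conns": 8,
    "ber": 16,
    "filter": 32,
    "config": 64,
    "acl": 128,
    "stats": 256,
    "stats2": 512,
    "shell": 1024,
    "parse": 2048,
    "sync": 16384,
    "none": 32768,
}


def debug_mask(*names):
    """Combine level names into the numeric value accepted by slapd -d."""
    mask = 0
    for name in names:
        mask |= DEBUG_LEVELS[name.lower()]
    return mask


def names_in(mask):
    """List the level names whose bits are set in a numeric mask."""
    return [name for name, bit in DEBUG_LEVELS.items() if mask & bit]


if __name__ == "__main__":
    # Illustrative choice only: operation statistics plus syncrepl messages.
    print(debug_mask("stats", "sync"))  # 16640
    print(names_in(16640))              # ['stats', 'sync']
```

Passing -1 sets every bit, which is why it produces so much output; a narrower mask keeps only the categories listed.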
On May 23, 2012, at 3:43 PM, Patrick Hemmer openldap@stormcloud9.net wrote:
<snip>
Always state the version of OpenLDAP you are running. I know for a fact this is a known bug in older releases of OpenLDAP 2.4 because I reported it.
--Quanah
Sent: Wed May 23 2012 21:56:15 GMT-0400 (EDT) From: Quanah Gibson-Mount quanah@zimbra.com To: Patrick Hemmer openldap@stormcloud9.net "openldap-technical@openldap.org" openldap-technical@openldap.org Subject: Re: slapd hangs upon performing modification of cn=config
<snip>
Sorry, completely forgot to put that. This is 2.4.31
-Patrick
Sent: Wed May 23 2012 18:28:08 GMT-0400 (EDT) From: Patrick Hemmer openldap@stormcloud9.net To: openldap-technical@openldap.org Subject: slapd hangs upon performing modification of cn=config
<snip>
Ok, so the problem went away. I didn't change a thing; I just came back after a few hours, started slapd up, and it behaved (though I am still interested to know how to find stuck operations/tasks). However, a new (maybe related?) issue has popped up. I tried to add olcSpReloadHint=TRUE to the syncprov overlay (all the replicas are > 2.3.11) and this change isn't replicating (other changes, including attribute adds, to other DNs in cn=config replicate fine, just not this one).
Running the consumer slapd with `-d sync`, I get the following:
4fbd9472 syncrepl_message_to_entry: rid=510 DN: olcOverlay={0}syncprov,olcDatabase={0}config,cn=config, UUID: 84beea9e-3987-1031-9a75-776905f7a32d
4fbd9472 syncrepl_entry: rid=510 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
4fbd9472 syncrepl_entry: rid=510 be_search (0)
4fbd9472 syncrepl_entry: rid=510 olcOverlay={0}syncprov,olcDatabase={0}config,cn=config
4fbd9472 syncprov_matchops: skipping original sid 033
4fbd9472 null_callback : error code 0x50
4fbd9472 syncrepl_entry: rid=510 be_modify olcOverlay={0}syncprov,olcDatabase={0}config,cn=config (80)
4fbd9472 syncrepl_entry: rid=510 be_modify failed (80)
4fbd9472 do_syncrepl: rid=510 rc 80 retrying
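To make a log like this easier to read, here is a small Python sketch (not from the original post) that splits run-together slapd debug output into individual entries and decodes the leading field. It assumes that the 8-digit hex prefix on each entry is a Unix timestamp in seconds. The thread does not say so, but the decoded value (01:52:50 UTC on May 24, i.e. 21:52:50 EDT on May 23) is consistent with the mail dates. The function name `split_entries` and the regular expression are hypothetical helpers.

```python
import re
from datetime import datetime, timezone

# Three entries copied from the consumer log above, run together on one line.
RAW = (
    "4fbd9472 syncrepl_message_to_entry: rid=510 DN: "
    "olcOverlay={0}syncprov,olcDatabase={0}config,cn=config, "
    "UUID: 84beea9e-3987-1031-9a75-776905f7a32d "
    "4fbd9472 syncrepl_entry: rid=510 be_modify failed (80) "
    "4fbd9472 do_syncrepl: rid=510 rc 80 retrying"
)

# Assumption: an entry starts with 8 hex digits followed by a space, and runs
# until the next such prefix (or the end of the input). The UUID is not split
# because its first 8 hex digits are followed by '-', not by a space.
ENTRY = re.compile(r"(?<!\S)([0-9a-f]{8}) (.*?)(?= [0-9a-f]{8} |$)")


def split_entries(raw):
    """Return (UTC datetime, message) pairs for each entry found in raw."""
    entries = []
    for match in ENTRY.finditer(raw):
        seconds = int(match.group(1), 16)
        when = datetime.fromtimestamp(seconds, tz=timezone.utc)
        entries.append((when, match.group(2)))
    return entries


if __name__ == "__main__":
    for when, message in split_entries(RAW):
        print(when.isoformat(), message)
```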
Sent: Wed May 23 2012 22:03:39 GMT-0400 (EDT) From: Patrick H. openldap@stormcloud9.net To: Patrick Hemmer openldap@stormcloud9.net openldap-technical@openldap.org Subject: Re: slapd hangs upon performing modification of cn=config
<snip>
I changed the subject as this is a different issue than originally reported. While they may be related, it's safer to assume they're not.
So I turned on full debug and got some more info:
4fbe37ff syncrepl_entry: rid=510 be_search (0)
4fbe37ff syncrepl_entry: rid=510 olcOverlay={0}syncprov,olcDatabase={0}config,cn=config
4fbe37ff syncprov_matchops: skipping original sid 033
4fbe37ff <= acl_access_allowed: granted to database root
4fbe37ff send_ldap_result: conn=-1 op=0 p=3
4fbe37ff send_ldap_result: err=80 matched="" text="modify/delete: olcSpReloadHint: no such attribute"
4fbe37ff null_callback : error code 0x50
4fbe37ff syncrepl_entry: rid=510 be_modify olcOverlay={0}syncprov,olcDatabase={0}config,cn=config (80)
4fbe37ff syncrepl_entry: rid=510 be_modify failed (80)
So why does it think "olcSpReloadHint: no such attribute"? The syncprov overlay is enabled, and syncrepl is working, as it's the one trying to perform the modification.
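As a reading aid for the log above, this Python sketch (not part of the original post) pulls the result code and diagnostic text out of the send_ldap_result entry and shows that the "error code 0x50" and "(80)" fields are the same number. The small code table is a partial copy of the result codes in RFC 4511, and the function name `describe` is hypothetical. It only decodes what the log already says and does not explain why the consumer attempts the delete.

```python
import re

# A few LDAP result codes from RFC 4511; deliberately not an exhaustive table.
RESULT_CODES = {
    0: "success",
    16: "noSuchAttribute",
    32: "noSuchObject",
    80: "other",
}

# The send_ldap_result entry copied from the full-debug log above.
LINE = (
    'send_ldap_result: err=80 matched="" '
    'text="modify/delete: olcSpReloadHint: no such attribute"'
)

RESULT = re.compile(r'err=(\d+) matched="([^"]*)" text="([^"]*)"')


def describe(line):
    """Return (code, RFC 4511 name, diagnostic text), or None if no match."""
    match = RESULT.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, RESULT_CODES.get(code, "unknown"), match.group(3)


if __name__ == "__main__":
    print(describe(LINE))
    # The null_callback entry reports the same value in hexadecimal.
    print(int("0x50", 16) == 80)  # True
```

Note that the numeric code reported is 80 ("other") even though the text says "no such attribute", for which RFC 4511 defines the separate code 16.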
--On May 24, 2012 10:41:32 AM -0400 "Patrick H." openldap@stormcloud9.net wrote:
<snip>
I'm going to assume you mean all replicas are > 2.4.11, not 2.3.11.
All your replicas should be running the same version of OpenLDAP. All your replicas should be running a current version of OpenLDAP. Some of the fixes to Syncrepl have been on the client side, not just the master side.
If you can reproduce using 2.4.31 on all nodes, you should probably file an ITS.
--Quanah
Sent: Thu May 24 2012 11:59:11 GMT-0400 (EDT) From: Quanah Gibson-Mount quanah@zimbra.com To: Patrick H. openldap@stormcloud9.net openldap-technical@openldap.org Subject: Re: slapd wont replicate olcSpReloadHint attribute (was: slapd hangs upon performing modification of cn=config)
<snip>
According to the slapo-syncprov man page it's 2.3.11, not 2.4.11, in which this setting is properly supported. And yes, all servers are running the same version: 2.4.31.
And yes, this behavior is reproduced on all nodes. I'll look at filing an ITS.
-Patrick
--On May 24, 2012 12:02:46 PM -0400 "Patrick H." openldap@stormcloud9.net wrote:
According to the slapo-syncprov man page it's 2.3.11, not 2.4.11,
Yes, but you were saying all your replicas were > 2.3.11, which I should sure hope is the case. ;) I.e., putting in > 2.3.11 is meaningless information unless you are trying to say you are running OpenLDAP 2.3.
--Quanah