Hi,
We use a single master and two read-only replicas, all running back-bdb. Each read-only replica replicates from the master via syncrepl in refreshAndPersist mode (a sketch of the consumer configuration is below). During a particularly heavy update load recently, replication on one of the read-only replicas started to fail due to a misconfigured DB_CONFIG. The replica wrote the following messages to its log repeatedly:
Dec 14 04:01:08 pip-dev slapd[12645]: bdb(dc=csupomona,dc=edu): Lock table is out of available lock entries
Dec 14 04:01:08 pip-dev slapd[12645]: => bdb_idl_delete_key: c_get failed: Cannot allocate memory (12)
Dec 14 04:01:08 pip-dev slapd[12645]: conn=-1 op=0: attribute "memberUid" index delete failure
Dec 14 04:01:08 pip-dev slapd[12645]: null_callback : error code 0x50
Dec 14 04:01:08 pip-dev slapd[12645]: syncrepl_entry: rid=001 be_modify failed (80)
Dec 14 04:01:08 pip-dev slapd[12645]: do_syncrepl: rid=001 rc 80 retrying
as it tried and failed to start replication again.
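For context, the consumer-side syncrepl stanza on each replica looks roughly like this (the rid, search base, and type match what's in the log above; the provider URL, bind DN, credentials, and retry schedule are placeholders, not our real values):

# consumer stanza in slapd.conf on each read-only replica
syncrepl rid=001
        provider=ldap://ldap-master.example.edu
        type=refreshAndPersist
        searchbase="dc=csupomona,dc=edu"
        bindmethod=simple
        binddn="cn=replicator,dc=csupomona,dc=edu"
        credentials=secret
        retry="30 +"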
Shortly after, the master slapd crashed, writing nothing to its log to indicate why (or even to acknowledge the crash at all). We first noticed this behavior with a 2.4.26 master and a 2.4.28 read-only replica (we hit the problem while performing some maintenance, hence the version mismatch). I later reproduced it with a 2.4.28 master while researching ITS#7113 [1], which describes the problem more precisely and in more detail.

Has anyone else run into this issue? Is there a good way to insulate the master slapd from misconfigured replicas? Our replicas shouldn't break like this again (we've since tuned our DB_CONFIG to prevent it), and hopefully slapd can be modified so that the master survives even when a replica does break, but in the meantime we'd rather not have to worry about the master crashing if our DB_CONFIG proves inadequate.
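For reference, the DB_CONFIG tuning we mean is along these lines; these are the Berkeley DB lock-table directives relevant to the "Lock table is out of available lock entries" error, and the values are illustrative rather than our exact settings:

set_lk_max_locks   30000
set_lk_max_lockers  3000
set_lk_max_objects 30000

Appropriate sizing depends on entry sizes, indexing, and update volume, so the numbers need to be checked against db_stat -c output under load, and the BDB environment has to be recreated (e.g. via db_recover) before new limits take effect.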
[1] http://www.openldap.org/its/index.cgi/Incoming?id=7113
Thanks for any help,
Kevan Carstensen wrote:
Seems to be the same as ITS#6928:
http://www.openldap.org/its/index.cgi/Incoming?id=6928
I guess nobody ever took note of the debug trace I grabbed from the provider server.
Ciao, Michael.
On Wed, Jan 04, 2012 at 08:02:25PM -0800, Michael Ströder wrote:
> Seems to be the same as ITS#6928:
Yes, that's the identical server failure with a similar cause: a malfunctioning client.
It seems at least possible that the connections/queries/operations occurring during these replica failures, which accidentally kill the server, could be performed intentionally by a malicious client; that would make this a DoS security issue in OpenLDAP.
We have a test environment in which we can reproduce this server crash at will, and would be happy to provide whatever additional data or assistance is required to diagnose and resolve the underlying issue.