Hi Andrew,
On 02/09/2010 16:27, Andrew Findlay wrote:
I am updating an OpenLDAP installation at present, and one of the improvements is to introduce a pair of master servers running in mirror mode. There will be several read-only servers as well, and I would like those to replicate from whichever master is currently available.
I did the obvious thing, and put two syncrepl clauses in the read-only server's config - one for each master server.
Starting with an empty database, the read-only server chews CPU badly and generates thousands of log messages like this:
Sep 2 14:57:34 nis0 slapd[9257]: => bdb_idl_insert_key: c_put id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Sep 2 14:57:38 nis0 slapd[9257]: => bdb_idl_delete_key: c_del id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Sep 2 14:57:38 nis0 slapd[9257]: conn=-1 op=0: attribute "entryCSN" index delete failure
Sep 2 14:57:38 nis0 slapd[9257]: => bdb_idl_insert_key: c_put id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Sep 2 14:57:38 nis0 slapd[9257]: => bdb_idl_delete_key: c_del id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Sep 2 14:57:38 nis0 slapd[9257]: conn=-1 op=0: attribute "entryCSN" index delete failure
....
Sep 2 14:56:10 nis0 slapd[4953]: => bdb_idl_insert_key: c_put id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Sep 2 14:56:10 nis0 slapd[4953]: => bdb_idl_insert_key: c_put id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Sep 2 14:56:11 nis0 slapd[4953]: => bdb_idl_insert_key: c_put id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Sep 2 14:56:11 nis0 slapd[4953]: conn=-1 op=0: attribute "localAttribute" index add failure
Sep 2 14:56:11 nis0 slapd[4953]: => bdb_idl_insert_key: c_put id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Sep 2 14:56:11 nis0 slapd[4953]: => bdb_idl_insert_key: c_put id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
It is very likely that each syncrepl connection will be retrieving the same entries at about the same time, so there is clearly potential for collisions.
DB_LOCK_DEADLOCK messages are only warnings: the operation should be retried until it completes. Of course, if they can be avoided, so much the better!
Question: is this topology sensible? If it is expected to work I will gather some debug data for an ITS. If not, I will have to drop back to plan B...
This is an interesting configuration. I would not have done it this way. Instead, as Marc Patermann suggested, I would set up a virtual address that points to whichever master is currently available, and configure a single syncrepl clause using that address (all other LDAP clients could use it too). Could this approach work for you?
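To make the suggestion concrete, here is a rough sketch of what the consumer side might look like in slapd.conf. The hostname ldap-master.example.com, the suffix, the bind DN and the password are all placeholders I made up for illustration, and the retry values are just one reasonable choice, so adapt everything to your setup:

```
# Read-only server: a single syncrepl clause.
# ldap-master.example.com is a hypothetical virtual address that
# always points to whichever mirror-mode master is currently up.
# Suffix, bind DN and credentials below are placeholders.
syncrepl rid=001
         provider=ldap://ldap-master.example.com
         type=refreshAndPersist
         retry="5 5 60 +"
         searchbase="dc=example,dc=com"
         bindmethod=simple
         binddn="cn=replicator,dc=example,dc=com"
         credentials=secret
```

With retry="5 5 60 +" the consumer would retry every 5 seconds for the first 5 attempts, then every 60 seconds indefinitely, so it should reconnect by itself once the address has moved to the other master.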
Looking at the OpenLDAP configuration issue, I note your remark that several syncrepl statements are used in multi-master setups. I'm not entirely sure why this works better, but it may well be due to all servers having the syncprov overlay, which serializes modifications when a persistent search is in progress.
Using two syncrepl statements is certainly suboptimal, as all modifications will be replicated twice to all read-only servers. However, off the top of my head, I don't see any reason why it shouldn't work. Does slapd end up synchronizing everything?
Jonathan