I am updating an OpenLDAP installation at present, and one of the improvements is to introduce a pair of master servers running in mirror mode. There will be several read-only servers as well, and I would like those to replicate from whichever master is currently available.
I did the obvious thing and put two syncrepl clauses in the read-only server's config, one for each master server.
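For illustration, a read-only server set up this way would carry something of the following shape in its database section. This is only a sketch assuming slapd.conf-style configuration; the hostnames, rid values, suffix, bind DN, credentials and retry setting are placeholders, not the actual values in use here.

    # Placeholder values throughout; one syncrepl clause per master.
    # Each clause needs its own rid within this server.
    syncrepl rid=001
             provider=ldap://master1.example.com
             type=refreshAndPersist
             retry="60 +"
             searchbase="dc=example,dc=com"
             bindmethod=simple
             binddn="cn=replicator,dc=example,dc=com"
             credentials=secret

    syncrepl rid=002
             provider=ldap://master2.example.com
             type=refreshAndPersist
             retry="60 +"
             searchbase="dc=example,dc=com"
             bindmethod=simple
             binddn="cn=replicator,dc=example,dc=com"
             credentials=secret

Both clauses point at the same searchbase, so on an empty database each consumer session starts a full refresh of the same entries.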
Starting with an empty database, the read-only server chews CPU badly and generates thousands of log messages like this:
Sep 2 14:57:34 nis0 slapd[9257]: => bdb_idl_insert_key: c_put id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Sep 2 14:57:38 nis0 slapd[9257]: => bdb_idl_delete_key: c_del id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Sep 2 14:57:38 nis0 slapd[9257]: conn=-1 op=0: attribute "entryCSN" index delete failure
Sep 2 14:57:38 nis0 slapd[9257]: => bdb_idl_insert_key: c_put id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Sep 2 14:57:38 nis0 slapd[9257]: => bdb_idl_delete_key: c_del id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Sep 2 14:57:38 nis0 slapd[9257]: conn=-1 op=0: attribute "entryCSN" index delete failure
....
Sep 2 14:56:10 nis0 slapd[4953]: => bdb_idl_insert_key: c_put id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Sep 2 14:56:10 nis0 slapd[4953]: => bdb_idl_insert_key: c_put id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Sep 2 14:56:11 nis0 slapd[4953]: => bdb_idl_insert_key: c_put id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Sep 2 14:56:11 nis0 slapd[4953]: conn=-1 op=0: attribute "localAttribute" index add failure
Sep 2 14:56:11 nis0 slapd[4953]: => bdb_idl_insert_key: c_put id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Sep 2 14:56:11 nis0 slapd[4953]: => bdb_idl_insert_key: c_put id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
It is very likely that each syncrepl connection will be retrieving the same entries at about the same time, so there is clearly potential for collisions.
Question: is this topology sensible? If it is expected to work I will gather some debug data for an ITS. If not, I will have to drop back to plan B...
Environment: OpenLDAP 2.4.22 with Berkeley DB 4.8, running on SLES 10.3 (i586).
Thanks
Andrew