Paul B. Henson wrote:
I upgraded one node of a four-node delta-syncrepl MMR OpenLDAP system today to 2.4.44 plus the backported ITS 8432 patch. The other three nodes were still running 2.4.41, which all four had been running for quite some time with no issues. Within a few hours they started blowing up with the infinite replication issue referred to in ITS 8432: all the accesslogs were filled with the same modification repeated over and over. I ended up having to slapcat the db, restore the upgraded node to 2.4.41, and then reload the db on all of them to recover.
From what I understand the bug was supposed to have existed in versions before 2.4.44, but I had never seen it in 2.4.41, and backing out to that version seems to have restored stability. I remember seeing another ITS (the number escapes me at the moment) about an issue that remains even with the 8432 patch applied if you're using the memberOf overlay. I thought that only applied when updating group memberships, though, and the change that blew up my system was a password change.
Is it possible that updating one node to the fixed version somehow triggered the bug in one of the other nodes? Do I need to upgrade all the nodes at the same time? Or are there possibly still edge cases where 2.4.44 with the patch is still broken? I did a test run of the upgrade in my dev environment, including the staged rollout with temporarily mixed versions, but it doesn't really have the same load and variety of access patterns that production sees.
The fix for #8432 only prevents the redundant mod from being processed on the node that is running the patched code. If other nodes are still accepting the redundant op then yes, it will continue to propagate. So you need the patched code on all nodes.
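
To illustrate the point in the reply above, here is a deliberately simplified toy model in Python. It is not how slapd or syncrepl is implemented: the full-mesh forwarding rule, the "seen" flag, and the simulate() function are all assumptions made up for illustration. A "patched" node drops an op it has already applied; an "unpatched" node accepts it again and passes it on to its other peers.

```python
from collections import deque

def simulate(patched, max_deliveries=1000):
    """Toy model of one redundant op circulating in a full mesh.

    patched: list of bools, one per node (True = drops ops it has seen).
    Returns (deliveries, still_circulating).
    Hypothetical model only; real syncrepl works differently.
    """
    n = len(patched)
    seen = [False] * n
    queue = deque()

    # Node 0 originates the op and sends it to every peer.
    seen[0] = True
    for peer in range(1, n):
        queue.append((0, peer))

    deliveries = 0
    while queue and deliveries < max_deliveries:
        src, dst = queue.popleft()
        deliveries += 1
        if seen[dst] and patched[dst]:
            continue  # patched node refuses the redundant op
        # First delivery, or an unpatched node re-accepting the op.
        seen[dst] = True
        for peer in range(n):
            if peer != dst and peer != src:
                queue.append((dst, peer))

    return deliveries, bool(queue)

if __name__ == "__main__":
    # One patched node, three unpatched (the mixed setup in this thread).
    print(simulate([True, False, False, False]))
    # All four nodes patched.
    print(simulate([True, True, True, True]))
```

In this toy model the mixed case is still circulating the op when the delivery cap is reached, because the unpatched nodes keep handing it to each other, while the all-patched case drains on its own. Other mixes can be tried by changing the list passed to simulate().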
Thanks...