Hi there,
My LDAP deployment consists of a cluster of 3 providers that all replicate to each other, plus a fleet of consumers replicating from them. The ppolicy overlay is installed on both the providers and the consumers, though we're not currently using it to enforce any particular policies automatically.
I've recently discovered a situation where a script could fail to log in with a non-rootDN service account to all three provider instances in short order. The providers seem to be able to figure things out quickly, but the consumers sometimes detect a contextCSN inconsistency, and when this happens they enter a REFRESH state. The refresh is caused by updates to the operational attributes on the service account's entry, which are written on all three providers at nearly the same time. In production this can cause latency due to the extra CPU and network traffic of refreshing all consumers at once.
The only relevant documentation I've been able to find for this use case is from https://linux.die.net/man/5/slapo-ppolicy:
Note that the current IETF Password Policy proposal does not define how these operational attributes are expected to behave in a replication environment. In general, authentication attempts on a slave server only affect the copy of the operational attributes on that slave and will not affect any attributes for a user's entry on the master server. Operational attribute changes resulting from authentication attempts on a master server will usually replicate to the slaves (and also overwrite any changes that originated on the slave). These behaviors are not guaranteed and are subject to change when a formal specification emerges.
And the related ability for consumers to send updates up a replication chain using the chain overlay:
ppolicy_forward_updates Specify that policy state changes that result from Bind operations (such as recording failures, lockout, etc.) on a consumer should be forwarded to a master instead of being written directly into the consumer's local database. This setting is only useful on a replication consumer, and also requires the updateref setting and chain overlay to be appropriately configured.
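For reference, my understanding of what that consumer-side setup would look like in slapd.conf is roughly the following. This is a sketch based on the man pages, not our running config: the hostname, suffix, bind DN and credentials are placeholders I made up, and the identity used for chain-idassert-bind would need proxy authorization rights on the provider. As far as I can tell it only changes what happens to failed binds against a consumer, so it wouldn't help with binds that fail directly against the providers.

```
# Global section, before any database definition (consumer only).
# provider1.example.com and the DNs below are placeholders.
overlay              chain
chain-uri            "ldap://provider1.example.com"
chain-idassert-bind  bindmethod="simple"
                     binddn="cn=chain,dc=example,dc=com"
                     credentials="<secret>"
                     mode="self"
chain-return-error   TRUE

# Replicated database on the consumer (type is whatever you already use)
database             mdb
suffix               "dc=example,dc=com"
# ... existing syncrepl consumer configuration ...

# Where writes received by this consumer get referred (and chained) to
updateref            "ldap://provider1.example.com"

overlay              ppolicy
# Send bind-failure state changes to the provider instead of writing locally
ppolicy_forward_updates
```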
tl;dr: ppolicy wrote operational attributes to a service account's entry on all of my provider instances at the same time when my SA used the wrong password to log in to them all at once, and this caused all my consumers to refresh at the same time. My question is: is the ppolicy overlay inherently unsafe for a provider cluster? Right now I'm considering these options to get rid of the risk of accidentally triggering a consumer REFRESH again:
- Remove the ppolicy overlay from the replicated backend (I'm still checking if there's anything we actually use it for, but if it's inherently unsafe in this configuration then it's gotta go)
- Move all of the service accounts to another database that ppolicy is not installed on.
Thanks!