Hi there,
My LDAP deployment consists of a cluster of three providers that all
replicate to each other, plus a fleet of consumers replicating from them.
The ppolicy overlay is installed on both providers and consumers, though
we're not currently using it to enforce any particular policies
automatically.
I've recently discovered a situation where a script could fail to log in
with a non-rootDN service account to all three provider instances in short
order. The failed binds cause ppolicy to update the operational attributes
on the service account's entry on all three providers at nearly the same
time. The providers seem to be able to figure things out quickly among
themselves, but the consumers sometimes detect a contextCSN inconsistency,
and when this happens they enter a REFRESH state. In production this can
cause latency due to the extra CPU and network traffic of refreshing all
consumers at once.
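To make the contextCSN comparison concrete, here is a minimal Python sketch
of how a consumer's values could be checked against a provider's. It assumes
the contextCSN values have already been read from the suffix entry of each
server (for example with a base-scope ldapsearch) and that they use the
usual timestamp#count#sid#mod layout with hex fields. The names parse_csn
and lagging_sids are hypothetical helpers for illustration, not part of any
OpenLDAP tooling.

```python
from datetime import datetime, timezone
from typing import Iterable, List, NamedTuple


class CSN(NamedTuple):
    """One contextCSN value, split into its four components."""
    timestamp: datetime
    count: int
    sid: int
    mod: int


def parse_csn(value: str) -> CSN:
    """Parse 'YYYYmmddHHMMSS.uuuuuuZ#count#sid#mod' (last three fields are hex)."""
    stamp, count, sid, mod = value.strip().split("#")
    ts = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ").replace(tzinfo=timezone.utc)
    return CSN(ts, int(count, 16), int(sid, 16), int(mod, 16))


def newest_per_sid(values: Iterable[str]) -> dict:
    """Keep only the newest CSN seen for each server ID."""
    newest = {}
    for value in values:
        csn = parse_csn(value)
        if csn.sid not in newest or csn > newest[csn.sid]:
            newest[csn.sid] = csn
    return newest


def lagging_sids(provider_csns: Iterable[str], consumer_csns: Iterable[str]) -> List[int]:
    """Return the server IDs for which the consumer is missing or behind the provider."""
    provider = newest_per_sid(provider_csns)
    consumer = newest_per_sid(consumer_csns)
    return sorted(
        sid for sid, csn in provider.items()
        if sid not in consumer or consumer[sid] < csn
    )


if __name__ == "__main__":
    # Made-up example values, not taken from a real server.
    provider = [
        "20240101120000.000000Z#000000#001#000000",
        "20240101120005.000000Z#000000#002#000000",
        "20240101120005.000000Z#000000#003#000000",
    ]
    consumer = [
        "20240101120000.000000Z#000000#001#000000",
        "20240101120001.000000Z#000000#002#000000",
    ]
    print(lagging_sids(provider, consumer))
```

Running this periodically against each consumer would only show which
server IDs are behind; it does not explain why a consumer falls back to a
full REFRESH.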
The only relevant documentation I've been able to find for this use case
is the following note from
https://linux.die.net/man/5/slapo-ppolicy:
Note that the current IETF Password Policy proposal does not define how
these operational attributes are expected to behave in a replication
environment. In general, authentication attempts on a slave server only
affect the copy of the operational attributes on that slave and will not
affect any attributes for a user's entry on the master server. Operational
attribute changes resulting from authentication attempts on a master server
will usually replicate to the slaves (and also overwrite any changes that
originated on the slave). These behaviors are not guaranteed and are
subject to change when a formal specification emerges.
The same page also describes a related option that lets consumers send
these updates up the replication chain using the chain overlay:
ppolicy_forward_updates
Specify that policy state changes that result from Bind operations (such
as recording failures, lockout, etc.) on a consumer should be forwarded to
a master instead of being written directly into the consumer's local
database. This setting is only useful on a replication consumer, and also
requires the updateref setting and chain overlay to be appropriately
configured.
tl;dr: ppolicy wrote operational attributes to a service account's entry
on all of my provider instances at the same time when the account used the
wrong password to log in to all of them at once, and this caused all my
consumers to refresh at the same time. My question is: is the ppolicy
overlay inherently unsafe for a provider cluster? Right now I'm
considering these options to get rid of the risk of accidentally
triggering a consumer REFRESH again:
- Remove the ppolicy overlay from the replicated backend (I'm still
  checking if there's anything we actually use it for, but if it's
  inherently unsafe in this configuration then it's gotta go)
- Move all of the service accounts to another database that ppolicy is
  not installed on.
Thanks!