Quanah Gibson-Mount wrote:
--On Tuesday, November 7, 2023 12:56 PM +0000 falgon.comp(a)gmail.com wrote:
Hello, sorry for the delay. Thanks for the answers.
Generally, with something like lastbind, you'll run into collisions of the timestamp, which will cause a lot of havoc with replication. It is not the only case where this can occur. I highly advise reading the caveats in the admin guide about MPR replication.
Yes, that's what we thought at first, but after the various tests we've carried out, we're doubtful about the collision problem. When testing with a single account that binds more than 500 times per second, we can't reproduce the problem. The same applies to 10 accounts looping at 500 BIND/s.
So I'm looking at your configuration and have some questions:
a) olcPasswordCryptSaltFormat: $6$rounds=10000$%.16s -> Why are you using crypt passwords? OpenLDAP ships with multiple secure modules for password hashing, such as argon2. I'd advise using that. Note that crypt is non-portable.
b) olcLogLevel: stats sync
This generally should be:
olcLogLevel: stats
olcLogLevel: sync
c) olcPasswordHash: {CRYPT} -> See (a)
d) I'd suggest not using a root password at all for cn=config, and use EXTERNAL auth over ldapi. If you are going to use one, upgrade to argon2
e) Why do you have separate credentions for the monitor db?
f) Delete this index: olcDbIndex: pwdLastSuccess eq,pres
g) olcSpReloadHint: TRUE -> This setting should *not* be on the main DB, delete it from dn: olcOverlay={0}syncprov,olcDatabase={2}mdb,cn=config
h) For your benchmark test, this is probably not frequent enough, as the purge will never run since you're saying only data > 1 day old: olcAccessLogPurge: 01+00:00 00+04:00
i) For the accesslog DB, are you sure this is a large enough size? olcDbMaxSize: 2147483648 or are you hitting 2GB?
Also it appears you're running this test on two slapds running on the same server? That's an incredibly bad idea, since the I/O will conflict massively between the two processes writing to disk.
Hello, thank you for the answer and for taking the time to read Meheni's config files. I can answer all of your questions:
a) + c) Why are you using crypt passwords? - We're using crypt because we're migrating from an old solution to OpenLDAP, and crypt is the most secure option that is compatible for us.
b) olcLogLevel: stats sync - We are running our tests with stats only. Meheni probably left this setting in to check something before sending the config here.
d) I'd suggest not using a root password at all for cn=config - Thank you for this suggestion, we will probably try it.
e) Why do you have separate credentions for the monitor db? - Sorry, I don't understand the word "credentions". Do you mean credentials?
f) Delete this index: olcDbIndex: pwdLastSuccess eq,pres - This index is used in some filters, and we have also been trying another architecture with 1 provider, multiple consumers and a referalForward. But yes, this is a good idea, we will run our tests without this index. With 300+ BIND/s this index is constantly recalculated. Thanks.
g) olcSpReloadHint: TRUE -> This setting should *not* be on the main DB, - Thanks. Yes, we have it on both DBs; we will delete it.
h) For your benchmark test, this is probably not frequent enough, as the purge will never run since you're saying only data - We've run endurance tests that include purging. This setting is from a month ago, and we have changed it multiple times to test different setups. To include the purge during tests, we actually set it to 00+01:00 00+00:03. In the final configuration we will probably set it to 03+00:00 00+00:03. We found that purging every 3 minutes reduced the impact on performance.
i) + last question: For the accesslog DB, are you sure this is a large enough size? Also it appears you're running this test on two slapds running on the same server?
- This comes from Meheni's configuration, from when we cleaned up our configuration files to share them here for privacy reasons (he tried to reproduce the problem on his virtual machine and did reproduce it). We are running our tests on 4 servers with only one slapd per server. The actual accesslog size on each server is 64424509440.
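For readers comparing the values quoted in (h) and (i), here is a minimal Python sketch that decodes them. It assumes the [ddd+]hh:mm[:ss] time format documented in slapo-accesslog(5) for the purge age and interval, and treats olcDbMaxSize as a plain byte count. The helper names (spec_to_seconds, describe_purge, bytes_to_gib) are made up for this illustration and are not part of OpenLDAP.

```python
import re

# Time span format used by olcAccessLogPurge: [ddd+]hh:mm[:ss]
# (days and seconds are optional, hours and minutes are required).
_SPEC = re.compile(r"^(?:(\d+)\+)?(\d{1,2}):(\d{2})(?::(\d{2}))?$")


def spec_to_seconds(spec):
    """Convert one time span such as '01+00:00' to a number of seconds."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError("not a [ddd+]hh:mm[:ss] time span: %r" % spec)
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def describe_purge(value):
    """Split an olcAccessLogPurge value into (age_seconds, interval_seconds)."""
    age, interval = value.split()
    return spec_to_seconds(age), spec_to_seconds(interval)


def bytes_to_gib(size):
    """Express a byte count (e.g. olcDbMaxSize) in GiB."""
    return size / 2**30


if __name__ == "__main__":
    # Values quoted in this thread.
    for purge in ("01+00:00 00+04:00", "00+01:00 00+00:03", "03+00:00 00+00:03"):
        age, interval = describe_purge(purge)
        print("%s: keep %d s, check every %d s" % (purge, age, interval))
    for size in (2147483648, 64424509440):
        print("%d bytes = %.0f GiB" % (size, bytes_to_gib(size)))
```

Read this way, 01+00:00 00+04:00 keeps one day of entries and checks every four hours, 00+01:00 00+00:03 keeps one hour and checks every three minutes, and the two sizes correspond to 2 GiB and 60 GiB.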
Further information: we tested the consumer/provider mode before MPR, but it didn't meet our needs. We get better performance with the current configuration (all that remains is to find a solution to the replication problem).
I'll repeat a previous question here too: what exact messages or error messages should we expect to find in case of a collision problem?
Thanks again for your time and your help