On Thu, 2008-01-24 at 15:08 -0800, Howard Chu wrote:
In my experience, 4 million objects (at around 3KB per entry) is near the limit of what will fit into 16GB of RAM. Sounds like you need a server with more than 16GB if you want to keep growing and not be waiting on disks.
I was going through the caching discussion in section 19.4 at http://www.openldap.org/doc/admin24/tuning.html#Performance%20Factors, which talks about how much RAM to devote to what type of cache, but it had not occurred to me to try to just throw the whole thing (of this size) into RAM and be done with it.
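To make sure I'm reading 19.4 right, here's roughly what I was picturing for the knobs involved (purely a sketch; the suffix and all of the numbers below are placeholders, and 4 million entries at ~3KB is only about 12GB of raw data, so how to split that between the BDB cache and slapd's entry cache is exactly the tradeoff that section discusses):

    # DB_CONFIG in the database directory (BDB buffer cache; assumes a 64-bit box)
    set_cachesize   12 0 1

    # slapd.conf, in the back-hdb database section
    database        hdb
    suffix          "dc=example,dc=edu"
    cachesize       4000000      # entry cache, in entries
    idlcachesize    12000000     # IDL cache; I've seen ~3x cachesize suggested for hdb
    checkpoint      1024 15

If those proportions are badly off base for a database this size, I'd be glad to hear it.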
The single-master constraints on OpenLDAP were never about performance. Even with OpenLDAP 2.2 the concurrent read/write rates for back-bdb are faster than any other directory server. It's always been about data consistency, and the fact that it's so easy to lose it in a multi-master setup.
I wasn't concerned so much about single-master versus multi-master. I was thinking more about the issue that a very read-intensive workload did not mix well with a lot of writes arriving directly on the same server, whereas a server receiving bulk writes from slurpd (at the time) could apply them relatively efficiently while still handling a high read load.
Hence the arrangement where all updates went directly and only to the master, the master was replicated out to the slaves via slurpd, and the slaves handled only reads.
Of course, the modern architecture doesn't use slurpd; I was just wondering whether it might make more sense, from a scalability perspective, to keep a similar data-flow architecture.
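Concretely, I was imagining the consumers looking something like this (just a sketch; the provider URL, suffix, and credentials are made up, and I'm assuming the master runs the syncprov overlay), with all writes going to the master:

    # slapd.conf on a read-only consumer (illustrative values only)
    database      hdb
    suffix        "dc=example,dc=edu"

    syncrepl      rid=001
                  provider=ldap://master.example.edu
                  type=refreshAndPersist
                  searchbase="dc=example,dc=edu"
                  bindmethod=simple
                  binddn="cn=replicator,dc=example,dc=edu"
                  credentials=secret
                  retry="60 +"

    # refer any write attempts back to the master
    updateref     ldap://master.example.edu

That is, the same read-only-slave data flow as before, just with syncrepl pulling the changes instead of slurpd pushing them.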
You've been brainwashed by all the marketing lies other LDAP vendors tell about multi-master replication. Multi-master has no relation to performance.
Again, I wasn't looking at single-master versus multi-master. If I gave that impression, I'm sorry.
It's only about fault tolerance and high availability. No matter whether you choose a single-master or a multi-master setup, with the same number of machines, the same number of writes must be propagated to all servers, so the overall performance will be the same.
I'm confused.
So there's no performance benefit to doing bulk writes via syncrepl to the slaves, as opposed to individual writes to the master(s) via ldapadd? Then why have syncrepl at all, instead of just handling everything with ldapadd?
I understand the consistency argument for single-master versus multi-master, I'm just trying to find a way to partition the problem space for performance reasons, in addition to any consistency reasons.
That's a pointless question. The right question is - how fast do you need it to be? What load are you experiencing now, what constitutes a noticeable delay, and how often do you see those?
Good questions, but I'm not sure I've got the answers. I know that our OpenLDAP directory system is going to be used as a critical component of a campus-wide authentication system, and the target for the authentication system is to handle at least hundreds of authentications per second. Problem is, I don't know what that translates to in terms of OpenLDAP operations per second, or what the read/write mix will be.
And the authentication system is just one of the many consumers of data from the OpenLDAP system.
So, at the very least, I would be surprised if the OpenLDAP system didn't have to handle thousands of read operations per second, and at peak it may also have to handle thousands of write operations per second. A single student might have dozens or a hundred or more data elements to be written or updated, and might be registering for a half-dozen classes or more at once, each of which might have hundreds of data elements that also need to be updated. The domino effect of a single high-level entity being added or modified could result in hundreds or thousands of smaller operations.
But right now, I'm just guessing.
I haven't actually seen the systems yet, and I don't know what the schemas look like, so I can only speak from my limited past experience with OpenLDAP, where even relatively simple uses could result in dozens of data elements for a single entity; I don't know how OpenLDAP handles that kind of thing internally.
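One thing I suppose I can do once the systems exist is stop guessing and measure the mix: as I understand it, the monitor backend can be enabled and then polled for per-operation counters, something like the following (illustrative only; I haven't double-checked the attribute names against 2.4):

    # slapd.conf: enable the monitor backend
    database    monitor

    # then poll the operation counters (needs rootdn or an appropriate ACL)
    ldapsearch -x -D "cn=Manager,dc=example,dc=edu" -W \
        -b "cn=Operations,cn=Monitor" \
        '(objectClass=*)' monitorOpInitiated monitorOpCompleted

Sampling that a few seconds apart should give a decent picture of reads versus writes per second once there's real traffic.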
Is CPU more important, or RAM, or disk space/latency?
If you have enough RAM, disk latency shouldn't be a problem. Disk space is so cheap today that it should never be a problem. CPU, well, that depends on your performance target.
I'm not so worried about disk space per se. I would be more concerned about disk latency and throughput being potential bottlenecks.
Generally I like the idea of having compact/simple slapd configs spread all over. With the old slapd.conf, that would have been rather painful to administer, though. Also, in general, more moving parts means more things that can break.
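That said, a large part of the appeal of the cn=config style for me is that a small tweak like a cache size can be pushed to a running replica with one ldapmodify instead of editing N copies of slapd.conf and restarting. Roughly (the {1}hdb DN, the bind DN, and the value are all placeholders):

    # adjust the entry cache on a running server via cn=config
    ldapmodify -x -D "cn=admin,cn=config" -W <<EOF
    dn: olcDatabase={1}hdb,cn=config
    changetype: modify
    replace: olcDbCacheSize
    olcDbCacheSize: 4000000
    EOF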
Much appreciated. Thanks!