Hi everyone,
I'm trying to convert from an MS-based platform to *nix, specifically FreeBSD. I have a few questions and concerns about OpenLDAP's backends with regard to performance and high reliability.
I have no problems setting up OpenLDAP 2.3.38 to run with BDB 4.4.20.4 inside a FreeBSD (6.2-RELEASE) jail, using the core, cosine, and inetorgperson schemas. Using the Quick-Start guide at openldap.org, I managed to create the layout of OUs, CNs, etc. to my needs using phpLDAPadmin after adding the base LDIF file via the command line:
dn: dc=example,dc=com
objectclass: dcObject
objectclass: organization
o: Example Company
dc: example
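For reference, the ldapadd invocation for that base entry might look like this; the rootdn "cn=Manager,dc=example,dc=com" is an assumption here, so substitute whatever rootdn is set in slapd.conf:

```shell
# Simple bind as the rootdn; -W prompts for the rootpw
ldapadd -x -H ldap://localhost -D "cn=Manager,dc=example,dc=com" -W -f base.ldif
```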
After thinking about the robustness of OpenLDAP due to its BDB backend, I tried to convert over to back-sql and use MySQL 5.0.45 as its backend. The SQL account is granted full permissions, including REFERENCES, on the specified database. The tables are populated using the sample SQL files:
- testdb_create.sql
- backsql_create.sql
- testdb_metadata.sql
Then I tried to add the same base ldif file via command line and get this error:
ldapadd: Server is unwilling to perform (53)
        additional info: operation not permitted within namingContext
and that's where all of my problems begin, even though the unixODBC connection is working properly (it's not logging to a file via syslog, but that's not a big concern ATM). After a few days of frustration, documentation research, and log analysis, and after reading the contents of the SQL files and this snip from the log:

...
slapd startup: initiated.
backend_startup_one: starting "cn=config"
config_back_db_open
config_build_entry: "cn=config"
config_build_entry: "cn=include{0}"
config_build_entry: "cn=include{1}"
config_build_entry: "cn=include{2}"
config_build_entry: "cn=module{0}"
config_build_entry: "cn=schema"
config_build_entry: "cn={0}core"
config_build_entry: "cn={1}cosine"
config_build_entry: "cn={2}inetorgperson"
config_build_entry: "olcDatabase={-1}frontend"
config_build_entry: "olcDatabase={0}config"
WARNING: No dynamic config support for database sql.
config_build_entry: "olcDatabase={1}sql"
backend_startup_one: starting "dc=sointe,dc=net"
==>backsql_db_open(): testing RDBMS connection
backsql_db_open(): concat func not specified (use "concat_pattern" directive in slapd.conf)
backsql_db_open(): subtree search SQL condition not specified (use "subtree_cond" directive in slapd.conf)
backsql_db_open(): setting "ldap_entries.dn LIKE CONCAT('%',?)" as default
backsql_db_open(): setting "ldap_entries.dn=?" as default
backsql_db_open(): objectclass mapping SQL statement not specified (use "oc_query" directive in slapd.conf)
backsql_db_open(): setting "SELECT id,name,keytbl,keycol,create_proc,delete_proc,expect_return FROM ldap_oc_mappings" by default
...

am I wrong to conclude that, to use any RDBMS for OpenLDAP's back-sql, I have to set up all the tables, functions and/or stored procedures, and insert the appropriate data into the 4 tables:
- ldap_attr_mappings
- ldap_entries
- ldap_entry_objclasses
- ldap_oc_mappings
before I can add the base LDIF file via the command line and use phpLDAPadmin to build & maintain it?
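For what it's worth, the directives that log is complaining about belong in the back-sql database section of slapd.conf. A minimal sketch for MySQL, assuming a unixODBC DSN named "testdb" and placeholder credentials:

```
# Hypothetical back-sql section for slapd.conf; DSN name, suffix,
# and credentials are placeholders.
database        sql
suffix          "dc=example,dc=com"
rootdn          "cn=Manager,dc=example,dc=com"
rootpw          secret
dbname          testdb
dbuser          ldap
dbpasswd        secret
# MySQL's string concatenation is a function, hence the
# "concat func not specified" warning in the log:
concat_pattern  "concat(?,?)"
subtree_cond    "ldap_entries.dn LIKE CONCAT('%',?)"
```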
What happens when I need to change the directory layout/structure because my needs change? Is that feasible?
Here are a few case-study scenarios where I see issues:
A) Small company: This can be accomplished with OpenLDAP and the database server on the same box. If needs require more, run OpenLDAP and the database server on separate boxes. (This scenario can also be accomplished using BDB for the backend.)
A - 1) Small company grows (still 1 site): OpenLDAP becomes its own box, if it isn't already, and acts as the master. Add more OpenLDAP box(es) to proxy requests while the master handles updates/additions. The database server is then reconfigured to be clustered. All OpenLDAP servers connect to the database cluster. (Alternatives? Still possible with OpenLDAP+back-bdb in master/slave replication? What about performance and high reliability?)
A - 2) Company grows and expands to multi-site: HQ is the above scenario (A - 1), and each site will have its own OpenLDAP and database (as in scenario A), depending on the requirements of each site and its data connection to HQ. (Alternatives? Still possible with OpenLDAP+back-bdb in master/slave replication? What about performance and high reliability?)
B) Enterprise (or the company in scenario A - 2 grows even more): HQ will be set up as in scenario A - 1 but with more servers, both OpenLDAP and databases. Each site will be set up as in scenario A or A - 1, depending upon the requirements and function of each site.
C) What happens (to performance and reliability) when the total entries in OpenLDAP reach 250,000? 1 million? Or 1 billion (most likely to happen in scenario B)? Will it "shrug its shoulders as if nothing happened" and still function?
Based on the above scenarios, should I invest the time to create all the tables, functions/stored procedures, and insert them as data into the 4 core tables mentioned above? I guess there could be workarounds for all the scenarios except scenario C. Or should I go elsewhere for my directory service/server, like OpenDS or Sun's Directory Server?
Thanks, Tommy
Tommy Pham wrote:
After thinking about the robustness of OpenLDAP due to its BDB backend, I tried to convert over to back-sql and use MySQL 5.0.45 as its backend.
What thinking did you do? Both back-bdb and back-hdb are fully ACID-compliant transactional backends. There is nothing more reliable, anywhere.
It seems you haven't read the FAQ yet. http://www.openldap.org/faq/data/cache/1165.html
Here are a few case-study scenarios where I see issues:
The database server is then reconfigured to be clustered. All OpenLDAP servers connect to the database cluster. (Alternatives? Still possible with OpenLDAP+back-bdb in master/slave replication? What about performance and high reliability?)
Back-bdb and back-hdb are the most reliable and highest performance LDAP backends in the world, bar none. The backends are proven to scale to manage hundreds of millions of entries at transaction rates and response times many times faster than any other directory software in the world. You can benchmark them yourself against any software of your choice, the result will always be the same.
Back-sql exists to provide LDAP access to legacy SQL data; it's not suitable for general-purpose LDAP use. The SQL translation layer will always impose a large performance cost; it can never perform as well as a native backend.
Distributing data across clusters tends to be less cost-effective than using a single large database. E.g., using LVM (Logical Volume Management) it's trivial to add storage capacity to an existing database without the overhead of a clustering protocol.
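A sketch of what that looks like in practice, assuming an LVM volume group named "vg0" with free extents and an ext3 filesystem holding the BDB files (all names are placeholders):

```shell
# Grow the logical volume holding the OpenLDAP database by 50 GB ...
lvextend -L +50G /dev/vg0/ldap-data
# ... then grow the filesystem to use the new space
# (resize2fs supports online growth for ext3).
resize2fs /dev/vg0/ldap-data
```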
Hi Howard,
Thanks for your prompt reply.
--- Howard Chu hyc@symas.com wrote:
Tommy Pham wrote:
After thinking about the robustness of OpenLDAP due to its BDB backend, I tried to convert over to back-sql and use MySQL 5.0.45 as its backend.
What thinking did you do? Both back-bdb and back-hdb are fully ACID-compliant transactional backends. There is nothing more reliable, anywhere.
It seems you haven't read the FAQ yet. http://www.openldap.org/faq/data/cache/1165.html
Here are a few case-study scenarios where I see issues:
The database server is then reconfigured to be clustered. All OpenLDAP servers connect to the database cluster. (Alternatives? Still possible with OpenLDAP+back-bdb in
master/slave
replication? What about performance and high reliability?)
Back-bdb and back-hdb are the most reliable and highest performance LDAP backends in the world, bar none. The backends are proven to scale to manage hundreds of millions of entries at transaction rates and response times many times faster than any other directory software in the world. You can benchmark them yourself against any software of your choice, the result will always be the same.
Back-sql exists to provide LDAP access to legacy SQL data; it's not suitable for general-purpose LDAP use. The SQL translation layer will always impose a large performance cost; it can never perform as well as a native backend.
Distributing data across clusters tends to be less cost-effective than using a single large database. E.g., using LVM (Logical Volume Management) it's trivial to add storage capacity to an existing database without the overhead of a clustering protocol.
--
Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
My concerns are not just about performance for a 1-box setup or 1 master with multiple slave replicas and proxies. I'm more interested in robustness features such as Dynamic Schema(s), Multi-Master Replication, and Dynamic Configuration (as featured in Apache DS). Multi-master or cluster setups have higher reliability and performance under heavy load with large data in my experience.

Also, because I'm migrating from an MS-based platform, I intend to integrate other application servers into LDAP as well, such as DNS (via bind-dlz), FTP, e-mail & groupware, Samba, etc., in the same way MS integrates DNS and Exchange into its Active Directory. Will OpenLDAP with back-bdb/hdb support all of that and still perform well when there are millions of entries?

As for native DB support vs. a layer like ODBC, why not just use the DB's native client library? (I guess this falls in line with the development mailing list more than this one.) I understand that "a directory is a specialized database optimized for reading, browsing and searching" and not writing. That's why I opt for a dedicated RDBMS vs. embedded for distributed computing... just as enterprise applications are developed in n-tier.
Thanks, Tommy
Tommy Pham wrote:
My concerns are not just about performance for 1 box setup or 1 master with multiple slave replications and proxies. I'm more interested in the robustness such as Dynamic Schema(s), Multi-Master Replication, and Dynamic configuration (as featured in Apache DS).
Dynamic configuration and dynamic loading of schema have been supported since OpenLDAP 2.3. Multi-master replication is supported in OpenLDAP 2.4 (although in general, actual multi-master usage is almost always the wrong thing to do; floating master or single-master with hot standby are the only reliable approaches).
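To illustrate the dynamic schema support, a new schema element can be added to a running 2.3+ server through the cn=config tree with ldapmodify; the OID and attribute name below are purely illustrative placeholders:

```
dn: cn=myschema,cn=schema,cn=config
changetype: add
objectClass: olcSchemaConfig
cn: myschema
olcAttributeTypes: ( 1.3.6.1.4.1.99999.1.1 NAME 'exampleAttr'
  DESC 'illustrative attribute, placeholder OID'
  EQUALITY caseIgnoreMatch
  SYNTAX 1.3.6.1.4.1.1466.115.121.1.15 )
```

The change takes effect immediately, with no server restart.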
Multi-master or cluster setups have higher reliability and performance under heavy load with large data in my experience.
What experience is that? It would help to know what your point of reference is. What do you define as heavy load or large data? What is your definition of reliability? We've run OpenLDAP 2.3 on an SGI Altix with 32 Itanium CPUs on a database of over 150 million entries, delivering transaction rates of over 22,000 searches per second concurrent with over 4800 modifications per second, sustained for several hours. We know that OpenLDAP is unique in these capabilities because several other directory server packages also participated in these tests but most of them failed hard at much smaller sizes. The only other one to survive to the 150 million entry mark was turning in transaction rates orders of magnitude slower than ours.
On a dual-processor AMD Opteron server the slapd frontend can process over 32,000 authentications per second on 100Mbps ethernet - that's equivalent to over 128,000 packets per second, or over 90% of the theoretical bandwidth of the medium. In a clustered environment you'll never get rates this high or latencies this low, because of the overhead in communicating with a remote DB server/cluster.
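The arithmetic behind those figures is internally consistent; here is a quick back-of-the-envelope check, where the packets-per-authentication count and average frame size are my assumptions, not numbers from the post:

```python
# Sanity check of the quoted throughput figures. PACKETS_PER_AUTH and
# AVG_FRAME_BYTES are assumptions chosen to test whether the numbers
# in the post are mutually consistent.
AUTHS_PER_SEC = 32_000
PACKETS_PER_AUTH = 4      # assumed: bind request/response plus TCP overhead
AVG_FRAME_BYTES = 88      # assumed average Ethernet frame size
LINK_BPS = 100e6          # 100 Mbps ethernet

packets_per_sec = AUTHS_PER_SEC * PACKETS_PER_AUTH
utilization = packets_per_sec * AVG_FRAME_BYTES * 8 / LINK_BPS

print(packets_per_sec)        # 128000
print(round(utilization, 2))  # 0.9
```

With small frames like these, the link saturates on packet count long before raw bandwidth would suggest, which is why a remote DB hop hurts latency so much.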
Also, because I'm migrating from an MS-based platform, I intend to integrate other application servers into LDAP as well, such as DNS (via bind-dlz), FTP, e-mail & groupware, Samba, etc., in the same way MS integrates DNS and Exchange into its Active Directory. Will OpenLDAP with back-bdb/hdb support all of that and still perform well when there are millions of entries?
Yes, easily, and far better than anything else could ever hope to.
As for native DB support vs layer like ODBC, why not just use the DB's native client library?
That only eliminates part of the overhead. Back-bdb's storage format is also highly optimized; getting raw access to the data of a relational system still means accessing individual rows and columns. This is still a significant performance cost.
(I guess this falls in line with development mailing list more than this mailing list.) I understand that "a directory is a specialized database optimized for reading, browsing and searching" and not writing. That's why I opt for having dedicated RDBMS vs embedded for distributed computing... just as enterprise applications are developed in n-tier.
Separating the OpenLDAP frontend from the storage backend offers no benefits; it only incurs additional costs in performance and administration overhead. N-tier architectures make sense in large enterprises for keeping data close to where it will be used. But they don't offer any actual reliability benefits. Simple algebra tells you that these designs decrease MTBF, they can never increase it.
--On Sunday, October 14, 2007 6:13 PM -0700 Howard Chu hyc@symas.com wrote:
(I guess this falls in line with development mailing list more than this mailing list.) I understand that "a directory is a specialized database optimized for reading, browsing and searching" and not writing. That's why I opt for having dedicated RDBMS vs embedded for distributed computing... just as enterprise applications are developed in n-tier.
Separating the OpenLDAP frontend from the storage backend offers no benefits; it only incurs additional costs in performance and administration overhead. N-tier architectures make sense in large enterprises for keeping data close to where it will be used. But they don't offer any actual reliability benefits. Simple algebra tells you that these designs decrease MTBF, they can never increase it.
One other point here that Howard left out is that back-hdb was designed to be write-efficient as well, not just read-efficient. That's another important piece of data to have.
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
On Monday 15 October 2007 02:36:39 Tommy Pham wrote:
My concerns are not just about performance for 1 box setup or 1 master with multiple slave replications and proxies. I'm more interested in the robustness such as Dynamic Schema(s), Multi-Master Replication, and Dynamic configuration (as featured in Apache DS).
Howard answered you on these aspects.
Multi-master or cluster setups have higher reliability and performance under heavy load with large data in my experience.
Multi-master's only real benefit is a cheaper "HA cluster" feature. HA clusters are not necessarily more reliable (they are more complex), and don't perform any better (unless that is purely because the storage backend is faster...). No cluster/multi-master setup is going to help with write load, and read load can easily be spread across slaves.
Also, because I'm migrating from MS based platform, I intend to integrate other application servers into LDAP as well such as DNS (via bind-dlz),
bind-dlz supports LDAP (though I don't believe the performance results at http://bind-dlz.sourceforge.net/ldap_perf.html myself, especially since no versions are listed and the entry cache seems to be about 1000 times too small). But I note that, unless you are doing mass DNS hosting (hundreds of zones), bind sdb_ldap supports DNS records in LDAP sufficiently IMHO (I have about 10 zones stored in OpenLDAP).
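For a rough idea of what that looks like, an A record stored under the dNSZone schema (as used by bind sdb_ldap) might be represented like this; the suffix, zone, and address are made-up examples:

```
dn: relativeDomainName=www,zoneName=example.com,ou=dns,dc=example,dc=com
objectClass: dNSZone
zoneName: example.com
relativeDomainName: www
dNSTTL: 3600
dNSClass: IN
aRecord: 192.0.2.10
```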
FTP
All popular Unix FTP servers support LDAP, for authentication at least; most support other features as well (such as bandwidth quotas, etc.).
, e-mail & groupware,
Most Unix groupware suites (zimbra, scalix, kolab, insight, etc.) require or ship OpenLDAP, and use the directory as their primary account store.
Samba,
The only supported non-local-file password backend for Samba is LDAP. OpenLDAP is probably the most commonly used LDAP server with Samba.
etc... in the same way MS integrates DNS and Exchange into its Active Directory. Will OpenLDAP with back-bdb/hdb support all of that and still perform well when there are millions of entries?
Why not? It supports our mail system with 1.1 million entries. We don't have that much in the way of Samba/DNS/DHCP/sudo/freeradius entries, only a couple of hundred of each, but I don't see how the kind of account is relevant; only the attribute sizes/indexes and query loads are relevant.
As for native DB support vs layer like ODBC, why not just use the DB's native client library? (I guess this falls in line with development mailing list more than this mailing list.)
I'm guessing (with some experience in high-load environments using ODBC for things such as bandwidth-accounting databases) that the ODBC layer is probably not the bottleneck... so why sacrifice portability for a small performance gain? E.g., users of distributions shipping OpenLDAP with ODBC support should, in theory, only need to install the Oracle client software to be able to use Oracle.
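That portability largely comes down to the unixODBC DSN definition; a hypothetical odbc.ini entry for a MySQL-backed setup, where the driver path, database name, and DSN name are all placeholders:

```
[testdb]
Description = OpenLDAP back-sql store
Driver      = /usr/local/lib/libmyodbc.so
Server      = localhost
Database    = ldap
Port        = 3306
```

Pointing at a different RDBMS mostly means changing the Driver line and the DSN, not the slapd configuration.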
I understand that "a directory is a specialized database optimized for reading, browsing and searching" and not writing. That's why I opt for having dedicated RDBMS vs embedded for distributed computing...
Maybe you should rather opt for a dedicated directory server accessed via the LDAP protocol instead of some embedded database (I don't know what you are referring to here... it can't be OpenLDAP...)?
I note that replication features in OpenLDAP are most likely superior in flexibility/robustness to most popular RDBMSs (depending on your requirements of course).
just as enterprise applications are developed in n-tier.
And OpenLDAP is a very reliable implementation of LDAP as one of those enterprise tiers, and used as such in many organisations.
Regards, Buchan