Hello,
I'm trying to migrate about 19 million entries from OpenLDAP 2.0 to the new 2.3. Conversion and such things are done, but inserting the data takes days.
I hope someone can point me to some helpful direction, because 22 Entries/sec is not too good, especially not for real life use...
I'm using the following to insert the data:

slapadd -q -v -c
My DB_CONFIG looks like this:
set_cachesize 1 524288000 1
set_lg_regionmax 262144
set_lg_bsize 2097152
Our OpenLDAP version is 2.3.27.
The speed is as follows (taken from the output of my insertion script):
***********************************************************
Started run at: 1161817043
Partly run took 642 seconds for 577552 entries Avg.: 899,614 entries/second
Partly run took 715 seconds for 577519 entries Avg.: 807,719 entries/second
Partly run took 1001 seconds for 607053 entries Avg.: 606,447 entries/second
Partly run took 1639 seconds for 610732 entries Avg.: 372,625 entries/second
Partly run took 3311 seconds for 610547 entries Avg.: 184,4 entries/second
Partly run took 7078 seconds for 610305 entries Avg.: 86,2256 entries/second
Partly run took 13104 seconds for 610531 entries Avg.: 46,5912 entries/second
Partly run took 19093 seconds for 610394 entries Avg.: 31,9695 entries/second
Partly run took 22353 seconds for 610609 entries Avg.: 27,3166 entries/second
Partly run took 23831 seconds for 610425 entries Avg.: 25,6147 entries/second
Partly run took 24903 seconds for 610223 entries Avg.: 24,504 entries/second
Partly run took 25121 seconds for 610223 entries Avg.: 24,2913 entries/second
Partly run took 25382 seconds for 610177 entries Avg.: 24,0398 entries/second
Partly run took 25013 seconds for 610042 entries Avg.: 24,389 entries/second
Partly run took 25048 seconds for 610250 entries Avg.: 24,3632 entries/second
Partly run took 24881 seconds for 610460 entries Avg.: 24,5352 entries/second
Partly run took 24587 seconds for 610152 entries Avg.: 24,816 entries/second
Partly run took 24907 seconds for 610252 entries Avg.: 24,5012 entries/second
Partly run took 24841 seconds for 610605 entries Avg.: 24,5805 entries/second
Partly run took 24627 seconds for 610432 entries Avg.: 24,7871 entries/second
Partly run took 24344 seconds for 610229 entries Avg.: 25,0669 entries/second
Partly run took 23958 seconds for 610343 entries Avg.: 25,4755 entries/second
Partly run took 24186 seconds for 610629 entries Avg.: 25,2472 entries/second
Partly run took 24349 seconds for 610377 entries Avg.: 25,0678 entries/second
Partly run took 24634 seconds for 610679 entries Avg.: 24,7901 entries/second
Partly run took 24897 seconds for 610230 entries Avg.: 24,5102 entries/second
Partly run took 24902 seconds for 610258 entries Avg.: 24,5064 entries/second
Partly run took 25800 seconds for 610327 entries Avg.: 23,6561 entries/second
Partly run took 26605 seconds for 610478 entries Avg.: 22,946 entries/second
Partly run took 27022 seconds for 610301 entries Avg.: 22,5853 entries/second
Partly run took 27535 seconds for 610207 entries Avg.: 22,1611 entries/second
Finished run at: 1162437948
Run took 620309 seconds for 18914119 entries Avg.: 30,4914 entries/second
***********************************************************
We are using the following machine:
Linux ldaprep4 2.6.15.3 #1 SMP Mon Feb 13 09:18:43 CET 2006 i686 GNU/Linux
MemTotal: 5975412 kB
SwapTotal: 2150152 kB
2 * Intel(R) Pentium(R) III CPU family 1133MHz
The slapd.conf is as follows (Don't mind the /tmp as path, I changed that ;-)):
--------------------------------------------------------------------------------------------------
include /tmp/etc/openldap/schema/core.schema
include /tmp/etc/openldap/schema/freenet.schema
pidfile /tmp/var/ldap/run/slapd.pid
argsfile /tmp/var/ldap/run/slapd.args
modulepath /tmp/lib
moduleload back_bdb.la
access to * by * write
loglevel 0
sizelimit 10000
timelimit 3600
cachesize 1000000
backend bdb

#######################################################################
# BDB database definitions
#######################################################################

# first database definition & config directives
database bdb

directory /var/lib/ldap/
replogfile /tmp/log/replica.log

rootdn "cn=root,o=....."
rootpw .....

suffix "o=....."

#replica uri=ldap://ldaprep1:389 binddn="cn=root,o=..." bindmethod=simple credentials=...
#replica uri=ldap://ldaprep2:389 binddn="cn=root,o=..." bindmethod=simple credentials=...
#replica uri=ldap://ldaprep3:389 binddn="cn=root,o=..." bindmethod=simple credentials=...

#attribute homeDirectory ces
#attribute folderName ces
#attribute locked ces
#
index cid pres,eq
index cn pres,eq,sub
index objectClass pres,eq
index folderName pres,eq
index locked pres,eq
--------------------------------------------------------------------------------------------------
Thanks in advance,
Ralf
Ralf Narozny wrote:
The speed is as follows (taken from the output of my insertion script):
The gradual slowdown you're seeing indicates that the BDB cache is too small.
Since you have two CPUs you should set "tool-threads 2" in your slapd.conf.
You should not use a presence index on objectClass, just equality. Every object is required to have an objectClass attribute, so presence indexing on it is just a waste of time.
As I noted just a few hours ago on this list (http://www.openldap.org/lists/openldap-software/200611/msg00051.html), you're unlikely to be able to configure sufficient cache to get good performance with a 19 million entry DB on a 32-bit server. With only ~3GB of working memory available to a process, your performance here is going to be limited to the speed of your disk drives.
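The gradual slowdown described above can be read straight off the posted log. The following is a minimal Python sketch, not part of the original exchange: the batch figures and run totals are copied from the first message, only a handful of batches are listed, and the helper name `rate` is made up for illustration.

```python
# Per-batch figures copied from the slapadd log in the first post:
# (seconds taken, entries loaded). Only a subset of batches is listed.
batches = [
    (642, 577552),    # batch 1
    (1639, 610732),   # batch 4
    (7078, 610305),   # batch 6
    (24903, 610223),  # batch 11
    (27535, 610207),  # batch 31 (last)
]


def rate(seconds, entries):
    """Entries loaded per second for one batch (hypothetical helper)."""
    return entries / seconds


rates = [rate(s, n) for s, n in batches]

# How much faster the first batch loaded compared with the last one.
slowdown = rates[0] / rates[-1]

# Overall average recomputed from the run totals in the log.
overall = 18914119 / 620309

for (s, n), r in zip(batches, rates):
    print(f"{n} entries in {s:>5} s -> {r:7.1f} entries/s")
print(f"first batch vs. last batch: {slowdown:.1f}x")
print(f"overall average: {overall:.2f} entries/s")
```

The rate falls steeply over the first ten or so batches and then flattens out in the mid-20s, which is consistent with the load becoming disk-bound once the working set no longer fits in the cache.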
--On Tuesday, November 07, 2006 5:04 PM +0100 Ralf Narozny rnarozny@web.de wrote:
My DB_CONFIG looks like this:
set_cachesize 1 524288000 1
set_lg_regionmax 262144
set_lg_bsize 2097152
From your stats, it looks like slapd has consumed all of the memory given to it and now has to swap. You need to adjust your DB_CONFIG file so that there is more memory available to BDB, assuming of course you have enough RAM for that.
Since you have dual CPUs, you could set "tool-threads 2" in slapd.conf as well.
Basically, if you want your slapadd to run quickly, you need to have enough RAM on the system, and the DB_CONFIG file configured, so that the DB can fit into RAM while being loaded. That's why my systems have 8GB and may expand to 16GB before too long as we continue to expand the data in the servers.
--Quanah
--
Quanah Gibson-Mount
Principal Software Developer
ITS/Shared Application Services
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html
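To put numbers on the DB_CONFIG advice above, here is a small Python sketch, not from the thread itself, that works out how much cache the posted `set_cachesize` line actually requests and compares it with the machine's RAM. It assumes the Berkeley DB convention that the cache is `gbytes` gigabytes (taken here as 2^30 bytes each) plus `bytes`; the function name `bdb_cache_bytes` is hypothetical.

```python
GIB = 1024 ** 3


def bdb_cache_bytes(line):
    """Total cache in bytes for a 'set_cachesize <gbytes> <bytes> <ncache>' line.

    Assumption: one "gbyte" is 2**30 bytes; ncache (the number of cache
    segments) does not change the total.
    """
    keyword, gbytes, extra_bytes, _ncache = line.split()
    if keyword != "set_cachesize":
        raise ValueError(f"not a set_cachesize line: {line!r}")
    return int(gbytes) * GIB + int(extra_bytes)


# Values copied from the first post in this thread.
configured = bdb_cache_bytes("set_cachesize 1 524288000 1")
mem_total = 5975412 * 1024  # MemTotal is reported in kB

print(f"configured BDB cache: {configured / GIB:.2f} GiB")
print(f"physical RAM:         {mem_total / GIB:.2f} GiB")
```

Under these assumptions the posted configuration asks for roughly 1.5 GiB of cache on a machine with a little under 6 GiB of RAM, so there is room to raise it, within whatever a single 32-bit process can address.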
On 11/7/06, Quanah Gibson-Mount quanah@stanford.edu wrote:
Since you have dual CPUs, you could set "tool-threads 2" in slapd.conf as well.
Is tool-threads documented somewhere?
--On Tuesday, November 07, 2006 1:52 PM -0500 matthew sporleder msporleder@gmail.com wrote:
Is tool-threads documented somewhere?
Yes, in the slapd.conf(5) man page.
tool-threads <integer>
       Specify the maximum number of threads to use in tool mode. This should not be greater than the number of CPUs in the system. The default is 1.
--Quanah
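As an illustration of the rule quoted from the man page (no more tool threads than CPUs), the following Python sketch builds the slapd.conf line with the value capped at the CPU count. The helper `tool_threads_line` is hypothetical and not part of OpenLDAP; it only formats the directive text.

```python
import os


def tool_threads_line(requested, cpus=None):
    """Return a slapd.conf 'tool-threads' line capped at the CPU count.

    Hypothetical helper: it never goes below 1 (the documented default)
    and never above the number of CPUs.
    """
    if cpus is None:
        cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return f"tool-threads {max(1, min(requested, cpus))}"


# The machine discussed in this thread has two CPUs.
print(tool_threads_line(4, cpus=2))
```

For the dual-CPU machine in this thread the result is the same "tool-threads 2" setting suggested earlier.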
On 11/7/06, Quanah Gibson-Mount quanah@stanford.edu wrote:
Yes, in the slapd.conf(5) man page.
Ah, I was looking at the man pages online. I found it in CVS and on my server. (The www copy is from 2005/06/10, which is before the commit of that line on 2005/10/28.)
matthew sporleder wrote:
Ah, I was looking at the man pages online. I found it in CVS and on my server. (The www copy is from 2005/06/10, which is before the commit of that line on 2005/10/28.)
I just have a link on my own web server to my source tree. The manpages are generated dynamically by the man2html program I wrote, so I'm always looking at the most up to date docs.
matthew sporleder wrote:
Is tool-threads documented somewhere?
Quanah already responded. But I have to ask - why in the world do you even need to ask a question like this? Just look in the man page first, then it will be obvious that it is (or isn't) documented.
Is it really faster and more efficient for you to email hundreds or thousands of people with a question, than to look at the man pages already distributed with the software?
Howard Chu wrote:
Is it really faster and more efficient for you to email hundreds or thousands of people with a question, than to look at the man pages already distributed with the software?
You are right.
But on the other hand: why can't someone get the most recent information from the project web site itself?
When he checked the web man page on openldap.org (http://www.openldap.org/software/man.cgi), he got old information. You cannot blame someone for checking "your" official source of information, can you?
At least you should add a disclaimer on the site to tell people: "Recent information is only available with the source tarball."
Hans
Hans Moser wrote:
You are right.
But on the other hand: why can't someone get the most recent information from the project web site itself?
When he checked the web man page on openldap.org (http://www.openldap.org/software/man.cgi), he got old information. You cannot blame someone for checking "your" official source of information, can you?
The web site only provides one or two snapshots of the manpages. There are 30 releases of 2.2 and getting to 30 releases of 2.3. It is unreasonable to expect that the documentation on the web site matches the specific version of the software you're running. Common sense says that the only version of the documentation that is reliable enough to use is the version that was included with the particular release you're using.
At least you should add a disclaimer on the site to tell people: "Recent information is only available with the source tarball."
It's not a question of "recent" vs "not recent" - it's a question of "exactly matching" or not. Reading docs for a different version of software than you're currently running will only cause confusion, as has been demonstrated numerous times on this list.
A set of software and manpages are collected together for a reason. If you're going to ignore the docs that are provided in the particular distro you download, that's your own concern.
openldap-software@openldap.org