I just saw what is going on...
The search returns something like this:
<-------------------------------------snip---------------------------------------->
# 039010, 046010.100.8000.100, 99893, bestMatchPrefixList, sipDirektor, ot.hr
dn: originatorPrefixID=039010,carrierPrefixID=046010.100.8000.100,bestMatchPrefix=99893,ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr
originatorPrefix: 039010
priority: 100
originator: 039010
originatorPrefixID: 039010
objectClass: top
objectClass: originatorPrefixID

# 385, bestMatchPrefixList, sipDirektor, ot.hr
dn: bestMatchPrefix=385,ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr
destination: Croatia
bestMatchPrefix: 385
objectClass: top
objectClass: bestMatchPrefix

# 006800.100.10000.100, 385, bestMatchPrefixList, sipDirektor, ot.hr
dn: carrierPrefixID=006800.100.10000.100,bestMatchPrefix=385,ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr
qos: 100
priority: 10000
carrierPrefixID: 006800.100.10000.100
carrierPrefix: 006800
weight: 100
carrier: Optima Telekom
objectClass: top
objectClass: carrierPrefixID

# 000010, 006800.100.10000.100, 385, bestMatchPrefixList, sipDirektor, ot.hr
dn: originatorPrefixID=000010,carrierPrefixID=006800.100.10000.100,bestMatchPrefix=385,ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr
originatorPrefix: 000010
priority: 100
originator: T-COM/HT
originatorPrefixID: 000010
objectClass: top
objectClass: originatorPrefixID
It stops here for a while, and below are the remaining entries that I added with ldapadd after I recreated the database from the LDIF file. Something is wrong with these entries: either they are not indexed or something else is off. Just to mention, I ran the same search several times with the same results. It always stops here, and the entries I added with ldapadd are returned after a while, if ever.
# 043010.100.10000.100, 385, bestMatchPrefixList, sipDirektor, ot.hr
dn: carrierPrefixID=043010.100.10000.100,bestMatchPrefix=385,ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr
qos: 100
priority: 10000
carrierPrefixID: 043010.100.10000.100
carrierPrefix: 043010
weight: 100
carrier: Telekom Austria
objectClass: top
objectClass: carrierPrefixID

# 000010, 043010.100.10000.100, 385, bestMatchPrefixList, sipDirektor, ot.hr
dn: originatorPrefixID=000010,carrierPrefixID=043010.100.10000.100,bestMatchPrefix=385,ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr
originatorPrefix: 000010
priority: 100
originator: T-COM/HT
originatorPrefixID: 000010
objectClass: top
objectClass: originatorPrefixID

# 078120.100.10000.100, 385, bestMatchPrefixList, sipDirektor, ot.hr
dn: carrierPrefixID=078120.100.10000.100,bestMatchPrefix=385,ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr
qos: 100
priority: 10000
carrierPrefixID: 078120.100.10000.100
carrierPrefix: 078120
weight: 100
carrier: Lanck Telekom
objectClass: top
objectClass: carrierPrefixID

# 000010, 078120.100.10000.100, 385, bestMatchPrefixList, sipDirektor, ot.hr
dn: originatorPrefixID=000010,carrierPrefixID=078120.100.10000.100,bestMatchPrefix=385,ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr
originatorPrefix: 000010
priority: 100
originator: T-COM/HT
originatorPrefixID: 000010
objectClass: top
objectClass: originatorPrefixID

# search result
search: 2
result: 0 Success

# numResponses: 101584
# numEntries: 101583
Tihomir.
On Fri, Sep 11, 2009 at 5:10 PM, Tihomir Culjaga tculjaga@gmail.com wrote:
Hi Quanah,
I moved to OpenLDAP 2.4.18 and patched BDB 4.7.25 with all 4 patches from Oracle.
I didn't change slapd.conf at all.
I reduced the number of entries to a total of 3437278.
[root@l01lnp2 ~]# du -c -h /var/lib/ldap/*.bdb
200K    /var/lib/ldap/bestMatchPrefix.bdb
982M    /var/lib/ldap/dn2id.bdb
2.4G    /var/lib/ldap/id2entry.bdb
1.8M    /var/lib/ldap/objectClass.bdb
1.2M    /var/lib/ldap/originatorPrefixID.bdb
48M     /var/lib/ldap/uniqueID.bdb
3.4G    total    <= interesting ... almost the same as number of entries :)
I changed DB_CONFIG to cache 7 GB:
set_cachesize 7 0 1
set_lg_regionmax 262144
set_lg_bsize 2097152
My system has 10 GB of RAM and the situation now is:
[root@l01lnp2 ~]# free
             total       used       free     shared    buffers     cached
Mem:      10234924   10176544      58380          0       2144    3786596
-/+ buffers/cache:    6387804    3847120
Swap:      4096564     753572    3342992
[root@l01lnp2 ~]#
When I run ldapsearch (time ldapsearch -h localhost -x -b ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr -D cn=admin,dc=ot,dc=hr -w pero99) before I actually add anything with ldapadd, the search completes within 40 seconds. The slapd process takes 24 - 26% of memory.
After I add new entries (just 2 more) and perform the same search, it hangs after a while. When ldapsearch finishes returning entries, I see the slapd process memory start growing until it takes almost everything, reaching 97%. It is always like this: the search returns all entries and then waits for some time, almost at random between 60 seconds and 6 minutes, before it actually exits.
Could you please take a look at the strace logs I attached to my previous e-mail? As soon as ldapsearch stops returning entries, I see a lot of gibberish there.
Here is slapd process memory growth:
top - 16:42:22 up 4 days,  1:02,  2 users,  load average: 2.13, 0.67, 0.23
Tasks: 119 total,   1 running, 118 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.8%us,  0.2%sy,  0.0%ni, 70.0%id, 28.8%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:  10234924k total, 10177568k used,    57356k free,     6676k buffers
Swap:  4096564k total,    36516k used,  4060048k free,  3603688k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.3g 8.8g 2.8g S  4.0 89.7   1:13.49 slapd
    1 root      15   0 10344  372  344 S  0.0  0.0   0:01.69 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.06 migration/0

Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.2%us,  0.7%sy,  0.0%ni, 67.5%id, 24.3%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  10234924k total, 10177968k used,    56956k free,     6656k buffers
Swap:  4096564k total,    36516k used,  4060048k free,  3580356k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.3g 8.9g 2.9g S 30.3 90.9   1:16.76 slapd
  325 root      10  -5     0    0    0 S  0.7  0.0   5:37.11 kswapd0
 8458 root      15   0     0    0    0 D  0.3  0.0   0:02.02 pdflush

Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.0%us,  0.3%sy,  0.0%ni, 72.3%id, 26.1%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  10234924k total, 10180560k used,    54364k free,     6140k buffers
Swap:  4096564k total,    36516k used,  4060048k free,  3488164k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.4g 9.3g 3.2g S  4.7 95.5   1:28.86 slapd
 8458 root      15   0     0    0    0 D  0.7  0.0   0:02.20 pdflush

Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.9%us,  0.4%sy,  0.0%ni, 70.5%id, 28.0%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:  10234924k total, 10177812k used,    57112k free,     3492k buffers
Swap:  4096564k total,    36516k used,  4060048k free,  3481476k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.4g 9.4g 3.2g S  4.3 95.9   1:30.39 slapd
  325 root      10  -5     0    0    0 S  0.7  0.0   5:38.08 kswapd0

top - 16:45:01 up 4 days,  1:05,  2 users,  load average: 1.91, 1.40, 0.59
Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.2%us,  0.2%sy,  0.0%ni, 75.0%id, 21.4%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  10234924k total, 10179744k used,    55180k free,      396k buffers
Swap:  4096564k total,    42328k used,  4054236k free,  3473624k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.5g 9.4g 3.3g S 13.6 96.7   1:33.44 slapd
 9490 root      15   0     0    0    0 S  0.3  0.0   0:00.31 pdflush

top - 16:45:33 up 4 days,  1:05,  2 users,  load average: 1.55, 1.36, 0.60
Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.7%us,  0.2%sy,  0.0%ni, 74.7%id, 22.3%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  10234924k total, 10180100k used,    54824k free,      652k buffers
Swap:  4096564k total,   118616k used,  3977948k free,  3521232k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.5g 9.4g 3.3g S 10.6 96.6   1:37.36 slapd
  325 root      10  -5     0    0    0 S  0.3  0.0   5:38.63 kswapd0
This looks like a memory leak bug to me.
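To put numbers on the growth rather than reading them off top by hand, the resident set size of slapd can be sampled at intervals. The following is only a minimal sketch, assuming a Linux /proc filesystem; the helper names `parse_vmrss_kb` and `sample_rss` are made up for illustration and are not part of OpenLDAP.

```python
import re
import time


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def sample_rss(pid, samples=5, interval=10.0):
    """Read VmRSS for `pid` several times; returns a list of kB values."""
    readings = []
    for i in range(samples):
        with open(f"/proc/{pid}/status") as status_file:
            readings.append(parse_vmrss_kb(status_file.read()))
        if i < samples - 1:
            time.sleep(interval)
    return readings
```

Calling `sample_rss(9404)` while the search is hanging would give a series of readings for the slapd PID shown above; a series that keeps climbing after the last entry was returned would match what top shows.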
Tihomir.
On Thu, Sep 10, 2009 at 9:37 PM, Quanah Gibson-Mount quanah@zimbra.com wrote:
--On Thursday, September 10, 2009 8:56 PM +0200 Tihomir Culjaga <tculjaga@gmail.com> wrote:
So, the situation is that I have 2 LDIF files I'm recreating the database from.
/usr/local/libexec/slapadd -l /home/tculjaga/file2.ldif -f /usr/local/etc/openldap/slapd.conf
/usr/local/libexec/slapadd -l /home/tculjaga/file2.ldif -f /usr/local/etc/openldap/slapd.conf
I would suggest you just make these a single file, so all the work can be done at one time.
I tried to re-index with /usr/local/libexec/slapindex -f /usr/local/etc/openldap/slapd.conf -v, restart the slapd process, restart the machine ... it is always the same issue.
Nothing here indicates a problem with your indices. Running slapindex repeatedly is a waste of your time.
[root@l01lnp2 traces]# /usr/local/libexec/slapd -V
@(#) $OpenLDAP: slapd 2.4.16 (Sep 9 2009 14:39:44) $
        root@l01lnp2:/home/tculjaga/openldap-2.4.16/servers/slapd
I would strongly urge you to upgrade to 2.4.18 (for reasons I will note further down)
[root@l01lnp2 traces]# /usr/local/BerkeleyDB.4.7/bin/db_stat -V
Berkeley DB 4.7.25: (May 15, 2008) - unpatched!
You need to rebuild BDB 4.7.25 with the 4 patches from Oracle. There are known issues when running BDB 4.7 without them.
[root@l01lnp2 traces]# du -c -h /var/lib/ldap/*.bdb
200K    /var/lib/ldap/bestMatchPrefix.bdb
3.8G    /var/lib/ldap/dn2id.bdb
6.2G    /var/lib/ldap/id2entry.bdb
1.8M    /var/lib/ldap/objectClass.bdb
1.2M    /var/lib/ldap/originatorPrefixID.bdb
48M     /var/lib/ldap/uniqueID.bdb
10G     total
Since your database is a total of 10 GB in size, for slapadd to work at optimum efficiency, you need at least 10GB of cache in your DB_CONFIG file. Unfortunately, you only have 10GB of RAM. Essentially, your system is underpowered for your database size.
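The comparison between database size and cache size can be scripted. This is only a rough sketch: the function names are hypothetical, and it sums apparent file sizes, which can differ slightly from the disk usage that du reports.

```python
import glob
import os

GIB = 1024 ** 3


def total_bdb_bytes(db_dir):
    """Sum the apparent sizes of all *.bdb files directly inside db_dir."""
    return sum(os.path.getsize(path)
               for path in glob.glob(os.path.join(db_dir, "*.bdb")))


def cache_covers_db(db_bytes, cache_gib):
    """True if a BDB cache of cache_gib gigabytes is at least as large as the DB."""
    return cache_gib * GIB >= db_bytes
```

For the directory in this thread one would call `cache_covers_db(total_bdb_bytes("/var/lib/ldap"), 3)`; with roughly 10 GB of .bdb files and a 3 GB cache the check fails.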
[tculjaga@l01lnp2 ~]$ cat ot.ldif | grep -c "dn: "
101588
[tculjaga@l01lnp2 ~]$ cat l01sipdir1.ldif | grep -c "dn: "
9994864
[tculjaga@l01lnp2 ~]$
So you have 10,096,452 entries total.
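The same count can be reproduced without grep. A small sketch, with one deliberate difference from the grep commands above: it only counts lines that start with a DN (including base64-encoded `dn:: ` lines), so an attribute value that merely contains "dn: " is not counted. The function name is illustrative.

```python
def count_entries(ldif_lines):
    """Count LDIF entries by counting lines that begin a DN."""
    return sum(1 for line in ldif_lines if line.startswith(("dn: ", "dn:: ")))


# The two per-file counts from the thread add up to the stated total.
assert 101588 + 9994864 == 10096452
```

A file can be passed directly, for example `count_entries(open("ot.ldif"))`, since iterating a file object yields its lines.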
[root@l01lnp2 traces]# cat /var/lib/ldap/DB_CONFIG | grep -v "#"
set_cachesize 0 3221225472 1
set_lg_regionmax 262144
set_lg_bsize 2097152
You only have a 3GB DB cachesize configured here. Expect things to perform suboptimally. It would have been easier to set this with
set_cachesize 3 0 1
Which would have the same effect, since the first number is the number of gigabytes to allocate.
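The equivalence is easy to verify, since 3221225472 bytes is exactly 3 * 1024^3. As a sketch, a byte count can be turned into the gigabytes-plus-bytes form like this (the function name is made up for illustration; the three fields are gigabytes, additional bytes, and number of cache segments):

```python
GIB = 1024 ** 3


def set_cachesize_line(total_bytes, ncache=1):
    """Format a DB_CONFIG set_cachesize directive from a total byte count."""
    gbytes, extra_bytes = divmod(total_bytes, GIB)
    return f"set_cachesize {gbytes} {extra_bytes} {ncache}"


print(set_cachesize_line(3221225472))  # set_cachesize 3 0 1
```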
Please find attached slapd.conf
Ok, so the relevant bits from here are:
cachesize 2500000
idlcachesize 7500000
cachefree 1000
Which means you have a cachesize of 2.5 million, an idlcachesize of 7.5 million, and (with OL 2.4.16) a dncachesize of 5 million.
I would highly advise you upgrade to OpenLDAP 2.4.18, and change the slapd.conf settings to:
dncachesize 0 (which means unlimited).
Also set no cachesize or idlcachesize, and fix your DB_CONFIG. But you also need to buy a substantial amount of RAM for a DB of this size. :P I would advise you to upgrade to at least 32GB total. Then you can tune the system more optimally.
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
Zimbra :: the leader in open source messaging and collaboration