Re: ldapsearch hangs

11 Sep 2009


I just saw what is going on...
the search returns something like this:
<-------------------------------------snip---------------------------------------->
# 039010, 046010.100.8000.100, 99893, bestMatchPrefixList, sipDirektor, ot.hr
dn: originatorPrefixID=039010,carrierPrefixID=046010.100.8000.100,bestMatchPre
 fix=99893,ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr
originatorPrefix: 039010
priority: 100
originator: 039010
originatorPrefixID: 039010
objectClass: top
objectClass: originatorPrefixID
# 385, bestMatchPrefixList, sipDirektor, ot.hr
dn: bestMatchPrefix=385,ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr
destination: Croatia
bestMatchPrefix: 385
objectClass: top
objectClass: bestMatchPrefix
# 006800.100.10000.100, 385, bestMatchPrefixList, sipDirektor, ot.hr
dn: carrierPrefixID=006800.100.10000.100,bestMatchPrefix=385,ou=bestMatchPrefi
 xList,ou=sipDirektor,dc=ot,dc=hr
qos: 100
priority: 10000
carrierPrefixID: 006800.100.10000.100
carrierPrefix: 006800
weight: 100
carrier: Optima Telekom
objectClass: top
objectClass: carrierPrefixID
# 000010, 006800.100.10000.100, 385, bestMatchPrefixList, sipDirektor, ot.hr
dn: originatorPrefixID=000010,carrierPrefixID=006800.100.10000.100,bestMatchPr
 efix=385,ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr
originatorPrefix: 000010
priority: 100
originator: T-COM/HT
originatorPrefixID: 000010
objectClass: top
objectClass: originatorPrefixID
It stops here for a while, and below are the remaining entries that I
added with ldapadd after I recreated the database from the LDIF file.
Something is wrong with these entries: either they are not indexed or
something else is going on. Just to mention, I have run the same search
several times with the same results. It always stops here, and the entries
I added with ldapadd are returned after a while, if ever.
# 043010.100.10000.100, 385, bestMatchPrefixList, sipDirektor, ot.hr
dn: carrierPrefixID=043010.100.10000.100,bestMatchPrefix=385,ou=bestMatchPrefi
 xList,ou=sipDirektor,dc=ot,dc=hr
qos: 100
priority: 10000
carrierPrefixID: 043010.100.10000.100
carrierPrefix: 043010
weight: 100
carrier: Telekom Austria
objectClass: top
objectClass: carrierPrefixID
# 000010, 043010.100.10000.100, 385, bestMatchPrefixList, sipDirektor, ot.hr
dn: originatorPrefixID=000010,carrierPrefixID=043010.100.10000.100,bestMatchPr
 efix=385,ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr
originatorPrefix: 000010
priority: 100
originator: T-COM/HT
originatorPrefixID: 000010
objectClass: top
objectClass: originatorPrefixID
# 078120.100.10000.100, 385, bestMatchPrefixList, sipDirektor, ot.hr
dn: carrierPrefixID=078120.100.10000.100,bestMatchPrefix=385,ou=bestMatchPrefi
 xList,ou=sipDirektor,dc=ot,dc=hr
qos: 100
priority: 10000
carrierPrefixID: 078120.100.10000.100
carrierPrefix: 078120
weight: 100
carrier: Lanck Telekom
objectClass: top
objectClass: carrierPrefixID
# 000010, 078120.100.10000.100, 385, bestMatchPrefixList, sipDirektor, ot.hr
dn: originatorPrefixID=000010,carrierPrefixID=078120.100.10000.100,bestMatchPr
 efix=385,ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr
originatorPrefix: 000010
priority: 100
originator: T-COM/HT
originatorPrefixID: 000010
objectClass: top
objectClass: originatorPrefixID
# search result
search: 2
result: 0 Success
# numResponses: 101584
# numEntries: 101583
Tihomir.
On Fri, Sep 11, 2009 at 5:10 PM, Tihomir Culjaga tculjaga@gmail.com wrote:
Hi Quanah,
I moved to OpenLDAP 2.4.18 and patched BDB 4.7.25 with all 4 patches from
Oracle.
I did NOT change slapd.conf at all.
I reduced the number of entries to a total of 3437278.
[root@l01lnp2 ~]# du -c -h /var/lib/ldap/*.bdb
200K    /var/lib/ldap/bestMatchPrefix.bdb
982M    /var/lib/ldap/dn2id.bdb
2.4G    /var/lib/ldap/id2entry.bdb
1.8M    /var/lib/ldap/objectClass.bdb
1.2M    /var/lib/ldap/originatorPrefixID.bdb
48M     /var/lib/ldap/uniqueID.bdb
3.4G    total <= interesting ... almost the same as number of entries :)
I changed DB_CONFIG to cache 7 GB:
set_cachesize 7 0 1
set_lg_regionmax 262144
set_lg_bsize 2097152
My system has 10 GB of RAM and the situation now is:
[root@l01lnp2 ~]# free
             total       used       free     shared    buffers     cached
Mem:      10234924   10176544      58380          0       2144    3786596
-/+ buffers/cache:    6387804    3847120
Swap:      4096564     753572    3342992
[root@l01lnp2 ~]#
When I run ldapsearch (time ldapsearch  -h localhost -x -b
ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr  -D cn=admin,dc=ot,dc=hr
-w pero99) before I actually add anything with ldapadd, the search completes
within 40 seconds. The slapd process takes 24 - 26% of memory.
After I add new entries (just 2 more) and perform the same search, it hangs
after a while. When ldapsearch finishes returning entries, I see the slapd
process memory start growing. It takes almost everything, reaching 97%.
It is always like this: the search returns all entries and then waits for
some time, almost at random between 60 seconds and 6 minutes, before it
actually exits.
Please can you take a look at the strace logs I attached to my previous
e-mail? As soon as ldapsearch stops returning entries I see a lot of
gibberish there.
Here is slapd process memory growth:
top - 16:42:22 up 4 days,  1:02,  2 users,  load average: 2.13, 0.67, 0.23
Tasks: 119 total,   1 running, 118 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.8%us,  0.2%sy,  0.0%ni, 70.0%id, 28.8%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:  10234924k total, 10177568k used,    57356k free,     6676k buffers
Swap:  4096564k total,    36516k used,  4060048k free,  3603688k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.3g 8.8g 2.8g S  4.0 89.7   1:13.49 slapd
    1 root      15   0 10344  372  344 S  0.0  0.0   0:01.69 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.06 migration/0

Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.2%us,  0.7%sy,  0.0%ni, 67.5%id, 24.3%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  10234924k total, 10177968k used,    56956k free,     6656k buffers
Swap:  4096564k total,    36516k used,  4060048k free,  3580356k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.3g 8.9g 2.9g S 30.3 90.9   1:16.76 slapd
  325 root      10  -5     0    0    0 S  0.7  0.0   5:37.11 kswapd0
 8458 root      15   0     0    0    0 D  0.3  0.0   0:02.02 pdflush

Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.0%us,  0.3%sy,  0.0%ni, 72.3%id, 26.1%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  10234924k total, 10180560k used,    54364k free,     6140k buffers
Swap:  4096564k total,    36516k used,  4060048k free,  3488164k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.4g 9.3g 3.2g S  4.7 95.5   1:28.86 slapd
 8458 root      15   0     0    0    0 D  0.7  0.0   0:02.20 pdflush

Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.9%us,  0.4%sy,  0.0%ni, 70.5%id, 28.0%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:  10234924k total, 10177812k used,    57112k free,     3492k buffers
Swap:  4096564k total,    36516k used,  4060048k free,  3481476k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.4g 9.4g 3.2g S  4.3 95.9   1:30.39 slapd
  325 root      10  -5     0    0    0 S  0.7  0.0   5:38.08 kswapd0

top - 16:45:01 up 4 days,  1:05,  2 users,  load average: 1.91, 1.40, 0.59
Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.2%us,  0.2%sy,  0.0%ni, 75.0%id, 21.4%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  10234924k total, 10179744k used,    55180k free,      396k buffers
Swap:  4096564k total,    42328k used,  4054236k free,  3473624k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.5g 9.4g 3.3g S 13.6 96.7   1:33.44 slapd
 9490 root      15   0     0    0    0 S  0.3  0.0   0:00.31 pdflush

top - 16:45:33 up 4 days,  1:05,  2 users,  load average: 1.55, 1.36, 0.60
Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.7%us,  0.2%sy,  0.0%ni, 74.7%id, 22.3%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  10234924k total, 10180100k used,    54824k free,      652k buffers
Swap:  4096564k total,   118616k used,  3977948k free,  3521232k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.5g 9.4g 3.3g S 10.6 96.6   1:37.36 slapd
  325 root      10  -5     0    0    0 S  0.3  0.0   5:38.63 kswapd0
This looks like a memory leak bug to me.
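For anyone who wants to tabulate the growth rather than read it off the screen, here is a minimal Python sketch that pulls the %MEM column for slapd out of pasted top output. The function name slapd_mem_percent is hypothetical, and it assumes the default 12-column process row shown in the snapshots. It only collects the figures; a rising %MEM on its own does not tell a leak apart from caches filling up.

```python
def slapd_mem_percent(top_text):
    """Return the %MEM value of every slapd row found in top output."""
    values = []
    for line in top_text.splitlines():
        # Mail clients sometimes wrap emphasized text in asterisks; drop them
        fields = line.replace("*", " ").split()
        # Process rows have 12 columns: %MEM is the 10th, COMMAND the 12th
        if len(fields) >= 12 and fields[11] == "slapd":
            values.append(float(fields[9]))
    return values


# slapd rows copied from the six snapshots in this message
snapshots = """\
 9404 ldap      25   0 13.3g 8.8g 2.8g S  4.0 89.7   1:13.49 slapd
 9404 ldap      25   0 13.3g 8.9g 2.9g S 30.3 90.9   1:16.76 slapd
 9404 ldap      25   0 13.4g 9.3g 3.2g S  4.7 95.5   1:28.86 slapd
 9404 ldap      25   0 13.4g 9.4g 3.2g S  4.3 95.9   1:30.39 slapd
 9404 ldap      25   0 13.5g 9.4g 3.3g S 13.6 96.7   1:33.44 slapd
 9404 ldap      25   0 13.5g 9.4g 3.3g S 10.6 96.6   1:37.36 slapd
"""

mem = slapd_mem_percent(snapshots)
growth = mem[-1] - mem[0]  # percentage points gained between first and last sample
print(mem, round(growth, 1))
```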
Tihomir.
On Thu, Sep 10, 2009 at 9:37 PM, Quanah Gibson-Mount quanah@zimbra.com wrote:
--On Thursday, September 10, 2009 8:56 PM +0200 Tihomir Culjaga <tculjaga@gmail.com> wrote:
So, the situation is that I have 2 LDIF files I'm recreating the database
from.
/usr/local/libexec/slapadd -l /home/tculjaga/file2.ldif -f /usr/local/etc/openldap/slapd.conf
/usr/local/libexec/slapadd -l /home/tculjaga/file2.ldif -f /usr/local/etc/openldap/slapd.conf
I would suggest you just make these a single file, so all the work can be
done at one time.
I tried to re-index with
/usr/local/libexec/slapindex -f /usr/local/etc/openldap/slapd.conf -v
then restarted the slapd process and restarted the machine ... it is always
the same issue.
Nothing here indicates a problem with your indices.  Running slapindex
repeatedly is a waste of your time.
[root@l01lnp2 traces]# /usr/local/libexec/slapd -V
@(#) $OpenLDAP: slapd 2.4.16 (Sep  9 2009 14:39:44) $
    root@l01lnp2:/home/tculjaga/openldap-2.4.16/servers/slapd
I would strongly urge you to upgrade to 2.4.18 (for reasons I will note
further down)
[root@l01lnp2 traces]# /usr/local/BerkeleyDB.4.7/bin/db_stat -V
Berkeley DB 4.7.25: (May 15, 2008) - unpatched!
You need to rebuild BDB 4.7.25 with the 4 patches from Oracle.  There are
known issues when running BDB 4.7 without them.
[root@l01lnp2 traces]# du -c -h /var/lib/ldap/*.bdb
200K    /var/lib/ldap/bestMatchPrefix.bdb
3.8G    /var/lib/ldap/dn2id.bdb
6.2G    /var/lib/ldap/id2entry.bdb
1.8M    /var/lib/ldap/objectClass.bdb
1.2M    /var/lib/ldap/originatorPrefixID.bdb
48M    /var/lib/ldap/uniqueID.bdb
10G    total
Since your database is a total of 10 GB in size, for slapadd to work at
optimum efficiency, you need at least 10GB of cache configured in your
DB_CONFIG file.  Unfortunately, you only have 10GB of RAM.  Essentially,
your system is underpowered for your database size.
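The comparison above can be checked with a short Python sketch that sums the du -h figures and sets them against a configured cache. It assumes du -h suffixes are powers of 1024, and the names parse_size and UNITS are hypothetical. It reflects only the rule of thumb stated in this message (cache at least as large as the database), not an exact sizing formula.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a du -h style size such as '3.8G' to bytes."""
    return float(text[:-1]) * UNITS[text[-1]]


# Per-file sizes copied from the du output quoted above
du_output = """\
200K /var/lib/ldap/bestMatchPrefix.bdb
3.8G /var/lib/ldap/dn2id.bdb
6.2G /var/lib/ldap/id2entry.bdb
1.8M /var/lib/ldap/objectClass.bdb
1.2M /var/lib/ldap/originatorPrefixID.bdb
48M /var/lib/ldap/uniqueID.bdb
"""

db_bytes = sum(parse_size(line.split()[0]) for line in du_output.splitlines())
db_gib = db_bytes / 1024 ** 3

cache_bytes = 3 * 1024 ** 3          # the 3GB set_cachesize from DB_CONFIG
cache_covers_db = cache_bytes >= db_bytes

print(f"database: {db_gib:.2f} GiB, cache covers it: {cache_covers_db}")
```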
[tculjaga@l01lnp2 ~]$ cat ot.ldif | grep -c "dn: "
101588
[tculjaga@l01lnp2 ~]$ cat l01sipdir1.ldif | grep -c "dn: "
9994864
[tculjaga@l01lnp2 ~]$
So you have 10,096,452 entries total.
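The same tally can be sketched in Python for anyone who prefers it to grep. This assumes well-formed LDIF in which every entry starts with a line beginning "dn: " (or "dn:: " for a base64-encoded DN); the helper name count_entries and the sample data are hypothetical. Unlike grep -c "dn: ", which matches the string anywhere in a line, this counts only lines that begin with it.

```python
import io


def count_entries(ldif_stream):
    """Count LDIF entries by counting lines that begin a new DN."""
    return sum(1 for line in ldif_stream if line.startswith(("dn: ", "dn:: ")))


# Tiny in-memory sample (illustrative data, not taken from the real files)
sample = io.StringIO(
    "dn: ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr\n"
    "ou: bestMatchPrefixList\n"
    "\n"
    "dn: bestMatchPrefix=385,ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr\n"
    "bestMatchPrefix: 385\n"
)
sample_count = count_entries(sample)

# The two per-file counts reported above, added together
total_reported = 101588 + 9994864
print(sample_count, total_reported)
```

For the real files you would pass an open file object, for example count_entries(open("ot.ldif")).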
[root@l01lnp2 traces]# cat /var/lib/ldap/DB_CONFIG | grep -v "#"
set_cachesize 0 3221225472 1
set_lg_regionmax 262144
set_lg_bsize 2097152
You only have a 3GB DB cachesize configured here.  Expect things to
perform suboptimally.  It would have been easier to set this by writing
set_cachesize 3 0 1
Which would have the same effect, since the first number is the number of
gigabytes to allocate.
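To make the equivalence concrete, here is a small Python sketch that converts between a byte count and a set_cachesize line. The three arguments are gigabytes, additional bytes, and the number of cache segments; the helper names cachesize_directive and cachesize_bytes are hypothetical.

```python
GIB = 1024 ** 3


def cachesize_directive(total_bytes, ncache=1):
    """Build a DB_CONFIG set_cachesize line from a total size in bytes."""
    gbytes, remainder = divmod(total_bytes, GIB)
    return f"set_cachesize {gbytes} {remainder} {ncache}"


def cachesize_bytes(directive):
    """Return the total cache size in bytes described by a set_cachesize line."""
    _, gbytes, nbytes, _ncache = directive.split()
    return int(gbytes) * GIB + int(nbytes)


# Both spellings describe the same 3GB cache
same = cachesize_bytes("set_cachesize 0 3221225472 1") == cachesize_bytes("set_cachesize 3 0 1")
print(same, cachesize_directive(3221225472))
```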
Please find attached slapd.conf
Ok, so the relevant bits from here are:
cachesize 2500000
idlcachesize 7500000
cachefree 1000
Which means you have a cachesize of 2.5 million, an idlcachesize of 7.5
million, and (with OL 2.4.16) a dncachesize of 5 million.
I would highly advise you to upgrade to OpenLDAP 2.4.18, and change the
slapd.conf settings to:
dncachesize 0 (which means unlimited).
Also set no cachesize or idlcachesize, and fix your DB_CONFIG.  But you
also need to buy a substantial amount of RAM for a DB of this size. :P  I
would advise you to upgrade to at least 32GB total.  Then you can tune the
system more optimally.
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc

Zimbra ::  the leader in open source messaging and collaboration
