Hi Quanah,


I moved to OpenLDAP 2.4.18 and patched BDB 4.7.25 with all 4 patches from Oracle.


I didn't change slapd.conf at all.

I reduced the number of entries to a total of 3437278.

[root@l01lnp2 ~]# du -c -h /var/lib/ldap/*.bdb
200K    /var/lib/ldap/bestMatchPrefix.bdb
982M    /var/lib/ldap/dn2id.bdb
2.4G    /var/lib/ldap/id2entry.bdb
1.8M    /var/lib/ldap/objectClass.bdb
1.2M    /var/lib/ldap/originatorPrefixID.bdb
48M     /var/lib/ldap/uniqueID.bdb
3.4G    total <= interesting ... almost the same as number of entries :)


I changed DB_CONFIG to cache 7 GB:

set_cachesize 7 0 1
set_lg_regionmax 262144
set_lg_bsize 2097152



My system has 10 GB of RAM, and the situation now is:

[root@l01lnp2 ~]# free
             total       used       free     shared    buffers     cached
Mem:      10234924   10176544      58380          0       2144    3786596
-/+ buffers/cache:    6387804    3847120
Swap:      4096564     753572    3342992
[root@l01lnp2 ~]#
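
For reference, the "-/+ buffers/cache" row above can be re-derived from the "Mem:" row. A minimal Python sketch, with the numbers copied from the free output above; the helper name buffers_cache_row is made up for illustration:

```python
def buffers_cache_row(used_kb, free_kb, buffers_kb, cached_kb):
    """Return (used, free) in kB once buffers and page cache are discounted."""
    reclaimable = buffers_kb + cached_kb
    return used_kb - reclaimable, free_kb + reclaimable

# Values from the "Mem:" row of the free output above (kB).
used, free = buffers_cache_row(10176544, 58380, 2144, 3786596)
print(used, free)  # 6387804 3847120
```

In other words, about 6.1 GiB is counted as used once buffers and page cache are excluded, and about 3.7 GiB is free or held as buffers/page cache.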



When I run ldapsearch (time ldapsearch -h localhost -x -b ou=bestMatchPrefixList,ou=sipDirektor,dc=ot,dc=hr -D cn=admin,dc=ot,dc=hr -w pero99) before I actually add anything with ldapadd, the search completes within 40 seconds. The slapd process takes 24 - 26% of memory.

After I add new entries (just 2 more) and perform the same search, it hangs after a while. When ldapsearch finishes returning entries, I see the slapd process memory start growing; it takes almost everything, reaching 97%.
It is always like this: the search returns all entries and then waits for some time before it actually exits. The delay looks almost random, anywhere from 60 seconds to 6 minutes.


Could you please take a look at the strace logs I attached to my previous e-mail? As soon as ldapsearch stops returning entries, I see a lot of gibberish there.



Here is slapd process memory growth:

top - 16:42:22 up 4 days,  1:02,  2 users,  load average: 2.13, 0.67, 0.23
Tasks: 119 total,   1 running, 118 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.8%us,  0.2%sy,  0.0%ni, 70.0%id, 28.8%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:  10234924k total, 10177568k used,    57356k free,     6676k buffers
Swap:  4096564k total,    36516k used,  4060048k free,  3603688k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND   
 9404 ldap      25   0 13.3g 8.8g 2.8g S  4.0 89.7   1:13.49 slapd     
    1 root      15   0 10344  372  344 S  0.0  0.0   0:01.69 init      
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.06 migration/0
   


Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.2%us,  0.7%sy,  0.0%ni, 67.5%id, 24.3%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  10234924k total, 10177968k used,    56956k free,     6656k buffers
Swap:  4096564k total,    36516k used,  4060048k free,  3580356k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.3g 8.9g 2.9g S 30.3 90.9   1:16.76 slapd 
  325 root      10  -5     0    0    0 S  0.7  0.0   5:37.11 kswapd0
 8458 root      15   0     0    0    0 D  0.3  0.0   0:02.02 pdflush


Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.0%us,  0.3%sy,  0.0%ni, 72.3%id, 26.1%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  10234924k total, 10180560k used,    54364k free,     6140k buffers
Swap:  4096564k total,    36516k used,  4060048k free,  3488164k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.4g 9.3g 3.2g S  4.7 95.5   1:28.86 slapd 
 8458 root      15   0     0    0    0 D  0.7  0.0   0:02.20 pdflush


Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.9%us,  0.4%sy,  0.0%ni, 70.5%id, 28.0%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:  10234924k total, 10177812k used,    57112k free,     3492k buffers
Swap:  4096564k total,    36516k used,  4060048k free,  3481476k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.4g 9.4g 3.2g S  4.3 95.9   1:30.39 slapd 
  325 root      10  -5     0    0    0 S  0.7  0.0   5:38.08 kswapd0



top - 16:45:01 up 4 days,  1:05,  2 users,  load average: 1.91, 1.40, 0.59
Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.2%us,  0.2%sy,  0.0%ni, 75.0%id, 21.4%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  10234924k total, 10179744k used,    55180k free,      396k buffers
Swap:  4096564k total,    42328k used,  4054236k free,  3473624k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.5g 9.4g 3.3g S 13.6 96.7   1:33.44 slapd 
 9490 root      15   0     0    0    0 S  0.3  0.0   0:00.31 pdflush




top - 16:45:33 up 4 days,  1:05,  2 users,  load average: 1.55, 1.36, 0.60
Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.7%us,  0.2%sy,  0.0%ni, 74.7%id, 22.3%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  10234924k total, 10180100k used,    54824k free,      652k buffers
Swap:  4096564k total,   118616k used,  3977948k free,  3521232k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9404 ldap      25   0 13.5g 9.4g 3.3g S 10.6 96.6   1:37.36 slapd
  325 root      10  -5     0    0    0 S  0.3  0.0   5:38.63 kswapd0
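
The RES column above goes from 8.8g to 9.4g between 16:42:22 and 16:45:33. To record that kind of growth without copying top output by hand, the resident set size can be sampled from /proc. A minimal sketch, assuming Linux; the helper names are illustrative, and 9404 is simply the slapd PID shown in the top output:

```python
import re
import time

def parse_vmrss_kb(status_text):
    """Extract VmRSS (in kB) from the text of /proc/<pid>/status, or None."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None

def sample_rss(pid, count=10, interval=5.0):
    """Print the resident set size of `pid` every `interval` seconds."""
    for _ in range(count):
        with open(f"/proc/{pid}/status") as fh:
            rss = parse_vmrss_kb(fh.read())
        print(time.strftime("%H:%M:%S"), rss, "kB")
        time.sleep(interval)

# Hypothetical call, left commented out so the block has no side effects:
# sample_rss(9404)
```

Running it once before the ldapadd and once after would give two series that can be compared directly.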




This looks like a memory leak bug to me.

Tihomir.


 

On Thu, Sep 10, 2009 at 9:37 PM, Quanah Gibson-Mount <quanah@zimbra.com> wrote:
--On Thursday, September 10, 2009 8:56 PM +0200 Tihomir Culjaga <tculjaga@gmail.com> wrote:

So, the situation is that i have 2 ldif files i'm recreating the database
from.

/usr/local/libexec/slapadd -l /home/tculjaga/file2.ldif -f
/usr/local/etc/openldap/slapd.conf
/usr/local/libexec/slapadd -l /home/tculjaga/file2.ldif -f
/usr/local/etc/openldap/slapd.conf

I would suggest you just make these a single file, so all the work can be done at one time.


I tried to re-index with /usr/local/libexec/slapindex -f
/usr/local/etc/openldap/slapd.conf -v
restart slapd process, restart the machine ... it is always the same
issue.

Nothing here indicates a problem with your indices.  Running slapindex repeatedly is a waste of your time.


[root@l01lnp2 traces]# /usr/local/libexec/slapd -V
@(#) $OpenLDAP: slapd 2.4.16 (Sep  9 2009 14:39:44) $
    root@l01lnp2:/home/tculjaga/openldap-2.4.16/servers/slapd

I would strongly urge you to upgrade to 2.4.18 (for reasons I will note further down)



[root@l01lnp2 traces]# /usr/local/BerkeleyDB.4.7/bin/db_stat -V
Berkeley DB 4.7.25: (May 15, 2008) - unpatched!

You need to rebuild BDB 4.7.25 with the 4 patches from Oracle.  There are known issues when running BDB 4.7 without them.


[root@l01lnp2 traces]# du -c -h /var/lib/ldap/*.bdb
200K    /var/lib/ldap/bestMatchPrefix.bdb
3.8G    /var/lib/ldap/dn2id.bdb
6.2G    /var/lib/ldap/id2entry.bdb
1.8M    /var/lib/ldap/objectClass.bdb
1.2M    /var/lib/ldap/originatorPrefixID.bdb
48M    /var/lib/ldap/uniqueID.bdb
10G    total

Since your database is a total of 10 GB in size, for slapadd to work at optimum efficiency, you need at least 10 GB of cache configured in your DB_CONFIG file.  Unfortunately, you only have 10 GB of RAM.  Essentially, your system is underpowered for your database size.
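
The comparison above (total size of the .bdb files against the configured BDB cache) can be sketched as a small check. This is only an illustration under the assumption that all database files sit directly in one directory; the function names are made up:

```python
import glob
import os

def total_bdb_bytes(db_dir):
    """Sum the sizes of all *.bdb files directly under db_dir."""
    return sum(os.path.getsize(p) for p in glob.glob(os.path.join(db_dir, "*.bdb")))

def cache_covers_db(db_bytes, cache_bytes):
    """True if the configured BDB cache is at least as large as the database."""
    return cache_bytes >= db_bytes

GIB = 1024 ** 3
# Figures from this thread: roughly 10 GB of .bdb files, 3 GB of cache.
print(cache_covers_db(10 * GIB, 3 * GIB))  # False
```

On the machine itself, total_bdb_bytes("/var/lib/ldap") would supply the first argument instead of the rounded figure.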




[tculjaga@l01lnp2 ~]$ cat ot.ldif | grep -c "dn: "
101588
[tculjaga@l01lnp2 ~]$ cat l01sipdir1.ldif | grep -c "dn: "
9994864
[tculjaga@l01lnp2 ~]$

So you have 10,096,452 entries total.
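
As a side note on the counting itself: grep -c "dn: " counts every line that contains that string anywhere, and it does not match base64-encoded DNs, which are written as "dn:: ". A minimal sketch of a stricter count, assuming standard LDIF where each entry starts with a line beginning "dn:"; the sample data is invented:

```python
import io

def count_ldif_entries(lines):
    """Count LDIF entries by counting lines that begin with 'dn:'."""
    return sum(1 for line in lines if line.startswith("dn:"))

sample = io.StringIO(
    "dn: dc=ot,dc=hr\n"
    "objectClass: dcObject\n"
    "\n"
    "dn:: b3U9dGVzdCxkYz1vdCxkYz1ocg==\n"
    "objectClass: organizationalUnit\n"
)
print(count_ldif_entries(sample))  # 2
print(101588 + 9994864)            # 10096452
```

For a real file, passing the open file object (for example open("ot.ldif")) streams it line by line without loading it into memory.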


[root@l01lnp2 traces]# cat /var/lib/ldap/DB_CONFIG | grep -v "#"

set_cachesize 0 3221225472 1
set_lg_regionmax 262144
set_lg_bsize 2097152

You only have a 3 GB DB cachesize configured here.  Expect things to perform suboptimally.  It would have been easier to set this by going

set_cachesize 3 0 1

Which would have the same effect, since the first number is the number of gigabytes to allocate.
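
The equivalence of the two forms is plain arithmetic: the cache size is the first value in gigabytes (2^30 bytes each) plus the second value in bytes. A minimal sketch; the function name is only for illustration:

```python
def bdb_cache_bytes(gbytes, nbytes):
    """Total cache size requested by 'set_cachesize <gbytes> <bytes> <ncache>'."""
    return gbytes * 1024 ** 3 + nbytes

print(bdb_cache_bytes(0, 3221225472) == bdb_cache_bytes(3, 0))  # True
print(bdb_cache_bytes(7, 0))  # 7516192768
```

The third value (ncache) only controls how many separate regions the cache is split into, so it does not enter the calculation.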


Please find attached slapd.conf

Ok, so the relevant bits from here are:

cachesize 2500000
idlcachesize 7500000
cachefree 1000

Which means you have a cachesize of 2.5 million, an idlcachesize of 7.5 million, and (with OL 2.4.16) a dncachesize of 5 million.

I would highly advise you upgrade to OpenLDAP 2.4.18, and change the slapd.conf settings to:

dncachesize 0 (which means unlimited).

Also set no cachesize or idlcachesize, and fix your DB_CONFIG.  But you also need to buy a substantial amount of RAM for a DB of this size. :P  I would advise you to upgrade to at least 32 GB total.  Then you can tune the system more optimally.


--Quanah

--

Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra ::  the leader in open source messaging and collaboration