It's still happening, see http://i.imgur.com/NL8ztmp.png. The only workaround for us now is to restart slapd on a regular basis.

What information can I provide to help find the cause and fix it?

--
Sergey


On Mon, Mar 30, 2015 at 12:01 PM, Sergey Esin <sergey.esin@gmail.com> wrote:
Hi Ryan,

Here's my config of LDAP master:
---------------------------------------------------------------------------
# cat /etc/openldap/slapd.conf | grep -v ^# | grep -ve '^$'
include         /etc/openldap/schema/core.schema
include         /etc/openldap/schema/cosine.schema
include         /etc/openldap/schema/inetorgperson.schema
include         /etc/openldap/schema/nis.schema
allow bind_v2
pidfile         /var/run/openldap/slapd.pid
argsfile        /var/run/openldap/slapd.args
modulepath      /usr/lib64/openldap
moduleload accesslog.la
moduleload syncprov.la
TLSCACertificateFile /etc/openldap/certs/CA.pem
TLSCertificateFile /etc/openldap/certs/ldap-master.pem
TLSCertificateKeyFile /etc/openldap/certs/ldap-master.key
TLSVerifyClient allow

[ .. some limits here .. ]

[ .. some ACLs here .. ]

database config
access to *
        by dn.exact="gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth" manage
        by * none
database monitor

[ .. some ACLs here .. ]

[ .. some limits here .. ]

database        bdb
cachesize       380000
idlcachesize    700000
readonly        off
suffix          "dc=domain,dc=com"
rootdn          "cn=Manager,dc=domain,dc=com"
rootpw          {SSHA}XXXXXXXXXX
directory       /var/lib/ldap
index   uid     eq
index   mail    eq
index   objectClass eq
index entryCSN eq
index entryUUID eq
overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 100

[ .. some limits here .. ]

loglevel sync stats stats2 shell
checkpoint 5120 10
serverID    1
---------------------------------------------------------------------------


Here's what I have on replica server:

---------------------------------------------------------------------------
include         /etc/openldap/schema/core.schema
include         /etc/openldap/schema/cosine.schema
include         /etc/openldap/schema/nis.schema
include         /etc/openldap/schema/inetorgperson.schema
allow bind_v2
pidfile         /var/run/openldap/slapd.pid
argsfile        /var/run/openldap/slapd.args
threads 8
[ .. some ACLs here .. ]
database config
access to *
        by dn.exact="gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth" manage
        by * none
database monitor
[ .. some ACLs here .. ]
database        bdb
cachesize       380000
idlcachesize    700000
readonly        off
suffix          "dc=domain,dc=com"
rootdn          "cn=Manager,dc=domain,dc=com"
rootpw          {SSHA}XXXXXXXXXXXXXXXXX
directory       /var/lib/ldap
index   uid     eq
index   mail    eq
index   objectClass eq
index entryCSN eq
index entryUUID eq
checkpoint 5120 10
syncrepl rid=34
 provider=ldaps://ldap-master.domain.net:636
 tls_reqcert=demand
 tls_cacert=/etc/openldap/certs/CA.pem
 type=refreshAndPersist
 schemachecking=off
 searchbase="dc=domain,dc=com"
 scope=sub
 bindmethod=simple
 binddn="cn=repluser,ou=Accounts,dc=domain,dc=com"
 credentials=XXXXXXXXXX
 retry="300 +"
updateref ldaps://ldap-master.domain.net
[ .. some limits here .. ]
loglevel stats sync stats2 shell

---------------------------------------------------------------------------


I restarted slapd with "LD_PRELOAD=/usr/lib64/libtcmalloc.so.4.1.0" to use a different memory allocator (tcmalloc), and memory consumption is now almost flat; please see http://i.imgur.com/brIvarB.png
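
For anyone reproducing this, here is a minimal sketch of how the preload can be confirmed on the running process. It assumes Linux with /proc mounted and a single slapd process; the helper name has_tcmalloc is made up for illustration, and the library path is the one from the command above.

```shell
# has_tcmalloc MAPS_FILE: succeed if the given /proc/<pid>/maps file
# lists a tcmalloc shared object (hypothetical helper name).
has_tcmalloc() {
    grep -q 'libtcmalloc' "$1"
}

# Usage on a live system (assumes exactly one slapd process):
#   has_tcmalloc "/proc/$(pgrep -o slapd)/maps" && echo "tcmalloc is mapped"
```

If the check fails after a restart, the init script most likely did not pass LD_PRELOAD through to slapd's environment.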

I've also added the "threads 8" directive to slapd.conf on the LDAP master server, but I have not restarted slapd yet, so it is not active.

According to what I see from the OS (Linux) perspective, slapd is using 18 threads:

# ps -L -o pid= -p  `pgrep slapd` | wc -l
18
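
The same number can be cross-checked against the kernel's own counter. This is only a sketch: it assumes Linux with /proc mounted, and thread_count is a name picked for illustration.

```shell
# thread_count STATUS_FILE: print the value of the "Threads:" field
# from a /proc/<pid>/status file (hypothetical helper name).
thread_count() {
    awk '/^Threads:/ { print $2 }' "$1"
}

# Usage on a live system (assumes exactly one slapd process):
#   thread_count "/proc/$(pgrep -o slapd)/status"
```

If I read slapd.conf(5) correctly, the default for "threads" is 16, so 18 would fit a full default worker pool plus a couple of housekeeping threads, but I have not verified that breakdown.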


> Do your logs show what kind of client activity triggered the growth?

I have some logs, but I see nothing really special there. No unusual activity.
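
In case it helps to narrow this down, the operation mix can be tallied from the stats-level log roughly like this. It is a sketch under assumptions: the log file location depends on the syslog setup, op_summary is a made-up helper name, and the patterns rely on the usual "conn=N op=N SRCH base=..." style lines that "loglevel stats" writes.

```shell
# op_summary LOGFILE: count slapd operations by type, most frequent first.
# Only the first log line of each operation is matched (e.g. "SRCH base=",
# not "SRCH attr="), so each operation is counted once.
op_summary() {
    grep -oE ' op=[0-9]+ (SRCH base=|MOD dn=|ADD dn=|DEL dn=|MODRDN dn=|CMP dn=|BIND dn="[^"]*" method=)' "$1" |
        awk '{ print $2 }' | sort | uniq -c | sort -rn
}

# Usage (the path is an assumption; adjust to wherever syslog writes slapd output):
#   op_summary /var/log/slapd.log
```

Comparing the output for an hour of flat memory with an hour of growth should show whether a particular operation type dominates.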


Regards,
Sergey


On Sun, Mar 29, 2015 at 10:16 AM, Ryan Tandy <ryan@nardis.ca> wrote:
Hi,

On Thu, Mar 26, 2015 at 01:50:27PM +0300, Sergey Esin wrote:
Hi all,

We're running OpenLDAP 2.4.40 (the latest available release) with just one
replica server (connected via TLS) and have the following picture -
http://i.imgur.com/om0lMiy.png

On the graph you can see memory consumption of the slapd process on the
host: in the beginning it ran without a replica, then the replica server was
connected (memory consumption rose to around 4 GB), and then the Linux OOM
(out-of-memory) killer killed the process.

I've seen a similar thing recently. The test case I posted to ITS#8081 causes very high memory usage on the host. (The crash bug is unrelated, it was a regression introduced after 2.4.40 was released.) Are you able to share your host config for comparison?

Howard wrote https://github.com/hyc/mleak while looking into it, but AFAIK we don't have a proven cause, only a suspicion that memory fragmentation may be involved.

Do your logs show what kind of client activity triggered the growth?

Do you use delta-syncrepl?


There are ~400 000 users in our ldap database.

OpenLDAP was compiled from sources using "./configure --prefix=/ldap2440
--with-tls --enable-slapd".

Is there any way to understand what is going wrong and how to fix it?

This server is really important for us, so please share any ideas on how to
make it stable!


My DB_CONFIG is like below:

set_flags DB_LOG_AUTOREMOVE

set_cachesize 0 524288000 5
set_lg_regionmax 1048576
set_lg_max 10485760
set_lg_bsize 2097512

set_lk_max_locks 23000
set_lk_max_lockers 2300
set_lk_max_objects 2300
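
For reference, the byte values above convert as follows. This is just arithmetic on the numbers in the DB_CONFIG; to_mib is a made-up helper name.

```shell
# to_mib BYTES: print a byte count as whole MiB (integer division).
to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

to_mib 524288000   # bytes field of set_cachesize, prints 500
to_mib 10485760    # set_lg_max, prints 10
```

So set_cachesize 0 524288000 5 asks for 0 GB plus 500 MiB of BDB cache, split into 5 regions. Note that set_lg_bsize 2097512 is 360 bytes more than an exact 2 MiB (2097152), which may simply be a transposed digit.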


--
Regards,
Sergey


