It's still happening, see http://i.imgur.com/NL8ztmp.png. The only workaround for us now is to restart slapd on a regular basis.
What information can I provide to help find the cause and fix it?
-- Sergey
On Mon, Mar 30, 2015 at 12:01 PM, Sergey Esin sergey.esin@gmail.com wrote:
Hi Ryan,
Here's my config of LDAP master:
# cat /etc/openldap/slapd.conf | grep -v ^# | grep -ve '^$'
include /etc/openldap/schema/core.schema
include /etc/openldap/schema/cosine.schema
include /etc/openldap/schema/inetorgperson.schema
include /etc/openldap/schema/nis.schema
allow bind_v2
pidfile /var/run/openldap/slapd.pid
argsfile /var/run/openldap/slapd.args
modulepath /usr/lib64/openldap
moduleload accesslog.la
moduleload syncprov.la
TLSCACertificateFile /etc/openldap/certs/CA.pem
TLSCertificateFile /etc/openldap/certs/ldap-master.pem
TLSCertificateKeyFile /etc/openldap/certs/ldap-master.key
TLSVerifyClient allow
[ .. some limits here .. ]
[ .. some ACLs here .. ]
database config
access to *
    by dn.exact="gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth" manage
    by * none
database monitor
[ .. some ACLs here .. ]
[ .. some limits here .. ]
database bdb
cachesize 380000
idlcachesize 700000
readonly off
suffix "dc=domain,dc=com"
rootdn "cn=Manager,dc=domain,dc=com"
rootpw {SSHA}XXXXXXXXXX
directory /var/lib/ldap
index uid eq
index mail eq
index objectClass eq
index entryCSN eq
index entryUUID eq
overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 100
[ .. some limits here .. ]
loglevel sync stats stats2 shell
checkpoint 5120 10
serverID 1
Here's what I have on replica server:
include /etc/openldap/schema/core.schema
include /etc/openldap/schema/cosine.schema
include /etc/openldap/schema/nis.schema
include /etc/openldap/schema/inetorgperson.schema
allow bind_v2
pidfile /var/run/openldap/slapd.pid
argsfile /var/run/openldap/slapd.args
threads 8
[ .. some ACLs here .. ]
database config
access to *
    by dn.exact="gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth" manage
    by * none
database monitor
[ .. some ACLs here .. ]
database bdb
cachesize 380000
idlcachesize 700000
readonly off
suffix "dc=domain,dc=com"
rootdn "cn=Manager,dc=jetbrains,dc=com"
rootpw {SSHA}XXXXXXXXXXXXXXXXX
directory /var/lib/ldap
index uid eq
index mail eq
index objectClass eq
index entryCSN eq
index entryUUID eq
checkpoint 5120 10
syncrepl rid=34
    provider=ldaps://ldap-master.domain.net:636
    tls_reqcert=demand
    tls_cacert=/etc/openldap/certs/CA.pem
    type=refreshAndPersist
    schemachecking=off
    searchbase="dc=domain,dc=com"
    scope=sub
    bindmethod=simple
    binddn="cn=repluser,ou=Accounts,dc=domain,dc=com"
    credentials=XXXXXXXXXX
    retry="300 +"
updateref ldaps://ldap-master.domain.net
[ .. some limits here .. ]
loglevel stats sync stats2 shell
I restarted slapd with "LD_PRELOAD=/usr/lib64/libtcmalloc.so.4.1.0" to use a different memory allocator (tcmalloc) and now memory consumption is almost flat, please see http://i.imgur.com/brIvarB.png
I've also added the "threads 8" directive to slapd.conf on the LDAP master server, but have not yet restarted the slapd process to make it take effect.
According to what I see from the OS (Linux) perspective, slapd is using 18 threads:
# ps -L -o pid= -p `pgrep slapd` | wc -l
18
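The two numbers discussed here (resident memory and thread count) can also be read straight from /proc/<pid>/status on Linux, which makes it easy to log them over time and compare allocators. Below is a minimal Python sketch, assuming Linux with /proc mounted; the function names are illustrative and not part of OpenLDAP.

```python
def parse_proc_status(text):
    """Return (VmRSS in kB, thread count) from the text of /proc/<pid>/status."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    # VmRSS is reported like "4194304 kB"; Threads is a plain integer.
    rss_kb = int(fields["VmRSS"].split()[0])
    threads = int(fields["Threads"])
    return rss_kb, threads


def sample(pid):
    """Read the current values for a running process (hypothetical helper)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_proc_status(f.read())
```

Calling sample() with the slapd PID at a fixed interval (from cron, for example) gives a simple time series to set against the graphs linked in this thread.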
Do your logs show what kind of client activity triggered the growth?
I have some logs, but I see nothing special there. No unusual activity.
Regards, Sergey
On Sun, Mar 29, 2015 at 10:16 AM, Ryan Tandy ryan@nardis.ca wrote:
Hi,
On Thu, Mar 26, 2015 at 01:50:27PM +0300, Sergey Esin wrote:
Hi all,
We're running OpenLDAP 2.4.40 (the latest available release) with just one replica server (connected via TLS) and have the following picture - http://i.imgur.com/om0lMiy.png
The graph shows the memory consumption of the slapd process on the host: at the beginning it ran without a replica, then the replica server was connected (memory consumption grew to around 4 GB), and then the Linux OOM (out-of-memory) killer killed the process.
I've seen a similar thing recently. The test case I posted to ITS#8081 causes very high memory usage on the host. (The crash bug is unrelated, it was a regression introduced after 2.4.40 was released.) Are you able to share your host config for comparison?
Howard wrote https://github.com/hyc/mleak while looking into it, but AFAIK we don't have a proven cause, only a suspicion that memory fragmentation may be involved.
Do your logs show what kind of client activity triggered the growth?
Do you use delta-syncrepl?
There are ~400 000 users in our ldap database.
OpenLDAP was compiled from sources using "./configure --prefix=/ldap2440 --with-tls --enable-slapd".
Is there any way to understand what is going wrong and how to fix it?
This server is really important for us, so please share any ideas on how to make it stable!
My DB_CONFIG is like below:
set_flags DB_LOG_AUTOREMOVE
set_cachesize 0 524288000 5
set_lg_regionmax 1048576
set_lg_max 10485760
set_lg_bsize 2097512
set_lk_max_locks 23000
set_lk_max_lockers 2300
set_lk_max_objects 2300
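As a sanity check on the DB_CONFIG above: in Berkeley DB, set_cachesize takes a number of gigabytes, a number of additional bytes, and a number of cache regions, so this line requests 500 MiB split into 5 regions. A small Python sketch of that arithmetic follows; the helper name is hypothetical.

```python
def cachesize_bytes(db_config_text):
    """Return the total BDB cache size in bytes from a set_cachesize line, or None."""
    for line in db_config_text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "set_cachesize":
            # Format: set_cachesize <gbytes> <bytes> <ncache>
            gbytes, nbytes, _ncache = (int(p) for p in parts[1:])
            return gbytes * 1024**3 + nbytes
    return None


config = "set_flags DB_LOG_AUTOREMOVE\nset_cachesize 0 524288000 5\n"
total = cachesize_bytes(config)
mib = total / 2**20  # convert bytes to MiB
```

This covers only the BDB cache. slapd's own entry cache (cachesize 380000) and IDL cache (idlcachesize 700000) are sized in entries and index slots, not bytes, so they are not included in this figure.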
-- Regards, Sergey