Full_Name: Mark Bannister
Version: 2.4.30
OS: Oracle Solaris 11.2
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (205.228.82.171)
I have a master server and a replica configured with syncrepl in refreshAndPersist mode. I'm using bdb (not mdb), simple auth (not SASL), and standard syncrepl (not delta-syncrepl).
I noticed that the directory has about 5 erroneous entries in 300,000 where there is a multi-valued attribute containing two identical values. These entries were added by slapadd -q. Here is an example:
dn: cn=test,ou=rpc,dc=mycompany,dc=com
objectClass: oncRpc
cn: test
cn: test
oncRpcNumber: 12345678
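For anyone wanting to check their own data for the same condition, here is a minimal sketch of a scanner that reports entries in an LDIF export where an attribute carries the same value more than once. It is only an illustration: the function names are hypothetical, and it compares values byte for byte, whereas slapd applies each attribute's equality matching rule (for cn that is caseIgnoreMatch), so values that differ only in case or spacing would not be caught by this version.

```python
import base64
import sys
from collections import Counter


def unfold(lines):
    """Join LDIF continuation lines (those starting with a single space)."""
    logical = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(" ") and logical and logical[-1]:
            logical[-1] += line[1:]
        else:
            logical.append(line)
    return logical


def parse_entries(lines):
    """Yield one list of (attribute, value) pairs per LDIF entry."""
    entry = []
    for line in unfold(lines):
        if not line:  # a blank line terminates the current entry
            if entry:
                yield entry
                entry = []
            continue
        if line.startswith("#"):  # LDIF comment
            continue
        attr, sep, rest = line.partition(":")
        if not sep:
            continue
        if rest.startswith(":"):  # "attr:: value" means base64-encoded
            value = base64.b64decode(rest[1:].strip())
        else:
            value = rest.lstrip(" ").encode("utf-8")
        # Attribute names are case-insensitive; values are compared as bytes.
        entry.append((attr.lower(), value))
    if entry:
        yield entry


def find_duplicates(lines):
    """Return (dn, attribute, value) for each value repeated within an entry."""
    found = []
    for entry in parse_entries(lines):
        dn = next(
            (v.decode("utf-8", "replace") for a, v in entry if a == "dn"),
            "(no dn)",
        )
        for (attr, value), count in Counter(entry).items():
            if count > 1:
                found.append((dn, attr, value.decode("utf-8", "replace")))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as ldif:
        for dn, attr, value in find_duplicates(ldif):
            print(f"{dn}: duplicate {attr} value {value!r}")
```

The script takes the path of an LDIF file as its only argument, for example a dump of the master's database produced with slapcat.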
When the replica attempts to copy this data using syncrepl from the master server, it fails. All entries up to that point are synchronised fine, but any entries from that point onwards are missing. I didn't see any log entries telling me about this failure, although I admit I didn't look very hard or tweak the log levels.
This then causes a memory leak in the master server:
$ while :; do ps -p 9025 -o pid,ppid,pmem,rss,vsz,args | tail +2; sleep 60; done
9025 1 5.2 434204 463892 /usr/lib/slapd -f /etc/openldap/slapd.conf -u openldap -g openldap -h ldap:///
9025 1 5.3 443216 472900 /usr/lib/slapd -f /etc/openldap/slapd.conf -u openldap -g openldap -h ldap:///
9025 1 5.4 444384 474076 /usr/lib/slapd -f /etc/openldap/slapd.conf -u openldap -g openldap -h ldap:///
9025 1 5.5 454680 484364 /usr/lib/slapd -f /etc/openldap/slapd.conf -u openldap -g openldap -h ldap:///
9025 1 5.5 458288 487972 /usr/lib/slapd -f /etc/openldap/slapd.conf -u openldap -g openldap -h ldap:///
9025 1 5.6 464996 494684 /usr/lib/slapd -f /etc/openldap/slapd.conf -u openldap -g openldap -h ldap:///
9025 1 5.7 472952 502636 /usr/lib/slapd -f /etc/openldap/slapd.conf -u openldap -g openldap -h ldap:///
... until memory is exhausted, and I get:
ch_malloc of 606572 bytes failed
... followed by a 3GB core dump file.
When the server restarts, the cycle starts over again, ad infinitum until the filesystem is full of core dump files.
When I remove the 5 duplicate attribute values and restart the master and replica servers, the entire directory is then successfully replicated and the memory footprint of the slapd process remains stable. I am therefore assuming that the duplicate attribute values were causing syncrepl not to complete and a memory leak in the master server.
I did see that a number of memory leaks have been fixed since 2.4.30, but I didn't see anything that looked like this profile. Sorry I don't have a newer version of OpenLDAP to hand to re-test.