I've got my two-node MMR setup running with delta-syncrepl. Node 1 is up with ~340k entries in the main DIT. It consumes about 4G of VM on a Red Hat AS6 box, and the data dir is around 2.3G on-disk (including __db.* and the one log.* file...), plus another 100M in cn=accesslog.
I'm bringing up node 2 after nuking the data dir and letting it syncrepl from nothing. I was chasing a SIGBUS error, but that turned out to be olcLogLevel=Stats+Sync generating a huge amount of logging and filling up the disk. Fixed that and moved on.
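For anyone chasing the same symptom, the fix was something along these lines (the exact log level you keep is a matter of taste):

ldapmodify -Y EXTERNAL -H ldapi:/// <<EOF
dn: cn=config
changetype: modify
replace: olcLogLevel
olcLogLevel: stats
EOF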
Now it's still crashing, but this time it's because slapd is bloating up hugely and the machine runs out of VM and kills the process. I've added about 8G of temporary swap, and it's still going: slapd on node 2 is 5.5G resident and nearly 15G in total size now. On-disk the hdb is only around 1.6G, so it still has a long way to go.
Can I assume this is not the way it should be?
vmstat shows a lot of swap-out but almost no swap-in, so it's not thrashing. pmap shows a large number of 64M "anon" segments. 140 of those at one point, and 157 when I checked just now.
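For reference, I was counting those segments with something like the following (assumes a single slapd process and the usual procps pmap output, with the mapping size in the second column):

pmap $(pidof slapd) | awk '$2 == "65536K"' | wc -l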
It looks a lot like a memory leak, though I can't tell offhand whether the problem is in OpenLDAP (2.4.31) or in BerkeleyDB (5.3.15). When it finishes, I'm planning to turn off delta-syncrepl, pave and rebuild again, and see if it behaves the same. I could also give mdb a shot, but since this is a mirror I'd have to rebuild both sides.
Any suggestions as to where I could start looking for the source of the problem? Obviously I'm not planning on rebuilding this node on a regular basis (and certainly not all via syncrepl) but I'm concerned that over an extended period of time it'll leak memory even during normal use.
I'm including my DB_CONFIG, just in case. I can supply more of the config as necessary.
DB_CONFIG:
set_cachesize 0 536870912 0
set_lg_regionmax 10485760
set_lg_max 104857600
set_lg_bsize 26214400
set_lk_max_locks 4096
set_lk_max_objects 4096
set_flags DB_LOG_AUTOREMOVE
(If anything looks particularly stupid in here, even unrelated to the leak, I'd love the advice...)
--On Friday, May 18, 2012 3:15 PM -0300 Brandon Hume hume-ol@bofh.ca wrote:
hdb is only around 1.6G, so it still has a long way to go.
I'm including my DB_CONFIG, just in case. I can supply more of the config as necessary.
DB_CONFIG:
set_cachesize 0 536870912 0
Are you using shm keys with BDB?
I don't know anyone who is using BDB 5.3 at this point in time. I would back down to something less recent and see if you see the same issue. I've used BDB 4.7 for quite some time w/o issue.
Also, your BDB cachesize seems undersized, given the size of your DB. It should be no less than dn2id.bdb + id2entry.bdb + 10% for growth. Personally, I prefer to keep my BDB cachesize at <size of DB>, or in your case, I'd set it to 2 GB, i.e.:
set_cachesize 2 0 0
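A quick way to check those file sizes (assuming the usual /var/lib/ldap layout; substitute your own olcDbDirectory):

du -ch /var/lib/ldap/dn2id.bdb /var/lib/ldap/id2entry.bdb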
You certainly could try switching just this one side to mdb, and seeing how it does, and then switching the other side over if you like how it is performing compared to BDB. Both nodes are not required to run the same DB backend.
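A minimal back-mdb stanza in cn=config would look something like the following; the suffix, directory, and maxsize here are just placeholders, not a recommendation:

dn: olcDatabase=mdb,cn=config
objectClass: olcDatabaseConfig
objectClass: olcMdbConfig
olcDatabase: mdb
olcSuffix: dc=example,dc=com
olcDbDirectory: /var/lib/ldap-mdb
olcDbMaxSize: 10737418240
olcDbIndex: objectClass eq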
--Quanah
--
Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration
On 05/18/12 03:36 PM, Quanah Gibson-Mount wrote:
Are you using shm keys with BDB?
Not at this time.
I don't know anyone who is using BDB 5.3 at this point in time. I would back down to something less recent and see if you see the same issue. I've used BDB 4.7 for quite some time w/o issue.
Well, with 4.8.30, the problem is still evident. Overall, using delta-syncrepl seems to make a large difference. With my previous test (still with bdb 5.3) and delta-syncrepl, the slapd process grew to 16G before dying. With delta off (removing the 'syncdata=accesslog logbase="cn=accesslog" logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"' portion of the olcSyncrepl line) the node started and finished loading the entire DB and grew to about 5.7G in total.
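For context, the consumer definition with delta enabled looks roughly like this; the provider, DNs, and credentials below are placeholders rather than my real config:

olcSyncrepl: rid=001 provider=ldap://node1.example.com bindmethod=simple
  binddn="cn=replicator,dc=example,dc=com" credentials=secret
  searchbase="dc=example,dc=com" type=refreshAndPersist retry="30 +"
  syncdata=accesslog logbase="cn=accesslog"
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"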
I don't have the bdb 4.7 tarball handy... but I'll be giving mdb a shot next and see if it behaves the same.
Also, your BDB cachesize seems undersized, given the size of your DB. It should be no less than dn2id.bdb + id2entry.bdb + 10% for growth. Personally, I prefer to keep my BDB cachesize at <size of DB>, or in your case, I'd set it to 2 GB, i.e.:
Thanks; if I stick with hdb I'll make that tweak. Those values are pretty old, and I've been shy about changing too much while changing everything else. :)
Brandon Hume wrote:
On 05/18/12 03:36 PM, Quanah Gibson-Mount wrote:
Are you using shm keys with BDB?
Not at this time.
I don't know anyone who is using BDB 5.3 at this point in time. I would back down to something less recent and see if you see the same issue. I've used BDB 4.7 for quite some time w/o issue.
Well, with 4.8.30, the problem is still evident. Overall, using delta-syncrepl seems to make a large difference. With my previous test (still with bdb 5.3) and delta-syncrepl, the slapd process grew to 16G before dying. With delta off (removing the 'syncdata=accesslog logbase="cn=accesslog" logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"' portion of the olcSyncrepl line) the node started and finished loading the entire DB and grew to about 5.7G in total.
The fact that switching off delta changes the behavior would point pretty squarely at a bug in the delta-sync consumer code, not anything in the underlying backend. If you can produce a small test case to demonstrate the problem, it would be a good idea to post this to the ITS.
----- Howard Chu hyc@symas.com wrote:
Brandon Hume wrote:
On 05/18/12 03:36 PM, Quanah Gibson-Mount wrote:
Are you using shm keys with BDB?
Not at this time.
I don't know anyone who is using BDB 5.3 at this point in time. I would back down to something less recent and see if you see the same issue. I've used BDB 4.7 for quite some time w/o issue.
Well, with 4.8.30, the problem is still evident. Overall, using delta-syncrepl seems to make a large difference. With my previous test (still with bdb 5.3) and delta-syncrepl, the slapd process grew to 16G before dying. With delta off (removing the 'syncdata=accesslog logbase="cn=accesslog" logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"' portion of the olcSyncrepl line) the node started and finished loading the entire DB and grew to about 5.7G in total.
The fact that switching off delta changes the behavior would point pretty squarely at a bug in the delta-sync consumer code, not anything in the underlying backend. If you can produce a small test case to demonstrate the problem, it would be a good idea to post this to the ITS.
Or it just means that the accesslog is receiving a significant number of changes and that the time configured for how often to delete the accesslog DB is too small. It will certainly grow without bound up to the time deletion kicks in. For some of our larger customers, I have delete run on all data older than 8 hours, every 2 hours, to keep the accesslog DB size down. I.e., there is still not necessarily evidence of a memory leak.
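In cn=config terms that kind of policy is set via the accesslog overlay's purge attribute, something like the following (the overlay and database indices are just an example):

dn: olcOverlay={0}accesslog,olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcAccessLogPurge
olcAccessLogPurge: 08:00 02:00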
--Quanah
On 05/23/12 06:42 PM, Quanah Gibson-Mount wrote:
Or it just means that the accesslog is receiving a significant number of changes and that the time configured for how often to delete the accesslog DB is too small. It will certainly grow
To the tune of 16+G of VM (only 4.7G resident) on a 512M accesslog DB and a 2G main DB, though? I've tried setting logpurge to delete entries older than 15 minutes every 30 minutes, and it doesn't seem to be making a difference.
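(In cn=config terms that's roughly olcAccessLogPurge: 00:15 00:30.)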
I'll try to see if I can generate a pile of entries not using our local schemas and reproduce the problem.
I suppose I can also try switching accesslog over to mdb and see if it still occurs.
--On May 24, 2012 2:45:25 PM -0300 Brandon Hume hume-ol@bofh.ca wrote:
On 05/23/12 06:42 PM, Quanah Gibson-Mount wrote:
Or it just means that the accesslog is receiving a significant number of changes and that the time configured for how often to delete the accesslog DB is too small. It will certainly grow
To the tune of 16+G of VM (only 4.7G resident) on a 512M accesslog DB and a 2G main DB, though? I've tried setting logpurge to delete entries older than 15 minutes every 30 minutes, and it doesn't seem to be making a difference.
Yeah, that sounds like a different issue then. ;)
--Quanah