I have been running into some huge memory spikes, and was wondering if its normal, or if anyone has seen it before.
Archtecture:
OpenLDAP 2.4.21 Running on RH4
Ulimit open files upped to 4096
Masters:
auth01.cmc, auth01.inflow
Running n-way multimaster
Slaves:
rsa01.inflow, rsa02.inflow, rsa03.cmc, rsa04.cmc
Syncrepl running off both masters refreshAndPersist with retry=”60 +”
Here is a graph of the spikes we are seeing on auth01.cmc and auth01.inflow
It also looks like the slave servers are only connecting to auth01; when looking at netstat on the auth boxes, I see:
>From auth01.phil:
tcp 0 0 auth01. inflow:ldap rsa04.cmc:46851 ESTABLISHED
tcp 0 0 auth01.inflow:ldap rsa01.inflow:61648 ESTABLISHED
tcp 0 0 auth01.inflow:ldap rsa01.inflow:61686 ESTABLISHED
tcp 0 0 auth01.inflow:ldap rsa01.inflow:61683 ESTABLISHED
tcp 0 0 auth01.inflow:48882 rsa02.inflow.:ldap ESTABLISHED
tcp 0 109500 auth01.inflow:ldap rsa03.cmc:45798 ESTABLISHED
tcp 0 0 auth01.inflow:ldap rsa02.inflow.:8773 ESTABLISHED
>From auth01.cmc:
tcp 0 0 auth01.cmc:ldap rsa02.inflow:8885 ESTABLISHED
tcp 1 0 auth01.cmc:24310 rsa03.cmc:ldap CLOSE_WAIT
tcp 0 0 auth01.cmc:ldap rsa01.inflow:61657 ESTABLISHED
With refreshAndPersist, shouldn’t each slave be connected to each host configured in syncRepl, and keep that connection?
This is a pretty big issue – today we had a master crash; got:
Mar 18 14:39:43 auth01 slapd[17498]: bdb(dc=comcast,dc=com): malloc: 16422: Cannot allocate memory
Mar 18 14:39:43 auth01 slapd[17498]: bdb(dc=comcast,dc=com): malloc: 32000: Cannot allocate memory
Mar 18 14:39:43 auth01 slapd[17498]: bdb(dc=comcast,dc=com): PANIC: Cannot allocate memory
Mar 18 14:39:43 auth01 slapd[17498]: bdb(dc=comcast,dc=com): PANIC: fatal region error detected; run recovery
Mar 18 14:39:43 auth01 slapd[17498]: slap_graduate_commit_csn: removing 0x9b5d3ec8 20100318143943.437380Z#000000#001#000000
Mar 18 14:39:43 auth01 slapd[17498]: bdb(dc=comcast,dc=com): uniqueMember.bdb: write failed for page 35805
Mar 18 14:39:43 auth01 slapd[17498]: bdb(dc=comcast,dc=com): uniqueMember.bdb: unable to flush page: 35805
These boxes have 8gig ram; I am trying to figure out if this is normal and I just need to up the ram.
Thanks for any help in advance.