https://bugs.openldap.org/show_bug.cgi?id=10365
Issue ID: 10365
Summary: OOM during replication with a lot of updates
Product: OpenLDAP
Version: 2.6.10
Hardware: x86_64
OS: Linux
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: overlays
Assignee: bugs@openldap.org
Reporter: elecharny@apache.org
Target Milestone: ---
When applying millions of updates on a huge database (30M entries), the server can run out of memory and crash.
What happens is that the replication client does not process the incoming updates fast enough, so syncprov accumulates the changes in memory until the server crashes.
The clear workaround: use more memory on the server.
Context: the replication mode used is delta-syncrepl with 2 servers in MMR, using MDB and accesslog.
My assumption is that it would be valuable to take advantage of knowing when the socket is ready for writing, and only then push data to the replica. That way we could read the changes from accesslog instead of holding them in memory, while keeping track of the replication state. I assume it would be a huge change to the way it currently works.
There is no urgency, considering it's quite a specific use case, and it would be way faster to slapmodify the changes on a stopped base, then copy the content to the replica, then restart everything.
Quanah Gibson-Mount quanah@openldap.org changed:
What            |Removed      |Added
----------------------------------------------------------------------------
Keywords        |needs_review |
Target Milestone|---          |2.7.0
--- Comment #1 from elecharny@apache.org ---
Ok, there is more than just a huge number of modifications being made, and an internal queue waiting for the socket to be ready.
Actually, the client is sending a lot of modifications using an asynchronous LdapModify, and does not wait for the response before sending a new LdapModify.
Doing so, the client sends around 1000 LdapModify requests per second. I assume the server also queues the requests in memory, waiting for each LdapModify to complete before starting the next one. Since the server can absorb only around 160 modifies/s, about 840 new LdapModify requests pile up every second, eventually accumulating hundreds of thousands of pending requests.
It sounds to me like pathological usage, and I'm not sure it requires a lot of analysis.
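To make the arithmetic in this comment concrete, here is a minimal sketch of the backlog growth. It assumes constant rates (1000 requests/s arriving, 160 modifies/s served, both taken from the figures above) and an unbounded queue; the function names are hypothetical and nothing here reflects slapd internals.

```python
def backlog_after(seconds, arrival_rate=1000, service_rate=160):
    """Pending requests after `seconds`, assuming constant rates
    and an unbounded queue (an idealized model, not slapd's behavior)."""
    growth_per_second = max(0, arrival_rate - service_rate)
    return growth_per_second * seconds


def seconds_to_reach(backlog, arrival_rate=1000, service_rate=160):
    """Time needed for the queue to reach `backlog` pending requests,
    or None if the server keeps up with the client."""
    growth_per_second = arrival_rate - service_rate
    if growth_per_second <= 0:
        return None
    return backlog / growth_per_second


print(backlog_after(1))            # 840 new pending requests each second
print(backlog_after(600))          # 504000 after ten minutes
print(seconds_to_reach(100_000))   # roughly two minutes to reach 100k pending
```

Under these assumptions the queue reaches hundreds of thousands of requests within minutes, which matches the order of magnitude described in the comment.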
--- Comment #2 from elecharny@apache.org ---
Created attachment 1083
  --> https://bugs.openldap.org/attachment.cgi?id=1083&action=edit
A grafana snapshot showing the memory being eat

This shows a heavy load (millions of LdapModify) on a server with dbNoSync set to FALSE, leading to excessive memory usage, up to the point where the slapd process gets killed.
Howard Chu hyc@openldap.org changed:
What      |Removed     |Added
----------------------------------------------------------------------------
Resolution|---         |INVALID
Status    |UNCONFIRMED |RESOLVED
--- Comment #3 from Howard Chu hyc@openldap.org ---
You should configure conn_max_pending or conn_max_pending_auth. No bug here.
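For reference, a minimal sketch of how these two directives might look in the global section of slapd.conf. The values shown are the defaults quoted later in this thread (100 and 1000); they are illustrative only, not a tuning recommendation for the workload described here.

```
# Global section of slapd.conf (illustrative values, equal to the defaults)
# Maximum queued requests for an anonymous session
conn_max_pending       100
# Maximum queued requests for an authenticated session
conn_max_pending_auth  1000
```

With cn=config, the equivalent attributes are olcConnMaxPending and olcConnMaxPendingAuth.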
--- Comment #4 from elecharny@apache.org ---
Agreed. Thanks for pointing me to those parameters.
--- Comment #5 from elecharny@apache.org ---
As a follow-up: their default values are 100 and 1000 respectively. They are not set in the tested config, so I suspect the LdapModify requests aren't pending but are already being processed, which bypasses the limit...