https://bugs.openldap.org/show_bug.cgi?id=9365
Issue ID: 9365
Summary: Mem leaks with Æ-DIR providers
Product: OpenLDAP
Version: 2.4.53
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: normal
Priority: ---
Component: slapd
Assignee: bugs@openldap.org
Reporter: michael@stroeder.com
Target Milestone: ---
Created attachment 772 --> https://bugs.openldap.org/attachment.cgi?id=772&action=edit valgrind output on openSUSE Tumbleweed x86_64
An Æ-DIR installation with self-compiled OpenLDAP 2.4.53 on Debian (now buster) has memory leak issues on the Æ-DIR providers. The read-only consumers do not have this issue. The provider config is more complex, with more overlays and more ACLs.
In this production deployment slapd is automatically restarted (by monit) when memory consumption reaches 80%, so monitoring clearly shows a frequent sawtooth pattern.
I've also tested on openSUSE Tumbleweed x86_64 with an RE24 build [1] by running slapd under the control of valgrind for a couple of minutes while continuously sending simple bind operations (in addition to the monitoring and other background jobs running).
Find valgrind output of my first attempt attached.
Does that make sense at all?
--- Comment #1 from Michael Ströder michael@stroeder.com --- [1] OpenLDAP packages used: https://build.opensuse.org/package/show/home:stroeder:branches:home:stroeder...
--- Comment #2 from Michael Ströder michael@stroeder.com --- valgrind command used:
valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes --fullpath-after= --keep-debuginfo=yes --verbose --log-file=valgrind-out.txt -- /usr/lib64/slapd -d stats -n ae-slapd -l LOCAL4 -s 7 -f /opt/ae-dir/etc/openldap/slapd.conf -h 'ldapi://%2Fopt%2Fae-dir%2Frun%2Fslapd%2Fldapi ldap://*:389 ldaps://*:636' -o slp=off -u ae-dir-slapd
--- Comment #3 from Michael Ströder michael@stroeder.com --- Created attachment 773 --> https://bugs.openldap.org/attachment.cgi?id=773&action=edit provider configuration
--- Comment #4 from Howard Chu hyc@openldap.org --- Comment on attachment 772 --> https://bugs.openldap.org/attachment.cgi?id=772 valgrind output on openSUSE Tumbleweed x86_64
There are two items of interest here, one is an uninit'd variable, which was fixed in master 3 years ago in commit 5bd89a1f1f1 but apparently never merged into RE24.
The other appears to be a leak in regexec(), which would be a libc bug. But to be sure: how many operations did you perform before stopping slapd? The line
==7378== 47,104 bytes in 23 blocks are indirectly lost in loss record 92 of 93
would imply the leak occurred 23 times, and presumably was once per operation.
--- Comment #5 from Quanah Gibson-Mount quanah@openldap.org --- Added the missed init in:
Commits:
• 854771cd by Howard Chu at 2020-10-12T16:11:36+00:00
  Cleanup uninit'd vars
--- Comment #6 from Michael Ströder michael@stroeder.com --- I've retested with the latest RE24 including that commit, and the output still contains this:
==898== 47,104 bytes in 23 blocks are indirectly lost in loss record 95 of 96
==898==    at 0x483BB65: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==898==    by 0x49FA5F5: build_trtable (/usr/src/debug/glibc-2.32-1.1.x86_64/posix/regexec.c:3403)
==898==    by 0x49FCC04: transit_state (/usr/src/debug/glibc-2.32-1.1.x86_64/posix/regexec.c:2252)
==898==    by 0x49FCC04: check_matching (/usr/src/debug/glibc-2.32-1.1.x86_64/posix/regexec.c:1120)
==898==    by 0x49FCC04: re_search_internal (/usr/src/debug/glibc-2.32-1.1.x86_64/posix/regexec.c:792)
==898==    by 0x4A0126B: regexec@@GLIBC_2.3.4 (/usr/src/debug/glibc-2.32-1.1.x86_64/posix/regexec.c:218)
==898==    by 0x54DEA84: sock_over_op (/usr/src/debug/openldap2-2.4.54-14.1.x86_64/servers/slapd/back-sock/config.c:301)
==898==    by 0x1BAA39: overlay_op_walk (/usr/src/debug/openldap2-2.4.54-14.1.x86_64/servers/slapd/backover.c:661)
==898==    by 0x1BABAA: over_op_func (/usr/src/debug/openldap2-2.4.54-14.1.x86_64/servers/slapd/backover.c:730)
==898==    by 0x168230: fe_op_bind (/usr/src/debug/openldap2-2.4.54-14.1.x86_64/servers/slapd/bind.c:383)
==898==    by 0x167840: do_bind (/usr/src/debug/openldap2-2.4.54-14.1.x86_64/servers/slapd/bind.c:205)
==898==    by 0x1482C1: connection_operation.lto_priv.0 (/usr/src/debug/openldap2-2.4.54-14.1.x86_64/servers/slapd/connection.c:1175)
==898==    by 0x148E3A: connection_read_thread (/usr/src/debug/openldap2-2.4.54-14.1.x86_64/servers/slapd/connection.c:1311)
==898==    by 0x4860C41: ldap_int_thread_pool_wrapper.part.0 (/usr/src/debug/openldap2-2.4.54-14.1.x86_64/libraries/libldap_r/tpool.c:696)
For the tests I've disabled background jobs and all other replicas, and I've sent simple bind operations 59 times. So to me the count of 23 does not seem to correlate with the number of operations.
--- Comment #7 from Howard Chu hyc@openldap.org --- (In reply to Michael Ströder from comment #6)
> I've retested with the latest RE24 including that commit, and the output still contains this:
> ==898== 47,104 bytes in 23 blocks are indirectly lost in loss record 95 of 96
> ==898==    at 0x483BB65: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==898==    by 0x49FA5F5: build_trtable (/usr/src/debug/glibc-2.32-1.1.x86_64/posix/regexec.c:3403)
> For the tests I've disabled background jobs and all other replicas, and I've sent simple bind operations 59 times. So to me the count of 23 does not seem to correlate with the number of operations.
There's nothing else in this output that indicates a leak. If this is just some regex-internal structure that needs 23 allocs to init itself, then we have to look elsewhere. Probably you'll need to enable the consumers and generate some activity on the provider.
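For illustration, a minimal standalone sketch (the pattern, subject and iteration count are made up) that can be compiled and run under the same valgrind options to see whether glibc's regexec() allocations stay at a fixed number of blocks or grow with the number of calls:

/* regexec_test.c - exercise glibc regexec() repeatedly with one compiled pattern */
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* hypothetical pattern and subject, loosely resembling a DN check */
    const char *pattern = "^uid=[a-z0-9]+,ou=ae-dir$";
    const char *subject = "uid=msmith,ou=ae-dir";
    int i, n = (argc > 1) ? atoi(argv[1]) : 59;
    regex_t re;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE)) {
        fprintf(stderr, "regcomp failed\n");
        return 1;
    }
    for (i = 0; i < n; i++)
        if (regexec(&re, subject, 0, NULL, 0))
            fprintf(stderr, "no match in iteration %d\n", i);
    regfree(&re);   /* comment this out to mimic a compiled pattern that is never freed */
    return 0;
}

Running e.g. "valgrind --leak-check=full --show-leak-kinds=all ./regexec_test 59" and then again with 590 iterations should show whether the block count in the loss record scales with the number of regexec() calls; if it doesn't, those blocks are internal state of the compiled pattern rather than a per-operation leak.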
--- Comment #8 from Michael Ströder michael@stroeder.com --- Unfortunately this issue is still present, e.g. when running 2.6.1 on Ubuntu 20.04 (Focal) and possibly even 22.04 (Jammy).
The customer provided jemalloc profiler call-graphs from the running production env. Are these interesting to look at?
--- Comment #9 from Howard Chu hyc@openldap.org --- (In reply to Michael Ströder from comment #8)
> Unfortunately this issue is still present, e.g. when running 2.6.1 on Ubuntu 20.04 (Focal) and possibly even 22.04 (Jammy).
> The customer provided jemalloc profiler call-graphs from the running production env. Are these interesting to look at?
Unlikely. Memory profiles of a running process won't reveal actual leaks, though they may show some areas of heavy memory use. Generally you won't know that something has actually leaked unless it is still unfreed after shutdown. I still recommend using this leak tracer, https://github.com/hyc/mleak/, since it performs well even in production.
--- Comment #10 from Ondřej Kuzník ondra@mistotebe.net --- On Wed, Apr 27, 2022 at 01:38:48PM +0000, openldap-its@openldap.org wrote:
> Unfortunately this issue is still present, e.g. when running 2.6.1 on Ubuntu 20.04 (Focal) and possibly even 22.04 (Jammy).
> The customer provided jemalloc profiler call-graphs from the running production env. Are these interesting to look at?
If you have several profiles from the same run as the memory use is going up, pprof with --base should be able to show what the additional memory belongs to. That might help pinpoint the code responsible.
Thanks,
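For illustration, a hedged sketch of that diffing workflow, assuming jemalloc's bundled jeprof (a pprof variant) and hypothetical heap profile file names:

jeprof --text --base=jeprof.slapd.1.heap /usr/lib64/slapd jeprof.slapd.9.heap

The --base profile is subtracted from the later one, so the output shows the difference in allocations between the two snapshots, which is what should grow along the sawtooth ramp-up.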
--- Comment #11 from Michael Ströder michael@stroeder.com --- Some tests show that the fix for ITS#9924 might be the solution.
I've published Debian/Ubuntu packages with this back-ported patch and will wait to hear what the customers' admins report.
Quanah Gibson-Mount quanah@openldap.org changed:
What          |Removed |Added
----------------------------------------------------------------------------
See Also      |        |https://bugs.openldap.org/show_bug.cgi?id=9924