Hello, I have migrated from HDB to MDB backend and I am seeing higher CPU usage on my MDB openldap consumers. Has anyone else seen the same? Testing in my stage environment showed MDB using the same amount of CPU as HDB, or less - but now, with real traffic and a large dataset, I see sustained high CPU utilization.
My production environment has the following specs:

6 consumer servers with 8 vCPU x 16G RAM
openldap version 2.4.45
Syncrepl enabled (with a single openldap provider server which is also MDB and has no issues and no high cpu)

The database has ~230K users. data.mdb is about 1.8G in size.
MDB database directives include:

olcDbCheckpoint: 102400 10
olcDbNoSync: TRUE
The rest are defaults.
Indexing includes:

olcDbIndex: businessCategory eq
olcDbIndex: cn eq,sub
olcDbIndex: description eq
olcDbIndex: displayName eq,sub
olcDbIndex: entryCSN eq
olcDbIndex: entryUUID eq
olcDbIndex: gidNumber eq
olcDbIndex: givenName eq,sub
olcDbIndex: mail eq
olcDbIndex: member eq
olcDbIndex: memberOf eq
olcDbIndex: memberUid eq
olcDbIndex: objectClass pres,eq
olcDbIndex: sn eq,sub
olcDbIndex: uid eq,sub
olcDbIndex: uidNumber eq
olcDbIndex: uniqueMember eq
These consumer servers are used for reads only. The initial sync with the provider is ok but once the consumers are actively handling read requests, CPU jumps to 60% usage on average. Our HDB consumers had half the resources (4vCPU and 8GB RAM) and less than half the CPU usage (average of 25% utilization).
I have tested adding other MDB directives (writemap, mapasync, nordahead) but cannot get CPU utilization to come down close to what we see with the HDB backend. I have also load tested in my stage environment and was unable to reproduce the problem (MDB generally utilized the same resources as HDB or less, but never double). There has been no change in data or traffic since the migration. We have also reverted some servers back to HDB and then back to MDB to confirm the high utilization.
Has anyone else come across this with MDB and if so, were you able to alleviate CPU utilization? I can provide more details if needed. Any input welcome.
Thanks! Paul
--On Monday, August 24, 2020 7:05 PM +0000 paul.jc@yahoo.com wrote:
Hello, I have migrated from HDB to MDB backend and I am seeing higher CPU usage on my MDB openldap consumers.
openldap version 2.4.45
Syncrepl enabled (with a single openldap provider server which is also MDB and has no issues and no high cpu)
The database has ~230K users. data.mdb is about 1.8G in size.
a) You need to be running a current release, not something 4.5 years old.
b) You need to be using delta-syncrepl with a current release, not standard syncrepl (a consumer-side sketch follows below)
c) Do you know what the server is doing to be using a significant amount of CPU? I.e., have you looked at what it's logging with stats + sync set? (An example of enabling that follows below.)
The flags you're playing with can generally help in a high-write environment; outside of that they don't do much. I certainly wouldn't expect them to affect CPU usage.
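For (b), a minimal consumer-side delta-syncrepl sketch; the provider hostname, bind DN, and credentials below are placeholders, and the provider must also be configured with the accesslog overlay logging writes to cn=accesslog:

dn: olcDatabase={1}mdb,cn=config
changetype: modify
replace: olcSyncrepl
olcSyncrepl: rid=001
  provider=ldap://provider.example.com
  bindmethod=simple
  binddn="cn=replicator,dc=example,dc=com"
  credentials=secret
  searchbase="dc=example,dc=com"
  logbase="cn=accesslog"
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
  syncdata=accesslog
  type=refreshAndPersist
  retry="60 +"

For (c), stats + sync logging can be enabled via cn=config:

dn: cn=config
changetype: modify
replace: olcLogLevel
olcLogLevel: stats sync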
Regards, Quanah
--
Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
http://www.symas.com
Hi Quanah, Thanks for the response. I finally got around to upgrading and I am still in the same boat. I am now running version 2.4.53.
# slapd -V
@(#) $OpenLDAP: slapd 2.4.53 (Sep 24 2020 20:30:24) $
I have implemented delta-syncrepl as suggested. I already had stats + sync set for logging and I see nothing unusual to report - just lots of connections (across all consumers) and periodic syncrepl messages, as expected.
I ran strace against slapd and didn't come up with anything that is obviously different between HDB and MDB consumers.
I then set olcLogLevel to "any" and got thousands of log entries like the following:
slapd[28595]: mdb_search: 40396 scope not okay
I am not sure what causes these log entries and if these are related to higher CPU utilization. If you have any input/suggestions on where to look next it would be much appreciated.
Regards, Paul
--On Wednesday, October 28, 2020 5:34 PM +0000 paul.jc@yahoo.com wrote:

Hi Paul,
A few things, after going back to your original email.
This is minor, but "pres" indices are useless unless fewer than 50% of the entries in the database have an instance of the attribute the "pres" index is being set on. I.e., setting "pres" on objectClass is always useless, since objectClass appears on every entry.
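If you want to drop that "pres" index, something along these lines should work against cn=config (the database DN below is an assumption; adjust it to your layout, and slapd should reindex the attribute after the change):

dn: olcDatabase={1}mdb,cn=config
changetype: modify
delete: olcDbIndex
olcDbIndex: objectClass pres,eq
-
add: olcDbIndex
olcDbIndex: objectClass eq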
I am not sure what causes these log entries and if these are related to higher CPU utilization. If you have any input/suggestions on where to look next it would be much appreciated.
That just means an entry it's examining while processing a search is not in scope. You can ignore it (and that's the reason why it only shows up at a high debug level).
I would also note that since MDB is significantly more efficient, it can do more in a given time slice than HDB. I.e., have you evaluated how many searches/second are being processed with MDB vs HDB? The ability to do more in a given time slice means that MDB does generally use more CPU than HDB -- but only because slapd is literally able to process more requests in a given interval than it could with HDB. For example, in a test I did some years ago (2013 or so), MDB could answer approximately 3x the number of reads/second that HDB could (60k reads/sec vs just under 21k reads/sec). On more modern systems, the disparity is even more pronounced.
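One rough way to compare: with "stats" logging enabled, every completed search logs a "SEARCH RESULT" line, so you can count them over a fixed window on an HDB and an MDB consumer (the log path below is just an example; adjust to wherever your syslog sends slapd output):

# count completed search operations over the window covered by the log
grep -c 'SEARCH RESULT' /var/log/slapd.log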
Outside of that, without more concrete information to work with, it's hard to do anything other than speculate.
Regards, Quanah
--
Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
http://www.symas.com
Hi Quanah, Thanks for the input! I should mention that we load balance incoming queries, so each of my consumers processes a similar number of requests, yet my MDB consumers still have much higher CPU utilization. Bind times are also higher (15ms, with spikes up to 55ms, on MDB consumers vs a steady 5ms average on HDB consumers). My concern about the "scope not okay" log entries references an old thread regarding high numbers of aliases. For MDB, do you know if dereferencing (often with "always") with large numbers of aliases still causes slower search times (and in turn higher CPU utilization), as noted in this thread here:
https://lists.openldap.org/hyperkitty/list/openldap-technical@openldap.org/thread/FHMQ7UAZZUPG3MEJK5PZCDVJXO4WDECE/#5RR35BAZXBVTIXCS7UZOTKORX65H7KFA
This thread can also be found here: https://www.openldap.org/lists/openldap-technical/201509/msg00111.html
I am not sure if this is related, or if there was any resolution, as it is several years old, but I figured I would throw it out there as a possible cause of my issue to see what you think. Let me know. Thanks!
Regards, Paul
--On Monday, November 2, 2020 7:07 PM +0000 paul.jc@yahoo.com wrote:
Hi Quanah, Thanks for the input! I should mention that we load balance incoming queries, so each of my consumers processes a similar number of requests, yet my MDB consumers still have much higher CPU utilization. Bind times are also higher (15ms, with spikes up to 55ms, on MDB consumers vs a steady 5ms average on HDB consumers). My concern about the "scope not okay" log entries references an old thread regarding high numbers of aliases. For MDB, do you know if dereferencing (often with "always") with large numbers of aliases still causes slower search times (and in turn higher CPU utilization), as noted in this thread here:
If you're using aliases in your LDAP DB, then yes, that'll absolutely trigger issues such as this. The use of aliases generally indicates poor DIT design. ;)
Regards, Quanah
--
Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
http://www.symas.com
Quanah Gibson-Mount wrote:
If you're using aliases in your LDAP DB, then yes, that'll absolutely trigger issues such as this. The use of aliases generally indicates poor DIT design. ;)
Hey Quanah, understood. :) I inherited this openldap database and I'm not well versed in aliases. How do I verify whether aliases are actually being used? I have no ldif files in my core or custom schema config that define aliases. An ldapsearch on cn=config returns the default references, but that is all:
olcAttributeTypes: ( 2.5.4.1 NAME ( 'aliasedObjectName' 'aliasedEntryName' ) DESC 'RFC4512: name of aliased object' EQUALITY distinguishedNameMatch SYNTAX 1.3.6.1.4.1.1466.115.121.1.12 SINGLE-VALUE )
and
olcObjectClasses: ( 2.5.6.1 NAME 'alias' DESC 'RFC4512: an alias' SUP top STRUCTURAL MUST aliasedObjectName )
Is there something else I should search to verify usage of aliases in the DB?
Thanks. Paul
--On Tuesday, November 3, 2020 6:51 PM +0000 paul.jc@yahoo.com wrote:
Quanah Gibson-Mount wrote:
If you're using aliases in your LDAP DB, then yes, that'll absolutely trigger issues such as this. The use of aliases generally indicates poor DIT design. ;)
Hey Quanah, understood. :) I inherited this openldap database and I'm not well versed in aliases. How do I verify whether aliases are actually being used? I have no ldif files in my core or custom schema config that define aliases. An ldapsearch on cn=config returns the default references, but that is all:
Hi Paul,
You would need to search your back-mdb database and see if there are any objects with an objectClass of "alias". I.e.,
ldapsearch ... "(objectClass=alias") 1.1
filling in your bind details of course (I'd suggest something with full read access to the entire db).
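For example (the host and bind DN below are placeholders):

ldapsearch -x -H ldap://localhost -D "cn=manager,dc=example,dc=com" -W \
  -b "dc=example,dc=com" -s sub "(objectClass=alias)" 1.1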
Regards, Quanah
--
Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
http://www.symas.com
Thanks Quanah,
Looks like I do NOT have any aliases defined - which is a good thing, considering that would be bad DIT design, as you mentioned.
That being said, this means I still do not have an explanation for why my MDB consumers are using up to 4x the CPU compared to my HDB consumers. As I mentioned before, we are processing equivalent numbers of requests on both HDB and MDB. Any further suggestions you have on where to inspect next would be appreciated. Planning to sift through debug logs again and compare the two. Regards, Paul
Drilling into CPU consumption is made easier with the Linux perf tool.
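For example (the sampling frequency and duration below are arbitrary choices):

# sample on-CPU stacks of the running slapd for 30 seconds, then summarize
perf record -F 99 -g -p $(pidof slapd) -- sleep 30
perf report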
++Cyrille
-----Original Message-----
From: paul.jc@yahoo.com [mailto:paul.jc@yahoo.com]
Sent: Thursday, November 5, 2020 12:51 AM
To: openldap-technical@openldap.org
Subject: Re: HDB to MDB migration results in higher CPU usage on openldap consumers
Thanks Quanah,
Looks like I do NOT have any aliases defined - which is a good thing, considering that would be bad DIT design, as you mentioned.
That being said, this means I still do not have an explanation for why my MDB consumers are using up to 4x the CPU compared to my HDB consumers. As I mentioned before, we are processing equivalent numbers of requests on both HDB and MDB. Any further suggestions you have on where to inspect next would be appreciated. Planning to sift through debug logs again and compare the two. Regards, Paul
--On Wednesday, November 4, 2020 11:50 PM +0000 paul.jc@yahoo.com wrote:
That being said, this means I still do not have an explanation for why my MDB consumers are using up to 4x the CPU compared to my HDB consumers.
You could certainly use something like oprofile to profile the different processes and see where they are spending time. It may point to something useful, hard to say.
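With current oprofile, that would look roughly like the following (assuming the operf/opreport tools are installed):

# profile the running slapd (stop with Ctrl-C), then report hot symbols
operf --pid $(pidof slapd)
opreport --symbols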
--Quanah
--
Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
http://www.symas.com
Hi Quanah,
Using the perf tool on my MDB consumers on a per-thread basis, I have found that the functions "mdb_node_search" and "mdb_page_search_root" are a source of high overhead (along with a number of other MDB functions), which correlates with the high CPU utilization I am seeing. When comparing the same on my HDB consumers by thread, the overhead for HDB-related slapd functions is minimal.
Here is what I see on MDB:
-------------------------
Samples: 75K of event 'cpu-clock', 4000 Hz, Event count (approx.): 6487155549 lost: 0/0 drop: 0/0
Overhead  Shared Object  Symbol
  29.60%  slapd          [.] mdb_node_search
  15.13%  slapd          [.] mdb_page_search_root
   8.98%  slapd          [.] mdb_cmp_long
   8.36%  slapd          [.] mdb_cursor_set
   6.21%  slapd          [.] mdb_cmp_cint
   5.18%  slapd          [.] mdb_page_get.isra.13
Here is what I see on HDB:
-------------------------
Samples: 7K of event 'cpu-clock', 4000 Hz, Event count (approx.): 448391573 lost: 0/0 drop: 0/0
Overhead  Shared Object  Symbol
   3.61%  slapd          [.] 0x000000000010eab0
   0.55%  slapd          [.] avl_find
   0.51%  slapd          [.] hdb_idl_fetch_key
   0.43%  slapd          [.] hdb_idl_next
Do you know if these MDB functions are expected to use that much overhead and if not, any chance you know what might be causing this?
As a side note, I have also compared backtraces on the threads using gdb and strace and from that perspective I do not see anything outstanding (the output is much the same for both).
Thanks again for your input. Regards, Paul.
paul.jc@yahoo.com wrote on 02.11.2020 at 20:07 in message
20201102190737.798.72820@hypatia.openldap.org:
Hi Quanah, Thanks for the input! I should mention that we load balance incoming queries, so each of my consumers processes a similar number of requests, yet my MDB consumers still have much higher CPU utilization.
Hi!
I wonder: Wouldn't it make more sense to compare the CPU "busy rates", meaning "user + sys + io_wait (+ some more)"? My guess is that MDB might use more user CPU as it uses less I/O, while HDB might use more I/O, thus less user CPU.
I'm experimenting with CPU usage graphs, and I'm attaching two examples: CPU-single shows usage in units of single CPUs, so 150% means one and a half CPUs are being used, while CPU-total shows the ratio relative to all CPUs in the system, so 5% means your system could handle that load "times 20".
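For a quick numeric view of that breakdown without graphs, mpstat from the sysstat package works (the interval below is arbitrary):

# per-CPU breakdown of user/system/iowait/idle, sampled every 5 seconds
mpstat -P ALL 5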
Regards, Ulrich
Bind times are also higher (15ms, with spikes up to 55ms, on MDB consumers vs a steady 5ms average on HDB consumers). My concern about the "scope not okay" log entries references an old thread regarding high numbers of aliases. For MDB, do you know if dereferencing (often with "always") with large numbers of aliases still causes slower search times (and in turn higher CPU utilization), as noted in this thread here:
https://lists.openldap.org/hyperkitty/list/openldap-technical@openldap.org/thread/FHMQ7UAZZUPG3MEJK5PZCDVJXO4WDECE/#5RR35BAZXBVTIXCS7UZOTKORX65H7KFA
This thread can also be found here: https://www.openldap.org/lists/openldap-technical/201509/msg00111.html
I am not sure if this is related, or if there was any resolution, as it is several years old, but I figured I would throw it out there as a possible cause of my issue to see what you think. Let me know. Thanks!
Regards, Paul
Ulrich Windl wrote:
I wonder: Wouldn't it make more sense to compare the CPU "busy rates", meaning "user + sys + io_wait (+ some more)"? My guess is that MDB might use more user CPU as it uses less I/O, while HDB might use more I/O, thus less user CPU.
Hi Ulrich, thanks for taking a look at this. I really appreciate the feedback. Yes, I agree that MDB probably uses less I/O, but the difference I am seeing is negligible. Contrast that with the difference in avg-cpu %user (and note that my MDB consumers have double the CPU - 8 vCPUs vs 4 vCPUs - which makes the discrepancy even more pronounced). And since I am not processing more requests on MDB than on HDB, something else must be going on here.
MDB:
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          56.00   3.79     1.48     0.01    0.00  38.73

HDB:
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          18.48   1.83     3.16     0.02    0.00  76.51
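For reference, the avg-cpu output above is iostat's CPU report; a comparable snapshot can be produced with, for example:

# CPU-only utilization report, 5-second intervals
iostat -c 5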