Maucci, Cyrille wrote:
Hi, here are my suggestions, in decreasing order of preference (decreasing accuracy and ease of setup). If your kernel is recent enough and supports perf events, you could use perf to find out accurately where the CPU time is spent. If it does not, you could try oprofile, even though that's more complex to set up than perf.
oprofile is good, and doesn't require the most recent kernels. But when you're seeing close to 100% CPU usage, it doesn't take a fine-grained profiler to see what's happening. A gdb stack trace will usually reveal the culprit immediately. Of course, it's more readable if you're running a non-optimized binary with debug symbols intact.
When you don't have those tools ready to use, you can fall back on poor man's profiling, i.e. sample the backtrace of slapd in a loop using pstack. If you do not have pstack, you can achieve the same in a more heavyweight manner with gdb. If you do not have gdb, you may try ltrace/strace.
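For illustration only, the backtrace-sampling idea can be sketched in Python as below. This is a rough sketch, not a tool from either poster: the helper names (`gdb_backtrace_command`, `innermost_frames`, `sample`) are made up for this example, and it assumes gdb is installed, that you are allowed to attach to the process, and that gdb prints frames in its usual `#N  0xADDR in func (...)` layout. It only tallies the innermost frame of each thread, which is a simplification.

```python
import collections
import re
import subprocess
import time

# Matches gdb backtrace frame lines, e.g. (illustrative, not real output):
#   "#0  0x00007f12 in malloc () from /lib/libc.so.6"
#   "#1  some_function (arg=1) at file.c:54"
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([^\s(]+)")


def gdb_backtrace_command(pid):
    """Build a gdb command that attaches, dumps every thread's stack, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def innermost_frames(gdb_output):
    """Return the function name of frame #0 for each thread in the output."""
    names = []
    for line in gdb_output.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group(1) == "0":
            names.append(match.group(2))
    return names


def sample(pid, count=20, interval=0.5):
    """Take `count` backtraces of process `pid` and tally the innermost frames."""
    tally = collections.Counter()
    for _ in range(count):
        result = subprocess.run(
            gdb_backtrace_command(pid),
            capture_output=True, text=True, check=False,
        )
        tally.update(innermost_frames(result.stdout))
        time.sleep(interval)
    return tally
```

Calling `sample(pid).most_common(5)` with the slapd process ID would list the functions seen most often at the top of a stack. Idle threads will typically show up parked in some blocking call, so the interesting entry is a function that keeps reappearing and is not a wait. Note that each attach briefly stops the process, so the sampling interval should not be too aggressive on a production server.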
ltrace/strace are generally useless for debugging slapd issues.

- ltrace only traces calls into installed libraries. There are only two classes of CPU-hog bugs encountered with slapd:
  a) a stupid programmer error in OpenLDAP code which causes a tight infinite loop in OpenLDAP code, and thus never hits any library functions.
  b) a stupid programmer error in a library which causes a tight infinite loop in the library, and thus will only show up as a single library call.
  In both cases, a gdb stack trace will be more informative.
(b) is the most common case, and these days it's almost always glibc malloc at fault.
- strace only traces system calls. slapd performs system calls for only a few purposes, almost all of which are to perform I/O. I/O calls will never result in 100% CPU usage.
If you don't have those, you should ask for some linux sysadmin help ;-) ++Cyrille