(My original message, which presents the problem, is at the bottom.)
Thank you for your responses. Here is some more information:
You didn't mention what version of slapd you're running.
That's right, sorry. I'm running Debian-packaged slapd 2.2.23-8, using bdb 4.2.52-18 (everything on my system comes from Debian sarge packages, except for the kernel, which is a recompiled Ubuntu 6.06 kernel, 2.6.12 SMP).
I would expect the call to wait after forking to exec true to be treated as blocking IO, but, perhaps on your system, true is an sh builtin.
That's right, true is a shell builtin. So the two processes don't fork or anything, they just run and run. Some tests I made right now show that a certain ldapsearch completes in 4 seconds if the processes have nice 19, 35-40 seconds if they have nice 10, and longer if nice 0.
And, yes, if I replace "while true" with "while /bin/true", slapd responds instantly.
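In case someone wants to repeat the measurement, here is a minimal sketch of how the test can be scripted. The function names start_hogs and stop_hogs are made up for illustration and do not come from any package.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative only).

# start_hogs NICE: start two CPU-bound shell loops at the given nice level.
# "true" is a shell builtin here, so the loops never fork or block.
start_hogs() {
    nice -n "$1" sh -c 'while true; do true; done' &
    HOG1=$!
    nice -n "$1" sh -c 'while true; do true; done' &
    HOG2=$!
}

# stop_hogs: kill both loops and reap them.
stop_hogs() {
    kill "$HOG1" "$HOG2" 2>/dev/null
    wait "$HOG1" "$HOG2" 2>/dev/null
    return 0
}
```

For each nice level, the idea is to call start_hogs with that level (for example 19, 10 or 0), time one ldapsearch query against the server, and then call stop_hogs before the next run.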
nicing a process does not affect its time slice, just where it sits in the run queue when it is ready to run.
Your description of how the scheduler works might explain the problem if slapd makes a huge number of blocking I/O requests: each time it makes such a request, it goes to sleep, and the next process on the run queue (the niced shell in our case) is set to run and exhausts its time slice. Then, supposing slapd is ready again, it is set to run, makes another request, sleeps again, and so on. If it needs to make hundreds of I/O requests, could that explain the delay?
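To make the question concrete, here is a back-of-the-envelope sketch of that hypothesis. Both inputs are assumptions chosen for illustration: I have not counted how many blocking requests slapd makes, nor measured the time slice of the niced shells.

```shell
#!/bin/sh
# estimate_delay_ms REQUESTS SLICE_MS
# Worst-case extra latency if slapd has to wait one full time slice of a
# niced process after each blocking request. Both arguments are guesses.
estimate_delay_ms() {
    echo $(( $1 * $2 ))
}

# Example with made-up numbers: 400 blocking requests, each followed by
# a 100 ms wait.
estimate_delay_ms 400 100    # prints 40000 (ms), i.e. 40 seconds
```

With numbers of that order, a few hundred requests would be enough to account for delays of tens of seconds, but this only shows that the hypothesis is plausible, not that it is what actually happens.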
Here is what ps shows while I'm waiting for slapd to respond:
anthony@acheloos:~$ ps u -m 16204
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root     16204  0.0  0.5 30140 5596 ?        -    11:45   0:00 /usr/sbin/slapd
root         -  0.0    -     -    - -        Ss   11:45   0:00 -
root         -  0.0    -     -    - -        Ss   11:45   0:00 -
root         -  0.0    -     -    - -        Rs   11:46   0:00 -
One of the threads is "Rs"; after slapd delivers its response, it goes back to "Ss".
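The same per-thread state can also be sampled repeatedly while a query is pending. The following is a minimal sketch, assuming a Linux /proc filesystem with per-thread entries under /proc/PID/task; the function name thread_states is made up for illustration.

```shell
#!/bin/sh
# thread_states PID: print "TID STATE" for every thread of PID.
# Linux only: reads the "State:" line of /proc/PID/task/TID/status.
thread_states() {
    for t in /proc/"$1"/task/*; do
        [ -r "$t/status" ] || continue
        state=$(sed -n 's/^State:[[:space:]]*\([A-Za-z]\).*/\1/p' "$t/status")
        echo "${t##*/} $state"
    done
}
```

Calling thread_states 16204 once per second during a slow query would show whether the busy thread stays in state R the whole time or keeps alternating between R and S.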
Maybe you have the memory to let everything rest in memory. I don't know what your two looping shells do to your memory... If you had some control to never swap out ldap, this theory could be tested.
I think I have lots of spare memory:
top - 12:53:32 up 60 days,  1:35,  5 users,  load average: 1.38, 0.77, 0.63
Tasks: 262 total,   3 running, 258 sleeping,   1 stopped,   0 zombie
Cpu0  :   0.7% us,  0.7% sy,  98.7% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu1  :   0.0% us,  0.0% sy, 100.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:   1035680k total,  1016936k used,    18744k free,    39004k buffers
Swap:  2097144k total,    99888k used,  1997256k free,   577360k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8551 anthony   35  10  4032 1204  916 R 99.9  0.1   0:10.96 sh
 8550 anthony   35  10  4032 1204  916 R 98.8  0.1   0:10.72 sh
16204 root      21   0 30140 5600 3528 S  0.0  0.5   0:00.22 slapd
I tried to play with cachesize and idlcachesize (set them to 10 thousand), but didn't see any difference, which hardly surprises me given that my ldap database has only 257 records.
Finally, here is my DB_CONFIG:
set_cachesize 0 2097152 0
set_lg_bsize 524288
set_lk_max_objects 5000
set_lk_max_locks 5000
set_lk_max_lockers 5000
(My slapd.conf does not contain any db-related parameters).
My original message:
Hi,
On the almost idle dual-core machine that runs slapd, I run:
nice sh -c 'while true; do true; done' &
nice sh -c 'while true; do true; done' &
(i.e. I'm running this twice). Then each of the two CPUs always has some job to do, so both CPUs have 100% usage, but this is "nice".
Then slapd takes too long to respond to queries. It may take 10 or 20 seconds. If I kill or stop one of the two dummy processes, it replies instantly. If I continue both dummy processes, it's back to 10 or 20 seconds. Needless to say, all machine resources seem OK: low disk usage, lots of spare memory, and slapd is not niced.
If it's not something immediately obvious, could you help me debug it? I've run slapd with various "-d" options but it gives me results that I have trouble understanding.
The OS is Debian 3.1 (Sarge), with a 2.6.12 SMP Linux kernel.