Hi,
I've discovered this with the python-ldap module, but having discussed it with the author and done some further testing I'm pretty sure this is down to the ldap_result() C API.
I have written a daemon that has a number of worker threads. A worker thread can launch an async LDAP search operation, and then it goes off to do other work while the search is running. It adds the FD of the socket used for the search operation to its select() call, and if there is data to read on that FD it wakes up and runs ldap_result() to grab those results. If the results are incomplete, it goes back around the loop again (check for more work to do, call select, wake up, ldap_result etc.)
When I call ldap_result(), the timeout parameter is 0. It turns out, however, that this is not what I want. When I run an LDAP search operation that returns, say, 80,000 entries, examining the program with truss (on Solaris) or strace (on RHEL) reveals that I end up going around that loop 80,000 times: I call select() 80,000 times and ldap_result() 80,000 times, once per entry. The behaviour I actually want is for ldap_result() to return all the items currently available in kernel buffers, but without waiting for any new network packets to arrive. The best approximation I can get is a timeout value of 0.0001. Having tried various values, this is the best for the hardware I'm testing with, when OpenLDAP is listening on the loopback interface. It reduces the number of select() calls from 80,000 to about 3,000 and shaves 5 seconds off the total execution time. However, I'm acutely aware that results will be hardware dependent, and will differ when I'm using a physical network interface.
So what I'm asking is, what is the best way of achieving the following workflow:
- start LDAP search
- start loop
- select() on FD, wake me up when there is data to read
- ldap.result() gives me ALL available data, and without waiting for anything new
- Attend to other unrelated events
- Go round loop again
Thanks & regards, Mark Bannister.
http://dbis.sf.net DBIS, a replacement for RFC2307
Mark R Bannister wrote:
So what I'm asking is, what is the best way of achieving the following workflow:
- start LDAP search
- start loop
- select() on FD, wake me up when there is data to read
- ldap.result() gives me ALL available data, and without waiting for
anything new
The ldap_result() manpage already tells you how things work. In fact, so does the python-ldap documentation. Your loop structure is wrong.
- Attend to other unrelated events
- Go round loop again
You should be coding, instead:
- start LDAP search
- start loop
- select() on FD, wake me up when there is data to read
loop2:
- call ldap.result() with 0 timeout in a loop until it returns 0.
- Attend to other unrelated events
- Go round loop again
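In python-ldap terms, the structure above might look roughly like the sketch below. It assumes that result() called with timeout=0 returns (None, None) when nothing is available, which is how the python-ldap documentation describes polling. The names drain_results, worker_loop and FakeConn are hypothetical: FakeConn only stands in for a real LDAPObject so the loop can be tried without a server, and with a real connection the FD would typically come from conn.get_option(ldap.OPT_DESC).

```python
import select
import socket

# Protocol tag values that python-ldap exposes as ldap.RES_SEARCH_ENTRY
# and ldap.RES_SEARCH_RESULT; repeated here so the sketch runs standalone.
RES_SEARCH_ENTRY = 0x64
RES_SEARCH_RESULT = 0x65


def drain_results(conn, msgid):
    """loop2: poll with a zero timeout until nothing more is available."""
    entries = []
    while True:
        rtype, rdata = conn.result(msgid, all=0, timeout=0)
        if rtype is None:               # poll found nothing: back to select()
            return entries, False
        if rtype == RES_SEARCH_RESULT:  # final message: the search is complete
            return entries, True
        if rtype == RES_SEARCH_ENTRY:   # other message types are ignored here
            entries.extend(rdata)


def worker_loop(conn, msgid, fd, other_work):
    """Outer loop: select() on the FD, drain, then attend to other events."""
    collected, done = [], False
    while not done:
        readable, _, _ = select.select([fd], [], [], 1.0)
        if readable:
            entries, done = drain_results(conn, msgid)
            collected.extend(entries)
        other_work()
    return collected


class FakeConn:
    """Hypothetical stand-in for an LDAPObject, for trying the loop offline."""

    def __init__(self, messages):
        self.messages = list(messages)

    def result(self, msgid, all=0, timeout=0):
        return self.messages.pop(0) if self.messages else (None, None)


if __name__ == "__main__":
    msgs = [(RES_SEARCH_ENTRY, [("cn=u%d,dc=example" % i, {})]) for i in range(3)]
    msgs.append((RES_SEARCH_RESULT, []))
    a, b = socket.socketpair()
    b.send(b"x")                        # make the FD readable, as a reply would
    print(len(worker_loop(FakeConn(msgs), 1, a.fileno(), lambda: None)))
```

The point of the inner loop is that one select() wakeup is followed by as many zero-timeout polls as it takes to empty whatever has already arrived, rather than one poll per wakeup.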
On 12/02/2015 09:11, Howard Chu wrote:
Mark R Bannister wrote:
So what I'm asking is, what is the best way of achieving the following workflow:
- start LDAP search
- start loop
- select() on FD, wake me up when there is data to read
- ldap.result() gives me ALL available data, and without waiting for
anything new
The ldap_result() manpage already tells you how things work. In fact, so does the python-ldap documentation. Your loop structure is wrong.
- Attend to other unrelated events
- Go round loop again
You should be coding, instead:
- start LDAP search
- start loop
- select() on FD, wake me up when there is data to read
loop2:
call ldap.result() with 0 timeout in a loop until it returns 0.
Attend to other unrelated events
Go round loop again
Thanks Howard, that kind of makes sense. But without understanding the internal workings of ldap.result(), what's to prevent loop2 from taking, say, 5 seconds to complete? This could be a busy worker thread, and I can't afford for it to take that much time out to process a single transaction.
Thanks, Mark.
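One possible way to address the concern about loop2 running too long, offered here as an assumption rather than anything proposed in the thread, is to cap each drain by a message count and a time budget and to report whether more results may still be pending. The names drain_bounded and FakeConn, and the default limits, are hypothetical. The sketch relies only on python-ldap's documented polling behaviour, where result() with timeout=0 returns (None, None) if nothing is available.

```python
import time

# Protocol tag values that python-ldap exposes as ldap.RES_SEARCH_ENTRY
# and ldap.RES_SEARCH_RESULT; repeated here so the sketch runs standalone.
RES_SEARCH_ENTRY = 0x64
RES_SEARCH_RESULT = 0x65


def drain_bounded(conn, msgid, budget=0.05, max_msgs=500):
    """Poll with a zero timeout, but stop after max_msgs or budget seconds.

    Returns (entries, done, more_pending). more_pending is True when the
    drain stopped because of a limit, not because the poll came back empty.
    """
    deadline = time.monotonic() + budget
    entries = []
    for _ in range(max_msgs):
        rtype, rdata = conn.result(msgid, all=0, timeout=0)
        if rtype is None:               # nothing available right now
            return entries, False, False
        if rtype == RES_SEARCH_RESULT:  # final message: the search is complete
            return entries, True, False
        if rtype == RES_SEARCH_ENTRY:   # other message types are ignored here
            entries.extend(rdata)
        if time.monotonic() >= deadline:
            break
    return entries, False, True


class FakeConn:
    """Hypothetical stand-in for an LDAPObject, for trying the loop offline."""

    def __init__(self, messages):
        self.messages = list(messages)

    def result(self, msgid, all=0, timeout=0):
        return self.messages.pop(0) if self.messages else (None, None)


if __name__ == "__main__":
    msgs = [(RES_SEARCH_ENTRY, [("cn=u%d,dc=example" % i, {})]) for i in range(10)]
    msgs.append((RES_SEARCH_RESULT, []))
    conn = FakeConn(msgs)
    done = False
    while not done:
        entries, done, more = drain_bounded(conn, 1, budget=10, max_msgs=4)
        print(len(entries), done, more)
```

When more_pending is True, the caller probably should not block in select() before draining again: results that the library has already read from the socket may no longer make the FD readable, so polling again after the other work (or using a zero select() timeout) seems the safer choice.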
openldap-technical@openldap.org