We have a development server, 'emerson', with roughly one third of the performance of our production server, 'bossdog', yet it runs LDAP queries 10 times faster than the production server. I have checked everything I can think of to find the difference, and the two seem to be configured identically.
Test method:
emerson:~# time for (( i=0; i<100; i++)); do ldapsearch -xD cn=manager,dc=eoa,dc=cn -w masked -b ou=contacts,ou=realss.com,dc=eoa,dc=cn '(uidNumber=7)' dn; done;
[snip]
# zhangweiwu, contacts, realss.com, eoa.cn
dn: uid=zhangweiwu,ou=contacts,ou=realss.com,dc=eoa,dc=cn

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1

real 0m1.047s
user 0m0.320s
sys 0m0.352s

bossdog:~# time for (( i=0; i<100; i++)); do ldapsearch -xD cn=manager,dc=eoa,dc=cn -w masked -b ou=contacts,ou=realss.com,dc=eoa,dc=cn '(uidNumber=7)' dn; done;
[snip]

real 0m41.790s
user 0m0.232s
sys 0m0.464s
What I have compared:
1. Both servers run Debian Lenny.
2. Both servers have the same DB_CONFIG in /var/lib/ldap.
3. Both servers have the same ACL settings. In fact, we tried reducing the ACLs on the slow production server, without any improvement.
4. Both servers use the bdb backend.
5. Comparing syslog output at log level 256, both servers produce the same log messages.
6. On the production server, hdparm shows the hard disk is twice as fast as on the development server, there is more free memory, and its dual-core Xeon CPU should outperform the VIA CPU in the development server, yet it is slower. The production server's load average is 0.5, which is not high for a dual-CPU machine (with hyperthreading, Linux sees 4 CPUs).
7. The production server's database was then rebuilt (rm -rf followed by slapadd) without a noticeable change in performance.
This performance difference makes the user interface on the production server slower than users will tolerate.
What would you check further in this case? Thanks in advance!
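The timing loop above reports only the total for 100 searches. A per-run breakdown can show whether every search is uniformly slow or a few runs dominate. The following is a minimal sketch, not part of the original post: the function name per_run_ms is hypothetical, and it assumes bash and GNU date (for the %N nanosecond format).

```shell
#!/bin/bash
# Run a command N times, print each run's wall-clock time in milliseconds,
# then print the slowest run. Assumes GNU date, which supports %N.
per_run_ms() {
    local n=$1; shift
    local i start end ms max=0
    for (( i = 1; i <= n; i++ )); do
        start=$(date +%s%N)
        "$@" > /dev/null 2>&1          # discard the command's own output
        end=$(date +%s%N)
        ms=$(( (end - start) / 1000000 ))
        printf 'run %d: %d ms\n' "$i" "$ms"
        if (( ms > max )); then max=$ms; fi
    done
    printf 'max: %d ms\n' "$max"
}

# Example call, using the same search as in the test loop:
#   per_run_ms 100 ldapsearch -xD cn=manager,dc=eoa,dc=cn -w masked \
#       -b ou=contacts,ou=realss.com,dc=eoa,dc=cn '(uidNumber=7)' dn
```

Each call to date forks a process, so the figures include a small fixed overhead; that should not matter when the difference being hunted is hundreds of milliseconds per search.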
Zhang Weiwu wrote:
We have a development server, 'emerson', with roughly one third of the performance of our production server, 'bossdog', yet it runs LDAP queries 10 times faster than the production server. I have checked everything I can think of to find the difference, and the two seem to be configured identically.
<snip>
This performance difference makes the user interface on the production server slower than users will tolerate.
What would you check further in this case? Thanks in advance!
Two ideas:
1) Check the code that sends requests to the prod server. It might receive many more requests than expected.
2) Check the network latency. This is most certainly what kills your performance: firewalls, load balancers, etc. add latency that you don't see on your dev server because you are not using them there.
I would bet on (2). I experienced the very same problem with one of my clients two months ago. An F5 round up was adding 3 ms of latency to each request, enough to slow the whole server down so much that the dev server ran 5 times faster than the production server.
Hi. Thanks for sharing your ideas and experience!
On Mon, 12 Oct 2009, Emmanuel Lecharny wrote:
Two ideas:
- Check the code that sends requests to the prod server. It might receive many more requests than expected
Why? I don't know how to check the code myself; I already asked the web developer to do so, and they found no clue. I think the timings from the ldapsearch command line rule out a problem in the application code. By the way, the same code runs on both the prod and dev servers.
- Check the network latency. This is most certainly what kills your performance: firewalls, load balancers, etc. are adding some latency you don't see on your dev server because you are not using them on dev.
It can't be that. This is a new production server with no firewall or load balancer running. I just double-checked that the iptables kernel module is not loaded.
This information might be helpful: by watching the output manually, I tracked the slowdown to two pauses.
The small pause (around 0.1 second) is marked here:

# extended LDIF
#
# LDAPv3
# base <ou=contacts,ou=realss.com,dc=eoa,dc=cn> with scope subtree
# filter: (uidNumber=7)
# requesting: dn
#
-> 0.1s PAUSE
# zhangweiwu, contacts, realss.com, eoa.cn
dn: uid=zhangweiwu,ou=contacts,ou=realss.com,dc=eoa,dc=cn

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1

The big pause (around 0.3 second) is marked here:

-> 0.3s PAUSE
# extended LDIF
#
# LDAPv3
# base <ou=contacts,ou=realss.com,dc=eoa,dc=cn> with scope subtree
# filter: (uidNumber=7)
# requesting: dn
#

# zhangweiwu, contacts, realss.com, eoa.cn
dn: uid=zhangweiwu,ou=contacts,ou=realss.com,dc=eoa,dc=cn

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1
To find these pauses I had to run ldapsearch(1) many times and try to catch the pauses by eye!
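Instead of watching for pauses by eye, each output line can be stamped with the time elapsed since the command started. This is only a sketch under assumptions: the function name stamp_lines is hypothetical, and it relies on bash and GNU date (%N). One caveat: a program writing to a pipe may buffer its output and emit all lines at once, in which case the stamps cluster at the end; where GNU coreutils provides stdbuf, running the command as `stdbuf -oL command ...` is one possible way around that.

```shell
#!/bin/bash
# Prefix every line read from stdin with the milliseconds elapsed since
# this function started. Assumes GNU date, which supports %N.
stamp_lines() {
    local start now line
    start=$(date +%s%N)
    # The "|| [ -n ... ]" part also handles a final line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        now=$(date +%s%N)
        printf '%6d ms  %s\n' $(( (now - start) / 1000000 )) "$line"
    done
}

# Example call, using the same search as in the test loop:
#   ldapsearch -xD cn=manager,dc=eoa,dc=cn -w masked \
#       -b ou=contacts,ou=realss.com,dc=eoa,dc=cn '(uidNumber=7)' dn | stamp_lines
```

A jump of a few hundred milliseconds between two consecutive stamps would mark where a pause falls; a large first stamp would correspond to a pause before any output appears.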
Since there is a pause before each ldapsearch command produces any output, I first suspected hostname resolution and the time needed to establish a socket. I tested name resolution using the code snippet from http://paulschreiber.com/blog/2005/10/28/simple-gethostbyname-example/ and found that prod is not slow at resolving either its own name (as given by hostname(1)) or 'localhost', which rules out a resolver problem. I do not know how to measure the time needed to establish a TCP/IP socket connection, but a manual 'telnet localhost ldap' does not feel slow.
This is really frustrating...
Zhang Weiwu wrote:
I do not know how to measure the time needed to establish a TCP/IP socket connection, but a manual 'telnet localhost ldap' does not feel slow.
I think I can test it after all, but it does not seem to be the cause of the problem:
$ time for (( i=0; i<100; i++ )); do echo 1 | nc -q 0 localhost ldap; done;

real 0m0.473s
user 0m0.148s
sys 0m0.428s

Less than half a second for 100 connections is fast, and clearly not the reason for the slowness.
I found the problem. Stupid problem, this took us several days!
Since it has been so many days, I quote the full original question for your reference; see the bottom for the last piece of the puzzle.
Zhang Weiwu wrote:
We have a development server, 'emerson', with roughly one third of the performance of our production server, 'bossdog', yet it runs LDAP queries 10 times faster than the production server. I have checked everything I can think of to find the difference, and the two seem to be configured identically.
Test method:
emerson:~# time for (( i=0; i<100; i++)); do ldapsearch -xD cn=manager,dc=eoa,dc=cn -w masked -b ou=contacts,ou=realss.com,dc=eoa,dc=cn '(uidNumber=7)' dn; done;
[snip]
# zhangweiwu, contacts, realss.com, eoa.cn
dn: uid=zhangweiwu,ou=contacts,ou=realss.com,dc=eoa,dc=cn

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1

real 0m1.047s
user 0m0.320s
sys 0m0.352s

bossdog:~# time for (( i=0; i<100; i++)); do ldapsearch -xD cn=manager,dc=eoa,dc=cn -w masked -b ou=contacts,ou=realss.com,dc=eoa,dc=cn '(uidNumber=7)' dn; done;
[snip]

real 0m41.790s
user 0m0.232s
sys 0m0.464s
What I have compared:
- Both servers run Debian Lenny.
- Both servers have the same DB_CONFIG in /var/lib/ldap.
- Both servers have the same ACL settings. In fact, we tried reducing the ACLs on the slow production server, without any improvement.
- Both servers use the bdb backend.
- Comparing syslog output at log level 256, both servers produce the same log messages.
- On the production server, hdparm shows the hard disk is twice as fast as on the development server, there is more free memory, and its dual-core Xeon CPU should outperform the VIA CPU in the development server, yet it is slower. The production server's load average is 0.5, which is not high for a dual-CPU machine (with hyperthreading, Linux sees 4 CPUs).
- The production server's database was then rebuilt (rm -rf followed by slapadd) without a noticeable change in performance.
This performance difference makes the user interface on the production server slower than users will tolerate.
What would you check further in this case? Thanks in advance
The problem is that we looked into OpenLDAP and the system resources/kernel so closely that we forgot about other software that directly affects performance. Something was wrong with syslog that made it perform poorly; changing loglevel to none instantly solved the problem.
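Since the culprit turned out to be syslog, one way to compare the two servers directly would be to time syslog writes on each of them. The following is a minimal sketch, not from the thread: it assumes bash, GNU date (%N), and the logger(1) command, and it uses the local4 facility at debug priority because that is where slapd logs by default. The function name syslog_ms and the LOGGER override variable are hypothetical.

```shell
#!/bin/bash
# Send N test messages to syslog and print the total elapsed milliseconds.
# Assumes GNU date (%N). LOGGER may be set to another command for a dry run.
syslog_ms() {
    local n=${1:-100} i start end
    local logger_cmd=${LOGGER:-logger}
    start=$(date +%s%N)
    for (( i = 0; i < n; i++ )); do
        # local4.debug matches slapd's default syslog facility and priority
        "$logger_cmd" -p local4.debug "syslog timing test $i"
    done
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Example call, run on each server for comparison:
#   syslog_ms 100
```

A large gap between the two servers in this figure would point at the syslog daemon or its configuration rather than at slapd itself. The test messages do end up in whatever log local4.debug is routed to.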
:)...
You are not the only one who has faced this kind of problem!
2009/10/15 Zhang Weiwu zhangweiwu@realss.com:
I found the problem. Stupid problem, this took us several days!
The problem is that we looked into OpenLDAP and the system resources/kernel so closely that we forgot about other software that directly affects performance. Something was wrong with syslog that made it perform poorly; changing loglevel to none instantly solved the problem.
openldap-software@openldap.org