Hello all,
I'm running some load tests on an LDAP server using JMeter and as part of the tests I'm running a continuous loop of users trying to change their passwords. The problem is, there is a point where OpenLDAP stops responding requests and is unable to complete its current operations. This is what each "user" do:
Bind -> Modify userPassword -> Unbind
When I spawn a few users on a short time and try to do the modify operation the server stops responding. I was able to do that by spawning 5 threads simultaneously and trying to have them change the user password at once. It seems that something is not handling the write operations well and is hanging. Slapd doesnt stop responding completely, as I'm able to send a kill signal to it and it'll tell that it is waiting for some tasks to end. The complete log follows:
Feb 4 13:47:44 Server01 slapd[8772]: @(#) $OpenLDAP: slapd 2.4.23 (Jan 28 2011 17:07:07) $#012#011root@Server01:/home/suporte/openldap-2.4.23/servers/slapd Feb 4 13:47:44 Server01 slapd[8772]: /ldap/etc/slapd.conf: line 108: rootdn is always granted unlimited privileges. Feb 4 13:47:44 Server01 slapd[8773]: bdb_db_open: database "dc=diretorio,dc=mycompany": unclean shutdown detected; attempting recovery. Feb 4 13:47:45 Server01 slapd[8773]: slapd starting Feb 4 13:47:51 Server01 slapd[8773]: conn=1000 fd=12 ACCEPT from IP=192.168.1.2:6098 (IP=0.0.0.0:389) Feb 4 13:47:51 Server01 slapd[8773]: conn=1002 fd=15 ACCEPT from IP=192.168.1.2:6101 (IP=0.0.0.0:389) Feb 4 13:47:51 Server01 slapd[8773]: conn=1003 fd=13 ACCEPT from IP=192.168.1.2:6102 (IP=0.0.0.0:389) Feb 4 13:47:51 Server01 slapd[8773]: conn=1001 fd=14 ACCEPT from IP=192.168.1.2:6099 (IP=0.0.0.0:389) Feb 4 13:47:51 Server01 slapd[8773]: conn=1004 fd=16 ACCEPT from IP=192.168.1.2:6103 (IP=0.0.0.0:389) Feb 4 13:47:51 Server01 slapd[8773]: conn=1001 op=0 BIND dn="cn=tcarga_usuario4,ou=usuarios,dc=diretorio,dc=mycompany" method=128 Feb 4 13:47:51 Server01 slapd[8773]: conn=1002 op=0 BIND dn="cn=tcarga_usuario1,ou=usuarios,dc=diretorio,dc=mycompany" method=128 Feb 4 13:47:51 Server01 slapd[8773]: conn=1003 op=0 BIND dn="cn=tcarga_usuario3,ou=usuarios,dc=diretorio,dc=mycompany" method=128 Feb 4 13:47:51 Server01 slapd[8773]: conn=1004 op=0 BIND dn="cn=tcarga_usuario5,ou=usuarios,dc=diretorio,dc=mycompany" method=128 Feb 4 13:47:51 Server01 slapd[8773]: conn=1002 op=0 BIND dn="cn=tcarga_usuario1,ou=usuarios,dc=diretorio,dc=mycompany" mech=SIMPLE ssf=0 Feb 4 13:47:51 Server01 slapd[8773]: conn=1000 op=0 BIND dn="cn=tcarga_usuario2,ou=usuarios,dc=diretorio,dc=mycompany" method=128 Feb 4 13:47:51 Server01 slapd[8773]: conn=1002 op=0 RESULT tag=97 err=0 text= Feb 4 13:47:51 Server01 slapd[8773]: conn=1000 op=0 BIND dn="cn=tcarga_usuario2,ou=usuarios,dc=diretorio,dc=mycompany" mech=SIMPLE ssf=0 Feb 4 13:47:51 Server01 slapd[8773]: conn=1003 op=0 BIND dn="cn=tcarga_usuario3,ou=usuarios,dc=diretorio,dc=mycompany" mech=SIMPLE ssf=0 Feb 4 13:47:51 Server01 slapd[8773]: conn=1001 op=0 BIND dn="cn=tcarga_usuario4,ou=usuarios,dc=diretorio,dc=mycompany" mech=SIMPLE ssf=0 Feb 4 13:47:51 Server01 slapd[8773]: conn=1000 op=0 RESULT tag=97 err=0 text= Feb 4 13:47:51 Server01 slapd[8773]: conn=1003 op=0 RESULT tag=97 err=0 text= Feb 4 13:47:51 Server01 slapd[8773]: conn=1004 op=0 BIND dn="cn=tcarga_usuario5,ou=usuarios,dc=diretorio,dc=mycompany" mech=SIMPLE ssf=0 Feb 4 13:47:51 Server01 slapd[8773]: conn=1001 op=0 RESULT tag=97 err=0 text= Feb 4 13:47:51 Server01 slapd[8773]: conn=1004 op=0 RESULT tag=97 err=0 text= Feb 4 13:47:51 Server01 slapd[8773]: conn=1004 op=1 MOD dn="cn=tcarga_usuario5,ou=usuarios,dc=diretorio,dc=mycompany" Feb 4 13:47:51 Server01 slapd[8773]: conn=1000 op=1 MOD dn="cn=tcarga_usuario2,ou=usuarios,dc=diretorio,dc=mycompany" Feb 4 13:47:51 Server01 slapd[8773]: conn=1001 op=1 MOD dn="cn=tcarga_usuario4,ou=usuarios,dc=diretorio,dc=mycompany" Feb 4 13:47:51 Server01 slapd[8773]: conn=1004 op=1 MOD attr=userPassword Feb 4 13:47:51 Server01 slapd[8773]: conn=1003 op=1 MOD dn="cn=tcarga_usuario3,ou=usuarios,dc=diretorio,dc=mycompany" Feb 4 13:47:51 Server01 slapd[8773]: conn=1003 op=1 MOD attr=userPassword Feb 4 13:47:51 Server01 slapd[8773]: conn=1000 op=1 MOD attr=userPassword Feb 4 13:47:51 Server01 slapd[8773]: slap_queue_csn: queing 0x40d36450 20110204154751.766634Z#000001#000#000000 Feb 4 13:47:51 Server01 slapd[8773]: slap_queue_csn: queing 0x426a5450 20110204154751.766634Z#000000#000#000000 Feb 4 13:47:51 Server01 slapd[8773]: slap_queue_csn: queing 0x436a7450 20110204154751.766652Z#000000#000#000000 Feb 4 13:47:51 Server01 slapd[8773]: conn=1002 op=1 MOD dn="cn=tcarga_usuario1,ou=usuarios,dc=diretorio,dc=mycompany" Feb 4 13:47:51 Server01 slapd[8773]: conn=1001 op=1 MOD attr=userPassword Feb 4 13:47:51 Server01 slapd[8773]: conn=1002 op=1 MOD attr=userPassword Feb 4 13:47:51 Server01 slapd[8773]: slap_queue_csn: queing 0x42ea6450 20110204154751.766981Z#000000#000#000000 Feb 4 13:47:51 Server01 slapd[8773]: slap_queue_csn: queing 0x41ea4450 20110204154751.767126Z#000000#000#000000 Feb 4 13:47:51 Server01 slapd[8773]: conn=1004 op=1 RESULT tag=103 err=0 text= Feb 4 13:47:51 Server01 slapd[8773]: slap_graduate_commit_csn: removing 0x7f9bc4203f10 20110204154751.766634Z#000000#000#000000 Feb 4 13:47:51 Server01 slapd[8773]: conn=1004 op=2 UNBIND Feb 4 13:47:51 Server01 slapd[8773]: conn=1004 fd=16 closed
[ Here I sent sigterm to slapd ]
Feb 4 13:52:14 Server01 slapd[8773]: daemon: shutdown requested and initiated. Feb 4 13:52:14 Server01 slapd[8773]: slapd shutdown: waiting for 4 operations/tasks to finish
The server does not respond anymore and I have to kill the process by using kill -9. Anyone ever experienced the same or can provide hints as to what is happening?
I'm using slapd 2.4.23 and berkeley db 4.6 on a Debian Lenny system.