I'm still trying to figure out why my servers sometimes get into a state where queries requesting the memberOf attribute take an exceedingly long time to process, for example:
Feb 13 19:46:05 ldap-02 slapd[13564]: conn=297643 fd=104 ACCEPT from PATH=/var/symas/run/ldapi (PATH=/var/symas/run/ldapi) Feb 13 19:46:05 ldap-02 slapd[13564]: conn=297643 op=0 BIND dn="cn=ldaproot,dc=cpp,dc=edu" method=128 Feb 13 19:46:05 ldap-02 slapd[13564]: conn=297643 op=0 BIND dn="cn=ldaproot,dc=cpp,dc=edu" mech=SIMPLE bind_ssf=0 ssf=71 Feb 13 19:46:05 ldap-02 slapd[13564]: conn=297643 op=0 RESULT tag=97 err=0 qtime=0.000006 etime=0.000050 text= Feb 13 19:46:05 ldap-02 slapd[13564]: conn=297643 op=1 SRCH base="dc=cpp,dc=edu" scope=2 deref=0 filter="(uid=henson)" Feb 13 19:46:05 ldap-02 slapd[13564]: conn=297643 op=1 SRCH attr=memberOf Feb 13 19:46:42 ldap-02 slapd[13564]: conn=297643 op=1 SEARCH RESULT tag=101 err=0 qtime=0.000012 etime=36.955710 nentries=1 text=
How is the qtime calculated? It is nice and short, despite the wall clock reading over 30 seconds :(.
Usually I have to reboot the server completely to clear this up, but this time I just had to restart it, and then the queries were lickity split again:
Feb 13 19:55:01 ldap-02 slapd[89655]: conn=1556 fd=40 ACCEPT from PATH=/var/symas/run/ldapi (PATH=/var/symas/run/ldapi) Feb 13 19:55:01 ldap-02 slapd[89655]: conn=1556 op=0 BIND dn="cn=ldaproot,dc=cpp,dc=edu" method=128 Feb 13 19:55:01 ldap-02 slapd[89655]: conn=1556 op=0 BIND dn="cn=ldaproot,dc=cpp,dc=edu" mech=SIMPLE bind_ssf=0 ssf=71 Feb 13 19:55:01 ldap-02 slapd[89655]: conn=1556 op=0 RESULT tag=97 err=0 qtime=0.000009 etime=0.000088 text= Feb 13 19:55:01 ldap-02 slapd[89655]: conn=1556 op=1 SRCH base="dc=cpp,dc=edu" scope=2 deref=0 filter="(uid=henson)" Feb 13 19:55:01 ldap-02 slapd[89655]: conn=1556 op=1 SRCH attr=memberOf Feb 13 19:55:01 ldap-02 slapd[89655]: conn=1556 op=1 SEARCH RESULT tag=101 err=0 qtime=0.000013 etime=0.213149 nentries=1 text=
Over 30 seconds elapsed time to .2 seconds? I'd like to see the .2 all the time :).
When the server gets like this, there's a very high read IO load, 200-300Mb/s, compared to generally less than 1Mb/s when things are working right.
It almost seems like it's doing a full disk scan searching for members every time.
Any suggestions on what to dig into?
Thanks...