I've been testing a 4-way multi-master setup using OpenLDAP 2.4.25, and I'm seeing sporadic problems that I'm having difficulty diagnosing.
I have four identical RHEL 4.9 machines on the same switch (NTP-synchronized to the same stratum 2 servers):
  dual-core Xeon 5110 1.60GHz
  8GB RAM
  100Mb full-duplex NIC
  OpenLDAP 2.4.25, BDB 4.8.30, OpenSSL 1.0.0d, Cyrus SASL 2.1.23 (using no TLS/SSL at this time)
I start the slapds with '-d conns,sync' and then run the test: I ldapadd 1000 DNs to one of the servers. After all the syncing has stopped, I compare the contents of the slapds against each other, looking for differences. Occasionally as many as a couple hundred DNs are missing from one or more of the instances. When that happens, I've noticed that the mmaster with fewer DNs has lost its consumer connection to a mmaster provider (confirmed using lsof and netstat) and never attempts to reconnect, while the provider still shows the connection (again per lsof and netstat). When the consumer gets into this state, I can connect to its cn=config and cn=monitor backends (and browse them), but when I try to connect to its multi-master'd backend the connection attempt just hangs. It's almost as if the connect succeeds but the client is waiting for a response from the server that never arrives. Also, the consumer slapd will not respond to a 'kill -TERM' at this point and must be 'kill -KILL'd. The same thing sometimes occurs when I delete the entire tree.
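For anyone wanting to reproduce the comparison step, here is a minimal sketch of how the per-server DN lists can be diffed. It assumes each server's contents have first been dumped to LDIF text (for example with ldapsearch -LLL). The names parse_dns and missing_per_host are made up for this illustration, and lowercasing the DNs is a simplification, not proper DN normalization.

```python
import base64


def parse_dns(ldif_text):
    """Return the set of DNs found in LDIF text (hypothetical helper)."""
    # LDIF folds long lines: a line starting with a single space
    # continues the previous line, so unfold before looking for DNs.
    unfolded = []
    for line in ldif_text.splitlines():
        if line.startswith(" ") and unfolded:
            unfolded[-1] += line[1:]
        else:
            unfolded.append(line)

    dns = set()
    for line in unfolded:
        if line.startswith("dn:: "):
            # "dn::" marks a base64-encoded DN.
            dns.add(base64.b64decode(line[5:]).decode("utf-8").lower())
        elif line.startswith("dn: "):
            dns.add(line[4:].strip().lower())
    return dns


def missing_per_host(dumps):
    """Map each host to the sorted DNs seen on some other host but not locally.

    `dumps` maps a host name to the set of DNs dumped from that host.
    """
    everything = set().union(*dumps.values())
    return {host: sorted(everything - dns) for host, dns in dumps.items()}
```

Feeding it one DN set per host (host1 through host4) gives, for each instance, the list of entries that never arrived there, which makes it easy to see which consumer stopped replicating.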
I've been trying to catch logging information that might help but so far nothing's jumping out at me. While I continue to try to reproduce and parse through logfiles maybe someone can look at my slapd.confs below and see if I might have configured something wrong (I'm listing the original slapd.conf files below, but I've used slaptest to convert them to slapd.d/cn=config.ldif format):
HOST1 slapd.conf:
include /tmp/openldap/multi-master/etc/schema/core.schema
include /tmp/openldap/multi-master/etc/schema/cosine.schema
include /tmp/openldap/multi-master/etc/schema/nis.schema
argsfile /tmp/openldap/multi-master/var/run/slapd.args
pidfile /tmp/openldap/multi-master/var/run/slapd.pid
threads 16
idletimeout 0
writetimeout 5
reverse-lookup off
timelimit time.soft=30 time.hard=300
sizelimit size.soft=500 size.hard=1000
password-hash {SSHA}
loglevel stats sync
serverid 001
modulepath /tmp/openldap/multi-master/libexec
moduleload back_monitor.la
moduleload back_hdb.la
moduleload syncprov.la

database config
rootdn cn=manager,cn=config
rootpw {SSHA}yMFj3Y7KPh223NkkKLQsFeLUVm08Ckpm

database monitor
rootdn cn=manager,cn=monitor
rootpw {SSHA}vPVSN8o8eRnLdC/bGS7yDwQGeH4BHc0R

database hdb
suffix dc=example,dc=com
rootdn cn=manager,dc=example,dc=com
rootpw {SSHA}0obbsJw5Yq2XAkdd/kS7vokaB9rrSOtI
directory /tmp/openldap/multi-master/var/data/dc=example,dc=com
cachesize 30000
cachefree 5
checkpoint 128 15
dncachesize 25000
idlcachesize 100000
index objectClass eq
index entryCSN eq
index entryUUID eq
syncrepl rid=001 provider=ldap://host2:1389 type=refreshAndPersist interval=00:00:05:00 retry="15 +" searchbase="dc=example,dc=com" binddn="cn=manager,dc=example,dc=com" credentials="example_pass" starttls=no schemachecking=off
syncrepl rid=002 provider=ldap://host3:1389 type=refreshAndPersist interval=00:00:05:00 retry="15 +" searchbase="dc=example,dc=com" binddn="cn=manager,dc=example,dc=com" credentials="example_pass" starttls=no schemachecking=off
syncrepl rid=003 provider=ldap://host4:1389 type=refreshAndPersist interval=00:00:05:00 retry="15 +" searchbase="dc=example,dc=com" binddn="cn=manager,dc=example,dc=com" credentials="example_pass" starttls=no schemachecking=off
HOST2 slapd.conf:
include /tmp/openldap/multi-master/etc/schema/core.schema
include /tmp/openldap/multi-master/etc/schema/cosine.schema
include /tmp/openldap/multi-master/etc/schema/nis.schema
argsfile /tmp/openldap/multi-master/var/run/slapd.args
pidfile /tmp/openldap/multi-master/var/run/slapd.pid
threads 16
idletimeout 0
writetimeout 5
reverse-lookup off
timelimit time.soft=30 time.hard=300
sizelimit size.soft=500 size.hard=1000
password-hash {SSHA}
loglevel stats sync
serverid 002
modulepath /tmp/openldap/multi-master/libexec
moduleload back_monitor.la
moduleload back_hdb.la
moduleload syncprov.la

database config
rootdn cn=manager,cn=config
rootpw {SSHA}yMFj3Y7KPh223NkkKLQsFeLUVm08Ckpm

database monitor
rootdn cn=manager,cn=monitor
rootpw {SSHA}vPVSN8o8eRnLdC/bGS7yDwQGeH4BHc0R

database hdb
suffix dc=example,dc=com
rootdn cn=manager,dc=example,dc=com
rootpw {SSHA}0obbsJw5Yq2XAkdd/kS7vokaB9rrSOtI
directory /tmp/openldap/multi-master/var/data/dc=example,dc=com
cachesize 30000
cachefree 5
checkpoint 128 15
dncachesize 25000
idlcachesize 100000
index objectClass eq
index entryCSN eq
index entryUUID eq
syncrepl rid=001 provider=ldap://host1:1389 type=refreshAndPersist interval=00:00:05:00 retry="15 +" searchbase="dc=example,dc=com" binddn="cn=manager,dc=example,dc=com" credentials="example_pass" starttls=no schemachecking=off
syncrepl rid=002 provider=ldap://host3:1389 type=refreshAndPersist interval=00:00:05:00 retry="15 +" searchbase="dc=example,dc=com" binddn="cn=manager,dc=example,dc=com" credentials="example_pass" starttls=no schemachecking=off
syncrepl rid=003 provider=ldap://host4:1389 type=refreshAndPersist interval=00:00:05:00 retry="15 +" searchbase="dc=example,dc=com" binddn="cn=manager,dc=example,dc=com" credentials="example_pass" starttls=no schemachecking=off
HOST3 slapd.conf:
include /tmp/openldap/multi-master/etc/schema/core.schema
include /tmp/openldap/multi-master/etc/schema/cosine.schema
include /tmp/openldap/multi-master/etc/schema/nis.schema
argsfile /tmp/openldap/multi-master/var/run/slapd.args
pidfile /tmp/openldap/multi-master/var/run/slapd.pid
threads 16
idletimeout 0
writetimeout 5
reverse-lookup off
timelimit time.soft=30 time.hard=300
sizelimit size.soft=500 size.hard=1000
password-hash {SSHA}
loglevel stats sync
serverid 003
modulepath /tmp/openldap/multi-master/libexec
moduleload back_monitor.la
moduleload back_hdb.la
moduleload syncprov.la

database config
rootdn cn=manager,cn=config
rootpw {SSHA}yMFj3Y7KPh223NkkKLQsFeLUVm08Ckpm

database monitor
rootdn cn=manager,cn=monitor
rootpw {SSHA}vPVSN8o8eRnLdC/bGS7yDwQGeH4BHc0R

database hdb
suffix dc=example,dc=com
rootdn cn=manager,dc=example,dc=com
rootpw {SSHA}0obbsJw5Yq2XAkdd/kS7vokaB9rrSOtI
directory /tmp/openldap/multi-master/var/data/dc=example,dc=com
cachesize 30000
cachefree 5
checkpoint 128 15
dncachesize 25000
idlcachesize 100000
index objectClass eq
index entryCSN eq
index entryUUID eq
syncrepl rid=001 provider=ldap://host1:1389 type=refreshAndPersist interval=00:00:05:00 retry="15 +" searchbase="dc=example,dc=com" binddn="cn=manager,dc=example,dc=com" credentials="example_pass" starttls=no schemachecking=off
syncrepl rid=002 provider=ldap://host2:1389 type=refreshAndPersist interval=00:00:05:00 retry="15 +" searchbase="dc=example,dc=com" binddn="cn=manager,dc=example,dc=com" credentials="example_pass" starttls=no schemachecking=off
syncrepl rid=003 provider=ldap://host4:1389 type=refreshAndPersist interval=00:00:05:00 retry="15 +" searchbase="dc=example,dc=com" binddn="cn=manager,dc=example,dc=com" credentials="example_pass" starttls=no schemachecking=off
HOST4 slapd.conf:
include /tmp/openldap/multi-master/etc/schema/core.schema
include /tmp/openldap/multi-master/etc/schema/cosine.schema
include /tmp/openldap/multi-master/etc/schema/nis.schema
argsfile /tmp/openldap/multi-master/var/run/slapd.args
pidfile /tmp/openldap/multi-master/var/run/slapd.pid
threads 16
idletimeout 0
writetimeout 5
reverse-lookup off
timelimit time.soft=30 time.hard=300
sizelimit size.soft=500 size.hard=1000
password-hash {SSHA}
loglevel stats sync
serverid 004
modulepath /tmp/openldap/multi-master/libexec
moduleload back_monitor.la
moduleload back_hdb.la
moduleload syncprov.la

database config
rootdn cn=manager,cn=config
rootpw {SSHA}yMFj3Y7KPh223NkkKLQsFeLUVm08Ckpm

database monitor
rootdn cn=manager,cn=monitor
rootpw {SSHA}vPVSN8o8eRnLdC/bGS7yDwQGeH4BHc0R

database hdb
suffix dc=example,dc=com
rootdn cn=manager,dc=example,dc=com
rootpw {SSHA}0obbsJw5Yq2XAkdd/kS7vokaB9rrSOtI
directory /tmp/openldap/multi-master/var/data/dc=example,dc=com
cachesize 30000
cachefree 5
checkpoint 128 15
dncachesize 25000
idlcachesize 100000
index objectClass eq
index entryCSN eq
index entryUUID eq
syncrepl rid=001 provider=ldap://host1:1389 type=refreshAndPersist interval=00:00:05:00 retry="15 +" searchbase="dc=example,dc=com" binddn="cn=manager,dc=example,dc=com" credentials="example_pass" starttls=no schemachecking=off
syncrepl rid=002 provider=ldap://host2:1389 type=refreshAndPersist interval=00:00:05:00 retry="15 +" searchbase="dc=example,dc=com" binddn="cn=manager,dc=example,dc=com" credentials="example_pass" starttls=no schemachecking=off
syncrepl rid=003 provider=ldap://host3:1389 type=refreshAndPersist interval=00:00:05:00 retry="15 +" searchbase="dc=example,dc=com" binddn="cn=manager,dc=example,dc=com" credentials="example_pass" starttls=no schemachecking=off
Thank you.
Ok, that's embarrassing. I forgot the last couple lines of each of the slapd.confs. Just pretend each of the four ends with the following lines after all the syncrepl rids have been configured:
mirrormode TRUE
overlay syncprov
syncprov-checkpoint 50 10
syncprov-sessionlog 100
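Since the four files differ only in their serverid and in which peers they list as providers, the host-specific lines can be generated instead of hand-edited. This is only a sketch under the assumptions visible in the configs above (hosts named host1 to host4, port 1389, rids numbered from 001 in host order). The names HOSTS, SYNCREPL and host_lines are invented for this illustration.

```python
# Hypothetical host list and port, taken from the configs in this thread.
HOSTS = ["host1", "host2", "host3", "host4"]
PORT = 1389

# One syncrepl directive, with the same options used in the posted files.
SYNCREPL = (
    'syncrepl rid={rid:03d} provider=ldap://{peer}:{port} '
    'type=refreshAndPersist interval=00:00:05:00 retry="15 +" '
    'searchbase="dc=example,dc=com" '
    'binddn="cn=manager,dc=example,dc=com" '
    'credentials="example_pass" starttls=no schemachecking=off'
)


def host_lines(local, hosts=HOSTS, port=PORT):
    """Return (serverid line, list of syncrepl lines) for one host."""
    # serverid is the 1-based position of the host in the list.
    serverid = "serverid {:03d}".format(hosts.index(local) + 1)
    # Every other host is a provider; rids count up from 1.
    peers = [h for h in hosts if h != local]
    syncrepl = [
        SYNCREPL.format(rid=rid, peer=peer, port=port)
        for rid, peer in enumerate(peers, start=1)
    ]
    return serverid, syncrepl
```

Calling host_lines("host3") yields "serverid 003" plus three syncrepl lines pointing at host1, host2 and host4, which matches the HOST3 file above.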
On Thu, Mar 31, 2011 at 9:06 PM, Mark mah042@gmail.com wrote:
I've been testing a 4-way multi-master setup using OpenLDAP 2.4.25 and I'm having some sporadic problems with it that I'm having difficulty diagnosing..
--On Thursday, March 31, 2011 9:06 PM -0500 Mark mah042@gmail.com wrote:
I've been testing a 4-way multi-master setup using OpenLDAP 2.4.25 and I'm having some sporadic problems with it that I'm having difficulty diagnosing..
Have you tried applying the patches in ITS#6872?
--Quanah
--
Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration
No, I hadn't, because the usage and symptoms didn't seem to fit. But it's worth a shot.
--- Mark
On Mar 31, 2011, at 9:27 PM, Quanah Gibson-Mount quanah@zimbra.com wrote:
Have you tried applying the patches in ITS#6872?
The patch in ITS#6872 didn't fix the issue.
My first thought was to enable all the logging, but there's *so much data* and I don't know what's normal and what isn't. I captured the (netstat) connection information on all four hosts. Several of the connections are stuck in FIN_WAIT1, which is normally a quick, transitional state:
host1$ netstat -an | fgrep :1389
Proto Recv-Q Send-Q Local Address       Foreign Address State
tcp        0      0 10.1.1.1:1389       0.0.0.0:*       LISTEN
tcp    65115      0 10.1.1.1:19284 --> 10.1.1.4:1389   ESTABLISHED
tcp        0      0 10.1.1.1:1389  <-- 10.1.1.4:36991  ESTABLISHED
tcp    73458      0 10.1.1.1:19286 --> 10.1.1.3:1389   ESTABLISHED
tcp        0      0 10.1.1.1:1389  <-- 10.1.1.3:38085  ESTABLISHED
tcp    73112      0 10.1.1.1:19263 --> 10.1.1.2:1389   ESTABLISHED
tcp        0      0 10.1.1.1:1389  <-- 10.1.1.2:41374  ESTABLISHED

host2$ netstat -an | fgrep :1389
Proto Recv-Q Send-Q Local Address       Foreign Address State
tcp        0      0 10.1.1.2:1389       0.0.0.0:*       LISTEN
tcp        0  11537 10.1.1.2:1389  <-- 10.1.1.1:19263  FIN_WAIT1
tcp        0      0 10.1.1.2:1389  <-- 10.1.1.4:36992  ESTABLISHED
tcp        0      0 10.1.1.2:1389  <-- 10.1.1.3:38086  ESTABLISHED
tcp        0      0 10.1.1.2:41373 --> 10.1.1.3:1389   ESTABLISHED
tcp        0      0 10.1.1.2:41375 --> 10.1.1.4:1389   ESTABLISHED
tcp        0      0 10.1.1.2:41374 --> 10.1.1.1:1389   ESTABLISHED

host3$ netstat -an | fgrep :1389
Proto Recv-Q Send-Q Local Address       Foreign Address State
tcp        0      0 10.1.1.3:1389       0.0.0.0:*       LISTEN
tcp        0  11521 10.1.1.3:1389  <-- 10.1.1.1:19286  FIN_WAIT1
tcp        0      0 10.1.1.3:38087 --> 10.1.1.4:1389   ESTABLISHED
tcp        0      0 10.1.1.3:38085 --> 10.1.1.1:1389   ESTABLISHED
tcp        0  11505 10.1.1.3:1389  <-- 10.1.1.4:37000  FIN_WAIT1
tcp        0      0 10.1.1.3:38086 --> 10.1.1.2:1389   ESTABLISHED
tcp        0      0 10.1.1.3:1389  <-- 10.1.1.2:41373  ESTABLISHED

host4$ netstat -an | fgrep :1389
Proto Recv-Q Send-Q Local Address       Foreign Address State
tcp        0      0 10.1.1.4:1389       0.0.0.0:*       LISTEN
tcp        0  14281 10.1.1.4:1389  <-- 10.1.1.1:19284  FIN_WAIT1
tcp    73567      0 10.1.1.4:37000 --> 10.1.1.3:1389   ESTABLISHED
tcp        0      0 10.1.1.4:1389  <-- 10.1.1.3:38087  ESTABLISHED
tcp    17534      0 10.1.1.4:36991 --> 10.1.1.1:1389   ESTABLISHED
tcp        0      0 10.1.1.4:1389  <-- 10.1.1.2:41375  ESTABLISHED
tcp        0      0 10.1.1.4:36992 --> 10.1.1.2:1389   ESTABLISHED
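To pick the suspicious rows out of listings like these, the filtering can be sketched as below. This is a rough illustration only: stalled_sockets is a made-up name, the "-->" and "<--" markers are the direction arrows I added by hand (stock netstat does not print them), and "suspicious" here just means a non-zero Recv-Q or Send-Q or any state other than ESTABLISHED or LISTEN.

```python
def stalled_sockets(netstat_text):
    """Return (local, foreign, state, recv_q, send_q) for rows worth a look."""
    flagged = []
    for line in netstat_text.splitlines():
        # Drop the hand-added direction arrows so each row has 6 fields.
        fields = [f for f in line.split() if f not in ("-->", "<--")]
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue  # header line or something other than a TCP row
        _proto, recv_q, send_q, local, foreign, state = fields
        if not (recv_q.isdigit() and send_q.isdigit()):
            continue
        recv_q, send_q = int(recv_q), int(send_q)
        if state == "LISTEN":
            continue
        # Queued bytes or a lingering transitional state both hint at a
        # peer that has stopped reading or stopped acknowledging.
        if recv_q or send_q or state != "ESTABLISHED":
            flagged.append((local, foreign, state, recv_q, send_q))
    return flagged
```

Run against the four captures, this flags the FIN_WAIT1 sockets with unsent data in Send-Q on the provider side, and the matching ESTABLISHED sockets with unread data in Recv-Q on the consumer side.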
I also captured a *pstack* trace on each of the four slapds. But again, I'm not sure what's *normal*:
host1$
Thread 17 (Thread 1082132832 (LWP 25922)):
#0 0x0000003a340ca15c in epoll_wait () from /lib64/tls/libc.so.6
#1 0x0000000000437a52 in slapd_daemon_destroy ()
#2 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0
#3 0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6
Thread 16 (Thread 1090525536 (LWP 25923)):
#0 0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6
Thread 15 (Thread 1098918240 (LWP 25924)):
#0 0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6
Thread 14 (Thread 1107310944 (LWP 25925)):
#0 0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6
Thread 13 (Thread 1115703648 (LWP 25926)):
#0 0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6
Thread 12 (Thread 1124096352 (LWP 26071)):
#0 0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6
Thread 11 (Thread 1132489056 (LWP 26072)):
#0 0x0000003a340b0719 in sched_yield () from /lib64/tls/libc.so.6
#1 0x00000000004e90d0 in ldap_pvt_thread_yield ()
#2 0x0000002a9630d11e in syncprov_op_search ()
#3 0x00000000004c3843 in overlay_op_walk ()
#4 0x00000000004c3a9f in overlay_op_walk ()
#5 0x00000000004c3b7a in overlay_op_walk ()
#6 0x000000000043ef62 in fe_op_search ()
#7 0x000000000043e8c2 in do_search ()
#8 0x000000000043b6d5 in connection_done ()
#9 0x000000000043bc89 in connection_client_stop ()
#10 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#11 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0
#12 0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6
Thread 10 (Thread 1140881760 (LWP 26073)):
#0 0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0000002a956a644b in __db_pthread_mutex_lock ()
#2 0x0000002a956a5b11 in __db_tas_mutex_lock_int ()
#3 0x0000002a956a5887 in __db_tas_mutex_lock ()
#4 0x0000002a95776ed2 in __lock_get_internal ()
#5 0x0000002a9577525d in __lock_get ()
#6 0x0000002a957bd9cf in __db_lget ()
#7 0x0000002a956b67bc in __bamc_writelock ()
#8 0x0000002a957a5b71 in __dbc_idel ()
#9 0x0000002a957a5ace in __dbc_del ()
#10 0x0000002a957b8039 in __dbc_del_pp ()
#11 0x0000002a961dc91e in hdb_idl_delete_key ()
#12 0x0000002a961d1d4b in hdb_key_change ()
#13 0x0000002a961d0d1b in indexer ()
#14 0x0000002a961d1159 in index_at_values ()
#15 0x0000002a961d12d2 in hdb_index_values ()
#16 0x0000002a961d173a in hdb_index_entry ()
#17 0x0000002a961c5888 in hdb_delete ()
#18 0x00000000004c38d7 in overlay_op_walk ()
#19 0x00000000004c3a9f in overlay_op_walk ()
#20 0x00000000004c3c2e in overlay_op_walk ()
#21 0x00000000004b5b1a in cancel_extop ()
#22 0x00000000004af92a in cancel_extop ()
host2$
Thread 9 (Thread 1082132832 (LWP 12700)):
#0 0x0000003530bca15c in epoll_wait () from /lib64/tls/libc.so.6
#1 0x0000000000437a52 in slapd_daemon_destroy ()
#2 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#3 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 8 (Thread 1090525536 (LWP 12701)):
#0 0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 7 (Thread 1098918240 (LWP 12702)):
#0 0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 6 (Thread 1107310944 (LWP 12703)):
#0 0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 5 (Thread 1115703648 (LWP 12704)):
#0 0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 4 (Thread 1124096352 (LWP 13049)):
#0 0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 3 (Thread 1132489056 (LWP 13050)):
#0 0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 2 (Thread 1140881760 (LWP 13059)):
#0 0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 1 (Thread 182903646528 (LWP 12693)):
#0 0x000000353120732b in pthread_join () from /lib64/tls/libpthread.so.0
#1 0x00000000004e90a8 in ldap_pvt_thread_join ()
#2 0x0000000000438bd8 in slapd_daemon ()
#3 0x000000000041932a in main ()
host3$
Thread 9 (Thread 1082132832 (LWP 20629)):
#0 0x00000035c64ca15c in epoll_wait () from /lib64/tls/libc.so.6
#1 0x0000000000437a52 in slapd_daemon_destroy ()
#2 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#3 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 8 (Thread 1090525536 (LWP 20630)):
#0 0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 7 (Thread 1098918240 (LWP 20631)):
#0 0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 6 (Thread 1107310944 (LWP 20632)):
#0 0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 5 (Thread 1115703648 (LWP 20633)):
#0 0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 4 (Thread 1124096352 (LWP 20983)):
#0 0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 3 (Thread 1132489056 (LWP 20984)):
#0 0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 2 (Thread 1140881760 (LWP 21005)):
#0 0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 1 (Thread 182903654720 (LWP 20628)):
#0 0x00000035c6d0732b in pthread_join () from /lib64/tls/libpthread.so.0
#1 0x00000000004e90a8 in ldap_pvt_thread_join ()
#2 0x0000000000438bd8 in slapd_daemon ()
#3 0x000000000041932a in main ()
host4$
Thread 12 (Thread 1082132832 (LWP 26819)):
#0 0x00000030d86ca15c in epoll_wait () from /lib64/tls/libc.so.6
#1 0x0000000000437a52 in slapd_daemon_destroy ()
#2 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#3 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 11 (Thread 1090525536 (LWP 26820)):
#0 0x00000030d86b0719 in sched_yield () from /lib64/tls/libc.so.6
#1 0x00000000004e90d0 in ldap_pvt_thread_yield ()
#2 0x00000000004af4ab in cancel_extop ()
#3 0x00000000004b1457 in cancel_extop ()
#4 0x000000000043bca3 in connection_client_stop ()
#5 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#6 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#7 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 10 (Thread 1098918240 (LWP 26821)):
#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0000002a956a744b in __db_pthread_mutex_lock ()
#2 0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3 0x0000002a956a6887 in __db_tas_mutex_lock ()
#4 0x0000002a95777ed2 in __lock_get_internal ()
#5 0x0000002a957750ae in __lock_vec ()
#6 0x0000002a95774e53 in __lock_vec_api ()
#7 0x0000002a95774da3 in __lock_vec_pp ()
#8 0x0000002a961df6f0 in hdb_cache_entry_db_relock ()
#9 0x0000002a961e169e in hdb_cache_modify ()
#10 0x0000002a961c95ac in hdb_modify ()
#11 0x0000002a9630a959 in syncprov_checkpoint ()
#12 0x0000002a9630c241 in syncprov_op_response ()
#13 0x00000000004500f6 in rs_entry2modifiable ()
#14 0x00000000004502f5 in rs_entry2modifiable ()
#15 0x000000000045112e in slap_send_ldap_result ()
#16 0x0000002a961c7266 in hdb_delete ()
#17 0x00000000004c38d7 in overlay_op_walk ()
#18 0x00000000004c3a9f in overlay_op_walk ()
#19 0x00000000004c3c2e in overlay_op_walk ()
#20 0x000000000045c958 in fe_op_delete ()
#21 0x000000000045c688 in do_delete ()
#22 0x000000000043b6d5 in connection_done ()
#23 0x000000000043bc89 in connection_client_stop ()
#24 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#25 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#26 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 9 (Thread 1107310944 (LWP 26822)):
#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0000002a956a744b in __db_pthread_mutex_lock ()
#2 0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3 0x0000002a956a6887 in __db_tas_mutex_lock ()
#4 0x0000002a95777ed2 in __lock_get_internal ()
#5 0x0000002a9577613d in __lock_get_api ()
#6 0x0000002a95775fd7 in __lock_get_pp ()
#7 0x0000002a961df875 in bdb_cache_entry_db_lock ()
#8 0x0000002a961e0f02 in hdb_cache_find_id ()
#9 0x0000002a961d707f in hdb_dn2entry ()
#10 0x0000002a961cd3a1 in hdb_search ()
#11 0x00000000004c38d7 in overlay_op_walk ()
#12 0x00000000004c3a9f in overlay_op_walk ()
#13 0x00000000004c3b7a in overlay_op_walk ()
#14 0x0000002a96307877 in syncprov_findbase ()
#15 0x0000002a9630df7a in syncprov_op_search ()
#16 0x00000000004c3843 in overlay_op_walk ()
#17 0x00000000004c3a9f in overlay_op_walk ()
#18 0x00000000004c3b7a in overlay_op_walk ()
#19 0x000000000043ef62 in fe_op_search ()
#20 0x000000000043e8c2 in do_search ()
#21 0x000000000043b6d5 in connection_done ()
#22 0x000000000043bc89 in connection_client_stop ()
#23 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#24 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#25 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 8 (Thread 1115703648 (LWP 26823)):
#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0000002a956a744b in __db_pthread_mutex_lock ()
#2 0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3 0x0000002a956a6887 in __db_tas_mutex_lock ()
#4 0x0000002a95777ed2 in __lock_get_internal ()
#5 0x0000002a9577625d in __lock_get ()
#6 0x0000002a957be9cf in __db_lget ()
#7 0x0000002a956d1d66 in __bam_search ()
#8 0x0000002a956b8ca8 in __bamc_search ()
#9 0x0000002a956b6918 in __bamc_put ()
#10 0x0000002a957a98ee in __dbc_iput ()
#11 0x0000002a957a9747 in __dbc_put ()
#12 0x0000002a95795be7 in __db_put ()
#13 0x0000002a957b7d05 in __db_put_pp ()
#14 0x0000002a961d9eff in bdb_id2entry_put ()
#15 0x0000002a961d9f7b in hdb_id2entry_update ()
#16 0x0000002a961c92c4 in hdb_modify ()
#17 0x00000000004c38d7 in overlay_op_walk ()
#18 0x00000000004c3a9f in overlay_op_walk ()
#19 0x00000000004c3bc2 in overlay_op_walk ()
#20 0x00000000004b7811 in syncrepl_add_glue ()
#21 0x00000000004af957 in cancel_extop ()
#22 0x00000000004b1457 in cancel_extop ()
#23 0x000000000043bca3 in connection_client_stop ()
#24 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#25 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#26 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 7 (Thread 1124096352 (LWP 26838)):
#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 6 (Thread 1132489056 (LWP 26839)):
#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0000002a956a744b in __db_pthread_mutex_lock ()
#2 0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3 0x0000002a956a6887 in __db_tas_mutex_lock ()
#4 0x0000002a95777ed2 in __lock_get_internal ()
#5 0x0000002a9577613d in __lock_get_api ()
#6 0x0000002a95775fd7 in __lock_get_pp ()
#7 0x0000002a961df875 in bdb_cache_entry_db_lock ()
#8 0x0000002a961e0f02 in hdb_cache_find_id ()
#9 0x0000002a961d707f in hdb_dn2entry ()
#10 0x0000002a961cd3a1 in hdb_search ()
#11 0x00000000004c38d7 in overlay_op_walk ()
#12 0x00000000004c3a9f in overlay_op_walk ()
#13 0x00000000004c3b7a in overlay_op_walk ()
#14 0x0000002a96307877 in syncprov_findbase ()
#15 0x0000002a9630df7a in syncprov_op_search ()
#16 0x00000000004c3843 in overlay_op_walk ()
#17 0x00000000004c3a9f in overlay_op_walk ()
#18
0x00000000004c3b7a in overlay_op_walk () #19 0x000000000043ef62 in fe_op_search () #20 0x000000000043e8c2 in do_search () #21 0x000000000043b6d5 in connection_done () #22 0x000000000043bc89 in connection_client_stop () #23 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy () #24 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0 #25 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6 Thread 5 (Thread 1140881760 (LWP 26840)): #0 0x00000030d8d0b16b in __lll_mutex_lock_wait () #1 0x00000000440066b0 in ?? () #2 0x0000000000000010 in ?? () #3 0x00000030d8d07f34 in pthread_mutex_lock () from /lib64/tls/libpthread.so.0 #4 0x0000002ab68eb520 in ?? () #5 0x0000000000000028 in ?? () #6 0x00000004d866b20d in ?? () #7 0x0000000000000050 in ?? () #8 0x0000002ab5c00020 in ?? () #9 0x0000000000000029 in ?? () #10 0x00000030d8d06280 in __free_tcb () from /lib64/tls/libpthread.so.0 #11 0x00000000410005e0 in ?? () #12 0x0000002ab5c00020 in ?? () #13 0x000000000000000c in ?? () #14 0x00000030d8d06280 in __free_tcb () from /lib64/tls/libpthread.so.0 #15 0x00000000410005e0 in ?? () #16 0x0000000000000001 in ?? () #17 0x00000000410005e0 in ?? () #18 0x00000030d866bc22 in malloc () from /lib64/tls/libc.so.6 #19 0x0000000000772040 in ?? () #20 0x0000000044006340 in ?? 
() #21 0x00000000004ade33 in cancel_extop () Thread 4 (Thread 1149274464 (LWP 26972)): #0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x0000002a956a744b in __db_pthread_mutex_lock () #2 0x0000002a956a6b11 in __db_tas_mutex_lock_int () #3 0x0000002a956a6887 in __db_tas_mutex_lock () #4 0x0000002a95777ed2 in __lock_get_internal () #5 0x0000002a9577613d in __lock_get_api () #6 0x0000002a95775fd7 in __lock_get_pp () #7 0x0000002a961df875 in bdb_cache_entry_db_lock () #8 0x0000002a961e0f02 in hdb_cache_find_id () #9 0x0000002a961d707f in hdb_dn2entry () #10 0x0000002a961cd3a1 in hdb_search () #11 0x00000000004c38d7 in overlay_op_walk () #12 0x00000000004c3a9f in overlay_op_walk () #13 0x00000000004c3b7a in overlay_op_walk () #14 0x000000000043ef62 in fe_op_search () #15 0x000000000043e8c2 in do_search () #16 0x000000000043b6d5 in connection_done () #17 0x000000000043bc89 in connection_client_stop () #18 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy () #19 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0 #20 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6 Thread 3 (Thread 1157667168 (LWP 26973)): #0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000004e9150 in ldap_pvt_thread_cond_wait () #2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy () #3 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0 #4 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6 Thread 2 (Thread 1166059872 (LWP 26974)): #0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x0000002a956a744b in __db_pthread_mutex_lock () #2 0x0000002a956a6b11 in __db_tas_mutex_lock_int () #3 0x0000002a956a6887 in __db_tas_mutex_lock () #4 0x0000002a95777ed2 in __lock_get_internal () #5 0x0000002a9577613d in __lock_get_api () #6 0x0000002a95775fd7 in __lock_get_pp () #7 0x0000002a961df875 in bdb_cache_entry_db_lock () #8 0x0000002a961e0f02 in hdb_cache_find_id () #9 0x0000002a961d707f in hdb_dn2entry () #10 
0x0000002a961cd3a1 in hdb_search () #11 0x00000000004c38d7 in overlay_op_walk () #12 0x00000000004c3a9f in overlay_op_walk () #13 0x00000000004c3b7a in overlay_op_walk () #14 0x000000000043ef62 in fe_op_search () #15 0x000000000043e8c2 in do_search () #16 0x000000000043b6d5 in connection_done () #17 0x000000000043bc89 in connection_client_stop () #18 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy () #19 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0 #20 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6 Thread 1 (Thread 182903650624 (LWP 26818)): #0 0x00000030d8d0732b in pthread_join () from /lib64/tls/libpthread.so.0 #1 0x00000000004e90a8 in ldap_pvt_thread_join () #2 0x0000000000438bd8 in slapd_daemon () #3 0x000000000041932a in main ()
I guess the next step is to start clean and capture verbose logging and pstack traces while everything is behaving normally, then compare them to what I see when one or more of the servers is hung. Any other suggestions?
Thanks, Mark
On Thu, Mar 31, 2011 at 9:58 PM, GMail mah042@gmail.com wrote:
No I hadn't because the usage and symptoms didn't seem to fit. But it's worth a shot.
Mark
On Mar 31, 2011, at 9:27 PM, Quanah Gibson-Mount quanah@zimbra.com wrote:
--On Thursday, March 31, 2011 9:06 PM -0500 Mark mah042@gmail.com wrote:
I've been testing a 4-way multi-master setup using OpenLDAP 2.4.25 and I'm having some sporadic problems with it that I'm having difficulty diagnosing.
Have you tried applying the patches in ITS#6872?
--Quanah
--
Quanah Gibson-Mount Sr. Member of Technical Staff Zimbra, Inc A Division of VMware, Inc.
Zimbra :: the leader in open source messaging and collaboration
I think I have an idea of what a good pstack output is supposed to look like.
When idle, the main thread should look like this:
Thread 1 (Thread 182903654720 (LWP 22479)):
#0  0x00000035c6d0732b in pthread_join () from /lib64/tls/libpthread.so.0
#1  0x00000000004e90a8 in ldap_pvt_thread_join ()
#2  0x0000000000438bd8 in slapd_daemon ()
#3  0x000000000041932a in main ()
The last thread should look like this:
Thread 6 (Thread 1082132832 (LWP 22480)):
#0  0x00000035c64ca15c in epoll_wait () from /lib64/tls/libc.so.6
#1  0x0000000000437a52 in slapd_daemon_destroy ()
#2  0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#3  0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
and the others should look like this:
Thread *n* (Thread 1090525536 (LWP 22481)):
#0  0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
After duplicating the issue, that's how three of my multi-masters look (host1, host2 & host3). But one of them (host4) is effectively hung (I *can* connect and browse my cn=config and cn=monitor backends on host4, but not my 'main' backend on host4). Host4 thinks it has ESTABLISHED (consumer) connections to each of the other three (each of the three connections on host4 to the other mmasters shows data waiting in the Recv-Q). But the other three show those connections in FIN_WAIT1 state with data in the Send-Q, and they stay that way until I kill -9 slapd on host4 (it won't respond to a kill -TERM). The pstack trace on host4 looks very confused. Several of the threads seem to be stuck in BDB?:
Thread 8 (Thread 1082132832 (LWP 28288)):
#0  0x00000030d86ca15c in epoll_wait () from /lib64/tls/libc.so.6
#1  0x0000000000437a52 in slapd_daemon_destroy ()
#2  0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#3  0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 7 (Thread 1090525536 (LWP 28289)):
#0  0x00000030d86b0719 in sched_yield () from /lib64/tls/libc.so.6
#1  0x00000000004e90d0 in ldap_pvt_thread_yield ()
#2  0x0000002a9630e197 in syncprov_op_search ()
#3  0x00000000004c3843 in overlay_op_walk ()
#4  0x00000000004c3a9f in overlay_op_walk ()
#5  0x00000000004c3b7a in overlay_op_walk ()
#6  0x000000000043ef62 in fe_op_search ()
#7  0x000000000043e8c2 in do_search ()
#8  0x000000000043b6d5 in connection_done ()
#9  0x000000000043bc89 in connection_client_stop ()
#10 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#11 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#12 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 6 (Thread 1098918240 (LWP 28290)):
#0  0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000002a956a744b in __db_pthread_mutex_lock ()
#2  0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3  0x0000002a956a6887 in __db_tas_mutex_lock ()
#4  0x0000002a95777ed2 in __lock_get_internal ()
#5  0x0000002a9577625d in __lock_get ()
#6  0x0000002a957be9cf in __db_lget ()
#7  0x0000002a956d1d66 in __bam_search ()
#8  0x0000002a956b8ca8 in __bamc_search ()
#9  0x0000002a956b6918 in __bamc_put ()
#10 0x0000002a957a98ee in __dbc_iput ()
#11 0x0000002a957a9747 in __dbc_put ()
#12 0x0000002a95795be7 in __db_put ()
#13 0x0000002a957b7d05 in __db_put_pp ()
#14 0x0000002a961d777e in hdb_dn2id_add ()
#15 0x0000002a961c3fb4 in hdb_add ()
#16 0x00000000004c38d7 in overlay_op_walk ()
#17 0x00000000004c3a9f in overlay_op_walk ()
#18 0x00000000004c3c0a in overlay_op_walk ()
#19 0x00000000004b482e in cancel_extop ()
#20 0x00000000004af92a in cancel_extop ()
#21 0x00000000004b1457 in cancel_extop ()
#22 0x000000000043bca3 in connection_client_stop ()
#23 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#24 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#25 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 5 (Thread 1107310944 (LWP 28291)):
#0  0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000002a956a744b in __db_pthread_mutex_lock ()
#2  0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3  0x0000002a956a6887 in __db_tas_mutex_lock ()
#4  0x0000002a95777ed2 in __lock_get_internal ()
#5  0x0000002a9577625d in __lock_get ()
#6  0x0000002a957be9cf in __db_lget ()
#7  0x0000002a956d1d66 in __bam_search ()
#8  0x0000002a956b8ca8 in __bamc_search ()
#9  0x0000002a956b6918 in __bamc_put ()
#10 0x0000002a957a98ee in __dbc_iput ()
#11 0x0000002a957a9747 in __dbc_put ()
#12 0x0000002a95795be7 in __db_put ()
#13 0x0000002a957b7d05 in __db_put_pp ()
#14 0x0000002a961d777e in hdb_dn2id_add ()
#15 0x0000002a961c3fb4 in hdb_add ()
#16 0x00000000004c38d7 in overlay_op_walk ()
#17 0x00000000004c3a9f in overlay_op_walk ()
#18 0x00000000004c3c0a in overlay_op_walk ()
#19 0x00000000004b482e in cancel_extop ()
#20 0x00000000004af92a in cancel_extop ()
#21 0x00000000004b1457 in cancel_extop ()
#22 0x000000000043bca3 in connection_client_stop ()
#23 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#24 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#25 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 4 (Thread 1115703648 (LWP 28292)):
#0  0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000002a956a744b in __db_pthread_mutex_lock ()
#2  0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3  0x0000002a956a6887 in __db_tas_mutex_lock ()
#4  0x0000002a95777ed2 in __lock_get_internal ()
#5  0x0000002a9577625d in __lock_get ()
#6  0x0000002a957be9cf in __db_lget ()
#7  0x0000002a956d1d66 in __bam_search ()
#8  0x0000002a956b8ca8 in __bamc_search ()
#9  0x0000002a956b2ea0 in __bamc_get ()
#10 0x0000002a957a78e7 in __dbc_iget ()
#11 0x0000002a957a730b in __dbc_get ()
#12 0x0000002a957b93fe in __dbc_get_pp ()
#13 0x0000002a961d8366 in hdb_dn2id ()
#14 0x0000002a961dff3f in hdb_cache_find_ndn ()
#15 0x0000002a961d6f7c in hdb_dn2entry ()
#16 0x0000002a961c35ca in hdb_add ()
#17 0x00000000004c38d7 in overlay_op_walk ()
#18 0x00000000004c3a9f in overlay_op_walk ()
#19 0x00000000004c3c0a in overlay_op_walk ()
#20 0x00000000004b482e in cancel_extop ()
#21 0x00000000004af92a in cancel_extop ()
#22 0x00000000004b1457 in cancel_extop ()
#23 0x000000000043bca3 in connection_client_stop ()
#24 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#25 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#26 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 3 (Thread 1124096352 (LWP 28297)):
#0  0x00000030d86b0719 in sched_yield () from /lib64/tls/libc.so.6
#1  0x00000000004e90d0 in ldap_pvt_thread_yield ()
#2  0x0000002a9630e197 in syncprov_op_search ()
#3  0x00000000004c3843 in overlay_op_walk ()
#4  0x00000000004c3a9f in overlay_op_walk ()
#5  0x00000000004c3b7a in overlay_op_walk ()
#6  0x000000000043ef62 in fe_op_search ()
#7  0x000000000043e8c2 in do_search ()
#8  0x000000000043b6d5 in connection_done ()
#9  0x000000000043bc89 in connection_client_stop ()
#10 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#11 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#12 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 2 (Thread 1132489056 (LWP 28298)):
#0  0x00000030d86b0719 in sched_yield () from /lib64/tls/libc.so.6
#1  0x00000000004e90d0 in ldap_pvt_thread_yield ()
#2  0x0000002a9630e197 in syncprov_op_search ()
#3  0x00000000004c3843 in overlay_op_walk ()
#4  0x00000000004c3a9f in overlay_op_walk ()
#5  0x00000000004c3b7a in overlay_op_walk ()
#6  0x000000000043ef62 in fe_op_search ()
#7  0x000000000043e8c2 in do_search ()
#8  0x000000000043b6d5 in connection_done ()
#9  0x000000000043bc89 in connection_client_stop ()
#10 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#11 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#12 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 1 (Thread 182903650624 (LWP 28287)):
#0  0x00000030d8d0732b in pthread_join () from /lib64/tls/libpthread.so.0
#1  0x00000000004e90a8 in ldap_pvt_thread_join ()
#2  0x0000000000438bd8 in slapd_daemon ()
#3  0x000000000041932a in main ()
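To make the good-versus-hung comparison less of an eyeball exercise, here is a minimal Python sketch that labels each thread in a pstack dump by its top frames. It is only an illustration: the function names (`classify_threads`, `label_thread`) and the label strings are made up for this example, the frame names it keys on are simply the ones that show up in the traces in this thread, and `SAMPLE` is an abbreviated, renumbered excerpt rather than a full capture.

```python
import re
from collections import Counter

# Matches a pstack thread header such as
# "Thread 6 (Thread 1098918240 (LWP 28290)):"
THREAD_RE = re.compile(r"Thread (\d+) \(Thread \d+ \(LWP (\d+)\)\):")
# Captures the function name of each frame, e.g. "#0 0x... in epoll_wait ()"
FRAME_RE = re.compile(r"#\d+\s+0x[0-9a-f]+ in (\S+) \(\)")


def label_thread(funcs):
    """Give a rough label to one thread from its frame names (top frame first)."""
    if not funcs:
        return "unknown"
    top = funcs[0]
    if top == "epoll_wait":
        return "listener"
    if top == "pthread_join":
        return "main"
    if top == "sched_yield":
        return "yielding"
    if top.startswith("pthread_cond_wait"):
        if len(funcs) > 1 and funcs[1] == "ldap_pvt_thread_cond_wait":
            return "idle-pool"       # pool thread waiting for work
        if "__db_pthread_mutex_lock" in funcs:
            return "bdb-lock-wait"   # blocked inside a BDB lock request
    return "other"


def classify_threads(pstack_text):
    """Return {thread_number: label}; works with or without line breaks."""
    # re.split with two groups yields [prefix, num, lwp, body, num, lwp, body, ...]
    parts = THREAD_RE.split(pstack_text)
    result = {}
    for i in range(1, len(parts), 3):
        result[int(parts[i])] = label_thread(FRAME_RE.findall(parts[i + 2]))
    return result


# Abbreviated, renumbered excerpt for illustration only (not a full capture).
SAMPLE = (
    "Thread 3 (Thread 1082132832 (LWP 28288)): "
    "#0 0x00000030d86ca15c in epoll_wait () from /lib64/tls/libc.so.6 "
    "#1 0x0000000000437a52 in slapd_daemon_destroy () "
    "Thread 2 (Thread 1098918240 (LWP 28290)): "
    "#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 () "
    "#1 0x0000002a956a744b in __db_pthread_mutex_lock () "
    "#2 0x0000002a95777ed2 in __lock_get_internal () "
    "Thread 1 (Thread 182903650624 (LWP 28287)): "
    "#0 0x00000030d8d0732b in pthread_join () from /lib64/tls/libpthread.so.0 "
    "#1 0x000000000041932a in main ()"
)

labels = classify_threads(SAMPLE)
print(labels)
print(Counter(labels.values()))
```

On a healthy idle server I would expect only the listener, main and idle-pool labels; several bdb-lock-wait threads that persist across repeated captures would be the pattern worth chasing.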
Can anyone help me determine what's going on?
Thanks, Mark
On Sat, Apr 2, 2011 at 9:31 PM, Mark mah042@gmail.com wrote:
The patch in ITS#6872 didn't fix the issue.
My first thought was to enable all the logging, but there's *so much data* and I don't know what's normal and what isn't. I captured the (netstat) connection information on all four hosts. Several of the connections are stuck in FIN_WAIT1, which is normally a quick, transitional state:
host1$ netstat -an | fgrep :1389
Proto Recv-Q Send-Q Local Address       Foreign Address  State
tcp        0      0 10.1.1.1:1389       0.0.0.0:*        LISTEN
tcp    65115      0 10.1.1.1:19284  --> 10.1.1.4:1389    ESTABLISHED
tcp        0      0 10.1.1.1:1389   <-- 10.1.1.4:36991   ESTABLISHED
tcp    73458      0 10.1.1.1:19286  --> 10.1.1.3:1389    ESTABLISHED
tcp        0      0 10.1.1.1:1389   <-- 10.1.1.3:38085   ESTABLISHED
tcp    73112      0 10.1.1.1:19263  --> 10.1.1.2:1389    ESTABLISHED
tcp        0      0 10.1.1.1:1389   <-- 10.1.1.2:41374   ESTABLISHED

host2$ netstat -an | fgrep :1389
Proto Recv-Q Send-Q Local Address       Foreign Address  State
tcp        0      0 10.1.1.2:1389       0.0.0.0:*        LISTEN
tcp        0  11537 10.1.1.2:1389   <-- 10.1.1.1:19263   FIN_WAIT1
tcp        0      0 10.1.1.2:1389   <-- 10.1.1.4:36992   ESTABLISHED
tcp        0      0 10.1.1.2:1389   <-- 10.1.1.3:38086   ESTABLISHED
tcp        0      0 10.1.1.2:41373  --> 10.1.1.3:1389    ESTABLISHED
tcp        0      0 10.1.1.2:41375  --> 10.1.1.4:1389    ESTABLISHED
tcp        0      0 10.1.1.2:41374  --> 10.1.1.1:1389    ESTABLISHED

host3$ netstat -an | fgrep :1389
Proto Recv-Q Send-Q Local Address       Foreign Address  State
tcp        0      0 10.1.1.3:1389       0.0.0.0:*        LISTEN
tcp        0  11521 10.1.1.3:1389   <-- 10.1.1.1:19286   FIN_WAIT1
tcp        0      0 10.1.1.3:38087  --> 10.1.1.4:1389    ESTABLISHED
tcp        0      0 10.1.1.3:38085  --> 10.1.1.1:1389    ESTABLISHED
tcp        0  11505 10.1.1.3:1389   <-- 10.1.1.4:37000   FIN_WAIT1
tcp        0      0 10.1.1.3:38086  --> 10.1.1.2:1389    ESTABLISHED
tcp        0      0 10.1.1.3:1389   <-- 10.1.1.2:41373   ESTABLISHED

host4$ netstat -an | fgrep :1389
Proto Recv-Q Send-Q Local Address       Foreign Address  State
tcp        0      0 10.1.1.4:1389       0.0.0.0:*        LISTEN
tcp        0  14281 10.1.1.4:1389   <-- 10.1.1.1:19284   FIN_WAIT1
tcp    73567      0 10.1.1.4:37000  --> 10.1.1.3:1389    ESTABLISHED
tcp        0      0 10.1.1.4:1389   <-- 10.1.1.3:38087   ESTABLISHED
tcp    17534      0 10.1.1.4:36991  --> 10.1.1.1:1389    ESTABLISHED
tcp        0      0 10.1.1.4:1389   <-- 10.1.1.2:41375   ESTABLISHED
tcp        0      0 10.1.1.4:36992  --> 10.1.1.2:1389    ESTABLISHED
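A rough Python sketch of how listings like the ones above could be scanned automatically. It assumes the same column order as this netstat output and tolerates the `-->` / `<--` direction arrows, which are annotations added to the listings here rather than something netstat prints. The names `Conn`, `parse_netstat` and `suspicious` are hypothetical, and the two "suspicious" rules (FIN_WAIT1 with a non-empty Send-Q, ESTABLISHED with a non-empty Recv-Q) are just the two patterns visible in these captures, not a general diagnostic.

```python
from typing import List, NamedTuple, Tuple


class Conn(NamedTuple):
    recv_q: int
    send_q: int
    local: str
    foreign: str
    state: str


def parse_netstat(lines: List[str]) -> List[Conn]:
    """Parse 'netstat -an' TCP lines, ignoring headers and direction arrows."""
    conns = []
    for line in lines:
        # Drop the hand-added direction markers before counting columns.
        fields = [f for f in line.split() if f not in ("-->", "<--")]
        if len(fields) != 6 or fields[0] != "tcp":
            continue  # header line, prompt line, or something unexpected
        _, recv_q, send_q, local, foreign, state = fields
        conns.append(Conn(int(recv_q), int(send_q), local, foreign, state))
    return conns


def suspicious(conns: List[Conn]) -> List[Tuple[Conn, str]]:
    """Flag the two patterns seen in this thread's captures."""
    flagged = []
    for c in conns:
        if c.state == "FIN_WAIT1" and c.send_q > 0:
            flagged.append((c, "closing, but queued data is not being accepted by the peer"))
        elif c.state == "ESTABLISHED" and c.recv_q > 0:
            flagged.append((c, "data is waiting that the local process has not read"))
    return flagged


# host2 listing from above, used as sample input.
HOST2 = """\
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 10.1.1.2:1389 0.0.0.0:* LISTEN
tcp 0 11537 10.1.1.2:1389 <-- 10.1.1.1:19263 FIN_WAIT1
tcp 0 0 10.1.1.2:1389 <-- 10.1.1.4:36992 ESTABLISHED
tcp 0 0 10.1.1.2:41374 --> 10.1.1.1:1389 ESTABLISHED
""".splitlines()

for conn, reason in suspicious(parse_netstat(HOST2)):
    print(conn.local, conn.foreign, conn.state, "-", reason)
```

Running this against each host's listing at intervals would show whether the queued byte counts ever move, which is a cheap way to tell a slow peer from one that has stopped reading altogether.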
I also captured a *pstack* trace on each of the four slapds. But again, I'm not sure what's *normal*:
host1$ Thread 17 (Thread 1082132832 (LWP 25922)): #0 0x0000003a340ca15c in epoll_wait () from /lib64/tls/libc.so.6 #1 0x0000000000437a52 in slapd_daemon_destroy () #2 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0 #3 0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6 Thread 16 (Thread 1090525536 (LWP 25923)): #0 0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000004e9150 in ldap_pvt_thread_cond_wait () #2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy () #3 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0 #4 0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6 Thread 15 (Thread 1098918240 (LWP 25924)): #0 0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000004e9150 in ldap_pvt_thread_cond_wait () #2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy () #3 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0 #4 0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6 Thread 14 (Thread 1107310944 (LWP 25925)): #0 0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000004e9150 in ldap_pvt_thread_cond_wait () #2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy () #3 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0 #4 0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6 Thread 13 (Thread 1115703648 (LWP 25926)): #0 0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000004e9150 in ldap_pvt_thread_cond_wait () #2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy () #3 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0 #4 0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6 Thread 12 (Thread 1124096352 (LWP 26071)): #0 0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000004e9150 in ldap_pvt_thread_cond_wait () #2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy () #3 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0 #4 0x0000003a340c9d83 in clone 
() from /lib64/tls/libc.so.6 Thread 11 (Thread 1132489056 (LWP 26072)): #0 0x0000003a340b0719 in sched_yield () from /lib64/tls/libc.so.6 #1 0x00000000004e90d0 in ldap_pvt_thread_yield () #2 0x0000002a9630d11e in syncprov_op_search () #3 0x00000000004c3843 in overlay_op_walk () #4 0x00000000004c3a9f in overlay_op_walk () #5 0x00000000004c3b7a in overlay_op_walk () #6 0x000000000043ef62 in fe_op_search () #7 0x000000000043e8c2 in do_search () #8 0x000000000043b6d5 in connection_done () #9 0x000000000043bc89 in connection_client_stop () #10 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy () #11 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0 #12 0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6 Thread 10 (Thread 1140881760 (LWP 26073)): #0 0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x0000002a956a644b in __db_pthread_mutex_lock () #2 0x0000002a956a5b11 in __db_tas_mutex_lock_int () #3 0x0000002a956a5887 in __db_tas_mutex_lock () #4 0x0000002a95776ed2 in __lock_get_internal () #5 0x0000002a9577525d in __lock_get () #6 0x0000002a957bd9cf in __db_lget () #7 0x0000002a956b67bc in __bamc_writelock () #8 0x0000002a957a5b71 in __dbc_idel () #9 0x0000002a957a5ace in __dbc_del () #10 0x0000002a957b8039 in __dbc_del_pp () #11 0x0000002a961dc91e in hdb_idl_delete_key () #12 0x0000002a961d1d4b in hdb_key_change () #13 0x0000002a961d0d1b in indexer () #14 0x0000002a961d1159 in index_at_values () #15 0x0000002a961d12d2 in hdb_index_values () #16 0x0000002a961d173a in hdb_index_entry () #17 0x0000002a961c5888 in hdb_delete () #18 0x00000000004c38d7 in overlay_op_walk () #19 0x00000000004c3a9f in overlay_op_walk () #20 0x00000000004c3c2e in overlay_op_walk () #21 0x00000000004b5b1a in cancel_extop () #22 0x00000000004af92a in cancel_extop ()
host2$ Thread 9 (Thread 1082132832 (LWP 12700)): #0 0x0000003530bca15c in epoll_wait () from /lib64/tls/libc.so.6 #1 0x0000000000437a52 in slapd_daemon_destroy () #2 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0 #3 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6 Thread 8 (Thread 1090525536 (LWP 12701)): #0 0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000004e9150 in ldap_pvt_thread_cond_wait () #2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy () #3 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0 #4 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6 Thread 7 (Thread 1098918240 (LWP 12702)): #0 0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000004e9150 in ldap_pvt_thread_cond_wait () #2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy () #3 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0 #4 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6 Thread 6 (Thread 1107310944 (LWP 12703)): #0 0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000004e9150 in ldap_pvt_thread_cond_wait () #2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy () #3 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0 #4 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6 Thread 5 (Thread 1115703648 (LWP 12704)): #0 0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000004e9150 in ldap_pvt_thread_cond_wait () #2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy () #3 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0 #4 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6 Thread 4 (Thread 1124096352 (LWP 13049)): #0 0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000004e9150 in ldap_pvt_thread_cond_wait () #2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy () #3 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0 #4 0x0000003530bc9d83 in clone () from 
/lib64/tls/libc.so.6 Thread 3 (Thread 1132489056 (LWP 13050)): #0 0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000004e9150 in ldap_pvt_thread_cond_wait () #2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy () #3 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0 #4 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6 Thread 2 (Thread 1140881760 (LWP 13059)): #0 0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000004e9150 in ldap_pvt_thread_cond_wait () #2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy () #3 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0 #4 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6 Thread 1 (Thread 182903646528 (LWP 12693)): #0 0x000000353120732b in pthread_join () from /lib64/tls/libpthread.so.0 #1 0x00000000004e90a8 in ldap_pvt_thread_join () #2 0x0000000000438bd8 in slapd_daemon () #3 0x000000000041932a in main ()
host3$ Thread 9 (Thread 1082132832 (LWP 20629)): #0 0x00000035c64ca15c in epoll_wait () from /lib64/tls/libc.so.6 #1 0x0000000000437a52 in slapd_daemon_destroy () #2 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0 #3 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6 Thread 8 (Thread 1090525536 (LWP 20630)): #0 0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000004e9150 in ldap_pvt_thread_cond_wait () #2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy () #3 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0 #4 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6 Thread 7 (Thread 1098918240 (LWP 20631)): #0 0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000004e9150 in ldap_pvt_thread_cond_wait () #2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy () #3 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0 #4 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6 Thread 6 (Thread 1107310944 (LWP 20632)): #0 0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000004e9150 in ldap_pvt_thread_cond_wait () #2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy () #3 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0 #4 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6 Thread 5 (Thread 1115703648 (LWP 20633)): #0 0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000004e9150 in ldap_pvt_thread_cond_wait () #2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy () #3 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0 #4 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6 Thread 4 (Thread 1124096352 (LWP 20983)): #0 0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000004e9150 in ldap_pvt_thread_cond_wait () #2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy () #3 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0 #4 0x00000035c64c9d83 in clone () from 
/lib64/tls/libc.so.6 Thread 3 (Thread 1132489056 (LWP 20984)): #0 0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000004e9150 in ldap_pvt_thread_cond_wait () #2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy () #3 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0 #4 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6 Thread 2 (Thread 1140881760 (LWP 21005)): #0 0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000004e9150 in ldap_pvt_thread_cond_wait () #2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy () #3 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0 #4 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6 Thread 1 (Thread 182903654720 (LWP 20628)): #0 0x00000035c6d0732b in pthread_join () from /lib64/tls/libpthread.so.0 #1 0x00000000004e90a8 in ldap_pvt_thread_join () #2 0x0000000000438bd8 in slapd_daemon () #3 0x000000000041932a in main ()
host4$
Thread 12 (Thread 1082132832 (LWP 26819)):
#0 0x00000030d86ca15c in epoll_wait () from /lib64/tls/libc.so.6
#1 0x0000000000437a52 in slapd_daemon_destroy ()
#2 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#3 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 11 (Thread 1090525536 (LWP 26820)):
#0 0x00000030d86b0719 in sched_yield () from /lib64/tls/libc.so.6
#1 0x00000000004e90d0 in ldap_pvt_thread_yield ()
#2 0x00000000004af4ab in cancel_extop ()
#3 0x00000000004b1457 in cancel_extop ()
#4 0x000000000043bca3 in connection_client_stop ()
#5 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#6 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#7 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 10 (Thread 1098918240 (LWP 26821)):
#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0000002a956a744b in __db_pthread_mutex_lock ()
#2 0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3 0x0000002a956a6887 in __db_tas_mutex_lock ()
#4 0x0000002a95777ed2 in __lock_get_internal ()
#5 0x0000002a957750ae in __lock_vec ()
#6 0x0000002a95774e53 in __lock_vec_api ()
#7 0x0000002a95774da3 in __lock_vec_pp ()
#8 0x0000002a961df6f0 in hdb_cache_entry_db_relock ()
#9 0x0000002a961e169e in hdb_cache_modify ()
#10 0x0000002a961c95ac in hdb_modify ()
#11 0x0000002a9630a959 in syncprov_checkpoint ()
#12 0x0000002a9630c241 in syncprov_op_response ()
#13 0x00000000004500f6 in rs_entry2modifiable ()
#14 0x00000000004502f5 in rs_entry2modifiable ()
#15 0x000000000045112e in slap_send_ldap_result ()
#16 0x0000002a961c7266 in hdb_delete ()
#17 0x00000000004c38d7 in overlay_op_walk ()
#18 0x00000000004c3a9f in overlay_op_walk ()
#19 0x00000000004c3c2e in overlay_op_walk ()
#20 0x000000000045c958 in fe_op_delete ()
#21 0x000000000045c688 in do_delete ()
#22 0x000000000043b6d5 in connection_done ()
#23 0x000000000043bc89 in connection_client_stop ()
#24 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#25 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#26 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 9 (Thread 1107310944 (LWP 26822)):
#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0000002a956a744b in __db_pthread_mutex_lock ()
#2 0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3 0x0000002a956a6887 in __db_tas_mutex_lock ()
#4 0x0000002a95777ed2 in __lock_get_internal ()
#5 0x0000002a9577613d in __lock_get_api ()
#6 0x0000002a95775fd7 in __lock_get_pp ()
#7 0x0000002a961df875 in bdb_cache_entry_db_lock ()
#8 0x0000002a961e0f02 in hdb_cache_find_id ()
#9 0x0000002a961d707f in hdb_dn2entry ()
#10 0x0000002a961cd3a1 in hdb_search ()
#11 0x00000000004c38d7 in overlay_op_walk ()
#12 0x00000000004c3a9f in overlay_op_walk ()
#13 0x00000000004c3b7a in overlay_op_walk ()
#14 0x0000002a96307877 in syncprov_findbase ()
#15 0x0000002a9630df7a in syncprov_op_search ()
#16 0x00000000004c3843 in overlay_op_walk ()
#17 0x00000000004c3a9f in overlay_op_walk ()
#18 0x00000000004c3b7a in overlay_op_walk ()
#19 0x000000000043ef62 in fe_op_search ()
#20 0x000000000043e8c2 in do_search ()
#21 0x000000000043b6d5 in connection_done ()
#22 0x000000000043bc89 in connection_client_stop ()
#23 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#24 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#25 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 8 (Thread 1115703648 (LWP 26823)):
#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0000002a956a744b in __db_pthread_mutex_lock ()
#2 0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3 0x0000002a956a6887 in __db_tas_mutex_lock ()
#4 0x0000002a95777ed2 in __lock_get_internal ()
#5 0x0000002a9577625d in __lock_get ()
#6 0x0000002a957be9cf in __db_lget ()
#7 0x0000002a956d1d66 in __bam_search ()
#8 0x0000002a956b8ca8 in __bamc_search ()
#9 0x0000002a956b6918 in __bamc_put ()
#10 0x0000002a957a98ee in __dbc_iput ()
#11 0x0000002a957a9747 in __dbc_put ()
#12 0x0000002a95795be7 in __db_put ()
#13 0x0000002a957b7d05 in __db_put_pp ()
#14 0x0000002a961d9eff in bdb_id2entry_put ()
#15 0x0000002a961d9f7b in hdb_id2entry_update ()
#16 0x0000002a961c92c4 in hdb_modify ()
#17 0x00000000004c38d7 in overlay_op_walk ()
#18 0x00000000004c3a9f in overlay_op_walk ()
#19 0x00000000004c3bc2 in overlay_op_walk ()
#20 0x00000000004b7811 in syncrepl_add_glue ()
#21 0x00000000004af957 in cancel_extop ()
#22 0x00000000004b1457 in cancel_extop ()
#23 0x000000000043bca3 in connection_client_stop ()
#24 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#25 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#26 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 7 (Thread 1124096352 (LWP 26838)):
#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 6 (Thread 1132489056 (LWP 26839)):
#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0000002a956a744b in __db_pthread_mutex_lock ()
#2 0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3 0x0000002a956a6887 in __db_tas_mutex_lock ()
#4 0x0000002a95777ed2 in __lock_get_internal ()
#5 0x0000002a9577613d in __lock_get_api ()
#6 0x0000002a95775fd7 in __lock_get_pp ()
#7 0x0000002a961df875 in bdb_cache_entry_db_lock ()
#8 0x0000002a961e0f02 in hdb_cache_find_id ()
#9 0x0000002a961d707f in hdb_dn2entry ()
#10 0x0000002a961cd3a1 in hdb_search ()
#11 0x00000000004c38d7 in overlay_op_walk ()
#12 0x00000000004c3a9f in overlay_op_walk ()
#13 0x00000000004c3b7a in overlay_op_walk ()
#14 0x0000002a96307877 in syncprov_findbase ()
#15 0x0000002a9630df7a in syncprov_op_search ()
#16 0x00000000004c3843 in overlay_op_walk ()
#17 0x00000000004c3a9f in overlay_op_walk ()
#18 0x00000000004c3b7a in overlay_op_walk ()
#19 0x000000000043ef62 in fe_op_search ()
#20 0x000000000043e8c2 in do_search ()
#21 0x000000000043b6d5 in connection_done ()
#22 0x000000000043bc89 in connection_client_stop ()
#23 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#24 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#25 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 5 (Thread 1140881760 (LWP 26840)):
#0 0x00000030d8d0b16b in __lll_mutex_lock_wait ()
#1 0x00000000440066b0 in ?? ()
#2 0x0000000000000010 in ?? ()
#3 0x00000030d8d07f34 in pthread_mutex_lock () from /lib64/tls/libpthread.so.0
#4 0x0000002ab68eb520 in ?? ()
#5 0x0000000000000028 in ?? ()
#6 0x00000004d866b20d in ?? ()
#7 0x0000000000000050 in ?? ()
#8 0x0000002ab5c00020 in ?? ()
#9 0x0000000000000029 in ?? ()
#10 0x00000030d8d06280 in __free_tcb () from /lib64/tls/libpthread.so.0
#11 0x00000000410005e0 in ?? ()
#12 0x0000002ab5c00020 in ?? ()
#13 0x000000000000000c in ?? ()
#14 0x00000030d8d06280 in __free_tcb () from /lib64/tls/libpthread.so.0
#15 0x00000000410005e0 in ?? ()
#16 0x0000000000000001 in ?? ()
#17 0x00000000410005e0 in ?? ()
#18 0x00000030d866bc22 in malloc () from /lib64/tls/libc.so.6
#19 0x0000000000772040 in ?? ()
#20 0x0000000044006340 in ?? ()
#21 0x00000000004ade33 in cancel_extop ()
Thread 4 (Thread 1149274464 (LWP 26972)):
#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0000002a956a744b in __db_pthread_mutex_lock ()
#2 0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3 0x0000002a956a6887 in __db_tas_mutex_lock ()
#4 0x0000002a95777ed2 in __lock_get_internal ()
#5 0x0000002a9577613d in __lock_get_api ()
#6 0x0000002a95775fd7 in __lock_get_pp ()
#7 0x0000002a961df875 in bdb_cache_entry_db_lock ()
#8 0x0000002a961e0f02 in hdb_cache_find_id ()
#9 0x0000002a961d707f in hdb_dn2entry ()
#10 0x0000002a961cd3a1 in hdb_search ()
#11 0x00000000004c38d7 in overlay_op_walk ()
#12 0x00000000004c3a9f in overlay_op_walk ()
#13 0x00000000004c3b7a in overlay_op_walk ()
#14 0x000000000043ef62 in fe_op_search ()
#15 0x000000000043e8c2 in do_search ()
#16 0x000000000043b6d5 in connection_done ()
#17 0x000000000043bc89 in connection_client_stop ()
#18 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#19 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#20 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 3 (Thread 1157667168 (LWP 26973)):
#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 2 (Thread 1166059872 (LWP 26974)):
#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0000002a956a744b in __db_pthread_mutex_lock ()
#2 0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3 0x0000002a956a6887 in __db_tas_mutex_lock ()
#4 0x0000002a95777ed2 in __lock_get_internal ()
#5 0x0000002a9577613d in __lock_get_api ()
#6 0x0000002a95775fd7 in __lock_get_pp ()
#7 0x0000002a961df875 in bdb_cache_entry_db_lock ()
#8 0x0000002a961e0f02 in hdb_cache_find_id ()
#9 0x0000002a961d707f in hdb_dn2entry ()
#10 0x0000002a961cd3a1 in hdb_search ()
#11 0x00000000004c38d7 in overlay_op_walk ()
#12 0x00000000004c3a9f in overlay_op_walk ()
#13 0x00000000004c3b7a in overlay_op_walk ()
#14 0x000000000043ef62 in fe_op_search ()
#15 0x000000000043e8c2 in do_search ()
#16 0x000000000043b6d5 in connection_done ()
#17 0x000000000043bc89 in connection_client_stop ()
#18 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#19 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#20 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 1 (Thread 182903650624 (LWP 26818)):
#0 0x00000030d8d0732b in pthread_join () from /lib64/tls/libpthread.so.0
#1 0x00000000004e90a8 in ldap_pvt_thread_join ()
#2 0x0000000000438bd8 in slapd_daemon ()
#3 0x000000000041932a in main ()
I guess the next step is to start clean and capture verbose logging and pstack traces during normal operation, then compare them with captures taken when one or more of the instances are hung. Any other suggestions?
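For comparing a normal capture against a hung one, a small script can reduce each pstack dump to a per-thread summary. The following is a minimal sketch in Python, not part of OpenLDAP or pstack: the function names (parse_pstack, blocked_threads, innermost_counts) are made up for illustration, and the choice of __lock_get_internal as the "waiting on a BDB lock" marker is an assumption taken from the frames in the trace above. A thread sitting in that frame is not by itself proof of a deadlock. The regular expressions only rely on the "Thread N (Thread ... (LWP ...)):" headers and "#n 0x... in func ()" frames, so they should work whether or not the frames are one per line.

```python
import re
from collections import Counter

# Header such as: Thread 10 (Thread 1098918240 (LWP 26821)):
THREAD_RE = re.compile(r"Thread (\d+) \(Thread \d+ \(LWP (\d+)\)\):")
# Frame such as: #4 0x0000002a95777ed2 in __lock_get_internal ()
FRAME_RE = re.compile(r"#(\d+)\s+0x[0-9a-f]+ in (\S+)\s+\(\)")


def parse_pstack(text):
    """Return {thread_number: [function names, innermost frame first]}."""
    threads = {}
    headers = list(THREAD_RE.finditer(text))
    for i, header in enumerate(headers):
        # A thread's frames run from its header up to the next header.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        body = text[header.end():end]
        threads[int(header.group(1))] = [f.group(2) for f in FRAME_RE.finditer(body)]
    return threads


def blocked_threads(threads, marker="__lock_get_internal"):
    """Thread numbers whose stack contains the marker frame (assumed marker)."""
    return sorted(n for n, frames in threads.items() if marker in frames)


def innermost_counts(threads):
    """Count how many threads are currently sitting in each innermost function."""
    return Counter(frames[0] for frames in threads.values() if frames)
```

Running parse_pstack on the text of a saved capture from each state, then comparing the blocked_threads and innermost_counts results, shows which threads moved into lock waits between the two captures.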
Thanks, Mark
On Thu, Mar 31, 2011 at 9:58 PM, GMail mah042@gmail.com wrote:
No I hadn't because the usage and symptoms didn't seem to fit. But it's worth a shot.
Mark
On Mar 31, 2011, at 9:27 PM, Quanah Gibson-Mount quanah@zimbra.com wrote:
--On Thursday, March 31, 2011 9:06 PM -0500 Mark mah042@gmail.com wrote:
I've been testing a 4-way multi-master setup using OpenLDAP 2.4.25 and I'm having some sporadic problems with it that I'm having difficulty diagnosing..
Have you tried applying the patches in ITS#6872?
--Quanah
--
Quanah Gibson-Mount Sr. Member of Technical Staff Zimbra, Inc A Division of VMware, Inc.
Zimbra :: the leader in open source messaging and collaboration
openldap-technical@openldap.org