Full_Name: Nikolai Schupbach
Version: 2.4.31
OS: FreeBSD
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (202.78.158.60)
We are experiencing frequent hangs in slapd. Once it has hung, clients can still connect, but all searches block indefinitely until we kill -9 the slapd process and restart it. The directory is used for mail routing, and we have been migrating to it from an existing directory server over the last 3 weeks. We have noted that the busier the directory becomes, the more often it hangs (now once every 2 days).
We have one master and 10 read-only syncrepl replicas. The master is used mainly for writes and has not hung yet, but most of the replicas have hung at least once. The replicas receive between 50 and 300 searches/sec, while the master only gets 1/sec. There are 45k entries in the directory.
We are running:
FreeBSD 8.3/9.0 x64
OpenLDAP 2.4.31
Berkeley DB 4.6.21
The old directory we are migrating from has the same load and is also running OpenLDAP, but has been rock solid for 5 years. It is running Berkeley DB 4.3.29 and OpenLDAP 2.3.27.
We have managed to collect db_stat lock information, which indicates the same issue each time - a write lock on dn2id.bdb.
Locks grouped by object:
Locker Mode Count Status ----------------- Object ---------------
8000a85e READ 1 HELD 0xb26c8 len: 9 data: 60xa800000000000000
8a READ 1 HELD id2entry.bdb handle 0
8c READ 1 HELD dn2id.bdb handle 0
96 READ 1 HELD objectClass.bdb handle 0
93 READ 1 HELD entryCSN.bdb handle 0
90 READ 1 HELD entryUUID.bdb handle 0
8000a85f WRITE 4 HELD dn2id.bdb page 219
80000782 READ 1 HELD dn2id.bdb page 768
80000a45 READ 1 HELD dn2id.bdb page 768
80000b9e READ 1 HELD dn2id.bdb page 768
800006a0 READ 1 HELD dn2id.bdb page 768
80000771 READ 1 HELD dn2id.bdb page 768
80000534 READ 1 HELD dn2id.bdb page 768
80000a44 READ 1 HELD dn2id.bdb page 768
80000641 READ 1 HELD dn2id.bdb page 768
80001049 READ 1 HELD dn2id.bdb page 768
8000104a READ 1 HELD dn2id.bdb page 768
80001048 READ 1 HELD dn2id.bdb page 768
80000783 READ 1 HELD dn2id.bdb page 768
80000535 READ 1 HELD dn2id.bdb page 768
8000066e READ 1 HELD dn2id.bdb page 768
80000697 READ 1 HELD dn2id.bdb page 768
8000a85f READ 1 HELD dn2id.bdb page 768
8000a85e READ 1 HELD 0xb19a8 len: 9 data: 40xa800000000000000
8000a85f READ 1 HELD dn2id.bdb page 933
8000a85f WRITE 2 HELD dn2id.bdb page 933
80001047 WRITE 1 HELD dn2id.bdb page 559
80000782 READ 1 WAIT dn2id.bdb page 559
80000a45 READ 1 WAIT dn2id.bdb page 559
80000b9e READ 1 WAIT dn2id.bdb page 559
800006a0 READ 1 WAIT dn2id.bdb page 559
80000771 READ 1 WAIT dn2id.bdb page 559
80000534 READ 1 WAIT dn2id.bdb page 559
80000a44 READ 1 WAIT dn2id.bdb page 559
80000641 READ 1 WAIT dn2id.bdb page 559
80001049 READ 1 WAIT dn2id.bdb page 559
8000104a READ 1 WAIT dn2id.bdb page 559
80001048 READ 1 WAIT dn2id.bdb page 559
80000783 READ 1 WAIT dn2id.bdb page 559
80000535 READ 1 WAIT dn2id.bdb page 559
8000066e READ 1 WAIT dn2id.bdb page 559
80000697 READ 1 WAIT dn2id.bdb page 559
8000a85f READ 1 WAIT dn2id.bdb page 559
8000a85f READ 2 HELD dn2id.bdb page 1362
8000a85f WRITE 2 HELD dn2id.bdb page 1362
8000a85f READ 2 HELD dn2id.bdb page 1353
8000a85f WRITE 2 HELD dn2id.bdb page 1353
b6 READ 1 HELD uid.bdb handle 0
a5 READ 1 HELD mail.bdb handle 0
af READ 1 HELD mailLocalAddress.bdb handle 0
9b READ 1 HELD miLoginid.bdb handle 0
aa READ 1 HELD mailHost.bdb handle 0
bb READ 1 HELD miDomainName.bdb handle 0
c0 READ 1 HELD mpMailHost.bdb handle 0
a0 READ 1 HELD mpMailUserType.bdb handle 0
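A lock table like the one above can be reduced to a "who is waiting on whom" summary with a short script. The following is only a minimal sketch in Python, assuming lock entries follow the `locker mode count status file page|handle number` layout shown above and that only READ and WRITE modes appear. The function name `blocked_by` and the regular expression are illustrative and are not part of Berkeley DB or OpenLDAP.

```python
import re
from collections import defaultdict

# Matches entries such as "80001047 WRITE 1 HELD dn2id.bdb page 559".
# Entries for non-file objects ("0xb26c8 len: 9 data: ...") are skipped.
LOCK_RE = re.compile(
    r"\b([0-9a-f]+)\s+(READ|WRITE)\s+(\d+)\s+(HELD|WAIT)\s+(\S+)\s+(page|handle)\s+(\d+)"
)

def blocked_by(db_stat_text):
    """Map each waiting locker to the set of (holder, object) pairs blocking it."""
    holders = defaultdict(list)  # object -> [(locker, mode)] with status HELD
    waiters = defaultdict(list)  # object -> [(locker, mode)] with status WAIT
    for locker, mode, _count, status, db, kind, num in LOCK_RE.findall(db_stat_text):
        obj = (db, kind, num)
        (holders if status == "HELD" else waiters)[obj].append((locker, mode))

    result = defaultdict(set)
    for obj, waiting in waiters.items():
        for waiter, wanted in waiting:
            for holder, held in holders.get(obj, []):
                # READ/READ is compatible; any pairing that involves WRITE conflicts.
                if holder != waiter and "WRITE" in (wanted, held):
                    result[waiter].add((holder, obj))
    return result

# A few entries taken from the output above.
sample = """\
8000a85f WRITE 4 HELD dn2id.bdb page 219
80001047 WRITE 1 HELD dn2id.bdb page 559
80000782 READ 1 WAIT dn2id.bdb page 559
8000a85f READ 1 WAIT dn2id.bdb page 559
"""
for waiter, blockers in sorted(blocked_by(sample).items()):
    for holder, (db, kind, num) in sorted(blockers):
        print(f"{waiter} waits on {holder} for {db} {kind} {num}")
```

In the excerpt above, every waiting reader is queued behind locker 80001047, which holds the WRITE lock on dn2id.bdb page 559. One of those waiters, 8000a85f, itself holds WRITE locks on pages 219, 933, 1362 and 1353.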
We have also collected the backtrace for all the threads which I have uploaded to:
ftp://ftp.openldap.org/incoming/nikolai-gdb-120902.txt
The full db_stat output is located at:
ftp://ftp.openldap.org/incoming/nikolai-dbstat-120902.txt
Our DB_CONFIG:
# One 512MB cache
set_cachesize 0 536870912 1

# Transaction Log settings
set_lg_regionmax 1048576
set_lg_max 10485760
set_lg_bsize 2097152
set_flags DB_LOG_AUTOREMOVE

# Increase lock maximums
set_lk_max_locks 2000
set_lk_max_lockers 2000
set_lk_max_objects 2000
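As a quick sanity check on the cache setting, `set_cachesize` takes three arguments (gigabytes, additional bytes, number of cache regions), so the total is the first two added together. The sketch below is a hypothetical helper, not part of any Berkeley DB tooling. It assumes the directive appears once and simply splits the text on whitespace.

```python
def cache_size_bytes(db_config_text):
    """Return (total_bytes, ncache) for the first set_cachesize directive."""
    tokens = db_config_text.split()
    i = tokens.index("set_cachesize")  # raises ValueError if the directive is absent
    gbytes, extra_bytes, ncache = (int(t) for t in tokens[i + 1:i + 4])
    return gbytes * 1024**3 + extra_bytes, ncache

total, ncache = cache_size_bytes("set_cachesize 0 536870912 1")
print(f"{total // 2**20} MiB in {ncache} region(s)")
```

For the values above this gives 512 MiB in a single region, which matches the comment in the file.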
Our slapd.conf on our replicas:
# Load the following schema files
include /usr/local/etc/openldap/schema/core.schema
include /usr/local/etc/openldap/schema/cosine.schema
include /usr/local/etc/openldap/schema/nis.schema
include /usr/local/etc/openldap/schema/inetorgperson.schema
include /usr/local/etc/openldap/schema/misc.schema
include /usr/local/etc/openldap/schema/mirapoint.schema
include /usr/local/etc/openldap/schema/smp.schema

# Runtime settings for slapd
pidfile /var/run/openldap/slapd.pid
argsfile /var/run/openldap/slapd.args
loglevel none

# TLS security options for slapd.
TLSCipherSuite HIGH
TLSCACertificateFile /usr/local/etc/openldap/tls/ca-cert.pem
TLSCertificateFile /usr/local/etc/openldap/tls/server-cert.pem
TLSCertificateKeyFile /usr/local/etc/openldap/tls/server-key.pem

# This option configures one or more hashes to be used in generation
# of user passwords stored in the userPassword attribute during
# processing of LDAP Password Modify Extended Operations (RFC 3062).
password-hash {SSHA}

# Load dynamic backend modules:
modulepath /usr/local/libexec/openldap
moduleload back_bdb
moduleload back_monitor

# Do not limit size or time of requests.
sizelimit unlimited
timelimit unlimited

# Require authentication prior to directory operations
require authc
###############################################################################
# BDB Database Definitions
#
# The following configuration directives relate to bdb database definitions
###############################################################################

# The remaining configuration directives relate to bdb database definitions
database bdb
suffix "o=top"
rootdn "cn=root,o=top"

# Cleartext passwords, especially for the rootdn, should
# be avoided. See slappasswd(8) and slapd.conf(5) for details.
rootpw {SSHA}**********

# The database directory must exist prior to running slapd and
# should only be accessible by the slapd and slap tools.
directory /var/db/openldap-data

# Indices to maintain
index cn eq,sub,pres
index entryUUID eq
index entryCSN eq
index mail eq,sub,pres
index mailHost eq
index mailLocalAddress eq,sub,pres
index miDomainName eq,sub
index miLoginId eq,pres
index mpMailHost eq
index mpMailUserType eq
index mpSystemRole eq
index objectClass eq,pres
index uid eq,pres

# Specify the number of entries which should be held in memory
cachesize 200000

# Set transactional checkpoint
checkpoint 512 60
###############################################################################
# LDAP Sync Replication
#
# A unique replica id number is required for each replication client
###############################################################################

# LDAP sync replication settings
syncrepl rid=36
    provider=ldaps://ldapmaster/
    type=refreshAndPersist
    retry=30,+
    searchbase="o=top"
    filter="(objectClass=*)"
    scope=sub
    attrs="*"
    sizelimit=unlimited
    timelimit=unlimited
    schemachecking=off
    bindmethod=simple
    binddn="cn=replica,ou=users,ou=directory,o=top"
    credentials=**********

# Where to refer ldap updates to
updateref ldaps://ldapmaster/
###############################################################################
# LDAP Statistics
#
# The OpenLDAP server can be configured to provide real time performance
# statistics through the monitor branch.
###############################################################################

# Enable the statistics monitoring database
database monitor

# Allow access to monitoring user only
access to dn.subtree="cn=monitor"
    by dn.exact="cn=monitor,ou=users,ou=directory,o=top" read
    by * none
Sincerely,
Nikolai Schupbach