hrvoje.habjanic@zg.t-com.hr wrote:
Full_Name: Hrvoje
Version: 2.4.30
OS: Centos 6.2 x86_64
URL: http://free-zg.t-com.hr/HrvojeHabjanic/hang2.log
Submission from: (NULL) (195.29.148.138)
Hi.
While testing OpenLDAP with some of my data, slapd regularly hangs. I did manage to "catch" it, but I need an expert's interpretation of the traces.
I'm using db-5.3.15 (latest), compiled with:
../dist/configure \
    --enable-shared --enable-static \
    --enable-tcl --with-tcl=/usr/lib64 \
    --enable-cxx --enable-sql \
    --enable-java \
    --enable-test \
    --with-tcl=/usr/lib64/tcl8.5 \
    --disable-rpath \
    --enable-debug \
    --prefix=/usr/local/db
and openldap-2.4.30, compiled with:
CFLAGS="-g -I/usr/local/db/include" \
CPPFLAGS="-g -I/usr/local/db/include" \
LDFLAGS="-L/usr/local/db/lib -Wl,-R/usr/local/db/lib" \
./configure \
    --prefix=/usr/local/openldap \
    --enable-local \
    --enable-rlookups \
    --with-tls=no \
    --with-cyrus-sasl \
    --enable-wrappers \
    --enable-passwd \
    --enable-cleartext \
    --enable-crypt \
    --enable-spasswd \
    --disable-lmpasswd \
    --enable-modules \
    --disable-sql \
    --enable-slapd \
    --enable-bdb \
    --enable-hdb \
    --enable-ldap \
    --enable-meta \
    --enable-monitor \
    --enable-null \
    --enable-shell \
    --disable-ndb \
    --enable-passwd \
    --enable-sock \
    --disable-perl \
    --enable-relay \
    --disable-shared \
    --disable-dynamic \
    --enable-overlays=mod \
    --enable-mdb \
    --enable-debug=yes
slapd is configured to use the slapd.d directory (db). Two databases are configured, ou=p,dc=pero,dc=com and ou=d,dc=pero,dc=com, plus the monitor db. The first database uses 10 GB on disk and has around 10M unique DNs, while the second uses around 3-4 GB and has a few million DNs.
The server has 16 GB of RAM and 2 quad-core CPUs, 8 cores in total (and the disks are local).
I'm using Python scripts to generate load on OpenLDAP. First I fill in the required data (10 GB), and then do some transaction processing (read/update/write).
The filling part goes without problems, but during transaction processing slapd regularly gets stuck. I'm only able to trigger this using more than one connection, simulating a couple of clients, and high load (1-2 req/sec). Complete traces from gdb when this happens are at http://free-zg.t-com.hr/HrvojeHabjanic/hang2.log .
So, am I doing something wrong, or is OpenLDAP...?
Looks like your glibc malloc is deadlocked. A Centos bug, not an OpenLDAP bug.
In the trace, you could confirm this in gdb with:
    thread 13
    frame 3
    print *mutex
Most likely the "owner" field of this mutex will be 1502, which corresponds to thread 17, which is waiting for a lock inside libc malloc/free.
You may be able to avoid this bug by using an alternate malloc library, such as Google tcmalloc.