Aaron Richton wrote:
> On Sat, 2 May 2009, Howard Chu wrote:
>
>> Fixed in HEAD. You should probably sync up with cache.c too; the tests Quanah
>> and I have been running seem to like this better.
>
> 2.4.16 + synced back-bdb from head. NULL strncmp:
There's no strncmp in this trace. This is a different symptom than before.
>
> t@14 (l@14) terminated by signal SEGV (no mapping at the fault address)
> Current function is avl_insert
> 125 cmp = fcmp( data, p->avl_data )> 0;
> (dbx) where
> current thread: t@14
> =>[1] avl_insert(root = 0x12c9e52c0, data = 0x12cd3e0f0, fcmp = 0x1001a1e90 =&`slapd`cache.c`bdb_rdn_cmp(const void *v_e1, const void *v_e2), fdup = 0x100273f00 =&avl_dup_error(void *left, void *right)), line 125 in "avl.c"
> [2] hdb_cache_find_parent(op = 0x131b07390, txn = 0x132f36090, id = 12937U, res = 0xffffffff45e7e848), line 595 in "cache.c"
> [3] hdb_cache_find_id(op = 0x131b07390, tid = 0x132f36090, id = 12937U, eip = 0xffffffff45e7e848, flag = 0, lock = 0xffffffff45e7e7f8), line 906 in "cache.c"
> [4] hdb_search(op = 0x131b07390, rs = 0xffffffff45fff998), line 706 in "search.c"
> [5] glue_sub_search(op = 0x131b07390, rs = 0xffffffff45fff998, b0 = 0xffffffff45ffeda8, on = 0x1106e0dd0), line 342 in "backglue.c"
> [6] glue_op_search(op = 0x131b07390, rs = 0xffffffff45fff998), line 465 in "backglue.c"
> [7] overlay_op_walk(op = 0x131b07390, rs = 0xffffffff45fff998, which = op_search, oi = 0x1106e1010, on = 0x1106e0dd0), line 659 in "backover.c"
> [8] over_op_func(op = 0x131b07390, rs = 0xffffffff45fff998, which = op_search), line 721 in "backover.c"
> [9] over_op_search(op = 0x131b07390, rs = 0xffffffff45fff998), line 743 in "backover.c"
> [10] fe_op_search(op = 0x131b07390, rs = 0xffffffff45fff998), line 366 in "search.c"
> [11] overlay_op_walk(op = 0x131b07390, rs = 0xffffffff45fff998, which = op_search, oi = 0x1106e16d0, on = (nil)), line 669 in "backover.c"
> [12] over_op_func(op = 0x131b07390, rs = 0xffffffff45fff998, which = op_search), line 721 in "backover.c"
> [13] over_op_search(op = 0x131b07390, rs = 0xffffffff45fff998), line 743 in "backover.c"
> [14] do_search(op = 0x131b07390, rs = 0xffffffff45fff998), line 217 in "search.c"
> [15] connection_operation(ctx = 0xffffffff45fffc20, arg_v = 0x131b07390), line 1097 in "connection.c"
> [16] connection_read_thread(ctx = 0xffffffff45fffc20, argv = 0xbf), line 1223 in "connection.c"
> [17] ldap_int_thread_pool_wrapper(xpool = 0x11062b6a0), line 663 in "tpool.c"
>
> data is null, p is null.
And the trace shows data = 0x12cd3e0f0. What are you looking at?
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
On Fri, May 01, 2009 at 03:00:26PM -0700, Howard Chu wrote:
> jwm(a)horde.net wrote:
>> Full_Name: John Morrissey
>> Version: 2.4.16
>> OS: Linux
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (2001:4978:194:0:21f:5bff:fee9:da92)
>>
>> After a couple days of uptime, slapd no longer responds to incoming
>> connections (the connection would be accepted, but all LDAP operations
>> would block). All worker threads seem to be blocking on mutex acquisition
>> in bdb_cache_lru_link(). One thread was chewing lots of CPU.
>>
>> Backtrace is below. I also have a ~1.7GB core if it's deemed useful; I'll
>> keep it around for a week or two. This is with BDB 4.7.25+all three
>> patches.
>
> Interesting trace, it looks like all the active threads are waiting for
> the mutex but apparently none of them owns it. Can you please provide the
> contents of the mutex? e.g.
> thread 14
> frame 3
> print *mutex
(gdb) fra 3
#3 0xb7eec1cd in ldap_pvt_thread_mutex_lock (mutex=0x940a2cc)
at /tmp/buildd/openldap-2.4.16/libraries/libldap_r/thr_posix.c:296
296 return ERRVAL( pthread_mutex_lock( mutex ) );
(gdb) print *mutex
$1 = {__data = {__lock = 2, __count = 0, __owner = 6372, __kind = 0,
__nusers = 1, {__spins = 0, __list = {__next = 0x0}}},
__size = "\002\000\000\000\000\000\000\000\344\030\000\000\000\000\000\000\001\000\000\000\000\000\000", __align = 2}
LWP 6372 is the thread trying to do BDB lock promotion.
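The owner field can be cross-checked against the raw bytes of the mutex. The following is only a sketch: it assumes a 32-bit little-endian build in which the first five members (__lock, __count, __owner, __kind, __nusers) are 4-byte ints in that order, and it rebuilds the byte image from the field values printed by gdb rather than reading the core. The names decode_mutex, FIELDS and raw are made up for illustration.

```python
import struct

# Field order taken from the gdb dump; the 32-bit little-endian layout
# (five consecutive 4-byte ints) is an assumption about this build.
FIELDS = ("lock", "count", "owner", "kind", "nusers")

def decode_mutex(raw: bytes) -> dict:
    """Unpack the first 20 bytes of a pthread mutex image into named ints."""
    values = struct.unpack_from("<5i", raw, 0)
    return dict(zip(FIELDS, values))

# Byte image rebuilt from the printed field values (hypothetical input,
# not copied from the core file), padded with 4 bytes for the trailing union.
raw = struct.pack("<5i", 2, 0, 6372, 0, 1) + b"\x00" * 4

mutex = decode_mutex(raw)
print(mutex["owner"])    # 6372, the LWP id recorded as holder
print(raw[8:10].hex())   # e418, i.e. octal \344\030 in gdb's string form
```

If that layout holds, bytes 8 and 9 of __size should read \344\030, which matches an owner of 6372 (0x18e4). With glibc's futex-based mutexes a __lock value of 2 usually means "locked, with waiters", which would be consistent with the other threads queueing behind LWP 6372.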
john
--
John Morrissey _o /\ ---- __o
jwm(a)horde.net _-< \_ / \ ---- < \,
www.horde.net/ __(_)/_(_)________/ \_______(_) /_(_)__
hyc(a)symas.com wrote:
> jwm(a)horde.net wrote:
>> Full_Name: John Morrissey
>> Version: 2.4.16
>> OS: Linux
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (2001:4978:194:0:21f:5bff:fee9:da92)
>>
>>
>> After a couple days of uptime, slapd no longer responds to incoming connections
>> (the connection would be accepted, but all LDAP operations would block). All
>> worker threads seem to be blocking on mutex acquisition in bdb_cache_lru_link().
>> One thread was chewing lots of CPU.
>>
>> Backtrace is below. I also have a ~1.7GB core if it's deemed useful; I'll keep
>> it around for a week or two. This is with BDB 4.7.25+all three patches.
>>
> Interesting trace, it looks like all the active threads are waiting for the
> mutex but apparently none of them owns it. Can you please provide the contents
> of the mutex? e.g.
> thread 14
> frame 3
> print *mutex
Ah, I missed this before: your thread 3 is inside a BerkeleyDB lock function.
There's nothing useful in the trace for thread 3, though. It seems you may need
to recompile BerkeleyDB with debugging enabled (and with
-fno-omit-frame-pointer) to get a useful trace from this. This is looking more
like a BDB locking issue than an OpenLDAP issue. If you still have the
environment, the output of db_stat -CA would be helpful.
--
-- Howard Chu
jwm(a)horde.net wrote:
> Full_Name: John Morrissey
> Version: 2.4.16
> OS: Linux
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (2001:4978:194:0:21f:5bff:fee9:da92)
>
>
> After a couple days of uptime, slapd no longer responds to incoming connections
> (the connection would be accepted, but all LDAP operations would block). All
> worker threads seem to be blocking on mutex acquisition in bdb_cache_lru_link().
> One thread was chewing lots of CPU.
>
> Backtrace is below. I also have a ~1.7GB core if it's deemed useful; I'll keep
> it around for a week or two. This is with BDB 4.7.25+all three patches.
>
Interesting trace, it looks like all the active threads are waiting for the
mutex but apparently none of them owns it. Can you please provide the contents
of the mutex? e.g.
thread 14
frame 3
print *mutex
--
-- Howard Chu
--On Friday, May 01, 2009 6:18 PM +0200 masarati(a)aero.polimi.it wrote:
>> --On Thursday, April 30, 2009 7:34 PM +0000 quanah(a)zimbra.com wrote:
>>
>>> Full_Name: Quanah Gibson-Mount
>>> Version: 2.4.16
>>> OS: Linux 2.6
>>> URL: ftp://ftp.openldap.org/incoming/
>>> Submission from: (NULL) (75.111.29.239)
>>
>> Cache notes for this ITS and ITS#6086:
>>
>> # Entries to cache in memory
>> cachesize 200000
>>
>> # IDL Entries to cache in memory
>> idlcachesize 200000
>>
>> # Entries to free up when cache gets full
>> cachefree 5000
>
> Are you using slapo-rwm? Can you post the configuration?
No, slapo-rwm is not in use. It's using back-hdb, slapo-valsort, dynlist,
and back-monitor.
include /etc/ldap/schema/core.schema
include /etc/ldap/schema/cosine.schema
include /etc/ldap/schema/dyngroup.schema
include /etc/ldap/schema/krb5-kdc.schema
include /etc/ldap/schema/inetorgperson.schema
include /etc/ldap/schema/misc.schema
include /etc/ldap/schema/nis.schema
include /etc/ldap/schema/eduperson.schema
include /etc/ldap/schema/stanford-oids.schema
include /etc/ldap/schema/suacct.schema
include /etc/ldap/schema/superson.schema
include /etc/ldap/schema/suapplication.schema
include /etc/ldap/schema/suorg.schema
include /etc/ldap/schema/eduorg.schema
include /etc/ldap/schema/suworkgroup.schema
allow bind_v2
TLSCertificateFile /etc/ssl/certs/server.pem
TLSCertificateKeyFile /etc/ssl/private/server.key
TLSCACertificateFile /etc/ssl/certs/comodo-entrust-2012.pem
include /etc/ldap/slapd.acl.global
pidfile /var/run/slapd.pid
argsfile /var/run/slapd.args
defaultsearchbase "dc=stanford,dc=edu"
gentlehup off
loglevel stats
threads 8
tool-threads 2
sasl-realm stanford.edu
authz-policy both
authz-regexp uid=(.*)@ms.stanford.edu,cn=stanford.edu,cn=gssapi,cn=auth
    ldap:///cn=service-ms,cn=Applications,dc=stanford,dc=edu??sub?krb5PrincipalName=$1@MS.STANFORD.EDU
authz-regexp uid=(.*)/cgi,cn=stanford.edu,cn=gssapi,cn=auth
    ldap:///cn=cgi,cn=applications,dc=stanford,dc=edu??sub?krb5PrincipalName=$1/cgi@stanford.edu
authz-regexp uid=service/(.*),cn=stanford.edu,cn=gssapi,cn=auth
    ldap:///cn=Service,cn=Applications,dc=stanford,dc=edu??sub?krb5PrincipalName=service/$1@stanford.edu
authz-regexp uid=webauth/(.*),cn=stanford.edu,cn=gssapi,cn=auth
    ldap:///cn=Webauth,cn=Applications,dc=stanford,dc=edu??sub?krb5PrincipalName=webauth/$1@stanford.edu
authz-regexp uid=(.*),cn=stanford.edu,cn=gssapi,cn=auth
    ldap:///uid=$1,cn=Accounts,dc=stanford,dc=edu??sub?suSeasStatus=active
reverse-lookup off
modulepath /usr/lib/ldap
moduleload back_hdb.la
moduleload back_monitor.la
moduleload valsort.la
moduleload dynlist.la
database hdb
suffix "dc=stanford,dc=edu"
rootdn "cn=manager,dc=stanford,dc=edu"
include /etc/ldap/slapd.acl.stanford
sizelimit 500
conn_max_pending_auth 2000
lastmod on
syncrepl rid=0
    provider=ldap://ldap-devmaster.stanford.edu:389
    bindmethod=sasl
    saslmech=gssapi
    realm=stanford.edu
    searchbase="dc=stanford,dc=edu"
    logbase="cn=accesslog"
    logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
    schemachecking=on
    type=refreshAndPersist
    retry="60 +"
    syncdata=accesslog
updateref ldap://ldap-devmaster.stanford.edu
directory /var/lib/ldap
shm_key 1
dbconfig set_cachesize 1 805306368 2
dbconfig set_lg_regionmax 262144
dbconfig set_lg_bsize 2097152
dbconfig set_lg_dir /var/log/bdb
dbconfig set_lk_max_locks 6000
dbconfig set_lk_max_objects 3000
dbconfig set_lk_max_lockers 3000
dbconfig set_flags DB_LOG_AUTOREMOVE
checkpoint 1024 5
cachesize 50000
idlcachesize 50000
cachefree 1000
index_substr_any_len 3
index default eq
index cn eq,sub
index dc
index displayName eq,sub
index entryUUID
index givenName eq,sub
index homePhone eq,sub
index krb5PrincipalName
index mail eq,sub
index member pres,eq
index mobile eq,sub
index modifyTimestamp
index o
index objectClass
index pager eq,sub
index sn eq,sub,approx
... tons of indices ...
overlay valsort
valsort-attr suOrgContactStanford cn=organizations,dc=stanford,dc=edu weighted
valsort-attr suOrgContactWorld cn=organizations,dc=stanford,dc=edu weighted
overlay dynlist
dynlist-attrset groupOfURLS memberURL member
database monitor
access to dn.subtree="cn=monitor"
    by * read
sizelimit 500
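For reference, the "dbconfig set_cachesize 1 805306368 2" line in the configuration can be turned into a byte count. This is a rough sketch that assumes the usual Berkeley DB meaning of the three arguments (gigabytes, additional bytes, number of cache regions); the helper name bdb_cache_bytes is made up for illustration and is not part of slapd or BDB.

```python
def bdb_cache_bytes(gbytes: int, nbytes: int, ncache: int):
    """Return (total bytes, bytes per region) for a set_cachesize triple.

    Assumes the Berkeley DB convention: total = gbytes GiB + nbytes,
    divided evenly across ncache regions.
    """
    total = gbytes * 2**30 + nbytes
    return total, total // ncache

# Values copied from the dbconfig line in the configuration.
total, per_region = bdb_cache_bytes(1, 805306368, 2)
print(total)            # 1879048192
print(total / 2**30)    # 1.75 (GiB)
print(per_region)       # 939524096
```

Under that reading the BDB cache is about 1.75 GiB, split into two regions of roughly 896 MiB each. This is separate from the slapd entry cache (cachesize 50000) and IDL cache (idlcachesize 50000), which are counted in entries rather than bytes.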
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
> --On Thursday, April 30, 2009 7:34 PM +0000 quanah(a)zimbra.com wrote:
>
>> Full_Name: Quanah Gibson-Mount
>> Version: 2.4.16
>> OS: Linux 2.6
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (75.111.29.239)
>
> Cache notes for this ITS and ITS#6086:
>
> # Entries to cache in memory
> cachesize 200000
>
> # IDL Entries to cache in memory
> idlcachesize 200000
>
> # Entries to free up when cache gets full
> cachefree 5000
Are you using slapo-rwm? Can you post the configuration?
p.