Well, I went ahead and built just the minimal tcmalloc (and upgraded it to 1.5 while I was at it), since that seemed to be something that needed fixing anyway.  I installed it on one of our replica servers; it ran for about 3 days, and then slapd died again this morning.

So next I'm gonna upgrade to 2.4.21.  I'll probably do it on just one of our replica servers for now.

1. A 2.4.21 replica should work fine with a 2.4.19 master, correct?
2. On the machine I upgrade, can I just stop slapd, upgrade, and start slapd again, or should I slapcat/slapadd?
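For anyone following along, here's the rough procedure I have in mind if a slapcat/slapadd reload turns out to be needed.  The paths, config file location, and init script name are assumptions for a typical RHEL 5 install; adjust for your build:

```
# Sketch only -- paths/service names are assumptions, adjust to your system.
service ldap stop                                    # init script name may differ
slapcat -f /etc/openldap/slapd.conf -l backup.ldif   # export the DB to LDIF first
# ... install the new OpenLDAP build ...
# If a reload is required, move the old BDB files aside (after verifying
# the LDIF backup), then reimport:
slapadd -f /etc/openldap/slapd.conf -l backup.ldif
chown -R ldap:ldap /var/lib/ldap                     # slapadd runs as root
service ldap start
```

Either way I'd keep the LDIF export around as a backup before touching anything.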

Thanks!

Quanah Gibson-Mount wrote:
--On Wednesday, January 20, 2010 8:39 AM -0600 "Bryan J. Maupin" 
<bmaupin@uta.edu> wrote:

We're running on RHEL 5.4, with Heimdal 1.2.1-3, OpenSSL 0.9.8k,
Cyrus-SASL 2.1.23, BDB 4.7.25 (with patches), libunwind 0.99 (for Google
tcmalloc), Google tcmalloc 1.3.

libunwind is not required for tcmalloc; you must be building it incorrectly.

1. Is there any useful information that can be obtained from these log
entries, or do we simply need to change to a more verbose log level and
wait for slapd to die again?
2. If we need to change our log level, what is a suggested level?  Right
now we're using "loglevel sync stats".  Would it be wise to change the
log level to -1 (any)?  These are production servers, and I imagine
that'd be a huge performance hit.
3. Also, we're logging asynchronously at the moment.  Should we disable
this while debugging?
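(For context, the settings being weighed would look roughly like this in slapd.conf -- a sketch, not a recommendation:

```
# current: replication traffic plus connection/operation stats
loglevel sync stats

# maximum verbosity while debugging; "any" is equivalent to -1
# and is a heavy performance hit on a busy production server
#loglevel any
```

)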

I would suggest you

(a) Upgrade to 2.4.21
(b) Fix your tcmalloc build
(c) If the problem still occurs, run slapd under gdb so you can get a 
backtrace of some kind.

Make sure your OpenLDAP build, etc, has debugging symbols.
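A rough sketch of what (c) looks like in practice -- the slapd path and flags here are assumptions for a typical source build, so adjust to yours:

```
# Run slapd in the foreground under gdb so a crash drops to the debugger
# instead of killing the daemon.
gdb /usr/local/libexec/slapd
(gdb) run -d 0 -h "ldap:///" -u ldap
# ... wait for the crash, then capture the backtrace ...
(gdb) bt full
```

Without debugging symbols in the slapd binary (and ideally BDB and tcmalloc too), the backtrace will be mostly useless, hence the note above.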

--Quanah

--

Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra ::  the leader in open source messaging and collaboration