hi all -- i have a problem with a 2-multi-master, 1-replica setup. my master servers' directories sync up and stay replicated without too many issues; however, when i start up the replica i get this message on the master that i'm sync'ing the replica from:

slap_sl_malloc of 138718824 bytes failed, using ch_malloc

and, of course, the slapd dies. this is 100% repeatable.
i noticed that this has been an issue in the past (it cropped up on the mailing list around december 2007) and was curious if it's a known issue or a misconfiguration or what.
i'm running 2.4.8 on linux 2.6.18.8-32bit-5-xenU.
thanks for any insight.
k.
"AR" == Aaron Richton richton@nbcs.rutgers.edu writes:
> slap_sl_malloc of 138718824 bytes failed, using ch_malloc
AR> Well, always start with the obvious: are you actually out of memory?
heh. that wouldn't have been funny. i'm not out of memory though; i'm showing there's ~1.6G (real memory) free at the time of the malloc call and the swap has never been touched.
thanks for the idea though.
k.
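For scale, the failed request can be set next to the free memory reported above. This is only arithmetic on the two figures quoted in the thread; reading "~1.6G" as 1.6 GiB is an assumption.

```python
# Figures quoted in the thread. Reading "~1.6G" as 1.6 GiB is an assumption.
REQUEST_BYTES = 138718824
FREE_BYTES = int(1.6 * 2**30)


def to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / 2**20


request_mib = to_mib(REQUEST_BYTES)
fraction_of_free = REQUEST_BYTES / FREE_BYTES

print(f"request: {request_mib:.1f} MiB")
print(f"share of free memory: {fraction_of_free:.1%}")
```

The request comes to roughly 132 MiB, well under a tenth of the free memory, which fits the report that the machine is not simply out of RAM.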
--On April 1, 2008 4:34:32 PM -0400 kevin montuori montuori@gmail.com wrote:
> i'm not out of memory though; i'm showing there's ~1.6G (real memory) free at the time of the malloc call and the swap has never been touched.
Are you running as root, or as a user though? If a user, does the user have memory limits?
--Quanah
--
Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc -------------------- Zimbra :: the leader in open source messaging and collaboration
"QG" == Quanah Gibson-Mount quanah@zimbra.com writes:
QG> Are you running as root, or as a user though? If a user, does the user have memory limits?
the user has no hard or soft memory limits, except for a soft 8MB stack limit.
thanks, k.
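One way to double-check the limits question is to read the soft and hard resource limits from inside a process. The sketch below uses Python's standard `resource` module; the choice of which limits to print is an assumption, and the numbers only describe slapd if the script is run as the same user and from the same environment that starts slapd.

```python
import resource

# Limits most relevant to a large allocation. Names missing on this
# platform are skipped rather than treated as errors.
LIMIT_NAMES = ("RLIMIT_AS", "RLIMIT_DATA", "RLIMIT_STACK")


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {name: (soft, hard)} for the limits this platform defines."""
    limits = {}
    for name in LIMIT_NAMES:
        rlimit = getattr(resource, name, None)
        if rlimit is None:
            continue
        soft, hard = resource.getrlimit(rlimit)
        limits[name] = (fmt(soft), fmt(hard))
    return limits


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
```

Values are in bytes, so a soft 8MB stack limit like the one mentioned above would appear as 8388608 under RLIMIT_STACK.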
kevin montuori wrote:
> the user has no hard or soft memory limits, except for a soft 8MB stack limit.
Is the issue repeatable? If it is, can you ask slapd to generate a core file, and provide a stack backtrace? See http://www.openldap.org/faq/data/cache/59.html for further instructions, and make sure you use an unstripped binary. In any case, keep the core and the slapd binary around, since we might need to ask you to print some values from the stack.
p.
Ing. Pierangelo Masarati OpenLDAP Core Team
SysNet s.r.l. via Dossi, 8 - 27100 Pavia - ITALIA http://www.sys-net.it --------------------------------------- Office: +39 02 23998309 Mobile: +39 333 4963172 Email: pierangelo.masarati@sys-net.it ---------------------------------------
"PM" == Pierangelo Masarati ando@sys-net.it writes:
PM> Is the issue repeatable? If it is, can you ask slapd to generate a core file, and provide a stack backtrace?
it is and absolutely. the results of (gdb) bt full can be found here:
http://homepage.mac.com/ignavusinfo/ldap-backtrace.txt
again, thanks for the help. k.
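For anyone wanting to capture the same kind of trace non-interactively, a minimal sketch follows. It assumes gdb is installed and that a core file already exists; the binary and core paths in the usage comment are hypothetical placeholders, not paths from this thread.

```python
import subprocess


def gdb_backtrace_cmd(binary, core):
    """Build a gdb command line that prints 'bt full' for a core, then exits."""
    return ["gdb", "-batch", "-ex", "bt full", binary, core]


def collect_backtrace(binary, core, out_path):
    """Run gdb in batch mode and write its output to out_path."""
    with open(out_path, "w") as out:
        result = subprocess.run(
            gdb_backtrace_cmd(binary, core),
            stdout=out,
            stderr=subprocess.STDOUT,
            check=False,
        )
    return result.returncode


# Hypothetical usage; adjust both paths to the local install:
# collect_backtrace("/path/to/slapd", "/path/to/core", "ldap-backtrace.txt")
```

As noted in the request above, the trace is only useful if the slapd binary is unstripped.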
kevin montuori wrote:
> it is and absolutely. the results of (gdb) bt full can be found here:
> http://homepage.mac.com/ignavusinfo/ldap-backtrace.txt
Mmmmh, this issue definitely looks like ITS#5437 and ITS#5444; it has nothing to do with the error message in the subject. Since I cannot tell whether the two issues are related, or you hit another, already pointed out issue, can you make sure you can repeat the malloc failure issue? I don't want to load you with unnecessary effort; if the real issue is the syncprov_done_ctrl()-related one, and the malloc failure is simply a symptom, then you probably don't need to do anything else. Please follow the discussion of the above mentioned ITS-es to find out if the erroneous behavior you see is related.
Thanks, p.
"PM" == Pierangelo Masarati ando@sys-net.it writes:
PM> Since I cannot tell whether the two issues are related, or you hit another, already pointed out issue, can you make sure you can repeat the malloc failure issue?
i can. that is to say, each time i start up the replica things chug along for a couple seconds and then i receive the malloc failed message followed by a segfault (both the error and segfault are on the master). i'm happy to share debug output from either the master or replica if that'd help.
PM> Please follow the discussion of the above mentioned ITS-es to find out if the erroneous behavior you see is related.
note that it's a little different than ITS#5444 in that i don't have to perform any action explicitly to cause the master to segfault. it's also, unlike what's mentioned in your followup to 5444, the provider that's crashing, not the consumer.
i'll keep an eye on the two tickets. if there's any further debug information i can provide or testing i could perform, please let me know.
k.
kevin montuori wrote:
> note that it's a little different than ITS#5444 in that i don't have to perform any action explicitly to cause the master to segfault. it's also, unlike what's mentioned in your followup to 5444, the provider that's crashing, not the consumer.
OK. What I believe is common with those ITSes is that the crash is the result of calling syncprov_done_ctrl() with a corrupted/non-initialized cookie.
p.
Pierangelo Masarati wrote:
> kevin montuori wrote:
>> it is and absolutely. the results of (gdb) bt full can be found here:
>> http://homepage.mac.com/ignavusinfo/ldap-backtrace.txt
>
> Mmmmh, this issue definitely looks like ITS#5437 and ITS#5444; it has nothing to do with the error message in subject.
Something doesn't make sense in this trace. In frame 10, "changed = 1" but that is an impossible value for this variable. It must be either 0 or SS_CHANGED (2). Since the code in syncprov.c line 2018 relies on the SS_CHANGED value to be set in order to initialize the cookie, and that isn't being set correctly, the syncprov_done_ctrl invocation breaks.
I suggest you try recompiling without optimization and see if the behavior changes. If it still crashes the same way, please post a new trace.
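The reasoning about the flag can be put as a toy model. This is not the syncprov.c code: only the value SS_CHANGED = 2 comes from the message above, while the helper name and the choice of a bit test are assumptions made for illustration. An equality test would give the same answers for the values 0, 1 and 2.

```python
SS_CHANGED = 2                    # value quoted in the message above
VALID_CHANGED = {0, SS_CHANGED}   # the only values the variable should hold


def check_changed(changed):
    """Return (is_valid, cookie_initialized) for a 'changed' value.

    Hypothetical helper: models the cookie as being set up only when
    SS_CHANGED is set, which is an assumption about the real check.
    """
    is_valid = changed in VALID_CHANGED
    cookie_initialized = bool(changed & SS_CHANGED)
    return is_valid, cookie_initialized


for value in (0, 1, 2):
    print(value, check_changed(value))
```

Under this model the value 1 seen in frame 10 is both outside the valid set and leaves the cookie uninitialized, which would match a later syncprov_done_ctrl call working on garbage.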
"HC" == Howard Chu hyc@symas.com writes:
HC> I suggest you try recompiling without optimization and see if the behavior changes. If it still crashes the same way, please post a new trace.
i've recompiled without the -O2 flags and find slapd does indeed exhibit the same behavior. there's a backtrace at:
http://homepage.mac.com/ignavusinfo/ldap-backtrace-no-optimization.txt
k.
kevin montuori wrote:
> i've recompiled without the -O2 flags and find slapd does indeed exhibit the same behavior. there's a backtrace at:
> http://homepage.mac.com/ignavusinfo/ldap-backtrace-no-optimization.txt
I believe this is now fixed in CVS HEAD. See the patch to syncprov.c rev 1.227.
openldap-software@openldap.org