luben karavelov wrote:
masarati@aero.polimi.it wrote:
Thanks for collecting this info. The valgrind output could be of some use, but unfortunately I don't have time right now to set up a working RDBMS and extensively debug things. I'll keep this on my todo list.
Please re-run valgrind with --num-callers=30 or more: in some cases the errors occur in functions too deeply nested to tell whether the issue is caused by garbage fed by slapd/back-sql or by errors inside the RDBMS/ODBC layers. The fact that valgrind systematically complains about internals of the RDBMS/ODBC reading past the end of memory chunks malloc'ed by slapd could be related to passing some non-NUL-terminated bervals that are then treated as strings. A longer call stack could help track down those occurrences. However, those issues should not be critical, since there are no invalid writes.
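To illustrate the suspected class of problem (a length-counted value handed to code that expects a C string), here is a minimal sketch. The struct demo_berval and the helper demo_bv_to_cstr() are hypothetical stand-ins written for this message; they are not the real lber.h definitions and not what back-sql actually does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a length-counted value; NOT the real lber.h definition. */
struct demo_berval {
    size_t bv_len;  /* number of valid bytes in bv_val */
    char  *bv_val;  /* may NOT be NUL-terminated */
};

/*
 * Return a freshly malloc'ed, NUL-terminated copy of bv, or NULL on
 * failure. The caller frees the result. Calling strlen() or strcpy()
 * directly on bv->bv_val would read past the end of the buffer whenever
 * the value is not terminated, which is the kind of invalid read
 * valgrind reports.
 */
char *demo_bv_to_cstr(const struct demo_berval *bv)
{
    char *s;

    if (bv == NULL || (bv->bv_val == NULL && bv->bv_len != 0))
        return NULL;

    s = malloc(bv->bv_len + 1);
    if (s == NULL)
        return NULL;

    if (bv->bv_len != 0)
        memcpy(s, bv->bv_val, bv->bv_len);
    s[bv->bv_len] = '\0';  /* explicit terminator */
    return s;
}
```

An invalid read of this kind usually goes unnoticed at runtime, which is consistent with valgrind flagging reads but no writes.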
Also, you should walk through the list of attributes being returned, to get a hint about whether back-sql is building a corrupted attribute list. Continuing your current gdb session, go to frame #5, refresh_merge() in pcache.c, and print *e->e_attrs, *e->e_attrs->a_desc, *e->e_attrs->a_vals[0]; then move to e->e_attrs->a_next and repeat the prints until the end of the list. The fact that you get a value of "a" equal to 0x500000000 looks definitely odd to me, as that attribute list should come from be_entry_get_rw(), which in turn should collect it from the local database. Unless valgrind reveals some oddity in back-sql, the behavior you notice should not depend on the specific remote database you're using, but rather on the local one.
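For reference, the walk described above amounts to following a singly linked list and printing the description and first value of each node. A rough sketch in C, using simplified stand-in types (demo_val, demo_attr) that only mimic the a_desc / a_vals / a_next layout and are not the real slapd structures:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins; NOT the real slapd Attribute / berval types. */
struct demo_val {
    size_t      bv_len;
    const char *bv_val;
};

struct demo_attr {
    const char            *a_desc;  /* attribute name, simplified */
    const struct demo_val *a_vals;  /* ends with an element whose bv_val is NULL */
    struct demo_attr      *a_next;  /* next attribute, NULL at end of list */
};

/* Print each attribute's address, name and first value; return the count. */
int demo_dump_attrs(const struct demo_attr *a, FILE *out)
{
    int n = 0;

    for (; a != NULL; a = a->a_next) {
        fprintf(out, "attr[%d] at %p: %s", n, (const void *)a,
                a->a_desc != NULL ? a->a_desc : "(null desc)");
        if (a->a_vals != NULL && a->a_vals[0].bv_val != NULL)
            fprintf(out, " = %.*s", (int)a->a_vals[0].bv_len,
                    a->a_vals[0].bv_val);
        fputc('\n', out);
        n++;
    }
    return n;
}
```

Printing the node address at each step is the useful part: a value such as 0x500000000 stands out at once as something that is not a plausible heap pointer. Note that this loop would crash on such a pointer, which is why stepping by hand in gdb is safer here.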
p.
Hello, tomorrow I will set up a pure SQL process and a pure pcache daemon that reads from the first over a Unix domain socket. That way it will be clear whether or not the crash is related to back-sql and the database drivers/ODBC manager.
Meanwhile, you can find the requested debugging session here: http://purgatory.spnet.net/~karavelov/attr_list/gdb-1
It seems that the "e" pointer is corrupted.
Good catch.
Tomorrow I will run it under valgrind with more stack frames, as requested.
Another check you could do fairly quickly is to zero out that "e" pointer before calling be_entry_get_rw() within refresh_merge().
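The idea behind that check, as a sketch: if the fetch function can return without storing anything through its output argument, an uninitialized pointer keeps whatever garbage was on the stack, while a zeroed one stays NULL and can be tested. The names below (demo_entry, demo_entry_get, demo_lookup) are hypothetical and only model that pattern; whether be_entry_get_rw() actually behaves this way on failure is an assumption to verify.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry type, for illustration only. */
struct demo_entry {
    int id;
};

/*
 * Hypothetical lookup: on success stores the entry in *out and returns 0;
 * on failure returns nonzero and leaves *out untouched.
 */
int demo_entry_get(struct demo_entry *table, size_t n, int id,
                   struct demo_entry **out)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (table[i].id == id) {
            *out = &table[i];
            return 0;
        }
    }
    return 1;
}

/* Return the matching entry, or NULL; never a stale or garbage pointer. */
struct demo_entry *demo_lookup(struct demo_entry *table, size_t n, int id)
{
    struct demo_entry *e = NULL;  /* zeroed before the call */
    int rc = demo_entry_get(table, n, id, &e);

    if (rc != 0 || e == NULL)
        return NULL;              /* failure is now detectable */
    return e;
}
```

If the crash disappears or turns into a clean NULL check failure once "e" is zeroed, that would point at the pointer never being set rather than being overwritten later.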
Thanks, p.