On Monday 20 October 2008 09:30:11 Buchan Milne wrote:
On Friday 17 October 2008 21:30:04 Quanah Gibson-Mount wrote:
--On Friday, October 17, 2008 8:28 PM +0200 Guillaume Rousse
Guillaume.Rousse@inria.fr wrote:
Quanah Gibson-Mount wrote:
--On Friday, October 17, 2008 4:22 PM +0200 Guillaume Rousse
Guillaume.Rousse@inria.fr wrote:
Since I upgraded one of my servers from 2.4.11 to 2.4.12, I'm facing heavy database issues:

[root@etoile ~]# slapcat -b dc=msr-inria,dc=inria,dc=fr
...
bdb(dc=msr-inria,dc=inria,dc=fr): pthread lock failed: Invalid argument
bdb(dc=msr-inria,dc=inria,dc=fr): PANIC: Invalid argument
bdb(dc=msr-inria,dc=inria,dc=fr): PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
bdb(dc=msr-inria,dc=inria,dc=fr): PANIC: fatal region error detected; run recovery
bdb_db_close: database "dc=msr-inria,dc=inria,dc=fr": close failed: DB_RUNRECOVERY: Fatal error, run database recovery (-30975)

Even importing a backup LDIF file on a fresh installation triggers the same problems.
I did some basic testing (details on the Mandriva bug report), and the tools leave the database in a bad state, but slapd itself does not. Running recovery (e.g. between slapcat runs) works. slapd startup will also recover the db.
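For anyone who wants to repeat that check, here is a rough sketch of the sequence described above: run slapcat twice, optionally with a recovery run in between, and look for the BDB panic markers from the output quoted earlier. It is only a sketch under assumptions: the helper names are made up for illustration, the recovery tool is assumed to be installed as plain `db_recover` (some packages rename it), and the database directory has to be supplied by the caller because it is not given in this thread. Recovery should only be run while nothing else has the database environment open.

```python
import subprocess

# Marker strings copied from the slapcat output quoted in this thread.
PANIC_MARKERS = ("DB_RUNRECOVERY", "PANIC: fatal region error detected")


def has_bdb_panic(tool_output):
    """Return True if the tool output contains one of the BDB panic markers."""
    return any(marker in tool_output for marker in PANIC_MARKERS)


def run_tool(argv):
    """Run a command and return its stdout and stderr combined as text."""
    proc = subprocess.run(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return proc.stdout


def slapcat_pair(suffix, db_home, recover_between, run=run_tool):
    """Run slapcat twice and report, per run, whether a panic was seen.

    db_home is the BDB environment directory (hypothetical here; use the
    `directory` configured for the database). `run` can be swapped for a
    fake when trying the logic without an LDAP server.
    """
    results = [has_bdb_panic(run(["slapcat", "-b", suffix]))]
    if recover_between:
        # Assumption: the recovery utility is available as `db_recover`.
        run(["db_recover", "-h", db_home])
    results.append(has_bdb_panic(run(["slapcat", "-b", suffix])))
    return results
```

Calling `slapcat_pair(suffix, db_home, False)` and then `slapcat_pair(suffix, db_home, True)` gives the two cases to compare; if the behaviour matches what I saw, only the run without recovery should report a panic on the second slapcat.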
I tested this problem on two different environments (Mandriva 2008.1, Mandriva cooker), and one user reported it against Mandriva 2009.0 (https://qa.mandriva.com/show_bug.cgi?id=45034). This seems to imply either an OpenLDAP or a packaging issue.
IMHO, OpenLDAP issue. Note, all tests pass with these binaries (and must have during the build anyway for the packages to have made it through the build system), but none of the tests test whether the tools will mess up a db ...
Should I report an ITS for this, or rather provide more information?
What options was BDB 4.6 compiled with?
The same package that was used for 2.4.11, which did not have *any* problems.
Does it have all the patches from Oracle?
It has 4.6.21.1, but not the 4.6.21.2 patch (4.6.21.3 is irrelevant of course).
Adding 4.6.21.2 and 4.6.21.3 doesn't improve matters.
According to the spec file, there is one Oracle and two Fedora patches applied:
http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/db46/current/SPECS/db46.spec?revision=293611&view=markup
The exact option list used is a bit more difficult to tell, given the usage of conditional build options, but it seems to be: --enable-shared --enable-static --enable-rpc --enable-cxx --disable-posixmutexes --with-mutex=x86/gcc-assembly (or --with-mutex=x86_64/gcc-assembly for x86_64).
build_asmmutex defaults to 0, so it's actually:
--enable-shared --enable-static --enable-rpc --enable-cxx --with-mutex=POSIX/pthreads/library
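Since the mutex choice is what the two option lists above disagree on, a small helper can pull that setting out of a configure line (for example one copied from a build log) so the lists can be compared directly. This is just an illustrative sketch; the function name is hypothetical and it only looks at the `--with-mutex=` argument.

```python
import shlex


def mutex_setting(configure_args):
    """Return the value of the last --with-mutex=... argument, or None."""
    value = None
    for arg in shlex.split(configure_args):
        if arg.startswith("--with-mutex="):
            # Keep everything after the first '='.
            value = arg.split("=", 1)[1]
    return value


# The option list as corrected above (build_asmmutex defaulting to 0).
actual = ("--enable-shared --enable-static --enable-rpc --enable-cxx "
          "--with-mutex=POSIX/pthreads/library")
print(mutex_setting(actual))
```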
I'd suggest rebuilding BDB with:
--enable-posixmutexes --with-mutex=POSIX/pthreads
That would improve performance, but it really shouldn't change whether the slap tools close the database cleanly.
and then rebuilding OpenLDAP against the new BDB build, and see if the problem persists.
I will enable the internal library copy option on a build of my OpenLDAP package, and build against both 4.6 and 4.7 ... but I suspect it will persist.
I built on RHEL5 with 4.6.21.3 with the db4internal option on my package, which builds db4 as follows:

../dist/configure --build=x86_64-redhat-linux --prefix=/usr \
  --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc \
  --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 \
  --libexecdir=/usr/sbin --localstatedir=/var --sharedstatedir=/usr/com \
  --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared \
  --disable-static --with-uniquename=_openldap_slapd24_mdv \
  --program-prefix=slapd2.4_ \
  --with-mutex=POSIX/pthreads/library
Exactly the same behaviour.
Will post the entire build log tomorrow, and maybe test on 4.7 as well.
Regards, Buchan