Full_Name: Ryan Tandy
Version: 2.4, master
OS: Debian
URL:
Submission from: (NULL) (24.68.37.4)
Submitted by: ryan
Forwarding from a Debian bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=816294
Configure a BDB or HDB database with syncprov:
dn: olcDatabase={1}hdb,cn=config
objectClass: olcHdbConfig
olcDbDirectory: data
olcSuffix: dc=example,dc=com

dn: olcOverlay={0}syncprov,olcDatabase={1}hdb,cn=config
objectClass: olcSyncProvConfig
Perform some kind of modification to the database (so that a syncprov checkpoint is pending), then perform an online olcDbConfig change that reopens the database:
dn: olcDatabase={1}hdb,cn=config
changetype: modify
replace: olcDbConfig
olcDbConfig: set_cachesize 1 0 1
Reopening the database fails:
56e76e17 bdb(dc=example,dc=com): BDB4511 Error: closing the transaction region with active transactions
56e76e17 bdb_db_close: database "dc=example,dc=com": close failed: Invalid argument (22)
and slapd crashes shortly after, when it tries to syncprov_checkpoint while the database is already gone.
What appears to be happening is that "ctx" is different between bdb_reader_get and bdb_reader_flush in this case.
During a normal slapd startup and shutdown:
(gdb) thread apply all frame
Thread 1 (Thread 0x7ffff7fed700 (LWP 21570)):
#0  hdb_reader_get (op=0x7fffffffd8d0, env=0xa93fa0, txn=0x7fffffffd610) at cache.c:1666
1666    if ( !ctx ) {
(gdb) p ctx
$1 = (void *) 0x8b34a0 <ldap_int_main_thrctx>
[ ... killall slapd ... ]
(gdb) thread apply all frame
Thread 1 (Thread 0x7ffff7fed700 (LWP 21570)):
#0  hdb_reader_flush (env=0xa93fa0) at cache.c:1643
1643    if ( !ldap_pvt_thread_pool_getkey( ctx, env, &data, NULL ) ) {
(gdb) p ctx
$2 = (void *) 0x8b34a0 <ldap_int_main_thrctx>
In this case, the readers are cleared correctly.
Another startup, this time the hdb_db_close is triggered by performing an olcDbConfig change:
(gdb) thread apply all frame
Thread 1 (Thread 0x7ffff7fed700 (LWP 21624)):
#0  hdb_reader_get (op=0x7fffffffd8d0, env=0xa93fa0, txn=0x7fffffffd610) at cache.c:1666
1666    if ( !ctx ) {
(gdb) p ctx
$1 = (void *) 0x8b34a0 <ldap_int_main_thrctx>
[ ... ldapmodify ... ]
(gdb) thread apply all frame
Thread 3 (Thread 0x7ffff362e700 (LWP 21633)):
#0  hdb_reader_flush (env=0xa93fa0) at cache.c:1643
1643    if ( !ldap_pvt_thread_pool_getkey( ctx, env, &data, NULL ) ) {

Thread 2 (Thread 0x7ffff3e2f700 (LWP 21631)):
#0  0x00007ffff732d4d3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
84      ../sysdeps/unix/syscall-template.S: No such file or directory.

Thread 1 (Thread 0x7ffff7fed700 (LWP 21624)):
#0  0x00007ffff75f06dd in pthread_join (threadid=140737285125888, thread_return=0x0) at pthread_join.c:90
90      pthread_join.c: No such file or directory.
(gdb) p ctx
$2 = (void *) 0x7ffff362dbf0
This time we have a different ctx, so the readers are not cleared. Thus when we get to db->close there is still an active txn.
The comment "free up any keys used by the main thread" seems to assume bdb_reader_flush will be called on the main thread only.