Full_Name: Hallvard B Furuseth Version: master, d861910f55cb1beb5e443caa7e961ed760074352 OS: Linux x86_64 URL: Submission from: (NULL) (195.1.106.125) Submitted by: hallvard
Aborting an MDB process can break MDB if another MDB process is running, because then locks/reader table info are not reset.
The problems below go away when the last slap process terminates so the lockfile can be reset. Sometimes kill -KILL is needed.
==== lock.conf ==== include servers/slapd/schema/core.schema database mdb suffix o=hi directory lock.dir
(A) If a process holding a shared mutex dies, other processes' ops on the mutex can hang/fail:
rm -rf lock.dir; mkdir lock.dir servers/slapd/slapd -f lock.conf -h ldapi://ldapi -d0 & MDB_KILL_R=true servers/slapd/slapd -Tcat -f lock.conf ldapsearch -xLLH ldapi://ldapi -b o=hi l=x 1.1 ^C (kills ldapsearch which is waiting for results) kill %% (fails to kill slapd) kill -KILL %%
The Posix fix, robust mutexes, is outlined below. With _WIN32, check return code WAIT_ABANDONED. __APPLE__ Posix semaphores: Don't know. SysV semaphores have SEM_UNDO to cancel the process' effect if the process dies, but the peers do not seem to be informed that this happened.
if ((rc = pthread_mutexattr_init( &mattr )) || (rc = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED)) || (rc = pthread_mutexattr_setrobust( &mattr, PTHREAD_MUTEX_ROBUST )) || (rc = pthread_mutex_init(&mutex, &mattr ))) return rc; ... switch (pthread_mutex_lock(&mutex)) { case 0: break; case EOWNERDEAD: Repair the possibly half-updated data protected by <mutex>; if (successful repair) { pthread_mutex_consistent(&mutex); break; } /* With Posix mutexes, make future use of <mutex> return failure: */ pthread_mutex_unlock(&mutex); /* FALLTHRU */ default: return MDB_PANIC; } ... pthread_mutex_unlock(&mutex);
BTW, the above also happens with MDB_KILL_W=true slapcat followed by ldapadd, dunno why slapcat uses a txn without MDB_TXN_RDONLY. In this case, raise(SIGINT) is sufficient. With MDB_KILL_R, that does not kill slapd and we need SIGKILL - it catches SIGINT I guess. Don't know if that difference is intentional.
(B1) Repeated kill -KILL slap<tool> while slapd is running can use up the reader table since it does not clear its table slots, making the db unusable.
(B2) slapcat report no error after it fails to read the db due to full reader table, it exits with success status.
(B3) After one "kill -KILL slap<tool>" while slapd is running, I imagine the stale slot which does not go away can prevent freelist reuse so the db grows quickly. But I have not seen that.
bash$ rm -rf lock.dir; mkdir lock.dir bash$ servers/slapd/slapd -f lock.conf -h ldapi://ldapi -d0 & bash$ for (( i=0; i<130; i++)) { MDB_KILL_UR=true servers/slapd/slapd -Tcat -f lock.conf } > /dev/null bash$ servers/slapd/slapd -Tcat -f lock.conf -d-1 2>cat.out (success result, no output, no errors in the log) bash$ ldapsearch -xLLH ldapi://ldapi -b o=hi l=x 1.1 version: 1
[R..RLOCKED..R1] Other (e.g., implementation specific) error (80) Additional information: internal error bash$
Maybe it's easier to have a per-process reader table after all, and a shared table with just the oldest txn per process.
Or, each transaction chould have an exclusive file region lock. A writer could try to grab the oldest txn's region lock if that belongs to another process, and clear the txn it if successful. A reader, before giving up, would just search for another pid's txn whose lock it can grab.
Except since the system reuses pids, the ID would either have to be more exclusive (e.g. pid + mdb_env_open() time) or mdb_env_open must clear away reader table entries with its own pid.
Speaking of which, I don't see the use of the repeated getpid() calls. Should be enough to store the pid in the MDB_env. fcntl record locks do not survive fork(), so an MDB_env doesn't either. And the thread ID is unsed, and must be reset with memset, not 'mr_tid = 0' since pthread_t can be a struct.
(C) Mentioned earlier: Startup can fail due to a race condition, when another process starts MDB at the same time and then aborts: - Process 1 starts MDB, gets write lock in mdb_env_setup_locks(). - Process 2 starts MDB, awaits read lock since write lock is taken. - Process 1 dies. - Process 2 gets the read lock and proceeds, but with an uninitialized lockfile since it expects that process 1 initialized it.
Fix: An extra exclusive lock around the current lock setup code.
kill slapd process, if any rm -rf lock.dir; mkdir lock.dir MDB_RACE=true servers/slapd/slapd -Tcat -f lock.conf & sleep 1; \ servers/slapd/slapd -Tcat -f lock.conf
In addition, the above dumps core in mdb_env_close: for (i=0; i<env->me_txns->mti_numreaders; i++) where env->me_txns == (MDB_txninfo *) 0xffffffffffffffff.
(D) And of course ^Z when holding a mutex can block other processes' threads, but I suppose BDB has the same problem.
DB_SUSPEND=true servers/slapd/slapd -Tadd -f lock.conf < /dev/null servers/slapd/slapd -Tadd -f lock.conf < /dev/null (hanging...) ^C kill %%
The fix I can see for now is "don't do that" in the doc. MDB could in addition use pthread_mutex_timedlock() so slapd at least can keep running and can be killed without kill -KILL. Final fix = a lock-free library, when a useful portable one emerges.