Леонид Юрьев wrote:
This is the ENOTRECOVERABLE error (/usr/include/asm-generic/errno.h), i.e.
Good catch, thanks.
the robust mutex is in an unrecoverable/corrupted state (perhaps due to an LMDB bug).
There are no instances of mutex usage in LMDB that don't check for EOWNERDEAD and properly recover.
Note the info about robust mutexes:
PTHREAD_MUTEX_ROBUST If a mutex is initialized with the PTHREAD_MUTEX_ROBUST attribute and its owner dies without unlocking it, any future attempts to call pthread_mutex_lock(3) on this mutex will succeed and return EOWNERDEAD to indicate that the original owner no longer exists and the mutex is in an inconsistent state. Usually after EOWNERDEAD is returned, the next owner should call pthread_mutex_consistent(3) on the acquired mutex to make it consistent again before using it any further.
If the next owner unlocks the mutex using pthread_mutex_unlock(3) before making it consistent, the mutex will be permanently unusable and any subsequent attempts to lock it using pthread_mutex_lock(3) will fail with the error ENOTRECOVERABLE. The only permitted operation on such a mutex is pthread_mutex_destroy(3).
If the next owner terminates before calling pthread_mutex_consistent(3), further pthread_mutex_lock(3) operations on this mutex will still return EOWNERDEAD.
The only way for the mutex to become unrecoverable is by calling pthread_mutex_unlock() on it before calling pthread_mutex_consistent(), and LMDB will never do that. If the process dies before calling pthread_mutex_consistent(), the mutex remains in the EOWNERDEAD state. LMDB never breaks this mutex protocol, so something else in the system is broken.
Regards, Leonid.
On Fri, May 29, 2020 at 9:19 PM Howard Chu hyc@symas.com wrote:
James Anderson wrote:
good evening;
i am looking for an explanation for a situation which we encountered with an lmdb database; the library version is 0.9.17-3. the database's condition was such that all attempts to open it for reading failed. in at least some cases the error appears to have occurred during the operation which looks for stale readers. a problem was also evident when attempting to copy the database:
@nl12:~# mdb_copy /srv/dydra/catalog/repositories/d2141030-9495-c040-b1a7-9e19edbeb491/ /srv/dydra/backups/public-data__rev
mdb_copy: copying failed, error 131 (State not recoverable)
131 is not an LMDB error code. Most likely your underlying storage system failed.
--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/