Леонид Юрьев wrote:
> This is the ENOTRECOVERABLE error (/usr/include/asm-generic/errno.h),
> i.e. the robust mutex is in an unrecoverable/corrupted state (perhaps
> due to an LMDB bug).

Good catch, thanks.
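For anyone wanting to confirm the number on their own machine, here is a
minimal sketch, assuming a Linux/glibc system where the asm-generic errno
values apply. The helper names is_not_recoverable and format_error are
made up for this example; they are not part of LMDB or libc. The output
format only mimics the "error N (text)" shape of the mdb_copy message.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: nonzero if `code` is this platform's
 * ENOTRECOVERABLE errno value. */
int is_not_recoverable(int code)
{
    return code == ENOTRECOVERABLE;
}

/* Hypothetical helper: format "error N (text)" into buf using the
 * C library's strerror(). Returns snprintf()'s result. */
int format_error(char *buf, size_t len, int code)
{
    return snprintf(buf, len, "error %d (%s)", code, strerror(code));
}
```

Calling format_error(buf, sizeof buf, 131) and printing buf shows what the
local C library thinks error 131 means.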
There are no instances of mutex usage in LMDB that don't check for EOWNERDEAD and
properly recover.
Note the info about robust mutexes:
    PTHREAD_MUTEX_ROBUST
        If a mutex is initialized with the PTHREAD_MUTEX_ROBUST attribute
        and its owner dies without unlocking it, any future attempts to
        call pthread_mutex_lock(3) on this mutex will succeed and return
        EOWNERDEAD to indicate that the original owner no longer exists
        and the mutex is in an inconsistent state. Usually after
        EOWNERDEAD is returned, the next owner should call
        pthread_mutex_consistent(3) on the acquired mutex to make it
        consistent again before using it any further.

        If the next owner unlocks the mutex using pthread_mutex_unlock(3)
        before making it consistent, the mutex will be permanently
        unusable and any subsequent attempts to lock it using
        pthread_mutex_lock(3) will fail with the error ENOTRECOVERABLE.
        The only permitted operation on such a mutex is
        pthread_mutex_destroy(3).

        If the next owner terminates before calling
        pthread_mutex_consistent(3), further pthread_mutex_lock(3)
        operations on this mutex will still return EOWNERDEAD.
The only way for the mutex to become unrecoverable is by calling
pthread_mutex_unlock() on it before calling pthread_mutex_consistent(),
and LMDB will never do that. If the process dies before calling
pthread_mutex_consistent(), the mutex remains in the EOWNERDEAD state.
LMDB never breaks this mutex protocol, so something else in the system
is broken.
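To make the protocol concrete, here is a rough C sketch of both paths.
This is not LMDB's code. The helper names (make_orphaned_mutex,
lock_with_recovery, demo_recover, demo_break) are invented for this
example, and a thread that exits while holding the lock stands in for a
process that died. It assumes a POSIX.1-2008 system and a build along
the lines of "cc -pthread".

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Thread body: take the lock and exit without unlocking it,
 * simulating an owner that died while holding the mutex. */
static void *die_holding_lock(void *arg)
{
    pthread_mutex_lock((pthread_mutex_t *)arg);
    return NULL;
}

/* Hypothetical helper: initialise a robust mutex, then let a
 * short-lived thread die while owning it. Returns 0 on success. */
static int make_orphaned_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    pthread_t t;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_create(&t, NULL, die_holding_lock, m);
    if (rc != 0) {
        pthread_mutex_destroy(m);
        return rc;
    }
    return pthread_join(t, NULL);
}

/* Hypothetical helper: lock, and if the previous owner died, mark the
 * mutex consistent again before anyone unlocks it. */
static int lock_with_recovery(pthread_mutex_t *m)
{
    int rc = pthread_mutex_lock(m);

    if (rc == EOWNERDEAD) {
        /* A real program would repair the protected data here. */
        rc = pthread_mutex_consistent(m);
    }
    return rc;
}

/* Correct path: recover, unlock, and lock again. Returns the result
 * of the second lock attempt. */
int demo_recover(void)
{
    pthread_mutex_t m;
    int rc = make_orphaned_mutex(&m);

    if (rc != 0)
        return rc;
    rc = lock_with_recovery(&m);
    if (rc == 0) {
        pthread_mutex_unlock(&m);
        rc = pthread_mutex_lock(&m);
        if (rc == 0)
            pthread_mutex_unlock(&m);
    }
    pthread_mutex_destroy(&m);
    return rc;
}

/* Broken path: unlock after EOWNERDEAD without calling
 * pthread_mutex_consistent(). Returns the result of the next lock. */
int demo_break(void)
{
    pthread_mutex_t m;
    int rc = make_orphaned_mutex(&m);

    if (rc != 0)
        return rc;
    rc = pthread_mutex_lock(&m);   /* reports the dead owner */
    if (rc == 0 || rc == EOWNERDEAD) {
        pthread_mutex_unlock(&m);  /* protocol violation */
        rc = pthread_mutex_lock(&m);
        if (rc == 0)
            pthread_mutex_unlock(&m);
    }
    pthread_mutex_destroy(&m);
    return rc;
}
```

Under those assumptions, demo_recover() should return 0 and demo_break()
should return ENOTRECOVERABLE, which matches the man page text: only the
unlock-before-consistent sequence makes the mutex permanently unusable.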
Regards,
Leonid.
On Fri, May 29, 2020 at 9:19 PM Howard Chu <hyc(a)symas.com> wrote:
>
> James Anderson wrote:
>> good evening;
>>
>> i am looking for an explanation for a situation which we encountered with an lmdb database; the library version is 0.9.17-3.
>> the database's condition was such that all attempts to open it for reading failed.
>> in at least some cases the error appears to have occurred during the operation which looked for stale readers.
>> a problem was also evident when attempting to copy the database:
>>
>> @nl12:~# mdb_copy /srv/dydra/catalog/repositories/d2141030-9495-c040-b1a7-9e19edbeb491/ /srv/dydra/backups/public-data__rev
>> mdb_copy: copying failed, error 131 (State not recoverable)
>
> 131 is not an LMDB error code. Most likely your underlying storage system failed.
>
> --
> -- Howard Chu
> CTO, Symas Corp.           http://www.symas.com
> Director, Highland Sun     http://highlandsun.com/hyc/
> Chief Architect, OpenLDAP  http://www.openldap.org/project/
--
-- Howard Chu
CTO, Symas Corp.           http://www.symas.com
Director, Highland Sun     http://highlandsun.com/hyc/
Chief Architect, OpenLDAP  http://www.openldap.org/project/