Luke Kenneth Casson Leighton wrote:
On Thu, Sep 11, 2014 at 11:37 AM, Howard Chu <hyc(a)symas.com> wrote:
> Luke Kenneth Casson Leighton wrote:
>> hi all,
>> the infamous obscure error which people are seeing only very
>> infrequently is rearing its head at least 2 to 3 times per day in a
>> test lab where i work. this is however a secure environment so i
>> cannot post core-dumps or any details of the application.
>> given the restrictions, what information is needed and what approach
>> is needed to debug and fix this? luckily it's happening a lot so
>> there's the possibility of a regular iterative approach.
>> the operating system(s) have been ubuntu 12.04 and also 14.04, both
>> have resulted in this obscure bug. bizarrely, this bug occurs in a
>> *single process*. it's not even multi-processing. however
>> metasync=False, sync=False, map_async=True, readahead=False and
> Use the Source, Luke.
> MDB_BAD_RSLOT is returned in only one place in mdb.c, and the situation is very
> specific. It means you've tried to begin a new read txn on a thread that
> already has a read txn outstanding.
... but there aren't any threads... this is literally only one
process. there are no threads involved at all. the single process is
doing writes in a txn followed by reads in a separate txn.
Technically, a single process is also a single thread.
> The API docs are pretty clear that a
> thread may only have one txn at a time.
> You need to track down whatever is creating read txns in your code and make
> sure they're being properly committed or aborted.
this is from python, and all code is done using "with env.begin .... as
there are no exceptions occurring within any blocks, and even if they
were the "with" statement calls the __exit__ function which closes the
I can't comment on anything python is doing, but it sounds like it's missing a
so, all code is as expected, hence the reason for raising it here
because this is definitely not something that should be happening.
*thinks*... there is only one possible thing that i can think of, and
it's related to using cursors. i am not calling close or del on the
txn.cursor objects within the "with" block. could it be that python's
garbage collection is somehow collecting those txn.cursor objects at
random points, interacting in some way with the current read txn?
No idea. If you're using py-lmdb it sounds like we need David Wilson to chime
in here. In the C API there's no way a cursor could interfere with a txn; I have
no guess as to what the python code is doing.
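
For illustration only, the rule quoted earlier in this thread (a thread may
have just one read txn outstanding, and beginning a second one fails) can be
modelled with a toy sketch. This is not LMDB or py-lmdb code: ToyEnv,
ToyReadTxn and BadReaderSlot are hypothetical names, and the per-thread reader
slot is simulated with threading.local. It only shows why a read txn that is
never committed or aborted would make the next begin on that thread fail.

```python
import threading


class BadReaderSlot(Exception):
    """Hypothetical stand-in for the MDB_BAD_RSLOT error."""


class ToyEnv:
    """Toy model only: tracks one read-txn slot per thread."""

    def __init__(self):
        self._slot = threading.local()

    def begin_read(self):
        # Beginning a read txn while this thread already has one
        # outstanding is the situation described in the thread above.
        if getattr(self._slot, "busy", False):
            raise BadReaderSlot("read txn already outstanding on this thread")
        self._slot.busy = True
        return ToyReadTxn(self)

    def _release(self):
        self._slot.busy = False


class ToyReadTxn:
    def __init__(self, env):
        self._env = env
        self._open = True

    def abort(self):
        # Frees the slot; calling abort twice is harmless in this model.
        if self._open:
            self._open = False
            self._env._release()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and on exceptions, so the slot is always freed.
        self.abort()
        return False


def demo():
    env = ToyEnv()
    results = []
    # Sequential "with" blocks release the slot each time.
    with env.begin_read():
        results.append("first ok")
    with env.begin_read():
        results.append("second ok")
    # A txn that is never aborted keeps the slot busy.
    leaked = env.begin_read()
    try:
        env.begin_read()
    except BadReaderSlot:
        results.append("bad rslot")
    leaked.abort()
    with env.begin_read():
        results.append("third ok")
    return results
```

In this model the error can only appear if some read txn on the same thread
was begun and never released, which is why tracking down every place a read
txn is created is the suggested first step.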
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/