Luke Kenneth Casson Leighton wrote:
On Thu, Sep 11, 2014 at 11:37 AM, Howard Chu hyc@symas.com wrote:
Luke Kenneth Casson Leighton wrote:
hi all,
the infamous obscure error (MDB_BAD_RSLOT) which people are seeing only very infrequently is rearing its head at least 2 to 3 times per day in a test lab where i work. this is however a secure environment, so i cannot post core dumps or any details of the application.
given the restrictions, what information is needed and what approach is needed to debug and fix this? luckily it's happening a lot so there's the possibility of a regular iterative approach.
the operating systems have been ubuntu 12.04 and also 14.04; both have shown this obscure bug. bizarrely, it occurs in a *single process*: there's not even any multi-processing. the environment is, however, opened with metasync=False, sync=False, map_async=True, readahead=False and writemap=True.
Use the Source, Luke.
:)
MDB_BAD_RSLOT is returned in only one place in mdb.c and the situation is very specific. It means you've tried to begin a new read txn on a thread that already has a read txn outstanding.
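For readers following along, the rule described above can be pictured with a small toy model. This is only a sketch of the behaviour as stated in this message, not LMDB's actual code: the names ToyEnv, ToyReadTxn and BadRslotError are made up for illustration, and the model assumes a txn is aborted on the same thread that began it.

```python
import threading


class BadRslotError(Exception):
    """Stand-in for MDB_BAD_RSLOT in this toy model (hypothetical name)."""


class ToyEnv:
    """Toy environment: each thread may hold at most one read txn."""

    def __init__(self):
        # Per-thread state, mimicking the idea of a per-thread reader slot.
        self._local = threading.local()

    def begin_read(self):
        if getattr(self._local, "slot_in_use", False):
            # A read txn is still outstanding on this thread.
            raise BadRslotError("reader slot already in use on this thread")
        self._local.slot_in_use = True
        return ToyReadTxn(self)


class ToyReadTxn:
    def __init__(self, env):
        self._env = env
        self._open = True

    def abort(self):
        # Release the slot so the same thread can begin another read txn.
        if self._open:
            self._env._local.slot_in_use = False
            self._open = False


env = ToyEnv()
first = env.begin_read()
try:
    env.begin_read()          # second read txn while the first is open
except BadRslotError as exc:
    print("second begin failed:", exc)
first.abort()
second = env.begin_read()     # fine once the first txn has been finished
second.abort()
```

In this model, the error can only appear if an earlier read txn on the same thread was never committed or aborted, which is why a single-threaded process is not exempt.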
... but there aren't any threads... this is literally only one process. there are no threads involved at all. the single process is doing writes in a txn followed by reads in a separate txn.
Technically, a single process is also a single thread.
The API docs are pretty clear that a thread may only have one txn at a time.
You need to track down whatever is creating read txns in your code and make sure they're being properly committed or aborted.
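One way to do that tracking from the Python side, without exposing anything about the application, might be a thin wrapper that records where each read txn was begun. The sketch below is an assumption-laden illustration: ReadTxnTracker and FakeTxn are hypothetical names, `begin` is assumed to be a zero-argument callable returning a context manager (for py-lmdb that could be something like `lambda: env.begin()`), and the tracker assumes a single thread.

```python
import contextlib
import traceback


class ReadTxnTracker:
    """Records where each still-open read txn was begun (single-threaded)."""

    def __init__(self):
        self.open_sites = []  # one formatted call stack per outstanding txn

    @contextlib.contextmanager
    def read(self, begin):
        # `begin` is a hypothetical zero-argument callable that returns a
        # context manager yielding the transaction object.
        if self.open_sites:
            raise RuntimeError(
                "read txn already outstanding, begun at:\n"
                + self.open_sites[-1]
            )
        site = "".join(traceback.format_stack())
        self.open_sites.append(site)
        try:
            with begin() as txn:
                yield txn
        finally:
            # Runs on normal exit and on exceptions alike.
            self.open_sites.remove(site)


class FakeTxn:
    """Stand-in for a real transaction so the sketch runs on its own."""

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False


tracker = ReadTxnTracker()
with tracker.read(FakeTxn) as txn:
    print("outstanding read txns:", len(tracker.open_sites))
print("outstanding read txns:", len(tracker.open_sites))
```

If a second read txn is begun while one is still open, the RuntimeError carries the call stack of the first one, which points at the code that failed to finish it.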
this is from python, and all code is done using "with env.begin .... as txn:"
there are no exceptions occurring within any blocks, and even if there were, the "with" statement calls the __exit__ function, which closes the transaction.
I can't comment on anything python is doing, but it sounds like it's missing a step...
so, all code is as expected, hence the reason for raising it here because this is definitely not something that should be happening.
*thinks*... there is only one possible thing that i can think of, and it's related to using cursors. i am not calling close or del on the txn.cursor objects within the "with" block. could it be that python's garbage collection is somehow collecting those txn.cursor objects at random points, interacting in some way with the current read txn?
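The general mechanism behind that question can be shown without py-lmdb. In CPython an object is normally finalized as soon as its last reference goes away, but an object caught in a reference cycle is only finalized when the cycle collector runs, which can be at an arbitrary later point. The sketch below uses a stand-in class (FakeCursor is hypothetical and is not py-lmdb's Cursor); whether py-lmdb cursors actually end up in such cycles is not established here.

```python
import gc

finalized = []  # names of objects whose __del__ has run, in order


class FakeCursor:
    """Stand-in object with a finalizer; NOT py-lmdb's Cursor."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        finalized.append(self.name)


gc.disable()  # keep the cycle collector from running on its own here

plain = FakeCursor("plain")
del plain              # CPython: refcount reaches zero, finalized right away
print(finalized)

cyclic = FakeCursor("cyclic")
cyclic.self_ref = cyclic   # reference cycle: refcount never reaches zero
del cyclic
print(finalized)           # the cyclic object has not been finalized yet

gc.collect()               # cycle collector runs and finalizes it
print(finalized)

gc.enable()
```

Closing cursors explicitly inside the "with" block would take the garbage collector's timing out of the picture, which makes the hypothesis easy to test.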
No idea. If you're using py-lmdb it sounds like we need David Wilson to chime in here. In the C API there's no way a cursor could interfere with a txn, so I have no guess as to what the python code is doing.