Hi,
I have a system that fails to write to its LMDB-based database. An assertion is triggered in mdb_page_dirty(), called from within mdb_txn_commit(). Further investigation shows that one freelist entry contains a duplicate page:
~# mdb_stat -narrfff /home/skov/data/data.mdb
Reader Table Status
(no active readers)
  0 stale readers cleared.
(no active readers)
Freelist Status
  Tree depth: 1
  Branch pages: 0
  Leaf pages: 1
  Overflow pages: 0
  Entries: 28
    Transaction 128121111, 3 pages, maxspan 1
        1281
        1300
        1468
    Transaction 128121112, 6 pages, maxspan 1 [bad sequence]
        229
        229
        569
        1317
        1444
        1460
    Transaction 128121113, 5 pages, maxspan 1
    ...
What could trigger such a duplicate entry, and what could I try in order to find the root cause?
Is there a recommended way of detecting or handling such errors at runtime?
Thanks a lot for any pointers.
Best regards, Christian Wendt
Further investigation shows that the duplicate page 229 was previously a P_LEAF page holding freelist entries:
0e5000 e5 00 00 00 00 00 02 00 44 00 14 0c 14 0c dc 0f
...
0e5fd0 e1 03 00 00 38 01 00 00 ab 00 00 00 18 00 00 00
0e5fe0 00 00 04 00 fd f8 a2 07 05 00 00 00 cf 05 00 00
0e5ff0 b0 05 00 00 8b 03 00 00 81 03 00 00 eb 00 00 00
The entry starting at 0e5fdc, with data length 0x18 and key length 0x04, looks very much like a freelist entry with transaction number 0x07a2f8fd = 128121085 (which is just a few transactions before the freelist entries in the mdb_stat output above).
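To make the reading of that entry easy to re-check, here is a minimal Python sketch that decodes the 36 bytes starting at 0e5fdc. It assumes the node layout I believe LMDB 0.9.x uses on a 32-bit little-endian build (16-bit data size low/high, 16-bit flags, 16-bit key size, then key and data, with the data being a count followed by 32-bit page numbers). The function name decode_freelist_node is made up for this example.

```python
import struct

# The 36 bytes at file offset 0x0e5fdc, copied from the hexdump above.
NODE = bytes.fromhex(
    "18000000" "0000" "0400"   # data size (low, high), flags, key size
    "fdf8a207"                 # key: transaction number
    "05000000"                 # data: number of page IDs that follow
    "cf050000" "b0050000" "8b030000" "81030000" "eb000000"
)

def decode_freelist_node(buf):
    """Decode one leaf node as a freelist entry (assumed 32-bit LE layout)."""
    lo, hi, flags, ksize = struct.unpack_from("<HHHH", buf, 0)
    dsize = lo | (hi << 16)
    if ksize != 4 or dsize % 4 != 0:
        raise ValueError("does not look like a 32-bit freelist entry")
    (txnid,) = struct.unpack_from("<I", buf, 8)
    words = struct.unpack_from("<%dI" % (dsize // 4), buf, 8 + ksize)
    return txnid, words[0], list(words[1:])

txnid, count, pages = decode_freelist_node(NODE)
print(txnid, count, pages)
```

Under these assumptions the entry decodes to transaction 128121085 with 5 pages: 1487, 1456, 907, 897 and 235, stored in descending order.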
Christian Wendt wrote:
> Further investigation shows that the duplicate page 229 has been a P_LEAF page with freelist entries before:
>
> 0e5000 e5 00 00 00 00 00 02 00 44 00 14 0c 14 0c dc 0f
> ...
> 0e5fd0 e1 03 00 00 38 01 00 00 ab 00 00 00 18 00 00 00
> 0e5fe0 00 00 04 00 fd f8 a2 07 05 00 00 00 cf 05 00 00
> 0e5ff0 b0 05 00 00 8b 03 00 00 81 03 00 00 eb 00 00 00
>
> The entry starting at 0e5fdc with data length 0x18 and key length 0x04 looks very much like a freelist entry with transaction number 0x07a2f8fd = 128121085 (which is just a few transactions before the freelist entries in the mdb_stat output above).
Check if the freelist is intact in the backup meta page.
Using the other meta page (I patched the transaction number), mdb_stat shows that the freelist is already corrupted there as well; see below.
I have uploaded the broken mdb to http://cdn.skov.com/www/CWE/data.mdb_at_crash_time
This is a database opened with MDB_NOSUBDIR, from a 32-bit little-endian system.
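For anyone who wants to look at the two meta pages of the uploaded file directly, the following is a rough Python sketch. The offsets are my own reading of the MDB_meta layout in LMDB 0.9.x for a 32-bit little-endian build with 4096-byte pages (12-byte page header, mm_magic first, mm_txnid 76 bytes into the meta struct); treat them as assumptions and check them against mdb.c for your version. The helper names are made up for this example.

```python
import struct

MDB_MAGIC = 0xBEEFC0DE       # magic value stored in every LMDB meta page
PAGEHDRSZ_32 = 12            # assumed page header size on a 32-bit build
TXNID_OFFSET_32 = PAGEHDRSZ_32 + 76   # assumed offset of mm_txnid in the page

def meta_txnids(data, page_size=4096):
    """Return the transaction IDs stored in meta pages 0 and 1."""
    ids = []
    for pgno in (0, 1):
        base = pgno * page_size
        (magic,) = struct.unpack_from("<I", data, base + PAGEHDRSZ_32)
        if magic != MDB_MAGIC:
            raise ValueError("meta page %d: bad magic 0x%08x" % (pgno, magic))
        (txnid,) = struct.unpack_from("<I", data, base + TXNID_OFFSET_32)
        ids.append(txnid)
    return ids

def newest_meta(data, page_size=4096):
    """Index of the meta page with the higher txnid (page 0 wins a tie)."""
    ids = meta_txnids(data, page_size)
    return 1 if ids[0] < ids[1] else 0

# Usage: pass the first two pages of the file, for example
#   with open("data.mdb", "rb") as f:
#       print(meta_txnids(f.read(2 * 4096)))
```

Since LMDB uses the meta page with the higher transaction ID, lowering the ID in the newer page on a copy of the file makes the tools fall back to the older one.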
Reader Table Status
(no active readers)
  0 stale readers cleared.
(no active readers)
Freelist Status
  Tree depth: 1
  Branch pages: 0
  Leaf pages: 1
  Overflow pages: 0
  Entries: 28
    Transaction 128121110, 3 pages, maxspan 1
        762
        1038
        1452
    Transaction 128121111, 5 pages, maxspan 1
        48
        1157
        1281
        1300
        1468
    Transaction 128121112, 6 pages, maxspan 1 [bad sequence]
        229
        229
        569
        1317
        1444
        1460
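As far as I can tell, the [bad sequence] marker means that the page numbers of an entry are not strictly increasing in the order mdb_stat prints them. A minimal Python sketch of that check, applied to the page list of transaction 128121112 above, could look like this (find_bad_sequence is a hypothetical helper, not part of LMDB):

```python
def find_bad_sequence(pages):
    """Return the page numbers that break strictly ascending order.

    A page equal to its predecessor is a duplicate; a smaller one
    means the list is out of order.
    """
    bad = []
    for prev, cur in zip(pages, pages[1:]):
        if cur <= prev:
            bad.append(cur)
    return bad

# Page list of transaction 128121112 as printed by mdb_stat.
freed = [229, 229, 569, 1317, 1444, 1460]
print(find_bad_sequence(freed))
```

For the list above this reports page 229; the intact entries of the neighbouring transactions yield an empty list.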
openldap-technical@openldap.org