Summarizing some discussions from IRC...
The hardcoded limit on the size of the dirty page list in a transaction is a problem; there should be no limit on the effective size of a transaction.
The plan is to change LMDB's disk page format to include the txnID in the page header. This way, when the dirty page list gets full we can flush it to disk without losing track of which pages were dirtied. Then if a subsequent access in the same txn revisits one of these pages, when we read it back from the DB we'll know that it came from the current txn and doesn't need to be copied again before making further modifications.
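To make the flush-on-full idea concrete, here is a rough sketch in C. It is not LMDB code: the struct layouts, the tiny SKETCH_DIRTY_MAX limit, and the names sketch_mark_dirty and sketch_write_page are all made up for illustration, and the "write" is just a counter standing in for real I/O.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DIRTY_MAX 4          /* hypothetical, deliberately tiny limit */

/* Hypothetical page header carrying the proposed txnID field. */
typedef struct spill_page {
    uint64_t pgno;                  /* page number */
    uint64_t txnid;                 /* txn that last wrote this page */
} spill_page;

/* Hypothetical write transaction with a fixed-size dirty list. */
typedef struct spill_txn {
    uint64_t    txnid;
    spill_page *dirty[SKETCH_DIRTY_MAX];
    size_t      ndirty;             /* pages currently tracked in the list */
    size_t      nflushed;           /* pages written out before commit */
} spill_txn;

/* Stand-in for writing one page to the data file (assumption: no real I/O). */
static void sketch_write_page(spill_txn *txn, const spill_page *p)
{
    (void)p;
    txn->nflushed++;
}

/* Record a page as dirtied by this txn. When the list is full, flush it
 * and start over; the txnID stamped on each page still identifies it as
 * belonging to this txn after it has left the list. */
static void sketch_mark_dirty(spill_txn *txn, spill_page *p)
{
    if (txn->ndirty == SKETCH_DIRTY_MAX) {
        for (size_t i = 0; i < txn->ndirty; i++)
            sketch_write_page(txn, txn->dirty[i]);
        txn->ndirty = 0;
    }
    p->txnid = txn->txnid;
    txn->dirty[txn->ndirty++] = p;
}
```

The point of the sketch is only that the list can be emptied at any time without losing information, because ownership is recorded on the page itself rather than in the list.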
The P_DIRTY bit in the page header will no longer be needed - if the txnID matches, the page can be used directly. If not, the page is clean and a new page must be allocated before writing.
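As a minimal sketch of that check (again in C, with a made-up header layout and a hypothetical helper name, not LMDB's actual MDB_page or any real LMDB function), the P_DIRTY test would turn into a txnID comparison:

```c
#include <stdint.h>

/* Hypothetical on-disk page header with the proposed 8-byte txnID. */
typedef struct hdr_page {
    uint64_t pgno;      /* page number */
    uint64_t txnid;     /* txn that last wrote this page */
    uint16_t flags;     /* page type flags; no P_DIRTY bit needed */
} hdr_page;

/* Returns 1 if the page was written by the current txn and can be
 * modified in place, 0 if it is clean and must first be copied to a
 * newly allocated page. */
static int hdr_page_is_writable(const hdr_page *p, uint64_t cur_txnid)
{
    return p->txnid == cur_txnid;
}
```

A caller would run this test wherever it checks P_DIRTY today, and take the copy-on-write path when it returns 0.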
For WRITEMAP mode the dirty page list can be completely eliminated; the only reason we keep it now is to know which pages' P_DIRTY bit we need to clear at commit time.
Increasing the size of the page header by 8 bytes is a bit annoying, since it will require a full slapcat/slapadd reload of existing back-mdb databases. It would be nice if we could avoid this, but I don't see how.
Howard Chu wrote:
[...]
We went with an alternate approach that didn't require a disk format change. The code in mdb.master has been tested with a variety of large transactions and is working well.