opensource@gmx-topmail.de wrote:
Full_Name: Markus Version: OS: URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (91.64.106.220)
We built a database with LMDB and are very pleased with it in general. The one area giving our users headaches regularly are bulk deletes. But some context first:
Like many others, we use a hierarchical key notation. Some simplified examples: "user/00001", "user/00002", "user/00003", ...; "usergroup/00001", "usergroup/00002", ....
Or, let's say data by timestamp: "sensordata/32489732", "sensordata/32489733", ...
Use case like "remove all users" or "remove all sensordata entries from time x to y" are usual. Those (transactional) operations call mdb_cursor_del() on each entry. This runs fine until we have LOTS of entries to delete. At a magnitude of 100,000s of entries, two things happen: 1) the time it takes to delete gets quite noticeable and 2) the 32 bit (MDB_VL32) will run into MDB_TXN_FULL (see also #8813).
Given the key structure, those operations operate on consecutive keys. Because of that, can we make this more efficient (and get rid MDB_TXN_FULL on the fly)?
A high number of individual deletes involves lots of page merging (and a lot of dirty pages). If we had a specialized "range delete" this could likely be avoided. The signature of the proposed functionality could look like this:
The number of pages dirtied will be the same, whether you batch or delete one by one. As such, this won't solve the MDB_TXN_FULL problem. Also, as of commit 7edf504106c61639a89b9a4e5987242598196932 this problem should have already been removed for MDB_VL32.
int mdb_cursor_del_range(MDB_cursor *mc, MDB_val *key_start, MDB_val *key_stop, unsigned int flags)
Given key_start and key_stop, this new function could decide to prune entire leaf pages, or even entire sub branches. This should result in a significant efficiency/performance gain. The challenge I'm seeing here is balancing the tree though.
Right, rebalancing would become quite a pain. A similar problem exists for PUT_MULTIPLE which is why it just internally loops over one item at a time.
What do you think? I'm not sure if OpenLdap would profit here, but I'm quite certain that LMDB as a general database would. I'm also open to contribute if you are willing to give some guidance.