Full_Name: Markus Version: OS: URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (91.64.106.220)
We built a database with LMDB and are very pleased with it in general. The one area giving our users headaches regularly are bulk deletes. But some context first:
Like many others, we use a hierarchical key notation. Some simplified examples: "user/00001", "user/00002", "user/00003", ...; "usergroup/00001", "usergroup/00002", ....
Or, let's say data by timestamp: "sensordata/32489732", "sensordata/32489733", ...
Use case like "remove all users" or "remove all sensordata entries from time x to y" are usual. Those (transactional) operations call mdb_cursor_del() on each entry. This runs fine until we have LOTS of entries to delete. At a magnitude of 100,000s of entries, two things happen: 1) the time it takes to delete gets quite noticeable and 2) the 32 bit (MDB_VL32) will run into MDB_TXN_FULL (see also #8813).
Given the key structure, those operations operate on consecutive keys. Because of that, can we make this more efficient (and get rid MDB_TXN_FULL on the fly)?
A high number of individual deletes involves lots of page merging (and a lot of dirty pages). If we had a specialized "range delete" this could likely be avoided. The signature of the proposed functionality could look like this:
int mdb_cursor_del_range(MDB_cursor *mc, MDB_val *key_start, MDB_val *key_stop, unsigned int flags)
Given key_start and key_stop, this new function could decide to prune entire leaf pages, or even entire sub branches. This should result in a significant efficiency/performance gain. The challenge I'm seeing here is balancing the tree though.
What do you think? I'm not sure if OpenLdap would profit here, but I'm quite certain that LMDB as a general database would. I'm also open to contribute if you are willing to give some guidance.
Thanks for your consideration, Markus