OK, that seems to be working. Thanks for the tip!
One thing that hung me up along the way: in my pruning, I am using a cursor to iterate through the records and then mdb_cursor_del() to chop dead ones. mdb_cursor_del() seems to set the cursor to the next record, so I'm actually starting at the last record and moving backward through the entries with mdb_cursor_get(…, MDB_PREV). So far so good.
The problem is that mdb_cursor_get(…, MDB_PREV) will continue to return 0 when there are no records left in the database. Obviously this is easy to work around, but it seems like a bug to me.
Thanks so much for your assistance, happy to have this simplified version working.
Jeremy
On 11 Jun 2013, at 22:16, Howard Chu <hyc@symas.com> wrote:
Jeremy Bernstein wrote:
Thanks Howard,
OK, so I tried this again with a slightly more modest toy database (after reading the presentation, thanks): 1MB (256 pages). Blasting a bunch of records into it at once (100 records per transaction), I get MDB_MAP_FULL with 1 branch, 115 leaf and 0 overflow pages. So I suppose I can use 1/3 of the database size (85 leaf pages in this example) as a rough guideline for when to prune. My real databases are between 4 and 128MB, 32MB being typical, and my real transactions are generally a bit smaller.
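The numbers in the toy test work out as follows, assuming a 4096-byte page size (which is what "1MB (256 pages)" implies). `map_pages()`, `data_pages()` and `prune_threshold_pages()` are hypothetical helpers, and the one-third cutoff is only the heuristic proposed here, not an LMDB rule.

```c
#include <stddef.h>

/* Pages available in a map of map_bytes, for a given page size. */
static size_t map_pages(size_t map_bytes, size_t page_size)
{
    return map_bytes / page_size;
}

/* Pages occupied by the main DB, as reported by mdb_stat
 * (branch + leaf + overflow). */
static size_t data_pages(size_t branch, size_t leaf, size_t overflow)
{
    return branch + leaf + overflow;
}

/* Hypothetical rule of thumb from this test: start pruning at one
 * third of the map (integer pages, rounded down). */
static size_t prune_threshold_pages(size_t total_pages)
{
    return total_pages / 3;
}
```

With these inputs, `map_pages(1 << 20, 4096)` is 256, `data_pages(1, 115, 0)` is 116 (about 45% of the map when MDB_MAP_FULL appeared), and `prune_threshold_pages(256)` is 85. The gap between 116 and 256 is presumably taken up by the copy-on-write overhead Howard describes in his earlier reply (the free-page DB and pages not yet reusable).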
Does that seem reasonable to you, or do I need to be working on a different scale entirely?
I doubt that the cutover point will scale as linearly as that; you should just experiment further with your real data.
Jeremy
On 11 Jun 2013, at 20:11, Howard Chu <hyc@symas.com> wrote:
Your entire mapsize was only 64K, 16 pages? That's not going to work well. Please read the LMDB presentations to understand why not. Remember that in addition to the main data pages, there is also a 2nd DB maintaining a list of old pages, and since LMDB uses copy-on-write, every single write you make is going to dirty multiple pages, and dirty pages cannot be reused until 2 transactions after they were freed. So you need enough free space in the map to store ~3 copies of your largest transaction, in addition to the static data.
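The rule of thumb above can be turned into a rough lower bound on the map size. This is a hypothetical sketch: `min_map_pages()` and `pages_to_bytes()` are not LMDB functions, and the caller has to estimate both the static data and the number of pages the largest transaction dirties (because of copy-on-write, that is more than the pages its records occupy).

```c
#include <stddef.h>

/* Rough lower bound: static data plus ~3 copies of the largest
 * transaction's dirty pages. */
static size_t min_map_pages(size_t static_pages, size_t largest_txn_dirty_pages)
{
    return static_pages + 3 * largest_txn_dirty_pages;
}

/* Convert a page count to a byte count, e.g. for mdb_env_set_mapsize(). */
static size_t pages_to_bytes(size_t pages, size_t page_size)
{
    return pages * page_size;
}
```

For example, with 4096-byte pages a 64K map is only `65536 / 4096 = 16` pages, while 100 static pages plus a transaction that dirties 20 pages already calls for `min_map_pages(100, 20) = 160` pages, i.e. 655360 bytes, before any headroom.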
Thanks,
Jeremy
On 11 Jun 2013, at 19:32, Howard Chu <hyc@symas.com> wrote:
Jeremy Bernstein wrote:
Although I didn't figure out a good way to do what I want, this is what I am now doing:
if (MDB_MAP_FULL while putting) {
    abort txn, close the database
    reopen the database @ larger mapsize
    perform some pruning of dead records
    commit txn, close the database
    reopen the database @ old mapsize
    try to put again
}
At this point, the database is probably larger than the old mapsize. To handle that, I make a copy of the DB, kill the original, open a new database and copy the records from the old DB to the new one.
All of this is a lot more complicated and verbose than I'd like, but it works and seems to be reliable.
Nevertheless, if there's an easier way, I'm all ears. Thanks for your thoughts.
Use mdb_stat() before performing the mdb_put(). If the total number of pages in use exceeds whatever threshold you choose (e.g. 90% of the map), start pruning.
Look at the mdb_stat command's output to get an idea of what you're looking for.
--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/