Hi Howard, I've now got lmdb working with powerdns in place of kyoto - nice and easy to do, thanks! Maximum DNS query load is a little better - about 10-30% depending on the use-case - but for us the main gain is that a writer can be going at the same time; I was struggling a bit with how to push updates from a different process using kyoto. There are a few issues and things I'd like to comment on, though:
1) Can you update the documentation to explain what happens when I do an mdb_cursor_del()? I am assuming it advances the cursor to the next record (this seems to be the behaviour). However, there is some sort of bug with this assumption. Basically I have a loop which jumps (MDB_SET_RANGE) to a key and then deletes records until the key no longer matches what I'm looking for. So I do while(..) { mdb_cursor_del(); mdb_cursor_get(..., MDB_GET_CURRENT); } - roughly the loop sketched below. This mostly works fine, but roughly 1% of the time I get EINVAL returned when I try MDB_GET_CURRENT after a delete. It always seems to happen on the same records - I'm not sure about the memory structure, but could it be something to do with hitting a page boundary somehow invalidating the cursor? At the moment I just catch that error and do an MDB_NEXT to skip over them, but this will be an issue for us in production. This is from perl so it /may/ be the binding, or the version of lmdb shipped with it; however, the perl layer is a very thin wrapper, and looking at the code I can only conclude it comes from lmdb itself.
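To make the pattern concrete, here is a stripped-down C version of the loop (the function name and the prefix check are illustrative only; the real code is perl, but it boils down to this):

    #include <string.h>
    #include "lmdb.h"

    /* Illustrative only: delete everything whose key starts with `prefix`,
     * starting from an MDB_SET_RANGE jump. Mirrors the perl loop where
     * MDB_GET_CURRENT occasionally returns EINVAL after mdb_cursor_del(). */
    static int delete_range(MDB_cursor *cur, const char *prefix)
    {
        MDB_val key, data;
        int rc;

        key.mv_size = strlen(prefix);
        key.mv_data = (void *)prefix;

        rc = mdb_cursor_get(cur, &key, &data, MDB_SET_RANGE);
        while (rc == 0 &&
               key.mv_size >= strlen(prefix) &&
               memcmp(key.mv_data, prefix, strlen(prefix)) == 0) {
            rc = mdb_cursor_del(cur, 0);
            if (rc != 0)
                break;
            /* This is the call that sporadically fails with EINVAL. */
            rc = mdb_cursor_get(cur, &key, &data, MDB_GET_CURRENT);
        }
        return rc;
    }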
2) Currently, because kyoto cabinet didn't have support for multiple identical keys, we don't use the DUP options. This leads to quite long keys (1200-1300 bytes in some cases). In the future it would be nice to have a run-time keylength specifier or something along those lines.
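Just to illustrate what I mean by the DUP options (a sketch only - we haven't tried this, and the DBI name is made up): restructuring would mean opening the database with MDB_DUPSORT and splitting the current long key into a shorter key plus a sorted duplicate value, roughly:

    #include "lmdb.h"

    /* Hypothetical sketch: open a named DBI with sorted duplicates so that
     * several records could share one (much shorter) key instead of packing
     * everything into a 1200-byte key. "records" is just an example name. */
    static int open_dup_dbi(MDB_txn *txn, MDB_dbi *dbi)
    {
        return mdb_dbi_open(txn, "records", MDB_DUPSORT | MDB_CREATE, dbi);
    }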
3) Perhaps an mdb_cursor_get_key() function (like kyoto has) which doesn't return the data, just the key. As in (2), we store all the data in the key - not sure how much of a performance difference this would make, though.
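Something along the lines of this trivial wrapper is what I have in mind (purely illustrative - the name is made up, it's not a real LMDB call):

    #include "lmdb.h"

    /* Hypothetical convenience call: fetch only the key at the cursor's
     * current position and ignore the data. */
    static int cursor_get_key(MDB_cursor *cur, MDB_val *key)
    {
        MDB_val unused;
        return mdb_cursor_get(cur, key, &unused, MDB_GET_CURRENT);
    }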
4) Creating a database with non-sequential keys is very slow (on 4gb databases, about 2x slower than kyoto - about 1h30 - and it uses more memory). I spent quite a bit of time looking at this in perl and then in C; the C test is essentially the loop sketched at the end of this point. Basically I create a database, open one txn and then insert a bunch of unordered keys. Up to about 500mb it's fine and nice and quick - from perl about 75k inserts/sec (slow mostly because it's reading from mysql). However, after that first 500mb it starts flushing to disk. In the sequential-insert case the flush is very quick - 100-200mb/sec or so. On non-sequential inserts, though, I've seen it drop to 4 or 5mb/sec as it's writing data all over the disk rather than doing big sequential writes. iostat shows the same ~200 write tps and 100% utilisation, but only 4-10mb/sec of bytes actually being written.
However, even when it's not flushing (or when storing data on SSD or a memdisk), after the first 500mb performance massively drops off to perhaps 10-15k inserts/sec. At the same time, looking at `top`, once the resident memory hits about 500mb the 'shared memory' starts being used and resident just keeps on increasing. I'm not sure if this is some kind of kernel accounting thing to do with mmap usage, but it doesn't happen for sequential key inserts (for those, shared mem stays around 0 and resident stays at 500mb). I'm using centos 6 with various different kernels, from the default up to 3.7.5, and the behaviour is the same. I don't really know how to go about looking for the root cause of this, but I'm pretty sure that whilst the IO is crippling it in places, there is something else going on of which the shared memory increase is a symptom. I've tried using the WRITEMAP option too, which doesn't seem to affect anything significantly in terms of performance or memory usage.
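For reference, the C test is essentially this (simplified; the map size, key generation and value sizes are just placeholders for the real data coming from mysql, and "./testdb" must already exist):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include "lmdb.h"

    /* Simplified version of the non-sequential insert test: one env, one
     * write txn, keys inserted in random order. Sizes/counts are placeholders. */
    int main(void)
    {
        MDB_env *env;
        MDB_txn *txn;
        MDB_dbi dbi;
        MDB_val key, data;
        char keybuf[32], valbuf[256];
        long i;
        int rc;

        mdb_env_create(&env);
        mdb_env_set_mapsize(env, 8UL * 1024 * 1024 * 1024); /* 8gb map */
        rc = mdb_env_open(env, "./testdb", 0, 0664);
        if (rc) { fprintf(stderr, "env open: %s\n", mdb_strerror(rc)); return 1; }

        mdb_txn_begin(env, NULL, 0, &txn);
        mdb_dbi_open(txn, NULL, 0, &dbi);

        memset(valbuf, 'x', sizeof(valbuf));
        for (i = 0; i < 10000000; i++) {
            /* random() stands in for the unordered keys */
            snprintf(keybuf, sizeof(keybuf), "%020ld", random());
            key.mv_size  = strlen(keybuf);
            key.mv_data  = keybuf;
            data.mv_size = sizeof(valbuf);
            data.mv_data = valbuf;
            rc = mdb_put(txn, dbi, &key, &data, 0);
            if (rc) { fprintf(stderr, "put: %s\n", mdb_strerror(rc)); break; }
        }

        mdb_txn_commit(txn);
        mdb_env_close(env);
        return 0;
    }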
5) pkgconfig/rpms would be really nice to have. Or do you expect it to just be bundled with a project, as e.g. the perl module does?
Thanks,
Mark