Mark Zealey wrote:
Hi Howard, I've now got lmdb working with powerdns in place of kyoto - nice and easy to do thanks! Maximum DNS query load is a little better - about 10-30% depending on use-case, but for us the main gain is that you can have a writer going on at the same time - I was struggling a bit with how to push updates from a different process using kyoto. There's a few issues and things I'd like to comment on though:
- Can you update documentation to explain what happens when I do a mdb_cursor_del()? I am assuming it advances the cursor to the next record (this seems to be the behaviour). However there is some sort of bug with this assumption. Basically I have a loop which jumps (MDB_SET_RANGE) to a key and then wants to do a delete until key is like something else. So I do while(..) { mdb_cursor_del(), mdb_cursor_get(..., MDB_GET_CURRENT)}. This works fine mostly, but roughly 1% of the time I get EINVAL returned when I try to MDB_GET_CURRENT after a delete. This always seems to happen on the same records - not sure about the memory structure but could it be something to do with hitting a page boundary somehow invalidating the cursor?
That's exactly what it does, yes.
At the moment I just catch that and then do an MDB_NEXT to skip over them, but this will be an issue for us on live. This is from perl, so it /may/ be that, or the version of lmdb that is shipped with it. However, the perl layer is a very thin wrapper, and looking at the code I can only think it comes from lmdb.
- Currently, because kyoto cabinet didn't have support for multiple identical keys we don't use the DUP options. This leads to quite long keys (1200-1300 bytes in some cases). In the future, it would be nice to have a run-time keylength specifier or something along those lines.
I don't foresee that ever happening. The max keysize will always be constrained such that two nodes fit on a page. But we've added the get_maxkeysize() function so that in the future we can increase the limit; there's really no technical reason why it needs to be stuck at 511 bytes.
- Perhaps a mdb_cursor_get_key() function (like kyoto) which doesn't return the data (just the key). As in (2) we store all the data in the key - not sure how much of a performance difference this would make though
Two answers: In mdb_cursor_get, the data param can be NULL if you don't want the data. Also, since LMDB is zero-copy, all it's doing is storing a pointer value anyway, so the cost difference of returning the data is pretty much nil.
- Creating database with non-sequential keys is very bad (on 4gb databases, 2x slower than kyoto - about 1h30 and uses more memory). I spent quite a bit of time looking at this in perl and then C. Basically I create a database, open 1 txn and then insert a bunch of unordered keys. Up to about 500mb it's fine and nice and quick - from perl about 75k inserts/sec (slow mostly because it's reading from mysql). However after that first 500mb it starts flushing to disk. In a sequential insert case the flush is very quick - 100-200mb/sec or so. However on non-sequential insert I've seen it drop to like 4 or 5mb/sec as it's writing data all over the disk rather than big sequential writes. iostat shows the same ~200tps of write, 100% usage but only 4-10mb/sec of bytes being written.
However, even when it's not flushing (or when storing data on SSD or memdisk), after the first 500mb performance massively drops off to perhaps 10-15k inserts/sec. At the same time, looking at `top`, once the resident memory hits about 500mb, the 'shared memory' starts being used and resident just keeps on increasing. I'm not sure if this is some kind of kernel accounting thing to do with mmap usage but it doesn't happen for sequential key inserts (for those, shared mem stays around 0, resident stays 500mb). I'm using centos 6 with various different kernels from default to 3.7.5 and the behaviour is the same. I don't really know how to go about looking for the root cause of this but I'm pretty sure that whilst the IO is crippling it in places there is something else going on of which the shared memory increase is a sign. I've tried using the WRITEMAP option too which doesn't seem to affect anything significantly in terms of performance or memory usage.
None of the memory behavior you just described makes any sense to me. LMDB uses a shared memory map, exclusively. All of the memory growth you see in the process should be shared memory. If it's anywhere else then I'm pretty sure you have a memory leak. With all the valgrind sessions we've run I'm also pretty sure that *we* don't have a memory leak.
As for the random I/O, it also seems a bit suspect. Are you doing a commit on every key, or batching multiple keys per commit?
- pkgconfig/rpms would be really nice to have. Or do you expect it to just be bundled with a project as eg the perl module does?
The OpenLDAP Project releases source code, period. Distros do whatever they do. FreeBSD and Debian have LMDB packages now; if you want RPMs I suggest you ask your distro provider.