Mark Zealey wrote:
On 22/08/13 23:37, Howard Chu wrote:
- Can you update the documentation to explain what happens when I do an
mdb_cursor_del()? I am assuming it advances the cursor to the next record (this seems to be the behaviour). However, there seems to be some sort of bug with this assumption. Basically I have a loop which jumps (MDB_SET_RANGE) to a key and then wants to delete entries until the key no longer matches a prefix. So I do while(..) { mdb_cursor_del(); mdb_cursor_get(..., MDB_GET_CURRENT); }. This works fine mostly, but roughly 1% of the time I get EINVAL returned when I try MDB_GET_CURRENT after a delete. This always seems to happen on the same records - not sure about the memory structure, but could it be something to do with hitting a page boundary somehow invalidating the cursor?
That's exactly what it does, yes.
Any idea about the EINVAL issue?
Yes, as I said already, it does exactly what you said. When you've deleted the last item on the page the cursor no longer points at a valid node, so GET_CURRENT returns EINVAL.
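For what it's worth, here is a sketch of such a delete-until loop that copes with that case by re-seeking with MDB_SET_RANGE on a saved copy of the just-deleted key; since the deleted key is gone, the re-seek lands on its successor. The prefix test, the 512-byte key buffer (LMDB's default maximum key size is 511 bytes) and the re-seek workaround are illustrative assumptions, not documented behaviour:

#include <string.h>
#include <errno.h>
#include "lmdb.h"

static int delete_prefix(MDB_txn *txn, MDB_dbi dbi, const char *prefix)
{
    MDB_cursor *cur;
    MDB_val key, data;
    char saved[512];                 /* assumes keys fit LMDB's default limit */
    size_t plen = strlen(prefix), klen;
    int rc;

    rc = mdb_cursor_open(txn, dbi, &cur);
    if (rc) return rc;

    /* position on the first key >= prefix */
    key.mv_size = plen;
    key.mv_data = (void *)prefix;
    rc = mdb_cursor_get(cur, &key, &data, MDB_SET_RANGE);

    while (rc == 0 && key.mv_size >= plen &&
           memcmp(key.mv_data, prefix, plen) == 0) {
        /* remember the key we are about to delete */
        klen = key.mv_size;
        if (klen > sizeof(saved)) { rc = EINVAL; break; }
        memcpy(saved, key.mv_data, klen);

        rc = mdb_cursor_del(cur, 0);
        if (rc) break;

        /* normally the cursor now sits on the next record ... */
        rc = mdb_cursor_get(cur, &key, &data, MDB_GET_CURRENT);
        if (rc == EINVAL) {
            /* ... but if we deleted the last node on its page, re-seek:
             * SET_RANGE on the deleted key lands on the next record */
            key.mv_size = klen;
            key.mv_data = saved;
            rc = mdb_cursor_get(cur, &key, &data, MDB_SET_RANGE);
        }
    }

    mdb_cursor_close(cur);
    return (rc == MDB_NOTFOUND) ? 0 : rc;
}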
None of the memory behavior you just described makes any sense to me. LMDB uses a shared memory map, exclusively. All of the memory growth you see in the process should be shared memory. If it's anywhere else then I'm pretty sure you have a memory leak. With all the valgrind sessions we've run I'm also pretty sure that *we* don't have a memory leak.
As for the random I/O, it also seems a bit suspect. Are you doing a commit on every key, or batching multiple keys per commit?
I'm not doing *any* commits, just one big txn for all the data...
The C below works fine up until i=4m (i.e. 500MB of resident memory shown in top); then there is a massive slowdown, shared memory (again, as seen in top) increases, it waits about 20-30 seconds, and then the disks get hammered writing 10MB/sec (200 txns) when they are capable of 100-200MB/sec streaming writes... Does it do the same for you?
#include <stdio.h>
#include <stdlib.h>
#include "lmdb.h"

int main(int argc, char *argv[])
{
    int i, rc;
    MDB_env *env;
    MDB_dbi dbi;
    MDB_val key, data;
    MDB_txn *txn;
    char buf[40];
    int count = 100000000;

    rc = mdb_env_create(&env);
    rc = mdb_env_set_mapsize(env, (size_t)1024*1024*1024*10);
    rc = mdb_env_open(env, "./testdb", 0, 0664);
    rc = mdb_txn_begin(env, NULL, 0, &txn);
    rc = mdb_open(txn, NULL, 0, &dbi);

    for (i = 0; i < count; i++) {
        /* pseudo-random leading field, then the sequence number twice */
        sprintf(buf, "blah foo %9ld%9d%9d",
            (long)(random() * (float)count / RAND_MAX) - i, i, i);
        if (i % 100000 == 0)
            printf("%s\n", buf);
        key.mv_size = sizeof(buf);
        key.mv_data = buf;
        data.mv_size = sizeof(buf);
        data.mv_data = buf;
        rc = mdb_put(txn, dbi, &key, &data, 0);
    }

    rc = mdb_txn_commit(txn);
    mdb_close(env, dbi);
    mdb_env_close(env);
    return 0;
}
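For reference, a batched variant of the same insert loop (committing and re-beginning the write txn every so often, as asked above) would look roughly like this; the batch size of 100000 and the minimal error handling are just illustrations:

#include <stdio.h>
#include <stdlib.h>
#include "lmdb.h"

#define BATCH 100000    /* arbitrary illustration */

static int fill_batched(MDB_env *env, int count)
{
    MDB_dbi dbi;
    MDB_txn *txn;
    MDB_val key, data;
    char buf[40];
    int i, rc;

    rc = mdb_txn_begin(env, NULL, 0, &txn);
    if (rc) return rc;
    rc = mdb_open(txn, NULL, 0, &dbi);
    if (rc) { mdb_txn_abort(txn); return rc; }

    for (i = 0; i < count; i++) {
        sprintf(buf, "blah foo %9ld%9d%9d",
            (long)(random() * (float)count / RAND_MAX) - i, i, i);
        key.mv_size = sizeof(buf);  key.mv_data = buf;
        data.mv_size = sizeof(buf); data.mv_data = buf;
        rc = mdb_put(txn, dbi, &key, &data, 0);
        if (rc) { mdb_txn_abort(txn); return rc; }

        /* commit this batch and start a new write txn;
         * the dbi handle stays valid after the commit */
        if (i % BATCH == BATCH - 1) {
            rc = mdb_txn_commit(txn);
            if (rc) return rc;
            rc = mdb_txn_begin(env, NULL, 0, &txn);
            if (rc) return rc;
        }
    }
    return mdb_txn_commit(txn);
}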
By the way, I've just generated our biggest database (~4.5GB) from scratch using our standard perl script. Using Kyoto Cabinet (TreeDB) with various tunings it did it in 18 minutes real time vs LMDB at 50 minutes (both SSD-backed, in a box with 24GB of free memory).
Kyoto writes async by default. You should do the same here: use MDB_NOSYNC on the env_open.
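For instance, in the test program above that is roughly a one-line change to the env_open call, plus an optional explicit flush at the end if you still want durability once everything is written:

    rc = mdb_env_open(env, "./testdb", MDB_NOSYNC, 0664);
    ...
    /* after the final commit, flush to disk explicitly if desired */
    rc = mdb_env_sync(env, 1);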