Mark Zealey wrote:
On 22/08/13 23:37, Howard Chu wrote:
- Can you update the documentation to explain what happens when I do an
mdb_cursor_del()? I am assuming it advances the cursor to the next record (this seems to be the behaviour). However, there seems to be some sort of bug with this assumption. Basically I have a loop which jumps (MDB_SET_RANGE) to a key and then wants to delete entries until the key no longer matches a prefix. So I do while(..) { mdb_cursor_del(); mdb_cursor_get(..., MDB_GET_CURRENT); }. This works fine mostly, but roughly 1% of the time I get EINVAL returned when I try MDB_GET_CURRENT after a delete. This always seems to happen on the same records - not sure about the memory structure, but could it be something to do with hitting a page boundary somehow invalidating the cursor?
That's exactly what it does, yes.
Any idea about the EINVAL issue?
Yes, as I said already, it does exactly what you said. When you've deleted the last item on the page the cursor no longer points at a valid node, so GET_CURRENT returns EINVAL.
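For what it's worth, here is a sketch of such a delete-until loop that copes with that case by re-seeking with MDB_SET_RANGE on a saved copy of the just-deleted key; since the deleted key is gone, the re-seek lands on its successor. The prefix test, the 512-byte key buffer (LMDB's default maximum key size is 511 bytes) and the re-seek workaround are illustrative assumptions, not documented behaviour:

#include <string.h>
#include <errno.h>
#include "lmdb.h"

static int delete_prefix(MDB_txn *txn, MDB_dbi dbi, const char *prefix)
{
    MDB_cursor *cur;
    MDB_val key, data;
    char saved[512];                 /* assumes keys fit LMDB's default limit */
    size_t plen = strlen(prefix), klen;
    int rc;

    rc = mdb_cursor_open(txn, dbi, &cur);
    if (rc) return rc;

    /* position on the first key >= prefix */
    key.mv_size = plen;
    key.mv_data = (void *)prefix;
    rc = mdb_cursor_get(cur, &key, &data, MDB_SET_RANGE);

    while (rc == 0 && key.mv_size >= plen &&
           memcmp(key.mv_data, prefix, plen) == 0) {
        /* remember the key we are about to delete */
        klen = key.mv_size;
        if (klen > sizeof(saved)) { rc = EINVAL; break; }
        memcpy(saved, key.mv_data, klen);

        rc = mdb_cursor_del(cur, 0);
        if (rc) break;

        /* normally the cursor now sits on the next record ... */
        rc = mdb_cursor_get(cur, &key, &data, MDB_GET_CURRENT);
        if (rc == EINVAL) {
            /* ... but if we deleted the last node on its page, re-seek:
             * SET_RANGE on the deleted key lands on the next record */
            key.mv_size = klen;
            key.mv_data = saved;
            rc = mdb_cursor_get(cur, &key, &data, MDB_SET_RANGE);
        }
    }

    mdb_cursor_close(cur);
    return (rc == MDB_NOTFOUND) ? 0 : rc;
}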
None of the memory behavior you just described makes any sense to me. LMDB uses a shared memory map, exclusively. All of the memory growth you see in the process should be shared memory. If it's anywhere else then I'm pretty sure you have a memory leak. With all the valgrind sessions we've run I'm also pretty sure that *we* don't have a memory leak.
As for the random I/O, it also seems a bit suspect. Are you doing a commit on every key, or batching multiple keys per commit?
I'm not doing *any* commits, just one big txn for all the data...
The C below works fine up until i=4m (i.e. 500MB of resident memory shown in top); then there is a massive slowdown, shared memory (again, as seen in top) increases, it waits about 20-30 seconds, and then the disks get hammered writing 10MB/sec (200 txns) when they are capable of 100-200MB/sec streaming writes... Does it do the same for you?
#include <stdio.h>
#include <stdlib.h>
#include "lmdb.h"

int main(int argc, char *argv[])
{
    int i, rc;
    MDB_env *env;
    MDB_dbi dbi;
    MDB_val key, data;
    MDB_txn *txn;
    char buf[40];
    int count = 100000000;

    rc = mdb_env_create(&env);
    rc = mdb_env_set_mapsize(env, (size_t)1024*1024*1024*10);
    rc = mdb_env_open(env, "./testdb", 0, 0664);
    rc = mdb_txn_begin(env, NULL, 0, &txn);
    rc = mdb_open(txn, NULL, 0, &dbi);

    for (i = 0; i < count; i++) {
        /* pseudo-random leading field, then the sequence number twice */
        sprintf(buf, "blah foo %9ld%9d%9d",
            (long)(random() * (float)count / RAND_MAX) - i, i, i);
        if (i % 100000 == 0)
            printf("%s\n", buf);
        key.mv_size = sizeof(buf);
        key.mv_data = buf;
        data.mv_size = sizeof(buf);
        data.mv_data = buf;
        rc = mdb_put(txn, dbi, &key, &data, 0);
    }

    rc = mdb_txn_commit(txn);
    mdb_close(env, dbi);
    mdb_env_close(env);
    return 0;
}
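For reference, a batched variant of the same insert loop (committing and re-beginning the write txn every so often, as asked above) would look roughly like this; the batch size of 100000 and the minimal error handling are just illustrations:

#include <stdio.h>
#include <stdlib.h>
#include "lmdb.h"

#define BATCH 100000    /* arbitrary illustration */

static int fill_batched(MDB_env *env, int count)
{
    MDB_dbi dbi;
    MDB_txn *txn;
    MDB_val key, data;
    char buf[40];
    int i, rc;

    rc = mdb_txn_begin(env, NULL, 0, &txn);
    if (rc) return rc;
    rc = mdb_open(txn, NULL, 0, &dbi);
    if (rc) { mdb_txn_abort(txn); return rc; }

    for (i = 0; i < count; i++) {
        sprintf(buf, "blah foo %9ld%9d%9d",
            (long)(random() * (float)count / RAND_MAX) - i, i, i);
        key.mv_size = sizeof(buf);  key.mv_data = buf;
        data.mv_size = sizeof(buf); data.mv_data = buf;
        rc = mdb_put(txn, dbi, &key, &data, 0);
        if (rc) { mdb_txn_abort(txn); return rc; }

        /* commit this batch and start a new write txn;
         * the dbi handle stays valid after the commit */
        if (i % BATCH == BATCH - 1) {
            rc = mdb_txn_commit(txn);
            if (rc) return rc;
            rc = mdb_txn_begin(env, NULL, 0, &txn);
            if (rc) return rc;
        }
    }
    return mdb_txn_commit(txn);
}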
By the way, I've just generated our biggest database (~4.5GB) from scratch using our standard perl script. Using Kyoto Cabinet (TreeDB) with various tunings it did it in 18 minutes real time vs LMDB at 50 minutes (both SSD-backed, in a box with 24GB of free memory).
Kyoto writes async by default. You should do the same here: use MDB_NOSYNC on the env_open.
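For instance, in the test program above that is roughly a one-line change to the env_open call, plus an optional explicit flush at the end if you still want durability once everything is written:

    rc = mdb_env_open(env, "./testdb", MDB_NOSYNC, 0664);
    ...
    /* after the final commit, flush to disk explicitly if desired */
    rc = mdb_env_sync(env, 1);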