Hello Everyone,
I am new to LMDB. I have a couple of questions:
1) Is there any way that I can find the total number of records in LMDB?
2) Can I access all the rows from LMDB randomly instead of sequentially? For example, if there are 5 rows in LMDB as shown below,
Key, Value
K1, V1
K2, V2
K3, V3
K4, V4
K5, V5
I want to access them in a random order, something like this:
Key, Value
K3, V3
K5, V5
K4, V4
K2, V2
K1, V1
I know that it is better to read sequentially from LMDB and then randomize the records afterwards. But I have around 1 million records in LMDB and I cannot load the entire data set into memory at once. I am planning to read the data into memory in batches and perform some operation on each batch. So I am wondering: is there any way that I can read the data randomly from LMDB directly?
Looking forward to hearing from you.
- Regards, Sravan
On 09/02/15 20:16, Sravan Kumar Reddy Javaji wrote:
- Is there any way that I can find the total number of records in LMDB?
mdb_stat -a <database>.
- Can I access all the rows from LMDB randomly instead of sequentially?
(...)
No.
I know that it is better to read sequentially from LMDB and then randomize the records afterwards. But I have around 1 million records in LMDB and I cannot load the entire data set into memory at once. I am planning to read the data into memory in batches and perform some operation on each batch. So I am wondering: is there any way that I can read the data randomly from LMDB directly?
Make a random permutation of the integers [1..number of records]. Walk the DB with mdb_cursor_get:MDB_<FIRST/NEXT>, associate each record with an ID from the permutation. Or something like that.
To avoid massacring your cache, avoid following the data.mv_data pointer at this stage. (Only relevant when nodes are > 1/2 OS page so the data items are stored in overflow pages rather than next to the keys.) Unless you preprocess your entries and write them to a file at this stage, then just record (file position, size).
Now process your records ordered by ID, that'll be your random walk.
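The permutation step above could be sketched roughly as follows. This is only an illustration in plain C with no LMDB calls: the function names random_permutation and visit_order_from_ids are made up for this example, and the "sequential position" of a record simply means its index in the MDB_FIRST/MDB_NEXT walk. It also assumes rand() has enough range for the record count, which is not guaranteed on every platform.

```c
#include <assert.h>
#include <stdlib.h>

/* Fill perm[0..n-1] with a shuffled copy of 1..n (Fisher-Yates).
 * perm[i] is the random ID given to the i-th record of the sequential walk.
 * Assumption: RAND_MAX >= n; the small modulo bias is ignored in this sketch.
 * The caller is expected to seed the generator with srand(). */
void random_permutation(size_t *perm, size_t n)
{
    assert(perm != NULL);
    for (size_t i = 0; i < n; i++)
        perm[i] = i + 1;
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;   /* 0 <= j < i */
        size_t tmp = perm[i - 1];
        perm[i - 1] = perm[j];
        perm[j] = tmp;
    }
}

/* Invert the assignment: visit_order[id - 1] is the sequential position of
 * the record that received that ID. Reading visit_order from front to back
 * is the "process your records ordered by ID" step. */
void visit_order_from_ids(const size_t *perm, size_t n, size_t *visit_order)
{
    assert(perm != NULL && visit_order != NULL);
    for (size_t i = 0; i < n; i++) {
        assert(perm[i] >= 1 && perm[i] <= n);
        visit_order[perm[i] - 1] = i;
    }
}
```

For a batch-wise run, one would take visit_order in slices of the batch size and fetch only the records named in each slice.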
Don't know what "associate a record with an ID" will be for you. If you have a read-only copy of your database, maybe just build a 32 Mbyte array of (offset of key, size, offset of data, size) for each record, save that to a file, and bypass LMDB. Offsets relative to MDB_envinfo.me_mapaddr. Otherwise, maybe build a named database with {key = record ID, data = original key}.
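One possible layout for the (offset of key, size, offset of data, size) array, as a sketch only: with four 64-bit fields each entry is 32 bytes, so about 1 million records comes to roughly 32 Mbyte. The type record_loc and the helper names are hypothetical. map_base stands for the base address of the map that the offsets are taken against; in a real walk the key and data pointers and sizes would come from the mv_data and mv_size fields of the MDB_val pairs returned by mdb_cursor_get.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per record: 4 x 8 bytes = 32 bytes. */
typedef struct {
    uint64_t key_off;    /* offset of the key from the map base */
    uint64_t key_size;
    uint64_t data_off;   /* offset of the data from the map base */
    uint64_t data_size;
} record_loc;

/* Turn pointers into the map into offsets that can be saved to a file.
 * Assumption: key and data both point inside the mapping at map_base. */
record_loc make_record_loc(const void *map_base,
                           const void *key, size_t key_size,
                           const void *data, size_t data_size)
{
    record_loc loc;
    assert(map_base != NULL && key != NULL && data != NULL);
    loc.key_off = (uint64_t)((const char *)key - (const char *)map_base);
    loc.key_size = (uint64_t)key_size;
    loc.data_off = (uint64_t)((const char *)data - (const char *)map_base);
    loc.data_size = (uint64_t)data_size;
    return loc;
}

/* Resolve a saved entry against the base address of a (read-only) mapping. */
const void *loc_key(const void *map_base, const record_loc *loc)
{
    return (const char *)map_base + loc->key_off;
}

const void *loc_data(const void *map_base, const record_loc *loc)
{
    return (const char *)map_base + loc->data_off;
}
```

The saved offsets are only meaningful as long as the database file is not modified, which is why this applies to a read-only copy.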
I wrote:
(...) Otherwise, maybe build a named database with {key = record ID, data = original key}.
Whoops, I meant provided the other data is also in named database(s). Database names are inserted as keys in the database with name=NULL, which is why NULL database and named database should not be mixed.
Thanks everyone for your help.
I am working on the technique below:
*Make a random permutation of the integers [1..number of records]. Walk the DB with mdb_cursor_get:MDB_<FIRST/NEXT>, associate each record with an ID from the permutation. Or something like that.*
- Regards, Sravan
On Tue, Feb 10, 2015 at 3:12 PM, Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no> wrote:
I wrote:
(...) Otherwise, maybe build a named database with {key = record ID, data = original key}.
Whoops, I meant provided the other data is also in named database(s). Database names are inserted as keys in the database with name=NULL, which is why NULL database and named database should not be mixed.
openldap-technical@openldap.org