On 09/02/15 20:16, Sravan Kumar Reddy Javaji wrote:
- Is there any way I can find the total number of records in LMDB?
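mdb_stat -a <database>. The "Entries:" line for each database is its record count. From code, mdb_stat() returns the same number in MDB_stat.ms_entries.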
- Can I access the rows in LMDB in random order instead of sequentially?
(...)
No.
I know that it is better to read sequentially from LMDB and randomize the records afterwards. But I have around 1 million records in LMDB, and I can't load the entire data set into memory at once. I am planning to read the data into memory batch-wise and perform some operation on each batch. So I am wondering: is there any way I can read the data in random order from LMDB directly?
Make a random permutation of the integers [1..number of records]. Walk the DB with mdb_cursor_get(MDB_FIRST) and then MDB_NEXT, and associate each record with the next ID from the permutation. Or something like that.
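A minimal Python sketch of that step, assuming the sequential walk can be modeled as an iterable of keys arriving in cursor order (in C this is the mdb_cursor_get MDB_FIRST/MDB_NEXT loop). The function name assign_random_ids is hypothetical. It draws a uniform permutation of 1..n with random.shuffle (a Fisher-Yates shuffle) and pairs the i-th key visited with the i-th ID; values are never touched.

```python
import random
from typing import Iterable, List, Optional, Tuple


def assign_random_ids(keys_in_cursor_order: Iterable[bytes], n: int,
                      seed: Optional[int] = None) -> List[Tuple[int, bytes]]:
    """Pair each key, in the order the cursor visits it, with a random ID.

    `n` is the record count (e.g. the "Entries" value reported by mdb_stat).
    Only keys are kept; values are never read at this stage.
    """
    rng = random.Random(seed)
    ids = list(range(1, n + 1))  # the integers [1..number of records]
    rng.shuffle(ids)             # uniform random permutation (Fisher-Yates)
    pairs = list(zip(ids, keys_in_cursor_order))
    if len(pairs) != n:
        raise ValueError("cursor walk yielded fewer than n records")
    return pairs


# Stand-in for the MDB_FIRST/MDB_NEXT walk: keys arrive in sorted key order.
keys = [b"key%03d" % i for i in range(10)]
pairs = assign_random_ids(keys, len(keys), seed=1)
```

Only the (ID, key) pairs are held in memory here, which is far smaller than the full records when the values are large.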
To avoid massacring your cache, don't follow the data.mv_data pointer at this stage. (This only matters when nodes are larger than half an OS page, so that the data items are stored in overflow pages rather than next to the keys.) The exception is if you preprocess your entries and write them out to a file during this pass; in that case just record (file position, size) for each one.
Now process your records in ID order; that is your random walk.
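Processing in ID order also gives the batches the question asks for. A sketch, assuming the (ID, key) pairs fit in memory even though the values do not; random_order_batches is a hypothetical name, and fetch_value is a hypothetical stand-in for a point lookup by key (mdb_get in C).

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def random_order_batches(
    pairs: Sequence[Tuple[int, bytes]],
    fetch_value: Callable[[bytes], bytes],
    batch_size: int,
) -> Iterator[List[Tuple[bytes, bytes]]]:
    """Yield batches of (key, value) in ascending ID order.

    Because the IDs are a random permutation, ascending ID order is a random
    order over the keys. Each batch's values are fetched only when that batch
    is requested from the generator.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    ordered = sorted(pairs)  # IDs are unique, so this sorts purely by ID
    for start in range(0, len(ordered), batch_size):
        chunk = ordered[start:start + batch_size]
        yield [(key, fetch_value(key)) for _, key in chunk]


# Hypothetical stand-ins: a dict plays the database, its lookup plays mdb_get.
store = {b"a": b"1", b"b": b"2", b"c": b"3"}
pairs = [(2, b"a"), (3, b"b"), (1, b"c")]
batches = list(random_order_batches(pairs, store.__getitem__, batch_size=2))
```

Here the batches come out as c, a and then b, following IDs 1, 2, 3 rather than key order.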
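I don't know what "associate a record with an ID" will mean for you. If you have a read-only copy of your database, maybe just build an array of (offset of key, size, offset of data, size) for each record, which at four 8-byte fields per record is 32 Mbyte for 1 million records; save that to a file and bypass LMDB. Take the offsets relative to MDB_envinfo.me_mapaddr. Otherwise, maybe build a named database with {key = record ID, data = original key}; walking it in key order is then your random walk, provided the IDs sort numerically (e.g. with MDB_INTEGERKEY).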
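A sketch of the index-file idea, assuming a fixed layout of four unsigned 64-bit little-endian integers per entry (32 bytes, hence 32 Mbyte for a million records). The names ENTRY, write_index, read_entry and resolve are hypothetical. A bytes object stands in for the read-only memory map that starts at me_mapaddr; in C, each offset would be computed as (char *)key.mv_data - (char *)info.me_mapaddr. If the entries are written in permutation order, reading them back sequentially yields a random walk over the records.

```python
import io
import struct
from typing import BinaryIO, Iterable, Tuple

# One index entry: (key offset, key size, data offset, data size), each stored
# as an unsigned 64-bit little-endian integer, i.e. 32 bytes per record.
ENTRY = struct.Struct("<4Q")
Entry = Tuple[int, int, int, int]


def write_index(out: BinaryIO, entries: Iterable[Entry]) -> int:
    """Write entries back to back; return how many were written."""
    count = 0
    for entry in entries:
        out.write(ENTRY.pack(*entry))
        count += 1
    return count


def read_entry(index: BinaryIO, n: int) -> Entry:
    """Fetch entry number n (0-based) by seeking straight to it."""
    index.seek(n * ENTRY.size)
    return ENTRY.unpack(index.read(ENTRY.size))


def resolve(mapped: bytes, entry: Entry) -> Tuple[bytes, bytes]:
    """Slice the key and the data out of the mapped region."""
    key_off, key_len, data_off, data_len = entry
    return (mapped[key_off:key_off + key_len],
            mapped[data_off:data_off + data_len])


# Demo: a bytes object stands in for the read-only map at me_mapaddr.
mapped = b"KEY1VALUE1KEY2VAL2"
index = io.BytesIO()
written = write_index(index, [(0, 4, 4, 6), (10, 4, 14, 4)])
```

The offsets are only meaningful while the database copy stays unchanged, which is why this route needs a read-only copy.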