Hi,
I am considering using LMDB to store a dynamic graph structure, and it will be very helpful to support "append to value" having the same semantics as "append" in Kyoto Cabinet. http://fallabs.com/kyotocabinet/api/classkyotocabinet_1_1BasicDB.html#a23e77...
For instance, if I were to store an adjacency list representation of a graph in LMDB, I would use vertex ID as the key, and the value is a vector of 64-bit IDs. To insert an edge into the graph will simply require appending 8 bytes to the end of the vector. Re-reading and re-writing the entire value will be somewhat excessive (and will actually change the asymptotic runtime of the insertion).
This is also one of those situations where a batch append might be useful. For instance, if I have an edge list on file which is too large to retain in memory. To convert it to an adjacency list in LMDB will require batch-append. Batch-writes are insufficient since that will require me to cache at least, large parts of the adjacency list in memory, and the original edge list could be ordered arbitrarily.
The actual graph representation I am thinking of is somewhat more intelligent than a straight adjacency list, but similar concepts hold.
I have tried looking through the lmdb source to try to implement append myself, but the put logic is somewhat complicated...
Yucheng
Yucheng Low wrote:
Hi,
I am considering using LMDB to store a dynamic graph structure, and it will be very helpful to support "append to value" having the same semantics as "append" in Kyoto Cabinet. http://fallabs.com/kyotocabinet/api/classkyotocabinet_1_1BasicDB.html#a23e77...
For instance, if I were to store an adjacency list representation of a graph in LMDB, I would use vertex ID as the key, and the value is a vector of 64-bit IDs. To insert an edge into the graph will simply require appending 8 bytes to the end of the vector. Re-reading and re-writing the entire value will be somewhat excessive (and will actually change the asymptotic runtime of the insertion).
This is also one of those situations where a batch append might be useful. For instance, if I have an edge list on file which is too large to retain in memory. To convert it to an adjacency list in LMDB will require batch-append. Batch-writes are insufficient since that will require me to cache at least, large parts of the adjacency list in memory, and the original edge list could be ordered arbitrarily.
The actual graph representation I am thinking of is somewhat more intelligent than a straight adjacency list, but similar concepts hold.
Based solely on the description here, I would use MDB_DUPFIXED and MDB_APPENDDUP or MDB_MULTIPLE. You could later use MDB_GET_MULTIPLE to read the values and reconstruct the vector.
I see, correct me if I misunderstood: you are suggesting to use duplicate keys to store the vector? In other words, if I have an neighbor list for vertex 5, comprising of values [1,2,3], I will store it as 3 duplicate values for key 5.
Also, does the LMDB store the key just once for each duplicate value? Or multiple times once for each value.
Thanks, Yucheng
On Tue, Mar 26, 2013 at 11:35 AM, Howard Chu hyc@symas.com wrote:
Yucheng Low wrote:
Hi,
I am considering using LMDB to store a dynamic graph structure, and it will be very helpful to support "append to value" having the same semantics as "append" in Kyoto Cabinet. http://fallabs.com/**kyotocabinet/api/**classkyotocabinet_1_1BasicDB.** html#**a23e776e5bd1e3c5caa0f62edffb87**a54http://fallabs.com/kyotocabinet/api/classkyotocabinet_1_1BasicDB.html#a23e776e5bd1e3c5caa0f62edffb87a54
For instance, if I were to store an adjacency list representation of a graph in LMDB, I would use vertex ID as the key, and the value is a vector of 64-bit IDs. To insert an edge into the graph will simply require appending 8 bytes to the end of the vector. Re-reading and re-writing the entire value will be somewhat excessive (and will actually change the asymptotic runtime of the insertion).
This is also one of those situations where a batch append might be useful. For instance, if I have an edge list on file which is too large to retain in memory. To convert it to an adjacency list in LMDB will require batch-append. Batch-writes are insufficient since that will require me to cache at least, large parts of the adjacency list in memory, and the original edge list could be ordered arbitrarily.
The actual graph representation I am thinking of is somewhat more intelligent than a straight adjacency list, but similar concepts hold.
Based solely on the description here, I would use MDB_DUPFIXED and MDB_APPENDDUP or MDB_MULTIPLE. You could later use MDB_GET_MULTIPLE to read the values and reconstruct the vector.
-- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/**project/http://www.openldap.org/project/
Yucheng Low wrote:
I see, correct me if I misunderstood: you are suggesting to use duplicate keys to store the vector? In other words, if I have an neighbor list for vertex 5, comprising of values [1,2,3], I will store it as 3 duplicate values for key 5.
Also, does the LMDB store the key just once for each duplicate value? Or multiple times once for each value.
The key is only stored once. Duplicating the key would be stupid. Also DUPFIXED values are stored contiguously, with no headers or other overhead. It will be effectively the same amount of storage as storing your single record.
Thanks, Yucheng
On Tue, Mar 26, 2013 at 11:35 AM, Howard Chu <hyc@symas.com mailto:hyc@symas.com> wrote:
Yucheng Low wrote: Hi, I am considering using LMDB to store a dynamic graph structure, and it will be very helpful to support "append to value" having the same semantics as "append" in Kyoto Cabinet. http://fallabs.com/__kyotocabinet/api/__classkyotocabinet_1_1BasicDB.__html#__a23e776e5bd1e3c5caa0f62edffb87__a54 <http://fallabs.com/kyotocabinet/api/classkyotocabinet_1_1BasicDB.html#a23e776e5bd1e3c5caa0f62edffb87a54> For instance, if I were to store an adjacency list representation of a graph in LMDB, I would use vertex ID as the key, and the value is a vector of 64-bit IDs. To insert an edge into the graph will simply require appending 8 bytes to the end of the vector. Re-reading and re-writing the entire value will be somewhat excessive (and will actually change the asymptotic runtime of the insertion). This is also one of those situations where a batch append might be useful. For instance, if I have an edge list on file which is too large to retain in memory. To convert it to an adjacency list in LMDB will require batch-append. Batch-writes are insufficient since that will require me to cache at least, large parts of the adjacency list in memory, and the original edge list could be ordered arbitrarily. The actual graph representation I am thinking of is somewhat more intelligent than a straight adjacency list, but similar concepts hold. Based solely on the description here, I would use MDB_DUPFIXED and MDB_APPENDDUP or MDB_MULTIPLE. You could later use MDB_GET_MULTIPLE to read the values and reconstruct the vector. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/__project/ <http://www.openldap.org/project/>
openldap-technical@openldap.org