Hi,
Currently I am adding support for LMDB as a new storage backend for one of our products. At the moment I am testing bulk import of data into LMDB, using transactions that each span a single record of 10MB. The total DB size afterwards is 5GB. I also tested with records of 1MB.
I noticed a very odd thing: when using the MDB_WRITEMAP option, memory usage grows very quickly and linearly with the amount of data stored in the database (memory usage ends up a bit higher than 5GB). When not using MDB_WRITEMAP, however, memory usage stays very low. Does anyone have a suggestion as to what might be wrong, and what causes such different behaviour with and without the memorymap option?
FYI:
- There are no long-lived reads, so that should not be the problem.
- I am using the newest git version: ec97f49a6552a8ed599472d665ec7c16463b808c
Regards, Luc Vlaming
Luc Vlaming wrote:
[...] Does anyone have a suggestion what might be wrong and what causes such different behaviour with and without using the memorymap option?
There is nothing wrong. It is simply writing to the shared memory map.
Hi,
If it were simply writing to the memory map, shouldn't memory usage decrease as soon as everything is written? The memory usage continues to be high for as long as the database is open, even if the program just waits afterwards. Is that to be expected as well? Because that would mean that the process would simply run out of memory if more data is written than the machine has RAM.
Regards, Luc
Kind regards, Luc Vlaming KXA Software Innovations
formerly Dysi Software Innovations
visiting address: Hoendiep Noordzijde 21 9843TG, Grijpskerk
Luc Vlaming tel: 06 16 353 426 email: vlaming@softwareinnovations.nl url: www.softwareinnovations.nl
On Wed, Jan 15, 2014 at 11:10 PM, Howard Chu hyc@symas.com wrote:
There is nothing wrong. It is simply writing to the shared memory map.
--
Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Luc Vlaming wrote:
Hi,
If it was simply writing to the memory map, shouldn't memory usage decrease as soon as everything is written?
Why? Sounds to me like you don't understand how demand paging works in a virtual memory system. The key rule here - if there is no demand, then there is no paging.
The memory usage continues to be high for as long as the database is open, even if the program just waits afterwards. Is that to be expected as well? Because that would mean that the process would simply run out of memory if more data is written than the machine has RAM.
That's not how virtual memory works. Go read up on it.
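The difference between the two write paths can be reproduced without LMDB. The following is a minimal sketch, not LMDB code: it assumes Linux (it reads /proc/self/statm) and Python's standard mmap module, and all helper names are hypothetical. One helper writes a file with pwrite(), which is roughly what a writer without MDB_WRITEMAP is assumed to do; the other stores through a shared, writable file mapping. Pages touched through the mapping are counted in the process's resident set until the kernel reclaims them or the mapping goes away, whereas data written with pwrite() sits in the page cache without being attributed to the process.

```python
import mmap
import os
import tempfile

PAGE = os.sysconf("SC_PAGE_SIZE")


def resident_bytes():
    """Resident set size of this process in bytes, or None if /proc is unavailable."""
    try:
        with open("/proc/self/statm") as f:
            return int(f.read().split()[1]) * PAGE
    except (OSError, IndexError, ValueError):
        return None


def write_with_pwrite(path, size, chunk_size=1024 * 1024):
    """Fill a file using pwrite(); the data lands in the page cache only."""
    chunk = b"x" * chunk_size
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        for offset in range(0, size, chunk_size):
            os.pwrite(fd, chunk[: size - offset], offset)
    finally:
        os.close(fd)


def write_through_mapping(path, size):
    """Fill a file through a shared mapping and return the still-open mapping."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)
        m = mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                      prot=mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        os.close(fd)  # the mmap object keeps its own duplicate descriptor
    for offset in range(0, size, PAGE):
        m[offset] = 0x78  # the first store to each page faults it in
    return m


def compare(size=32 * 1024 * 1024):
    """Return RSS at start, after pwrite, after mapped writes, after closing the map."""
    with tempfile.TemporaryDirectory() as d:
        start = resident_bytes()
        write_with_pwrite(os.path.join(d, "pwrite.bin"), size)
        after_pwrite = resident_bytes()
        m = write_through_mapping(os.path.join(d, "mapped.bin"), size)
        after_mapping = resident_bytes()
        m.close()
        after_close = resident_bytes()
    return start, after_pwrite, after_mapping, after_close


if __name__ == "__main__":
    print(compare())
```

On a typical Linux system the third value is expected to exceed the second by roughly the mapped size and to fall back after the mapping is closed; the exact numbers depend on the kernel and on memory pressure.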
Howard Chu hyc@symas.com wrote on 15.01.2014 at 23:10 in message
Luc Vlaming wrote:
[...] I noticed a very odd thing: when using the MDB_WRITEMAP option, memory usage grows very quickly and linear with the amount of data stored into the database. (memory usage ends up a bit higher than 5GB). [...]
In future, please distinguish between virtual memory usage and real (resident) memory usage. On Linux especially this makes a big difference, because a malloc(1GB) does not actually consume any memory until the memory is used.
There's also the "pmap" utility, which can show the detailed difference. For example, my small (bdb) slapd has:
# pmap 3668
3668: slapd
START               SIZE     RSS     PSS   DIRTY    SWAP PERM MAPPING
[...]
00007f601a7f4000   8192K    120K    120K    120K      0K rw-p [anon]
[...]
00007f603db4c000  18320K    184K    184K     84K      0K rw-s /var/lib/ldap/__db.003
[...]
Total:           808004K  29768K  28657K  27016K  32040K
So of 800MB virtual memory there is only 30MB actually in use...
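The gap between reserved address space and resident memory can be observed directly. This is a minimal sketch assuming Linux (/proc/self/statm) and Python's standard mmap module; the function and key names are hypothetical, and an anonymous private mapping stands in for the large malloc mentioned above.

```python
import mmap
import os

PAGE = os.sysconf("SC_PAGE_SIZE")


def resident_bytes():
    """Resident set size of this process in bytes, or None if /proc is unavailable."""
    try:
        with open("/proc/self/statm") as f:
            return int(f.read().split()[1]) * PAGE
    except (OSError, IndexError, ValueError):
        return None


def reserve_then_touch(reserve=256 * 1024 * 1024, touch=16 * 1024 * 1024):
    """Reserve address space, then write to only the first `touch` bytes of it."""
    before = resident_bytes()
    # Anonymous private mapping: this reserves virtual address space only.
    m = mmap.mmap(-1, reserve, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    after_reserve = resident_bytes()
    touched_pages = 0
    for offset in range(0, touch, PAGE):
        m[offset] = 1  # the first write makes the kernel back this page with RAM
        touched_pages += 1
    after_touch = resident_bytes()
    m.close()
    return {
        "reserved": reserve,
        "touched_pages": touched_pages,
        "before": before,
        "after_reserve": after_reserve,
        "after_touch": after_touch,
    }


if __name__ == "__main__":
    for key, value in reserve_then_touch().items():
        print(key, value)
```

The resident figure is expected to stay nearly flat after the reservation and to grow only by about the touched amount, which is the same distinction the SIZE and RSS columns of pmap make.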
Off-topic: I remember a statement from the late 1980s in which a programmer claimed that the 32-bit address space is so large that one does not have to care about garbage collection in virtual address space; just use new addresses. I think even with 64 bits one should always try not to waste address space.
Regards, Ulrich
Hi,
First of all, sorry for not being clear. I meant resident memory usage when mentioning the 5GB "memory usage". For clarity here are the pmap outputs
pmap with memorymap:
Address           Kbytes      RSS    Dirty Mode  Mapping
.....
total kB        52687820  5153964    16992

pmap without memorymap:
Address           Kbytes      RSS    Dirty Mode  Mapping
.....
total kB        52708320    52456    37488
This seems to suggest to me that with the memorymap option on, the written data stays in the resident set, i.e. keeps "being used" (as far as I understand). Thanks for the help so far ;) I'd really like to know what causes this difference.
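To put the two totals side by side, here is a small sketch. The parsing helper and field names are hypothetical, and the three-column layout (Kbytes, RSS, Dirty) is taken from the header of the output above; the input lines are the two totals quoted there.

```python
KIB_PER_GIB = 1024 * 1024


def parse_pmap_total(line):
    """Parse a 'total kB <Kbytes> <RSS> <Dirty>' summary line into a dict."""
    fields = line.split()
    if len(fields) != 5 or fields[:2] != ["total", "kB"]:
        raise ValueError("unexpected pmap total line: %r" % line)
    size_kb, rss_kb, dirty_kb = (int(x) for x in fields[2:])
    return {"size_kb": size_kb, "rss_kb": rss_kb, "dirty_kb": dirty_kb}


def kb_to_gib(kb):
    """Convert kilobytes (1024 bytes each) to GiB."""
    return kb / KIB_PER_GIB


def clean_resident_kb(total):
    """Resident pages that are not dirty, i.e. already identical to the file."""
    return total["rss_kb"] - total["dirty_kb"]


with_writemap = parse_pmap_total("total kB 52687820 5153964 16992")
without_writemap = parse_pmap_total("total kB 52708320 52456 37488")

if __name__ == "__main__":
    for name, total in (("with", with_writemap), ("without", without_writemap)):
        print(
            "%-8s virtual %.2f GiB, resident %.2f GiB, clean resident %d kB"
            % (name, kb_to_gib(total["size_kb"]), kb_to_gib(total["rss_kb"]),
               clean_resident_kb(total))
        )
```

In the memorymap run almost all resident pages are clean (RSS minus Dirty). Clean file-backed pages can normally be dropped by the kernel under memory pressure without further I/O, which is presumably why the large resident figure does not mean the process would run out of memory.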
FYI: I simply recompiled a small test program of mine with and without the flag and then ran it twice, so there is nothing different between the runs other than the database flags.
Regards, Luc Vlaming
On Thu, Jan 16, 2014 at 8:37 AM, Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> wrote:
[...]
Ulrich Windl wrote:
Off-topic: I remember a statement from the late 1980s in which a programmer claimed that the 32-bit address space is so large that one does not have to care about garbage collection in virtual address space; just use new addresses. I think even with 64 bits one should always try not to waste address space.
Wow, good thing you're here to remind us of these little details, I would never have thought of that! What on earth will we do now?
Oh wait, right in the LMDB design paper, there's a discussion of garbage collection. Someone must have thought of it already, after all. You're the worst kind of fool, one who believes he knows more than he actually does, in a world where it's trivially easy to acquire the actual facts.
Howard Chu hyc@symas.com wrote on 16.01.2014 at 11:03 in message
Wow, good thing you're here to remind us of these little details, I would never have thought of that! What on earth will we do now?
Oh wait, right in the LMDB design paper, there's a discussion of garbage collection. Someone must have thought of it already, after all. You're the worst kind of fool, one who believes he knows more than he actually does, in a world where it's trivially easy to acquire the actual facts.
Thanks!
Hi,
Thanks for the explanations....
Regards, Luc
On Thu, Jan 16, 2014 at 12:24 PM, Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> wrote:
[...]
openldap-technical@openldap.org