I need to better understand the best way to configure the various environment flags related to sync and map.
When we converted from BDB to LMDB, we instinctively kept MDB_NOSYNC set. This left the database open to corruption if the process was killed or stopped too harshly. On Windows at least, it also caused the OS to gradually use up all available RAM for the memory-mapped file and brought the server to its knees.
After carefully reading the documentation about MDB_NOSYNC leaving "the system with no hint for when to write transactions to disk", we removed all the flags and ran with the defaults. As a result we can no longer corrupt the database, but the speed degradation makes this approach unusable, as we are very heavily write-focused (at least in this part of our process, where we perform analysis and load the database with hundreds of millions of records).
In looking at the documentation I'm tempted to try MDB_WRITEMAP with MDB_MAPASYNC, but do I then also need to issue a manual mdb_env_sync, and if so, at what frequency and what should trigger it?
What are the best-practice combinations of those flags here? We absolutely can't afford database corruption, but we can tolerate losing one transaction (or maybe more, with some redesign).
Please guide us.
Thanks Alain
Alain wrote:
I need to better understand the best way to configure the various environment flags related to sync and map.
When we converted from BDB to LMDB, we instinctively kept MDB_NOSYNC set. This left the database open to corruption if the process was killed or stopped too harshly. On Windows at least, it also caused the OS to gradually use up all available RAM for the memory-mapped file and brought the server to its knees.
Windows as a server platform is a total joke anyway; and the Windows memory manager is still pure junk.
After carefully reading the documentation about MDB_NOSYNC leaving "the system with no hint for when to write transactions to disk", we removed all the flags and ran with the defaults. As a result we can no longer corrupt the database, but the speed degradation makes this approach unusable, as we are very heavily write-focused (at least in this part of our process, where we perform analysis and load the database with hundreds of millions of records).
Sounds like you need to separate the bulk loading phase from subsequent processing.
In looking at the documentation I'm tempted to try MDB_WRITEMAP with MDB_MAPASYNC, but do I then also need to issue a manual mdb_env_sync, and if so, at what frequency and what should trigger it?
On Linux, MDB_MAPASYNC is a no-op: the memory manager already knows that the pages are dirty and will flush them itself anyway (at some unpredictable point in the future).
I have no recommendations for Windows, other than not to use it.
As for manual sync frequency, that's up to you: test and pick a value that you can live with. You might try a simple counter and sync once per 100 commits, for example.
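The counter idea can be sketched roughly as follows. This is only an illustration, not code from LMDB: `sync_counter`, `sync_counter_init`, `sync_counter_on_commit`, `flush_fn` and `count_flush` are all hypothetical names, and the flush is a callback purely so the counting logic can be tried without a database.

```c
#include <assert.h>
#include <stddef.h>

/* Callback that performs the actual flush. With LMDB it would wrap
 * mdb_env_sync(env, 1); it is a function pointer here only so the
 * counting logic can be exercised without a database. */
typedef int (*flush_fn)(void *ctx);

/* Hypothetical helper (not part of LMDB): flush once per `interval` commits. */
typedef struct {
    unsigned interval;  /* commits between forced flushes, e.g. 100 */
    unsigned pending;   /* commits since the last flush */
    flush_fn flush;     /* performs the flush, returns 0 on success */
    void *ctx;          /* passed through to flush, e.g. the MDB_env pointer */
} sync_counter;

static void sync_counter_init(sync_counter *sc, unsigned interval,
                              flush_fn flush, void *ctx)
{
    assert(sc != NULL && flush != NULL && interval > 0);
    sc->interval = interval;
    sc->pending = 0;
    sc->flush = flush;
    sc->ctx = ctx;
}

/* Call after every successful mdb_txn_commit(). Returns 0 when no flush
 * was due, otherwise whatever the flush callback returned. */
static int sync_counter_on_commit(sync_counter *sc)
{
    if (++sc->pending < sc->interval)
        return 0;
    sc->pending = 0;
    return sc->flush(sc->ctx);
}

/* Stand-in flush for dry runs: just counts how often it was called. */
static int count_flush(void *ctx)
{
    ++*(unsigned *)ctx;
    return 0;
}
```

In a real loader the callback would presumably be a one-line wrapper that returns mdb_env_sync(env, 1). The force argument matters: with MDB_NOSYNC set, mdb_env_sync(env, 0) omits the flush. Note also that the LMDB documentation says a system crash under MDB_NOSYNC can corrupt the database unless the filesystem preserves write order and MDB_WRITEMAP is not used, so a periodic sync narrows the exposure window rather than closing it.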
What are the best-practice combinations of those flags here? We absolutely can't afford database corruption, but we can tolerate losing one transaction (or maybe more, with some redesign).
That's what MDB_NOMETASYNC is for: it maintains database integrity, but a system crash may undo the last committed transaction.
Please guide us.
Thanks Alain
openldap-technical@openldap.org