A suggestion was made to use a read/write mmap (as an option), to allow writes to be performed with no syscall overhead. I'm thinking that might be ok as a completely separate version of the library, because a fair bit of the code would need to change to accommodate that update style, and it would push the library over the 32K boundary.
Also, this isn't as cool a suggestion as it sounds: it completely gives up MDB's current immunity to corruption, and in fact makes the DB much less reliable. When you write through an mmap, you have absolutely no idea when the OS is going to get around to flushing the data back to disk, and no idea what order the flushes will occur in. You can force the OS's hand by calling msync on every page you want to flush, in the order you want them flushed, but then you're right back to having syscall overhead, and by calling msync in a particular order you defeat the underlying filesystem's ability to schedule the writes for optimum seeks.
Currently, by using writev, we can push a lot of data to the OS, and then when we call fdatasync() at the end, the OS schedules those writes as it sees fit. Right now the only ordering dependency MDB has is that all of the data pages must be flushed successfully before flushing the meta page, so we can afford to let the OS schedule all of the data page writes, and then do an explicitly synchronous write of the meta page.
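To make the ordering concrete, here is a rough sketch of that commit sequence. It is not MDB's actual code: commit_txn, demo_commit, PGSIZE, and the layout (meta page at page 0, dirty data pages in one contiguous run) are all made up for illustration, and the meta page is made durable with pwrite plus a second fdatasync, which is just one way to get a synchronous write. It assumes a Linux-like system that provides fdatasync.

```c
#define _XOPEN_SOURCE 700   /* expose pwrite/fdatasync under strict -std modes */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define PGSIZE 4096         /* hypothetical page size for this sketch */

/* Hypothetical commit: page 0 holds the meta page, and the dirty data
 * pages form one contiguous run starting at byte offset data_off. */
int commit_txn(int fd, struct iovec *iov, int niov, off_t data_off,
               const void *meta)
{
    ssize_t want = 0, got;
    int i;

    assert(niov > 0);
    for (i = 0; i < niov; i++)
        want += (ssize_t)iov[i].iov_len;

    /* 1. Hand all the data pages to the OS in a single syscall. */
    if (lseek(fd, data_off, SEEK_SET) == (off_t)-1)
        return -1;
    got = writev(fd, iov, niov);
    if (got != want)
        return -1;          /* sketch only: no retry on short writes */

    /* 2. One barrier; the OS picks the order of the data page writes. */
    if (fdatasync(fd) != 0)
        return -1;

    /* 3. Only now write the meta page, and wait for it to reach disk. */
    if (pwrite(fd, meta, PGSIZE, 0) != PGSIZE)
        return -1;
    return fdatasync(fd);
}

/* Usage sketch: commit two data pages to a scratch file, read one back.
 * Returns 0 on success. */
int demo_commit(const char *path)
{
    static char p1[PGSIZE], p2[PGSIZE], meta[PGSIZE], back[PGSIZE];
    struct iovec iov[2];
    int fd, rc;

    memset(p1, 'a', PGSIZE);
    memset(p2, 'b', PGSIZE);
    memset(meta, 'm', PGSIZE);
    iov[0].iov_base = p1; iov[0].iov_len = PGSIZE;
    iov[1].iov_base = p2; iov[1].iov_len = PGSIZE;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    rc = commit_txn(fd, iov, 2, PGSIZE, meta);
    if (rc == 0 && (pread(fd, back, PGSIZE, 2 * PGSIZE) != PGSIZE ||
                    memcmp(back, p2, PGSIZE) != 0))
        rc = -1;
    close(fd);
    unlink(path);
    return rc;
}
```

The point of the sketch is the syscall count: however many data pages there are, the commit costs one writev and two syncs, and the only ordering imposed on the OS is "data before meta".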
So, with a writable mmap, we're stuck with the choice of either (a) not knowing at all whether our data has been flushed, or (b) being forced to explicitly flush every page ourselves, in a predetermined order that we have no way of knowing is optimal for the current disk layout.
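For comparison, option (b) might look something like the sketch below. Again, this is illustrative and not proposed MDB code: flush_in_order, demo_mmap_flush, and the page layout (meta page at page 0) are hypothetical names and assumptions. The stores into the map cost no syscalls, but getting a known flush order back costs one msync per dirty page, plus one for the meta page.

```c
#define _XOPEN_SOURCE 700   /* expose ftruncate/msync under strict -std modes */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical ordered flush: msync each dirty data page in the caller's
 * order, then store and msync the meta page (page 0) last.
 * Returns the number of msync calls made, or -1 on error. */
int flush_in_order(char *map, size_t pgsize, const size_t *dirty,
                   size_t ndirty, const char *meta)
{
    size_t i;
    int calls = 0;

    assert(pgsize > 0);
    for (i = 0; i < ndirty; i++) {
        /* One syscall per page, in an order the filesystem did not choose. */
        if (msync(map + dirty[i] * pgsize, pgsize, MS_SYNC) != 0)
            return -1;
        calls++;
    }
    /* Touch the meta page only after every data page is on disk. */
    memcpy(map, meta, pgsize);
    if (msync(map, pgsize, MS_SYNC) != 0)
        return -1;
    return calls + 1;
}

/* Usage sketch: dirty two data pages through the map, then flush.
 * Returns the msync count (3 here) or -1 on error. */
int demo_mmap_flush(const char *path)
{
    static char meta[65536];            /* large enough for common page sizes */
    size_t dirty[2] = { 2, 1 };         /* arbitrary flush order */
    long ps = sysconf(_SC_PAGESIZE);
    size_t pgsize, len;
    char *map;
    int fd, calls = -1;

    if (ps <= 0 || (size_t)ps > sizeof(meta))
        return -1;
    pgsize = (size_t)ps;
    len = 3 * pgsize;
    memset(meta, 'm', pgsize);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) == 0) {
        map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            memset(map + pgsize, 'a', pgsize);      /* plain stores, */
            memset(map + 2 * pgsize, 'b', pgsize);  /* no syscalls   */
            calls = flush_in_order(map, pgsize, dirty, 2, meta);
            munmap(map, len);
        }
    }
    close(fd);
    unlink(path);
    return calls;
}
```

Note that even this only controls when the meta page is written relative to our own msync calls; between the stores and the msyncs, the OS is still free to write back any dirty data page whenever it likes.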
It seems to me this can only be a viable mode of operation if you're always going to run async and don't care much about transaction durability or DB recoverability. Running in this mode offers essentially zero crash resistance; the entire DB will almost always be irreparably damaged after a system crash.
Would you run like that, if it offered you the potential of maybe 10x faster write performance? (It could be useful for slapadd -q, certainly.)