On Sun, May 18, 2014 at 12:15:45PM -0700, Howard Chu wrote:
> Luke Kenneth Casson Leighton wrote:
>>> We fell for the fantasy of parallel writes with BerkeleyDB, but after
>>> a dozen+ years of poking, profiling, and benchmarking, it all becomes
>>> clear - all of that locking overhead + deadlock detection/recovery is
>>> just a waste of resources.
>> ... which is why tdb went to the other extreme, to show it could be done.
> But even tdb only allows one write transaction at a time. I looked into
> writing a back-tdb for OpenLDAP back in 2009, before I started writing
> LMDB. I know pretty well how tdb works...
Okay, transactional, safe writes are slow. True. But the non-transactional ones have improved significantly in the very recent past. We gain a lot from mutexes (we had to find out just how badly the Linux fcntl locks really suck...), and also from spreading the load from the freelist to the dead records in neighboring hash chains. I don't have any microbenchmarks, but larger-scale benchmarks benefit a lot from those two changes.
I would like to give lmdb a try in Samba, really. I see that for 32-bit systems we will probably still need tdb for the foreseeable future (pread/pwrite in lmdb anyone in the meantime? :-)). The other blocker when I last took a serious look was that crashed processes can have harmful effects. Has this changed in the meantime with automatic cleanup and/or robust mutexes? I know those might be a bit slower, but I would love to offer our users the choice at least.
Volker